LOADING

加载过慢请开启缓存,浏览器默认开启

COMP310-Intro-to-FileSystems

OS = Process Management + Memory Management + File System + I/O

I/O Devices

I/O System Architecture
Buses Type:

  • PCIE (peripheral component interconnect express)
    • PCIE is often used to connect to high performance devices
  • DMI (Direct media interface)
  • USB (Universal serial bus)
  • eSATA (external SATA)
    • SATA (Serial ATA)
    • ATA (the AT attachment, in reference to providing connection to the IBM PC AT)

      Reason for the classification(architecture)

      1. Physical significance
        Higher performance devices connected via high performance buses(wider and shorter)
      2. To put a large number of devices into the system

How does OS communicate with I/O devices?

By Canonical Device Interface.
Canonical Device Interface:
Interface made of 3 registers:

  • Status - current status of device
  • Command - OS tells device what command to perform
  • Data - send/receive data from device

OS alters the registers to communicate with IO/O devices

CDI(Canonical Device Interface)
All the hidden internals are visible to the hardware but invisible to the OS.

How does OS use device interface?

  • DMA

  • Polling
    OS waits untile the device is ready. Repeatedly checks the STATUS of the registers in a loop.

    • Pro: simple
    • Con: Wasted CPU cycles
  • Interrupt
    Put process requesting I/O to sleep. -> Context switch to a different process. -> When I/O finishes, wake sleeping process with an interrupt. -> CPU jumps to Interrupt Handler in the OS. -> wake up A

    • Pro: no waste of CPU cycles
    • Con: expensive context switch (polling can be better for fast devices since $T(I/O)<T(context switch)$)
      1. Same mechanism is used for demand paging
      2. A is waken up by interrupt handler

    NOTICE: Interrupt is generated by the hardware to initiate a context switch(switching between user and kernel mode). Syscall is generated by the process to request functionality from kernel mode, also initiates a context switch(switching between user and kernel mode). Trap goes from user space to kernel space.

How to handle different devices in OS?

The problem is that each device may have its own protocol(The interface is specific).

Solution: Abstraction:
Adding a layer of abstraction between OS and devices: device driver.

  • Device driver creates a connection between each device and the OS.
  • Each device needs its own driver
  • Device driver is a piece of software in OS

Permanent Storage

  • Permanent across machine failures/restarts
  • Permanent across disk failures

Permanent Storage Media:
Flash SSD, HDD, …

File System API

FILE

  • An un-intepreted collection of objects composed of bytes, records,etc.
    • un-intepreted: File system does not know the meaning of the data; only application does

Type:
Typed = File System knows what the object means
(Sometimes, the file type is distinguiished by file extensions but it actually depends on the OS itself: UNIX ignore the extension, Windows does not ignore)

  • Pro: prevent errors; efficient storage
  • Con: inflexible; lot of code

We only look untyped files in this course

File System Primitives

Checking the lecture

  • Access
  • Concurrency
  • Naming
  • Protection

ACCESS

Sequential & Random Access

  • Sequential:
    A portion of the file is read in order. If the read(write) is suspended and restart, it will continue at the last suspension point.
  • Random:
    No connection between accesses in different times.

Read() itself is random access but we can use file pointer to convert it into sequential access.

Sequential access is very common and all OS provide it. Not all OS provide random access.(but they may provide seek())

CONCURRENCY(concueerent sequential access)

Two processes access the same file.
Use Open() and Close() to deal with the situation.

The combination of Open() and Read() is easy; however, the combination of Open() to Write() is tricky since the file itself is being modified as writing.

  • Seperate file instances altogether
    Writes by one process not visible to others.
  • Seperate file instances until Close()
    Write visible after Close(), OS decides the merging
  • Single instance of the file
    Writes visible immediately to others

    In all cases, fp is private.

NAMING

Naming primitives
Naming = mapping : human-redable string -> uid
Directory(folder) = collection of mappings(namings)

Directory Structure

  • Flat structure
  • Two-level structure
  • Hierarchical
    Hierarchical directory structure can have different shapes(data structure)
    EXP: tree, acyclic graph(acyclic graph enables short cut creation compared to tree)

    Shortcut creation(in Linux): Hard Link & Soft Link
    Hard Link: Creates a new mapping of a string to the existing UID
    Soft Link: Creats a new mapping of a string to the existing string
    Def
    Difference

LINUX PRIMITIVES

Collapses in a single interface of the previous primitives.

UID is never visible at the user level!!

Disks

Structure of DISK
Each platter has two surfaces, each surface has its own tracks; each track has its own sectors.
Cylinder: a logical grouping of tracks (same distance from the spindle)
Block: multiple sectors make up a block.
Structure of DISK
Structure of DISK

Disk Interface

Disk interface is accessible by sector only.


For memory management, the CPU can directly access memory. CPU can’t read directly from disk, it must go from OS.

Disk Access

  • Head Selection: choose the platter to read/write
  • Seek Time: choose the corresponding track
  • Rotational Latency: point to the corresponding block
  • Transfer Time: reading or writing to the devixce
  • Controller Overhead:

Seek Time dominates the time complexity.
Disk Access time >> Memory access time(nanoseconds)

ComponentTime
Head Selectionnanoseconds
Seek Time3-12 milliseconds
Rotational Latency2-7 milliseconds
Transfer Timemicroseconds
Controller Overhead less than 1 millisecond

To minimize latency, disk saves the file to the nearest track with priority.
When the disk is doing sequential access, the seek and rotational delay only appears once.

Optimizing Disk Access

Rules:

  1. Do not access disk, use a cache
    What? -> Keep currently accessed blocks in memory.
    Why? -> Reduce latency and disk load
    How? -> Reserve kernel memory for cache
    *Cache entries: file blocks (of block size)

    • Read with a cache

      • If in cache: return data from cache
      • If not: find free cache slot -> initial disk read -> when disk read completes, return data
    • Write with a cache

      • writing will always be buffered in memory, so we always writing in the cache
      • How does it get to disk?
        • write through: after writing to the cache, directly write to the disk and then return to the user
        • write behind: return to the user first then write to the disk
          (The buffer needs to be flushed into the device at some point later)

      Response time: write-behind is better
      Disk Load: write behind is better
      Crash: write-through is better

  2. Read ahead and do not wait disk (prefetching)
    #Only for sequential access
    What? -> request a block and the next blocks will be read also
    Why? -> No disk I/O on user access to block i+1
    How? -> Put the next blocks in the buffer cache

  3. Minimize Seeks

    • Clever Disk Allocation
      Idea: locate related data on same cylinder, and allocate related(sequential access / consecutive blocks in the same file) blocks together(on the same or nearby cylinder)
    • Disk Scheduling
      Idea: reorder requests to seek as little as possible, rearranging the order of file tasks(read/write)
      • FCFS
      • SSTF(Shortest Seek Time First)
        • Pro: good seek times
        • Con: Starvation
      • SCAN
        • Seperate the movement in two parts, one continually moving up, another continually moving down
      • C-SCAN
        • always moving in one direction (see class)
      • C-LOOK
        • same as C-SCAN but not going all the way to zero or max of cylinder

          Clever disk allocation and disk scheduling are complementary.
          Low load: clever allocation is better (not many scheduling oppurtunities)
          High load: disk scheduling is better

  4. Avoid rotational latency