Aubade_Blog

Persisting will immortalize freedom.

Handling Crashes and Performance

COMP 310 2023/4/1

RAID

RAID is used to combine many disks, for: capacity, reliability, performance
Challenge: most file systems work with a single disk

Building Multiple Disks

Approaches

JBOD (just a bunch of disks)

If the application is smart, it can store different files on different file systems.

RAID (Redundant Array of Independent Disks)

Create the illusion of one disk from many disks.
The core idea is to expose a fake (logical) disk.

Essential Idea

  • Optimize I/O bandwidth through parallel I/O
  • Parallel I/O = I/O to multiple disks at once

Strategies

  • Mapping
  • Redundancy
Mapping

Provides an illusion that multiple disks behave as one.

Striping
Striping is a form of mapping. It puts a file across several disks: File = Stripe0 | Stripe1 | Stripe2 …

RAID-0 : No Redundancy

  • Uses Striping
  • Best possible read and write bandwidth
  • Failure results in data loss -> if one of the disks crashes, files may be lost
  • More disks increase throughput but not latency

Latency: how fast we can serve a single request
Throughput: how many requests we can serve per unit time

Redundancy

Redundancy is introduced to solve the problem of tolerating disk failure.
The core idea is: store redundant data on different disks.
It is an improvement built on top of mapping.

RAID-1 : Mirroring

Mirroring not only increases tolerance to disk failure, but also boosts read efficiency.

  • Storage capacity is half of the total disk capacity

RAID-4 : Parity Disk
N data disks + 1 Parity Disk

Parity:

  • A simple form of error detection and repair
  • Not specific to RAID
  • Also used in communications

Parity Updates:
Each time we write to a data block, we also need to update the parity.

  • Additive parity: read all other data blocks in parallel and XOR them with the new block
  • Subtractive parity
    • Read the old data and the old parity in parallel
    • Compare the new data with the old data
    • If new data == old data -> do nothing
    • Else, flip the parity bits that changed

RAID-4 requires access to the parity disk on every write, which causes a bottleneck in write-heavy workloads.
Parallelism across the data disks is throttled by the single parity disk.
This issue is called the small-write problem.
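The two parity-update strategies can be sketched with XOR over block contents. This is a toy sketch: the function names are illustrative, and small ints stand in for whole data blocks (XOR works bit-wise either way).

```python
# Hypothetical sketch of RAID-4 parity updates.

def additive_parity(other_blocks, new_block):
    """Additive: read all OTHER data blocks and XOR them with the new block."""
    parity = new_block
    for b in other_blocks:
        parity ^= b
    return parity

def subtractive_parity(old_block, old_parity, new_block):
    """Subtractive: read old data + old parity, flip only the bits that changed."""
    return old_parity ^ old_block ^ new_block

blocks = [0b1010, 0b0110, 0b0011]            # contents of the data disks
parity = blocks[0] ^ blocks[1] ^ blocks[2]   # initial parity
new = 0b1111                                 # new contents for disk 0

# Both strategies must compute the same new parity:
assert additive_parity(blocks[1:], new) == subtractive_parity(blocks[0], parity, new)
```

Subtractive parity touches only two disks regardless of N, which is why it is preferred for wide arrays; but every write still touches the one parity disk, which is the small-write bottleneck.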

RAID-5 : Distributed Parity


Distributed File Systems

Distributed System

A distributed system is one where a machine I’ve never heard of can cause my program to fail.
Definition: More than 1 machine working together to solve a problem.
Advantage:

  • More computing power
  • More storage capacity
  • Fault Tolerance
  • Data sharing

  • Availability: able to access the system even when some machines crash
  • Fault-tolerance: fault handling is incorporated as a functionality
  • Scalability: more machines, better performance
  • Transparency: users are unaware of the distribution

Type

  • Client/Server Model
  • Peer-to-peer Model

Distributed File System

File systems are a great use case for distributed systems.

  • Local FS: processes on the same machine access shared files
  • Network FS: processes on different machines access shared files in the same way

Network File System (NFS)



The client must mount a separate file system, NFS, to access files stored on the file server. Accesses go through NFS via Remote Procedure Calls (RPC).
Problems

Solution for handling crashes

  • Stateless protocol
  • Idempotent operations: operations that can be performed many times without changing the result


The issue is that a path may depend on the file-storage structure, which means repeated accesses to the same path do not always resolve the same way.

We can retry a Read many times, but not a Write, since repeated modifications may deviate from the intended result; this is why we introduce idempotent operations.


Use an offset to rewrite a particular part: a write of the same data at the same offset can safely be repeated.
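Offset-based writes can be sketched with `os.pwrite`/`os.pread` (Unix-only). The point is that repeating the same request leaves the file in the same state, so a client can safely retry after a crash or a lost reply:

```python
import os, tempfile

# Sketch: a write at an explicit offset is idempotent -- the retry below
# is harmless because it rewrites the same bytes at the same offset.
with tempfile.NamedTemporaryFile() as f:
    fd = f.fileno()
    os.pwrite(fd, b"hello", 0)
    os.pwrite(fd, b"hello", 0)      # retry: same data, same offset, no harm
    result = os.pread(fd, 5, 0)

print(result)                        # b'hello'
```

An append-style write (no offset) would not be idempotent: retrying it would duplicate the data, which is exactly the NFS problem described above.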

Solution for slowness


COMP310-Advanced FS-LogStructureDesigns

Advanced FS: Log-Structured Designs

Key Concepts

  • Log-Structure File System
  • Log-Structure Key-Value Store

Reason
Dealing with Crashes:
If a crash happens in the middle of writing data, it is a problem: the OS does not know which part is the new data and which is the old data.
With write-behind, a crash can be a problem even after the write call returns, since the data may not yet have been transferred to disk.

Handling Crash

Crash: a situation in which the application errs and stops functioning.

Solution: Atomicity

Implementation

Assumption:

  1. A single-sector disk write is atomic
    This assumption holds with very high probability.

Approaches:

  1. Shadow Paging
  2. Intentions Log

Shadow Paging

  • Make sure both the old copy and the new copy are on the disk.
  • Make an atomic switch between the two => by doing a WriteSector()
    • The atomic switch is done on the inode, since an inode is smaller than a sector, so we rewrite the inode entry

Step(Write Through):


We first write the new data blocks to disk -> after each write, we update the in-memory inode to point to the new block -> once the update is done in the cache, we use an atomic WriteSector() -> finally, garbage collection cleans up the old data blocks

Step(Write Behind):

The difference from write-through is that the data is not written directly to disk; it is written to memory first and transferred to disk later. Block space on disk is allocated before the transfer.

The step of transferring the blocks to disk is not atomic.
(If it fails, we can still recover the old version of the file) <- notice that the old blocks have not been garbage-collected yet

The WriteSector() instruction itself, however, is atomic.

The old blocks are eventually cleaned up.

The OS will de-allocate the old data blocks; if the OS crashes before de-allocation, the FS check will reclaim them.
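The shadow-paging steps above can be condensed into a toy sketch. The structures are illustrative (not a real FS): the "disk" is a dict of blocks plus one inode, and the inode update, a single assignment here, stands in for the one atomic WriteSector().

```python
# Toy sketch of shadow paging.

disk = {"blk0": b"old data"}
inode = {"data": "blk0"}             # committed pointer to the file's block

def shadow_update(new_bytes):
    disk["blk1"] = new_bytes         # 1. write the new block elsewhere on disk
    inode["data"] = "blk1"           # 2. atomic switch: one WriteSector() of the inode
    del disk["blk0"]                 # 3. garbage-collect the old block

shadow_update(b"new data")
print(disk[inode["data"]])           # b'new data'
```

A crash before step 2 leaves the inode pointing at the old block, so the old file is intact; a crash between steps 2 and 3 only leaks the old block, which the FS check can reclaim.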

Intentions Log

Log: a special area on disk that can only be appended to, never overwritten

The log itself acts as an intermediary for transfers between the cache and the disk.






On recovery, the log is checked to see whether the old data is still there.
The log is garbage-collected after all its instructions have been applied.

Comparison: which one is better?

Criterion

  • The number of disk I/O
  • the number of random disk I/O

    Theoretically, shadow paging takes fewer operations; in practice, however, the intentions log works better.


The log works better mainly because access to the log is sequential, and there is less data fragmentation inside the log compared to shadow paging. Shadow paging requires jumping around the disk on access; the log does not.

Log-structured File System(LFS)

  • An alternative way of structuring the file system
  • Log = append-only data structure (on disk)

Idea: Use the disk purely sequentially

  • Good for writing, due to the sequential property
  • Bad for reading, since reads may touch widely spaced sectors

Strategy

Data Structures

We store all information, both data blocks and inodes, in the log, which raises the problem of distinguishing between them. This is solved with a specific data structure.

Naive Idea:

  • Use the current offset on disk as the uid instead of a table index
  • When an inode is updated, its inode number changes



Because the log is append-only, an updated inode lands at a new offset.

Better Idea:

  • Add a level of indirection
    • Map: file uid -> inode location on disk
    • The data structure is called the imap (inode map)
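The imap indirection can be shown in a minimal sketch (names illustrative). The log only grows, so every inode update lands at a new offset; the imap hides this by mapping a stable file uid to the inode's latest location.

```python
# Minimal sketch of LFS indirection via an imap.

log, imap = [], {}                   # log = append-only "disk"

def append(record):
    log.append(record)
    return len(log) - 1              # disk offset of the new record

def write_inode(uid, inode):
    imap[uid] = append(inode)        # uid stays stable across updates

write_inode(7, {"size": 100})
write_inode(7, {"size": 200})        # update: new offset, same uid
print(log[imap[7]])                  # {'size': 200}
```

Without the imap, every inode update would change the file's uid (the naive idea above); with it, directories can keep storing stable uids.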

Recovery


Log-structure key value stores

Key-Value Stores (KVs)
A crucial component in systems with large amounts of diverse, unstructured data.

Log-structured key value stores

LSM KVs Internals

  • The commit log is a data structure used to back up in-memory data to disk
  • The commit log is optional and can be turned off (which makes writes faster)
  • The memory sorted buffer is any data structure that preserves ordering

LSM Writing
Each time the client writes something to memory, if the commit log is on, the OS also appends the write to the commit log.

When the buffer is full, we write its contents from memory to disk sequentially, creating one sorted immutable file. -> Then we empty the write buffer and the commit log.

LSM Reading

The hierarchy above is not a physical separation; the levels are logical.

  • L0 is reserved for flushing the memory component to disk (special, since multiple versions of the same key can be in L0 at the same time)
  • From L1 (to Ln), each key in the key space can exist in only one file per level, which is guaranteed by the key-value store.
  • The oldest versions are in the lowest levels; the newest are in the highest.

L0 can contain many files holding the same key k, but each of L1 - Ln contains at most one file holding a given key

One key k can appear in files at multiple levels, and we only want the latest version.
Read from memory =(not in memory)=> read from the block cache =(not in cache)=> sequentially search L0 through Ln

Log-structured key value stores are optimized for writing, not reading.

Each file covers a range of keys, so we can determine where a key resides. In L0, file ranges may overlap.
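The read path above can be sketched directly (data made up; the block cache is omitted): memtable first, then L0 files newest-first because their ranges may overlap, then L1..Ln, where each level holds at most one version of a key.

```python
# Sketch of the LSM read path.

memtable = {}
l0 = [{"a": 1}, {"a": 2, "b": 5}]    # L0, newest file first
levels = [{"b": 9, "c": 3}]          # L1..Ln

def get(key):
    if key in memtable:              # 1. memory component
        return memtable[key]
    for f in l0:                     # 2. L0 may hold several versions
        if key in f:
            return f[key]
    for lvl in levels:               # 3. deeper levels: one version per level
        if key in lvl:
            return lvl[key]
    return None

print(get("a"), get("b"), get("c"))  # 1 5 3
```

Note how "a" returns 1, the newest L0 version, even though an older version exists in the same level, and "b" returns the L0 value 5, shadowing the older 9 at L1.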

LSM Internal Operations

Problems



Ideal write amplification (WA) = 1


Persistent Storage: Basic File System Implementation

2023/3/28

Different file systems can have different implementations, but they share essential common traits.

File System: creates a mapping between files (a sequence of uninterpreted, untyped bytes) and blocks on the storage device. (The block itself is another abstraction over the storage device.)

File System Role

The main task of the file system is translation: from user-interface functions to disk-interface functions.

File System Implementation

Key aspects:

  • Data Structures : data structures on disk and data structures in memory
  • Access methods

Data structures must be on disk, since disk is non-volatile. The metadata the OS needs for the mapping must also be on disk.

Disk Data Structures

  • Data Region(actual information) -> occupies most space in FS
    • User data
    • free space
  • Meta data
    • Indicates which space is used for user data and which is free (inodes)
    • housekeeping

We use metadata to track files, specifically via inodes.

Structure Implementation

  • Free-list
  • Bitmaps
    • Data structure where each bit indicates if corresponding object is free or in use
    • 1 = in use ; 0 = free
    • We associate each block with a bit in the bitmap
    • We have inode bitmap and data bitmap
      Bitmap capacity and structure: see the lecture figures.

Superblock: the block used to track the file system's metadata.
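A bitmap can be sketched in a few lines: one bit per block, 1 = in use, 0 = free. The helper names are illustrative, and the allocator is a simple first-fit scan.

```python
# Sketch of a data bitmap over 16 blocks (2 bytes -> 16 bits).

bitmap = bytearray(2)

def set_bit(i):   bitmap[i // 8] |=  1 << (i % 8)
def clear_bit(i): bitmap[i // 8] &= ~(1 << (i % 8))
def is_used(i):   return bool(bitmap[i // 8] & (1 << (i % 8)))

def alloc():
    """Find the first free block, mark it used, return its number."""
    for i in range(len(bitmap) * 8):
        if not is_used(i):
            set_bit(i)
            return i
    return None                      # no free block left

print(alloc(), alloc())              # 0 1
clear_bit(0)                         # free block 0 again
print(alloc())                       # 0 (first-fit reuses it)
```

The same structure serves as both the inode bitmap and the data bitmap; only the objects the bits refer to differ.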

Allocation of files to blocks

Based on inode.
Criteria for good strategy

  • Amount of fragmentation (external and internal)
  • Ability to grow file over time
  • Sequential access performance
  • Random access performance
  • Metadata overhead

Strategy:

Contiguous allocation

  • Strategy: allocate file data blocks contiguously on disk
  • Metadata: starting block + size of file
  • Fragmentation: severe external fragmentation
  • Growing: may be time-intensive
  • Sequential access: good
  • Random access: good
  • Metadata overhead: low, since only two numbers are needed for metadata

The only case where this works well is read-only file systems. It is impractical for general file systems, since files change a lot.

Extent-based allocation
Extent-based allocation keeps the advantages of contiguity while avoiding the holes.

  • Strategy: allocate multiple contiguous extents per file.

Linked-List Allocation
Each block is like a node in a linked list.

  • Strategy: allocate a linked list of blocks.
  • Metadata: location of the first block of the file; each block contains a pointer to the next block.

FAT(File-Allocation Tables)
Exactly like the linked list, except that the pointers are stored in a separate data structure: the FAT.
Since the table can be kept in the cache, access time drops.

Index Allocation

Index Allocation with Indirect Blocks
Supports large files.
Slower, since an indirect block is introduced (2 times slower, 4 more blocks).

Directories are stored as files.

If the file size is larger than the size the inode can represent, the open will fail.
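The maximum representable file size can be worked out from the inode layout. The parameters below are assumptions for illustration (4 KB blocks, 4-byte block pointers, 12 direct pointers and 1 single-indirect pointer), not fixed by the notes.

```python
# Worked example: max file size with direct + single-indirect pointers.

block = 4096
ptrs_per_block = block // 4          # 1024 pointers fit in one indirect block
direct, indirect = 12, 1

max_file = (direct + indirect * ptrs_per_block) * block
print(max_file)                      # 4243456 bytes, roughly 4 MB
```

Adding a double-indirect pointer would multiply the reach by another factor of 1024, which is how real inodes support multi-gigabyte files.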


Operations of data structures on disk

Inefficient.

Create /foo/bar
If a new file is created, there is no content in it yet; only the inode of that file is created.

Open /foo/bar

Write /foo/bar
Open it before writing. The graph above omits the steps for opening the file.

Read /foo/bar
Open it before reading. The graph above omits the steps for opening the file.

Close /foo/bar

Operations of data structures in memory

Cache

  • Fixed contiguous area of kernel memory.
  • Size = max number of cache blocks x block size
  • A large chunk of the machine's memory
  • In general, write-behind is used
    • For user data this is OK
    • For metadata, it depends

Cache Directory

An index into the cache pointing to the disk data (usually a hash table)
LRU

Cache Flush

(System-Wide) Active File Table

Indicates what files are in use in the system.

  • One array for the entire system
  • One entry per open file
  • Each entry contains
    • file inode
    • Additional Information
      • Reference count of number of file opens

(Per-Process) Open File Table

  • One array per process
  • One entry per file open of that process
  • Indexed by file descriptor $fd$
  • Each entry contains
    • Pointer to file inode in active file table
    • File pointer $fp$
    • Additional Information

The Open File Table and the Active File Table are used together.
The Open File Table is from the perspective of the process (PCB); the Active File Table is from the perspective of the kernel.

UID is a component of the inode.

The two tables are used together, rather than a single one, because of concurrency: we want processes to be able to access the same file concurrently.
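The interaction of the two tables can be sketched as follows (the structures are hypothetical). Two opens of the same file share one active-table entry, and thus one inode, but keep private file pointers:

```python
# Sketch of the system-wide active file table + per-process open file tables.

active = {}                          # system-wide: inode uid -> entry

def sys_open(uid, proc_fds):
    entry = active.setdefault(uid, {"inode": uid, "refcount": 0})
    entry["refcount"] += 1           # one more open of this file
    proc_fds.append({"entry": entry, "fp": 0})   # per-process slot
    return len(proc_fds) - 1         # index = file descriptor fd

p1, p2 = [], []                      # per-process open file tables
fd1, fd2 = sys_open(42, p1), sys_open(42, p2)
p1[fd1]["fp"] = 100                  # moving one process's fp ...
print(p2[fd2]["fp"])                 # 0  (... leaves the other untouched)
print(active[42]["refcount"])        # 2  (the inode entry is shared)
```

This is exactly the concurrency argument: the shared active entry keeps one copy of the inode and a reference count, while each process advances its own fp independently.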

Combination of Memory Structure with Disk Structure

CREATE
Most of the time, the data bitmap and inode bitmap are in the cache, and even some specific inodes.
If the data is already cached, there is no need to read it again.

OPEN

WRITE

read

close

Setting up the FS

  • By default, OS sees all storage devices as
    • Chunks of unallocated space
    • which are unusable
  • Cannot start writing file to a blank drive
  • Need to set up the FS first (disk partitioning)

Disk Partitioning (Slicing)

  • The FS needs a “container” (partition) on the storage device
  • Partitions allow different FSs to be installed on the same OS.
  • Each partition appears to the OS as a logical disk
  • The disk stores partition info in a partition table
  • The OS reads the partition table before any other part of the disk

Mounting a FS

The FS needs to be mounted for the OS to access its files (this tells the OS where the file system is).

  • Mounting attaches FS to a directory
  • Directory is called mount point

All the file systems would be attached to a root file system.

Mounting does not delete the files under the mount point; it only hides them.
If we unmount, they become visible again.

Booting

  • The BIOS looks for clues on how to start the OS; it checks the boot block first.
  • The boot block is a fixed location on disk (usually sector 0), containing the boot loader and partition table.
  • It is read by the BIOS at machine boot.

Alternative File Access Method: Memory Mapping

Memory mapping: a way to access a file without going through the file system's read/write interface.


COMP310-Intro-to-FileSystems

COMP310 2023/3/25

OS = Process Management + Memory Management + File System + I/O

I/O Devices

I/O System Architecture
Buses Type:

  • PCIE (peripheral component interconnect express)
    • PCIE is often used to connect to high performance devices
  • DMI (Direct media interface)
  • USB (Universal serial bus)
  • eSATA (external SATA)
    • SATA (Serial ATA)
    • ATA (the AT attachment, in reference to providing connection to the IBM PC AT)

      Reason for the classification(architecture)

      1. Physical significance
        Higher performance devices connected via high performance buses(wider and shorter)
      2. To put a large number of devices into the system

How does OS communicate with I/O devices?

By Canonical Device Interface.
Canonical Device Interface:
Interface made of 3 registers:

  • Status - current status of device
  • Command - OS tells device what command to perform
  • Data - send/receive data from device

The OS alters the registers to communicate with I/O devices

CDI(Canonical Device Interface)
All the hidden internals are visible to the hardware but invisible to the OS.

How does OS use device interface?

  • DMA

  • Polling
    The OS waits until the device is ready, repeatedly checking the STATUS register in a loop.

    • Pro: simple
    • Con: Wasted CPU cycles
  • Interrupt
    Put the process A requesting I/O to sleep. -> Context switch to a different process. -> When the I/O finishes, wake the sleeping process with an interrupt. -> The CPU jumps to the interrupt handler in the OS. -> The handler wakes up A

    • Pro: no waste of CPU cycles
    • Con: expensive context switch (polling can be better for fast devices since $T(I/O)<T(context switch)$)
      1. Same mechanism is used for demand paging
      2. A is woken up by the interrupt handler

    NOTICE: An interrupt is generated by the hardware and initiates a context switch (switching between user and kernel mode). A syscall is generated by the process to request functionality from kernel mode, and also initiates a context switch (user to kernel mode). A trap goes from user space to kernel space.

How to handle different devices in OS?

The problem is that each device may have its own protocol(The interface is specific).

Solution: Abstraction:
Adding a layer of abstraction between OS and devices: device driver.

  • Device driver creates a connection between each device and the OS.
  • Each device needs its own driver
  • Device driver is a piece of software in OS

Permanent Storage

  • Permanent across machine failures/restarts
  • Permanent across disk failures

Permanent Storage Media:
Flash SSD, HDD, …

File System API

FILE

  • An uninterpreted collection of objects composed of bytes, records, etc.
    • uninterpreted: the file system does not know the meaning of the data; only the application does

Type:
Typed = the file system knows what the object means
(Sometimes the file type is distinguished by the file extension, but it actually depends on the OS: UNIX ignores the extension, Windows does not)

  • Pro: prevent errors; efficient storage
  • Con: inflexible; lot of code

We only consider untyped files in this course

File System Primitives

Checking the lecture

  • Access
  • Concurrency
  • Naming
  • Protection

ACCESS

Sequential & Random Access

  • Sequential:
    A portion of the file is read in order. If the read (or write) is suspended and restarted, it continues at the last suspension point.
  • Random:
    No connection between accesses in different times.

Read() itself is random access, but we can use a file pointer to convert it into sequential access.

Sequential access is very common and all OS provide it. Not all OS provide random access.(but they may provide seek())

CONCURRENCY (concurrent sequential access)

Two processes access the same file.
Use Open() and Close() to deal with the situation.

The combination of Open() and Read() is easy; the combination of Open() and Write() is tricky, however, since the file itself is being modified as it is written.

  • Separate file instances altogether
    Writes by one process are not visible to others.
  • Separate file instances until Close()
    Writes become visible after Close(); the OS decides how to merge
  • Single instance of the file
    Writes visible immediately to others

    In all cases, fp is private.

NAMING

Naming primitives
Naming = mapping: human-readable string -> uid
Directory(folder) = collection of mappings(namings)

Directory Structure

  • Flat structure
  • Two-level structure
  • Hierarchical
    Hierarchical directory structure can have different shapes(data structure)
    EXP: tree, acyclic graph (an acyclic graph enables shortcut creation, unlike a tree)

    Shortcut creation (in Linux): Hard Link & Soft Link
    Hard Link: creates a new mapping of a string to an existing UID
    Soft Link: creates a new mapping of a string to an existing string

LINUX PRIMITIVES

Linux collapses the previous primitives into a single interface.

UID is never visible at the user level!!

Disks

Structure of DISK
Each platter has two surfaces; each surface has its own tracks; each track has its own sectors.
Cylinder: a logical grouping of tracks (same distance from the spindle)
Block: multiple sectors make up a block.

Disk Interface

Disk interface is accessible by sector only.


For memory management, the CPU can directly access memory. The CPU cannot read directly from disk; it must go through the OS.

Disk Access

  • Head Selection: choose the platter to read/write
  • Seek Time: move to the corresponding track
  • Rotational Latency: rotate to the corresponding block
  • Transfer Time: reading or writing to the device
  • Controller Overhead:

Seek Time dominates the time complexity.
Disk Access time >> Memory access time(nanoseconds)

  • Head Selection: nanoseconds
  • Seek Time: 3-12 milliseconds
  • Rotational Latency: 2-7 milliseconds
  • Transfer Time: microseconds
  • Controller Overhead: less than 1 millisecond

To minimize latency, the disk preferentially saves a file to the nearest track.
When the disk is doing sequential access, the seek and rotational delay occur only once.

Optimizing Disk Access

Rules:

  1. Do not access disk, use a cache
    What? -> Keep currently accessed blocks in memory.
    Why? -> Reduce latency and disk load
    How? -> Reserve kernel memory for cache
    • Cache entries: file blocks (of block size)

    • Read with a cache

      • If in cache: return data from cache
      • If not: find a free cache slot -> initiate a disk read -> when the disk read completes, return the data
    • Write with a cache

      • Writes are always buffered in memory, so we always write to the cache
      • How does it get to disk?
        • write through: after writing to the cache, directly write to the disk and then return to the user
        • write behind: return to the user first then write to the disk
          (The buffer needs to be flushed into the device at some point later)

      Response time: write-behind is better
      Disk Load: write behind is better
      Crash: write-through is better

  2. Read ahead and do not wait for the disk (prefetching)
    #Only for sequential access
    What? -> when a block is requested, the next blocks are read as well
    Why? -> no disk I/O when the user accesses block i+1
    How? -> put the next blocks in the buffer cache

  3. Minimize Seeks

    • Clever Disk Allocation
      Idea: locate related data on same cylinder, and allocate related(sequential access / consecutive blocks in the same file) blocks together(on the same or nearby cylinder)
    • Disk Scheduling
      Idea: reorder requests to seek as little as possible, rearranging the order of file tasks(read/write)
      • FCFS
      • SSTF(Shortest Seek Time First)
        • Pro: good seek times
        • Con: Starvation
      • SCAN
        • Separates the movement into two parts: one continually moving up, another continually moving down
      • C-SCAN
        • always moving in one direction (see class)
      • C-LOOK
        • same as C-SCAN but not going all the way to zero or max of cylinder

          Clever disk allocation and disk scheduling are complementary.
          Low load: clever allocation is better (not many scheduling opportunities)
          High load: disk scheduling is better

  4. Avoid rotational latency
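The scheduling policies can be compared with a small sketch. The request queue and starting head position below are a made-up example (cylinder numbers), used only to show how much total seek distance each policy incurs:

```python
# Sketch: total seek distance under FCFS vs SSTF.

def fcfs(head, reqs):
    total = 0
    for r in reqs:                   # serve requests in arrival order
        total += abs(head - r)
        head = r
    return total

def sstf(head, reqs):
    total, pending = 0, list(reqs)
    while pending:                   # always serve the nearest request next
        r = min(pending, key=lambda c: abs(head - c))
        total += abs(head - r)
        head = r
        pending.remove(r)
    return total

reqs = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs(53, reqs), sstf(53, reqs))   # 640 236
```

SSTF cuts the seek distance dramatically here, but a steady stream of nearby requests could starve the far-away ones, which is what SCAN-style policies fix.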


COMP310_MemoryManagement:DemandPaging

COMP310 2023/3/6

Memory Management: Demand Paging

For system using right now, most are based on paging instead of segmentation or base&bounds.

  • Base&Bounds => used only for niche
  • Segmentation => abandoned
    • high complexity for little gain
    • Effect approximated with paging + valid bits
  • Paging => universal

Segmentation can be compacted but hard to manage. Segmentation usually used in embedded development.

Paging

Terminology:
Page: fixed-size portion of virtual memory
Frame: fixed-size portion of physical memory
Page size = Frame size
Typical size: 4k-8k (always power of 2)

  • Virtual address space
    Linear from 0 up to a multiple of page size
  • Physical address space
    Non-contiguous set of frames, one per page
  • Virtual Address
    • Two components in the virtual address :
      VPN:Virtual page number
      Offset: offset within the page (page size = $2^{\text{offset bits}}$ bytes)
    • VPN –(Address Translation(MMU))–>PFN(Physical Frame Number)
  • MMU
    • Page Table
      • Page table is the data structure mapping VPN to PFN.
      • Each process has a page table. -> each time we do a context switch, the page table needs to be switched also
    • MMU need a pointer to page table in memory and the length of the page table
Advantage & Disadvantage
  • Advantage: easier to implement compared to segmentation with compaction
  • Disadvantages:
    • P1: there may be free (null) pages = sparsely used pages may appear = internal fragmentation
      The severity of this problem depends on the page size
    • P2: low translation performance (virtual address -> page table (in main memory) -> access (in main memory)) = performance reduced by a factor of 2
Solution to P1: valid/invalid bit

We can mark each page with a valid bit to indicate whether it is used or not (valid for used, invalid for unused).
This means the page table itself no longer covers the entire virtual address space (all possible pages in virtual memory) but only the pages allocated to the process.

Solution to P2: TLB(Translation Lookaside Buffer)

TLB(Translation lookaside buffer) is a part of MMU which makes the map(translation from virtual address(page number) to physical address) fast.

Part of the page table is in the MMU (the TLB). If the mapping is there, we get a “hit”, so there is no need to look up the mapping in memory. On a miss, we must find it in memory.

Page Table is stored in main memory and MMU has a pointer to it with the length. TLB is not in main memory but in MMU.

Note that the lookup in memory itself again goes through the TLB.



The hardware basis of TLB
Use associative memory(special hardware)

  • Regular Memory -> lookup by address
  • Associative Memory -> lookup by contents & lookup in parallel(lookup all at the same time)

Regular Memory
Key = Pageno , Value = Frameno

TLB Size
TLB is expensive and small(64-1024 entries)
Want TLB hit rate close to 100%
If TLB full, need to replace existing entry

TLB Hit Rate and Locality

  • Temporal Locality -> An instruction that has been accessed will be re-accessed soon in the future
  • Spatial Locality -> An instruction is accessed, then the instructions near to it will likely soon be accessed

    Accessing an array is an example of both spatial and temporal locality

They are used together most of the time but may depend.

Process Switching with TLB
Problem

Inability to distinguish between pageno of different processes in TLB.

Solution
  • Invalidation of TLB : invalidate all TLB entries(by valid bits) as switching
    • Pro: Simply requires one Valid Bit
    • Con: process switch expensive
    • Con: new process incurs 100% TLB misses
  • Process Identifier : add process identifier to TLB entries
    • Con: complicated
    • Pro: cheaper

Computation for the size of TLB

Check the lecture

Explanation:
4 KB pages => each page holds 4096 bytes.
Page table entries => the number of entries in the page table
Every page table entry => the size of each table entry in main memory
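A typical computation of this kind can be worked through in a few lines. The parameters are assumptions for illustration (32-bit virtual addresses, 4 KB pages, 4-byte page table entries), not taken from the lecture:

```python
# Worked example: size of a linear page table.

vaddr_bits, page = 32, 4096
entries = 2**vaddr_bits // page      # 2^32 / 2^12 = 2^20 pages
table_size = entries * 4             # 4 bytes per entry

print(entries, table_size)           # 1048576 4194304  -> 4 MB per process
```

A 4 MB table per process is exactly why the next section asks how to make page tables smaller.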

How to make Page Table Smaller?

  • Big Pages
  • Segmentation + Paging
  • Multi-level page tables
Big Pages

Advantage: Easy to implement
Disadvantage: Larger internal fragmentation
Most systems use 4KB or 8KB pages in practice

Segmentation + Paging

Divide the address space into segments, then divide each segment into fixed-size pages.
Virtual address divided into three portions:

seg | page number | page offset

Advantage:

  • Supports sparse address spaces
    • decreases the size of the page tables
    • if a segment is not used, no page table is needed for it
  • enable sharing
  • No external fragmentation

Disadvantage:

  • each segment might have large page table
  • must allocate each page table contiguously
Multi-level Page Tables

Turns the linear page table into tree structure. The level is not limited.
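A two-level lookup can be sketched with made-up field sizes (4-bit outer index, 4-bit inner index, 8-bit offset); the point is that unused inner tables are simply absent, which is how the tree saves space:

```python
# Toy two-level page table lookup.

page_dir = [None] * 16               # outer level
page_dir[3] = [None] * 16            # only one inner table allocated
page_dir[3][5] = 0x2A                # VPN (3, 5) -> frame 0x2A

def translate(vaddr):
    outer = (vaddr >> 12) & 0xF
    inner = (vaddr >> 8) & 0xF
    offset = vaddr & 0xFF
    inner_table = page_dir[outer]
    if inner_table is None or inner_table[inner] is None:
        raise MemoryError("page fault")      # unmapped page
    return (inner_table[inner] << 8) | offset

print(hex(translate(0x3577)))        # 0x2a77
```

With a linear table, all 256 entries would exist up front; here only the one allocated inner table consumes memory, at the cost of an extra memory access per translation.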


Leaving the assumption

Previous discussion is based on the assumption that the whole program resides in the memory. However, now we are dropping this assumption.

  • It is impossible to keep a huge program entirely in memory
  • Only the needed pieces must be resident <= load the needed pieces from the file system => many advantages (multiple programs running in memory & loading only the useful pieces)
Swapping

Def:
Temporarily store the whole program on disk in order to remove one or more processes from memory.
Explanation:
Processes reside on disk; whenever the scheduler decides one should run (a context switch), the whole process is moved from disk into memory (and likewise moved out).
Disadvantage:

  • High latency since need to read from disk(~100million cycles).
  • Parallelism is limited if the program being switched in is huge.

Solution: Demand Paging
Demand Paging

Difference with swapping:

  • Swapping = all of the program is in memory or all of the program is on disk
  • Demand paging = part of program is in memory

Reason for use demand paging:

  1. We want to create the illusion for the program that it has a huge memory space (virtual memory space >> physical memory space)
  2. Shorter process startup latency
    • start process with part of them in memory (Even 1 page suffices)
  3. Better use of main memory
    • Enables parallelism: running multiple programs

Location of the program:
Part of it is in memory, but most of it is on disk (in a special partition called the backing store; only the OS has access to it).

NOTICE: The backing store has all the pages of a program.

NOTICE: The CPU only directly accesses memory. The CPU accesses data on disk through the OS (the OS connects each page to its place in the backing store).

High-level Mechanism:

  • Page fault:
    Program needs to access part only on disk
  • Page fault handling:
    The process for the OS to handle the page fault: program suspended -> OS runs and gets page from disk -> program is restarted
  1. Discover Page Fault
    Idea: Using the valid bit in page table

    • With demand paging:
      (Any exception to the following cases raises an OS error)
      Valid bit = 0: page is invalid OR page is on disk
      Valid bit = 1: page is valid AND page is in memory
      (so the OS needs more info to tell invalid from on-disk)
  2. Suspending the Faulting Process
    Idea: Trap into the OS

    • invalid bit access generates trap
    • Process info is stored in the PCB and the process is suspended (to increase CPU efficiency)

      Trapping: Making the switch between the user mode and the kernel mode.

  3. Getting the page from the disk
    Idea: the OS handles fetches from disk

    • Allocate the free frame(if exist) in memory to the process

    • Find page on disk and transfer the page from disk to frame

      • Two tables are needed for the transfer: the page table, plus an extra table mapping each page to its backing-store location
    • While the disk is Busy

      • invoke scheduler to run another process
      • when the disk interrupt arrives, suspend the running process and resume page fault handling
  4. Completing page fault handling

    • set the brought-in page to valid and assign it a frame number (update the page table)
    • set the process state to ready
    • invoke the scheduler => next time the process runs it will try to access the page again


How to find a free frame in the main memory?
If no free frame is available, pick a frame to be replaced -> invalidate its page table entry and TLB entry -> write the frame to disk

NOTICE: No worry about losing information when replacing, since all pages are stored in the backing store.

How to pick the page/frame to replace? (replacement policies)

  • Random
  • FIFO
  • OPT
  • LRU

Goal: minimize page faults, since they slow down the program

Random (Relatively universal)
Random page is replaced.

  • PRO: Easy to implement
  • CON: Does not take advantage of spatial/temporal locality

FIFO
Oldest page is replaced (the first one brought into memory).

  • PRO: Easy to implement, Fair
  • CON: Does not account for “hot” pages

OPT(An optimal algorithm)
Replace the page that will be referenced furthest in the future. (True OPT instead of approximated OPT)

  • PRO: Provably optimal
  • CON: Hard to implement(Can’t predict)

    OPT always acts as a baseline of comparison for other algorithms

LRU(Least Recently Used)
Use past information to predict future usage

  • PRO: Approximates OPT
  • CON: Hard to implement, does not handle all workloads well
    LRU Hardware Support
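LRU can be sketched with an ordered map standing in for the hardware support (a software model only; real MMUs approximate this with reference bits):

```python
from collections import OrderedDict

# Sketch: count page faults under LRU with a fixed number of frames.
def simulate_lru(refs, nframes):
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)         # hit: page becomes most recent
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)   # evict the least recently used
            frames[page] = True
    return faults

print(simulate_lru([1, 2, 3, 1, 4, 2], 3))   # 5
```

In the trace, the re-reference of page 1 saves it from eviction when page 4 arrives; FIFO would have evicted it, which is exactly the "hot page" weakness noted above.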

Question: does the virtual address space include the part that lives on disk, rather than forming a complete one-to-one mapping onto physical memory?

Terrance

Honor Mathematics & Honor Econ & Honor CS