Aubade_Blog

Persisting will immortalize freedom.

Handling Crashes and Performance

COMP 310 2023/4/1

RAID

RAID is used to combine many disks, for: capacity, reliability, performance
Challenge: most file systems work with a single disk

Building Multiple Disks

Approaches

JBOD (just a bunch of disks)

If the application is smart, it can store different files on different file systems.

RAID (Redundant Array of Independent Disks)

Create the illusion of one disk from many disks.
The core idea is to expose a fake (logical) disk.

Essential Idea

  • Optimize I/O bandwidth through parallel I/O
  • Parallel I/O = I/O to multiple disks at once

Strategies

  • Mapping
  • Redundancy
Mapping

Provides an illusion that multiple disks behave as one.

Striping
Striping is a form of mapping. It puts a file across several disks: File = Stripe0 | Stripe1 | Stripe2 …

RAID-0 : No Redundancy

  • Uses Striping
  • Best possible read and write bandwidth
  • Failure results in data loss -> if one of the disks crashes, files may be lost
  • More disks increase throughput but not latency

Latency: how fast we can serve a single request
Throughput: how many requests we can serve per unit time

Redundancy

Redundancy is introduced to solve the problem of tolerating disk failure.
The core idea is: store redundant data on different disks.
It is an improvement built on top of mapping.

RAID-1 : Mirroring

Mirroring not only increases tolerance to disk failure, but also boosts read efficiency.

  • Storage capacity is half of the total disk capacity

RAID-4 : Parity Disk
N data disks + 1 Parity Disk

Parity:

  • A simple form of error detection and repair
  • Not specific to RAID
  • Also used in communications

Parity Updates:
Each time we write to a data block, we also need to update the parity.

  • Additive parity: read all other data blocks in parallel and XOR them with the new block
  • Subtractive parity
    • Read the old data and the old parity in parallel
    • Compare the new data with the old data
    • If new data == old data -> do nothing
    • Else, flip the parity bits that changed

RAID-4 requires access to the parity disk on every write, which causes a bottleneck in write-heavy workloads.
Parallelism across the data disks is throttled by the single parity disk.
This issue is called the small-write problem.
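The two parity-update strategies can be sketched with XOR over block contents. This is a toy sketch: the function names are illustrative, and small ints stand in for whole data blocks (XOR works bit-wise either way).

```python
# Hypothetical sketch of RAID-4 parity updates.

def additive_parity(other_blocks, new_block):
    """Additive: read all OTHER data blocks and XOR them with the new block."""
    parity = new_block
    for b in other_blocks:
        parity ^= b
    return parity

def subtractive_parity(old_block, old_parity, new_block):
    """Subtractive: read old data + old parity, flip only the bits that changed."""
    return old_parity ^ old_block ^ new_block

blocks = [0b1010, 0b0110, 0b0011]            # contents of the data disks
parity = blocks[0] ^ blocks[1] ^ blocks[2]   # initial parity
new = 0b1111                                 # new contents for disk 0

# Both strategies must compute the same new parity:
assert additive_parity(blocks[1:], new) == subtractive_parity(blocks[0], parity, new)
```

Subtractive parity touches only two disks regardless of N, which is why it is preferred for wide arrays; but every write still touches the one parity disk, which is the small-write bottleneck.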

RAID-5 : Distributed Parity


Distributed File Systems

Distributed System

A distributed system is one where a machine I’ve never heard of can cause my program to fail.
Definition: More than 1 machine working together to solve a problem.
Advantage:

  • More computing power
  • More storage capacity
  • Fault Tolerance
  • Data sharing

  • Availability: able to access the system even when some machines crash
  • Fault-tolerance: fault handling is incorporated as a functionality
  • Scalability: more machines, better performance
  • Transparency: users are unaware of the distribution

Type

  • Client/Server Model
  • Peer-to-peer Model

Distributed File System

File systems are a great use case for distributed systems.

  • Local FS: processes on the same machine access shared files
  • Network FS: processes on different machines access shared files in the same way

Network File System (NFS)



The client must mount a separate file system, NFS, to access files stored on the file server. Accesses go through NFS via Remote Procedure Calls (RPC).
Problems

Solution for handling crashes

  • Stateless protocol
  • Idempotent operations: operations that can be performed many times without changing the result


The issue is that a path may depend on the file-storage structure, which means repeated accesses to the same path do not always resolve the same way.

We can retry a Read many times, but not a Write, since repeated modifications may deviate from the intended result; this is why we introduce idempotent operations.


Use an offset to rewrite a particular part: a write of the same data at the same offset can safely be repeated.
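Offset-based writes can be sketched with `os.pwrite`/`os.pread` (Unix-only). The point is that repeating the same request leaves the file in the same state, so a client can safely retry after a crash or a lost reply:

```python
import os, tempfile

# Sketch: a write at an explicit offset is idempotent -- the retry below
# is harmless because it rewrites the same bytes at the same offset.
with tempfile.NamedTemporaryFile() as f:
    fd = f.fileno()
    os.pwrite(fd, b"hello", 0)
    os.pwrite(fd, b"hello", 0)      # retry: same data, same offset, no harm
    result = os.pread(fd, 5, 0)

print(result)                        # b'hello'
```

An append-style write (no offset) would not be idempotent: retrying it would duplicate the data, which is exactly the NFS problem described above.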

Solution for slowness


COMP310-Advanced FS-LogStructureDesigns

Advanced FS: Log-Structured Designs

Key Concepts

  • Log-Structure File System
  • Log-Structure Key-Value Store

Reason
Dealing with Crashes:
If a crash happens in the middle of writing data, it is a problem: the OS does not know which part is the new data and which is the old data.
With write-behind, a crash can be a problem even after the write call returns, since the data may not yet have been transferred to disk.

Handling Crash

Crash: a situation in which the application errs and stops functioning.

Solution: Atomicity

Implementation

Assumption:

  1. A single-sector disk write is atomic
    This assumption holds with very high probability.

Approaches:

  1. Shadow Paging
  2. Intentions Log

Shadow Paging

  • Make sure both the old copy and the new copy are on the disk.
  • Make an atomic switch between the two => by doing a WriteSector()
    • The atomic switch is done on the inode, since an inode is smaller than a sector, so we rewrite the inode entry

Step(Write Through):


We first write the new data blocks to disk -> after each write, we update the in-memory inode to point to the new block -> once the update is done in the cache, we use an atomic WriteSector() -> finally, garbage collection cleans up the old data blocks

Step(Write Behind):

The difference from write-through is that the data is not written directly to disk; it is written to memory first and transferred to disk later. Block space on disk is allocated before the transfer.

The step of transferring the blocks to disk is not atomic.
(If it fails, we can still recover the old version of the file) <- notice that the old blocks have not been garbage-collected yet

The WriteSector() instruction itself, however, is atomic.

The old blocks are eventually cleaned up.

The OS will de-allocate the old data blocks; if the OS crashes before de-allocation, the FS check will reclaim them.
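The shadow-paging steps above can be condensed into a toy sketch. The structures are illustrative (not a real FS): the "disk" is a dict of blocks plus one inode, and the inode update, a single assignment here, stands in for the one atomic WriteSector().

```python
# Toy sketch of shadow paging.

disk = {"blk0": b"old data"}
inode = {"data": "blk0"}             # committed pointer to the file's block

def shadow_update(new_bytes):
    disk["blk1"] = new_bytes         # 1. write the new block elsewhere on disk
    inode["data"] = "blk1"           # 2. atomic switch: one WriteSector() of the inode
    del disk["blk0"]                 # 3. garbage-collect the old block

shadow_update(b"new data")
print(disk[inode["data"]])           # b'new data'
```

A crash before step 2 leaves the inode pointing at the old block, so the old file is intact; a crash between steps 2 and 3 only leaks the old block, which the FS check can reclaim.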

Intentions Log

Log: a special area on disk that can only be appended to, never overwritten

The log itself acts as an intermediary for transfers between the cache and the disk.






On recovery, the log is checked to see whether the old data is still there.
The log is garbage-collected after all its instructions have been applied.

Comparison: which one is better?

Criterion

  • The number of disk I/O
  • the number of random disk I/O

    Theoretically, shadow paging takes fewer operations; in practice, however, the intentions log works better.


The log works better mainly because access to the log is sequential, and there is less data fragmentation inside the log compared to shadow paging. Shadow paging requires jumping around the disk on access; the log does not.

Log-structured File System(LFS)

  • An alternative way of structuring the file system
  • Log = append-only data structure (on disk)

Idea: Use the disk purely sequentially

  • Good for writing, due to the sequential property
  • Bad for reading, since reads may touch widely spaced sectors

Strategy

Data Structures

We store all information, both data blocks and inodes, in the log, which raises the problem of distinguishing between them. This is solved with a specific data structure.

Naive Idea:

  • Use the current offset on disk as the uid instead of a table index
  • When an inode is updated, its inode number changes



Because the log is append-only, an updated inode lands at a new offset.

Better Idea:

  • Add a level of indirection
    • Map: file uid -> inode location on disk
    • The data structure is called the imap (inode map)
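The imap indirection can be shown in a minimal sketch (names illustrative). The log only grows, so every inode update lands at a new offset; the imap hides this by mapping a stable file uid to the inode's latest location.

```python
# Minimal sketch of LFS indirection via an imap.

log, imap = [], {}                   # log = append-only "disk"

def append(record):
    log.append(record)
    return len(log) - 1              # disk offset of the new record

def write_inode(uid, inode):
    imap[uid] = append(inode)        # uid stays stable across updates

write_inode(7, {"size": 100})
write_inode(7, {"size": 200})        # update: new offset, same uid
print(log[imap[7]])                  # {'size': 200}
```

Without the imap, every inode update would change the file's uid (the naive idea above); with it, directories can keep storing stable uids.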

Recovery


Log-structure key value stores

Key-Value Stores (KVs)
A crucial component in systems with large amounts of diverse, unstructured data.

Log-structured key value stores

LSM KVs Internals

  • The commit log is a data structure used to back up in-memory data to disk
  • The commit log is optional and can be turned off (which makes writes faster)
  • The memory sorted buffer is any data structure that preserves ordering

LSM Writing
Each time the client writes something to memory, if the commit log is on, the OS also appends the write to the commit log.

When the buffer is full, we write its contents from memory to disk sequentially, creating one sorted immutable file. -> Then we empty the write buffer and the commit log.

LSM Reading

The hierarchy above is not a physical separation; the levels are logical.

  • L0 is reserved for flushing the memory component to disk (special, since multiple versions of the same key can be in L0 at the same time)
  • From L1 (to Ln), each key in the key space can exist in only one file per level, which is guaranteed by the key-value store.
  • The oldest versions are in the lowest levels; the newest are in the highest.

L0 can contain many files holding the same key k, but each of L1 - Ln contains at most one file holding a given key

One key k can appear in files at multiple levels, and we only want the latest version.
Read from memory =(not in memory)=> read from the block cache =(not in cache)=> sequentially search L0 through Ln

Log-structured key value stores are optimized for writing, not reading.

Each file covers a range of keys, so we can determine where a key resides. In L0, file ranges may overlap.
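The read path above can be sketched directly (data made up; the block cache is omitted): memtable first, then L0 files newest-first because their ranges may overlap, then L1..Ln, where each level holds at most one version of a key.

```python
# Sketch of the LSM read path.

memtable = {}
l0 = [{"a": 1}, {"a": 2, "b": 5}]    # L0, newest file first
levels = [{"b": 9, "c": 3}]          # L1..Ln

def get(key):
    if key in memtable:              # 1. memory component
        return memtable[key]
    for f in l0:                     # 2. L0 may hold several versions
        if key in f:
            return f[key]
    for lvl in levels:               # 3. deeper levels: one version per level
        if key in lvl:
            return lvl[key]
    return None

print(get("a"), get("b"), get("c"))  # 1 5 3
```

Note how "a" returns 1, the newest L0 version, even though an older version exists in the same level, and "b" returns the L0 value 5, shadowing the older 9 at L1.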

LSM Internal Operations

Problems



Ideal write amplification (WA) = 1


Persistent Storage: Basic File System Implementation

2023/3/28

Different file systems can have different implementations, but they share essential common traits.

File System: creates a mapping between files (a sequence of uninterpreted, untyped bytes) and blocks on the storage device. (The block itself is another abstraction over the storage device.)

File System Role

The main task of the file system is translation: from user-interface functions to disk-interface functions.

File System Implementation

Key aspects:

  • Data Structures : data structures on disk and data structures in memory
  • Access methods

Data structures must be on disk, since disk is non-volatile. The metadata the OS needs for the mapping must also be on disk.

Disk Data Structures

  • Data Region(actual information) -> occupies most space in FS
    • User data
    • free space
  • Meta data
    • Indicates which space is used for user data and which is free (inodes)
    • housekeeping

We use metadata to track files, specifically via inodes.

Structure Implementation

  • Free-list
  • Bitmaps
    • Data structure where each bit indicates if corresponding object is free or in use
    • 1 = in use ; 0 = free
    • We associate each block with a bit in the bitmap
    • We have inode bitmap and data bitmap
      Bitmap capacity and structure: see the lecture figures.

Superblock: the block used to track the file system's metadata.
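A bitmap can be sketched in a few lines: one bit per block, 1 = in use, 0 = free. The helper names are illustrative, and the allocator is a simple first-fit scan.

```python
# Sketch of a data bitmap over 16 blocks (2 bytes -> 16 bits).

bitmap = bytearray(2)

def set_bit(i):   bitmap[i // 8] |=  1 << (i % 8)
def clear_bit(i): bitmap[i // 8] &= ~(1 << (i % 8))
def is_used(i):   return bool(bitmap[i // 8] & (1 << (i % 8)))

def alloc():
    """Find the first free block, mark it used, return its number."""
    for i in range(len(bitmap) * 8):
        if not is_used(i):
            set_bit(i)
            return i
    return None                      # no free block left

print(alloc(), alloc())              # 0 1
clear_bit(0)                         # free block 0 again
print(alloc())                       # 0 (first-fit reuses it)
```

The same structure serves as both the inode bitmap and the data bitmap; only the objects the bits refer to differ.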

Allocation of files to blocks

Based on inode.
Criteria for good strategy

  • Amount of fragmentation (external and internal)
  • Ability to grow file over time
  • Sequential access performance
  • Random access performance
  • Metadata overhead

Strategy:

Contiguous allocation

  • Strategy: allocate file data blocks contiguously on disk
  • Metadata: starting block + size of file
  • Fragmentation: severe external fragmentation
  • Growing: may be time-intensive
  • Sequential access: good
  • Random access: good
  • Metadata overhead: low, since only two numbers are needed for metadata

The only case where this works well is read-only file systems. It is impractical for general file systems, since files change a lot.

Extent-based allocation
Extent-based allocation keeps the advantages of contiguity while avoiding the holes.

  • Strategy: allocate multiple contiguous extents per file.

Linked-List Allocation
Each block is like a node in a linked list.

  • Strategy: allocate a linked list of blocks.
  • Metadata: location of the first block of the file; each block contains a pointer to the next block.

FAT(File-Allocation Tables)
Exactly like the linked list, except that the pointers are stored in a separate data structure: the FAT.
Since the table can be kept in the cache, access time drops.

Index Allocation

Index Allocation with Indirect Blocks
Supports large files.
Slower, since an indirect block is introduced (2 times slower, 4 more blocks).

Directories are stored as files.

If the file size is larger than the size the inode can represent, the open will fail.
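The maximum representable file size can be worked out from the inode layout. The parameters below are assumptions for illustration (4 KB blocks, 4-byte block pointers, 12 direct pointers and 1 single-indirect pointer), not fixed by the notes.

```python
# Worked example: max file size with direct + single-indirect pointers.

block = 4096
ptrs_per_block = block // 4          # 1024 pointers fit in one indirect block
direct, indirect = 12, 1

max_file = (direct + indirect * ptrs_per_block) * block
print(max_file)                      # 4243456 bytes, roughly 4 MB
```

Adding a double-indirect pointer would multiply the reach by another factor of 1024, which is how real inodes support multi-gigabyte files.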


Operations of data structures on disk

Inefficient.

Create /foo/bar
If a new file is created, there is no content in it yet; only the inode of that file is created.

Open /foo/bar

Write /foo/bar
Open it before writing. The graph above omits the steps for opening the file.

Read /foo/bar
Open it before reading. The graph above omits the steps for opening the file.

Close /foo/bar

Operations of data structures in memory

Cache

  • Fixed contiguous area of kernel memory.
  • Size = max number of cache blocks x block size
  • A large chunk of the machine's memory
  • In general, write-behind is used
    • For user data this is OK
    • For metadata, it depends

Cache Directory

An index into the cache pointing to the disk data (usually a hash table)
LRU

Cache Flush

(System-Wide) Active File Table

Indicates what files are in use in the system.

  • One array for the entire system
  • One entry per open file
  • Each entry contains
    • file inode
    • Additional Information
      • Reference count of number of file opens

(Per-Process) Open File Table

  • One array per process
  • One entry per file open of that process
  • Indexed by file descriptor $fd$
  • Each entry contains
    • Pointer to file inode in active file table
    • File pointer $fp$
    • Additional Information

The Open File Table and the Active File Table are used together.
The Open File Table is from the perspective of the process (PCB); the Active File Table is from the perspective of the kernel.

UID is a component of the inode.

The two tables are used together, rather than a single one, because of concurrency: we want processes to be able to access the same file concurrently.
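The interaction of the two tables can be sketched as follows (the structures are hypothetical). Two opens of the same file share one active-table entry, and thus one inode, but keep private file pointers:

```python
# Sketch of the system-wide active file table + per-process open file tables.

active = {}                          # system-wide: inode uid -> entry

def sys_open(uid, proc_fds):
    entry = active.setdefault(uid, {"inode": uid, "refcount": 0})
    entry["refcount"] += 1           # one more open of this file
    proc_fds.append({"entry": entry, "fp": 0})   # per-process slot
    return len(proc_fds) - 1         # index = file descriptor fd

p1, p2 = [], []                      # per-process open file tables
fd1, fd2 = sys_open(42, p1), sys_open(42, p2)
p1[fd1]["fp"] = 100                  # moving one process's fp ...
print(p2[fd2]["fp"])                 # 0  (... leaves the other untouched)
print(active[42]["refcount"])        # 2  (the inode entry is shared)
```

This is exactly the concurrency argument: the shared active entry keeps one copy of the inode and a reference count, while each process advances its own fp independently.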

Combination of Memory Structure with Disk Structure

CREATE
Most of the time, the data bitmap and inode bitmap are in the cache, and even some specific inodes.
If the data is already cached, there is no need to read it again.

OPEN

WRITE

read

close

Setting up the FS

  • By default, OS sees all storage devices as
    • Chunks of unallocated space
    • which are unusable
  • Cannot start writing file to a blank drive
  • Need to set up the FS first (disk partitioning)

Disk Partitioning (Slicing)

  • The FS needs a “container” (partition) on the storage device
  • Partitions allow different FSs to be installed on the same OS.
  • Each partition appears to the OS as a logical disk
  • The disk stores partition info in a partition table
  • The OS reads the partition table before any other part of the disk

Mounting a FS

The FS needs to be mounted for the OS to access its files (this tells the OS where the file system is).

  • Mounting attaches FS to a directory
  • Directory is called mount point

All the file systems would be attached to a root file system.

Mounting does not delete the files under the mount point; it only hides them.
If we unmount, they become visible again.

Booting

  • The BIOS looks for clues on how to start the OS; it checks the boot block first.
  • The boot block is a fixed location on disk (usually sector 0), containing the boot loader and partition table.
  • It is read by the BIOS at machine boot.

Alternative File Access Method: Memory Mapping

Memory mapping: a way to access a file without going through the file system's read/write interface.


COMP310-Intro-to-FileSystems

COMP310 2023/3/25

OS = Process Management + Memory Management + File System + I/O

I/O Devices

I/O System Architecture
Buses Type:

  • PCIE (peripheral component interconnect express)
    • PCIE is often used to connect to high performance devices
  • DMI (Direct media interface)
  • USB (Universal serial bus)
  • eSATA (external SATA)
    • SATA (Serial ATA)
    • ATA (the AT attachment, in reference to providing connection to the IBM PC AT)

      Reason for the classification(architecture)

      1. Physical significance
        Higher performance devices connected via high performance buses(wider and shorter)
      2. To put a large number of devices into the system

How does OS communicate with I/O devices?

By Canonical Device Interface.
Canonical Device Interface:
Interface made of 3 registers:

  • Status - current status of device
  • Command - OS tells device what command to perform
  • Data - send/receive data from device

The OS alters the registers to communicate with I/O devices

CDI(Canonical Device Interface)
All the hidden internals are visible to the hardware but invisible to the OS.

How does OS use device interface?

  • DMA

  • Polling
    The OS waits until the device is ready, repeatedly checking the STATUS register in a loop.

    • Pro: simple
    • Con: Wasted CPU cycles
  • Interrupt
    Put the process A requesting I/O to sleep. -> Context switch to a different process. -> When the I/O finishes, wake the sleeping process with an interrupt. -> The CPU jumps to the interrupt handler in the OS. -> The handler wakes up A

    • Pro: no waste of CPU cycles
    • Con: expensive context switch (polling can be better for fast devices since $T(I/O)<T(context switch)$)
      1. Same mechanism is used for demand paging
      2. A is woken up by the interrupt handler

    NOTICE: An interrupt is generated by the hardware and initiates a context switch (switching between user and kernel mode). A syscall is generated by the process to request functionality from kernel mode, and also initiates a context switch (user to kernel mode). A trap goes from user space to kernel space.

How to handle different devices in OS?

The problem is that each device may have its own protocol(The interface is specific).

Solution: Abstraction:
Adding a layer of abstraction between OS and devices: device driver.

  • Device driver creates a connection between each device and the OS.
  • Each device needs its own driver
  • Device driver is a piece of software in OS

Permanent Storage

  • Permanent across machine failures/restarts
  • Permanent across disk failures

Permanent Storage Media:
Flash SSD, HDD, …

File System API

FILE

  • An uninterpreted collection of objects composed of bytes, records, etc.
    • uninterpreted: the file system does not know the meaning of the data; only the application does

Type:
Typed = the file system knows what the object means
(Sometimes the file type is distinguished by the file extension, but it actually depends on the OS: UNIX ignores the extension, Windows does not)

  • Pro: prevent errors; efficient storage
  • Con: inflexible; lot of code

We only consider untyped files in this course

File System Primitives

Checking the lecture

  • Access
  • Concurrency
  • Naming
  • Protection

ACCESS

Sequential & Random Access

  • Sequential:
    A portion of the file is read in order. If the read (or write) is suspended and restarted, it continues at the last suspension point.
  • Random:
    No connection between accesses in different times.

Read() itself is random access, but we can use a file pointer to convert it into sequential access.

Sequential access is very common and all OS provide it. Not all OS provide random access.(but they may provide seek())

CONCURRENCY (concurrent sequential access)

Two processes access the same file.
Use Open() and Close() to deal with the situation.

The combination of Open() and Read() is easy; the combination of Open() and Write() is tricky, however, since the file itself is being modified as it is written.

  • Separate file instances altogether
    Writes by one process are not visible to others.
  • Separate file instances until Close()
    Writes become visible after Close(); the OS decides how to merge
  • Single instance of the file
    Writes visible immediately to others

    In all cases, fp is private.

NAMING

Naming primitives
Naming = mapping: human-readable string -> uid
Directory(folder) = collection of mappings(namings)

Directory Structure

  • Flat structure
  • Two-level structure
  • Hierarchical
    Hierarchical directory structure can have different shapes(data structure)
    EXP: tree, acyclic graph (an acyclic graph enables shortcut creation, unlike a tree)

    Shortcut creation (in Linux): Hard Link & Soft Link
    Hard Link: creates a new mapping of a string to an existing UID
    Soft Link: creates a new mapping of a string to an existing string

LINUX PRIMITIVES

Linux collapses the previous primitives into a single interface.

UID is never visible at the user level!!

Disks

Structure of DISK
Each platter has two surfaces; each surface has its own tracks; each track has its own sectors.
Cylinder: a logical grouping of tracks (same distance from the spindle)
Block: multiple sectors make up a block.

Disk Interface

Disk interface is accessible by sector only.


For memory management, the CPU can directly access memory. The CPU cannot read directly from disk; it must go through the OS.

Disk Access

  • Head Selection: choose the platter to read/write
  • Seek Time: move to the corresponding track
  • Rotational Latency: rotate to the corresponding block
  • Transfer Time: reading or writing to the device
  • Controller Overhead:

Seek Time dominates the time complexity.
Disk Access time >> Memory access time(nanoseconds)

  • Head Selection: nanoseconds
  • Seek Time: 3-12 milliseconds
  • Rotational Latency: 2-7 milliseconds
  • Transfer Time: microseconds
  • Controller Overhead: less than 1 millisecond

To minimize latency, the disk preferentially saves a file to the nearest track.
When the disk is doing sequential access, the seek and rotational delay occur only once.

Optimizing Disk Access

Rules:

  1. Do not access disk, use a cache
    What? -> Keep currently accessed blocks in memory.
    Why? -> Reduce latency and disk load
    How? -> Reserve kernel memory for cache
    • Cache entries: file blocks (of block size)

    • Read with a cache

      • If in cache: return data from cache
      • If not: find a free cache slot -> initiate a disk read -> when the disk read completes, return the data
    • Write with a cache

      • Writes are always buffered in memory, so we always write to the cache
      • How does it get to disk?
        • write through: after writing to the cache, directly write to the disk and then return to the user
        • write behind: return to the user first then write to the disk
          (The buffer needs to be flushed into the device at some point later)

      Response time: write-behind is better
      Disk Load: write behind is better
      Crash: write-through is better

  2. Read ahead and do not wait for the disk (prefetching)
    #Only for sequential access
    What? -> when a block is requested, the next blocks are read as well
    Why? -> no disk I/O when the user accesses block i+1
    How? -> put the next blocks in the buffer cache

  3. Minimize Seeks

    • Clever Disk Allocation
      Idea: locate related data on same cylinder, and allocate related(sequential access / consecutive blocks in the same file) blocks together(on the same or nearby cylinder)
    • Disk Scheduling
      Idea: reorder requests to seek as little as possible, rearranging the order of file tasks(read/write)
      • FCFS
      • SSTF(Shortest Seek Time First)
        • Pro: good seek times
        • Con: Starvation
      • SCAN
        • Separates the movement into two parts: one continually moving up, another continually moving down
      • C-SCAN
        • always moving in one direction (see class)
      • C-LOOK
        • same as C-SCAN but not going all the way to zero or max of cylinder

          Clever disk allocation and disk scheduling are complementary.
          Low load: clever allocation is better (not many scheduling opportunities)
          High load: disk scheduling is better

  4. Avoid rotational latency
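The scheduling policies can be compared with a small sketch. The request queue and starting head position below are a made-up example (cylinder numbers), used only to show how much total seek distance each policy incurs:

```python
# Sketch: total seek distance under FCFS vs SSTF.

def fcfs(head, reqs):
    total = 0
    for r in reqs:                   # serve requests in arrival order
        total += abs(head - r)
        head = r
    return total

def sstf(head, reqs):
    total, pending = 0, list(reqs)
    while pending:                   # always serve the nearest request next
        r = min(pending, key=lambda c: abs(head - c))
        total += abs(head - r)
        head = r
        pending.remove(r)
    return total

reqs = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs(53, reqs), sstf(53, reqs))   # 640 236
```

SSTF cuts the seek distance dramatically here, but a steady stream of nearby requests could starve the far-away ones, which is what SCAN-style policies fix.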


COMP310_MemoryManagement:DemandPaging

COMP310 2023/3/6

Memory Management: Demand Paging

For system using right now, most are based on paging instead of segmentation or base&bounds.

  • Base&Bounds => used only for niche
  • Segmentation => abandoned
    • high complexity for little gain
    • Effect approximated with paging + valid bits
  • Paging => universal

Segmentation can be compacted but hard to manage. Segmentation usually used in embedded development.

Paging

Terminology:
Page: fixed-size portion of virtual memory
Frame: fixed-size portion of physical memory
Page size = Frame size
Typical size: 4k-8k (always power of 2)

  • Virtual address space
    Linear from 0 up to a multiple of page size
  • Physical address space
    Non-contiguous set of frames, one per page
  • Virtual Address
    • Two components in the virtual address :
      VPN:Virtual page number
      Offset: offset within the page (page size = $2^{\text{offset bits}}$ bytes)
    • VPN –(Address Translation(MMU))–>PFN(Physical Frame Number)
  • MMU
    • Page Table
      • Page table is the data structure mapping VPN to PFN.
      • Each process has a page table. -> each time we do a context switch, the page table needs to be switched also
    • MMU need a pointer to page table in memory and the length of the page table
Advantage & Disadvantage
  • Advantage: easier to implement compared to segmentation with compaction
  • Disadvantages:
    • P1: there may be free (null) pages = sparsely used pages may appear = internal fragmentation
      The severity of this problem depends on the page size
    • P2: low translation performance (virtual address -> page table (in main memory) -> access (in main memory)) = performance reduced by a factor of 2
Solution to P1: valid/invalid bit

We can mark each page with a valid bit to indicate whether it is used or not (valid for used, invalid for unused).
This means the page table itself no longer covers the entire virtual address space (all possible pages in virtual memory) but only the pages allocated to the process.

Solution to P2: TLB(Translation Lookaside Buffer)

TLB(Translation lookaside buffer) is a part of MMU which makes the map(translation from virtual address(page number) to physical address) fast.

Part of the page table is in the MMU (the TLB). If the mapping is there, we get a “hit”, so there is no need to look up the mapping in memory. On a miss, we must find it in memory.

Page Table is stored in main memory and MMU has a pointer to it with the length. TLB is not in main memory but in MMU.

Note that the lookup in memory itself again goes through the TLB.



The hardware basis of TLB
Use associative memory(special hardware)

  • Regular Memory -> lookup by address
  • Associative Memory -> lookup by contents & lookup in parallel(lookup all at the same time)

Regular Memory
Key = Pageno , Value = Frameno

TLB Size
TLB is expensive and small(64-1024 entries)
Want TLB hit rate close to 100%
If TLB full, need to replace existing entry

TLB Hit Rate and Locality

  • Temporal Locality -> An instruction that has been accessed will be re-accessed soon in the future
  • Spatial Locality -> An instruction is accessed, then the instructions near to it will likely soon be accessed

    Accessing an array is an example of both spatial and temporal locality

They are used together most of the time but may depend.

Process Switching with TLB
Problem

Inability to distinguish between pageno of different processes in TLB.

Solution
  • Invalidation of TLB : invalidate all TLB entries(by valid bits) as switching
    • Pro: Simply requires one Valid Bit
    • Con: process switch expensive
    • Con: new process incurs 100% TLB misses
  • Process Identifier : add process identifier to TLB entries
    • Con: complicated
    • Pro: cheaper

Computation for the size of TLB

Check the lecture

Explanation:
4 KB pages => each page holds 4096 bytes.
Page table entries => the number of entries in the page table
Every page table entry => the size of each table entry in main memory
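A typical computation of this kind can be worked through in a few lines. The parameters are assumptions for illustration (32-bit virtual addresses, 4 KB pages, 4-byte page table entries), not taken from the lecture:

```python
# Worked example: size of a linear page table.

vaddr_bits, page = 32, 4096
entries = 2**vaddr_bits // page      # 2^32 / 2^12 = 2^20 pages
table_size = entries * 4             # 4 bytes per entry

print(entries, table_size)           # 1048576 4194304  -> 4 MB per process
```

A 4 MB table per process is exactly why the next section asks how to make page tables smaller.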

How to make Page Table Smaller?

  • Big Pages
  • Segmentation + Paging
  • Multi-level page tables
Big Pages

Advantage: Easy to implement
Disadvantage: Larger internal fragmentation
Most systems use 4KB or 8KB pages in practice

Segmentation + Paging

Divide the address space into segments, then divide each segment into fixed-size pages.
Virtual address divided into three portions:

seg | page number | page offset

Advantage:

  • Supports sparse address spaces
    • decreases the size of the page tables
    • if a segment is not used, no page table is needed for it
  • enable sharing
  • No external fragmentation

Disadvantage:

  • each segment might have large page table
  • must allocate each page table contiguously
Multi-level Page Tables

Turns the linear page table into tree structure. The level is not limited.
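A two-level lookup can be sketched with made-up field sizes (4-bit outer index, 4-bit inner index, 8-bit offset); the point is that unused inner tables are simply absent, which is how the tree saves space:

```python
# Toy two-level page table lookup.

page_dir = [None] * 16               # outer level
page_dir[3] = [None] * 16            # only one inner table allocated
page_dir[3][5] = 0x2A                # VPN (3, 5) -> frame 0x2A

def translate(vaddr):
    outer = (vaddr >> 12) & 0xF
    inner = (vaddr >> 8) & 0xF
    offset = vaddr & 0xFF
    inner_table = page_dir[outer]
    if inner_table is None or inner_table[inner] is None:
        raise MemoryError("page fault")      # unmapped page
    return (inner_table[inner] << 8) | offset

print(hex(translate(0x3577)))        # 0x2a77
```

With a linear table, all 256 entries would exist up front; here only the one allocated inner table consumes memory, at the cost of an extra memory access per translation.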


Leaving the assumption

Previous discussion is based on the assumption that the whole program resides in the memory. However, now we are dropping this assumption.

  • It is impossible to keep a huge program entirely in memory
  • Only the needed pieces must be resident <= load the needed pieces from the file system => many advantages (multiple programs running in memory & loading only the useful pieces)
Swapping

Def:
Temporarily store the whole program on disk in order to remove one or more processes from memory.
Explanation:
Processes reside on disk; whenever the scheduler decides one should run (a context switch), the whole process is moved from disk into memory (and likewise moved out).
Disadvantage:

  • High latency since need to read from disk(~100million cycles).
  • Parallelism is limited if the program being switched in is huge.

Solution: Demand Paging
Demand Paging

Difference with swapping:

  • Swapping = all of the program is in memory or all of the program is on disk
  • Demand paging = part of program is in memory

Reason for use demand paging:

  1. We want to create the illusion for the program that it has a huge memory space (virtual memory space >> physical memory space)
  2. Shorter process startup latency
    • start process with part of them in memory (Even 1 page suffices)
  3. Better use of main memory
    • Enables parallelism: running multiple programs

Location of the program:
Part of it is in memory, but most of it is on disk (in a special partition called the backing store; only the OS has access to it).

NOTICE: The backing store has all the pages of a program.

NOTICE: The CPU only directly accesses memory. The CPU accesses data on disk through the OS (the OS connects each page to its place in the backing store).

High-level Mechanism:

  • Page fault:
    Program needs to access part only on disk
  • Page fault handling:
    The process for the OS to handle the page fault: program suspended -> OS runs and gets page from disk -> program is restarted
  1. Discover Page Fault
    Idea: Using the valid bit in page table

    • With demand paging:
      (Any exception to the following cases raises an OS error)
      Valid bit = 0: page is invalid OR page is on disk
      Valid bit = 1: page is valid AND page is in memory
      (so the OS needs more info to tell invalid from on-disk)
  2. Suspending the Faulting Process
    Idea: Trap into the OS

    • invalid bit access generates trap
    • Process info is stored in the PCB and the process is suspended (to increase CPU efficiency)

      Trapping: Making the switch between the user mode and the kernel mode.

  3. Getting the page from the disk
    Idea: the OS handles fetches from disk

    • Allocate the free frame(if exist) in memory to the process

    • Find page on disk and transfer the page from disk to frame

      • Two tables are needed for the transfer: the page table, plus an extra table mapping each page to its backing-store location
    • While the disk is Busy

      • invoke scheduler to run another process
      • when the disk interrupt arrives, suspend the running process and resume page fault handling
  4. Completing page fault handling

    • set the brought-in page to valid and assign it a frame number (update the page table)
    • set the process state to ready
    • invoke the scheduler => next time the process runs it will try to access the page again


How to find a free frame in the main memory?
If no free frame is available, pick a frame to be replaced -> invalidate its page table entry and TLB entry -> write the frame to disk

NOTICE: No worry about losing information when replacing, since all pages are stored in the backing store.

How to pick the page/frame to replace? (replacement policies)

  • Random
  • FIFO
  • OPT
  • LRU

Goal: minimize page faults, since they slow down the program

Random (Relatively universal)
Random page is replaced.

  • PRO: Easy to implement
  • CON: Does not take advantage of spatial/temporal locality

FIFO
Oldest page is replaced (the first one brought into memory).

  • PRO: Easy to implement, Fair
  • CON: Does not account for “hot” pages

OPT(An optimal algorithm)
Replace the page that will be referenced furthest in the future. (True OPT instead of approximated OPT)

  • PRO: Provably optimal
  • CON: Hard to implement(Can’t predict)

    OPT always acts as a baseline of comparison for other algorithms

LRU(Least Recently Used)
Use past information to predict future usage

  • PRO: Approximates OPT
  • CON: Hard to implement, does not handle all workloads well
    LRU Hardware Support
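LRU can be sketched with an ordered map standing in for the hardware support (a software model only; real MMUs approximate this with reference bits):

```python
from collections import OrderedDict

# Sketch: count page faults under LRU with a fixed number of frames.
def simulate_lru(refs, nframes):
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)         # hit: page becomes most recent
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)   # evict the least recently used
            frames[page] = True
    return faults

print(simulate_lru([1, 2, 3, 1, 4, 2], 3))   # 5
```

In the trace, the re-reference of page 1 saves it from eviction when page 4 arrives; FIFO would have evicted it, which is exactly the "hot page" weakness noted above.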

Question: does the virtual address space include the part that lives on disk, rather than forming a complete one-to-one mapping onto physical memory?

Terrance

Honor Mathematics & Honor Econ & Honor CS