Advanced FS: Log-Structured Designs
Key Concepts
- Log-Structured File System
- Log-Structured Key-Value Store
Reason
Dealing with Crashes:
If a crash happens in the middle of writing the data, it is a problem, since the OS cannot tell which part of the block holds the new data and which part holds the old data.
For write-behind, a crash can be a problem even after the write call has finished, since the data may not yet have been transferred from memory to disk.
Handling Crash
Crash: a situation in which the application stops unexpectedly, leaving an operation partially complete.
Solution: Atomicity
Implementation
Assumption:
- A single sector disk write is atomic
This assumption holds with very high probability.
Approaches:
- Shadow Paging
- Intentions Log
Shadow Paging
- Make sure both old copy and new copy are on the disk.
- Make an atomic switch between the two => by a single
WriteSector()
- The atomic switch is done on the inode: an inode is smaller than a sector, so updating the inode entry is a single-sector (and therefore atomic) write
Step(Write Through):
We first write the new data blocks to disk -> after each write, we update the in-memory inode to point to the new block -> once the update is complete in the cache, we commit it with one atomic WriteSector() -> finally, garbage-collect the old data blocks.
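The write-through steps above can be sketched as follows. This is a minimal illustration, assuming a toy disk where a single `sector_write()` is atomic; all names (`Disk`, `shadow_write`, `free_list`) are hypothetical, not a real FS API.

```python
class Disk:
    """Toy disk: each sector holds one Python object."""
    def __init__(self, nsectors):
        self.sectors = [None] * nsectors

    def sector_write(self, idx, value):
        # Assumed atomic: the write either fully happens or not at all.
        self.sectors[idx] = value

def shadow_write(disk, inode_sector, old_blocks, new_data, free_list):
    # 1. Write the new data blocks to freshly allocated sectors;
    #    the old blocks stay untouched on disk.
    new_blocks = [free_list.pop() for _ in new_data]
    for idx, data in zip(new_blocks, new_data):
        disk.sector_write(idx, data)
    # 2. Build the new inode in memory, pointing at the new blocks.
    new_inode = {"blocks": new_blocks}
    # 3. Atomic switch: one sector write replaces the inode.
    #    A crash any time before this line leaves the old file intact.
    disk.sector_write(inode_sector, new_inode)
    # 4. Garbage-collect the old data blocks.
    free_list.extend(old_blocks)
```

The key design point is step 3: everything before it is preparatory and crash-safe, because the old inode still points at the old blocks until the single atomic write flips the switch.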
Step(Write Behind):
The difference from write-through is that the data is not written directly to disk; it is written to memory first and transferred to disk later. The block space on disk is allocated before the transfer.
The step of transferring the blocks to the disk is not atomic.
(If it fails, we can still recover the old version of the file) <- Notice that the old blocks have not been garbage-collected yet
But the WriteSector()
instruction itself is atomic.
The old blocks are cleaned up at the end.
The OS de-allocates the old data blocks; if the OS crashes before de-allocation, the FS check will reclaim them.
Intentions Log
Log: a special area on disk that can only be appended to, never overwritten
The log acts as an intermediary for transfers between the cache and the disk.
After a crash, the log is checked even if the old data is still in place, so logged operations can be re-applied.
The log is garbage-collected after all logged operations have finished.
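A minimal sketch of the intentions-log idea, assuming in-memory lists stand in for the disk areas (the class and method names are illustrative, not a real FS interface):

```python
class IntentionsLog:
    def __init__(self):
        self.log = []    # append-only log area on "disk"
        self.disk = {}   # main data area: block id -> contents

    def write(self, block, data):
        # 1. Append the intended update to the log (sequential I/O).
        self.log.append(("write", block, data))
        # 2. Apply the update to its home location on disk.
        self.disk[block] = data

    def recover(self):
        # After a crash, replay the log. Re-doing a write that already
        # reached the disk is harmless: the operation is idempotent.
        for op, block, data in self.log:
            if op == "write":
                self.disk[block] = data

    def checkpoint(self):
        # Once all logged operations are known to be applied,
        # the log itself can be garbage-collected.
        self.log.clear()
```

The crash-safety argument: if the crash hits before the log append, the operation simply never happened; if it hits after, recovery replays it from the log.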
Comparison: which one is better?
Criterion
- The number of disk I/Os
- The number of random disk I/Os
It’s true that shadow paging theoretically takes fewer operations; in practice, however, the intentions log performs better.
The log works better mainly because access to the log is sequential, and there is less data fragmentation inside the log compared to shadow paging. Shadow paging requires jumping around the disk on access (random I/O), but the log does not.
Log-Structured File System (LFS)
- Alternative way of structuring file system
- Log = append-only data structure(on disk)
Idea: Use the disk purely sequentially
- Good for writing due to the sequential property
- Bad for reading, since a read may touch widely spaced sectors
Strategy
Data Structures
We store all the information, both data blocks and inodes, in the log, which creates the problem of distinguishing and locating them. This needs to be solved with a dedicated data structure.
Naive Idea:
- Use the current offset on disk as the file uid instead of a table index
- When the inode is updated, the inode number changes
Because the log can only be appended to, the new inode lands at a new offset
Better Idea:
- Add a level of indirection
- Map: file uid -> inode location on disk
- This data structure is called the imap (inode map)
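The indirection can be sketched as follows, assuming the append-only log is modeled as a Python list (offsets are list indices); the names are illustrative:

```python
class LFS:
    def __init__(self):
        self.log = []    # append-only: holds both data blocks and inodes
        self.imap = {}   # file uid -> offset of the latest inode in the log

    def _append(self, item):
        self.log.append(item)
        return len(self.log) - 1   # offset of the appended item

    def write_file(self, uid, data):
        data_off = self._append(("data", data))
        inode_off = self._append(("inode", data_off))
        # The uid stays stable even though the inode moved to a new
        # offset: only the imap entry is updated.
        self.imap[uid] = inode_off

    def read_file(self, uid):
        _, data_off = self.log[self.imap[uid]]
        _, data = self.log[data_off]
        return data
```

Updating a file appends a new data block and a new inode; the old versions remain in the log until garbage collection, but the imap always points at the newest inode.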
Recovery
Log-structure key value stores
Key-Value Stores(KVs)
A crucial component in systems with large amounts of diverse, unstructured data.
Log-structured key value stores
LSM KVs Internals
- The commit log is a data structure used to back up the in-memory info on disk
- The commit log is optional and can be turned off (which makes writes faster)
- The memory sorted buffer is any data structure that preserves key order
LSM Writing
Each time the client writes something to memory, if the commit log is on, the write is also appended to the commit log.
When the buffer is full, its contents are written from memory to disk sequentially, creating one sorted immutable file. -> Then we empty the write buffer and the commit log.
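The write path above can be sketched as follows. This is a toy illustration, assuming a Python dict stands in for the memory sorted buffer (sorted only at flush time) and lists of (key, value) pairs stand in for the sorted immutable on-disk files; all names are hypothetical.

```python
class LSMWriter:
    def __init__(self, buffer_limit=2, use_commit_log=True):
        self.buffer_limit = buffer_limit
        self.use_commit_log = use_commit_log
        self.memtable = {}     # memory sorted buffer
        self.commit_log = []   # optional crash backup of the memtable
        self.disk_files = []   # sorted immutable files, newest last

    def put(self, key, value):
        if self.use_commit_log:
            # Back up the write so it survives a crash of memory.
            self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Write the buffer to disk as ONE sorted immutable file,
        # then empty both the buffer and the commit log.
        self.disk_files.append(sorted(self.memtable.items()))
        self.memtable.clear()
        self.commit_log.clear()
```

Note that every disk write here is a sequential append (commit log entry or whole-file flush), which is where the write-optimized behavior comes from.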
LSM Reading
The hierarchy above is not a physical separation; the levels are logical.
- L0 is reserved for flushing the memory component to disk (special, since multiple versions of the same key can be in L0 at the same time)
- From L1 to Ln, each key in the key space can exist in only one file per level, which is guaranteed by the key-value store.
- The oldest version of a key sits in the deepest level (toward Ln); the newest sits toward L0.
L0 can contain many files with the same key K, but in L1 to Ln each level contains only one file with that key.
One key k can correspond to files in multiple levels, and we only want the latest version.
Read from memory =(not in memory)=> read from the block cache =(not in cache)=> sequentially search through L0 to Ln
Log-structured key-value stores are optimized for writing, not for reading.
Each file covers a range of keys, so we can determine which file a key resides in. In L0, the files' key ranges are not disjoint, so every L0 file may need to be checked.
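The read path above can be sketched as follows: check the memtable first, then every L0 file (ranges may overlap, newest wins), then at most one file per level from L1 to Ln. The data layout (dicts for files, (lo, hi) key ranges per file) is a toy assumption.

```python
def lsm_get(key, memtable, l0_files, levels):
    # 1. The newest data lives in memory.
    if key in memtable:
        return memtable[key]
    # 2. L0: files may overlap and newer files shadow older ones,
    #    so scan from the most recently flushed file backwards.
    for sstable in reversed(l0_files):
        if key in sstable:
            return sstable[key]
    # 3. L1..Ln: within a level each key is in at most one file,
    #    so only the file whose key range covers it is consulted.
    for level in levels:                      # ordered L1, L2, ...
        for (lo, hi), sstable in level:
            if lo <= key <= hi and key in sstable:
                return sstable[key]
    return None   # key does not exist anywhere
```

This also shows why reads are the expensive side: a miss in memory can touch every L0 file plus one file per deeper level.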
LSM Internal Operations
Problems
Ideal write amplification (WA) = 1
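Write amplification is the ratio of bytes physically written to disk to bytes written by the application; WA = 1 means each user byte hits disk exactly once. A small worked sketch (the three-writes scenario is an illustrative assumption, not a measured figure):

```python
def write_amplification(bytes_to_disk, bytes_from_app):
    # WA = total device writes / user writes.
    # WA == 1 is the ideal: no byte is ever rewritten.
    return bytes_to_disk / bytes_from_app

# Example: if a 1 KB user write goes to the commit log, is flushed in
# an SSTable, and is later rewritten once by compaction, the device
# sees 3 KB of writes for 1 KB of user data.
wa = write_amplification(3 * 1024, 1024)   # -> 3.0
```

This is why compaction strategy matters: every extra rewrite of existing data pushes WA further above the ideal of 1.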