RAID
RAID combines many disks to gain capacity, reliability, and performance.
Challenge: most file systems (FS) are designed to work with a single disk.
Building Storage from Multiple Disks
Approaches
JBOD (Just a Bunch Of Disks)
Each disk keeps its own file system. If the application is smart, it can store different files on different file systems.
RAID (Redundant Array of Independent Disks)
Creates the illusion of one disk from many disks.
The core idea is to expose a single fake, logical disk that is backed by several physical disks.
Essential Idea
- Optimize I/O bandwidth through parallel I/O
- Parallel I/O = I/O to multiple disks at once
Strategies
- Mapping
- Redundancy
Mapping
Mapping provides the illusion that multiple disks behave as one: each logical block address is translated to a block on one of the physical disks.
Striping
Striping is a form of mapping. It spreads a file across several disks: File = Stripe0 | Stripe1 | Stripe2 …
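The round-robin translation behind striping fits in a few lines. This is a minimal sketch, assuming a stripe unit of `stripe_blocks` blocks and disks numbered from 0; `map_block` and its parameters are hypothetical names for illustration.

```python
def map_block(logical_block: int, num_disks: int, stripe_blocks: int = 1) -> tuple[int, int]:
    """Translate a logical block number into (disk index, block offset on that disk)."""
    chunk = logical_block // stripe_blocks        # stripe unit (chunk) holding this block
    disk = chunk % num_disks                      # chunks are dealt round-robin across disks
    row = chunk // num_disks                      # full rounds completed before this chunk
    offset = row * stripe_blocks + logical_block % stripe_blocks
    return disk, offset


# Four disks, one block per stripe unit: consecutive blocks land on different disks.
for block in range(8):
    print(block, map_block(block, num_disks=4))
```

Because consecutive logical blocks sit on different disks, a large sequential read can be served by all disks at once.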
RAID-0 : No Redundancy
- Uses Striping
- Best possible read and write bandwidth
- Failure results in data loss: if any one disk crashes, every file with a stripe on that disk is lost
- More disks increase throughput but do not reduce latency
Latency: how long a single request takes to complete
Throughput: how many requests can be completed per unit of time
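A back-of-the-envelope model shows why striping helps throughput but not latency. This is an idealized sketch, assuming small independent requests that each touch exactly one disk, perfect load balance, and no queueing; the 10 ms service time is a hypothetical number.

```python
def raid0_estimates(num_disks: int, per_request_ms: float) -> tuple[float, float]:
    """Return (latency in ms, throughput in requests/s) under the idealized model."""
    latency_ms = per_request_ms                          # one request still waits on one disk
    throughput = num_disks * (1000.0 / per_request_ms)   # every disk serves requests in parallel
    return latency_ms, throughput


for n in (1, 2, 4):
    lat, tput = raid0_estimates(n, per_request_ms=10.0)
    print(f"{n} disk(s): latency {lat:.0f} ms, throughput {tput:.0f} req/s")
```

Latency stays at 10 ms for every array size, while throughput grows linearly with the number of disks.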
Redundancy
Redundancy is introduced to tolerate disk failures.
The core idea: store redundant data on different disks, so data lost with one disk can be recovered from the others.
It builds on mapping: redundancy is added on top of the block-to-disk mapping.
RAID-1 : Mirroring
Mirroring not only increases tolerance to disk failure but also improves read performance, since a read can be served by either copy.
- Usable storage capacity is half of the total raw capacity
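A minimal in-memory sketch of two-way mirroring, to show why reads get faster while capacity halves. The `Mirror` class, its list-of-blocks "disks", and the alternating read policy are illustrative assumptions, not how any particular RAID controller works.

```python
class Mirror:
    """Two-way mirror (RAID-1) over two in-memory 'disks' (lists of blocks)."""

    def __init__(self, num_blocks: int) -> None:
        self.disks = [[b""] * num_blocks, [b""] * num_blocks]
        self.failed = [False, False]
        self._next = 0  # which copy serves the next read

    @property
    def usable_blocks(self) -> int:
        # Two disks' worth of raw space, but only one disk's worth is usable.
        return len(self.disks[0])

    def write(self, block: int, data: bytes) -> None:
        # A write must update every live copy.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Either copy can serve a read; alternating spreads reads over both disks.
        for _ in range(2):
            i = self._next
            self._next = 1 - i
            if not self.failed[i]:
                return self.disks[i][block]
        raise OSError("both copies have failed")


m = Mirror(num_blocks=4)
m.write(2, b"data")
m.failed[0] = True        # simulate a crash of disk 0
print(m.read(2))          # still served by disk 1
```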
RAID-4 : Parity Disk
N data disks + 1 Parity Disk
Parity:
- A simple form of error detection and repair
- Not specific to RAID
- Also used in communications
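For RAID-4, parity is the byte-wise XOR of the data blocks in a stripe. The sketch below, with hypothetical block contents, shows the repair side: once we know which disk failed, XOR-ing the surviving blocks with the parity rebuilds the lost block. `xor_blocks` is a hypothetical helper name.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]  # three data disks (hypothetical contents)
parity = xor_blocks(*data)                      # stored on the parity disk

# Suppose disk 1 fails: XOR of the surviving data blocks and the parity rebuilds it.
rebuilt = xor_blocks(data[0], data[2], parity)
assert rebuilt == data[1]
```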
Parity Updates:
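Every write to a data block must also update the parity block of its stripe.
- Additive parity: read all other data blocks of the stripe in parallel and XOR them with the new block
- Subtractive parity:
  - Read the old data and the old parity in parallel
  - Compare the new data with the old data, bit by bit
  - If a new data bit equals the old one -> leave that parity bit unchanged
  - Otherwise, flip the corresponding old parity bit
  - Equivalently: P_new = (D_old XOR D_new) XOR P_old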
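Both update rules produce the same parity; they differ only in what must be read. A minimal sketch with random 8-byte blocks on four hypothetical data disks; the function names are illustrative.

```python
import os
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def additive_parity(other_blocks: list[bytes], new_block: bytes) -> bytes:
    # Read all other data blocks of the stripe and XOR them with the new block.
    return xor_blocks(new_block, *other_blocks)


def subtractive_parity(old_block: bytes, old_parity: bytes, new_block: bytes) -> bytes:
    # Read only the old data and old parity. old ^ new has a 1 exactly where a
    # data bit changed; XOR-ing that into the old parity flips those parity bits.
    return xor_blocks(old_parity, xor_blocks(old_block, new_block))


stripe = [os.urandom(8) for _ in range(4)]   # four data disks, 8-byte blocks
parity = xor_blocks(*stripe)

new_block = os.urandom(8)                    # overwrite the block on disk 2
others = stripe[:2] + stripe[3:]
p_add = additive_parity(others, new_block)
p_sub = subtractive_parity(stripe[2], parity, new_block)
assert p_add == p_sub                        # both methods give the same parity
```

With N data disks, additive parity reads N - 1 blocks while subtractive parity always reads 2; both then write the new data and the new parity, so subtractive parity is cheaper on wide arrays.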
RAID-4 must access the parity disk on every write, which makes it a bottleneck under write-heavy workloads.
Even when writes go to different data disks in parallel, they all have to update the single parity disk, so they are serialized there.
This issue is called the small-write problem.
RAID-5 : Distributed Parity
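RAID-5 rotates the parity block across all disks instead of keeping it on one dedicated disk, so small writes to different stripes update parity on different disks. The sketch below counts parity updates per disk for a hypothetical workload. The rotation rule in `parity_disk_raid5` is one simple choice assumed for illustration; real layouts vary.

```python
from collections import Counter


def parity_disk_raid4(stripe: int, num_disks: int) -> int:
    return num_disks - 1  # parity always lives on the last disk


def parity_disk_raid5(stripe: int, num_disks: int) -> int:
    # Assumed rotation: parity moves one disk to the left on each successive stripe.
    return (num_disks - 1) - (stripe % num_disks)


def parity_load(parity_disk, num_disks: int, stripes_written) -> Counter:
    """Count how many parity updates each disk absorbs for a set of small writes."""
    return Counter(parity_disk(s, num_disks) for s in stripes_written)


writes = [s % 20 for s in range(100)]  # 100 small writes spread over 20 stripes
print(parity_load(parity_disk_raid4, 5, writes))  # every update hits disk 4
print(parity_load(parity_disk_raid5, 5, writes))  # 20 updates per disk
```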
Distributed File Systems
Distributed System
A distributed system is one where a machine I’ve never heard of can cause my program to fail.
Definition: more than one machine working together to solve a problem.
Advantages:
- More computing power
- More storage capacity
- Fault Tolerance
- Data sharing
Availability: the system remains accessible even when some machines crash
Fault tolerance: handling faults is built into the system as a core function
Scalability: adding machines improves performance
Transparency: users do not notice that the system is distributed
Types
- Client/Server Model
- Peer-to-peer Model
Distributed File System
File systems are a great use case for distributed systems.
- Local FS: Processes on same machine access shared files
- Network FS: processes on different machines access shared files in the same way
Network File System (NFS)
To access files stored on the file server, the client mounts a separate file system, NFS. Every access then goes through NFS as a Remote Procedure Call (RPC) to the server.
Problems
Solution for handling crashes
- Stateless protocol, combined with
- Idempotent operations: operations that can be performed many times with the same effect as performing them once
One issue is that a path depends on the directory structure on the server, so the same path does not always refer to the same file from one access to the next.
Reads can be retried many times, but a write cannot simply be repeated, since applying the same modification several times may not match what was intended; this is why writes are designed as idempotent operations.
Each write specifies an explicit offset, so repeating it just rewrites the same part of the file.
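A toy model makes the retry argument concrete. This is a minimal sketch, not the NFS protocol: `StatelessFileServer` is hypothetical, and the string `handle` stands in for the opaque file handle NFS uses. The server keeps no per-client state, so each request carries the file, offset, and length it needs.

```python
class StatelessFileServer:
    """Toy server: stores file contents only, no per-client state (no open files, no cursors)."""

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    def read(self, handle: str, offset: int, count: int) -> bytes:
        # Every request names the file, the position, and the length.
        return bytes(self.files[handle][offset:offset + count])

    def write(self, handle: str, offset: int, data: bytes) -> None:
        # Writing at an explicit offset is idempotent: repeating it leaves
        # the file in the same state as doing it once.
        buf = self.files.setdefault(handle, bytearray())
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))  # fill a gap with zeros
        buf[offset:offset + len(data)] = data

    def append(self, handle: str, data: bytes) -> None:
        # NOT idempotent: each retry adds the data again.
        self.files.setdefault(handle, bytearray()).extend(data)


server = StatelessFileServer()
for _ in range(3):                 # the client retries after "lost" replies
    server.write("fh1", 0, b"hello")
    server.append("fh2", b"hi")
print(server.read("fh1", 0, 10))   # the offset write took effect once
print(bytes(server.files["fh2"]))  # the append took effect three times
```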
Solution for slowness