Handling Crashes and Performance

2023/4/1 COMP 310 Lecture Notes

RAID

RAID combines many disks to improve capacity, reliability, and performance.
Challenge: Most file systems are designed to work with a single disk.

Building with Multiple Disks

Approaches

JBOD (just a bunch of disks)

If the application is smart, it can store different files on different file systems.

RAID (Redundant Array of Independent Disks)

Create the illusion of one disk from many disks.
The core idea is to expose a single fake logical disk on top of the physical disks.

Essential Idea

Optimize I/O bandwidth through parallel I/O
Parallel I/O = I/O to multiple disks at once

Strategies

Mapping
Redundancy

Mapping

Provides an illusion that multiple disks behave as one.

Striping
Striping is a form of mapping. It spreads a file across several disks: File = Stripe0 | Stripe1 | Stripe2 …
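
The mapping behind striping can be sketched as simple arithmetic. This is a minimal illustration, assuming a stripe unit of exactly one block and round-robin placement; the function name locate_block and the 4-disk layout are hypothetical, not from the lecture.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes one block per stripe unit and round-robin placement.
    """
    disk = logical_block % num_disks      # which disk holds the block
    offset = logical_block // num_disks   # position of the block on that disk
    return disk, offset


# Where the first eight logical blocks land on a 4-disk array.
layout = [locate_block(b, 4) for b in range(8)]
for logical, (disk, offset) in enumerate(layout):
    print(f"logical block {logical} -> disk {disk}, offset {offset}")
```

Consecutive logical blocks fall on different disks, which is what lets a large sequential request be served by several disks at once.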

RAID-0 : No Redundancy

Uses Striping
Best possible read and write bandwidth
Failure results in data loss -> if one of the disks crashes, the files striped across it may be lost
More disks increase throughput but do not reduce latency

Latency: How long a single request takes to complete
Throughput: How many requests can be completed per unit of time

Redundancy

Redundancy is introduced to tolerate disk failures.
The core idea is: store redundant data on different disks
It is an improvement built on top of mapping

RAID-1 : Mirroring

Mirroring not only increases tolerance to disk failure, but also improves read performance, since either copy can serve a read.

Usable storage capacity is half of the total disk capacity
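
A toy model may help make mirroring concrete. The sketch below assumes two in-memory lists standing in for disks; the class MirroredPair and its failed flags are hypothetical names for illustration only, not a real RAID implementation.

```python
class MirroredPair:
    """Toy RAID-1: two in-memory 'disks' that hold identical blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.disks = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]  # set an entry to True to simulate a crash

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to all working copies.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Any working copy can serve the read.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block]
        raise IOError("both copies have failed")


mirror = MirroredPair(num_blocks=4)
mirror.write(1, b"payload")
mirror.failed[0] = True   # first disk crashes
print(mirror.read(1))     # still readable from the second copy
```

Two disks store only one disk's worth of data here, which is the capacity cost noted above.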

RAID-4 : Parity Disk
N data disks + 1 Parity Disk

Parity:

A simple form of error detection and repair
Not specific to RAID
Also used in communications
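
As a small illustration of parity used for repair, the sketch below computes a parity block as the byte-wise XOR of the data blocks and then rebuilds a lost block from the survivors. The helper name xor_blocks and the sample byte values are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] is lost: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

This works because XOR-ing a value twice cancels it out, so the survivors and the parity together leave exactly the missing block.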

Parity Updates:
Every time we write to a data disk, we need to update the parity.

Additive Parity = Read all other data blocks in the stripe in parallel and XOR them with the new block
Subtractive Parity (applied per bit)
- Read old data and old parity in parallel
- Compare new data with old data
- If new data == old data -> do nothing
- Else, flip the old parity bit
- Equivalently: new parity = old parity XOR old data XOR new data
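
The two update methods should always agree. The sketch below checks this on a tiny one-byte stripe; the function names and sample values are hypothetical and chosen only for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def additive_parity(other_blocks: list[bytes], new_block: bytes) -> bytes:
    """XOR the new block with every other data block in the stripe."""
    parity = new_block
    for block in other_blocks:
        parity = xor_bytes(parity, block)
    return parity


def subtractive_parity(old_parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    """Flip exactly the parity bits where the old and new data differ."""
    changed_bits = xor_bytes(old_block, new_block)
    return xor_bytes(old_parity, changed_bits)


stripe = [b"\x01", b"\x02", b"\x04"]
old_parity = additive_parity(stripe[1:], stripe[0])

new_block = b"\x0a"  # overwrite stripe[1]
via_additive = additive_parity([stripe[0], stripe[2]], new_block)
via_subtractive = subtractive_parity(old_parity, stripe[1], new_block)
print(via_additive == via_subtractive)
```

The additive method reads every other data disk, while the subtractive method reads only the old data and old parity, so it needs fewer reads when the array has many disks.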

RAID-4 requires access to the parity disk for every write, which causes a bottleneck in write-heavy workloads.
Writes to different data disks could run in parallel, but they all queue up on the single parity disk.
This issue is called the small-write problem.

RAID-5 : Distributed Parity

Distributed File Systems

Distributed System

A distributed system is one where a machine I’ve never heard of can cause my program to fail.
Definition: More than one machine working together to solve a problem.
Advantages:

More computing power
More storage capacity
Fault Tolerance
Data sharing
Availability: The system remains accessible even when some machines crash
Fault-tolerance: Fault handling is built in as part of the system's functionality
Scalability: More machines, better performance
Transparency: Users do not notice that the system is distributed

Types

Client/Server Model
Peer-to-peer Model

Distributed File System

File systems are a great use case for distributed systems.

Local FS: Processes on the same machine access shared files
Network FS: Processes on different machines access shared files in the same way

Network File System (NFS)

The client must mount a separate file system, NFS, to access files stored on the file server. Every access goes through NFS via Remote Procedure Call (RPC).
Problems

Solution for handling crashes

Stateless protocol with
Idempotent operations: operations that can be repeated many times without changing the result

The issue with this is that a path may depend on the file storage structure, which means that the same path does not always refer to the same file on every access.

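We can retry a read as many times as needed, but not a write, since applying a modification multiple times may produce a result different from what was intended. This is why we introduce idempotent operations.

Use an explicit offset so that a write rewrites one particular part of the file; repeating it gives the same result.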
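
The difference between a retry-safe write and an unsafe one can be sketched with an in-memory buffer standing in for a file. The function names write_at and append are hypothetical and only illustrate the idea; they are not the NFS protocol calls.

```python
def write_at(storage: bytearray, offset: int, data: bytes) -> None:
    """Idempotent: overwrite bytes starting at an explicit offset."""
    storage[offset:offset + len(data)] = data


def append(storage: bytearray, data: bytes) -> None:
    """Not idempotent: every retry makes the file longer."""
    storage.extend(data)


file_a = bytearray(b"hello world")
write_at(file_a, 6, b"WORLD")
write_at(file_a, 6, b"WORLD")   # retry after a lost reply: same result
print(bytes(file_a))

file_b = bytearray(b"hello")
append(file_b, b"!")
append(file_b, b"!")            # retry after a lost reply: data is duplicated
print(bytes(file_b))
```

Because the offset travels with each request, the server does not need to remember a per-client file position, which fits the stateless design described above.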

Solution for slowness
