File system

Scenario
Storage
- How to save a file on one machine
- How to save a much larger file on one machine
Scale

Scenario

Write a file
Read a file
Use multiple machines to store these files

Storage

How to save a file on one machine

Metadata
- FileInfo
  - Name = dengchao.mp4
  - CreatedTime = 201505031232
  - Size = 2044323
- Index
  - Block 11 -> diskOffset1
  - Block 12 -> diskOffset2
  - Block 13 -> diskOffset3
Block
- 1 block = 1024 bytes
- Advantages
  - Error checking
  - Fragmenting the data for storage
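
The metadata layout above can be sketched in Python. This is only an illustration, not a real file system: the names FileInfo, FileEntry and read_file are hypothetical, and the disk is modeled as a plain bytes object.

```python
from dataclasses import dataclass, field
from typing import Dict

BLOCK_SIZE = 1024  # 1 block = 1024 bytes, as in the notes


@dataclass
class FileInfo:
    name: str
    created_time: str
    size: int  # file size in bytes


@dataclass
class FileEntry:
    info: FileInfo
    # Index: block number -> byte offset of that block on disk
    index: Dict[int, int] = field(default_factory=dict)


def read_file(disk: bytes, entry: FileEntry) -> bytes:
    """Reassemble a file by visiting its blocks in block-number order."""
    data = b"".join(
        disk[offset:offset + BLOCK_SIZE]
        for _, offset in sorted(entry.index.items())
    )
    return data[:entry.info.size]  # drop the padding in the last block


# Blocks may sit anywhere on disk (fragmentation); the index restores the order.
disk = b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
entry = FileEntry(
    FileInfo("dengchao.mp4", "201505031232", BLOCK_SIZE + 10),
    index={11: BLOCK_SIZE, 12: 0},
)
content = read_file(disk, entry)
```

Here block 11 is stored after block 12 on disk, yet the file is read back in the right order because the index, not the disk position, decides the sequence.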

How to save a much larger file on one machine

Change chunk size
- 1 chunk = 64 MB = 64 * 1024 KB
- Advantages
  - Reduce size of metadata
- Disadvantages
  - Waste space for small files
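
The trade-off can be checked with a little arithmetic. The sketch below assumes one index entry per block or chunk and that every chunk is allocated at its full size; the 10 GB file is a made-up example, while 2044323 is the file size from the metadata example.

```python
BLOCK_SIZE = 1024              # 1 KB block
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk


def index_entries(file_size: int, unit_size: int) -> int:
    """Number of index entries needed to cover file_size bytes."""
    return (file_size + unit_size - 1) // unit_size  # integer ceiling


def wasted_bytes(file_size: int, unit_size: int) -> int:
    """Space allocated but left unused in the last unit."""
    return index_entries(file_size, unit_size) * unit_size - file_size


ten_gb = 10 * 1024 ** 3
print(index_entries(ten_gb, BLOCK_SIZE))  # 10485760 entries
print(index_entries(ten_gb, CHUNK_SIZE))  # 160 entries
print(wasted_bytes(2044323, CHUNK_SIZE))  # 65064541 bytes
```

A 10 GB file needs about ten million index entries with 1 KB blocks but only 160 with 64 MB chunks, while a file of roughly 2 MB leaves almost the whole chunk unused.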

Scale

Architecture style

Peer-to-peer (BitComet, Cassandra)
- Advantage: No single point of failure
- Disadvantage: Multiple machines need to negotiate with each other
Master-slave
- Advantage: Simple design. Easy to keep data consistent
- Disadvantage: Master is a single point of failure
Final decision
- Master + slave
- Restart the single master when it fails

How to save an extra large file on several machines

One master + many chunk servers

Move chunk offset from master to slaves

The master does not record the disk offset of a chunk; each chunk server keeps the offsets of its own chunks
- Advantage: Reduces the size of the metadata on the master and the traffic between the master and the chunk servers
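
A minimal sketch of a chunk server that owns its offset table, assuming an in-memory bytearray as a stand-in for the local disk. The class and method names are hypothetical.

```python
class ChunkServer:
    """Hypothetical chunk server that tracks chunk offsets locally."""

    def __init__(self) -> None:
        self.disk = bytearray()  # stand-in for the local disk
        # (file name, chunk index) -> (disk offset, length).
        # This table stays on the chunk server and is never sent to the master.
        self.offsets = {}

    def write_chunk(self, file_name: str, chunk_index: int, data: bytes) -> None:
        self.offsets[(file_name, chunk_index)] = (len(self.disk), len(data))
        self.disk.extend(data)

    def read_chunk(self, file_name: str, chunk_index: int) -> bytes:
        offset, length = self.offsets[(file_name, chunk_index)]
        return bytes(self.disk[offset:offset + length])


server = ChunkServer()
server.write_chunk("dengchao.mp4", 0, b"first")
server.write_chunk("dengchao.mp4", 1, b"second")
```

The master only needs to know which server holds a chunk; if the chunk server later moves the chunk on its own disk, nothing has to be reported upward.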

Write process

The client divides the file into chunks and creates a chunk index for each chunk.
The client sends (FileName, chunk index) to the master, and the master replies with the assigned chunk servers.
The client transfers the data to the assigned chunk servers.

Modifying an existing file is not supported.
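
The write steps can be sketched as follows. The master is reduced to a hypothetical assign callable and each chunk server to a dict, so this shows only the order of the steps, not a real network protocol.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as in the notes


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Step 1: yield (chunk index, chunk bytes) pairs."""
    for start in range(0, len(data), chunk_size):
        yield start // chunk_size, data[start:start + chunk_size]


def write_file(file_name, data, assign, servers, chunk_size=CHUNK_SIZE):
    """Steps 2 and 3: ask the master for a server, then send the chunk to it.

    assign(file_name, chunk_index) stands in for the master's reply and
    returns a server id; servers maps server id -> dict used as storage.
    """
    for chunk_index, chunk in split_into_chunks(data, chunk_size):
        server_id = assign(file_name, chunk_index)
        servers[server_id][(file_name, chunk_index)] = chunk


# Toy run with 4-byte chunks and a round-robin master stub.
servers = {"cs0": {}, "cs1": {}}
write_file("a.txt", b"abcdefghij", lambda name, i: f"cs{i % 2}", servers, chunk_size=4)
```

After the toy run, cs0 holds chunks 0 and 2 and cs1 holds chunk 1.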

Read process

The client sends (FileName) to the master and receives a chunk list (chunk index, chunk server) from the master.
The client connects to the listed chunk servers to read the chunks.
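
A sketch of the read path under the same simplifications: lookup is a hypothetical stand-in for the master, and each chunk server is a dict keyed by (file name, chunk index).

```python
def read_file(file_name, lookup, servers):
    """Fetch every chunk of file_name and join the chunks in index order.

    lookup(file_name) stands in for the master and returns a list of
    (chunk index, server id) pairs; servers maps server id -> stored chunks.
    """
    chunk_list = lookup(file_name)
    parts = []
    for chunk_index, server_id in sorted(chunk_list):
        parts.append(servers[server_id][(file_name, chunk_index)])
    return b"".join(parts)


servers = {
    "cs0": {("a.txt", 0): b"abcd", ("a.txt", 2): b"ij"},
    "cs1": {("a.txt", 1): b"efgh"},
}
chunk_lists = {"a.txt": [(2, "cs0"), (0, "cs0"), (1, "cs1")]}
data = read_file("a.txt", chunk_lists.__getitem__, servers)
```

The master is contacted once per file; the chunk data itself flows directly between the client and the chunk servers.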

Master task

Store metadata for different files
Store a map (file name + chunk index -> chunk server)
- Find the corresponding chunk server when reading data
- Assign writes to the more available chunk servers
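
The map and its two uses might look like the sketch below. It assumes that "more available" means "most free disk space"; the Master class and its method names are hypothetical.

```python
class Master:
    """Hypothetical master holding only the chunk-location map."""

    def __init__(self, free_space):
        self.free_space = dict(free_space)  # server id -> free bytes
        self.locations = {}  # (file name, chunk index) -> server id

    def assign(self, file_name, chunk_index, chunk_size):
        """Write path: pick the server with the most free space (assumption)."""
        server_id = max(self.free_space, key=self.free_space.get)
        self.free_space[server_id] -= chunk_size
        self.locations[(file_name, chunk_index)] = server_id
        return server_id

    def lookup(self, file_name):
        """Read path: return (chunk index, server id) pairs for a file."""
        return sorted(
            (index, server)
            for (name, index), server in self.locations.items()
            if name == file_name
        )


master = Master({"cs0": 100, "cs1": 80})
master.assign("a.txt", 0, 64)  # cs0 has the most free space
master.assign("a.txt", 1, 64)  # cs0 is now at 36, so cs1 is chosen
```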

Failure and recovery

Single master

Double master (Apache Hadoop Goes Realtime at Facebook)
Multi master (Paxos algorithm)

What if a chunk is broken

Checksum: 4 bytes = 32 bits
Each chunk has a checksum
Write the checksum when writing out a chunk
Verify the checksum when reading in a chunk
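
A minimal sketch of the write and verify steps. The notes only say the checksum is 32 bits; using CRC-32 (via Python's zlib.crc32) and storing it as a 4-byte trailer are assumptions made for the example.

```python
import zlib


def write_chunk(data: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the chunk data."""
    checksum = zlib.crc32(data)
    return data + checksum.to_bytes(4, "big")


def read_chunk(stored: bytes) -> bytes:
    """Verify the trailing checksum and return the data, or raise if broken."""
    data, expected = stored[:-4], int.from_bytes(stored[-4:], "big")
    if zlib.crc32(data) != expected:
        raise ValueError("chunk is broken: checksum mismatch")
    return data


stored = write_chunk(b"chunk payload")
corrupted = b"X" + stored[1:]  # simulate a damaged byte on disk
```

Reading stored returns the original payload, while reading corrupted raises ValueError, which is the signal that the chunk needs to be recovered.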

Avoid loss of data when a chunk server is down

Replica: 3 copies
- Two copies in the same data center but on different racks
- Third copy in a different data center
How to choose chunk servers
- Find servers which are not busy
- Find servers with lots of available disk space
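
One way to combine the placement rule with the selection criteria is sketched below. The Server fields, the load scale, and the tie-breaking order (least busy first, then most free space) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    data_center: str
    rack: str
    load: float      # 0.0 (idle) to 1.0 (busy)
    free_bytes: int


def rank(servers):
    """Least busy first; ties broken by most free disk space."""
    return sorted(servers, key=lambda s: (s.load, -s.free_bytes))


def place_replicas(servers):
    """Pick 3 servers: two racks in one data center, one in another."""
    first = rank(servers)[0]
    same_dc = [s for s in servers
               if s.data_center == first.data_center and s.rack != first.rack]
    other_dc = [s for s in servers if s.data_center != first.data_center]
    if not same_dc or not other_dc:
        raise ValueError("not enough racks or data centers for 3 replicas")
    return [first, rank(same_dc)[0], rank(other_dc)[0]]


servers = [
    Server("a1", "dc-a", "r1", 0.2, 500),
    Server("a2", "dc-a", "r1", 0.1, 900),
    Server("a3", "dc-a", "r2", 0.5, 700),
    Server("b1", "dc-b", "r1", 0.3, 300),
    Server("b2", "dc-b", "r2", 0.3, 800),
]
replicas = [s.name for s in place_replicas(servers)]
```

With this data the result is a2, a3 and b2: a2 is the least busy server, a3 is the only server in dc-a on another rack, and b2 beats b1 on free space.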

How to recover when a chunk is broken

Ask master for help
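
The notes do not say how the master helps. A common approach, assumed here, is that the master tells the chunk server where another replica lives so the broken copy can be overwritten. The function and parameter names are hypothetical.

```python
def repair_chunk(chunk_id, broken_server, replica_map, storage):
    """Replace a broken chunk with a copy from another replica.

    replica_map stands in for the master: chunk id -> list of server ids.
    storage maps server id -> {chunk id: bytes}.
    Returns the id of the server the good copy was fetched from.
    """
    for server_id in replica_map[chunk_id]:
        if server_id != broken_server:
            storage[broken_server][chunk_id] = storage[server_id][chunk_id]
            return server_id
    raise LookupError("the master knows no other replica of this chunk")


storage = {"cs0": {"c1": b"garbage"}, "cs1": {"c1": b"good"}}
replica_map = {"c1": ["cs0", "cs1"]}
source = repair_chunk("c1", "cs0", replica_map, storage)
```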

How to find whether a chunk server is down

Heartbeat messages
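
A sketch of heartbeat-based failure detection, assuming the master records the time of each heartbeat and treats a server as down once it has been silent longer than a timeout. The class name and the 10-second timeout are illustrative.

```python
class HeartbeatMonitor:
    """Hypothetical master-side tracker of chunk server heartbeats."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout = timeout_seconds
        self.last_seen = {}  # server id -> time of the last heartbeat

    def record(self, server_id: str, now: float) -> None:
        self.last_seen[server_id] = now

    def down_servers(self, now: float):
        """Servers whose last heartbeat is older than the timeout."""
        return sorted(
            server_id
            for server_id, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout_seconds=10.0)
monitor.record("cs0", now=0.0)
monitor.record("cs1", now=0.0)
monitor.record("cs0", now=8.0)  # cs1 stays silent
```

Time is passed in explicitly so the example is deterministic; a real monitor would read a clock such as time.monotonic().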

How to solve client bottleneck

The client writes only to a leader chunk server. The leader chunk server is responsible for forwarding the data to the other chunk servers.
How to select the leader chunk server
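
The forwarding idea can be sketched as below. How the leader is chosen is left open in the notes, so the example simply receives the leader as an argument; the class and function names are hypothetical.

```python
class ChunkServer:
    def __init__(self, name):
        self.name = name
        self.chunks = {}

    def store(self, chunk_id, data):
        self.chunks[chunk_id] = data

    def store_and_forward(self, chunk_id, data, followers):
        """Leader role: store locally, then copy to the other replicas."""
        self.store(chunk_id, data)
        for follower in followers:
            follower.store(chunk_id, data)


def client_write(chunk_id, data, leader, followers):
    """The client uploads once, to the leader only; returns the bytes it sent."""
    leader.store_and_forward(chunk_id, data, followers)
    return len(data)


leader, f1, f2 = ChunkServer("cs0"), ChunkServer("cs1"), ChunkServer("cs2")
sent = client_write("a.txt#0", b"x" * 100, leader, [f1, f2])
```

The client sends 100 bytes once instead of 300 bytes across three uploads; the remaining copies travel between chunk servers.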

How to solve chunk server failure

Ask the client to retry
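
A minimal retry sketch. The notes do not say where the retry goes; trying the next candidate chunk server and capping the number of attempts are assumptions, and send is a hypothetical callable.

```python
def write_with_retry(send, candidates, max_attempts=3):
    """Try chunk servers in turn until one write succeeds.

    send(server_id) is a hypothetical callable that raises ConnectionError
    when the chunk server is unreachable.
    """
    attempts = 0
    for server_id in candidates:
        if attempts == max_attempts:
            break
        attempts += 1
        try:
            send(server_id)
            return server_id
        except ConnectionError:
            continue  # this server failed; retry on the next one
    raise RuntimeError(f"write failed after {attempts} attempts")


def flaky_send(server_id):
    if server_id == "cs0":  # simulate a chunk server that is down
        raise ConnectionError(server_id)


used = write_with_retry(flaky_send, ["cs0", "cs1", "cs2"])
```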