Google File System (Distribute Storage)

分布式存储的困难点:

  • 高性能:High performance in many server;
  • 多机器:System with many machine could cause “constant fault”;
  • 一致性错误:To avoid contact fault, we will need replication;
  • 数据同步:During the replication, potential inconsistencies will occur;
  • 一致性与高性能矛盾:To get better consistency, low performance occur;

分布式存储的一个大的课题就是在“一致性”与“高性能”之间的 tradeoff.

GFS Master:

  • RAM 中存储:
    • 一个 filenamehandlers array 的映射表;
    • 每个 handler 都包含 version/chunk servers list/primary/least time 信息;
  • 磁盘中存储着 log/checkpoint

READ,客户端读取流程:

  1. C send filename and offset to the master M;
  2. M finds chunk handle for that offset;
  3. M replies list of chunkservers(aka CS) only with lastest versions;
  4. C caches this response;
  5. C sends request to nearest CS with offset;
  6. CS read the chuck file from disk and returns;

APPEND,客户端追加流程:

  1. C ask M about filename’s last chunk;

  2. If M see filename has no primary hanlder:

    • Pick a primary(aka P) and secondaries(aka S) with latest version (Only these server was allowed to handle storing filename);
    • Increment lastest version;
    • Tell P/S who they are;
  3. M response C with primary, secondaries and version;

  4. C sends append data to all:

    • The paper change thier tone after, C only sends data to the nearest replica and chain the data to all replicas which can reduce trans cost;
  5. C tells primary P to execute append;

  6. P checks that lease hasn’t expired, and chunk has space. And then P picks an offset (at end of chunk), writes chunk file (a Linux file) at the offset;

  7. P tells each secondary the offset, tells to exexute append to chunk file

  8. P waits for all secondaries to reply, or timeout;

  9. P tells C “ok” or “error”. C retries from start if error

Split Brain 问题:

  • 描述:一个 filename 同时对应了多个 primary 处理;
  • 原因:通常是因为 network partition 导致的(部分机器之间无法通信);
  • 解决方案:在指派 primary 时同时指派一个 lease 表示过期时间,masterprimary 同时维护这个过期时间,过期后 master 重新指派 primary

总结:GFS 优点:

  • Global cluster file system as universal infrastructure;
  • Seperation of naming system (master) and storage system (chunk servers)
  • Sharding file into chunk for parallel throughput;
  • Huge files/chunks to reduce overheads;
  • Designation primary to achieve sequential writes;
  • Lease to prevent split-brain chunk servers primaries;

GFS 缺点:

  • Single master performance: ran out of RAM and CPU
  • Chunkservers not very efficient for small files
  • Lack of automatic fail-over to master replica;
  • Maybe consistency was too relaxed