Primary/Backup Replication
Two main replication approaches:
- State Transfer: The primary executes and sends its new state to the backup machine;
- Replicated State Machine: The primary just passes the raw external events to the backup. This approach is the one mostly used in recent industry systems and papers;
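The difference between the two approaches can be sketched with a toy service. This is only an illustration: `Counter` and both `replicate_by_*` helpers are hypothetical names, and the "network" is an ordinary function call.

```python
class Counter:
    """A tiny deterministic service whose whole state is one integer."""

    def __init__(self):
        self.value = 0

    def apply(self, event):
        # Deterministic: the same state plus the same event gives the same new state.
        self.value += event


def replicate_by_state_transfer(primary, backup, events):
    for event in events:
        primary.apply(event)
        backup.value = primary.value  # ship the whole new state to the backup


def replicate_by_state_machine(primary, backup, events):
    for event in events:
        primary.apply(event)
        backup.apply(event)  # ship only the event; the backup re-executes it


events = [3, 4, 5]

p1, b1 = Counter(), Counter()
replicate_by_state_transfer(p1, b1, events)

p2, b2 = Counter(), Counter()
replicate_by_state_machine(p2, b2, events)
```

Both backups end up identical to their primary. The second approach sends less data when the state is large, but it only works if execution is deterministic.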
 
Overview:
- VM-FT consists of two machines: Primary and Backup. Primary deals with all external events and replicates them to Backup through the “logging channel”;
- VM-FT emulates a local disk interface on top of two remote disk servers.
 
Log Entry
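External events alone do not determine everything. FT must handle the following possible sources of divergence:
- Differences in the instructions themselves: most instructions execute identically on both machines;
- Differences in signals from outside the machine: input from the external world such as network packets, DMA data, OS interrupts, etc.;
- Not all instructions are functions of state (pure functions that map one input to one output). Examples: reading the current time, or a random number generator (in a sense the same kind of thing as the former);
- Concurrency and multi-core: parallelism and multi-core races;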
 
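To make the states of Primary and Backup exactly the same, we must handle the cases listed above: information about them is passed along when the two machines communicate, so that both execute exactly the same instruction stream.
So each log entry sent from Primary to Backup should contain the following: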
- Instruction number, interrupt type, interrupt data;
- Example:
  - While executing instruction number 120120 since boot;
  - The program gets a network packet (interrupt type);
  - The packet carries a TCP handshake ACK (interrupt data);
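As a rough illustration, the three fields could be modelled as a small record. The field names and the string and bytes encodings below are assumptions made for this sketch, not the real VM-FT wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical shape of one entry on the logging channel."""

    instruction_number: int  # instructions executed since boot
    interrupt_type: str      # e.g. "network" or "timer" (assumed labels)
    interrupt_data: bytes    # payload; empty for events that carry no data


# The example from the list above, expressed as one entry.
entry = LogEntry(
    instruction_number=120120,
    interrupt_type="network",
    interrupt_data=b"TCP handshake ACK",
)
```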
 
 
Timer Interrupt
How does FT handle timer interrupts?
Goal: Make sure Primary and Backup see the interrupt at exactly the same point in the instruction stream;
Primary should do:
- FT fields the timer interrupt;
- FT reads instruction number X from the CPU;
- FT sends instruction number X on the logging channel to Backup;
- FT delivers the interrupt to Primary and resumes execution;
Backup should do:
- Backup ignores its own timer hardware;
- Backup sees instruction number X on the logging channel before it executes that instruction;
- FT tells the CPU to “interrupt me at instruction X”;
- FT mimics a timer interrupt to Backup;
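The two sides of the timer protocol can be sketched as a small simulation. Everything here is hypothetical: `ToyVM` counts "instructions" with a plain integer, the logging channel is a `deque`, and the backup is run after the primary only to keep the example short.

```python
from collections import deque


class ToyVM:
    """Hypothetical guest: each step() stands for one executed instruction."""

    def __init__(self):
        self.instruction_number = 0
        self.ticks_seen = []  # guest-visible effect of timer interrupts

    def step(self):
        self.instruction_number += 1

    def deliver_timer_interrupt(self):
        # The guest observes the interrupt at its current instruction.
        self.ticks_seen.append(self.instruction_number)


def run_primary(vm, total_instructions, timer_fires_at, channel):
    for _ in range(total_instructions):
        vm.step()
        if vm.instruction_number in timer_fires_at:
            x = vm.instruction_number     # FT reads X from the CPU
            channel.append(x)             # FT sends X on the logging channel
            vm.deliver_timer_interrupt()  # FT delivers the interrupt and resumes


def run_backup(vm, total_instructions, channel):
    # The backup has no timer of its own; it only follows the log.
    for _ in range(total_instructions):
        vm.step()
        # "Interrupt me at instruction X": compare with the next logged X.
        if channel and channel[0] == vm.instruction_number:
            channel.popleft()
            vm.deliver_timer_interrupt()  # FT mimics the timer interrupt


channel = deque()
primary, backup = ToyVM(), ToyVM()
run_primary(primary, 10, {3, 7}, channel)
run_backup(backup, 10, channel)
```

After both runs, `backup.ticks_seen` equals `primary.ticks_seen`: the backup saw each interrupt at the same instruction number as the primary.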
Network Interrupt
How does FT handle the arrival of a network packet?
Goal: Exactly the same as for the timer interrupt, except that the packet data must be delivered along with the interrupt.
Primary should do:
- FT tells the NIC (Network Interface Controller) to copy packet data into FT’s private “bounce buffer”;
- At some point, the NIC does DMA and then interrupts:
  - FT pauses Primary;
  - FT copies the “bounce buffer” into Primary’s memory;
  - FT simulates a NIC interrupt in Primary;
  - FT sends the “packet data” and “instruction number” to Backup through the logging channel;
 
Backup should do:
- Backup receives the packet data and instruction number X from the log stream;
- FT tells the CPU to interrupt at instruction X;
- FT copies the data into Backup’s memory and simulates a NIC interrupt in Backup;
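The packet path can be sketched in the same toy style. `GuestVM`, `primary_receive` and `backup_replay` are hypothetical names. Calling `primary_receive` stands in for the moment when FT has paused the guest, and the bounce buffer is just a `bytearray` owned by the caller.

```python
class GuestVM:
    """Hypothetical guest with a small packet area in its memory."""

    def __init__(self):
        self.instruction_number = 0
        self.memory = bytearray(16)
        self.observed = []  # (instruction number, bytes the guest saw)

    def step(self):
        self.instruction_number += 1

    def nic_interrupt(self, length):
        # The guest reads the packet from its own memory when interrupted.
        self.observed.append((self.instruction_number, bytes(self.memory[:length])))


def primary_receive(vm, bounce_buffer, log):
    # The guest is paused here, so the copy cannot race with guest instructions.
    data = bytes(bounce_buffer)
    vm.memory[:len(data)] = data               # copy bounce buffer into guest memory
    vm.nic_interrupt(len(data))                # simulate the NIC interrupt
    log.append((vm.instruction_number, data))  # send data + instruction number


def backup_replay(vm, log, total_instructions):
    pending = list(log)
    for _ in range(total_instructions):
        vm.step()
        if pending and pending[0][0] == vm.instruction_number:
            _, data = pending.pop(0)
            vm.memory[:len(data)] = data       # same bytes, same instruction
            vm.nic_interrupt(len(data))


log = []
primary = GuestVM()
for _ in range(5):
    primary.step()
primary_receive(primary, bytearray(b"ping"), log)  # packet arrives at instruction 5
for _ in range(5):
    primary.step()

backup = GuestVM()
backup_replay(backup, log, 10)
```

Both guests observe `b"ping"` at instruction 5, and their memories are identical afterwards.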
Bounce Buffer
What is the bounce buffer?
- The bounce buffer is a memory area private to FT that stores incoming network packet data;
 
Why use a bounce buffer?
- We want the data to appear in memory at exactly the same point in the execution of Primary and Backup;
More Rules
Output Rule
Suppose we encounter the following situation:
- Primary crashes just after sending the reply to the client;
- Backup does not receive the corresponding events from Primary, because Primary has crashed;
The output rule was introduced to deal with this:
- Primary should respond to the client only after receiving an acknowledgement from Backup;
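A minimal sketch of the output rule, assuming a synchronous acknowledgement modelled as a boolean return value. The `Primary` and `Backup` classes and the `reachable` flag are hypothetical and exist only for this illustration.

```python
class Backup:
    def __init__(self, reachable=True):
        self.reachable = reachable
        self.log = []

    def receive(self, entry):
        """Store a log entry and return True as the acknowledgement."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


class Primary:
    def __init__(self, backup):
        self.backup = backup
        self.sent_to_client = []

    def handle_request(self, request):
        reply = request.upper()  # stand-in for the real work
        # Output rule: do not release the reply until the backup has acknowledged.
        if not self.backup.receive(("request", request)):
            raise RuntimeError("no acknowledgement from backup; output withheld")
        self.sent_to_client.append(reply)
        return reply


backup = Backup()
primary = Primary(backup)
reply = primary.handle_request("get x")
```

If the primary crashes right after replying, the backup already holds the logged request, so it can reach the same state that the client has observed.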
Split Brain
Suppose we encounter the following situation:
- The network between Primary and Backup has been cut;
- So each machine thinks the other one is dead, decides that it is now the Primary, and stops sending log events;
This is a common problem called “split brain”. FT relies on a central server that supports an atomic test-and-set operation: only the machine that gets the flag can become Primary.
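The arbitration step can be sketched with a lock-protected flag. `ArbitrationServer` and `try_go_live` are hypothetical names, and a `threading.Lock` stands in for whatever makes the real operation atomic.

```python
import threading


class ArbitrationServer:
    """Hypothetical central server exposing one atomic test-and-set flag."""

    def __init__(self):
        self._flag = False
        self._lock = threading.Lock()

    def test_and_set(self):
        """Atomically set the flag and return its previous value."""
        with self._lock:
            old = self._flag
            self._flag = True
            return old


def try_go_live(server):
    # A previous value of False means this caller was first and may take over.
    return not server.test_and_set()


server = ArbitrationServer()
results = {}


def contender(name):
    results[name] = try_go_live(server)


threads = [threading.Thread(target=contender, args=(n,)) for n in ("primary", "backup")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread runs first wins. Exactly one entry in `results` is `True`, so the two machines can never both take over.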
Summary
When might FT be attractive?
- Critical but low-intensity services, e.g. a name server;
- Services whose software is not convenient to modify;
What about replication for high-throughput services?
- Recommendation: application-level replicated state machines;
  - Example: a database state machine. The database supports only a limited set of commands, which are easier to send between replicas;
- Result: less fine-grained synchronization, less overhead;
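An application-level replicated state machine can be sketched as a key-value store that replicates commands rather than machine-level events. The command tuples and the `KVDatabase` class are assumptions made for illustration.

```python
class KVDatabase:
    """Hypothetical database that accepts only a small, deterministic command set."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of commands applied so far

    def apply(self, command):
        op, key, *rest = command
        if op == "put":
            self.data[key] = rest[0]
        elif op == "delete":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown command: {op}")
        self.applied += 1


# Only whole commands cross the wire, not interrupts or instruction numbers.
command_log = [("put", "a", 1), ("put", "b", 2), ("delete", "a")]

primary, backup = KVDatabase(), KVDatabase()
for command in command_log:
    primary.apply(command)
    backup.apply(command)
```

The replicas synchronize once per command instead of once per interrupt, which is where the lower overhead comes from.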