Sunday, August 12, 2007

Availability & Consistency

At QCON London 2007, Werner Vogels gave a talk about Availability & Consistency and how the CAP theorem ruins it all. Werner defines scalability by as adding resources and getting a performance improvement proportional to the resources added. Also, if you add resources to improve redundancy, it must not hurt performance. A scalable service is resilient, becomes more cost effective when it grows, is capable of handling heterogeneity (as tech improves, architecture needs to improve), and is operationally efficient (# people necessary to run a node needs to go down as you scale up).

Next Werner explains the principles for scalable service design:

· Decentralize: avoid single point of failure. No centralized components.

· Asynchrony: Make progress under all circumstances even if some parts are not working. Work locally and not worry about the rest.

· Autonomy: Each node should be able to make decision purely based on local state.

· Controlled concurrency: Reduce concurrency as much as possible. Funnel things through single points, change data design to avoid fancy locking.

· Controlled parallelism: Control traffic going to each node so that there is capacity both in CPU and I/O left to do other tasks.

· Decompose into small well-understood building blocks. Same with teams. If it takes more than 10 people then the team is too big (2 pizza team). Knowledge gets shared automatically. If larger, need to have meetings.

· Symmetry: All nodes should do exactly the same thing.

· Failure tolerant:

· Local responsibility:

· Recovery built-in:

· Simplicity:

Werner states that any algorithm that requires agreement will eventually be your bottleneck. It is guaranteed to fail (example 2 phase commit). Most of these algorithm run on O(n).

Then Werner describes scalability through smart systems engineering:

· Us scalable primitives (RPC does not scale well).

· Cache near the edges

· Employ hierarchies and functional partitioning

· Use aggregation and data fusion techniques

· Don’t conceal heterogeneity

· Be strict in what you emit, liberal in what you accept

· Avoid strong consistency properties

· Use an asynchronous design; make progress under all circumstances

· Treat configuration management as part of the system

· Involve the application into repair and recovery

· Be smart about using the environment (restarting fast can be an option).

· Exploit eager – greedy – lazy techniques.

Werner next describes the CAP Theorem and compares ACID vs. Base. He mentions that transactions give a nice programming paradigm: Ask something to be done, it either succeeds or it fails. As a programmer, you have to deal with both cases. However, a lot of systems do not handle failure cases.

Classics distributed systems focus on ACID semantics: Atomic, Consistent, Isolated, Durable. Modern Internet Systems focuse on BASE: Basically Available, Soft-state(or scalable), Eventually consistent.

ACID offers strong consistency, transactions are the highest priority, and availability is less important. It is pessimistic, does rigorous analysis, complex mechanisms.

In BASE, availability and scaling is the highest priority. Base is optimistic, does best effort, and is simple and fast. Consistency is weak.

For example, if a customer wants to store something in a shopping cart and saving fails, under ACID, you will go back to customer and tell them sorry. Under BASE, what you want is for the system to accept the write statement and then figure out how to reconcile it later.

Strong consistency: (read your writes) All clients see the same view, even in the presence of updates.

High Availability: (can always write and always read)all clients can find some replica of the data, even in the presence of failures

Partition-tolerance: The system properties hold even when the system is partitioned.

The CAP Theorem says of these 3 things (C, A, or P), you can only get 2.

In a system where partitions can happen (some set of servers cannot see another set), the storage system can either provide high availability (sometimes under failure, I will always take your writes and I will give you the reads, but I cannot guarantee that you are seeing the latest updates) or strong consistency (sometimes under failure scenarios have to say no to customer).

Werner then mentions that it is important to realize that not all storage is equal. Sometimes high availability is most important. Sometimes, high consistency is the most available. If you are using same architecture for all, then you have to use the most restrictive option. Instead, optimize based on the needs of each situation.

Examples of consistency and availability include single-site and clustered databases. Typical features used are two phase commit and cache invalidation protocols.

Examples of partition-tolerance and availability include DNS, and web caches. Typical features used are TTLs and lease cache management, optimistic updating and conflict resolution.

Unfortunately Werner runs out of time and does not cover the last couple of slides.


· Expiration-based caching: AP

· Quorum/majority algorithms: PC

· Two-phase commit: AC

Data Architecture:

· Evolve the architecture towards a simple data architecture: Those parts that can be simple should be simple

· Data distribution should be explicit to the application

· Partitioning should become a primary tool to support scalability: partition often and preferably on the fly.

· Involve the application in management of data instances: load balancing and replication should be open and under control of the application.

· Failure of the instances should be considered a common occurrence. Single failures should be masked and should not cause an interruption in the service.

· Geographical distribution of data should be application specific.

This presentation is available on InfoQ at