Sunday, August 12, 2007

Availability & Consistency

At QCon London 2007, Werner Vogels gave a talk about availability and consistency and how the CAP theorem ruins it all. Werner defines scalability as adding resources and getting a performance improvement proportional to the resources added. Also, if you add resources to improve redundancy, it must not hurt performance. A scalable service is resilient, becomes more cost-effective as it grows, is capable of handling heterogeneity (as tech improves, the architecture needs to improve with it), and is operationally efficient (the number of people necessary to run a node needs to go down as you scale up).
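One way to make the "proportional to the resources added" definition concrete is to compare the throughput gain with the resource gain. The following is only a minimal sketch: the function name `scaling_efficiency` and the node and throughput figures are hypothetical and do not come from the talk.

```python
def scaling_efficiency(base_nodes, base_throughput, new_nodes, new_throughput):
    """Return throughput gain divided by resource gain.

    A value near 1.0 means performance grew in proportion to the
    resources added; a value well below 1.0 means the system scales poorly.
    """
    resource_ratio = new_nodes / base_nodes
    throughput_ratio = new_throughput / base_throughput
    return throughput_ratio / resource_ratio


if __name__ == "__main__":
    # Hypothetical measurements: doubling from 4 to 8 nodes raises
    # throughput from 1000 to 1900 requests per second.
    print(scaling_efficiency(4, 1000.0, 8, 1900.0))
```

With these made-up numbers the ratio comes out just under 1.0, which would count as close to proportional scaling under the definition above.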

Next, Werner explains the principles for scalable service design:

· Decentralize: Avoid single points of failure. No centralized components.

· Asynchrony: Make progress under all circumstances, even if some parts are not working. Work locally and do not worry about the rest.

· Autonomy: Each node should be able to make decisions based purely on local state.

· Controlled concurrency: Reduce concurrency as much as possible. Funnel things through single points and change the data design to avoid fancy locking.

· Controlled parallelism: Control the traffic going to each node so that there is both CPU and I/O capacity left to do other tasks.

· Decompose into small, well-understood building blocks. The same goes for teams: if it takes more than 10 people, the team is too big (the "two-pizza team" rule). In a small team, knowledge gets shared automatically; a larger one needs meetings.

· Symmetry: All nodes should do exactly the same thing.

· Failure tolerant:

· Local responsibility:

· Recovery built-in:

· Simplicity:
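The "controlled concurrency" principle above (funnel things through single points to avoid fancy locking) can be illustrated with a single-writer pattern. This is a minimal sketch under my own assumptions, not something shown in the talk: the class `SingleWriterCounter` and the `demo` function are hypothetical names, and a simple counter stands in for whatever state a real service would own.

```python
import queue
import threading


class SingleWriterCounter:
    """Hypothetical example: all updates go through one worker thread,
    so the counter itself needs no locks."""

    def __init__(self):
        self._updates = queue.Queue()  # thread-safe hand-off point
        self._value = 0                # modified only by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            delta = self._updates.get()
            if delta is None:          # sentinel: stop processing
                break
            self._value += delta

    def add(self, delta):
        """Called from any thread; only enqueues the update."""
        self._updates.put(delta)

    def close(self):
        """Stop the worker after pending updates and return the final value."""
        self._updates.put(None)
        self._worker.join()
        return self._value


def demo():
    counter = SingleWriterCounter()

    def produce():
        for _ in range(1000):
            counter.add(1)

    producers = [threading.Thread(target=produce) for _ in range(4)]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    return counter.close()


if __name__ == "__main__":
    print(demo())
```

Many threads submit updates, but only one thread ever touches the state, so the only synchronization is the queue hand-off. The trade-off is that the single worker bounds throughput for that piece of state.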