Introduction to the CAP Theorem
The CAP Theorem, conjectured by Eric Brewer in 2000 and formally proven by Seth Gilbert and Nancy Lynch in 2002, states that a distributed system cannot simultaneously provide all three of the following guarantees: Consistency, Availability, and Partition Tolerance. Understanding CAP is essential for designing resilient microservices.
1. Understanding CAP Components
Consistency (C)
Every read receives the most recent write or an error. All nodes see the same data at the same time.
```python
# Strong Consistency Example
class StronglyConsistentStore:
    def write(self, key, value):
        # Write to all replicas synchronously
        for replica in self.replicas:
            response = replica.write(key, value)
            if not response.success:
                self.rollback_all(key)
                raise ConsistencyError("Failed to write to all replicas")
        return True

    def read(self, key):
        # Read from all replicas and require agreement
        responses = [r.read(key) for r in self.replicas]
        if all(r == responses[0] for r in responses):
            return responses[0]
        raise ConsistencyError("Inconsistent data across replicas")
```

Availability (A)
Every request receives a response, without guarantee that it contains the most recent write.
```python
# High Availability Example
class HighlyAvailableStore:
    def write(self, key, value):
        # Write to any available replica
        for replica in self.replicas:
            try:
                return replica.write(key, value)
            except ReplicaUnavailable:
                continue
        raise AllReplicasDown()

    def read(self, key):
        # Read from any available replica
        for replica in self.replicas:
            try:
                return replica.read(key)
            except ReplicaUnavailable:
                continue
        raise AllReplicasDown()
```

Partition Tolerance (P)
The system continues to operate despite network partitions between nodes.
```python
# Partition-Tolerant Design
import time

class PartitionTolerantStore:
    def write(self, key, value):
        # Write to the local replica first
        self.local_replica.write(key, value)
        # Replicate asynchronously to remote replicas
        for replica in self.remote_replicas:
            self.replication_queue.enqueue({
                'replica': replica,
                'key': key,
                'value': value,
                'timestamp': time.time()
            })
        return True  # Return immediately

    def handle_partition_recovery(self):
        # Reconcile data when the partition heals
        for entry in self.pending_sync:
            self.resolve_conflicts(entry)
```

2. CAP Trade-offs in Practice
| System Type | Guarantees | Examples | Use Cases |
|---|---|---|---|
| CP Systems | Consistency + Partition Tolerance | MongoDB, HBase, Redis Cluster | Banking, Inventory |
| AP Systems | Availability + Partition Tolerance | Cassandra, DynamoDB, CouchDB | Social Media, Shopping Carts |
| CA Systems | Consistency + Availability | Traditional RDBMS (single node) | Non-distributed systems |
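The CP/AP split is not always a binary choice: many stores expose tunable quorum parameters, with N replicas, W acknowledgements per write, and R responses per read. Whenever R + W > N, every read quorum overlaps every write quorum, so reads observe the latest write. The following is a minimal single-process simulation of this idea; `QuorumStore` is a hypothetical class for illustration, not any particular database's API.

```python
class QuorumStore:
    """Simulates N replicas with tunable read/write quorums (R, W)."""
    def __init__(self, n=3, w=2, r=2):
        self.replicas = [{} for _ in range(n)]  # replica: key -> (version, value)
        self.w, self.r = w, r
        self.version = 0

    def write(self, key, value):
        # Acknowledge once W replicas accept the write (here: the first W)
        self.version += 1
        for replica in self.replicas[:self.w]:
            replica[key] = (self.version, value)

    def read(self, key):
        # Query R replicas (here: the last R) and return the newest version seen.
        # Because R + W > N, the read set always overlaps the last write set.
        answers = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        return max(answers)[1] if answers else None

store = QuorumStore(n=3, w=2, r=2)  # R + W = 4 > N = 3, so reads are consistent
store.write("balance", 100)
store.write("balance", 250)
print(store.read("balance"))  # -> 250
```

Lowering R and W toward 1 trades this consistency guarantee away for lower latency and higher availability, which is exactly the tunable dial that systems such as Cassandra and DynamoDB expose.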
3. Consistency Patterns
Eventual Consistency
```python
import time

class EventuallyConsistentService:
    def __init__(self):
        self.local_cache = {}
        self.event_bus = EventBus()

    async def update(self, entity_id, data):
        # Update local state immediately
        self.local_cache[entity_id] = data
        # Publish an event for other services
        await self.event_bus.publish(
            'entity.updated',
            {
                'entity_id': entity_id,
                'data': data,
                'timestamp': time.time(),
                'version': self.get_version(entity_id)
            }
        )
        return data

    async def handle_update_event(self, event):
        # Eventually receive updates from other services
        if self.should_apply(event):
            self.local_cache[event.entity_id] = event.data

    def should_apply(self, event):
        # Use vector clocks or timestamps for conflict resolution
        current = self.local_cache.get(event.entity_id)
        if current is None:
            return True
        return event.version > self.get_version(event.entity_id)
```

Saga Pattern for Distributed Transactions
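The `OrderSaga` below relies on a few helper types (`SagaStep`, `SagaResult`, `SagaStepFailed`) that are left implicit. A minimal sketch of what they might look like, so the example is self-contained:

```python
import asyncio  # used to drive the async saga steps
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable

class SagaResult(Enum):
    SUCCESS = "success"
    COMPENSATED = "compensated"

class SagaStepFailed(Exception):
    """Raised by a step to trigger compensation of the steps already run."""

@dataclass
class SagaStep:
    name: str
    action: Callable[..., Awaitable[None]]        # forward operation
    compensation: Callable[..., Awaitable[None]]  # undo operation

    async def execute(self, order):
        await self.action(order)

    async def compensate(self, order):
        await self.compensation(order)
```

Each step pairs a forward action with its compensating action, which is what lets the coordinator unwind partial work in reverse order when a later step fails.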
```python
class OrderSaga:
    """
    Orchestration-based Saga for order processing: a central
    coordinator runs each step and compensates on failure.
    """
    def __init__(self):
        self.steps = [
            SagaStep('reserve_inventory', self.reserve, self.release),
            SagaStep('process_payment', self.charge, self.refund),
            SagaStep('ship_order', self.ship, self.cancel_shipment)
        ]

    async def execute(self, order):
        executed = []
        try:
            for step in self.steps:
                await step.execute(order)
                executed.append(step)
            return SagaResult.SUCCESS
        except SagaStepFailed:
            # Compensate in reverse order
            for step in reversed(executed):
                await step.compensate(order)
            return SagaResult.COMPENSATED

    async def reserve(self, order):
        response = await self.inventory_service.reserve(
            order.items,
            order.id
        )
        if not response.success:
            raise SagaStepFailed("Inventory reservation failed")

    async def release(self, order):
        await self.inventory_service.release(order.id)
```

4. Building Resilient Microservices
Circuit Breaker Pattern
```python
from enum import Enum
import time

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.state = CircuitState.CLOSED
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = None

    async def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if self._should_attempt_recovery():
                self.state = CircuitState.HALF_OPEN
            else:
                raise CircuitOpenError()
        try:
            result = await func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _should_attempt_recovery(self):
        # Allow a trial call once the recovery timeout has elapsed
        return (time.time() - self.last_failure_time) >= self.recovery_timeout

    def _on_success(self):
        self.failures = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failures += 1
        self.last_failure_time = time.time()
        # A failed trial call in HALF_OPEN reopens the circuit immediately
        if (self.state == CircuitState.HALF_OPEN
                or self.failures >= self.failure_threshold):
            self.state = CircuitState.OPEN
```

5. Data Partitioning Strategies
```python
import bisect
import hashlib

class ShardingStrategy:
    """
    Consistent hashing for data distribution
    """
    def __init__(self, nodes, virtual_nodes=150):
        self.ring = {}
        for node in nodes:
            for i in range(virtual_nodes):
                key = self.hash(f"{node}:{i}")
                self.ring[key] = node
        self.sorted_keys = sorted(self.ring.keys())

    def get_node(self, key):
        if not self.ring:
            return None
        hash_key = self.hash(key)
        # Binary search for the first ring position >= hash_key,
        # wrapping around to the start of the ring if necessary
        idx = bisect.bisect_left(self.sorted_keys, hash_key)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

    def hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
```

Conclusion
The CAP Theorem is not about choosing two out of three, but about understanding trade-offs when partitions occur. Modern distributed systems often provide tunable consistency levels, allowing developers to make informed decisions based on their specific requirements.
Key Takeaways
- Network partitions are inevitable in distributed systems
- Choose CP for critical data, AP for high-traffic scenarios
- Eventual consistency is often acceptable with proper conflict resolution
- Use patterns like Saga, Circuit Breaker, and Consistent Hashing
- Design for failure from the beginning
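As a closing illustration of the consistent-hashing takeaway: the point of the ring is that growing the cluster remaps only a small fraction of keys. This standalone sketch (a simplified ring with illustrative helper names `build_ring` and `lookup`, mirroring the `ShardingStrategy` above) measures how many keys move when a fourth node joins:

```python
import bisect
import hashlib

def build_ring(nodes, virtual_nodes=150):
    # Map many virtual positions per node onto a hash ring
    ring = {}
    for node in nodes:
        for i in range(virtual_nodes):
            pos = int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16)
            ring[pos] = node
    return ring, sorted(ring)

def lookup(ring, sorted_keys, key):
    # First ring position at or after the key's hash, wrapping around
    pos = int(hashlib.md5(key.encode()).hexdigest(), 16)
    idx = bisect.bisect_left(sorted_keys, pos)
    return ring[sorted_keys[idx % len(sorted_keys)]]

data_keys = [f"user:{i}" for i in range(1000)]
before_ring, before_keys = build_ring(["node-a", "node-b", "node-c"])
after_ring, after_keys = build_ring(["node-a", "node-b", "node-c", "node-d"])

moved = sum(
    lookup(before_ring, before_keys, k) != lookup(after_ring, after_keys, k)
    for k in data_keys
)
# Roughly a quarter of the keys migrate to the new node; the rest stay put
print(f"{moved} of {len(data_keys)} keys moved")
```

With naive modulo hashing (`hash(key) % num_nodes`), by contrast, almost every key would move, which is precisely the failure mode consistent hashing was designed to avoid.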