Strictly speaking, this post is about Postgres-BDR and the CAP theorem. However, the ideas I discuss apply beyond Postgres-BDR.
Intro
I recently needed to evaluate replication options for PostgreSQL. One of the tools I evaluated is Postgres-BDR. Basically, Postgres-BDR is a Master-Master (multi-master) tool that replicates data across multiple PostgreSQL instances.
What’s the relationship between database replication and the CAP theorem?
- After reading through the Postgres-BDR documentation, I found this phrase: asynchronous replication is often called “eventually consistent”. That relates to one of the three properties in the CAP theorem: consistency.
- The CAP theorem applies to any networked shared-data system. A multi-node RDBMS is just one kind of such a system.
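To make “eventually consistent” concrete, here is a toy Python model of one primary replicating to one replica. It is a minimal sketch, not how Postgres-BDR works internally: the `Node` and `Primary` classes, the `flush()` step, and the latency units (`local_cost`, `replication_cost`) are all hypothetical. An asynchronous write is acknowledged before the replica applies it, so a read on the replica can be stale until the queued change is shipped. A synchronous write waits for the replica and pays for that in latency.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy database node holding a key-value table."""
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class Primary(Node):
    """Toy primary that replicates its writes to replicas.

    Hypothetical model: local_cost and replication_cost are made-up
    latency units, not measurements of Postgres-BDR.
    """
    replicas: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # changes not yet shipped
    local_cost: int = 1
    replication_cost: int = 10

    def write(self, key, value, synchronous):
        """Apply a write locally and return the simulated commit latency."""
        self.data[key] = value
        if synchronous:
            # Synchronous: wait until every replica has applied the change.
            for replica in self.replicas:
                replica.data[key] = value
            return self.local_cost + self.replication_cost
        # Asynchronous: acknowledge now, ship the change later.
        self.pending.append((key, value))
        return self.local_cost

    def flush(self):
        """Ship queued changes to the replicas (the 'eventually' part)."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replica = Node("replica")
primary = Primary("primary", replicas=[replica])

async_latency = primary.write("balance", 100, synchronous=False)
print(async_latency, replica.data.get("balance"))  # 1 None  (stale replica)
primary.flush()
print(replica.data.get("balance"))                 # 100     (converged)

sync_latency = primary.write("balance", 200, synchronous=True)
print(sync_latency, replica.data.get("balance"))   # 11 200  (no stale window)
```

The asynchronous write returns after 1 unit but leaves a window in which the replica is stale; the synchronous write costs 11 units and leaves no such window.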
After finding this relationship between Postgres-BDR and the CAP theorem, I experimented with Postgres-BDR on very limited resources to form my hypothesis.
What did I learn from Postgres-BDR and the CAP theorem?
- The CAP theorem is still a useful tool for dealing with problems in distributed storage systems, although it was debated in its early years. The key points are:
- Consistency (C) and high availability (A): you don't have to choose strictly one over the other.
- Consistency and high availability are better thought of as a spectrum than as all-or-nothing properties. Loosely speaking, you could pick 70% C and 30% A.
- Performance
- When you lean toward more availability, you can gain more throughput. The reason is that you can replicate asynchronously: you simply don't need to wait for replication to finish.
- That's one of the reasons why Postgres-BDR is really fast. (Another reason is that Postgres-BDR replicates from the WAL, the Write-Ahead Log.)
- In contrast, more consistency means more latency, because you have to replicate synchronously.
- DynamoDB supports both Eventually Consistent Reads and Strongly Consistent Reads. AWS also mentions:
- How can C, A, and P all be present at the same time?
- How does the system limit the ways clients access the distributed storage system?
- How do you resolve conflicts?
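The “70% C and 30% A” idea is a metaphor, but some systems expose a real dial for it. One well-known example is Dynamo-style quorum replication, sketched below in Python. This is a swapped-in technique for illustration, not something Postgres-BDR does. With N replicas, a write is acknowledged by W of them and a read consults R of them; if R + W > N, every read set overlaps every write set, so a read sees the latest acknowledged write. The `QuorumStore` class and its fixed choice of which replicas are written and read are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    """One replica's copy of a single value, tagged with a version."""
    version: int = 0
    value: object = None


class QuorumStore:
    """Toy Dynamo-style store with N replicas and tunable R/W quorums.

    Hypothetical sketch: to make staleness reproducible, writes always land
    on the first W replicas and reads always consult the last R replicas.
    """

    def __init__(self, n):
        self.replicas = [Versioned() for _ in range(n)]
        self.clock = 0  # monotonically increasing write version

    def write(self, value, w):
        """Store value on W replicas under a new version number."""
        self.clock += 1
        for replica in self.replicas[:w]:
            replica.version = self.clock
            replica.value = value

    def read(self, r):
        """Return the newest value among the R replicas consulted."""
        asked = self.replicas[len(self.replicas) - r:]
        return max(asked, key=lambda rep: rep.version).value

    @staticmethod
    def is_strongly_consistent(n, r, w):
        """Read and write sets must share at least one replica."""
        return r + w > n


store = QuorumStore(3)
store.write("v1", w=1)
print(store.read(r=1))  # None: R + W = 2 <= 3, the read missed the write
store.write("v2", w=2)
print(store.read(r=2))  # v2: R + W = 4 > 3, the read overlaps the write
```

Smaller R and W mean fewer replicas have to respond, which favors latency and availability; larger values favor consistency.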
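For the last question, one common answer in multi-master systems is last-writer-wins: when two nodes change the same row concurrently, keep the version with the later commit timestamp. The Python sketch below shows the idea under stated assumptions: `RowVersion`, its fields, and the tie-break on node name are hypothetical, and the exact conflict-handling rules Postgres-BDR applies are described in its own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowVersion:
    """One node's version of a row. Field names are hypothetical."""
    value: object
    commit_ts: float  # commit timestamp on the originating node
    node: str         # originating node name, used only to break ties


def last_writer_wins(local: RowVersion, remote: RowVersion) -> RowVersion:
    """Keep the version with the later commit timestamp.

    Ties are broken by comparing node names, so every node picks the
    same winner regardless of the order in which changes arrive.
    """
    return max(local, remote, key=lambda v: (v.commit_ts, v.node))


a = RowVersion("old@example.com", 10.0, "node_a")
b = RowVersion("new@example.com", 12.5, "node_b")
# Both nodes converge on the same row, whichever change they see first.
print(last_writer_wins(a, b) == last_writer_wins(b, a) == b)  # True
```

The deterministic tie-breaker matters: without it, two nodes could keep different winners and never converge. The cost of this strategy is that the losing update is silently discarded.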
References:
- BDR overview
- The Postgres-BDR documentation is really well written. I encourage anyone interested in databases, replication, or NoSQL to read it.
- CAP Twelve Years Later: How the “Rules” Have Changed, by Eric Brewer (ResearchGate)
- CAP and Architectural Consequences, by Martin Schoenert