Every now and then I learn about something so surprising, counter-intuitive, or intriguing, that I just have to see it for myself. The 2004 SIGMOD paper by Alan Fekete, Elizabeth O’Neil, and Patrick O’Neil, “A read-only transaction anomaly under snapshot isolation,” ticked all of those boxes for me.
In what follows, I’ll dissect this research finding, describe my efforts to reproduce it in popular databases, and reflect on how it informs practical decisions.
Snapshot isolation is perhaps the most successful approach to providing concurrency with consistency guarantees in relational databases. The idea is simple: once a transaction starts it sees a view reflecting all updates committed to the database before it began, but none of the effects of other in-progress transactions, or of other transactions that commit later on in the course of its execution. Snapshot isolation is the dominant implementation of Multiversion concurrency control (MVCC), and most modern high-performance rely upon some form of it (e.g., Oracle, DB2, PostgreSQL, Microsoft SQL Server). It can eliminate much of the lock contention that otherwise plagues databases, answering reads instantly with non-blocking execution and providing consistent views of the database to long-running read-only transactions, even as updates and queries of new data continue to come in.
How surprising to learn, then, that read-only queries under snapshot isolation can sometimes return irreconcilable results? How surprising to realize that such read-only anomalies can even occur when the database exhibits serializable update behavior, meaning that concurrent transaction execution always results in a final state consistent with some sequential ordering of transactions?
The example provided by Fekete, et al. involves a bank at which the customer has both a checking account and a savings account. If a withdrawal leads to a negative combined balance, then the bank assesses an overdraft fee.
In the example, both checking and savings accounts start with balance 0, and the following concurrent transactions ensue:
- Txn 1: Add 20 to savings.
- Txn 2: Subtract 10 from checking. If doing so causes (checking + savings) to become negative then additionally subtract 1 an overdraft fee.
- Txn 3: Read balances (checking, savings).
If the deposit (Txn 1) comes ahead of the withdrawal (Txn 2) then the final combined balance is 10, whereas if the database processes these transactions in the opposite order then the final combined balance is 9.
Surprisingly, it turns out that snapshot isolation allows Txn 3 to read (checking, savings) as (0, 20), even when the final combined balance ends up as 9. Even though the database ends up in a consistent state (corresponding to withdrawal before deposit), the output it produced in Txn 3 is anomalous, it corresponds to no point-in-time snapshot of a transaction sequence that could have produced the final state!
Fekete et al. put this in context, saying the following:
Starting with [BBGMOO95], it was assumed that read-only transactions always execute serializably, without ever needing to wait or abort because of concurrent update transactions. This seemed self-evident because all reads take place at an instant of time, when all committed transactions have completed their writes and no writes of non-committed transactions are visible.
The implication is that even read-only transactions might need to choose between yielding consistent results and yielding timely results.
By the time this read-only anomaly was revealed in 2004, Oracle’s annual revenue had grown beyond $10 billion and IDC estimated the database software market at $15 billion. It’s still striking today realizing that 1) nobody recognized earlier that such foundational assumptions were wrong, and 2) that this didn’t seem to matter.
How the anomaly occurs
The table below shows sequence of reads, writes, and commits illustrating anomalous behavior under snapshot isolation.
|Txn 1||Txn 2||Txn 3|
|R(checking) → 0|
|R(savings) → 0|
|R(savings) → 0|
|W(savings) ← 20|
|R(checking) → 0|
|R(savings) → 20|
|W(checking) ← -11|
Since Txn 3 starts to read after Txn 1 commits, it sees the deposit to savings. However, because Txn 2 started before Txn 1 committed, it does not see the deposit, and so assesses the overdraft fee to checking.
Txn 1 is concurrent with Tnx 2, and the final result is consistent with the serial ordering Txn 2 before Txn 1. However, the output of Txn 3 (concurrent with Txn 2 but not Txn 1) is consistent with the opposite serial ordering, Txn 1 before Txn 2. At the end of the day, the output of Txn 3 cannot be reconciled with any serial order producing the final state.
I was so surprised to learn of this anomaly that I had to see it myself. Would databases popular in 2015 exhibit this behavior? If so, how frequently would it occur? A simple concurrency stress test provides the answers.
Anomalies per 10,000 tests
With the transaction isolation level configured as READ COMMITTED we expect to see some anomalous reads, and that’s ok as we have not requested that the database operates in a way consistent with a global total order. In practice they occur between 1% and 2% of the time for Oracle and PostgreSQL, and a bit less for DB2.
Oracle’s behavior under SERIALIZABLE transaction isolation level is roughly the same as under READ COMMITTED isolation, suggesting that this setting does not address the causes of read anomalies (I do see other differences in behavior, with some queries rejected under SERIALIZABLE isolation). PostgreSQL does much better, producing read-only anomalies at a rate less than one per million, and even then only when the test program retries transactions that initially error out due to contention. DB2 seems immune to anomalies when configured for serializability.
MySQL, as configured here, uses table-level locks and does not provide support for MVCC or snapshot isolation. This simple approach proves to be reliably free of anomalies
I collected these results on Amazon EC2 using two c4.2xlarge machines, one to host the database servers and another to run the client stress test program. The data above are for 50 database connections with autocommit disabled (interestingly, enabling autocommit can reduce read-only anomalies, but introduces additional anomalies on Oracle, including incorrect updates). I welcome you to review the code, and perhaps to contribute additional examples or support for other databases.
This study helped me understand a few things about concurrency:
- Concurrency really means that things are happening at the same time. So long as things are happening at the same time, we don’t know what comes first, and to provide consistency we must first resolve transaction order.
- Reads can behave like writes. More specifically, consistent reads demand that the database fix its state (just as measurement demands that a physical system fix its quantum state).
There are also some practical implications:
- Simple can be best. Even though Oracle, DB2, and PostgreSQL provide fancy concurrent consistency capabilities, MySQL provides consistency reliably.
- Snapshot isolation, MVCC, and even SERIALIZABLE transactions provide protection against anomalies that is still limited, and behavior usually varies between databases.
- When correctness and consistency matter, you really need to verify your application by testing it under concurrency stress.
Since 2004 some solid research has gone into eliminating anomalies from databases operating with snapshot isolation, not only the read-only anomalies described here but others too. Some approaches modify application SQL, others modify the database’s consistency guarantees (see, e.g., the Ph.D. thesis by Michael Cahill). Intuitive consistency models make life easier for programmers, and while the tests described here show that many databases popular today don’t yet incorporate fixes for snapshot isolation anomalies, it’s reassuring to know that solutions exist.
References and further reading
- A. Fekete, E. O’Neil, and P. O’Neil. A read-only transaction anomaly under snapshot isolation. SIGMOD, 2004.
- H. Berenson, P.Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil. A critique of ANSI SQL isolation levels. SIGMOD, 1995.
- M. J. Cahill, Serializable Isolation for Snapshot Databases. Ph.D. thesis, University of Sydney, 2009.
- T. Kyte. On Transaction Isolation Levels. Oracle Magazine, November/December 2005.
- Faleiro, J. M, and Abadi, D., J. Rethinking serializable multiversion concurrency control. VLDB, 2015.