This blog post is based on a talk that I have given in the Ruby on Rails User Group.
Who knows Transactions?
Who knows really well what transactions are?
What is a Transaction?
Just common sense what transactions mean:
- We open an transaction (implicitely or explicitely)
- We perform a certain number of operations (for example SQL-statements)
- We decide in the end to either „roll-back“ or „commit“
- Roll-back restores the state before the transaction
- Commit accepts the changes for good
- So, that’s it, we are done…
- Just scrap these stupid confusing advanced stuff, do it simple…
- In the simple world, only one DB-user at a time exists
- Then we are really almost done..
- In a complicated world we have parallel activities taking place:
- Multiple threads, processes or external database clients simultanously using the database
Complications of Complication
- What happens now?
- When things get parallel, everybody is good in doing that, but it usually does not work.
- So DB-developers are really good, because databases do actually sometimes work quite well, even under load…
- What happens with incomplete transactions?
- What happens with complete transactions during one complicated read query?
- Other clients should see an transaction either completely or not at all.
- Is that enough?
- Think of an auditorium for a boring lecture: 90 min, people continuously leaving the room (so boring), but at the end it is still quite full.
- How many people are in the room?
- Counting is hard, because more people leave during the count.
Snapshot too old
- Oracle has solved that problem of counting the students and similar ones…
- The SELECT-statement gets for its duration a consistent data content, based on what was in the DB when the statement started.
- Transactions that have been committed successfully after starting the SELECT are not taken into account
- This is called snapshot. Long SELECTs that work on data that is actively changed sometimes fail with „Snapshot too old“
Now the hardcore stuff
- Read phenomena
- Transaction Isolation Levels
Read Phenomena: Dirty Read
- „Dirty read“ refers to uncommitted data being available for reading by other transactions. Sounds cool, right? Intuitive?
- Interesting question: can you read your own uncommitted data within the running transaction?
Read Phenomena: Non-repeatable Read
- „Non-repeatable read“ refers to a reading operations on the same piece of data that happen during the same transaction (or SELECT statement in the Oracle world) and give different results.
- With Oracle this should not happen, we get snapshot too old instead.
Read Phenomena: Phantom reads
- Data records are the same
- But the set of records may change
Isolation Level: Read Uncommited
- Allows „dirty Read“
- I do not know of any case where this is done
- You can configure it and may get „better“ than that
Isolation Level: Read Committed
- Read data only from completed transaction
- Non-repeatable read is possible
- Default for many databases, like PostgreSQL
Isolation Level: Repeatable Reads
- During one transaction data does not change
- Phantom reads are still possible
- Default for many databases, for example Oracle
Isolation Level: Serializable
- Pretend all transactions are queued and only one at a time is happening
- No phantom reads possible
- They have some default
- But may always be „better“ than that
- So mixtures are possible
- For example Oracle has „read-only“ as isolation level…
Locking vs. Multiversion
- Two mechanisms:
- Locking of parts of the data
- Multiversion: multiple copies (maybe copy on write?) around, for each transaction… Have fun when merging. 😉
Different DB products
- Different DB products use the freedom to interpret the standard quite liberally
- Oracle is the standard, they only pretend to care about SQL-standard…
- Make a wise decision which DB to support with your software in the beginning
- Remember: some commercial DB vendors might not have any customers, they just have hostages
ACIDs in chemistry are substances with a pH-value < 7.
Here acronym for