Start with this first: https://www.youtube.com/watch?v=bB7YAWPiJR4
Notes from videos by https://www.youtube.com/@jordanhasnolife5163
Table of Contents
Indexes
How to look up data in a database
- Hash index - Hash all keys. Fast reads and writes. Bad at range queries
- BTree - Balanced binary tree. Good for reads, range queries. Slow writes.
- LSM + SStable - Some keys in memory in LSM tree, additional in sorted immutable table (SS table) on disk. Slow reads and writes (log n time to search in SStable) but faster than BTree. Add sparse index i.e. store some keys in memory with their location in SStable to optimize reads.
- Multiple Indexes: Instead of storing data directly in the index, store a location of the data. This way for multiple indexes, the data is not duplicated.
ACID
A transaction is a collection of operations (reads and writes) that are treated as a single logical operation. Some databases like MySQL guarantee that all transactions are ACID:
- Atomicity - all or nothing. i.e. all operations in a transaction succeed or none does
- Consistency - referential integrity i.e. a row cannot be removed if it is linked to another table via a foreign key. Eg. Users table and admin table with user_id being the foreign key in admin table. If a user is removed from user table, now the row with the user_id for that user in the admin table becomes “orphan”. Consistency prevents this from happening.
- Isolation - Treat every transaction as if it’s the only one running i.e. avoid race conditions. eg. if 2 threads try to overwrite one value then lock until one is written. Dealt with in detail in next section
- Durability - data persists on disk
Isolation
Databases are multithreaded. So multiple operations are happening concurrently and operations are run in no fixed order.