Monday, 13 October 2025

Understanding ACID Properties in Database Systems

In database systems, maintaining the correctness and reliability of data during transactions is essential. The ACID properties (Atomicity, Consistency, Isolation, and Durability) define a set of standards that ensure transactions are processed accurately and reliably. This blog post explains each of these properties with clear definitions and examples to help you understand how they contribute to data integrity and system reliability in transactional databases.


1. Atomicity

Atomicity refers to the "all-or-nothing" nature of a database transaction. In simple terms, this means that a transaction must be fully completed or not executed at all. If any part of the transaction fails, the entire operation is rolled back, and the database remains unchanged.


This property ensures that there is no partial execution of a transaction. It prevents scenarios where only some operations in a transaction are applied, leaving the database in an inconsistent or incomplete state.


1.1 Why Is Atomicity Important?

In real-world applications, a transaction often involves multiple steps or operations that are logically connected. These steps must be treated as a single unit of work. If one step fails, none of the steps should take effect. This guarantees that the system's state remains valid and predictable.


Consider a simple fund transfer between two bank accounts A and B:


·      Debit 5,000 from Account A.

·      Credit 5,000 to Account B.


These two operations must occur together. If the system debits Account A but fails to credit Account B (due to a crash, network issue, or any other error), the customer would lose money and the bank's records would be incorrect. With atomicity, such a transaction is rolled back entirely unless both operations complete successfully.
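In SQL, such a transfer is wrapped in an explicit transaction so that both updates succeed or fail together. A minimal sketch (the accounts table and its balance and acct_id columns are assumptions for illustration, not from any specific system):

    BEGIN;  -- start the atomic unit of work

    UPDATE accounts SET balance = balance - 5000 WHERE acct_id = 'A';  -- debit A
    UPDATE accounts SET balance = balance + 5000 WHERE acct_id = 'B';  -- credit B

    COMMIT;  -- make both changes permanent together

    -- If any statement fails before COMMIT, issue ROLLBACK (or let the
    -- database abort the transaction) and neither balance is changed.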


1.2 How Is Atomicity Achieved in Database Systems?

Database systems use three core mechanisms to achieve atomicity:


·      Transaction Logs

·      Rollback

·      Commit


Let's understand this with a successful money transfer from Account A to Account B.


Let’s say a customer initiates a transaction to transfer 5,000 from Account A to Account B using a banking app. This transfer involves two critical updates.


·      Debit 5,000 from Account A

·      Credit 5,000 to Account B


Step 1: Transaction Begins

The database system starts a new transaction and assigns it a unique transaction ID (e.g., TX1234). This marks the beginning of the atomic unit of work. Nothing is changed in the actual database yet.


Step 2: Validate the Operations

The database performs preliminary validations to avoid logging unnecessary or invalid transactions.


It checks:

·      If Account A has sufficient balance.

·      If the accounts involved are valid and active.

·      If the debit and credit operations are logically sound.

·      If the transaction respects all integrity constraints.
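For illustration, the sufficient-balance check above could be a simple query run before the transfer is logged. The table, columns, and status value here are assumed, not taken from any particular engine:

    -- Returns true only if Account A is active and can cover the debit
    SELECT balance >= 5000 AS sufficient_funds
    FROM accounts
    WHERE acct_id = 'A' AND status = 'ACTIVE';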


Step 3: Write-Ahead Logging (Transaction Log)

Before any change is written to the database, the system writes detailed information about the transaction to a transaction log.


This includes:

·      Transaction ID (TX1234)

·      Operation types (DEBIT, CREDIT)

·      Before and after values

·      Current status (PENDING)


This is known as Write-Ahead Logging (WAL): changes must be logged before they are applied, which allows the system to recover if a failure occurs. For high reliability, some databases also replicate this log to another disk or a standby server, protecting against disk crashes and further ensuring durability.
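Conceptually, the log entries for this transfer might look like the sketch below. Real engines store WAL in a compact binary format, and the balance values here are purely illustrative:

    -- TX1234 | BEGIN
    -- TX1234 | DEBIT  | accounts, row A | before: 20,000 | after: 15,000 | PENDING
    -- TX1234 | CREDIT | accounts, row B | before:  7,000 | after: 12,000 | PENDING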


Step 4: Execute the Operations

After logging, the debit and credit operations are carried out in memory; the data files on disk are not updated yet.


Step 5: Commit

If all operations succeed:


·      The changes are written from memory to the actual database (on disk).

·      A COMMIT record is added to the transaction log. The transaction is marked as successfully completed.


At this point, the updates are permanent and visible to other users. The atomicity principle is honored: both the debit and the credit happened together as a single unit.
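Putting the five steps together: in PostgreSQL, for example, the whole transfer can be packaged as a procedure so that a failure at any step automatically undoes both updates. The procedure and table names here are hypothetical:

    -- A PL/pgSQL sketch; any error raised inside aborts the caller's
    -- transaction, so the debit and credit live or die together.
    CREATE OR REPLACE PROCEDURE transfer(src text, dst text, amt numeric)
    LANGUAGE plpgsql AS $$
    BEGIN
        UPDATE accounts SET balance = balance - amt WHERE acct_id = src;
        UPDATE accounts SET balance = balance + amt WHERE acct_id = dst;
    END;
    $$;

    CALL transfer('A', 'B', 5000);  -- commits both updates, or neither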


2. Consistency

Consistency means that a database must always transition from one valid state to another valid state after any transaction completes. The data should never end up in a state that violates the defined rules, constraints, or logical relationships of the database.


2.1 What Does Consistency Really Mean?

Imagine the database as a system that follows strict rules about how pieces of data relate to one another. These rules include:


·      Data integrity constraints like primary keys, foreign keys, and unique constraints.

·      Business rules such as "an order must always belong to an existing customer."

·      Validation rules ensuring data formats, ranges, or mandatory fields are correct.


A consistent database means every transaction preserves all these rules.


Example: Deleting a Customer and Their Orders

Suppose you have two tables:


·      Customers: stores customer information.

·      Orders: stores all orders placed by customers, linked by a foreign key to the Customers table.


If a transaction deletes a customer record, consistency requires that:

·      All their orders must also be deleted (cascading delete), or

·      The system must prevent the deletion if orders exist (restrict delete).
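Both behaviours can be declared on the foreign key itself. A sketch (the column names are assumed):

    CREATE TABLE customers (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    CREATE TABLE orders (
        order_id    INT PRIMARY KEY,
        customer_id INT NOT NULL
            REFERENCES customers (customer_id) ON DELETE CASCADE
    );

    -- With ON DELETE CASCADE, deleting a customer deletes their orders.
    -- Declaring ON DELETE RESTRICT instead makes the DELETE fail while
    -- any matching orders still exist.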


2.2 Why Is This Important?

If the system allowed deleting a customer but left behind orders that reference that now-nonexistent customer, the database would enter an inconsistent state:


·      Orders would have foreign keys pointing to a missing customer.

·      Applications reading these orders could crash or behave unpredictably.


2.3 How Is Consistency Enforced?

Database systems use several mechanisms to ensure consistency:


·      Constraints: Foreign keys, unique keys, check constraints.

·      Triggers: Automatically execute rules or actions to maintain consistency.

·      Transaction rules: The entire transaction must commit or roll back as a whole, so partial updates can’t leave data inconsistent.
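For example, a check constraint can encode the rule that balances never go negative; any transaction that would violate it is rejected and rolled back. This reuses the hypothetical accounts table from earlier:

    ALTER TABLE accounts
        ADD CONSTRAINT balance_non_negative CHECK (balance >= 0);

    -- This UPDATE now fails, and its transaction rolls back, if it
    -- would overdraw the account:
    UPDATE accounts SET balance = balance - 5000 WHERE acct_id = 'A';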


In summary, consistency means that after every transaction the database stays correct and follows all of its rules. It makes sure the database never shows wrong or contradictory information to people or programs, so the data can be trusted.


3. Isolation

Isolation means that when two or more transactions happen at the same time, they don’t get mixed up with each other. Each transaction works by itself, without interfering with others.


For example, if two transactions try to change the same piece of data at the same time, one of them will wait until the other finishes. This way, the changes don’t cause mistakes.


Isolation keeps the data accurate, even when many users are using the database at once.


3.1 Why Is Isolation Important?

When many people or programs use a database at the same time, they often try to read or change the same data simultaneously. Without isolation:


·      Changes made by one transaction could mix up or conflict with changes from another.

·      This can cause errors, like incorrect balances, lost updates, or corrupted data.


For example, if two people try to withdraw money from the same bank account at the same time without isolation, both might read the old balance and together withdraw more than the account holds.


Isolation keeps each transaction separate and protected, so they don’t interfere with each other. This helps the database stay accurate and reliable even when many users work at once.


3.2 How Is Isolation Achieved?

Databases use techniques such as locking, transaction scheduling, and isolation levels to keep transactions isolated.


3.2.1 Locking

When a transaction wants to read or write data, the database locks that data so others can’t change it at the same time.


For example, if Transaction A locks a customer’s account to update the balance, Transaction B must wait until Transaction A finishes.


Locks can be:

·      Shared locks for reading (many can read at once).

·      Exclusive locks for writing (only one can write at a time).
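In SQL, an exclusive row lock is commonly taken with SELECT ... FOR UPDATE, supported by PostgreSQL, MySQL (InnoDB), and others. A sketch, again using the assumed accounts table:

    -- Transaction A
    BEGIN;
    SELECT balance FROM accounts WHERE acct_id = 'A' FOR UPDATE;  -- exclusive lock

    -- A second transaction running the same statement now blocks
    -- until Transaction A commits or rolls back.

    UPDATE accounts SET balance = balance - 5000 WHERE acct_id = 'A';
    COMMIT;  -- releases the lock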


3.2.2 Transaction Scheduling

The database controls the order in which transactions run, making sure they don’t overlap in ways that cause conflicts. It may delay some transactions so that each one appears to run alone, even if they actually run at the same time.


3.2.3 Isolation Levels

Isolation levels decide how strictly transactions are kept separate from each other when they run at the same time. The choice of level affects both the accuracy of data and the speed of the system.


Higher isolation levels keep transactions very separate and avoid almost all conflicts, but they can slow things down because transactions might have to wait longer.


Lower isolation levels let transactions overlap more, which can speed things up, but may sometimes allow incorrect or confusing data to appear temporarily.


The following are some common isolation levels:

·      Read Uncommitted

·      Read Committed

·      Repeatable Read

·      Serializable
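Each of these levels is described below. In SQL, the level is usually chosen per transaction; a minimal sketch (the statement is standard SQL, though its exact placement and the supported levels vary by database):

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- ... reads and writes in this transaction now run at that level ...
    COMMIT;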


Read Uncommitted

Transactions can see uncommitted changes made by other transactions ("dirty reads"), and read data is not locked. This level offers the highest performance but the lowest consistency.


It is rarely used in practice, except for approximate analytics where absolute accuracy isn't critical.


Read Committed

Transactions can only see committed changes (no dirty reads). The database internally uses row-level locks for writes, and each read operation sees only data committed before that specific read. The following anomalies are still possible:


·      Non-repeatable reads (a row read twice in the same transaction may differ if another transaction committed changes in between). For example, you read an account balance twice during a transaction and see different amounts because another transaction updated it between your reads.


·      Phantom reads (new rows may appear in subsequent reads)
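A two-session sketch of a non-repeatable read under Read Committed (the session labels and balance values are illustrative):

    -- Session 1
    BEGIN;
    SELECT balance FROM accounts WHERE acct_id = 'A';  -- returns 15,000

    -- Session 2 commits an update in between:
    --   UPDATE accounts SET balance = 10000 WHERE acct_id = 'A'; COMMIT;

    SELECT balance FROM accounts WHERE acct_id = 'A';  -- now returns 10,000
    COMMIT;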


Repeatable Read

The Repeatable Read isolation level ensures that if a transaction reads a row once, it will see the same data if it reads that row again later during the same transaction. It uses locks to prevent other transactions from modifying data that has been read.


It still allows phantom reads: if a transaction runs a query like SELECT * FROM orders WHERE amount > 100, and another transaction inserts a new matching row before the first one commits, the new row can appear when the query is repeated.
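The same scenario as a two-session sketch. Whether the phantom actually appears depends on the engine: the SQL standard permits phantoms at this level, but snapshot-based implementations such as PostgreSQL prevent them even at Repeatable Read:

    -- Session 1 (REPEATABLE READ)
    BEGIN;
    SELECT * FROM orders WHERE amount > 100;  -- returns, say, 2 rows

    -- Session 2 inserts and commits a matching row:
    --   INSERT INTO orders (order_id, amount) VALUES (42, 250); COMMIT;

    SELECT * FROM orders WHERE amount > 100;  -- may now return 3 rows
    COMMIT;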


Serializable (Highest Isolation)

Transactions run so strictly that it looks as if they execute one after another rather than at the same time. This prevents all types of conflicts, including dirty reads, non-repeatable reads, and phantom reads. It is the safest but slowest level, because transactions often wait for others to finish.


It is used when absolute accuracy is required, like in banking systems.


4. Durability

Durability means that once a transaction is successfully completed (committed), its changes are permanent: they will not be lost even if the system crashes right after.

·      The data is safely stored on disk.

·      It will still be there if the server restarts, power goes out, or the database crashes.


The following core mechanisms help ensure durability.


Write-Ahead Logging (WAL): Before applying any changes to the actual database, the database writes a log record describing the changes. In highly available setups, this log is also replicated across multiple machines to strengthen durability.


Checkpointing: A checkpoint is a snapshot operation that synchronizes in-memory data with disk storage.


In distributed systems, durability is often achieved with a quorum-based approach built on replication: a write is considered successful only after multiple nodes acknowledge it.
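As one concrete illustration, PostgreSQL exposes all three mechanisms through configuration and commands. This is a PostgreSQL-specific sketch, and the standby names are hypothetical:

    -- Wait for the WAL record to be flushed at COMMIT
    ALTER SYSTEM SET synchronous_commit = 'on';

    -- Quorum-style replication: a commit succeeds only after at least
    -- one of the two standbys confirms it has received the WAL
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby1, standby2)';

    SELECT pg_reload_conf();  -- apply the configuration changes

    CHECKPOINT;  -- force a checkpoint: flush dirty in-memory pages to disk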

