Multiversion Concurrency Control in Databases

This blog explains the concept of multiversion concurrency control in databases, and compares two popular algorithms: multiversion timestamp ordering and multiversion two-phase locking.

Table of Contents

1. Introduction

Concurrency control is a crucial aspect of database systems, as it ensures the consistency and correctness of data in the presence of multiple concurrent transactions. However, concurrency control also introduces challenges, such as locking, blocking, and deadlocks, that can affect the performance and availability of the system.

One way to overcome these challenges is to use multiversion concurrency control, a technique that allows transactions to access different versions of data items without locking or blocking each other. This way, transactions can execute concurrently without interfering with each other, and the system can achieve higher throughput and scalability.

In this blog, you will learn about the concept of multiversion concurrency control and how it works. You will also learn about two popular algorithms that implement multiversion concurrency control: multiversion timestamp ordering and multiversion two-phase locking. You will see the advantages and disadvantages of each algorithm and how they compare with each other.

By the end of this blog, you will have a solid understanding of multiversion concurrency control and how to use it in your database applications.

2. What is Multiversion Concurrency Control?

Multiversion concurrency control (MCC) is a technique that allows multiple transactions to access the same data item without locking or blocking each other. MCC achieves this by maintaining multiple versions of each data item, each with a unique timestamp. Transactions can read the most recent version of a data item that is compatible with their timestamp, and write new versions of a data item with their own timestamp.

MCC has several benefits over traditional locking-based concurrency control methods, such as:

It reduces the number of conflicts and aborts, as transactions can read older versions of data items without waiting for locks to be released.
It improves the performance and scalability of the system, as transactions can execute concurrently without blocking each other.
It preserves the serializability and consistency of the database, as transactions are ordered according to their timestamps and conflicts are detected and resolved.

However, MCC also has some drawbacks, such as:

It requires more storage space and overhead, as multiple versions of data items have to be stored and managed.
It introduces the problem of garbage collection, as obsolete versions of data items have to be removed periodically to free up space and improve efficiency.
It depends on the accuracy and synchronization of the timestamps, as incorrect or inconsistent timestamps can lead to incorrect results or violations of serializability.

In the next sections, you will learn about two popular algorithms that implement MCC: multiversion timestamp ordering and multiversion two-phase locking. You will see how they work, what are their advantages and disadvantages, and how they compare with each other.

3. How Multiversion Concurrency Control Works

Multiversion concurrency control (MCC) works by creating and maintaining multiple versions of each data item in the database. Each version has a unique timestamp that indicates when it was created or modified by a transaction. Transactions can access different versions of data items depending on their timestamps and the concurrency control algorithm used.

To implement MCC, the system needs to perform three main tasks:

Versioning data items: The system needs to create new versions of data items when transactions write to them, and store them in a suitable data structure, such as a version list or a version tree.
Assigning timestamps: The system needs to assign timestamps to transactions and data items, and ensure that they are accurate and synchronized. Timestamps can be either logical or physical, depending on the algorithm used.
Checking conflicts: The system needs to check for conflicts between transactions and data items, and resolve them according to the algorithm used. Conflicts can occur when transactions try to read or write data items that have incompatible timestamps.

In the following sections, you will learn about two algorithms that use MCC to achieve serializability and consistency: multiversion timestamp ordering and multiversion two-phase locking. You will see how they perform the three tasks mentioned above, and what are their advantages and disadvantages.

3.1. Versioning Data Items

The first task of multiversion concurrency control (MCC) is to create and maintain multiple versions of each data item in the database. A data item is any unit of data that can be accessed or modified by a transaction, such as a record, a field, or a page. A version of a data item is a copy of the data item with a unique timestamp that indicates when it was created or modified by a transaction.

There are two main ways to create new versions of data items when transactions write to them:

Copy-on-write: The system copies the old version of the data item to a new location and updates the original location with the new version. This way, the old version is preserved and can be accessed by other transactions.
Append-only: The system appends the new version of the data item to the end of a file or a table and keeps the old version in its original location. This way, both versions are available and can be accessed by other transactions.

There are also different ways to store and manage the versions of data items, such as:

Version list: The system maintains a linked list of versions for each data item, ordered by their timestamps. The head of the list is the most recent version, and the tail of the list is the oldest version. The system can traverse the list to find the appropriate version for a transaction.
Version tree: The system maintains a tree of versions for each data item, where each node represents a version and each edge represents a write operation. The root of the tree is the initial version, and the leaves of the tree are the most recent versions. The system can search the tree to find the appropriate version for a transaction.

In the next section, you will learn how the system assigns timestamps to transactions and data items, and how they are used to order and synchronize the transactions.

3.2. Assigning Timestamps

The second task of multiversion concurrency control (MCC) is to assign timestamps to transactions and data items, and use them to order and synchronize the transactions. A timestamp is a unique identifier that indicates the logical or physical time when a transaction or a data item was created or modified. Timestamps are used to determine the precedence and compatibility of transactions and data items, and to ensure the serializability and consistency of the database.

There are two main types of timestamps that can be used in MCC:

Logical timestamps: Logical timestamps are abstract values that represent the logical order of transactions and data items, without referring to the actual clock time. Logical timestamps can be generated by using counters, vectors, or other data structures that can capture the causal and concurrent relationships between transactions and data items.
Physical timestamps: Physical timestamps are real values that represent the actual clock time when transactions and data items were created or modified. Physical timestamps can be obtained by using the system clock, the network clock, or other sources of time synchronization that can provide accurate and consistent timestamps across the system.

Depending on the type of timestamp used, the system can use different algorithms to assign timestamps to transactions and data items, such as:

Lamport’s algorithm: Lamport’s algorithm is a logical timestamping algorithm that uses counters to generate timestamps. Each transaction and data item has a counter that is incremented by one whenever a write operation occurs. The timestamp of a transaction or a data item is the value of its counter at the time of the write operation.
Bernstein’s algorithm: Bernstein’s algorithm is a physical timestamping algorithm that uses the system clock to generate timestamps. Each transaction and data item has a timestamp that is obtained from the system clock at the time of the write operation. The timestamp of a transaction or a data item is the value of the system clock at the time of the write operation.

In the next section, you will learn how the system checks for conflicts between transactions and data items, and how they are resolved using timestamps.

3.3. Checking Conflicts

One of the main challenges of multiversion concurrency control is to detect and resolve conflicts between transactions that access the same data item. A conflict occurs when a transaction tries to read or write a data item that has been modified by another transaction with a higher timestamp. Depending on the type of conflict, the transaction may have to abort, retry, or wait.

There are two types of conflicts that can occur in multiversion concurrency control: read-write conflicts and write-write conflicts.

A read-write conflict occurs when a transaction tries to read a data item that has been overwritten by another transaction with a higher timestamp. For example, suppose transaction T1 reads data item X with timestamp 10, and then transaction T2 writes a new version of X with timestamp 20. If T1 tries to read X again, it will encounter a read-write conflict, as the version it read before is no longer valid. To resolve this conflict, T1 can either abort and restart with a new timestamp, or wait until T2 commits and then read the latest version of X.

A write-write conflict occurs when a transaction tries to write a data item that has been written by another transaction with a higher timestamp. For example, suppose transaction T1 writes a new version of data item X with timestamp 10, and then transaction T2 writes another version of X with timestamp 20. If T1 tries to write X again, it will encounter a write-write conflict, as the version it wrote before is no longer valid. To resolve this conflict, T1 has to abort and restart with a new timestamp, as it cannot overwrite the version written by T2.

The algorithms that implement multiversion concurrency control have different ways of checking and resolving conflicts. In the next sections, you will learn about two of them: multiversion timestamp ordering and multiversion two-phase locking.

4. Multiversion Timestamp Ordering

Multiversion timestamp ordering (MVTO) is an algorithm that implements multiversion concurrency control by using timestamps to order transactions and versions of data items. MVTO ensures that transactions execute in a serializable order that is consistent with their timestamps, and that conflicts are detected and resolved accordingly.

MVTO assigns two types of timestamps to each transaction: a start timestamp and a commit timestamp. The start timestamp is assigned when the transaction begins, and it represents the logical time at which the transaction starts. The commit timestamp is assigned when the transaction commits, and it represents the logical time at which the transaction finishes. The commit timestamp is always greater than or equal to the start timestamp, and it is also greater than the commit timestamps of any transactions that have read or written any data items that the transaction has written.

MVTO also assigns two types of timestamps to each version of a data item: a write timestamp and a read timestamp. The write timestamp is the commit timestamp of the transaction that created the version, and it represents the logical time at which the version was written. The read timestamp is the maximum of the commit timestamps of all the transactions that have read the version, and it represents the logical time at which the version was last read.

MVTO uses these timestamps to check and resolve conflicts between transactions that access the same data item. MVTO follows two rules:

Read rule: A transaction can read a version of a data item only if the write timestamp of the version is less than or equal to the start timestamp of the transaction, and the read timestamp of the version is greater than or equal to the start timestamp of the transaction. If there is no such version, the transaction has to abort and restart with a new timestamp.
Write rule: A transaction can write a new version of a data item only if the write timestamp of the latest version of the data item is less than the start timestamp of the transaction. If this is not the case, the transaction has to abort and restart with a new timestamp.

By following these rules, MVTO ensures that transactions read and write consistent versions of data items, and that conflicts are detected and resolved by aborting and restarting transactions with new timestamps.

4.1. Basic Algorithm

In this section, you will learn about the basic algorithm of multiversion timestamp ordering (MVTO). You will see how MVTO assigns timestamps to transactions and versions of data items, and how it checks and resolves conflicts using the read and write rules.

The basic algorithm of MVTO consists of the following steps:

When a transaction T starts, it is assigned a start timestamp ST(T) that is equal to the current logical time of the system. The logical time is incremented by one after each assignment.
When a transaction T reads a data item X, it searches for the latest version of X that has a write timestamp WT(X) less than or equal to ST(T). If such a version exists, T reads it and updates its read timestamp RT(X) to the maximum of RT(X) and ST(T). If no such version exists, T aborts and restarts with a new timestamp.
When a transaction T writes a new version of a data item X, it checks if the write timestamp of the latest version of X is less than ST(T). If this is the case, T writes the new version and assigns it a write timestamp WT(X) that is equal to a temporary value. If this is not the case, T aborts and restarts with a new timestamp.
When a transaction T commits, it is assigned a commit timestamp CT(T) that is equal to the maximum of ST(T) and the commit timestamps of all the transactions that have read any data items that T has written. The logical time is incremented by one after each assignment. T then updates the write timestamps of all the versions of data items that it has written to CT(T).

By following these steps, MVTO ensures that transactions execute in a serializable order that is consistent with their timestamps, and that conflicts are detected and resolved by aborting and restarting transactions with new timestamps.

4.2. Advantages and Disadvantages

Multiversion timestamp ordering (MVTO) is a popular algorithm that implements multiversion concurrency control. MVTO uses timestamps to order transactions and versions of data items, and ensures that transactions read and write consistent versions of data items. MVTO also detects and aborts conflicting transactions to preserve serializability.

MVTO has some advantages over other concurrency control methods, such as:

It avoids locking and blocking, as transactions can read the most recent compatible version of a data item without waiting for other transactions to release locks.
It allows more concurrency, as transactions can execute in parallel without interfering with each other.
It supports snapshot isolation, as transactions can see a consistent snapshot of the database at the time of their start.

However, MVTO also has some disadvantages, such as:

It requires more storage space and overhead, as multiple versions of data items have to be stored and managed.
It introduces the problem of garbage collection, as obsolete versions of data items have to be removed periodically to free up space and improve efficiency.
It depends on the accuracy and synchronization of the timestamps, as incorrect or inconsistent timestamps can lead to incorrect results or violations of serializability.
It may cause cascading aborts, as transactions that read from aborted transactions have to be aborted as well.

In the next section, you will learn about another algorithm that implements multiversion concurrency control: multiversion two-phase locking. You will see how it works, what are its advantages and disadvantages, and how it compares with MVTO.

5. Multiversion Two-Phase Locking

Multiversion two-phase locking (MV2PL) is another algorithm that implements multiversion concurrency control. MV2PL uses locks to control the access to different versions of data items, and ensures that transactions read and write consistent versions of data items. MV2PL also detects and aborts conflicting transactions to preserve serializability.

MV2PL differs from traditional two-phase locking (2PL) in two ways:

It allows transactions to read older versions of data items without acquiring shared locks, as long as the version is compatible with their timestamp.
It requires transactions to acquire exclusive locks on the latest version of a data item before writing a new version, regardless of their timestamp.

MV2PL has some advantages over other concurrency control methods, such as:

It reduces the number of read locks, as transactions can read older versions of data items without locking them.
It prevents the lost update problem, as transactions have to lock the latest version of a data item before writing a new version.
It supports snapshot isolation, as transactions can see a consistent snapshot of the database at the time of their start.

However, MV2PL also has some disadvantages, such as:

It requires more storage space and overhead, as multiple versions of data items have to be stored and managed.
It introduces the problem of garbage collection, as obsolete versions of data items have to be removed periodically to free up space and improve efficiency.
It may cause locking and blocking, as transactions have to wait for exclusive locks to be released before writing new versions of data items.
It may cause cascading aborts, as transactions that read from aborted transactions have to be aborted as well.

In the next section, you will learn about the advantages and disadvantages of MV2PL and how it compares with MVTO.

5.1. Basic Algorithm

Multiversion two-phase locking (MV2PL) is an algorithm that implements multiversion concurrency control using locks. MV2PL follows the two-phase locking protocol, which consists of two phases: the growing phase and the shrinking phase. In the growing phase, a transaction can acquire locks on data items, but cannot release them. In the shrinking phase, a transaction can release locks on data items, but cannot acquire new ones.

MV2PL differs from traditional two-phase locking (2PL) in two ways:

It allows transactions to read older versions of data items without acquiring shared locks, as long as the version is compatible with their timestamp. A version is compatible with a transaction’s timestamp if it was created before the transaction started and has not been overwritten by another transaction that committed before the transaction started.
It requires transactions to acquire exclusive locks on the latest version of a data item before writing a new version, regardless of their timestamp. This ensures that no other transaction can overwrite the latest version of a data item while a transaction is writing a new version.

The basic algorithm of MV2PL is as follows:

When a transaction T starts, it is assigned a unique timestamp ts(T).
When T wants to read a data item x, it checks the version table of x, which stores the timestamps and values of all the versions of x. T reads the most recent version of x that is compatible with its timestamp, without acquiring a shared lock on x.
When T wants to write a data item x, it acquires an exclusive lock on the latest version of x. If the lock is already held by another transaction, T waits until the lock is released. Then, T writes a new version of x with its own timestamp and value, and updates the version table of x.
When T wants to commit, it releases all the exclusive locks it holds, and updates the commit table, which stores the timestamps of all the committed transactions.

In the next section, you will learn about the advantages and disadvantages of MV2PL and how it compares with MVTO.

5.2. Advantages and Disadvantages

Multiversion two-phase locking (MV2PL) is another algorithm that implements multiversion concurrency control. MV2PL combines the idea of two-phase locking (2PL) with the idea of versioning data items. MV2PL allows transactions to read older versions of data items without acquiring locks, but requires transactions to acquire locks before writing new versions of data items.

MV2PL has some advantages over multiversion timestamp ordering (MVTO), such as:

It does not depend on the accuracy and synchronization of the timestamps, as it uses locks to order and coordinate transactions.
It does not require transactions to abort due to timestamp conflicts, as it ensures that transactions write new versions of data items in the order of their locks.
It can handle long-running transactions better, as it does not force transactions to abort if they read a version of a data item that is later overwritten by another transaction.

However, MV2PL also has some disadvantages, such as:

It requires more locking and unlocking operations, as transactions have to acquire locks before writing new versions of data items.
It introduces the possibility of deadlocks, as transactions may wait for locks held by other transactions in a circular manner.
It reduces the concurrency and performance of the system, as transactions may block each other due to locking conflicts.

In the next section, you will learn how to compare MVTO and MV2PL and choose the best algorithm for your database applications.

6. Conclusion

In this blog, you have learned about the concept of multiversion concurrency control (MCC) and how it works. You have also learned about two popular algorithms that implement MCC: multiversion timestamp ordering (MVTO) and multiversion two-phase locking (MV2PL). You have seen the advantages and disadvantages of each algorithm and how they compare with each other.

MCC is a powerful technique that allows multiple transactions to access the same data item without locking or blocking each other. MCC achieves this by maintaining multiple versions of each data item, each with a unique timestamp. Transactions can read the most recent version of a data item that is compatible with their timestamp, and write new versions of a data item with their own timestamp.

MVTO and MV2PL are two different ways of implementing MCC. MVTO uses timestamps to order and coordinate transactions, while MV2PL uses locks to order and coordinate transactions. MVTO avoids locking and blocking, but requires accurate and synchronized timestamps. MV2PL avoids timestamp conflicts, but requires locking and unlocking operations.

Both algorithms have their pros and cons, and there is no definitive answer to which one is better. The choice of the algorithm depends on the characteristics of the database system, such as the number and type of transactions, the frequency and size of updates, the availability and accuracy of timestamps, and the performance and scalability requirements.

We hope that this blog has helped you understand the concept of multiversion concurrency control and how to use it in your database applications. If you have any questions or feedback, please feel free to leave a comment below. Thank you for reading!