1. Introduction
Concurrency control is a fundamental concept in database systems, especially in distributed and multi-user environments. It refers to the methods and mechanisms that ensure the consistency and correctness of data when multiple transactions access and modify the same data concurrently. Concurrency control is essential for maintaining the ACID properties of transactions, which are atomicity, consistency, isolation, and durability.
There are different concurrency control techniques and concurrency control protocols that can be used to achieve concurrency control. Concurrency control techniques are the general methods or approaches that can be applied to prevent or resolve conflicts among concurrent transactions. Concurrency control protocols are the specific rules or algorithms that implement a concurrency control technique. For example, locking is a concurrency control technique that uses locks to restrict access to data items, and two-phase locking is a concurrency control protocol that defines how locks are acquired and released by transactions.
In this blog, you will learn about the different concurrency control techniques and protocols, such as locking, timestamping, and validation. You will also learn about the advantages and disadvantages of each technique and protocol, and how to compare and evaluate them. By the end of this blog, you will have a solid understanding of concurrency control and how to apply it in your database systems.
2. Concurrency Control Techniques
In this section, you will learn about the three main concurrency control techniques used to prevent or resolve conflicts among concurrent transactions: locking, timestamping, and validation. Each technique has its own advantages and disadvantages, and each can be implemented by different concurrency control protocols.
In brief: locking restricts access to data items with shared and exclusive locks; timestamping assigns each transaction a unique logical timestamp and aborts operations that would violate the timestamp order; and validation (also called optimistic concurrency control) lets transactions execute freely and checks for conflicts just before commit. The following subsections examine each technique in detail.
2.1. Locking
Locking is one of the most widely used concurrency control techniques in database systems. It restricts access to data items by means of locks: a lock is a mechanism that grants or denies a transaction permission to read or write a data item. Locking ensures that no two transactions can modify the same data item at the same time, and that no transaction can read a data item that is being modified by another transaction.
There are two types of locks: shared locks and exclusive locks. A shared lock (denoted by S) allows multiple transactions to read the same data item, but prevents any transaction from writing it. An exclusive lock (denoted by X) allows only one transaction to read or write the data item, and prevents any other transaction from accessing it. A transaction must acquire an appropriate lock on a data item before accessing it, and releases the lock after it finishes using it. A transaction can also upgrade a shared lock to an exclusive lock, or downgrade an exclusive lock to a shared lock, if needed.
The following table summarizes the compatibility of different types of locks:
| Requested \ Held | S | X |
| --- | --- | --- |
| S | Compatible | Incompatible |
| X | Incompatible | Incompatible |
As you can see, two shared locks are compatible, meaning that two transactions can hold shared locks on the same data item at the same time. However, an exclusive lock is incompatible with any other lock, meaning that only one transaction can hold an exclusive lock on a data item at any time.
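The compatibility matrix above can be expressed directly as a small lookup table. The following is an illustrative sketch, not a real lock manager; the names `COMPATIBLE` and `can_grant` are invented for the example.

```python
# S = shared lock, X = exclusive lock.
# COMPATIBLE[(held, requested)] answers: may `requested` be granted
# while another transaction holds `held` on the same data item?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested: str, held: list[str]) -> bool:
    """A requested lock is granted only if it is compatible with every
    lock already held on the data item by other transactions."""
    return all(COMPATIBLE[(h, requested)] for h in held)

print(can_grant("S", ["S", "S"]))  # True: shared locks coexist
print(can_grant("X", ["S"]))       # False: X conflicts with S
print(can_grant("S", []))          # True: no locks held
```

A real lock manager would additionally queue the blocked request and wake it when the conflicting locks are released; the table above only decides grant versus wait.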
Locking can prevent some common concurrency problems, such as lost updates, uncommitted dependencies, and inconsistent analysis. However, locking can also cause some issues, such as deadlocks, starvation, and reduced concurrency. In the next section, you will learn about one of the concurrency control protocols that uses locking to achieve serializability and avoid these issues: the two-phase locking protocol.
2.2. Timestamping
Timestamping orders the execution of transactions using timestamps. A timestamp is a unique identifier that represents the logical start time of a transaction: each transaction receives one when it begins. Each data item carries two timestamps of its own: a read timestamp, recording the timestamp of the last transaction that read the item, and a write timestamp, recording the timestamp of the last transaction that wrote it. By comparing these values, timestamping ensures that transactions execute in a serializable order, meaning the result of concurrent execution is equivalent to some serial execution.
Timestamping works by comparing the timestamps of transactions and data items, and applying some rules to determine whether a transaction can access a data item or not. There are two main types of timestamping rules: basic timestamping and Thomas’ write rule. Basic timestamping applies the following rules:
- If a transaction T wants to read a data item Q, it may do so only if its timestamp is greater than or equal to the write timestamp of Q. This guarantees that no transaction with a later timestamp has already written Q. If the condition fails, T is aborted and restarted with a new timestamp.
- If a transaction T wants to write a data item Q, it may do so only if its timestamp is at least the read timestamp and at least the write timestamp of Q. This guarantees that no transaction with a later timestamp has already read or written Q. If the condition fails, T is aborted and restarted with a new timestamp.
Thomas’ write rule is a variation of basic timestamping that allows more concurrency by ignoring some write operations that do not affect the final outcome. Thomas’ write rule applies the following rules:
- If a transaction T wants to read a data item Q, it can do so only if its timestamp is greater than or equal to the write timestamp of Q. This is the same as basic timestamping.
- If a transaction T wants to write a data item Q, T is aborted if its timestamp is less than the read timestamp of Q, because a transaction with a later timestamp has already read Q. However, if T's timestamp is less than the write timestamp of Q (but not less than its read timestamp), T's write operation is simply ignored: a transaction with a later timestamp has already written Q, so T's obsolete value would have been overwritten anyway and does not affect the final value of Q.
Timestamping can prevent common concurrency problems such as lost updates and inconsistent analysis. However, in its basic form it can also cause cascading aborts, wasted work from repeated restarts, and reduced concurrency. In the next section, you will learn about another concurrency control technique that uses validation tests to check the consistency of transactions: the validation technique.
2.3. Validation
Validation, also known as optimistic concurrency control, checks the consistency of transactions with validation tests. A validation test is a procedure, run at the end of a transaction and before it commits, that verifies whether the transaction conflicts with other transactions or violates any consistency rules. If the test fails, the transaction is aborted and rolled back; otherwise, it commits and its changes become permanent. Validation thus ensures that only consistent transactions commit, and that conflicts are detected and resolved before they affect the database.
There are different types of validation tests, such as serializability tests, precedence graph tests, and commit order tests. Each type of test has its own criteria and algorithm for checking the consistency of transactions. For example, a serializability test checks whether the concurrent execution of transactions is equivalent to some serial execution of the same transactions. A precedence graph test checks whether the concurrent execution of transactions preserves the order of conflicting operations. A commit order test checks whether the commit order of transactions is consistent with their timestamp order.
Compared to the other techniques, validation has clear trade-offs. One advantage is that it requires no locks or timestamps during normal execution, which reduces the overhead and complexity of concurrency control. Another is that it allows more concurrency, as transactions never wait for locks. However, one disadvantage is that it may cause more aborts and rollbacks, since a transaction can fail the validation test only after it has done all its work. Another is that it requires extra processing and memory, as transactions must record their read and write sets for validation.
3. Concurrency Control Protocols
In this section, you will learn about the different concurrency control protocols that use the concurrency control techniques of locking, timestamping, and validation to achieve serializability and avoid concurrency problems. Concurrency control protocols are the specific rules or algorithms that implement a concurrency control technique. For example, two-phase locking is a concurrency control protocol that uses locking to ensure serializability and prevent conflicts among transactions.
There are many concurrency control protocols that have been proposed and used in database systems. Some of the most common and important ones are:
- Two-Phase Locking Protocol: This protocol uses locking to ensure serializability and prevent conflicts among transactions. It requires that each transaction follows two phases: a growing phase and a shrinking phase. In the growing phase, the transaction can acquire locks on data items, but cannot release any lock. In the shrinking phase, the transaction can release locks on data items, but cannot acquire any new lock. The point where the transaction switches from the growing phase to the shrinking phase is called the lock point. The two-phase locking protocol guarantees that any schedule of transactions that follow this protocol is serializable.
- Timestamp Ordering Protocol: This protocol uses timestamping to ensure serializability and order the execution of transactions. It assigns a unique timestamp to each transaction when it begins, and uses the timestamp to determine the precedence of transactions. It also uses the basic timestamping or Thomas’ write rule to decide whether a transaction can read or write a data item or not. The timestamp ordering protocol guarantees that any schedule of transactions that follow this protocol is serializable and conflict-equivalent to the schedule that orders the transactions by their timestamps.
- Optimistic Concurrency Control Protocol: This protocol uses validation to ensure serializability and detect conflicts among transactions. It assumes that conflicts are rare and allows transactions to execute without any locking or timestamping. It divides the execution of each transaction into three phases: a read phase, a validation phase, and a write phase. In the read phase, the transaction reads data items from the database and applies its updates in a private workspace. In the validation phase, the transaction performs a validation test to check whether it conflicts with other transactions. If the validation test fails, the transaction is aborted and restarted. If the validation test succeeds, the transaction proceeds to the write phase, where it copies the data items from its private workspace to the database and then commits. The optimistic concurrency control protocol guarantees that any schedule of transactions that follow this protocol is serializable.
These are some of the most widely used concurrency control protocols in database systems. However, there are also other protocols that have been developed for specific purposes or scenarios, such as snapshot isolation, multiversion concurrency control, and distributed concurrency control. In the next section, you will learn how to compare and evaluate the different concurrency control techniques and protocols, and how to choose the best one for your database system.
3.1. Two-Phase Locking Protocol
Two-phase locking protocol is a concurrency control protocol that implements the locking technique. It defines how transactions acquire and release locks on data items, and ensures that the transactions are serializable. The protocol consists of two phases: the growing phase and the shrinking phase.
In the growing phase, a transaction can acquire locks on data items, but cannot release any lock. The transaction requests either a shared lock or an exclusive lock on a data item, depending on whether it wants to read or write it. If the lock is available, the transaction gets the lock and proceeds; if not, the transaction waits until the lock is released by another transaction. The growing phase ends at the lock point, the moment the transaction releases its first lock.
In the shrinking phase, a transaction can release locks on data items, but cannot acquire any new lock. The transaction releases a lock on a data item after it has finished reading or writing it. Note that upgrading a shared lock to an exclusive lock counts as an acquisition and is therefore allowed only in the growing phase, while downgrading an exclusive lock to a shared lock counts as a release and belongs to the shrinking phase. The shrinking phase ends when the transaction has released all the locks it holds.
Two-phase locking guarantees serializability because it prevents any cycle in the precedence graph of transactions: transactions can be ordered by their lock points, and every conflict between two transactions goes from the one with the earlier lock point to the one with the later lock point, so no circular dependency can arise. Note, however, that two-phase locking does not prevent deadlocks, in which two transactions each wait for a lock held by the other; deadlocks must be handled separately, for example by detection and rollback, or by timeouts.
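The two-phase discipline can be sketched as a small transaction wrapper that refuses any acquisition after the first release. This is an invented illustration, not a real lock manager: it uses Python's `threading.Lock` to stand in for database locks and handles only exclusive locks, not shared/exclusive modes.

```python
import threading

class TwoPhaseTxn:
    """Enforces 2PL: once any lock is released, no new lock may be acquired."""

    def __init__(self):
        self._held = []          # locks currently held
        self._shrinking = False  # True after the lock point

    def acquire(self, lock: threading.Lock):
        if self._shrinking:
            raise RuntimeError("2PL violation: acquire after first release")
        lock.acquire()
        self._held.append(lock)

    def release(self, lock: threading.Lock):
        self._shrinking = True   # lock point: growing phase is over
        lock.release()
        self._held.remove(lock)

    def release_all(self):
        self._shrinking = True
        while self._held:
            self._held.pop().release()

lock_a, lock_b = threading.Lock(), threading.Lock()
t = TwoPhaseTxn()
t.acquire(lock_a)      # growing phase
t.acquire(lock_b)
t.release(lock_b)      # lock point reached; shrinking phase begins
try:
    t.acquire(lock_a)  # rejected: would violate the two-phase rule
except RuntimeError as e:
    print(e)
t.release_all()
```

Strict two-phase locking, the variant most databases actually use, is the special case where `release_all` is the only release operation and runs at commit time, which also prevents cascading aborts.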
3.2. Timestamp Ordering Protocol
Timestamp ordering protocol is a concurrency control protocol that implements the timestamping technique. It defines how transactions use timestamps to order their operations on data items, and ensures that the transactions are serializable. The protocol consists of two rules: the read rule and the write rule.
The read rule states that a transaction can read a data item only if its timestamp is greater than or equal to the write timestamp of the data item. This ensures that the transaction does not read a value that has already been overwritten by a transaction with a later timestamp. If the read rule is violated, the transaction is aborted and restarted with a new timestamp.
The write rule states that a transaction can write a data item only if its timestamp is at least the read timestamp and at least the write timestamp of the data item. This ensures that the transaction does not overwrite a value that a transaction with a later timestamp has already read or written. Under basic timestamp ordering, a transaction that violates the write rule is aborted and restarted with a new timestamp; under Thomas' write rule, a write that is obsolete only because of a later write (not a later read) is silently ignored and the transaction continues.
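As a minimal sketch of the read and write rules in action, the following tracks a read timestamp and a write timestamp per data item and applies basic timestamp ordering. The class and method names (`Item`, `TimestampScheduler`) are invented for illustration, and timestamps are plain integers.

```python
class Item:
    def __init__(self):
        self.r_ts = 0      # timestamp of the latest reader
        self.w_ts = 0      # timestamp of the latest writer
        self.value = None

class TimestampScheduler:
    def __init__(self):
        self.items = {}

    def read(self, ts, name):
        q = self.items.setdefault(name, Item())
        if ts < q.w_ts:
            return "abort"           # item already overwritten by a later txn
        q.r_ts = max(q.r_ts, ts)     # record this reader
        return q.value

    def write(self, ts, name, value):
        q = self.items.setdefault(name, Item())
        if ts < q.r_ts or ts < q.w_ts:
            return "abort"           # a later txn already read or wrote it
        q.value, q.w_ts = value, ts
        return "ok"

s = TimestampScheduler()
s.write(ts=5, name="Q", value=10)         # ok: Q's write timestamp becomes 5
s.read(ts=7, name="Q")                    # returns 10; read timestamp becomes 7
print(s.write(ts=6, name="Q", value=99))  # "abort": txn 7 already read Q
```

A production implementation would also restart the aborted transaction with a fresh timestamp and handle recoverability; this sketch shows only the accept/reject decision.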
Timestamp ordering protocol guarantees serializability, because it ensures that the transactions are executed in a serializable order, which is the same as their timestamp order. The protocol also avoids any cycle in the precedence graph of transactions, because the timestamps are unique and strictly increasing.
3.3. Optimistic Concurrency Control Protocol
Optimistic concurrency control protocol is a concurrency control protocol that implements the validation technique. It defines how transactions use validation tests to check the consistency of their operations on data items, and ensures that the transactions are serializable. The protocol consists of three phases: the read phase, the validation phase, and the write phase.
In the read phase, a transaction reads data items from the database and applies its updates in a private workspace, without acquiring any locks or timestamps. The transaction also records its read set and write set, the sets of data items it has read and written, respectively. The read phase ends when the transaction has performed all its operations.
In the validation phase, a transaction can perform a validation test to check whether it has violated any consistency rules or constraints. The validation test compares the transaction’s read set and write set with the read and write sets of other transactions that have committed or are in the validation phase. The validation test can use different criteria and algorithms, such as serializability, precedence graph, or commit order. If the validation test succeeds, the transaction can proceed to the write phase. If the validation test fails, the transaction is aborted and restarted with a new read phase.
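As an illustration of the validation phase, here is a minimal sketch of backward validation, one common form of validation test: transaction T passes if its read set does not overlap the write set of any transaction that committed while T was executing. The function name and data layout are invented for the example.

```python
def validate(read_set: set, overlapping_write_sets: list[set]) -> bool:
    """Return True if T may commit, False if it must abort and restart.
    overlapping_write_sets holds the write sets of transactions that
    committed between T's start and T's validation."""
    return all(not (read_set & ws) for ws in overlapping_write_sets)

# T read {x, y}; while it ran, one committed txn wrote {z}, another wrote {y}.
print(validate({"x", "y"}, [{"z"}]))         # True: no overlap, T commits
print(validate({"x", "y"}, [{"z"}, {"y"}]))  # False: y changed under T, abort
```

Validation and the subsequent write phase must execute as one atomic step (e.g. under a short critical section), otherwise two transactions could validate against each other's not-yet-written changes and both pass.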
In the write phase, the transaction copies the values from its private workspace to the actual database, making its changes permanent. The write phase ends when the transaction commits and discards its private workspace along with its read and write sets.
Optimistic concurrency control guarantees serializability because transactions are serialized in the order in which they pass validation. The protocol also avoids cycles in the precedence graph of transactions, because the validation test detects conflicting transactions and aborts them before their changes reach the database.
4. Comparison and Evaluation of Concurrency Control Techniques and Protocols
In this section, you will compare and evaluate the different concurrency control techniques and protocols that you have learned in the previous sections. You will also learn about the factors that affect the performance and suitability of each technique and protocol, such as the degree of concurrency, the overhead cost, the conflict rate, and the application scenario.
The degree of concurrency is a measure of how many transactions can execute concurrently without causing any conflicts or inconsistencies. A higher degree of concurrency means that more transactions can access and modify the data simultaneously, which improves the throughput and efficiency of the database system. However, a higher degree of concurrency also increases the risk of conflicts and inconsistencies, which may require more resources and time to resolve. Therefore, the degree of concurrency depends on the trade-off between the benefits and the costs of concurrency control.
The overhead cost is a measure of how much resources and time are consumed by the concurrency control technique or protocol. The overhead cost includes the cost of acquiring and releasing locks, generating and comparing timestamps, performing validation tests, aborting and restarting transactions, and maintaining data structures and logs. A lower overhead cost means that the concurrency control technique or protocol is more efficient and scalable, as it consumes less resources and time. However, a lower overhead cost may also imply a lower degree of concurrency, as the concurrency control technique or protocol may be more restrictive or conservative.
The conflict rate is a measure of how often conflicts occur among concurrent transactions. A conflict is a situation where two or more transactions attempt to access or modify the same data item in an incompatible way, such as a read-write conflict or a write-write conflict. A higher conflict rate means that the concurrency control technique or protocol is more prone to conflicts, which may degrade the performance and correctness of the database system. However, a higher conflict rate may also indicate a higher degree of concurrency, as the concurrency control technique or protocol may allow more transactions to execute concurrently.
The application scenario is a factor that determines the suitability and appropriateness of the concurrency control technique or protocol for a specific database system or application. The application scenario depends on the characteristics and requirements of the database system or application, such as the size and structure of the data, the frequency and type of transactions, the level of consistency and isolation, and the expected workload and performance. The application scenario may favor one concurrency control technique or protocol over another, depending on the trade-offs and compromises that are acceptable and desirable.
Based on these factors, you can compare and evaluate the concurrency control techniques and protocols as follows:
- Locking is a pessimistic technique: it keeps the conflict rate low by blocking conflicting transactions before they can interfere, but that blocking also limits the achievable degree of concurrency. Its overhead cost is high, since it requires acquiring and releasing locks, maintaining lock tables, and handling deadlocks and starvation. Locking suits applications with a high update rate and frequent conflicts, where preventing interference up front is cheaper than aborting and redoing work.
- Timestamping avoids locks entirely, so transactions never block and the per-operation overhead is low, limited to generating and comparing timestamps. Its weakness is that it resolves conflicts by aborting and restarting transactions, which can cause cascading aborts and wasted work when the conflict rate is high. Timestamping suits read-heavy workloads with a low update rate, where conflicts, and hence restarts, are rare.
- Validation is optimistic: transactions execute without any blocking, which allows a high degree of concurrency, and the overhead is moderate, consisting mainly of maintaining read and write sets and running the validation test. Its weakness is that conflicts are detected only at commit time, so a failed validation wastes all the work the transaction has already done. Validation suits workloads where conflicts are rare, such as those dominated by reads or touching mostly disjoint data.
Similarly, you can compare and evaluate the concurrency control protocols that implement each concurrency control technique, such as two-phase locking, timestamp ordering, and optimistic concurrency control. You can use the same factors and criteria to assess the performance and suitability of each concurrency control protocol, and also consider the variations and extensions of each protocol, such as strict two-phase locking, multiversion timestamp ordering, and snapshot isolation.
By comparing and evaluating the concurrency control techniques and protocols, you can choose the best one for your database system or application, depending on your specific needs and preferences. You can also combine or hybridize different concurrency control techniques and protocols, to achieve a better balance and trade-off among the factors.
5. Conclusion
In this blog, you have learned about the different concurrency control techniques and protocols that are used to ensure the consistency and correctness of data when multiple transactions access and modify the same data concurrently. You have also learned how to compare and evaluate the performance and suitability of each technique and protocol, depending on the factors such as the degree of concurrency, the overhead cost, the conflict rate, and the application scenario.
Concurrency control is a fundamental and challenging concept in database systems, especially in distributed and multi-user environments. It requires a careful balance and trade-off among the benefits and costs of allowing concurrent transactions to execute. There is no single best concurrency control technique or protocol that works for all situations and applications. Therefore, you need to understand the characteristics and requirements of your database system or application, and choose the most appropriate concurrency control technique or protocol that meets your needs and preferences.
We hope that this blog has helped you gain a solid understanding of concurrency control and how to apply it in your database systems. If you have any questions or feedback, please feel free to leave a comment below. Thank you for reading and happy learning!