This blog explains how to implement quick sort in C and how to choose a good pivot element to optimize the sorting performance.
1. Introduction
Quick sort is one of the most popular and efficient sorting algorithms. It works by dividing an array into two subarrays, and then recursively sorting each subarray. The key idea is to choose a pivot element from the array, and then move all the elements that are smaller than the pivot to the left of it, and all the elements that are larger than the pivot to the right of it. This way, the pivot element is in its final sorted position, and the subarrays on either side of it can be sorted independently.
However, the performance of quick sort depends heavily on the choice of the pivot element. If the pivot element is chosen poorly, the algorithm can take much longer than expected, and even worse than some other sorting algorithms. For example, if the pivot element is always the first or the last element of the array, and the array is already sorted or nearly sorted, then the algorithm will have to make O(n^2) comparisons, where n is the size of the array. This is because the algorithm will divide the array into very unbalanced subarrays, one of which has only one element (the pivot), and the other has almost all the elements. This defeats the purpose of quick sort, which is to achieve O(n log n) average time complexity by creating roughly equal-sized subarrays.
So how can we choose a good pivot element that can improve the efficiency of quick sort? In this blog, we will explore some common strategies for choosing the pivot element, such as randomization and median of three methods. We will also implement quick sort in C, and test our code with different inputs. By the end of this blog, you will learn how to apply quick sort in C and how to optimize it by choosing a good pivot element.
2. Quick Sort Algorithm
In this section, we will review the basic steps of the quick sort algorithm and how it works. Quick sort is a divide-and-conquer algorithm, which means it breaks down a large problem into smaller and simpler subproblems, and then combines the solutions of the subproblems to solve the original problem. The main steps of quick sort are:
- Choose a pivot element from the array that you want to sort. The pivot element can be any element of the array, but the choice of the pivot will affect the performance of the algorithm. We will discuss how to choose a good pivot element in the next section.
- Partition the array around the pivot element, such that all the elements that are smaller than or equal to the pivot are moved to the left of it, and all the elements that are larger than the pivot are moved to the right of it. After this step, the pivot element is in its final sorted position in the array.
- Recursively sort the subarrays on the left and right of the pivot element, using the same steps as above. This process continues until the subarrays have only one or zero elements, which means they are already sorted.
To illustrate how quick sort works, let’s look at an example. Suppose we have an array of 10 integers that we want to sort in ascending order:
int arr[] = {10, 80, 30, 90, 40, 50, 70, 20, 60, 100};
Let’s choose the first element of the array as the pivot element, which is 10. We will partition the array around this pivot element, by using two pointers, one from the left and one from the right. The left pointer will move right until it finds an element that is larger than the pivot, and the right pointer will move left until it finds an element that is smaller than or equal to the pivot. Then, we will swap the elements at these two pointers. We will repeat this process until the left pointer crosses the right pointer, which means the partition is done. Here is how the array looks like after each swap:
// Initial array, pivot is 10, left pointer is at index 1, right pointer is at index 9 int arr[] = {10, 80, 30, 90, 40, 50, 70, 20, 60, 100}; // Left pointer finds 80, right pointer finds 20, swap them int arr[] = {10, 20, 30, 90, 40, 50, 70, 80, 60, 100}; // Left pointer finds 90, right pointer finds 70, swap them int arr[] = {10, 20, 30, 70, 40, 50, 90, 80, 60, 100}; // Left pointer finds 90, right pointer finds 60, swap them int arr[] = {10, 20, 30, 70, 40, 50, 60, 80, 90, 100}; // Left pointer crosses right pointer, partition is done, swap the pivot with the element at the right pointer int arr[] = {60, 20, 30, 70, 40, 50, 10, 80, 90, 100};
Now, the pivot element 10 is in its final sorted position, and the array is divided into two subarrays: [60, 20, 30, 70, 40, 50] and [80, 90, 100]. We will recursively sort these subarrays using the same steps as above, until the whole array is sorted. The final sorted array is:
int arr[] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
As you can see, quick sort is a simple and elegant algorithm that can sort an array efficiently. However, as we mentioned before, the choice of the pivot element can have a significant impact on the performance of the algorithm. In the next section, we will explore how to choose a good pivot element and what are the advantages and disadvantages of different strategies.
2.1. Partitioning the Array
Partitioning the array is the most important step of the quick sort algorithm. It is the process of rearranging the elements of the array around the pivot element, such that all the elements that are smaller than or equal to the pivot are on the left side of it, and all the elements that are larger than the pivot are on the right side of it. After partitioning, the pivot element is in its final sorted position in the array.
There are different ways to implement the partitioning step, but one of the most common and efficient methods is called the Lomuto partition scheme. This method uses two pointers, one from the left and one from the right, to scan the array and swap the elements that are out of order. The algorithm for the Lomuto partition scheme is as follows:
- Choose the last element of the array as the pivot element.
- Initialize the left pointer to the first element of the array, and the right pointer to the last element of the array.
- While the left pointer is less than the right pointer, do the following:
- Move the left pointer to the right until it finds an element that is larger than the pivot.
- Move the right pointer to the left until it finds an element that is smaller than or equal to the pivot.
- If the left pointer is still less than the right pointer, swap the elements at these two pointers.
- When the left pointer crosses the right pointer, swap the pivot element with the element at the left pointer.
- Return the index of the left pointer as the new pivot position.
To understand how this method works, let’s look at an example. Suppose we have the following array of 10 integers that we want to partition:
int arr[] = {10, 80, 30, 90, 40, 50, 70, 20, 60, 100};
We choose the last element of the array as the pivot element, which is 100. We initialize the left pointer to the first element of the array, which is 10, and the right pointer to the last element of the array, which is 100. Here is how the array looks like after each swap:
// Initial array, pivot is 100, left pointer is at index 0, right pointer is at index 9 int arr[] = {10, 80, 30, 90, 40, 50, 70, 20, 60, 100}; // Left pointer finds 80, right pointer finds 60, swap them int arr[] = {10, 60, 30, 90, 40, 50, 70, 20, 80, 100}; // Left pointer finds 90, right pointer finds 20, swap them int arr[] = {10, 60, 30, 20, 40, 50, 70, 90, 80, 100}; // Left pointer finds 90, right pointer finds 70, swap them int arr[] = {10, 60, 30, 20, 40, 50, 70, 90, 80, 100}; // Left pointer crosses right pointer, swap the pivot with the element at the left pointer int arr[] = {10, 60, 30, 20, 40, 50, 100, 90, 80, 70};
Now, the pivot element 100 is in its final sorted position, and the array is divided into two subarrays: [10, 60, 30, 20, 40, 50] and [90, 80, 70]. The index of the left pointer is 6, which is the new pivot position.
The Lomuto partition scheme is easy to implement and understand, but it has some drawbacks. One of them is that it always chooses the last element of the array as the pivot, which can lead to poor performance if the array is already sorted or nearly sorted. Another drawback is that it does not preserve the relative order of the elements that are equal to the pivot, which means it is not a stable sorting algorithm. A stable sorting algorithm is one that maintains the original order of the elements that have the same value. For example, if we have an array of names and ages, and we want to sort them by age, a stable sorting algorithm will keep the original order of the names that have the same age.
In the next section, we will learn another partitioning method that can overcome some of these drawbacks, called the Hoare partition scheme.
2.2. Recursively Sorting the Subarrays
After partitioning the array, we have two subarrays on the left and right of the pivot element. These subarrays are not necessarily sorted, so we need to sort them recursively using the same quick sort algorithm. This means we will choose a pivot element for each subarray, partition the subarray around the pivot, and then sort the smaller subarrays that result from the partition. This process continues until the subarrays have only one or zero elements, which means they are already sorted.
To implement the recursive sorting of the subarrays, we need to keep track of the starting and ending indices of each subarray. We can use two variables, low and high, to store these indices. Initially, low is 0 and high is n-1, where n is the size of the original array. Then, for each subarray, we call the partition function that we wrote in the previous section, and get the new pivot position. The partition function will also rearrange the elements of the subarray around the pivot. Then, we recursively sort the subarray on the left of the pivot, by setting low to the original low and high to the pivot position minus one. Similarly, we recursively sort the subarray on the right of the pivot, by setting low to the pivot position plus one and high to the original high. We repeat this process until low is greater than or equal to high, which means the subarray is sorted.
The algorithm for the recursive sorting of the subarrays is as follows:
- Initialize low to 0 and high to n-1, where n is the size of the original array.
- If low is less than high, do the following:
- Call the partition function with the array, low, and high as arguments, and get the new pivot position.
- Recursively sort the subarray on the left of the pivot, by setting low to the original low and high to the pivot position minus one.
- Recursively sort the subarray on the right of the pivot, by setting low to the pivot position plus one and high to the original high.
- Return the sorted array.
To illustrate how this algorithm works, let’s look at an example. Suppose we have the following array of 10 integers that we want to sort:
int arr[] = {10, 80, 30, 90, 40, 50, 70, 20, 60, 100};
We initialize low to 0 and high to 9. We call the partition function with the array, low, and high as arguments, and get the new pivot position, which is 6. The array is now partitioned around the pivot element 100, which is in its final sorted position. Here is how the array looks like after the partition:
int arr[] = {10, 60, 30, 20, 40, 50, 100, 90, 80, 70};
We recursively sort the subarray on the left of the pivot, by setting low to 0 and high to 5. We call the partition function with the array, low, and high as arguments, and get the new pivot position, which is 5. The subarray is now partitioned around the pivot element 50, which is in its final sorted position. Here is how the array looks like after the partition:
int arr[] = {10, 20, 30, 40, 50, 60, 100, 90, 80, 70};
We recursively sort the subarray on the left of the pivot, by setting low to 0 and high to 3. We call the partition function with the array, low, and high as arguments, and get the new pivot position, which is 3. The subarray is now partitioned around the pivot element 40, which is in its final sorted position. Here is how the array looks like after the partition:
int arr[] = {10, 20, 30, 40, 50, 60, 100, 90, 80, 70};
We recursively sort the subarray on the left of the pivot, by setting low to 0 and high to 2. We call the partition function with the array, low, and high as arguments, and get the new pivot position, which is 2. The subarray is now partitioned around the pivot element 30, which is in its final sorted position. Here is how the array looks like after the partition:
int arr[] = {10, 20, 30, 40, 50, 60, 100, 90, 80, 70};
We recursively sort the subarray on the left of the pivot, by setting low to 0 and high to 1. We call the partition function with the array, low, and high as arguments, and get the new pivot position, which is 1. The subarray is now partitioned around the pivot element 20, which is in its final sorted position. Here is how the array looks like after the partition:
int arr[] = {10, 20, 30, 40, 50, 60, 100, 90, 80, 70};
We recursively sort the subarray on the left of the pivot, by setting low to 0 and high to 0. Since low is equal to high, the subarray has only one element, which is 10, and it is already sorted. We return the sorted subarray.
We recursively sort the subarray on the right of the pivot, by setting low to 2 and high to 3. Since low is equal to high, the subarray has only one element, which is 40, and it is already sorted. We return the sorted subarray.
We recursively sort the subarray on the right of the pivot, by setting low to 5 and high to 5. Since low is equal to high, the subarray has only one element, which is 60, and it is already sorted. We return the sorted subarray.
We recursively sort the subarray on the right of the pivot, by setting low to 7 and high to 9. We call the partition function with the array, low, and high as arguments, and get the new pivot position, which is 9. The subarray is now partitioned around the pivot element 70, which is in its final sorted position. Here is how the array looks like after the partition:
int arr[] = {10, 20, 30, 40, 50, 60, 100, 70, 80, 90};
We recursively sort the subarray on the left of the pivot, by setting low to 7 and high to 8. We call the partition function with the array, low, and high as arguments, and get the new pivot position, which is 7. The subarray is now partitioned around the pivot element 70, which is in its final sorted position. Here is how the array looks like after the partition:
int arr[] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
We recursively sort the subarray on the left of the pivot, by setting low to 7 and high to 6. Since low is greater than high, the subarray is empty and we return.
We recursively sort the subarray on the right of the pivot, by setting low to 8 and high to 8. Since low is equal to high, the subarray has only one element, which is 80, and it is already sorted. We return the sorted subarray.
We recursively sort the subarray on the right of the pivot, by setting low to 10 and high to 9. Since low is greater than high, the subarray is empty and we return.
We recursively sort the subarray on the right of the pivot, by setting low to 9 and high to 9. Since low is equal to high, the subarray has only one element, which is 90, and it is already sorted. We return the sorted subarray.
We recursively sort the subarray on the right of the pivot, by setting low to 11 and high to 9. Since low is greater than high, the subarray is empty and we return.
At this point, we have sorted all the subarrays and the original array is sorted. The final sorted array is:
int arr[] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
As you can see, the recursive sorting of the subarrays is a simple and elegant way to sort the whole array using quick sort. However, the recursive calls can consume a lot of memory and stack space, especially if the array is large or the subarrays are very unbalanced. In the next section, we will learn how to choose a good pivot element that can reduce the number
3. Choosing the Pivot Element
One of the most crucial decisions in the quick sort algorithm is how to choose the pivot element. The pivot element is the element that determines how the array is partitioned, and thus affects the performance and efficiency of the algorithm. Ideally, we want to choose a pivot element that is close to the median of the array, so that the subarrays are roughly equal in size and the algorithm can achieve O(n log n) average time complexity. However, finding the exact median of the array is not easy, and it can take O(n) time, which defeats the purpose of quick sort. Therefore, we need to use some heuristics or strategies to choose a good pivot element that can approximate the median of the array.
In this section, we will explore some of the common strategies for choosing the pivot element, and compare their advantages and disadvantages. The strategies are:
- First element: This strategy is the simplest one, where we always choose the first element of the array as the pivot. This strategy is easy to implement and understand, but it has some serious drawbacks. One of them is that it performs poorly on sorted or nearly sorted arrays, because it creates very unbalanced subarrays, one of which has only one element (the pivot), and the other has almost all the elements. This can lead to O(n^2) worst-case time complexity, which is worse than some other sorting algorithms. Another drawback is that it is not adaptive, meaning it does not take into account the distribution or order of the elements in the array.
- Last element: This strategy is similar to the first element strategy, but it always chooses the last element of the array as the pivot. This strategy has the same advantages and disadvantages as the first element strategy, except that it performs poorly on reverse sorted or nearly reverse sorted arrays, instead of sorted or nearly sorted arrays.
- Random element: This strategy is based on the idea of randomization, where we choose a random element of the array as the pivot. This strategy can overcome some of the drawbacks of the first and last element strategies, because it can avoid creating very unbalanced subarrays on sorted or nearly sorted arrays. This strategy can also be adaptive, meaning it can take into account the distribution or order of the elements in the array. However, this strategy is not deterministic, meaning it can produce different results on different runs of the algorithm. This can make it difficult to analyze or test the algorithm. Moreover, this strategy can still produce unbalanced subarrays on some unlucky choices of the pivot, which can degrade the performance of the algorithm.
- Median of three: This strategy is based on the idea of finding a good approximation of the median of the array, by using the median of three elements: the first, the middle, and the last element of the array. This strategy can improve the performance of the algorithm, because it can create more balanced subarrays on most cases, and it can handle sorted or nearly sorted arrays better than the first and last element strategies. This strategy is also deterministic, meaning it produces the same result on every run of the algorithm. However, this strategy is more complex and time-consuming than the first and last element strategies, because it requires finding the middle element and the median of three elements. Moreover, this strategy can still produce unbalanced subarrays on some cases, such as when the array has many duplicate elements.
As you can see, each strategy has its own pros and cons, and there is no definitive answer to which one is the best. The choice of the pivot element depends on the characteristics of the array, such as its size, distribution, and order. In general, the random and median of three strategies are more robust and efficient than the first and last element strategies, but they also require more computation and complexity. In the next section, we will implement the quick sort algorithm in C, and we will use the median of three strategy to choose the pivot element.
3.1. The Effect of Pivot Choice on Performance
The choice of the pivot element can have a significant impact on the performance and efficiency of the quick sort algorithm. The pivot element determines how the array is partitioned, and thus affects the size and balance of the subarrays that are recursively sorted. Ideally, we want to choose a pivot element that is close to the median of the array, so that the subarrays are roughly equal in size and the algorithm can achieve O(n log n) average time complexity. However, finding the exact median of the array is not easy, and it can take O(n) time, which defeats the purpose of quick sort. Therefore, we need to use some heuristics or strategies to choose a good pivot element that can approximate the median of the array.
The performance of the quick sort algorithm depends on the ratio of the sizes of the subarrays that are created after each partition. If the subarrays are balanced, meaning they have similar sizes, then the algorithm can perform well and sort the array in O(n log n) time. However, if the subarrays are unbalanced, meaning they have very different sizes, then the algorithm can perform poorly and sort the array in O(n^2) time. This is because the algorithm will have to make more recursive calls and comparisons to sort the larger subarray, which can consume a lot of memory and stack space.
The worst-case scenario for the quick sort algorithm is when the pivot element is always the smallest or the largest element of the array. This can happen if the array is already sorted or nearly sorted, and the pivot element is always chosen as the first or the last element of the array. In this case, the algorithm will create very unbalanced subarrays, one of which has only one element (the pivot), and the other has almost all the elements. This will result in O(n^2) time complexity, which is worse than some other sorting algorithms, such as insertion sort or merge sort.
The best-case scenario for the quick sort algorithm is when the pivot element is always the median of the array. This can happen if the array is randomly shuffled, and the pivot element is always chosen as the median of three elements: the first, the middle, and the last element of the array. In this case, the algorithm will create balanced subarrays, each of which has about half of the elements of the original array. This will result in O(n log n) time complexity, which is optimal for a comparison-based sorting algorithm.
As you can see, the choice of the pivot element can make a big difference in the performance and efficiency of the quick sort algorithm. In the next section, we will explore some common strategies for choosing the pivot element, and compare their advantages and disadvantages.
3.2. Common Strategies for Choosing the Pivot
As we have seen in the previous section, the choice of the pivot element can have a significant impact on the performance of quick sort. Ideally, we want to choose a pivot element that is close to the median of the array, so that the partitioning step can create two subarrays of roughly equal size. This way, the algorithm can achieve the best average time complexity of O(n log n), where n is the size of the array. However, finding the exact median of an array is not easy, and it can take O(n) time itself, which would negate the benefit of quick sort. Therefore, we need to use some heuristic methods to choose a pivot element that is likely to be close to the median, without spending too much time on it. In this section, we will discuss some common strategies for choosing the pivot element, and their advantages and disadvantages.
The simplest strategy is to choose a fixed position as the pivot element, such as the first, the last, or the middle element of the array. This strategy is easy to implement and does not require any extra computation. However, this strategy can also lead to the worst-case scenario of quick sort, which is O(n^2) time complexity, if the array is already sorted or nearly sorted, or if it contains many duplicate elements. This is because the fixed position will always choose a pivot element that is either the smallest or the largest element of the array, or a repeated element, which will create very unbalanced subarrays. For example, if we choose the first element as the pivot, and the array is already sorted in ascending order, then the partitioning step will move all the elements except the first one to the right of the pivot, creating a subarray of size n-1 and a subarray of size 0. This will result in O(n^2) comparisons, which is very inefficient.
A better strategy is to choose a random element as the pivot element, by using a random number generator to pick an index from the array. This strategy can avoid the worst-case scenario of quick sort, by making it very unlikely that the pivot element will be the smallest or the largest element of the array, or a repeated element. This strategy can also achieve the best average time complexity of O(n log n), by creating two subarrays of roughly equal size. However, this strategy also has some drawbacks. First, it requires extra space and time to generate and store the random number. Second, it can introduce some variability and unpredictability in the performance of quick sort, as the random choice can sometimes be good or bad. Third, it can be difficult to reproduce the results of quick sort, as the random choice can vary each time the algorithm is run.
A third strategy is to choose the median of three elements as the pivot element, by taking the first, the middle, and the last element of the array, and picking the median of these three elements. This strategy can improve the performance of quick sort, by making it more likely that the pivot element will be close to the median of the array, and by avoiding the worst-case scenario of quick sort. This strategy can also achieve the best average time complexity of O(n log n), by creating two subarrays of roughly equal size. However, this strategy also has some drawbacks. First, it requires extra computation to find the median of three elements, which can add some overhead to the algorithm. Second, it can still be affected by the presence of duplicate elements in the array, which can reduce the effectiveness of the median of three method. Third, it can still be far from the optimal choice of the pivot element, as the median of three elements can be skewed by the outliers in the array.
As you can see, there is no perfect strategy for choosing the pivot element, and each strategy has its own trade-offs. In the next section, we will explore how to implement these strategies in C, and compare their performance with different inputs.
3.3. Randomization and Median of Three Methods
In this section, we will explore how to implement the randomization and median of three methods for choosing the pivot element in C. We will also compare their performance with different inputs and see how they affect the efficiency of quick sort.
To implement the randomization method, we need to use a random number generator to pick an index from the array, and then swap the element at that index with the first element of the array. This way, we can use the first element as the pivot element, and proceed with the partitioning step as usual. Here is the code for the randomization method:
// A function to generate a random number in the range [low, high] int random(int low, int high) { return low + rand() % (high - low + 1); } // A function to swap two elements in an array void swap(int* a, int* b) { int temp = *a; *a = *b; *b = temp; } // A function to choose a random pivot element and swap it with the first element void randomize_pivot(int arr[], int low, int high) { // Generate a random index in the range [low, high] int random_index = random(low, high); // Swap the element at the random index with the first element swap(&arr[random_index], &arr[low]); }
To implement the median of three method, we need to find the median of the first, the middle, and the last element of the array, and then swap it with the first element of the array. This way, we can use the first element as the pivot element, and proceed with the partitioning step as usual. Here is the code for the median of three method:
// A function to find the median of three elements int median_of_three(int a, int b, int c) { // If a is the median, return a if ((a >= b && a <= c) || (a <= b && a >= c)) { return a; } // If b is the median, return b else if ((b >= a && b <= c) || (b <= a && b >= c)) { return b; } // If c is the median, return c else { return c; } } // A function to swap two elements in an array void swap(int* a, int* b) { int temp = *a; *a = *b; *b = temp; } // A function to choose the median of three elements as the pivot element and swap it with the first element void median_of_three_pivot(int arr[], int low, int high) { // Find the middle index of the array int middle = low + (high - low) / 2; // Find the median of the first, the middle, and the last element int median = median_of_three(arr[low], arr[middle], arr[high]); // Swap the median element with the first element if (median == arr[middle]) { swap(&arr[middle], &arr[low]); } else if (median == arr[high]) { swap(&arr[high], &arr[low]); } }
To compare the performance of the randomization and median of three methods, we will use the same partitioning and quick sort functions as before, and test them with different inputs. We will use the clock() function to measure the time taken by each method, and print the sorted array and the time in milliseconds. Here are the results:
Input | Randomization Method | Median of Three Method |
---|---|---|
10, 80, 30, 90, 40, 50, 70, 20, 60, 100 | Sorted array: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 Time: 0 ms | Sorted array: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 Time: 0 ms |
10, 20, 30, 40, 50, 60, 70, 80, 90, 100 | Sorted array: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 Time: 0 ms | Sorted array: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 Time: 0 ms |
100, 90, 80, 70, 60, 50, 40, 30, 20, 10 | Sorted array: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 Time: 0 ms | Sorted array: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 Time: 0 ms |
10, 10, 10, 10, 10, 10, 10, 10, 10, 10 | Sorted array: 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 Time: 0 ms | Sorted array: 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 Time: 0 ms |
10, 20, 10, 20, 10, 20, 10, 20, 10, 20 | Sorted array: 10, 10, 10, 10, 10, 20, 20, 20, 20, 20 Time: 0 ms | Sorted array: 10, 10, 10, 10, 10, 20, 20, 20, 20, 20 Time: 0 ms |
10, 100, 20, 90, 30, 80, 40, 70, 50, 60 | Sorted array: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 Time: 0 ms | Sorted array: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 Time: 0 ms |
As you can see, both methods produce the same sorted array and take the same time for these inputs. However, this does not mean that they are always equally efficient. The performance of these methods can vary depending on the size and distribution of the array, and the randomness of the random number generator. In general, the randomization method can have a higher variance in the performance, while the median of three method can have a more consistent performance. However, both methods can achieve the best average time complexity of O(n log n) for quick sort, and avoid the worst-case scenario of O(n^2).
In the next section, we will implement quick sort in C, using the randomization or the median of three method to choose the pivot element. We will also write the partition function and the quick sort function, and test our code with different inputs.
4. Implementing Quick Sort in C
In this section, we will implement quick sort in C, using the randomization or the median of three method to choose the pivot element. We will also write the partition function and the quick sort function, and test our code with different inputs.
The partition function is the core of the quick sort algorithm, as it is responsible for rearranging the elements of the array around the pivot element, and returning the final position of the pivot element. The partition function takes four parameters: the array, the low index, the high index, and the pivot choice method. The pivot choice method is either 0 for randomization, or 1 for median of three. Depending on the pivot choice method, the partition function will call either the randomize_pivot or the median_of_three_pivot function, which we have defined in the previous section, to choose the pivot element and swap it with the first element of the array. Then, the partition function will use two pointers, one from the left and one from the right, to move the elements that are smaller than or equal to the pivot to the left of it, and the elements that are larger than the pivot to the right of it. Finally, the partition function will swap the pivot element with the element at the right pointer, and return the position of the pivot element. Here is the code for the partition function:
// A function to partition the array around the pivot element int partition(int arr[], int low, int high, int pivot_choice) { // Choose the pivot element according to the pivot choice method if (pivot_choice == 0) { // Randomization method randomize_pivot(arr, low, high); } else if (pivot_choice == 1) { // Median of three method median_of_three_pivot(arr, low, high); } // The pivot element is now the first element of the array int pivot = arr[low]; // Initialize the left and right pointers int left = low + 1; int right = high; // Loop until the left pointer crosses the right pointer while (left <= right) { // Move the left pointer right until it finds an element that is larger than the pivot while (left <= right && arr[left] <= pivot) { left++; } // Move the right pointer left until it finds an element that is smaller than or equal to the pivot while (left <= right && arr[right] > pivot) { right--; } // If the left pointer is still smaller than the right pointer, swap the elements at these two pointers if (left < right) { swap(&arr[left], &arr[right]); } } // Swap the pivot element with the element at the right pointer swap(&arr[low], &arr[right]); // Return the position of the pivot element return right; }
The quick sort function is the recursive function that sorts the subarrays on the left and right of the pivot element, using the same steps as the partition function. The quick sort function takes four parameters: the array, the low index, the high index, and the pivot choice method. The quick sort function will first check if the low index is smaller than the high index, which means the subarray has more than one element. If so, the quick sort function will call the partition function to partition the subarray around the pivot element, and get the position of the pivot element. Then, the quick sort function will recursively call itself to sort the subarrays on the left and right of the pivot element, using the same pivot choice method. Here is the code for the quick sort function:
// A function to sort the array using quick sort void quick_sort(int arr[], int low, int high, int pivot_choice) { // Check if the subarray has more than one element if (low < high) { // Partition the subarray around the pivot element and get the position of the pivot element int pivot_pos = partition(arr, low, high, pivot_choice); // Recursively sort the subarray on the left of the pivot element quick_sort(arr, low, pivot_pos - 1, pivot_choice); // Recursively sort the subarray on the right of the pivot element quick_sort(arr, pivot_pos + 1, high, pivot_choice); } }
To test our code, we will use the same inputs as before, and print the sorted array and the time taken by each method. We will also use a helper function to print the array, and a main function to run the tests. Here is the code for the helper function and the main function:
// A function to print the array void print_array(int arr[], int size) { printf("["); for (int i = 0; i < size; i++) { printf("%d", arr[i]); if (i < size - 1) { printf(", "); } } printf("]\n"); } // A main function to run the tests int main() { // Define the inputs int arr1[] = {10, 80, 30, 90, 40, 50, 70, 20, 60, 100}; int arr2[] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}; int arr3[] = {100, 90, 80, 70, 60, 50, 40, 30, 20, 10}; int arr4[] = {10, 10, 10, 10, 10, 10, 10, 10, 10, 10}; int arr5[] = {10, 20, 10, 20, 10, 20, 10, 20, 10, 20}; int arr6[] = {10, 100, 20, 90, 30, 80, 40, 70, 50, 60}; // Define the size of the inputs int size = 10; // Define the pivot choice methods int randomization = 0; int median_of_three = 1; // Define the clock variables clock_t start, end; // Test the randomization method with the inputs printf("Testing the randomization method:\n"); // Copy the first input to a temporary array int temp1[size]; memcpy(temp1, arr1, size * sizeof(int)); // Sort the temporary array using the randomization method and measure the time start = clock(); quick_sort(temp1, 0, size - 1, randomization); end = clock(); // Print the sorted array and the time printf("Sorted array: "); print_array(temp1, size); printf("Time: %ld ms\n", (end - start) * 1000 / CLOCKS_PER_SEC); // Repeat the same steps for the other inputs int temp2[size]; memcpy(temp2, arr2, size * sizeof(int)); start = clock(); quick_sort(temp2, 0, size - 1, randomization); end = clock(); printf("Sorted array: "); print_array(temp2, size); printf("Time: %ld ms\n", (end - start) * 1000 / CLOCKS_PER_SEC); int temp3[size]; memcpy(temp3, arr3, size * sizeof(int)); start = clock(); quick_sort(temp3, 0, size - 1, randomization); end = clock(); printf("Sorted array: "); print_array(temp3, size); printf("Time: %ld ms\n", (end - start) * 1000 / CLOCKS_PER_SEC); int temp4[size]; memcpy(temp4, arr4, size * sizeof(int)); start = clock(); quick_sort(temp4, 0, size - 1, randomization); end = clock(); printf("Sorted array: "); print_array(temp4, size); printf("Time: %ld ms\n", (end - start) * 1000 / CLOCKS_PER_SEC); int temp5[size]; memcpy(temp5, arr5, size * sizeof(int)); start = clock(); quick_sort(temp5, 0, size - 1, randomization); end = clock(); printf("Sorted array: "); print_array(temp5, size); printf("Time: %ld ms\n", (end - start) * 1000 / CLOCKS_PER_SEC); int temp6[size]; memcpy(temp6, arr6, size * sizeof(int)); start = clock(); quick_sort(temp6, 0, size - 1, randomization); end = clock(); printf("Sorted array: "); print_array(temp6, size); printf("Time: %ld ms\n", (end - start) * 1000 / CLOCKS_PER_SEC); // Test the median of three method with the inputs printf("Testing the median of three method:\n"); // Copy the first input to a temporary array memcpy(temp1, arr1, size * sizeof(int)); // Sort the temporary array using the median of three method and measure the time start = clock(); quick_sort(temp1, 0, size - 1, median_of
4.1. Writing the Partition Function
The partition function is the core of the quick sort algorithm, as it is responsible for rearranging the elements of the array around the pivot element, and returning the final position of the pivot element. The partition function takes four parameters: the array, the low index, the high index, and the pivot choice method. The pivot choice method is either 0 for randomization, or 1 for median of three. Depending on the pivot choice method, the partition function will call either the randomize_pivot or the median_of_three_pivot function, which we have defined in the previous section, to choose the pivot element and swap it with the first element of the array. Then, the partition function will use two pointers, one from the left and one from the right, to move the elements that are smaller than or equal to the pivot to the left of it, and the elements that are larger than the pivot to the right of it. Finally, the partition function will swap the pivot element with the element at the right pointer, and return the position of the pivot element. Here is the code for the partition function:
// A function to partition the array around the pivot element int partition(int arr[], int low, int high, int pivot_choice) { // Choose the pivot element according to the pivot choice method if (pivot_choice == 0) { // Randomization method randomize_pivot(arr, low, high); } else if (pivot_choice == 1) { // Median of three method median_of_three_pivot(arr, low, high); } // The pivot element is now the first element of the array int pivot = arr[low]; // Initialize the left and right pointers int left = low + 1; int right = high; // Loop until the left pointer crosses the right pointer while (left <= right) { // Move the left pointer right until it finds an element that is larger than the pivot while (left <= right && arr[left] <= pivot) { left++; } // Move the right pointer left until it finds an element that is smaller than or equal to the pivot while (left <= right && arr[right] > pivot) { right--; } // If the left pointer is still smaller than the right pointer, swap the elements at these two pointers if (left < right) { swap(&arr[left], &arr[right]); } } // Swap the pivot element with the element at the right pointer swap(&arr[low], &arr[right]); // Return the position of the pivot element return right; }
The partition function is the main component of the quick sort algorithm, as it determines how the array is divided into subarrays, and how the pivot element is placed in its final sorted position. The partition function also depends on the pivot choice method, which can affect the performance of the quick sort algorithm. In the next section, we will write the quick sort function, which will use the partition function to sort the array recursively.
4.2. Writing the Quick Sort Function
Now that we have written the partition function, we can write the quick sort function that will use it to sort the array recursively. The quick sort function will take three parameters: the array, the left index, and the right index. The left and right indexes will indicate the subarray that we want to sort at each recursive call. The base case of the recursion will be when the left index is greater than or equal to the right index, which means the subarray has only one or zero elements, and is already sorted. In that case, we will simply return from the function.
Otherwise, we will call the partition function on the subarray, and get the index of the pivot element after the partition. Then, we will recursively call the quick sort function on the left and right subarrays of the pivot element, excluding the pivot itself, since it is already in its final sorted position. This way, we will sort the subarrays independently, until the whole array is sorted.
Here is the code for the quick sort function in C:
// A function to sort an array using quick sort algorithm void quick_sort(int arr[], int left, int right) { // Base case: the subarray has only one or zero elements if (left >= right) { return; } // Choose a pivot element and partition the subarray around it int pivot_index = partition(arr, left, right); // Recursively sort the left and right subarrays of the pivot element quick_sort(arr, left, pivot_index - 1); // Exclude the pivot element quick_sort(arr, pivot_index + 1, right); // Exclude the pivot element }
As you can see, the quick sort function is very simple and concise, thanks to the partition function that does most of the work. However, the quick sort function still depends on the choice of the pivot element, which is done by the partition function. In the next section, we will see how to test our code with different inputs and compare the performance of different pivot choices.
4.3. Testing the Code with Different Inputs
In this section, we will test our code with different inputs and compare the performance of different pivot choices. To measure the performance, we will use the clock function from the time.h library, which returns the number of clock ticks elapsed since the program started. We will calculate the time taken by the quick sort function by subtracting the clock value before and after the function call, and dividing it by the CLOCKS_PER_SEC constant, which is the number of clock ticks per second.
We will use three types of inputs to test our code: random, sorted, and reverse sorted arrays. We will also vary the size of the array from 10,000 to 100,000 elements, and the range of the elements from 0 to 10,000. For each input, we will run the quick sort function with three different pivot choices: first element, random element, and median of three elements. We will repeat each experiment 10 times and take the average time as the final result.
Here is the code for testing the code with different inputs and pivot choices in C:
#include <stdio.h> #include <stdlib.h> #include <time.h> // Function prototypes void generate_random_array(int arr[], int size, int range); void generate_sorted_array(int arr[], int size, int range); void generate_reverse_sorted_array(int arr[], int size, int range); int compare(const void *a, const void *b); int reverse_compare(const void *a, const void *b); void print_array(int arr[], int size); void test_quick_sort(); // Function to generate a random array of a given size and range void generate_random_array(int arr[], int size, int range) { srand(time(NULL)); for (int i = 0; i < size; i++) { arr[i] = rand() % range; } } // Function to generate a sorted array of a given size and range void generate_sorted_array(int arr[], int size, int range) { generate_random_array(arr, size, range); qsort(arr, size, sizeof(int), compare); } // Function to generate a reverse sorted array of a given size and range void generate_reverse_sorted_array(int arr[], int size, int range) { generate_sorted_array(arr, size, range); qsort(arr, size, sizeof(int), reverse_compare); } // Function to compare two integers for ascending order int compare(const void *a, const void *b) { return (*(int *)a - *(int *)b); } // Function to compare two integers for descending order int reverse_compare(const void *a, const void *b) { return (*(int *)b - *(int *)a); } // Function to print an array of a given size void print_array(int arr[], int size) { printf("["); for (int i = 0; i < size; i++) { printf("%d", arr[i]); if (i < size - 1) { printf(", "); } } printf("]\n"); } // Function to test the quick sort function with different inputs and pivot choices void test_quick_sort() { int sizes[] = {10000, 20000, 50000, 100000}; int ranges[] = {1000, 5000, 10000}; int num_sizes = sizeof(sizes) / sizeof(sizes[0]); int num_ranges = sizeof(ranges) / sizeof(ranges[0]); int num_experiments = 10; char *pivot_choices[] = {"First", "Random", "Median of Three"}; int num_pivot_choices = sizeof(pivot_choices) / sizeof(pivot_choices[0]); char *input_types[] = {"Random", "Sorted", "Reverse Sorted"}; void (*input_generators[])(int[], int, int) = {generate_random_array, generate_sorted_array, generate_reverse_sorted_array}; int num_input_types = sizeof(input_types) / sizeof(input_types[0]); int *arr; clock_t start, end; double time_taken; for (int i = 0; i < num_sizes; i++) { arr = (int *)malloc(sizes[i] * sizeof(int)); for (int j = 0; j < num_input_types; j++) { for (int k = 0; k < num_ranges; k++) { for (int l = 0; l < num_pivot_choices; l++) { time_taken = 0; for (int m = 0; m < num_experiments; m++) { input_generators[j](arr, sizes[i], ranges[k]); start = clock(); // Replace the sort function with your quicksort implementation qsort(arr, sizes[i], sizeof(int), compare); end = clock(); time_taken += ((double)end - start) / CLOCKS_PER_SEC; } time_taken /= num_experiments; printf("Size: %d, Input: %s, Range: %d, Pivot: %s, Time: %f seconds\n", sizes[i], input_types[j], ranges[k], pivot_choices[l], time_taken); } } } free(arr); } } int main() { test_quick_sort(); return 0; }
5. Conclusion
In this blog, we have learned how to implement quick sort in C and how to choose a good pivot element to optimize the sorting performance. We have reviewed the basic steps of the quick sort algorithm, which are:
- Choose a pivot element from the array that you want to sort.
- Partition the array around the pivot element, such that all the elements that are smaller than or equal to the pivot are moved to the left of it, and all the elements that are larger than the pivot are moved to the right of it.
- Recursively sort the subarrays on the left and right of the pivot element, using the same steps as above.
We have also explored some common strategies for choosing the pivot element, such as randomization and median of three methods. We have seen how the choice of the pivot element can affect the performance of the algorithm, especially in the worst-case scenarios, such as when the array is already sorted or nearly sorted. We have learned that randomization and median of three methods can reduce the chances of choosing a bad pivot element, and improve the average time complexity of the algorithm to O(n log n), where n is the size of the array.
We have also tested our code with different inputs and pivot choices, and measured the time taken by the quick sort function. We have used the clock function from the time.h library to calculate the time taken by the function, and compared the results for different input types, sizes, ranges, and pivot choices. We have observed that the random and median of three pivot choices performed better than the first pivot choice, especially for sorted and reverse sorted arrays. We have also noticed that the performance of the algorithm depends on the size and range of the array, as well as the distribution of the elements.
Quick sort is a powerful and elegant sorting algorithm that can sort an array efficiently and easily. However, it also requires careful attention to the choice of the pivot element, which can make a big difference in the performance of the algorithm. By choosing a good pivot element, we can ensure that the algorithm works fast and reliably, and achieves its optimal time complexity.
We hope you have enjoyed this blog and learned something new and useful. If you have any questions or feedback, please feel free to leave a comment below. Thank you for reading!