Hands-on: K-sort (with progressive 'hint-by-hint' solution)
July 20, 2018
We’ve just implemented the heap data structure and the heap-sort algorithm. Now, we’ll solve the K-sort problem using heaps. You can work in any programming language of your choice - Python, C++, Java, or anything else you want. Let’s get started!
Problem Statement
We’re given $k$ sorted arrays. We need to combine the sorted arrays into a single sorted array in O($n \log k$) time, where $n$ is the total number of elements in all the arrays combined.
Note: The desired complexity is O($n \log k$), not O($n \log n$).
Constraints: $n \leq 5000000$ and $k \leq 50$.
IO description
Input description: The first line contains the number of test cases T. Each test case begins with a line with a single number $k$ (number of sorted arrays). Each of the next $k$ lines describes a sorted array. The first number in the line is $n_i$, the number of elements in array $i$. Which is then followed by $n_i$ numbers which make the sorted array.
Sample Input:
2
2
3 16 30 126
2 21 113
4
2 95 95
3 16 30 126
2 21 113
1 42
Output description: Output should contain T lines. Each line should contain the answer for the respective test case - the combined sorted array.
Sample output:
16 21 30 113 126
16 21 30 42 95 95 113 126
Explanation:
The first test case consists of 2-sorted arrays that need to be combined. The second test case consists of 4-sorted arrays that need to be combined.
Grading your solution
You can find all the required files to grade your solution here.
First run your solution with the provided input
{COMMAND TO RUN Python/C++/Java CODE} < input > output
Then, grade your output using the grader and answer provided.
python grader.py output answer
Progressive ‘hint-by-hint’ solution
Only look at the hints if you can’t solve the problem for at-least 20-minutes. Read only one hint that you could not figure out on your own. Then spend at-least 20-minutes trying to solve the problem again.
Hint 1:
We are given k sorted arrays. Somehow, we need to use the sorted-ness of the given arrays to do sorting in O(n log k) instead of O(n log n).
Hint 2:
When we were sorting a general array of size n, we had to add all the elements to a heap, and hence the heap operations took log n time each. Now, it’s possible for us to ensure the total size of the heap is at-most k, so each operation takes time log k.
Hint 3:
We essentially need to perform the merge step from merge-sort, but with k arrays instead of 2. We can use a heap to do this efficiently.
Full solution
Here is the official solution for this assignment. The code executes in ~10 seconds. If you implemented your solution in C++ or Java, you should expect your solution to run at-least 5x faster.
A note on the run-time
The above heap-based solution executes in ~10 seconds.
If we simply combine all the elements into a single array and sort the array using quick-sort, then we get the following run-times.
- Our quick-sort implementation: 47 seconds.
- Our in-place quick-sort implementation: 30 seconds.
- Python’s optimized quick-sort implementation: 6 seconds.
As we can see, Pythons optimized implementation for quick-sort is faster than the optimization we achieved because of O($\log k$) vs O($\log n$), however we achieve a significant speed-up when we use unoptimized quick-sort implementation, even if we use in-place quick-sort.