Hands-on: K-sort (with progressive 'hint-by-hint' solution)
July 20, 2018
We’ve just implemented the heap data structure and the heap-sort algorithm. Now, we’ll solve the K-sort problem using heaps. You can work in any programming language of your choice - Python, C++, Java, or anything else you want. Let’s get started!
Problem Statement
We’re given k sorted arrays. We need to combine the sorted arrays into a single sorted array in O(nlogk) time, where n is the total number of elements in all the arrays combined.
Note: The desired complexity is O(nlogk), not O(nlogn).
Constraints: n≤5000000 and k≤50.
IO description
Input description: The first line contains the number of test cases T. Each test case begins with a line with a single number k (number of sorted arrays). Each of the next k lines describes a sorted array. The first number in the line is ni, the number of elements in array i. Which is then followed by ni numbers which make the sorted array.
Sample Input:
2
2
3 16 30 126
2 21 113
4
2 95 95
3 16 30 126
2 21 113
1 42
Output description: Output should contain T lines. Each line should contain the answer for the respective test case - the combined sorted array.
Sample output:
16 21 30 113 126
16 21 30 42 95 95 113 126
Explanation:
The first test case consists of 2-sorted arrays that need to be combined. The second test case consists of 4-sorted arrays that need to be combined.
Grading your solution
You can find all the required files to grade your solution here.
First run your solution with the provided input
{COMMAND TO RUN Python/C++/Java CODE} < input > output
Then, grade your output using the grader and answer provided.
python grader.py output answer
Progressive ‘hint-by-hint’ solution
Only look at the hints if you can’t solve the problem for at-least 20-minutes. Read only one hint that you could not figure out on your own. Then spend at-least 20-minutes trying to solve the problem again.
Hint 1:
We are given k sorted arrays. Somehow, we need to use the sorted-ness of the given arrays to do sorting in O(n log k) instead of O(n log n).
Hint 2:
When we were sorting a general array of size n, we had to add all the elements to a heap, and hence the heap operations took log n time each. Now, it’s possible for us to ensure the total size of the heap is at-most k, so each operation takes time log k.
Hint 3:
We essentially need to perform the merge step from merge-sort, but with k arrays instead of 2. We can use a heap to do this efficiently.
Full solution
Here is the official solution for this assignment. The code executes in ~10 seconds. If you implemented your solution in C++ or Java, you should expect your solution to run at-least 5x faster.
A note on the run-time
The above heap-based solution executes in ~10 seconds.
If we simply combine all the elements into a single array and sort the array using quick-sort, then we get the following run-times.
- Our quick-sort implementation: 47 seconds.
- Our in-place quick-sort implementation: 30 seconds.
- Python’s optimized quick-sort implementation: 6 seconds.
As we can see, Pythons optimized implementation for quick-sort is faster than the optimization we achieved because of O(logk) vs O(logn), however we achieve a significant speed-up when we use unoptimized quick-sort implementation, even if we use in-place quick-sort.