Hands-on: Implementing Merge-sort
June 29, 2018
Now that we have learnt about merge-sort, it’s time to implement it. You can work in any programming language of your choice - Python, C++, Java, or anything else you want. Let’s get started!
Implementing Merge-sort
Python
Below, we provide a short Python code template for you to implement merge-sort.
# merge_sort.py
def merge_sort(array):
# TODO: base case
# TODO: split in two and recurse
return merge(left, right)
def merge(left, right):
merged = list()
ii, jj = 0, 0 # pointers for left and right
# TODO: create merged lists
# check if pointers have reached the end of the list
assert ii == len(left), (ii, len(left))
assert jj == len(right), (jj, len(right))
return merged
array = [5, 10, 8, 7, 3, 7, 6, 12, 2, 7]
result = merge_sort(array)
assert result == [2, 3, 5, 6, 7, 7, 7, 8, 10, 12], result
C++, Java, etc
If you’re not using Python, have your implementation accept test cases from standard input, and print results to standard output.
Input description: The first line contains the number of test cases T. This is followed by T lines. Each line starts with a number N, the size of the array. On the same line, the N numbers of the array are listed one after the other.
Sample input:
2
3 227 198 19
10 458 687 945 378 479 421 33 295 922 840
Output description: Output should contain T lines. Each line should just contain the numbers in sorted order, separated by spaces.
Sample output:
19 198 227
33 295 378 421 458 479 687 840 922 945
Notes
The trickiest part about implementing merge-sort is the merge
function. Specifically, be careful about handling the pointers when one of the segments has been scanned completely, but the other segment’s pointer still has some elements to cover.
Grading your solution
Python
You can use the following file to grade your solution.
# grader.py
from __future__ import print_function
import datetime
import random
from merge_sort import merge_sort
npassed, nfailed = 0, 0
for itest in range(1, 13):
print('Test case:', itest)
nelements = random.randint(int((10**(itest/2.))/2), int(10**(itest/2.)))
print('Array size:', nelements)
array = [random.randint(1, 100*nelements) for _ in range(nelements)]
tic = datetime.datetime.now()
submission = merge_sort(array)
toc = datetime.datetime.now()
answer = sorted(array)
correct = len(submission) == len(answer) and (submission == answer)
if correct:
print('PASSED (Runtime = %.2f seconds)' % (toc-tic).total_seconds())
npassed += 1
else:
print('FAILED')
nfailed += 1
print(array)
print(submission)
print(answer)
print('='*100)
print('TOTAL PASSED: %d. TOTAL FAILED: %d.' % (npassed, nfailed))
Save the file above, and then run the following command:
python grader.py
and you should see the results of running the grader.
C++, Java, etc
You’ll find all the files required to check your solution here: Algorithms - Sorting. In particular, the folder includes 3 files - input
, answer
and grader.py
.
Do the following to check your solution.
{COMMAND TO RUN C++/JAVA CODE} < input > output
python grader.py output answer
Solution
The solution for this assignment can be found here: merge_sort.py. The solution includes two implementations of the merge()
function. The first one is easier to read and understand. The second one, merge_concise()
, is more concise. Either way, the code takes about 5 seconds to run for input sizes around 700,000. Note that if you implemented the solution in C++ / Java, your code should be executing about > 5x faster.