CSSE 230
SortingRaces Programming Assignment

Overview

For the first part of this assignment, you will implement and test a Binary Heap. See here: heaps.html

In this assignment, you will compare the speeds of various O(n log n) sorting algorithms.

  1. This is a pair assignment. Pairs and repository names are on the schedule page.
  2. You will study the runtimes of six sorting algorithms: three built into Java, and three that you write (or have recently written!). Built-in:
    1. mergesort. Java's mergesort is supposed to be optimized when array is partially sorted.
    2. quicksort. Java's quicksort is an optimized "dual-pivot" version that is supposedly faster than the single pivot version we learned.
    3. bstSort. Here you just insert the items into a binary search tree and then iterate through the tree, copying the data back into the array. (Can you explain why this is O(n log n)? It would be great if we could use your solution to EditorTrees, but EditorTrees don't sort their data like AVL trees do. Please use java's TreeSet instead. It won't be a perfect comparison if the array to be sorted has duplicates (since they sets throw out duplicates), but it will suffice. Be sure to resize the array to account for any duplicates that were thrown out (see FAQ below).
    Ones you provide:
    1. heapsort. done in part 1. Drag and drop BinaryHeap and BinaryHeapTest into your project. If Eclipse gives you an error in BinaryHeapTest, hover your mouse over the first error at the top of that file, and it will give suggestions as to how to fix it (you might need to click the error to see the suggestions). Choose the last option, "Fix Project Setup" and then add JUnit 4 to the build path in the next window.
    2. skiplist sort. Copy your SkipList class into this package, so you can do some minor modifications.
      1. First, implement the public void sort(T[] array) method that inserts all the array elements into your SkipList, and iterates through the skip list, updating the array to be sorted.
      2. Second, make the max height to be O(log(n)), where n is the maximum number of elements you want to sort using this algorithm. (Your elements will probably require a height of 3 * log n with probability 1/n2, so that's a pretty safe choice.) You can figure out this value of the array size you are using (say 10 000 000) and hard code it into your program. Better (although admittedly requiring more changes), let your program calculate the size from the number of elements you are sorting.
      3. Third, recognize that SkipLists have two constructors - one takes a seed parameter for the random number generator, so is completely predictable (for debugging) - use any fixed seed you like. If you omit the seed, it will be random. (Oracle's documentation says, "This constructor sets the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor." Pretty cryptic, but they could use the system time as a seed, for example.)
    3. quicksort. Implement a simple version like we did in class so we can see the effects of Java's optimizations. For simplicity, you can hardcode it for ints. Try to recall it from the concepts you remember from class, but feel free to refer to Weiss or other sources as needed. You may have to modify the given code to handle cases where the pivot repeats.
  3. Test your code using at least four kinds of arrays, so you can investigate sort performance under various conditions.

    1. Random order: We have given you tests for this.
    2. Random order, but guaranteed to have no duplicates. To create such an array, you can create an array such that a[i] = i, and shuffle it. The Collections class has a static method called shuffle that will efficiently and throughly shuffle the elements.
    3. Almost sorted. Start with a sorted array and perform 0.01*n swaps of pairs of elements in random positions (to "slightly" shuffle it). This will introduce some inversions.
    4. Almost sorted, but backwards. Just reverse the array from the previous step.

    Your array size should be large enough so that all of your sorts take at least 25 milliseconds. If you go much smaller than that, it will be hard to tell them apart. You can go much larger for most of the sorts.

Frequently Asked Questions

  1. Do I need to use both of AllBinaryHeapTest and AllBinaryHeapGenericContructorTest? Answer: No. Use the first if your heap implementation used ArrayLists and the second if it uses arrays. Comment out the other one.
  2. Java's TreeSet doesn't allow for duplicates? So what if I get them? Answer: reallocate the size of the array to be the size of the tree (which has no duplicates)so the duplicates are ignored. This doesn't remove emough elements to affact timing significantly (for example, I once had 2 duplicates when testing with n=50000 elements).
  3. The quicksort code you gave us in class doesn't handle the i and j walking out of bounds. What should I do? Answer: Modify the code to handle that case. Note that it doesn't handle the case where the pivot element is duplicated - you'll need to handle that too!
  4. I want to display array ar while debugging, but there is no ar.toString(). Do I need to write my own method to print the array? Answer: No. Instead, call Arrays.toString(ar)

Deliverables and grading outline

  1. (30 pts) Binary Heaps and heapsort, as discussed in that specification. Copy BinaryHeap into this project (you may submit one implementation per pair) and commit it to the repo. 
  2. (20 pts) Code for your other sorts, and your timing code. Each sort you provide (heapsort, skiplistsort, quicksort) should be a method the appropriate class (so BinaryHeap has a sort() method), and committed to the repo. For quicksort, sort() can be static since you don't need an instance of the class.
  3. (20 pts) A writeup giving a 2D table of sort times (sort vs. data). Discuss the performance of each algorithm. Did all of them run fine? Were there any memory issues? Which was fastest for each type of array? Was anything surprising or did you expect this? From what you know, try to explain what you see. Use the document here Analysis.docx as a template.