Jul 20, 2017 by
May 30, 2017 at 12am - Jul 20, 2017 at 11:59pm
about 2 months
This assignment was locked Jul 20, 2017 at 11:59pm.
The instructor is very much interested in offering CMSC420 again in the near future. To that end, he is very interested in new projects! The projects have to be in the form of a library, so paramount unit-testing should be effectuated and presented. Interested parties should agree to an AFL, GNU GPL or other kind of software license which will allow source code sharing with the instructor, for future use in CMSC420. The following are topics of interest to the instructor:
Tries vs Patricia Tries, with a big emphasis on time improvements for search / insertion / delete. Variations between classic Array-based and De la Briandais (list-based implementation of nodes) tries should be possible and tune-able by the user. Possible expansion: comparison with suffix arrays.
String matching with naive shifting algorithm vs KMP vs Boyer-Moore. For some alphabets, can put Rubin-Karp in the mix. Efficiency analysis paramount; can we have any way to reliably test on efficiency with Java across platforms?
Region Quadtrees (RQs) for compressed image description. Both RQ implementations (4-ary tree with collapsible subtrees, binary search tree over the cell's Morton Codes) should be comparable for various operations. Unit tests should be available that test both implementations.
MinHash for document similarity. Testing of data structure on various similarity metrics (Jaccard, weighted Hamming, weighted Euclidean, other semantic metrics from NLP). Unit tests plus qualitative outputs of documents: how are they similar? What are the failure modes of our similarity metric? What of the efficiency of the algorithm?
All projects will have to be implemented in Java and come with a unit testing framework in jUnit4 or 5. No java 8-specific constructs are required, but compatibility with at least Java 7 would be helpful (and probably trivial to achieve).
You've already rated students with this rubric. Any major changes could affect their assessment results.