Assignment 5: k-d Tree

Objectives

Introduction

wavyleaf basketgrass Sirex woodwasp
Images from 1) Wikipedia user keisotyo, licensed under CC BY-SA 3.0 and 2) David R. Lance, USDA APHIS PPQ, Bugwood.org

Imagine an invasive species tracking app that allows users to submit the geographic coordinates of the locations of invasive species. Monitoring organizations could then receive alerts when species they are tracking show up in new locations. But when invasive species are present in frequently-visited locations, there will be many reports around those locations, so the alert-generating system should have some way of filtering reports that are near other reports. To do so efficiently, it could make use of a data structure that allows for efficient nearest neighbor queries. When a new report comes in, the app would then query the data structure for the nearest neighbor to the new point among all other points previously recorded. If the distance to the nearest neighbor is above a certain threshold, then the location could be considered a "new" location for the invasive species, and resources could be sent to that new location to attempt to eradicate the invaders.

A kd-tree is a data structure that allows for efficient nearest neighbor queries. A kd-tree is a binary search tree that rotates which coordinate it compares at each level of the tree. So for 2-dimensional data, the root might compare on x-coordinate and then the children of the root would compare on y-coordinate, then grandchildren of the root on x-coordinate again, and so on.

Assignment

Complete the implementation of the functions described in kdtree.h from the /c/cs223/hw5/Required/ directory. The functions define a Set-of-2D-Points ADT with the ability to find the nearest neighbor among points in the set to a given point. Note that in order to keep the math simple, our points will be 2-D Cartesian coordinates rather than geographic coordinates with latitudes and longitudes (although you should consider what would have to change in your implementation to support geographic points, assuming a suitable version of the location module were provided).

The functions should run in the following time bounds (worst-case except as noted).

In these bounds, $n$ is the number of points in the set. Implementations with a $O(n \log n)$ expected time for kdtree_create (under the same assumptions as for kdtree_nearest_neighbor) will receive much more partial credit than $O(n^2)$ worst-case implemetations. All time bounds assume that malloc and free run in $O(1)$ time.

Proper use of your implementation must not result in valgrind errors.

Your implementation should work with the main function in kdtree_unit.c also found in /c/cs223/hw5/Required/ along with the location module. There is a makefile in /c/cs223/hw5/Optional that builds an executable called Unit. The Unit executable runs a number of different unit tests and you can select one by passing its index on the command line. See the code for the unit tests for more details about each one. There is no additional executable to write for this assignment.

Example

[jrg94@rattlesnake ~]$ for N in `seq 1 14`; do echo === TEST $N ===; ./Unit $N | sort -n; done
=== TEST 1 ===
PASSED
=== TEST 2 ===
PASSED
=== TEST 3 ===
PASSED
=== TEST 4 ===
PASSED
=== TEST 5 ===
PASSED
=== TEST 6 ===
=== TEST 7 ===
15.071 -163.957
15.357 -157.807
16.698 -164.924
17.079 -162.024
18.085 -162.655
18.863 -163.994
19.097 -160.178
19.353 -162.740
19.712 -161.068
20.376 -162.999
21.833 -161.960
22.189 -157.489
22.215 -158.356
22.589 -157.028
22.858 -158.387
22.998 -156.598
23.420 -162.370
23.858 -157.537
24.797 -157.050
24.904 -164.680
=== TEST 8 ===
Distance: inf
=== TEST 9 ===
Distance: 0.000000
Point: 22.588670 -157.028378
=== TEST 10 ===
Distance: 6.324555
Point: 1.000000 1.000000
=== TEST 11 ===
Distance: 6.324555
Point: 1.000000 1.000000
=== TEST 12 ===
PASSED
=== TEST 13 ===
Distance: 0.000000
Point: 22.588670 -157.028378
=== TEST 14 ===
PASSED

Submissions

Submit your source code necessary to build the executable using the provided makefile and supporting code. There is no need to submit the provided makefile if you did not change it. You should not submit the files for the location module, kdtree_unit.c, or kdtree.h and you may not change those files. Your makefile can assume that those supporting files have been copied into the current directory, but should not assume they are still in their original locations.