Assignment 5: Quadtree

Objectives

Introduction

wavyleaf basketgrass Sirex woodwasp
Images from 1) Wikipedia user keisotyo, licensed under CC BY-SA 3.0 and 2) David R. Lance, USDA APHIS PPQ, Bugwood.org

Imagine an invasive species tracking app that allows users to submit the geographic coordinates of the locations of invasive species. Monitoring organizations could then receive alerts when species they are tracking show up in new locations. But when invasive species are present in frequently-visited locations, there will be many reports around those locations, so the alert-generating system should have some way of filtering reports that are near other reports. To do so efficiently, it could make use of a data structure that allows for efficient nearest neighbor queries. When a new report comes in, the app would then query the data structure for the nearest neighbor to the new point among all other points previously recorded. If the distance to the nearest neighbor is above a certain threshold, then the location could be considered a "new" location for the invasive species, and resources could be sent to that new location to attempt to eradicate the invaders.

A quadtree is a data structure that allows for efficient nearest neighbor queries in 2 dimensions. A quadtree is a tree in which each node contains a point and has up to four children. In a binary search tree, all the values in a node's left subtree are less than the value in the node, and all the values in the right subtree are greater. But comparing both coordinates of two distinct 2-D points $(x_1, y_1)$ and $(x_2, y_2)$ yields four possible results: $x_2 \le x_1$ and $y_2 \ge y_1$; $x_2 \le x_1$ and $y_2 \lt y_1$; $x_2 \gt x_1$ and $y_2 \ge y_1$, or $x_2 \gt x_1$ and $y_2 \lt y_1$. Those four results correspond to the children of a node in a quadtree: a node's northwest child is the root of a quadtree containing points that have smaller x-coordinates and larger (or equal) y-coordinates; the southwest child is the root of a subtree that contains points with smaller x- and y-coordinates, and similarly for the southeast child and northeast child. Searching for a point and adding a point to a quadtree are then like searching and adding in a binary search tree, except you must determine which of four branches to follow after each comparison.

Assignment

Complete the implementation of the functions described in pointset.h from the /c/cs223/hw5/Required/ directory. The functions define a Set-of-2D-Points ADT with the ability to find the nearest neighbor and $k$ nearest neighbors of a query point among points in the set. Note that in order to keep the math simple, our points will be 2-D Cartesian coordinates rather than geographic coordinates with latitudes and longitudes; there is a module defining a non-opaque point2d struct and related functions in /c/cs223/hw5/Required. (Although you should consider what would have to change in your implementation to support geographic points, assuming a suitable version of the point2d was available.)

The functions should run in the following time bounds.

In addition, there will be time bounds on pointset_nearest_neighbor and pointset_k_nearest set based on experiments with efficient implementations with points uniformly randomly distributed over a rectangular region.

In these bounds, $n$ is the number of points in the set. Implementations with a $O(n \log n)$ expected time for pointsete_create (under the same assumptions about the initial set of points as for pointset_add) will receive much more partial credit than $O(n^2)$ worst-case implemetations. All time bounds assume that malloc and free run in $O(1)$ time.

Proper use of your implementation of pointset must not result in valgrind errors. Your implementation will not be tested with inputs that violate the preconditions of the functions, and you may assume that malloc never fails. There may, however, be tests for which recursive implementations of pointset_add, pointset_contains, pointset_for_each, pointset_nearest_neighbor, and pointset_destroy may run out of stack space.

Your implementation should work with the main function in pointset_unit.c also found in /c/cs223/hw5/Required/ along with the point2d module. Your makefile must use those files and your implementation of the pointset module to build an executable called PointsetUnit. There is no additional executable to write for this assignment, so PointsetUnit is the name of the executable for testit.

Submissions

Submit your source code necessary to build the unit test executable using the given supporting code. You should not submit the files for the point2d module, pointset_unit.c, or pointset.h and you may not change those files. Your makefile should assume that those supporting files have been copied into the current directory, but should not assume they are still in their original locations.