Exercise 3: Mergesort

Objectives

Assignment

We return to our implementation of heuristics for the Travelling Salesperson Problem. Instead of giving the coordinates of the cities in the input file, we now have a cities module that includes a list of cities and their coordinates and a function find_city that, given the code for a city, returns its coordinates. And instead of storing cities' coordinates in separate arrays for the latitudes and longitudes, we store them in location structs defined in the location module.

Code

Modify your tsp.c program to use the cities and location modules. The cities module must be initialized by calling initialize_city_database at the beginning of main.

Refactor (rewrite and/or reorganize to improve without changing the behavior) main and read_input so that 1) instead of having separate arrays to hold latitudes and longitudes, there is a single array that holds a location struct for each city; and 2) instead of reading latitudes and longitudes from the input file, read_input calls find_city from the cities module to get the coordinates and save them in the array of locations. You will also have to change the code that initializes the array of distances to pass the appropriate locations from the array of locations to the location_distance function from the location module. (You may optionally take this a step further and store all the data in an array of city structs as defined in cities.h, but you will need to use dynamic memory allocation to make space for the strings inside the structs.)

At this point, your program should run as it did before. But the array that holds the city codes and their locations is unsorted, and so find_city has to use a slow sequential search in order to find each city. If that array were sorted, we could use binary search in find_city.

So modify initialize_cities_database so that it sorts the cities array. You can pattern your code off of the implementation in the reading, modifying it so it works with arrays of location structs instead of ints. You can use statically allocated temporary arrays instead of dynamically allocated arrays.

You may want to test your merge and merge_sort functions independently of the rest of the program. You can use something like the code shown below in the "Unit Tests" section to test merge. We leave it to you to write a unit test for merge_sort if necessary.

The sequential search in find_city will still work with the array after it has been sorted by merege_sort. But change it to implement binary search for a more efficient solution.

Unit Tests and Efficiency Tests

In addition to testing the TSP executable, we will test some of the functions individually using the unit tests in /c/cs223/hw3/Required/cities_unit.c. The starter makefile in /c/cs223/hw3/Optional/makefile builds a second executable called Unit that runs the unit tests. You can examine cities_unit.c to see what the unit tests do.

The Unit executable also tests the efficiency of your sort and search functions (initialize_city_database and find_city to determine whether they run in worst-case time $O(n \log n)$ and $O(\log n)$ respectively. The efficiency tests are empirical and so may give both false positives (your code passes the tests but does not run within the asymptotic time bound; perhaps you have exceptionally good constants for an $O(n^2)$ implementation when the goal is $O(n \log n)$) and false negatives (your code fails the tests but does in fact meet the asymptotic time bound; perhaps you have an $O(n \log n)$ implementation with exceptionally high constants). We will allow ourselves to be fooled by false positives. But you should fix false negatives because constants do matter – if your solution runs in at most $100n \log n$ instructions but it is possible to reduce that to $10n \log n$ with reasonable effort, then you should make that effort!

Although the test scripts do not have a unit test for merge and merge_sort, you may find it helpful to have such tests (and also for any other functions you write). You can start with the following test for merge and modify it for other test cases.

void test_merge()
{
  // locations don't matter for testing merge, so using 0.0
  city us[] = {
    {"ABE", {0.0, 0.0}},
    {"CHX", {0.0, 0.0}},
    {"DEN", {0.0, 0.0}},
    {"TLH", {0.0, 0.0}}
  };
  int us_count = sizeof(us) / sizeof(city);
  city asia[] = {
    {"DAC", {0.0, 0.0}},
    {"DAM", {0.0, 0.0}},
    {"NRT", {0.0, 0.0}},
    {"VVO", {0.0, 0.0}}
  };
  int asia_count = sizeof(asia) / sizeof(city);
  int total_count = us_count + asia_count;
  
  city output[total_count];

  // EXPECTED: both lists output in sorted order
  merge(us_count, us, asia_count, asia, output);
  printf("MERGED US AND ASIA\n");
  for (int i = 0; i < total_count; i++)
    {
      printf("%s\n", output[i].name);
    }

  merge(asia_count, asia, us_count, us, output);
  printf("MERGED ASIA AND US\n");
  for (int i = 0; i < total_count; i++)
    {
      printf("%s\n", output[i].name);
    }
}

Examples and Output

If sea_6_dal.in contains
8
SEA
YVR
MCI
STL
BWI
DCA
IAH
DAL
then the output of running your program should be
(base) [jrg94@newt code]$ ./TSP -given sea_6_dal.in
  -given:   9306.22 SEA YVR MCI STL BWI DCA IAH DAL SEA
(base) [jrg94@newt code]$ ./TSP -nearest sea_6_dal.in
-nearest:  10067.11 SEA YVR MCI STL DAL IAH DCA BWI SEA
(base) [jrg94@newt code]$ ./TSP -insert sea_6_dal.in
 -insert:   9320.21 SEA DAL IAH BWI DCA STL MCI YVR SEA

Submissions

Submit your modified tsp.c and cities.c files, any other files you created, a makefile with the TSP executable as the default target and Unit as an additional target, and your log as assignment number 3. You need not submit the location module, the cities.h header file, or the cities_unit.c unit tests (and if you do, they will be ignored).