Assignment 2: Travelling Salesperson

Objectives

Introduction

The Travelling Salesperson Problem (TSP) is the problem of, given a list of locations and the distances between every pair of them, determining the shortest tour that starts and ends at the same city and visits each other city exactly once. It is a very common problem in logistics: consider, for example, a delivery driver who needs to make stops at a list of 50 addresses and wants to minimize fuel use along the way by minimizing the total distance travelled while visiting those 50 addresses.

unconnected cities ALB, BDL, HVN, MHT, ORH, PVD

tour ALB-MHT-BDL-ORH-PVD-HVN-ALB tour ALB-ORH-MHT-PVD-HVN-BDL-ALB tour ALB-MHT-PVD-ORH-BDL-HVN-ALB

6 cities and 3 possible tours with total lengths 806km, 721km, and 685km
Maps generated by the Great Circle Mapper -copyright © Karl L. Swartz

TSP is considered to be a hard problem to solve exactly – the decision problem version (determining if there exists a tour with total distance no greater than some upper limit) is NP-complete. NP-complete problems are a set of decision problems for which there is no known efficient (polynomial-time) solution, no known proof that no such polynomial solution exists; furthermore, if someone finds an efficient algorithm for an NP-complete problem, that algorithm could be used to solve all NP-complete problems efficiently.

However, there are some heuristics – simple rules to apply when choosing how to build or refine a tour – that can quickly produce solutions that often good enough for use. For this assignment, you will implement two of those heuristics.

Overview

Write a program called TSP that produces tours of cities by applying various heuristic algorithms.

We will run the TSP program like

(base) [jrg94@tick code]$ ./TSP east_coast_8.dat -given -insert nearest -insert farthest ALB MHT BDL ORH PVD HVN
-given            :       805.78 ALB MHT BDL ORH PVD HVN ALB
-insert nearest   :       721.47 ALB BDL HVN PVD MHT ORH ALB
-insert farthest  :       685.37 ALB MHT ORH PVD HVN BDL ALB
  
where the command line arguments that are passed as the parameters to main are

The heuristics are as follows.

For an example of the -insert algorithms, see the -insert nearest example, and the -insert farthest example.

Details

Command-line Arguments

If main is declared as
  int main(int argc, char* argv[])
then argc is the number of command-line arguments, including the name of the program, and argv is an array of pointers to strings containing the command line arguments. So for the execution of the program above,
(base) [jrg94@tick code]$ ./TSP east_coast_8.dat -given -insert nearest -insert farthest ALB MHT BDL ORH PVD HVN
argc is 13, argv[0] points to the first character of the string ./TSP, argv[1] points to the first character of the string east_coast_8.dat, argv[2] points to the first character of the string -given, and so on until argv[12] points to the first character of the string HVN.

Location data file

Heuristics

City Names

Output

Note that it is possible that correct programs may end up with different orders of cities depending on how they break ties. In may cases, those different orders are really the same tour, but with different starting points and different directions: HVN-BDL-ALB-MHT-ORH-PVD-HVN, HVN-PVD-ORH-MHT-ALB-BDL-HVN, and MHT-ORH-PVD-HVN-BDL-ALB-MHT all have the same total distance and all look the same when drawn on a map. In order to have the same output from programs that broke ties differently, reorder the final tour by finding the location of the first input city in the sequence, starting there and proceeding through the rest of the tour, wrapping around to the beginning once at the end, and returning to the first input city, and then reversing the tour if the city before the return to the starting point came before the second city in the tour in the list of command-line arguments.

For the above three examples, applying those rules will result in HVN-BDL-ALB-MHT-ORH-PVD-HVN if the cities were given in the order HVN ORH ALB BDL MHT PVD on the command line: the first example remains the same since BDL comes before on the command line; the second example is reversed because PVD comes after BDL on the command line; and the third example is rewritten to start with HVN because HVN comes first on the command line (and then is the same as the first example).

Error Messages

For certain invalid inputs (command-line arguments or contents of the input file), exactly one of the following error messages should be output to standard error followed by a single newline (and it does not matter what is written to standard output): When more than one error message is appropriate, output the one that appears first in the list above. For other invalid input, your program must not crash or hang, but the output can be anything or nothing.

Efficiency

For full credit, your program must execute in the following time and space bounds when only a single heuristic is selected. For purposes of the time bounds, we count each call to location_distance, malloc, free, and printf to print a city name as constant time, and a call to qsort as $O(n \log n)$. The space bounds apply to additional space used by your program – the memory occupied by the command-line arguments and the disk space used by the location data file do not count. where $n$ is the number of cities given on the command line, $k$ is the total number of characters in the city names/codes on the command line, and $p$ is the number of characters in the location data file.

Time and space bounds are checked empirically – the grading scripts have CPU time, machine instruction, and or memory limits that may vary with the size of the input. Solutions that meet the asymptotic bounds but have large multiplicative constants will also fail these empirical tests, and in those cases you must reduce your constant factors in order to pass the tests.

Testing

Your program must not produce any errors when run with valgrind, regardless of whether the input is valid or not. Distances of routes between airports (a convenient source of three-character labels for cities) can be checked at gcmap.com. Beware that the distances computed by gcmap may disagree with the expected values calculated from the input files because the databases may choose different points on airport property to represent the airports, and because of slight differences in the model of the surface of the Earth used.

Submissions

Submit your makefile, log file, and source code necessary to build the executable TSP using your makefile. Do not submit the provided location library. Your makefile should assume that the files for the location library have been copied into the current directory; they may not still exist in their original locations when the final grading scripts run.