Assignment 2: Travelling Salesperson
Objectives
- to process command-line arguments
- to use strings
- to use arrays
- to use files
- to use a provided library
Introduction
The Travelling Salesperson Problem (TSP) is the problem of, given a list of locations and the distances between every pair of them, determining the shortest tour that starts and ends at the same city and visits each other city exactly once. It is a very common problem in logistics: consider, for example, a delivery driver who needs to make stops at a list of 50 addresses and wants to minimize fuel use along the way by minimizing the total distance travelled while visiting those 50 addresses.
Maps generated by the Great Circle Mapper -copyright © Karl L. Swartz
TSP is considered to be a hard problem to solve exactly – the decision problem version (determining if there exists a tour with total distance no greater than some upper limit) is NP-complete. NP-complete problems are a set of decision problems for which there is no known efficient (polynomial-time) solution, no known proof that no such polynomial solution exists; furthermore, if someone finds an efficient algorithm for an NP-complete problem, that algorithm could be used to solve all NP-complete problems efficiently.
However, there are some heuristics – simple rules to apply when choosing how to build or refine a tour – that can quickly produce solutions that often good enough for use. For this assignment, you will implement two of those heuristics.
Overview
Write a program calledTSP
that produces tours of
cities by applying various heuristic algorithms.
We will run the TSP
program like
(base) [jrg94@tick code]$ ./TSP east_coast_8.dat -given -insert nearest -insert farthest ALB MHT BDL ORH PVD HVN -given : 805.78 ALB MHT BDL ORH PVD HVN ALB -insert nearest : 721.47 ALB BDL HVN PVD MHT ORH ALB -insert farthest : 685.37 ALB MHT ORH PVD HVN BDL ALBwhere the command line arguments that are passed as the parameters to
main
are
- the name of a file containing the locations
(latitude and longitude) of
possible cities in a format like
ALB,42.74916667,-73.80194444 BDL,41.93916667,-72.68333333 BWI,39.17527778,-76.66833333 HVN,41.26388889,-72.88694444 MHT,42.93277778,-71.43583333 ORH,42.26722222,-71.87555556 PVD,41.72388889,-71.42833333 SAV,32.1275,-81.20222222
- a list of heuristics to apply; and
- the list of cities to include in the tour.
The heuristics are as follows.
-
-given
, which creates the tour by starting at the first city on the command line and continuing with the cities in the order they appear on the command line, and then returning to the first city. -
-insert nearest
and-insert farthest
, which create the tour by starting with a 2-city subtour (a tour of some subset of the cities) containing only the pair of cities that are nearest or farthest from each other, then repeatedly selecting the nearest/farthest remaining city to the subtour and inserting it at the point in the tour that minimizes the total length of the tour after the insertion, breaking ties arbitrarily. When considering the distance of a city $c$ to a subtour, take the distance from $c$ to the closest city already in the subtour.
For an example of the -insert
algorithms, see
the -insert nearest
example, and
the -insert farthest
example.
Details
Command-line Arguments
Ifmain
is declared as
int main(int argc, char* argv[])then
argc
is the number of command-line arguments,
including the name of the program, and argv
is an
array of pointers to strings containing the command line arguments.
So for the execution of the program above,
(base) [jrg94@tick code]$ ./TSP east_coast_8.dat -given -insert nearest -insert farthest ALB MHT BDL ORH PVD HVN
argc
is 13, argv[0]
points to the
first character of the string ./TSP
,
argv[1]
points to the first character of the
string east_coast_8.dat
,
argv[2]
points to the first character of the
string -given
,
and so on until
argv[12]
points to the first character of the
string HVN
.
Location data file
- the name of the file containing the database of cities and their geographic coordinates (latitude and longitude) will be the first command-line argument
- each line of that file will contain a comma-separated list giving the name of or code for a city followed by its latitude and longitude
- the city names/codes are case-sensitive
- the city names/codes can contain any character except newlines and commas and will not begin with a leading hyphen
- cities are listed in the file in increasing order by name/code,
with order determined according to the
strcmp
function - there may be cities in the database file that are not part of the requested tour
- latitudes will be between -90.0 and 90.0 inclusive and longitudes will be between -180.0 and 180.0 inclusive
- the latitude and longitude of each city will be given in decimal degrees
with no leading whitespace in a format readable by
fscanf
with the%lf
format specifier using theen_US.UTF-8
locale (the default on the Zoo) - each longitude will be followed immediately by a newline character
Heuristics
- the names of the heuristics to use, if any, follow the name of the location data file and come before all city names
- the first command-line argument
after the location data filename that does not start with a hyphen
and is not a modifier of the previous heuristic
(
nearest
orfarthest
after-insert
) is the first city name - the heuristics may be listed in any order and may appear more than once
- the names of the heuristics are case-sensitive
- for determining nearest/farthest for the
-insert
heuristics, distances should be calculated with thelocation_distance
function in thelocation
library that is provided as header and implementation files in/c/cs223/hw2/Required
that you must compile and link with your code.
City Names
- all command-line arguments after the first city name are other city names
- there will be at least two city names/codes given on the command line
- the same rules for names/codes that apply to the location data file apply to the cities on the command line
- there will be no repeats among the cities on the command line
- each city given on the command line will appear in the location data file
- you may assume that there are no more than 32768 city names/codes on the command line; your program will not be tested on larger inputs
Output
- when the input is valid, the output must be written to standard output
must consist of one line per heuristic giving the heuristic
(including
nearest
orfarthest
for the-insert
heuristic), the total distance in kilometers to two places after the decimal point (as computed by summing the values returned bylocation_distance
for each leg), and the tour, starting with the first city in the input and ending with that city again - output for multiple heuristics must be given in the order the heuristics appeared on the command line; repeated heuristics will have repeated output according to that order
- each line of output must be followed by a single newline character
- the colon is in column 19 (counting from 1)
- each city name/code has a single space character before it
- there is no space after the last city name/code
- the decimal point is in column 30, but switch to scientific notation when the total distance does not fit the space provided (with whatever spacing you see fit)
- for other details, see the example and public test cases
- if no heuristics are given on the command line but the command-line arguments and input file are otherwise valid then there must be no output.
Note that it is possible that correct programs may end up with different orders of cities depending on how they break ties. In may cases, those different orders are really the same tour, but with different starting points and different directions: HVN-BDL-ALB-MHT-ORH-PVD-HVN, HVN-PVD-ORH-MHT-ALB-BDL-HVN, and MHT-ORH-PVD-HVN-BDL-ALB-MHT all have the same total distance and all look the same when drawn on a map. In order to have the same output from programs that broke ties differently, reorder the final tour by finding the location of the first input city in the sequence, starting there and proceeding through the rest of the tour, wrapping around to the beginning once at the end, and returning to the first input city, and then reversing the tour if the city before the return to the starting point came before the second city in the tour in the list of command-line arguments.
For the above
three examples, applying those rules will result in HVN-BDL-ALB-MHT-ORH-PVD-HVN
if the cities were given in the order HVN ORH ALB BDL MHT PVD
on the command line: the first example remains the same since BDL comes before
on the command line; the second example is reversed because PVD comes after
BDL on the command line; and the third example is rewritten to start with
HVN because HVN comes first on the command line (and then is the same
as the first example).
Error Messages
For certain invalid inputs (command-line arguments or contents of the input file), exactly one of the following error messages should be output to standard error followed by a single newline (and it does not matter what is written to standard output):TSP: missing filename
when there are no command-line arguments;TSP: could not open FILENAME
when there was an error opening the file whose name is given as the first command-line argument (replace FILENAME with the name of the file);TSP: too few cities
when the number of cities given on the command line is less than two;TSP: could not find location of CITY
when one of the cities on the command line is not present in the location database file (replace CITY with the name of the first such city according tostrcmp
order);TSP: invalid heuristic arguments
for any problem with the heuristic names in the command-line arguments
Efficiency
For full credit, your program must execute in the following time and space bounds when only a single heuristic is selected. For purposes of the time bounds, we count each call tolocation_distance
,
malloc
, free
, and
printf
to print a city name as constant time, and
a call to qsort
as $O(n \log n)$. The space bounds
apply to additional space used by your program – the memory
occupied by the command-line arguments and the disk space used by
the location data file do not count.
-
-given
: $O(n \log n + p + k)$ time, $O(n)$ space -
-insert
: $O(n^3 + p + k)$ time, $O(n^2)$ space
Time and space bounds are checked empirically – the grading scripts have CPU time, machine instruction, and or memory limits that may vary with the size of the input. Solutions that meet the asymptotic bounds but have large multiplicative constants will also fail these empirical tests, and in those cases you must reduce your constant factors in order to pass the tests.
Testing
Your program must not produce any errors when run withvalgrind
, regardless
of whether the input is valid or not.
Distances of routes between airports (a convenient source of
three-character labels for cities) can be checked
at gcmap.com. Beware that the distances
computed by gcmap may disagree with the expected values calculated
from the input files because the databases may choose different points
on airport property to represent the airports, and because of slight
differences in the model of the surface of the Earth used.
Submissions
Submit your makefile, log file, and source code necessary to build the executableTSP
using your makefile. Do not submit the provided
location
library.
Your makefile should assume that the files for the location library have been copied into
the current directory; they may not still exist in their original locations when the final grading
scripts run.