Exercise 2: Travelling Salesperson
Objectives
- to use arrays
- to use files
- to use string
- to use command-line arguments
Introduction
The Travelling Salesperson Problem is the problem of, given a set of locations and the distances between every pair of them, determining the shortest tour that starts and ends at the same city and visits each other city exactly once. It is a very common problem in logistics: consider, for example, a delivery driver who needs to make stops at a list of 50 addresses and wants to minimize fuel use along the way.Assignment
Write a program called TSP
that calculates routes between
cities by various methods. The name of the file containing the cities
and their locations and the method to use to calculate
the routes will be given as command-line arguments.
For each method, the output should be the method used,
the total distance, and the route.
The method to construct the tours (if any) will be given as the first
command-line argument. In all cases, distances
should be calculated with the location_distance
function
in the distance
module that is provided as header and
implementation files in /c/cs223/hw2/Required
that you can
compile and link with your code.
-
-given
, in which case the route should be constructed by taking the cities in the order they appear in the input file, then back to the first city. -
-nearest
, in which case the route should be constructed by starting with the first city listed in the input file and then repeatedly adding the city among the remaining cities that is closest by distance to the last city added (with ties broken arbitrarily) until all cities have been added, then returning to the first city. -insert
, in which case the route should be constructed by starting with a 2-city tour containing the nearest pair of cities, then repeatedly selecting the nearest remaining city to any city already in the tour (minimize over all pairs of cities $(i, j)$ the distance from $i$ to $j$, where $i$ is a city not yet in the tour and $j$ is a city already in the tour). Once a city is selected, insert it at the point in the tour that minimizes the total length of the tour after that step. Break ties arbitrarily both when selecting the city to insert and when selecting the insertion point. Finally, reorder the tour so that it starts and ends at the first city in the input, and then reverse it if necessary so that the second city in the tour is alphabetically before the penultimate city. (None of these reorderings will affect the real total length of the tour – BZN-CAK-ABE-DCA-ELH-BZN has the same length as ABE-CAK-BZN-ELH-DCA-ABE, and it is the latter that should be output if ABE was the starting city. Because of the imprecision of floating point arithmetic, there may be differences of fractions of millimeters in the computed total lengths of the original tour and the one you output, but you should ignore those differences.)
Code
Start with the code in /c/cs223/hw2/Optional/tsp.c
,
which declares some functions that you can define to complete
the assignment. There is also a module
/c/cs223/hw2/Required/distance.c
with a corresponding
header file that contains a function that will compute the great circle
distance between two points on Earth given their latitudes and
longitudes.
Complete the code to read the input file first. In main
,
check that there are the required number of command-line arguments.
If there are not, print an error message and exit the
program. Open the
file whose name is given by the second command-line argument for
reading; if this fails then print an error message and exit.
Use fscanf
to read the first line of the file –
the number of cities in the input. If that is not at least two
then print an error message and exit; otherwise declare three arrays: one
to hold a three-character string for the code of each city,
one to hold the latitude of each city, and one to hold the longitudes.
Pass the file, the number of cities, and the arrays to
read_input
to read the input file.
In read_input
, loop over the cities to read from
the file, and in each iteration of the loop read the city code,
and the latitude or longitude. If there was a problem reading
the data or if the coordinates are out of range (ranges are -90 to 90
inclusive for latitudes and -180 to 180 inclusive for longitudes)
then return the number of cities read successfully
(main
can then use that count to determine
if the all the cities were read in). You may assume
that there are no more than 500 cities, the city codes
are three non-space characters,
that the codes are distinct within the file, and that
the coordinates define distinct points (so each pole may
only appear once despite having an infinite number of possible
representations as (latitude, longitude) pairs);
if this is not the case then the only requirement is that your
program does not crash or hang.
Back in main
, if read_input
didn't read
all the cities, then print an error message and exit.
This would be a good time to check that you're reading the data
correctly by printing the values read_input
stored in the arrays.
Once you're satisfied that you are reading the input
correctly,
add code to check the value of the first command-line argument
and perform the necessary calculations for the selected method
(and if the first argument is not one of the three methods then
print an error message and exit). In all three cases it may
be helpful to keep track of the tour in terms of the indices of
the cities in the arrays rather than trying to rearrange those
arrays. For example, if the input contains
ACC, LOS, JNB, CAI, NBO, ADD in that order, then the tour
CAI-ADD-NBO-ACC-LOS-JNB-CAI would be represented by an array
[3, 5, 4, 1, 2, 0]
.
For the -given
case, simply fill the tour array
with $0, 1, 2, \ldots$, call the calculate_total
function to compute the total distance of the tour
(you will need to complete that function too), and
output the result along with the tour.
The remaining two methods compute the distances between many
pairs of cities, and may need the distance between the same pair
more than once. The location_distance
function is very
slow, so it will speed up our program if we compute the distances
between every pair and store them in a 2-D array for later use.
For the -nearest
case, pass the city count,
the uninitialized tour array, and the 2-D array of distances to
the route_nearest
function.
The route_nearest
function can start by initializing
the tour to $0, 1, 2, \ldots$. It will need to keep track of
the partially constructed tour and the indices of the cities
that haven't been added yet. The tour array can serve both purposes:
the first portion of the array can contain the partial tour and
the rest can contain, in some order, the cities that haven't been
added yet. At the beginning, the first element is the partial tour
and the rest haven't been added. The rest
of the function then repeatedly finds the minimum distance
between the last city in the partial tour and the rest of the
cities and swaps the
index of the selected city into place to grow the partial tour.
Back in main
, after calling route_nearest
,
call calculate_total
and output the results.
(Optional) After testing the code for -nearest
, complete the
case for -insert
by passing the number of cities,
the uninitialized tour array, and the distance array to
route_insert
. You can keep track of the
partial tour and the cities that haven't been added yet in the same
way as in the route_nearest
function, except instead
of initializing the partial tour to [0], you initialize it with
the indices of the two cities that are closest together. Then,
on each step, select the closest city among those not yet in the tour
to those already in the tour, find the best insertion point
(hint: it may be easier to minimize the distance added to
the current tour than the total distance of the resulting
tour), and
update the partial tour and the cities not added yet. Finally,
in main
call (and write the code for) normalize
to
reorder the tour as specified.
As usual, if the input (both the file and the command-line arguments) does not conform to this specification then the only requirement is that your program does not crash or hang. The suggestions to print an error message and exit the program will satisfy that condition for the corresponding violations, but there may be other cases that are not specifically mentioned here.
Examples and Output
Ifsea_6_dal.in
contains
8 SEA 47.450000 -122.311667 YVR 49.194722 -123.183889 MCI 39.297500 -94.713889 STL 38.748611 -90.370000 BWI 39.175278 -76.668333 DCA 38.851944 -77.037778 IAH 29.984444 -95.341389 DAL 32.847222 -96.851667then the output of running your program should be
(base) [jrg94@newt code]$ ./TSP -given sea_6_dal.in -given: 9306.22 SEA YVR MCI STL BWI DCA IAH DAL SEA (base) [jrg94@newt code]$ ./TSP -nearest sea_6_dal.in -nearest: 10067.11 SEA YVR MCI STL DAL IAH DCA BWI SEA (base) [jrg94@newt code]$ ./TSP -insert sea_6_dal.in -insert: 9320.21 SEA DAL IAH BWI DCA STL MCI YVR SEANote that the method is right justified in 8 characters, the distance is given with two digits after the decimal point and 7 digits before (padded with spaces) and there is a single space before each city code and no space at the end of the line.
Submissions
Submit your completedtsp.c
file, any other files you
created, a makefile with the TSP
executable as the default
target, and your log as assignment number 2. You need not submit
the distance
module (and if you do, it will be ignored).