Assignment 1: Parsing GPX Files

Objectives

Introduction

Global Positioning System devices are used in a wide variety of applications, from ride-hailing services to fitness tracking to exergaming to migratory animal tracking to illegal fishing monitoring. Many GPS devices record a GPS track, which is a record of the position of the device over time. GPS tracks are often stored in GPX files, which are a specific kind of XML file. For this assignment, we will process GPX files to extract the location information for easier use in other applications.

Assignment

Write a program called ParseGPX that extracts GPS track information from GPX files read from standard input.

GPX, and XML in general, has many features, but for this assignment we will assume that the files we're processing are somewhat more restricted than general GPX/XML files.

In general, an XML file contains a prolog followed by XML elements. XML elements are delimited by start and end tags (we will not consider the special syntax for empty elements), where each tag starts with a < character and ends with a > character. Between those delimiters, a start tag contains a sequence of non-whitespace characters giving the tag type, optionally followed by a sequence of unique attributes. The attributes are separated from the tag type by whitespace and are given as a whitespace-separated list, where each item is an attribute name followed by an equals sign (=) and a quoted attribute value. An end tag has only a tag type, preceded by a forward slash (/). XML elements may contain text or other XML elements. Elements must nest properly, so any child elements must have their start tag and end tag both inside their parent element. The prolog of an XML file contains tag-like items that start and end with <? and ?> or <! and !>.
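As an illustration of the tag structure described above, here is a minimal sketch in C that classifies the text found between a < and the matching > and extracts the tag type. The names classify_tag and tag_kind are hypothetical and not part of the assignment, and the sketch assumes the caller has already isolated the text between the delimiters; it is one possible approach, not a required design.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper types and names, used for illustration only. */
enum tag_kind { TAG_START, TAG_END, TAG_PROLOG, TAG_INVALID };

/* Classifies the text found between '<' and '>' (both excluded) and, for
   start and end tags, copies the tag type into name (at most cap - 1
   characters plus a terminator). Returns TAG_INVALID when the tag type is
   empty or does not fit in name. */
enum tag_kind classify_tag(const char *body, char *name, size_t cap)
{
    /* NULL arguments are a programming error, not an input error. */
    assert(body != NULL && name != NULL);
    if (cap == 0) return TAG_INVALID;
    name[0] = '\0';

    /* Prolog items begin with <? or <! */
    if (body[0] == '?' || body[0] == '!') return TAG_PROLOG;

    enum tag_kind kind = TAG_START;
    if (body[0] == '/') {            /* end tags begin with a slash */
        kind = TAG_END;
        body++;
    }

    /* The tag type is the leading run of non-whitespace characters. */
    size_t n = 0;
    while (body[n] != '\0' && !isspace((unsigned char)body[n])) n++;
    if (n == 0 || n >= cap) return TAG_INVALID;

    memcpy(name, body, n);
    name[n] = '\0';
    return kind;
}
```

For example, passing the text trkpt lat="41.3" lon="-72.9" yields TAG_START with the tag type trkpt, and /ele yields TAG_END with the tag type ele.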

A GPX file is a specific kind of XML file with specific kinds of elements structured in a particular way. In particular, a GPX file contains trkpt elements that have lat and lon attributes whose values give the latitude and longitude of the tracked object at some point in time. The trkpt elements contain ele and time elements whose text gives the elevation of the object and the time of the measurement, respectively.
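To make the lat and lon lookup concrete, the following is a rough sketch of pulling one attribute value out of the text of a start tag. The function name find_attr is hypothetical. The sketch also makes assumptions that this handout does not state: that values may be quoted with either double or single quotes, that whitespace may surround the equals sign, and that attribute names are compared case-sensitively. Adjust these to match the actual input rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: looks up attribute `want` in the text of a start tag
   (the characters between '<' and '>'). On success, copies the value without
   its surrounding quotes into out and returns 1. Returns 0 if the attribute
   is missing, the tag text is malformed, or the value does not fit. */
int find_attr(const char *body, const char *want, char *out, size_t cap)
{
    /* NULL arguments are a programming error, not an input error. */
    assert(body != NULL && want != NULL && out != NULL);
    const char *p = body;

    /* Skip the tag type. */
    while (*p != '\0' && !isspace((unsigned char)*p)) p++;

    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') return 0;               /* no more attributes */

        /* Attribute name runs up to '=' or whitespace. */
        const char *name = p;
        while (*p != '\0' && *p != '=' && !isspace((unsigned char)*p)) p++;
        size_t name_len = (size_t)(p - name);

        while (isspace((unsigned char)*p)) p++;
        if (*p != '=') return 0;                /* malformed: missing '=' */
        p++;
        while (isspace((unsigned char)*p)) p++;

        /* Assumption: either quote character may delimit the value. */
        char quote = *p;
        if (quote != '"' && quote != '\'') return 0;
        p++;

        const char *val = p;
        while (*p != '\0' && *p != quote) p++;
        if (*p == '\0') return 0;               /* unterminated value */
        size_t val_len = (size_t)(p - val);
        p++;                                    /* step past closing quote */

        if (name_len == strlen(want) && strncmp(name, want, name_len) == 0) {
            if (val_len >= cap) return 0;       /* value does not fit */
            memcpy(out, val, val_len);
            out[val_len] = '\0';
            return 1;
        }
    }
}
```

Every failure path returns 0 rather than reading past the end of the string, so a truncated or malformed tag is reported to the caller instead of causing undefined behavior.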

Our task is to extract the values of the lat and lon attributes of the trkpt elements along with the text of the ele and time elements contained in the trkpt elements: for each trkpt, output a comma-separated list of the lat and lon values and ele and time text in that order. The contents of each piece of data should be copied verbatim to the output, except that the quotes at the beginning and end of the value must be removed, and, since the time elements' text may contain commas, we must escape those commas by replacing them with &comma; in the output. The data for each trkpt should be written to standard output, one trkpt per line, with a newline at the end of each, and no other output.
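The comma-escaping rule can be sketched as a small helper that copies a string while replacing each comma with &comma;. The name escape_commas and the fixed-capacity output buffer are assumptions made for this illustration; a real solution might instead write characters directly to standard output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst, replacing each ',' with
   "&comma;". Returns 1 on success, or 0 if the escaped text plus its
   terminator does not fit in cap bytes. */
int escape_commas(const char *src, char *dst, size_t cap)
{
    static const char esc[] = "&comma;";
    const size_t esc_len = sizeof esc - 1;
    size_t n = 0;                       /* bytes written so far */

    /* NULL arguments are a programming error, not an input error. */
    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        if (*src == ',') {
            /* Need esc_len bytes plus room for the terminator. */
            if (cap - n <= esc_len) return 0;
            memcpy(dst + n, esc, esc_len);
            n += esc_len;
        } else {
            if (cap - n <= 1) return 0;
            dst[n++] = *src;
        }
    }
    if (n >= cap) return 0;             /* only possible when cap is 0 */
    dst[n] = '\0';
    return 1;
}
```

With the four pieces of data in hand, one line of output could then be produced with a call such as printf("%s,%s,%s,%s\n", lat, lon, ele, escaped_time), where the variable names are again placeholders.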

Your program's output will be tested on inputs that obey these rules (note that our inputs will follow the rules below and not the official GPX standard, so our specification is much more permissive than the official standard and you should not assume that rules from the official GPX standard carry over to our specification unless specifically listed below):

We do relax the XML specification in one way: we consider tag types to be not case-sensitive. So, for example, we want to extract trkpt, TRKPT, tRkPt elements, and the start and end tags don't have to match case, so, for example, a <ELE> start tag could be ended with a </ele> end tag.

Your program will also be tested on inputs that do not obey those rules, and in such cases the criterion for passing a test is simply whether your program ran to completion without crashing or going into an infinite loop; the output can be anything or nothing in these cases (although ideally your program would detect violations of the rules and output an appropriate error message -- it is often better to detect malformed input and abort processing rather than continuing execution with unexpected and possibly dangerous consequences).
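A common source of crashes on unexpected input is a fixed-size buffer that overflows when a tag or text run is longer than anticipated. As one possible safeguard, here is a sketch of a growable character buffer that reports failure instead of writing out of bounds. The struct buf type and the buf_push function are hypothetical names, and the initial capacity of 64 is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable buffer; initialize with {0} before first use. */
struct buf {
    char  *data;   /* NUL-terminated contents, or NULL when empty */
    size_t len;    /* number of characters stored */
    size_t cap;    /* number of bytes allocated */
};

/* Appends c, growing the storage as needed. Returns 1 on success, or 0 if
   memory could not be obtained (the buffer is left unchanged). */
int buf_push(struct buf *b, char c)
{
    /* A NULL buffer is a programming error, not an input error. */
    assert(b != NULL);

    /* Keep one spare byte for the terminator. */
    if (b->len + 1 >= b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 64;
        if (new_cap <= b->cap) return 0;        /* size_t overflow */
        char *p = realloc(b->data, new_cap);
        if (p == NULL) return 0;                /* out of memory */
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 1;
}
```

A caller that sees a 0 return value can print an error message and exit cleanly, which satisfies the requirement to finish without crashing. Remember to release the storage with free(b.data) when it is no longer needed.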

Additional requirements

Example

If the input is
<?xml version="1.0" encoding="UTF-8"?>
<gpx creator="StravaGPX" version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
 <metadata>
  <time>2018-08-24T13:49:45Z</time>
 </metadata>
 <trk>
  <name>Morning Ride</name>
  <type>1</type>
  <trkseg>
   <trkpt lat="41.3078680" lon="-72.9342120">
    <ele>20.0</ele>
    <time>2018-08-24T13:49:45Z</time>
   </trkpt>
   <trkpt lat="41.3078680" lon="-72.9342120">
    <ele>20.0</ele>
    <time>2018-08-24T13:49:46Z</time>
   </trkpt>
   <trkpt lat="41.3078810" lon="-72.9342590">
    <ele>20.0</ele>
    <time>2018-08-24T13:49:49Z</time>
   </trkpt>
  </trkseg>
 </trk>
</gpx>
  
then the output must be
41.3078680,-72.9342120,20.0,2018-08-24T13:49:45Z
41.3078680,-72.9342120,20.0,2018-08-24T13:49:46Z
41.3078810,-72.9342590,20.0,2018-08-24T13:49:49Z
  

Submissions

Submit your source code, a makefile that produces an executable called ParseGPX as its default target, and your log.