Programming Project 1: C Comment Analyzer

Objectives

Introduction

A large part of software development is maintenance – revising existing code to incorporate new features or to fix defects. Programmers often must maintain code written long ago by someone who is no longer around to help. In such cases, comments are essential for program comprehension. There are tools such as Javadoc and doxygen that process programs commented in a particular format that uses tags to specify what is being documented in each section of the comment (for example, which parts of the comment describe the parameters and which parts describe the return value). These tools generate web pages with documentation for the programs (this is especially useful in documenting APIs that other people will use). Could it be that there is a correlation between the number and type of Javadoc/doxygen-style tags and program comprehension?

You will develop a tool to extract Javadoc/doxygen-style tags from comments in valid C source code as the first part of testing this hypothesis (we will leave measuring program comprehension to someone else).

Assignment

Write a program called Comments that reads C source code and writes the Javadoc/doxygen-style tags contained in its top-level comments, that is, comments that are not inside a C code block delimited by { and }. Each tag must be followed by a newline character, and there must be no other output.

Your program operates in one of two modes: it outputs either all tags in top-level comments or only the tags at the beginning of a line in such comments. The mode is selected by a command-line argument: -a for all tags and -l for leading tags (those at the beginning of a line). Leading tags is the default if neither -a nor -l is given.

By default, input is read from standard input and output is written to standard output. This may be changed with up to one occurrence each of the -i and -o command-line switches. In each case, the argument following the switch is the name of the file to read from or write to (the input and output files may not be the same physical file after resolving path names and symbolic links). If the output file already exists, its existing contents are overwritten.

If a file cannot be opened, or there is any other error reading from or writing to a file, then your program must "fail gracefully": it must not crash or go into an infinite loop. Note that terminating with an assert is not considered failing gracefully; asserts should only be used for debugging, and end users should never see the results of a failed assert since the output is not helpful to them.
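One way to approach the fail-gracefully requirement when opening files is sketched below. The helper name open_or_default, the wording of its message, and the choice to report on standard error are assumptions made for illustration; the assignment does not prescribe them.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the assignment): opens `path` in `mode`,
   or returns `fallback` (stdin or stdout) when no path was given.
   Returns NULL, after printing a message, if the file cannot be opened. */
FILE *open_or_default(const char *path, const char *mode, FILE *fallback)
{
    if (path == NULL) {
        return fallback;
    }

    FILE *f = fopen(path, mode);
    if (f == NULL) {
        /* Report on stderr so that standard output still holds only tags. */
        fprintf(stderr, "Comments: cannot open %s\n", path);
    }
    return f;
}
```

A caller would check for NULL and return a nonzero exit status from main rather than calling assert or continuing with an invalid stream. Opening the output file with mode "w" truncates an existing file, which matches the overwrite behavior described above.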

How to determine what is a tag

A tag is a sequence of characters that starts with an at symbol (@) preceded by one of the following: the beginning of the comment, whitespace (as determined by the isspace function), or an asterisk that is part of the beginning of a comment or line (see below). The tag continues up to (but not including) the next whitespace character or the end of the comment. For example, each of the following comments contains exactly one tag:
    // *** *** @this_tag_is_considered_to_be_at_the_beginning_of_a_line
    //@this_is_a_tag
    //@this_is_a_single@tag
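The rule above can be illustrated with a small scanner over the text of a single comment. This is only a sketch of one reading of the rule: the function name write_all_tags is hypothetical, the comment delimiters are assumed to have been removed already, and the text is held in a string purely for illustration (a real solution must stream its input to meet the constant-space requirement).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes every tag found in `body` (the text of one
   comment, delimiters removed) to `out`, one per line, and returns how many
   tags were written. */
size_t write_all_tags(const char *body, FILE *out)
{
    size_t count = 0;
    bool leading = true;    /* still in the whitespace/asterisk run that opens a line */
    bool can_start = true;  /* an '@' seen now would begin a tag */
    const char *p = body;

    while (*p != '\0') {
        unsigned char c = (unsigned char)*p;

        if (c == '@' && can_start) {
            /* Emit the tag up to the next whitespace or the end of the comment. */
            while (*p != '\0' && !isspace((unsigned char)*p)) {
                fputc(*p, out);
                p++;
            }
            fputc('\n', out);
            count++;
            leading = false;
            can_start = false;
            continue;
        }

        if (c == '\n') {
            leading = true;     /* a new line begins */
            can_start = true;
        } else if (isspace(c)) {
            can_start = true;   /* whitespace may precede a tag anywhere */
        } else if (c == '*' && leading) {
            can_start = true;   /* asterisk that is part of the line's beginning */
        } else {
            leading = false;
            can_start = false;
        }
        p++;
    }
    return count;
}
```

Under this reading, write_all_tags("@this_is_a_single@tag", stdout) writes one tag, because the second @ is not preceded by whitespace, while an address such as user@example.com produces no tag at all.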
  

How to determine which tags are at the beginning of a line

A tag is considered to be at the beginning of a line if it is the first thing on the line after any sequence of whitespace characters and asterisks (*). The beginning of a comment is considered to be the beginning of a line regardless of whether the comment itself is at the beginning of a line. For example,
    int x; /* @this_tag_is_considered_to_be_at_the_beginning_of_a_line. */
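The leading-tag test can be expressed as a check on what precedes the @ on its line. The sketch below assumes that `line` holds the text of one comment line, starting just after the comment opener when the comment begins on that line; the function name tag_is_leading is hypothetical. A streaming solution would track the same condition with a single flag instead of a stored line.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the '@' at index `at` of `line` is
   preceded only by whitespace characters and asterisks. */
bool tag_is_leading(const char *line, size_t at)
{
    for (size_t i = 0; i < at; i++) {
        unsigned char c = (unsigned char)line[i];
        if (!isspace(c) && c != '*') {
            return false;   /* something other than whitespace or '*' came first */
        }
    }
    return line[at] == '@';
}
```

For a line whose comment text is " @param x an integer @see limits.h", the check succeeds for @param and fails for @see, since the words before @see are neither whitespace nor asterisks.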
  

How to process command-line arguments

Normally, a command-line argument that starts with a hyphen (-) is interpreted as a switch. However, the argument immediately following an -i or -o that is itself interpreted as a switch must be treated as a filename and never as a switch. Arguments are processed from left to right. (Hint: what edge cases does this disambiguate?)
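A left-to-right pass over argv that follows this rule might look like the sketch below. The struct options type and the parse_args function are hypothetical names. Two behaviors here are assumptions that the assignment leaves open: when both -a and -l appear, the later one wins, and any unrecognized argument is rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the parsed command line. */
struct options {
    bool all_tags;       /* true for -a, false for -l (the default) */
    const char *input;   /* NULL means standard input */
    const char *output;  /* NULL means standard output */
};

/* Returns true on success and false if the command line is malformed. */
bool parse_args(int argc, char **argv, struct options *opts)
{
    opts->all_tags = false;
    opts->input = NULL;
    opts->output = NULL;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-a") == 0) {
            opts->all_tags = true;
        } else if (strcmp(argv[i], "-l") == 0) {
            opts->all_tags = false;
        } else if (strcmp(argv[i], "-i") == 0) {
            if (opts->input != NULL || i + 1 >= argc) {
                return false;            /* repeated -i or missing filename */
            }
            opts->input = argv[++i];     /* consumed as a filename, even if it starts with '-' */
        } else if (strcmp(argv[i], "-o") == 0) {
            if (opts->output != NULL || i + 1 >= argc) {
                return false;            /* repeated -o or missing filename */
            }
            opts->output = argv[++i];
        } else {
            return false;                /* assumption: stray arguments are rejected */
        }
    }
    return true;
}
```

With the arguments -i -a -o -i, this sketch reads from a file named -a, writes to a file named -i, and stays in the default leading-tags mode, because each argument after -i or -o is consumed as a filename before it can be examined as a switch.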

Other things to be aware of

What to do when the input is not as specified

If the input is not as specified in any way, then the only requirement is that your program fail gracefully. Only graceful failure, and not the output, will be checked. (Ideally, your program would provide a useful error message so that users would know how to fix what went wrong, but for this course we will simply fail gracefully in such situations unless certain error messages are specifically required.)

Other Requirements

Your program must use only a constant amount of space (in other words, the amount of memory used must not vary with the amount of input read). So, for example, you may not use variable-length arrays or call malloc.
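One way to stay within constant space is to write each character of a tag as soon as it is read instead of collecting the tag in a buffer. The sketch below assumes that the caller has just consumed the @ that begins a tag; the enum tag_end type and the stream_tag function are hypothetical names, and write errors are left for the caller to detect with ferror.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result type: what terminated the tag. */
enum tag_end { TAG_END_SPACE, TAG_END_NEWLINE, TAG_END_COMMENT, TAG_END_INPUT };

/* Copies one tag from `in` to `out` character by character, followed by a
   newline. Uses one character of lookahead and no buffer of its own. */
enum tag_end stream_tag(FILE *in, FILE *out, bool in_block_comment)
{
    enum tag_end result = TAG_END_INPUT;
    int c;

    fputc('@', out);
    while ((c = fgetc(in)) != EOF) {
        if (isspace(c)) {
            result = (c == '\n') ? TAG_END_NEWLINE : TAG_END_SPACE;
            break;
        }
        if (in_block_comment && c == '*') {
            int next = fgetc(in);
            if (next == '/') {
                result = TAG_END_COMMENT;  /* the comment closed, so the tag ends */
                break;
            }
            if (next != EOF) {
                ungetc(next, in);          /* not a closer: put the lookahead back */
            }
        }
        fputc(c, out);
    }
    fputc('\n', out);
    return result;
}
```

The return value tells the caller how the tag ended, so it can update its own state, for example noting that a new line has begun or that the comment has closed.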

Example

If the input file input_0.txt contains
/* @Comment This is a single-line C comment */
#include <stdio.h>
/******
 * This is a nicely formatted
 * multi-line comment.
 * @author Jim Glenn
 * @version 0.1
 ******/

int main(int argc, char **argv)
{
  // @misplaced This is a C++ comment.
}

// This is a function with a C++-style comment
// @param x an integer @see limits.h
int add_one(int x)
{
  return x + 1;
}
then running the program in the default (leading tags) mode would produce the following.
$ ./Comments < input_0.txt
@Comment
@author
@version
@param

Submissions

Submit your comments.c file through the submission system as assignment 91. If you use a different filename, use multiple files, or have other build requirements beyond what the makefile in /c/cs223/hw91/Optional/makefile does, then submit your makefile too using the name makefile (note the lower case m). Also include a log file.