Programming Project 1: C Comment Analyzer
Objectives
- to learn C syntax
- to implement a finite state machine
- to use basic C string functions
- to read from and write to files
- to work with command-line arguments
Introduction
A large part of software development is maintenance – revising existing code to incorporate new features or to fix defects. Programmers often must maintain code written long ago by someone who is no longer around to help. In such cases, comments are essential for program comprehension. There are tools such as Javadoc and doxygen that process programs commented in a particular format that uses tags to specify what is being documented in each section of the comment (for example, which parts of the comment describe the parameters and which parts describe the return value). These tools generate web pages with documentation for the programs (this is especially useful in documenting APIs that other people will use). Could it be that there is a correlation between the number and type of Javadoc/doxygen-style tags and program comprehension?
You will develop a tool to extract Javadoc/doxygen-style tags from comments in valid C source code as the first part of testing this hypothesis (we will leave measuring program comprehension to someone else).
Assignment
Write a program called Comments that reads C source code and writes the Javadoc/doxygen-style tags contained in the top-level comments, that is, comments that are not inside a C code block delimited by { and }. Each tag should be followed by a newline character and there must be no other output.
Your program should operate in one of two modes: either all tags in top-level comments are output, or only the tags at the beginning of a line in such comments. Which mode your program operates in is determined by a command-line argument: -a for all tags, and -l for leading tags (those at the beginning of a line), with leading tags being the default if neither -a nor -l is given.
Input and output are by default read from and written to standard input and standard output. This may be changed with up to one occurrence each of the -i and -o command-line switches. In each case, the argument following the switch is the name of the file to read from or write to (which may not be the same physical file after resolving path names and symbolic links). If the output file already exists, its existing contents are overwritten. If a file cannot be opened or there is any other error reading from or writing to a file then your program must "fail gracefully": it must not crash or go into an infinite loop (note that terminating with an assert is not considered failing gracefully; asserts should only be used for debugging and end users should never see the results of them failing since the output is not helpful to them).
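As an illustration of failing gracefully when a file cannot be opened, here is a minimal sketch. The helper name open_or_default is hypothetical and not part of the assignment; it only shows how a NULL result from fopen can be passed back to the caller instead of being dereferenced.

```c
#include <stdio.h>

/* Hypothetical helper (not required by the assignment).
 * Returns `fallback` (stdin or stdout) when no path was given,
 * otherwise the result of fopen, which is NULL if the file
 * cannot be opened. Mode "w" overwrites any existing contents. */
FILE *open_or_default(const char *path, const char *mode, FILE *fallback)
{
    if (path == NULL) {
        return fallback;
    }
    return fopen(path, mode);
}
```

A caller might check the result for NULL, print a short message to stderr, and return a nonzero status from main rather than continuing with an invalid stream.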
How to determine what is a tag
A tag is a sequence of characters that starts with an at symbol (@) that is preceded by the beginning of the comment, whitespace (as determined by the isspace function), or an asterisk that is part of the beginning of a comment or line (see below); the tag continues to (but not including) the next whitespace character or the end of the comment.
For example,
// *** *** @this_tag_is_considered_to_be_at_the_beginning_of_a_line
//@this_is_a_tag
//@this_is_a_single@tag
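The tag rule above can be sketched as a scan that keeps two flags. This is only one possible reading of the rule, not a reference solution: the function name print_all_tags is hypothetical, and it assumes the comment text has already been separated from its delimiters and handed over as a string. Your real program cannot store a whole comment, but the same two flags could be kept while reading one character at a time.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: writes every tag in `body` (comment text with its
 * delimiters already removed) to `out`, one per line, and returns how
 * many tags were written. */
int print_all_tags(const char *body, FILE *out)
{
    int count = 0;
    int in_prefix = 1;   /* only whitespace and '*' seen so far on this line */
    int prev_space = 0;  /* previous character was whitespace */

    for (const char *p = body; *p != '\0'; ) {
        unsigned char c = (unsigned char)*p;
        if (c == '@' && (in_prefix || prev_space)) {
            /* the tag runs up to, but not including, the next whitespace */
            while (*p != '\0' && !isspace((unsigned char)*p)) {
                fputc(*p, out);
                p++;
            }
            fputc('\n', out);
            count++;
            in_prefix = 0;
            prev_space = 0;
            continue;
        }
        if (c == '\n') {
            in_prefix = 1;          /* a new line starts a new prefix */
        } else if (!isspace(c) && c != '*') {
            in_prefix = 0;          /* ordinary text ends the prefix */
        }
        prev_space = isspace(c) != 0;
        p++;
    }
    return count;
}
```

With this sketch, the text `@this_is_a_single@tag` yields one tag because the second @ is preceded by neither whitespace nor a leading asterisk.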
How to determine which tags are at the beginning of a line
A tag is considered to be at the beginning of a line if it is the first thing on the line after any sequence of whitespace characters and asterisks (*). The beginning of a comment is considered to be the beginning of a line regardless of whether the comment itself is at the beginning of a line. For example,
int x; /* @this_tag_is_considered_to_be_at_the_beginning_of_a_line. */
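To make the "beginning of a line" rule concrete, the check below looks at one line of comment text (or the text since the start of the comment) and decides whether the @ at a given position is a leading tag. The name is_leading_tag and the idea of passing a line as a string are assumptions made for illustration; a constant-space program would track the same condition with a flag instead of a stored line.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if line[at] is an '@' and everything
 * before it on the line is whitespace or '*', and 0 otherwise. */
int is_leading_tag(const char *line, size_t at)
{
    if (line[at] != '@') {
        return 0;
    }
    for (size_t i = 0; i < at; i++) {
        unsigned char c = (unsigned char)line[i];
        if (!isspace(c) && c != '*') {
            return 0;   /* something other than the prefix came first */
        }
    }
    return 1;
}
```

For the comment text ` @this_tag_is_considered_to_be_at_the_beginning_of_a_line. ` from the example above, the @ at index 1 is preceded only by a space, so the check reports a leading tag even though the comment follows `int x;` on the source line.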
How to process command-line arguments
Normally, a command-line argument that starts with a hyphen (-) is interpreted as a switch. However, the argument immediately following an -i or -o that is itself interpreted as a switch is always interpreted as a filename and never as a switch, with arguments processed from left to right. (Hint: what edge cases does this disambiguate?)
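The left-to-right rule can be sketched as a single loop over argv. The options struct and the parse_args function are hypothetical names introduced here. The sketch also makes assumptions the assignment does not state: an unrecognized argument is treated as an error, and if both -a and -l appear, the later one wins.

```c
#include <string.h>

/* Hypothetical container for the parsed options. */
typedef struct {
    int all_tags;          /* 1 for -a, 0 for -l (the default) */
    const char *in_path;   /* NULL means standard input */
    const char *out_path;  /* NULL means standard output */
} options;

/* Parses argv from left to right.
 * Returns 0 on success and -1 on any usage error. */
int parse_args(int argc, char **argv, options *opts)
{
    opts->all_tags = 0;
    opts->in_path = NULL;
    opts->out_path = NULL;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-a") == 0) {
            opts->all_tags = 1;
        } else if (strcmp(argv[i], "-l") == 0) {
            opts->all_tags = 0;
        } else if (strcmp(argv[i], "-i") == 0 || strcmp(argv[i], "-o") == 0) {
            const char **slot =
                argv[i][1] == 'i' ? &opts->in_path : &opts->out_path;
            if (*slot != NULL || i + 1 >= argc) {
                return -1;       /* repeated switch or missing filename */
            }
            *slot = argv[++i];   /* a filename even if it starts with '-' */
        } else {
            return -1;           /* assumption: anything else is an error */
        }
    }
    return 0;
}
```

Under this sketch, `./Comments -i -o -a` reads from a file named -o, writes to standard output, and runs in all-tags mode, because the -o is consumed as the filename for -i before it can be seen as a switch.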
Other things to be aware of
- the draft C99 standard specifies what is valid C source code
- backslash (\) at the end of a line immediately before a newline character is a line-continuation character and should be ignored along with the following newline; for all purposes consider those two characters to not appear in the input
- within string and character literals, /* and // do not mark the beginning of a comment
- quotes within comments have no special meaning and so need not be balanced and do not delimit anything
- you may assume that string and character literals are properly terminated
- you may assume that the character after an unescaped backslash within a string or character literal creates a valid escape sequence
- line-continuation (see above) is valid within string and character literals as well as in comments
- you need not do anything special for preprocessor directives; treat them like any other code
- you may assume there are no trigraphs in the input
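One way to treat line continuations as if they never appeared in the input is to hide them behind a small reading function, so the rest of the program never sees a backslash-newline pair. The sketch below is an illustration only; the name next_char is hypothetical. It relies on the one character of pushback that ungetc guarantees.

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character from `in`, skipping
 * every backslash that is immediately followed by a newline, together
 * with that newline. Returns EOF at end of input. */
int next_char(FILE *in)
{
    int c = getc(in);
    while (c == '\\') {
        int next = getc(in);
        if (next != '\n') {
            if (next != EOF) {
                ungetc(next, in);   /* not a continuation; put it back */
            }
            return '\\';
        }
        c = getc(in);               /* continuation dropped; keep reading */
    }
    return c;
}
```

With this in place, a comment opener split across lines as a slash, a backslash, a newline, and another slash is delivered to the caller as two consecutive slashes.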
What to do when the input is not as specified
If the input is not as specified in any way then the only requirement is that your program fails gracefully. Only that and not the output will be checked. (Ideally, your programs would provide a useful error message so that users would know how to fix what went wrong, but for this course we will simply fail gracefully in such situations unless certain error messages are specifically required.)
Other Requirements
Your program must use only a constant amount of space (in other words, the amount of memory used must not vary with the amount of input read). So, for example, you may not use variable-length arrays or call malloc.
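A finite state machine is one way to stay within constant space, since it needs only a fixed-size record no matter how long the input is. The sketch below is a partial illustration, not a solution: it only counts comments that start outside any { } block and extracts no tags. All names in it (scan_state, scanner, scanner_step, count_top_level_comments) are hypothetical, and it assumes line continuations have already been removed from the character stream. The string argument is for demonstration only; a real program would feed characters as they are read.

```c
/* States of a hypothetical scanner over C source text. */
typedef enum {
    CODE, SLASH, STRING, STRING_ESC, CHAR_LIT, CHAR_ESC,
    LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR
} scan_state;

/* Fixed-size record: its size does not depend on the input length. */
typedef struct {
    scan_state state;
    int depth;        /* unclosed '{' seen in code */
    long top_level;   /* comments that began while depth was 0 */
} scanner;

/* Advances the scanner by one input character. */
void scanner_step(scanner *s, int c)
{
    switch (s->state) {
    case CODE:
        if (c == '/') s->state = SLASH;
        else if (c == '"') s->state = STRING;
        else if (c == '\'') s->state = CHAR_LIT;
        else if (c == '{') s->depth++;
        else if (c == '}' && s->depth > 0) s->depth--;
        break;
    case SLASH:
        if (c == '/' || c == '*') {
            s->state = (c == '/') ? LINE_COMMENT : BLOCK_COMMENT;
            if (s->depth == 0) s->top_level++;
        } else {
            s->state = CODE;
            scanner_step(s, c);   /* the '/' was ordinary code; reprocess c */
        }
        break;
    case STRING:
        if (c == '\\') s->state = STRING_ESC;
        else if (c == '"') s->state = CODE;
        break;
    case STRING_ESC:
        s->state = STRING;        /* escaped character is part of the literal */
        break;
    case CHAR_LIT:
        if (c == '\\') s->state = CHAR_ESC;
        else if (c == '\'') s->state = CODE;
        break;
    case CHAR_ESC:
        s->state = CHAR_LIT;
        break;
    case LINE_COMMENT:
        if (c == '\n') s->state = CODE;
        break;
    case BLOCK_COMMENT:
        if (c == '*') s->state = BLOCK_STAR;
        break;
    case BLOCK_STAR:
        if (c == '/') s->state = CODE;
        else if (c != '*') s->state = BLOCK_COMMENT;
        break;
    }
}

/* Demonstration driver: runs the scanner over a string. */
long count_top_level_comments(const char *src)
{
    scanner s = { CODE, 0, 0 };
    for (const char *p = src; *p != '\0'; p++) {
        scanner_step(&s, (unsigned char)*p);
    }
    return s.top_level;
}
```

Note that the comment openers inside string and character literals never reach the SLASH state, which is how the sketch keeps them from being mistaken for comments.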
Example
If the input file input_0.txt contains
/* @Comment This is a single-line C comment */
#include <stdio.h>
/******
 * This is a nicely formatted
 * multi-line comment.
 * @author Jim Glenn
 * @version 0.1
 ******/
int main(int argc, char **argv)
{
  // @misplaced This is a C++ comment.
}
// This is a function with a C++-style comment
// @param x an integer @see limits.h
int add_one(int x)
{
  return x + 1;
}
Then the execution of the program would be as follows.
$ ./Comments < input_0.txt
@Comment
@author
@version
@param
Submissions
Submit your comments.c file through the submission system as assignment 91. If you use a different filename, use multiple files, or have other build requirements beyond what the makefile in /c/cs223/hw91/Optional/makefile does, then submit your makefile too using the name makefile (note the lower case m). Also include a log file.