Programming Project 1: C Statement Counter
Objectives
- to learn C syntax
- to implement a finite state machine
- to use C I/O functions
Introduction
To paraphrase a former student, computer science faculty like to flex about how few lines of code they wrote to solve the same problems they've assigned to their students. Beyond that, a line count is a simple but useful tool that can be used to measure code complexity and effort.
But what is a "line of code" and how do you count them?
The Unix wc utility will count lines of text in a file, but the same code
could be formatted and commented in different ways with different line counts.
For example, wc reports one line for the following code excerpt,

    if (species == WAVYLEAF_BASKETGRASS) activate_sprayer(CLETHODIM);

but eight for the following equivalent code.

    /* check what species was recognized by the neural network */
    if (species == WAVYLEAF_BASKETGRASS)
    {
        /* activate the Clethodim spray nozzle; Clethodim is reported
           as effective by Anna Bowen et al
           (https://doi.org/10.1017/inp.2020.22) */
        activate_sprayer(CLETHODIM);
    }
A better approximation using standard command-line tools is to count
the number of lines containing semi-colons (;) and opening braces ({):

    (grep ";" statements.c; grep "{" statements.c) | wc -l

This will count each statement, since each statement in C ends with a
semi-colon, and will count other things that many would want to include
in "lines of code", such as loop and function headers – as long as
no line contains both a semi-colon and an opening brace (such lines are
counted twice by the command given above). But it will also count
semi-colons and opening braces that are enclosed in comments and literals,
which is probably not what we want when counting "lines of code".
For this assignment, you will write a C program that does somewhat better
than the approximation given above: your program will count each semi-colon
and opening brace, except for occurrences of those two characters
that are inside a comment, a string or character
literal, or parentheses.
(Note that the last requirement is so headers of for loops are
counted at most once, and that this specification is still not perfect,
but we will stick with it for this assignment.)
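A counting rule of this kind can be expressed as a finite state machine that looks at one character at a time. The following is a minimal sketch of a deliberately reduced machine, not a solution: it handles only block comments and ignores line comments, literals, parentheses, line continuations, and digraphs. The names state, step, and count_in_string are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* States of a reduced machine that knows only about block comments. */
enum state { CODE, SLASH, COMMENT, COMMENT_STAR };

/* Advance the machine by one character; add 1 to *count when c is a
   ';' or '{' seen outside a block comment. */
enum state step(enum state s, int c, long *count)
{
    switch (s) {
    case CODE:
    case SLASH:
        if (s == SLASH && c == '*')
            return COMMENT;              /* just saw the opening of a comment */
        if (c == ';' || c == '{')
            (*count)++;
        return c == '/' ? SLASH : CODE;
    case COMMENT:
        return c == '*' ? COMMENT_STAR : COMMENT;
    case COMMENT_STAR:
        if (c == '/')
            return CODE;                 /* just saw the close of a comment */
        return c == '*' ? COMMENT_STAR : COMMENT;
    }
    return CODE;
}

/* Driver for experimenting with short strings (an assumption made for
   illustration; the assignment itself reads from standard input). */
long count_in_string(const char *src)
{
    enum state s = CODE;
    long count = 0;
    for (size_t i = 0; src[i] != '\0'; i++)
        s = step(s, (unsigned char)src[i], &count);
    return count;
}
```

Keeping the transition logic in a function of the current state and one input character means the machine needs no buffer; the remaining cases in the specification would add further states in the same style.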
Assignment
Write a program called Statements (see note 1) that reads C source code
from standard input, and writes a single line
to standard output containing the total count of
semi-colons and opening braces that are not inside C comments, string
or character literals, or parentheses.
The count should be written as a decimal number
with no spaces before or after (aside from the newline character that
terminates the output).
You may assume that what you read from standard input is valid C code
that would result in no errors when compiled with gcc -c (although there
could be warnings), provided that any required declarations were present
in the included header files.
The behavior of your program is undefined in other
cases, except it must "fail gracefully" – it must not crash or
go into an infinite loop (see note 2). In such cases, we ignore the output and
test only that your program does not crash or hang, so if your program
detects invalid input, it may print an error message or nothing at all
and terminate, or if it does not detect the invalid input then it may
continue to execute and output a nonsensical number, as long as it
does not crash or hang. (Note that terminating with an assert
is not considered failing gracefully – asserts should
only be used for debugging and end users should never see the
results of them failing since the output is not helpful to them.)
Note 1: We're calling this program "Statements" even though it is intended to count things like function headers that are not syntactically statements.
Note 2: Ideally, your programs would provide a useful error message so that users would know how to fix what went wrong, but for this course we will simply fail gracefully in such situations unless certain error messages are specifically required.
Other things to be aware of
- the draft C99 standard specifies what is valid C source code
- backslash (\) at the end of a line immediately before a newline character is a line-continuation character and should be ignored along with the following newline anywhere they occur together; the effect is as if they did not appear in the input
- within string and character literals, /* and // do not mark the beginning of a comment
- quotes within comments have no special meaning and so need not be balanced and do not delimit anything
- you may assume that string and character literals are properly terminated
- you may assume that the character after an unescaped backslash within a string or character literal creates a valid escape sequence
- line-continuation (see above) is valid within string and character literals as well as in comments
- you need not do anything special for preprocessor directives; treat them like any other code
- the two-character sequence ("digraph") less-than followed by percent (<%) should be treated the same as an opening brace outside of comments, string and character literals, and parentheses (not relevant to this assignment, but the corresponding digraph for a closing brace is %>) (see note 3)
- although you must handle digraphs, in particular <% as described above, you may assume there are no trigraphs in the input
Note 3: Some keyboards, especially those on earlier systems, lacked a way to type braces; digraphs and trigraphs can be used in place of those impossible-to-type characters.
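One way to make line continuations "disappear" everywhere at once is to filter them out before any other processing sees the input. The sketch below assumes a hypothetical helper named next_logical_char that wraps getc and uses the single character of pushback that ungetc guarantees; it is one possible approach, not a required design.

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character from `in`, skipping
   every backslash-newline pair; returns EOF at end of input. */
int next_logical_char(FILE *in)
{
    int c = getc(in);
    while (c == '\\') {
        int next = getc(in);
        if (next == '\n') {
            c = getc(in);            /* drop the pair and look again */
        } else {
            if (next != EOF)
                ungetc(next, in);    /* not a continuation: put it back */
            return '\\';
        }
    }
    return c;
}
```

If the rest of the program reads only through a filter like this, then comments, literals, and the <% digraph can be recognized without any special case for a continuation that falls in the middle of them.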
Other Requirements
Your program must use only a constant amount of space (in other words, the amount of memory used must not vary with the amount of input read). So, for example, you may not use variable-length arrays, call malloc, or create temporary files.
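To illustrate what constant space looks like in practice, the sketch below reads one character at a time and keeps only a counter. The helper names count_raw and print_count are hypothetical, and the loop is intentionally naive: it counts every semi-colon and opening brace with no regard for comments, literals, or parentheses, so it does not meet the specification.

```c
#include <stdio.h>

/* Hypothetical helper: counts every ';' and '{' in the stream using a
   fixed amount of memory, whatever the length of the input. */
long count_raw(FILE *in)
{
    long count = 0;
    int c;                           /* int, not char, so EOF is distinguishable */
    while ((c = getc(in)) != EOF) {
        if (c == ';' || c == '{')
            count++;
    }
    return count;
}

/* Writes the count as a decimal number followed by a single newline,
   with no other whitespace. */
void print_count(FILE *out, long count)
{
    fprintf(out, "%ld\n", count);
}
```

A main function could call print_count(stdout, count_raw(stdin)). On the hello.c example in this handout, this raw loop would report 5 rather than 3, because it also counts the two semi-colons inside the for header.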
Example
If the input file hello.c contains

    #include <stdio.h>
    int main()
    {
      for (int i = 0; i < 10; i++)
        {
          printf("Hello, world!\n");
        }
    }

then the execution of the program would be as follows.

    $ ./Statements < hello.c
    3

And if statements.c contains your instructor's solution
to this assignment, then the execution would be

    $ ./Statements < statements.c
    84

(although a better flex in this case would be how easy it was to modify the code for a different assignment to work for this assignment).
Submissions
Submit your statements.c file through the submission system
as assignment 1. If you use a different filename, use multiple files,
or have other build requirements beyond what the makefile in
/c/cs223/hw1/Optional/makefile does, then submit your makefile too
using the name makefile (note the lower case m). Also include a log file.