Programming Project 1: C Statement Counter

Objectives

Introduction

To paraphrase a former student, computer science faculty like to flex about how few lines of code they wrote to solve the same problems they've assigned to their students. Beyond that, a line count is a simple but useful measure of code complexity and effort.

But what is a "line of code" and how do you count them? The Unix wc utility will count lines of text in a file, but the same code could be formatted and commented in different ways with different line counts. For example, wc reports one line for the following code excerpt,

  if (species == WAVYLEAF_BASKETGRASS) activate_sprayer(CLETHODIM);
  
but eight for the following equivalent code.
    /* check what species was recognized by the neural network */
    if (species == WAVYLEAF_BASKETGRASS)
    {
      /* activate the Clethodim spray nozzle; Clethodim is reported as
         effective by Anna Bowen et al (https://doi.org/10.1017/inp.2020.22) */

      activate_sprayer(CLETHODIM);
    }
  

A better approximation using standard command-line tools is to count the number of lines containing semi-colons (;) and opening braces ({): (grep ";" statements.c; grep "{" statements.c) | wc -l. This will count each statement, since each statement in C ends with a semi-colon, and will count other things that many would want to include in "lines of code", such as loop and function headers – as long as no line contains both a semi-colon and an opening brace (such lines are counted twice by the command given above). But it will also count semi-colons and opening braces that are enclosed in comments and literals, which is probably not what we want when counting "lines of code".

For this assignment, you will write a C program that does somewhat better than the approximation given above: your program will count each semi-colon and opening brace, except for occurrences of those two characters that are inside a comment, a string or character literal, or parentheses. (Note that the last requirement is so headers of for loops are counted at most once, and that this specification is still not perfect, but we will stick with it for this assignment.)

Assignment

Write a program called Statements (see note 1) that reads C source code from standard input and writes a single line to standard output containing the total count of semi-colons and opening braces that are not inside C comments, string or character literals, or parentheses. The count should be written as a decimal number with no spaces before or after (aside from the newline character that terminates the output). You may assume that what you read from standard input is valid C code that would result in no errors when compiled with gcc -c (although there could be warnings), provided that any required declarations were present in the included header files. The behavior of your program is undefined in other cases, except it must "fail gracefully" – it must not crash or go into an infinite loop (see note 2). In such cases, we ignore the output and test only that your program does not crash or hang. If your program detects invalid input, it may print an error message or nothing at all and terminate. If it does not detect the invalid input, it may continue to execute and output a nonsensical number, as long as it does not crash or hang. (Note that terminating with an assert is not considered failing gracefully – asserts should only be used for debugging, and end users should never see the results of them failing since the output is not helpful to them.)

Note 1: We're calling this program "Statements" even though it is intended to count things like function headers that are not syntactically statements.

Note 2: Ideally, your programs would provide a useful error message so that users would know how to fix what went wrong, but for this course we will simply fail gracefully in such situations unless certain error messages are specifically required.

Other things to be aware of

Note 3: Some keyboards, especially those on earlier systems, lacked a way to type braces; digraphs and trigraphs can be used in place of those impossible-to-type characters.

Other Requirements

Your program must use only a constant amount of space (in other words, the amount of memory used must not vary with the amount of input read). So, for example, you may not use variable-length arrays, call malloc, or create temporary files.

Example

If the input file hello.c contains
#include <stdio.h>

int main()
{
  for (int i = 0; i < 10; i++)
    {
      printf("Hello, world!\n");
    }
}
then the execution of the program would be as follows.
$ ./Statements < hello.c
3

And if statements.c contains your instructor's solution to this assignment, then the execution would be

$ ./Statements < statements.c
84
(although a better flex in this case would be how easy it was to modify the code for a different assignment to work for this assignment).

Submissions

Submit your statements.c file through the submission system as assignment 1. If you use a different filename, use multiple files, or have other build requirements beyond what the makefile in /c/cs223/hw1/Optional/makefile does, then submit your makefile too using the name makefile (note the lower case m). Also include a log file.