CS 223: Data Structures. Instructor: Jim Aspnes

How To Use the Computing Facilities

Using the Zoo

Getting into the room

The Zoo is located on the third floor of Arthur K Watson Hall, toward the front of the building.

More information

More information about the Zoo can be found through its top-level web page.

Using Unix

Most people run Unix with a command-line interface provided by a shell. Each line typed to the shell tells it what program to run (the first word in the line) and what arguments to give it (remaining words). The interpretation of the arguments is up to the program.

Getting a shell prompt in the Zoo

When you log in to a Zoo node directly, you may not automatically get a shell window. If you use the default login environment (which puts you into the Gnome window manager), you can click on the foot in the middle of the toolbar to pop up a terminal emulator from which you can run emacs, gcc, and so forth.

Unix command-line programs

Most of what one does with Unix programs is manipulate the filesystem. Unix files are unstructured blobs of data whose names are given by paths consisting of a sequence of directory names separated by slashes: for example /home/accts/some-user/cs223/hw1.c. At any time you are in a current working directory (type pwd to find out what it is and cd new-directory to change it). When working on files in the current working directory you only have to give the part of the pathname after the last slash.

Here are some handy Unix commands:

man program will show you the on-line documentation for a program (e.g., try man man or man ls). Handy if you want to know what a program does. On Linux machines like the ones in the Zoo you can also get information using info program, which has an Emacs-like interface.
ls lists all the files in the current directory. Some useful variants:
mkdir dir will create a new directory in the current directory named dir.
rmdir dir deletes a directory. It only works on directories that contain no files.
cd dir changes the current working directory. With no arguments, cd changes back to your home directory.
pwd (``print working directory'') shows what your current directory is.
mv old-name new-name changes the name of a file. You can also use this to move files between directories.
cp old-name new-name makes a copy of a file.
rm file deletes a file. Deleted files cannot be recovered. Use this command carefully.
chmod changes the permissions on a file or directory. See the man page for the full details of how this works. Here are some common chmod's:
emacs, gcc, make, gdb, cvs
See next section.

Editing C programs

Writing C programs with Emacs

In order to write your programs you will need to use some sort of text editor. There are three reasonable text editors on Linux: Vi, Emacs, and gedit. My personal preference is for Vi, but almost everybody likes Emacs better.

To start Emacs, type emacs at the command line. If you are actually sitting at a Zoo node it should put up a new window. If not, Emacs will take over the current window. If you have never used Emacs before, you should immediately type C-h t (this means hold down the Control key, type h, then type t without holding down the Control key). This will pop you into the Emacs built-in tutorial.

My favorite Emacs commands

General note: C-x means hold down Control and press x; M-x means hold down Alt (Emacs calls it ``Meta'') and press x. For M-x you can also hit Esc and then x.

Get help. Everything you could possibly want to know about Emacs is available through this command. Some common versions: C-h t puts up the tutorial, C-h b lists every command available in the current mode, C-h k tells you what a particular sequence of keystrokes does, and C-h l tells you what the last 50 or so characters you typed were (handy if Emacs just garbled your file and you want to know what command to avoid in the future).
C-x u
Undo. Undoes the last change you made to the current buffer. Type it again to undo more things. A lifesaver. Note that it can only undo back to the time you first loaded the file into Emacs--- if you want to be able to back out of bigger changes, use CVS.
C-x C-s
Save. Saves changes to the current buffer out to its file on disk.
C-x C-f
Edit a different file.
C-x C-c
Quit out of Emacs. This will ask you if you want to save any buffers that have been modified. You probably want to answer yes (y) for each one, but you can answer no (n) if you changed some file inside Emacs but do not want the changes to appear in the file on the disk.
Go forward one character.
Go back one character.
Go to the next line.
Go to the previous line.
Go to the beginning of the line.
Kill the rest of the line starting with the current position. Useful Emacs idiom: C-a C-k.
``Yank.'' Get back what you just killed.
Re-indent the current line. In C mode this will indent the line according to Emacs's notion of how C should be indented.
M-x compile
Compile a program. This will ask you if you want to save out any unsaved buffers and then run a compile command of your choice (see the section on compiling programs below). The exciting thing about M-x compile is that if your program has errors in it, you can type C-x ` to jump to the next error, or at least where gcc thinks the next error is.

Using Vi instead of Emacs

If you don't find yourself liking Emacs very much, you might want to try Vim instead. Vim is a vastly enhanced reimplementation of the classic vi editor, which I personally find easier to use than Emacs. Type vimtutor to run the tutorial. You can always get out by hitting the Escape key a few times and then typing :qa!.

Compiling programs

Using gcc

A C program will typically consist of one or more files whose names end with .c. To compile foo.c, you can type gcc foo.c. Assuming foo.c contains no errors egregious enough to be detected by the extremely forgiving C compiler, this will produce a file named a.out that you can then execute by typing ./a.out.

If you want to debug your program using gdb or give it a different name, you will need to use a longer command line. Here's one that compiles foo.c to foo (run it using ./foo) and includes the information that gdb needs:

gcc -g3 -o foo foo.c

By default, gcc doesn't check everything that might be wrong with your program. But if you give it a few extra arguments, it will warn you about many (but not all) potential problems:

gcc -g3 -Wall -std=c99 -pedantic -o foo foo.c

Using make

For complicated programs involving multiple source files, you are probably better off using make than calling gcc directly. Make is a ``rule-based expert system'' that figures out how to compile programs given a little bit of information about their components.

For example, if you have a file called foo.c, try typing make foo and see what happens.

In general you will probably want to write a Makefile, which is named Makefile or makefile and tells make how to compile programs in the same directory. Here's a typical Makefile:

# Any line that starts with a sharp is a comment and is ignored
# by Make.

# These lines set variables that control make's default rules.
# We STRONGLY recommend putting "-Wall -ansi -pedantic" in your CFLAGS.
CFLAGS=-g3 -Wall -ansi -pedantic

# The next line is a dependency line.
# It says that if somebody types "make all"
# make must first make "hello-world".
# By default the left-hand-side of the first dependency is what you
# get if you just type "make" with no arguments.
all: hello-world

# How do we make hello-world?
# The dependency line says you need to first make hello-world.o
# and hello-library.o
hello-world: hello-world.o hello-library.o
	# Subsequent lines starting with a TAB character give
	# commands to execute.  Note the use of the CC and CFLAGS
	# variables.
	$(CC) $(CFLAGS) -o hello-world hello-world.o hello-library.o
	echo "I just built hello-world!  Hooray!"

# We can also declare that several things depend on one thing.
# Here we are saying that hello-world.o and hello-library.o
#  should be rebuilt whenever hello-library.h changes.
# There are no commands attached to this dependency line, so
#  make will have to figure out how to do that somewhere else
#  (probably from the builtin .c -> .o rule).
hello-world.o hello-library.o: hello-library.h

# Command lines can do more than just build things.  For example,
# "make test" will rebuild hello-world (if necessary) and then run it.
test: hello-world

# This lets you type "make clean" and get rid of anything you can
# rebuild.  The -f tells rm not to complain about files that aren't
# there.
	rm -f hello-world *.o

Given a Makefile, make looks at each dependency line and asks: (a) does the target on the left hand side exist, and (b) is it older than the files it depends on. If so, it looks for a set of commands for rebuilding the target, after first rebuilding any of the files it depends on; the commands it runs will be underneath some dependency line where the target appears on the left-hand side. It has built-in rules for doing common tasks like building .o files (which contain machine code) from .c files (which contain C source code). If you have a fake target like all above, it will try to rebuild everything all depends on because there is no file named all (one hopes).

Make gotchas

Make really really cares that the command lines start with a TAB character. TAB looks like eight spaces in Emacs and other editors, but it isn't the same thing. If you put eight spaces in (or a space and a TAB), Make will get horribly confused and give you an incomprehensible error message about a ``missing separator''. This misfeature is so scary that I avoided using make for years because I didn't understand what was going on. Don't fall into that trap--- make really is good for you, especially if you ever need to recompile a huge program when only a few source files have changed.

Debugging programs

Few programs do exactly what you expect on the first try. Sometimes it's not too hard to figure out why your program is misbehaving, but sometimes you have to look closely at what it's doing.

Let's look at a contrived example. Suppose you have the following program bogus.c:

/* Print the sum of the integers from 1 to 1000 */
main(int argc, char **argv)
    int i;
    int sum;

    sum = 0;
    for(i = 0; i -= 1000; i++) {
	sum += i;
    printf("%d\n", sum);
    return 0;

Let's compile and run it and see what happens:

$ gcc -g3 -o bogus bogus.c
$ ./bogus

That doesn't look like the sum of 1 to 1000. So what went wrong? If we were clever, we might notice that the test in the for loop is using the mysterious -= operator instead of the <= operator that we probably want. But let's suppose we're not so clever right now--- it's four in the morning, we've been working on bogus.c for twenty-nine straight hours, and there's a -= up there because in our befuddled condition we know in our bones that it's the right operator to use. We need somebody else to tell us that we are deluding ourselves, but nobody is around this time of night. So we'll have to see what we can get the computer to tell us.

The first thing to do is fire up gdb, the debugger. This runs our program in stop-motion, letting us step through it a piece at a time and watch what it is actually doing. In the example below gdb is run from the command line. You can also run it directly from Emacs with M-x gdb, which lets Emacs track and show you where your program is in the source file with a little arrow.

$ gdb bogus
GNU gdb with Linux/x86 hardware watchpoint and FPU support
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) run
Starting program: /home/accts/aspnes/tmp/bogus

Program exited normally.

So far we haven't learned anything. To see our program in action, we need to slow it down a bit. We'll stop it as soon as it enters main, and step through it one line at a time while having it print out the values of the variables.

(gdb) break main
Breakpoint 1 at 0x8048476: file bogus.c, line 9.
(gdb) run
Starting program: /home/accts/aspnes/tmp/bogus

Breakpoint 1, main (argc=1, argv=0xbffff9ac) at bogus.c:9
9           sum = 0;
(gdb) display sum
1: sum = 1
(gdb) n
10          for(i = 0; i -= 1000; i++)
1: sum = 0
(gdb) display i
2: i = 0
(gdb) n
11              sum += i;
2: i = -1000
1: sum = 0
(gdb) n
10          for(i = 0; i -= 1000; i++)
2: i = -1000
1: sum = -1000
(gdb) n
11              sum += i;
2: i = -1999
1: sum = -1000
(gdb) n
10          for(i = 0; i -= 1000; i++)
2: i = -1999
1: sum = -2999
(gdb) quit
The program is running.  Exit anyway? (y or n) y

Here we are using break main to tell the program to stop as soon as it enters main, display to tell it to show us the value of the variables i and sum whenever it stops, and n (short for next) to execute the program one line at a time.

When stepping through a program, gdb displays the line it will execute next as well as any variables you've told it to display. This means that any changes you see in the variables are the result of the previous displayed line. Bearing this in mind, we see that i drops from 0 to -1000 the very first time we hit the top of the for loop and drops to -1999 the next time. So something bad is happening in the top of that for loop, and if we squint at it a while we might begin to suspect that i -= 1000 is not the nice simple test we might have hoped it was.

My favorite gdb commands

Get a description of gdb's commands.
Runs your program. You can give it arguments that get passed in to your program just as if you had typed them to the shell. Also used to restart your program from the beginning if it is already running.
Leave gdb, killing your program if necessary.
Set a breakpoint, which is a place where gdb will automatically stop your program. Some examples:
Show part of your source file with line numbers (handy for figuring out where to put breakpoints). Examples:
Execute the next line of the program, including completing any procedure calls in that line.
Execute the next step of the program, which is either the next line if it contains no procedure calls, or the entry into the called procedure.
Continue until you get out of the current procedure (or hit a breakpoint). Useful for getting out of something you stepped into that you didn't want to step into.
(Or continue). Continue until (a) the end of the program, (b) a fatal error like a Segmentation Fault or Bus Error, or (c) a breakpoint. If you give it a numeric argument (e.g., cont 1000) it will skip over that many breakpoints before stopping.
Print the value of some expression, e.g. print i.
Like print, but runs automatically every time the program stops. Useful for watching values that change often.

Debugging strategies

In general, the idea behind debugging is that a bad program starts out sane, but after executing for a while it goes bananas. If you can find the exact moment in its execution where it first starts acting up, you can see exactly what piece of code is causing the problem and have a reasonably good chance of being able to fix it. So a typical debugging strategy is to put in a breakpoint (using break) somewhere before the insanity hits, ``instrument'' the program (using display) so that you can watch it going insane, and step through it (using next, step, or breakpoints and cont) until you find the point of failure. Sometimes this process requires restarting the program (using run) if you skip over this point without noticing it immediately.

For large or long-running programs, it often makes sense to do binary search to find the point of failure. Put in a breakpoint somewhere (say, on a function that is called many times or at the top of a major loop) and see what the state of the program is after going through the breakpoint 1000 times (using something like cont 1000). If it hasn't gone bonkers yet, try restarting and going through 2000 times. Eventually you bracket the error as occurring (for example) somewhere between the 4000th and 8000th occurrence of the breakpoint. Now try stepping through 6000 times; if the program is looking good, you know the error occurs somewhere between the 6000th and 8000th breakpoint. A dozen or so more experiments should be enough isolate the bug to a specific line of code.

The key to all debugging is knowing what your code is supposed to do. If you don't know this, you can't tell the lunatic who thinks he's Napoleon from lunatic who really is Napoleon. If you're confused about what your code is supposed to be doing, you need to figure out what exactly you want it to do. If you can figure that out, often it will be obvious what is going wrong. If it isn't obvious, you can always go back to gdb.

Fri 03 May 2002 23:06:16 EDT howto.tyx Copyright © 1998-2002 by Jim Aspnes