CS 223: Data Structures. Instructor: Jim Aspnes
To get an account in the Zoo, follow the instructions on the on-line signup form. You will need your NetID and password to sign up for an account.
Even if you already have an account, you still need to use this form to register as a CS 223 student, or you will not be able to submit assignments.
The Zoo is located on the third floor of Arthur K Watson Hall, toward the front of the building. You will need to get your ID validated (go to the basement of AKW) to get in after hours.
More information about the Zoo can be found through its top-level web page.
Most people run Unix with a command-line interface provided by a shell. Each line typed to the shell tells it what program to run (the first word in the line) and what arguments to give it (remaining words). The interpretation of the arguments is up to the program.
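To make this concrete, here is a minimal sketch in C of what a program does with those words once the shell hands them over as the argc and argv parameters of main. The helper name count_options is invented for this example, and treating a leading '-' as the mark of an option is only a common convention; the shell itself does not enforce it.

```c
/*
 * Hypothetical helper: count how many arguments look like options,
 * using the common convention that an option starts with '-'.
 * argv[0] is the program name (the first word typed), so skip it.
 */
int
count_options(int argc, char **argv)
{
    int i;
    int count;

    count = 0;
    for(i = 1; i < argc; i++) {
        /* a lone "-" is usually not an option, so require more after it */
        if(argv[i][0] == '-' && argv[i][1] != '\0') {
            count++;
        }
    }
    return count;
}
```

If a program's main(int argc, char **argv) passed its arguments straight to this function, then for the command line gcc -g3 -o foo foo.c it would see argc equal to 5 and get back 2 (for -g3 and -o).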
When you log in to a Zoo node directly, you may not automatically get a shell window. If you use the default login environment (which puts you into the KDE window manager), you need to click on the picture of the display with a shell in front of it in the toolbar at the bottom of the screen. If you run Gnome instead (you can change your startup environment using the popup menu in the login box), you can click on the foot in the middle of the toolbar. Either approach will pop up a terminal emulator from which you can run emacs, gcc, and so forth.
Most of what one does with Unix programs is manipulate the filesystem. Unix files are unstructured blobs of data whose names are given by paths consisting of a sequence of directory names separated by slashes: for example /home/accts/some-user/cs223/hw1.c. At any time you are in a current working directory (type pwd to find out what it is and cd new-directory to change it). When working on files in the current working directory you only have to give the part of the pathname after the last slash.
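The "part of the pathname after the last slash" can be picked out mechanically. The following is a small sketch in C; last_component is a name made up for this example, and it assumes the path does not end in a slash.

```c
#include <string.h>

/*
 * Return a pointer to the part of path after the last slash,
 * or path itself if it contains no slash at all.
 * (last_component is a hypothetical name used only in this example.)
 */
const char *
last_component(const char *path)
{
    const char *slash;

    slash = strrchr(path, '/');   /* find the last '/' in the string */
    if(slash == NULL) {
        return path;
    }
    return slash + 1;
}
```

Given /home/accts/some-user/cs223/hw1.c this returns a pointer to hw1.c, which is the name you would type if your current working directory were /home/accts/some-user/cs223.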
Here are some handy Unix commands:
In order to write your programs you will need to use some sort of text editor. There are two reasonable text editors on Linux: Vi and Emacs. My personal preference is for Vi, but almost everybody likes Emacs better.
To start Emacs, type emacs at the command line. If you are actually sitting at a Zoo node it should put up a new window. If not, Emacs will take over the current window. If you have never used Emacs before, you should immediately type C-h t (this means hold down the Control key, type h, then type t without holding down the Control key). This will pop you into the Emacs built-in tutorial.
General note: C-x means hold down Control and press x; M-x means hold down Alt (Emacs calls it "Meta") and press x. For M-x you can also hit Esc and then x.
If you don't find yourself liking Emacs very much, you might want to try Vim instead. Vim is a vastly enhanced reimplementation of the classic vi editor, which I personally find easier to use than Emacs. Type vimtutor to run the tutorial. You can always get out by hitting the Escape key a few times and then typing :qa!.
A C program will typically consist of one or more files whose names end with .c. To compile foo.c, you can type gcc foo.c. Assuming foo.c contains no errors egregious enough to be detected by the extremely forgiving C compiler, this will produce a file named a.out that you can then execute by typing ./a.out.
If you want to debug your program using gdb or give it a different name, you will need to use a longer command line. Here's one that compiles foo.c to foo (run it using ./foo) and includes the information that gdb needs:
gcc -g3 -o foo foo.c
By default, gcc doesn't check everything that might be wrong with your program. But if you give it a few extra arguments, it will warn you about many (but not all) potential problems:
gcc -g3 -Wall -ansi -pedantic -o foo foo.c
For complicated programs involving multiple source files, you are probably better off using make than calling gcc directly. Make is a "rule-based expert system" that figures out how to compile programs given a little bit of information about their components.
For example, if you have a file called foo.c, try typing make foo and see what happens.
In general you will probably want to write a Makefile, which is named Makefile or makefile and tells make how to compile programs in the same directory. Here's a typical Makefile:
# Any line that starts with a sharp is a comment and is ignored
# by Make.

# These lines set variables that control make's default rules.
# We STRONGLY recommend putting "-Wall -ansi -pedantic" in your CFLAGS.
CC=gcc
CFLAGS=-g3 -Wall -ansi -pedantic

# The next line is a dependency line.
# It says that if somebody types "make all"
# make must first make "hello-world".
# By default the left-hand-side of the first dependency is what you
# get if you just type "make" with no arguments.
all: hello-world

# How do we make hello-world?
# The dependency line says you need to first make hello-world.o
# and hello-library.o
hello-world: hello-world.o hello-library.o
# Subsequent lines starting with a TAB character give
# commands to execute. Note the use of the CC and CFLAGS
# variables.
	$(CC) $(CFLAGS) -o hello-world hello-world.o hello-library.o
	echo "I just built hello-world! Hooray!"

# We can also declare that several things depend on one thing.
# Here we are saying that hello-world.o and hello-library.o
# should be rebuilt whenever hello-library.h changes.
# There are no commands attached to this dependency line, so
# make will have to figure out how to do that somewhere else
# (probably from the builtin .c -> .o rule).
hello-world.o hello-library.o: hello-library.h

# Command lines can do more than just build things. For example,
# "make test" will rebuild hello-world (if necessary) and then run it.
test: hello-world
	./hello-world

# This lets you type "make clean" and get rid of anything you can
# rebuild. The -f tells rm not to complain about files that aren't
# there.
clean:
	rm -f hello-world *.o
Given a Makefile, make looks at each dependency line and asks: (a) does the target on the left-hand side exist, and (b) is it at least as new as the files it depends on? If the answer to either question is no, it looks for a set of commands for rebuilding the target, after first rebuilding any of the files it depends on that are themselves out of date; the commands it runs will be underneath some dependency line where the target appears on the left-hand side. It has built-in rules for doing common tasks like building .o files (which contain machine code) from .c files (which contain C source code). If you have a fake target like all above, it will try to rebuild everything all depends on because there is no file named all (one hopes).
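The existence-and-age test can be sketched in a few lines of C using the POSIX stat call to read modification times. This is only an illustration for one target and one dependency: needs_rebuild is a name invented here, and the real make does a good deal more (chains of rules, built-in rules, error reporting for files it cannot build).

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Sketch of make's basic test for one target and one dependency.
 * Returns 1 if the target should be rebuilt, 0 otherwise.
 * (needs_rebuild is a hypothetical name; this is not make's code.)
 */
int
needs_rebuild(const char *target, const char *dependency)
{
    struct stat t;
    struct stat d;

    if(stat(target, &t) != 0) {
        /* the target does not exist, so it must be built */
        return 1;
    }
    if(stat(dependency, &d) != 0) {
        /* the dependency is missing; real make would first try to
           build it, after which the target would be out of date */
        return 1;
    }
    /* rebuild only if the target is older than what it depends on */
    return t.st_mtime < d.st_mtime;
}
```

A fake target like all falls into the first case: because no file named all exists, the function would report that it needs rebuilding every time.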
Make really really cares that the command lines start with a TAB character. TAB looks like eight spaces in Emacs and other editors, but it isn't the same thing. If you put eight spaces in (or a space and a TAB), Make will get horribly confused and give you an incomprehensible error message about a "missing separator". This misfeature is so scary that I avoided using make for years because I didn't understand what was going on. Don't fall into that trap: make really is good for you, especially if you ever need to recompile a huge program when only a few source files have changed.
Few programs do exactly what you expect on the first try. Sometimes it's not too hard to figure out why your program is misbehaving, but sometimes you have to look closely at what it's doing.
Let's look at a contrived example. Suppose you have the following program bogus.c:
#include <stdio.h>
/* Print the sum of the integers from 1 to 1000 */
int
main(int argc, char **argv)
{
    int i;
    int sum;

    sum = 0;
    for(i = 0; i -= 1000; i++) {
        sum += i;
    }
    printf("%d\n", sum);
    return 0;
}
Let's compile and run it and see what happens:
$ gcc -g3 -o bogus bogus.c
$ ./bogus
-34394132
$
That doesn't look like the sum of 1 to 1000. So what went wrong? If we were clever, we might notice that the test in the for loop is using the mysterious -= operator instead of the <= operator that we probably want. But let's suppose we're not so clever right now: it's four in the morning, we've been working on bogus.c for twenty-nine straight hours, and there's a -= up there because in our befuddled condition we know in our bones that it's the right operator to use. We need somebody else to tell us that we are deluding ourselves, but nobody is around this time of night. So we'll have to see what we can get the computer to tell us.
The first thing to do is fire up gdb, the debugger. This runs our program in stop-motion, letting us step through it a piece at a time and watch what it is actually doing. In the example below gdb is run from the command line. You can also run it directly from Emacs with M-x gdb, which lets Emacs track and show you where your program is in the source file with a little arrow.
$ gdb bogus
GNU gdb 4.17.0.4 with Linux/x86 hardware watchpoint and FPU support
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) run
Starting program: /home/accts/aspnes/tmp/bogus
-34394132

Program exited normally.
So far we haven't learned anything. To see our program in action, we need to slow it down a bit. We'll stop it as soon as it enters main, and step through it one line at a time while having it print out the values of the variables.
(gdb) break main
Breakpoint 1 at 0x8048476: file bogus.c, line 9.
(gdb) run
Starting program: /home/accts/aspnes/tmp/bogus

Breakpoint 1, main (argc=1, argv=0xbffff9ac) at bogus.c:9
9           sum = 0;
(gdb) display sum
1: sum = 1
(gdb) n
10          for(i = 0; i -= 1000; i++)
1: sum = 0
(gdb) display i
2: i = 0
(gdb) n
11              sum += i;
2: i = -1000
1: sum = 0
(gdb) n
10          for(i = 0; i -= 1000; i++)
2: i = -1000
1: sum = -1000
(gdb) n
11              sum += i;
2: i = -1999
1: sum = -1000
(gdb) n
10          for(i = 0; i -= 1000; i++)
2: i = -1999
1: sum = -2999
(gdb) quit
The program is running. Exit anyway? (y or n) y
$
Here we are using break main to tell the program to stop as soon as it enters main, display to tell it to show us the value of the variables i and sum whenever it stops, and n (short for next) to execute the program one line at a time.
When stepping through a program, gdb displays the line it will execute next as well as any variables you've told it to display. This means that any changes you see in the variables are the result of the previous displayed line. Bearing this in mind, we see that i drops from 0 to -1000 the very first time we hit the top of the for loop and drops to -1999 the next time. So something bad is happening in the top of that for loop, and if we squint at it a while we might begin to suspect that i -= 1000 is not the nice simple test we might have hoped it was.
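For comparison, here is a sketch of the same loop with the <= test that the program was presumably meant to use. It is packaged as a function, sum_to, a name introduced here for illustration, so that the limit can be varied; the loop body is otherwise the one from bogus.c.

```c
/*
 * Return the sum of the integers from 1 to n.
 * Same loop as bogus.c, but the test is i <= n rather than i -= 1000.
 */
int
sum_to(int n)
{
    int i;
    int sum;

    sum = 0;
    for(i = 0; i <= n; i++) {
        sum += i;
    }
    return sum;
}
```

With this version, printf("%d\n", sum_to(1000)) prints 500500, which agrees with the closed form 1000 * 1001 / 2.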
In general, the idea behind debugging is that a bad program starts out sane, but after executing for a while it goes bananas. If you can find the exact moment in its execution where it first starts acting up, you can see exactly what piece of code is causing the problem and have a reasonably good chance of being able to fix it. So a typical debugging strategy is to put in a breakpoint (using break) somewhere before the insanity hits, "instrument" the program (using display) so that you can watch it going insane, and step through it (using next, step, or breakpoints and cont) until you find the point of failure. Sometimes this process requires restarting the program (using run) if you skip over this point without noticing it immediately.
For large or long-running programs, it often makes sense to do binary search to find the point of failure. Put in a breakpoint somewhere (say, on a function that is called many times or at the top of a major loop) and see what the state of the program is after going through the breakpoint 1000 times (using something like cont 1000). If it hasn't gone bonkers yet, try restarting and going through 2000 times. Eventually you bracket the error as occurring (for example) somewhere between the 4000th and 8000th occurrence of the breakpoint. Now try stepping through 6000 times; if the program is looking good, you know the error occurs somewhere between the 6000th and 8000th breakpoint. A dozen or so more experiments should be enough to isolate the bug to a specific line of code.
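The bracketing procedure is ordinary binary search, and a rough sketch of it in C may make the bookkeeping clearer. Everything named here is hypothetical: looks_bad stands in for "restart the program, run to the n-th occurrence of the breakpoint, and inspect the state by hand", and bad_from_6789 is a made-up example in which the trouble starts at the 6789th occurrence. The sketch also assumes that once the program goes bad it stays bad.

```c
/*
 * Find the first breakpoint occurrence at which the program looks bad.
 * Requires: looks_bad(good) is false, looks_bad(bad) is true, and the
 * state never recovers once it has gone bad.
 */
long
first_bad_hit(long good, long bad, int (*looks_bad)(long))
{
    long mid;

    while(bad - good > 1) {
        mid = good + (bad - good) / 2;   /* midpoint of the bracket */
        if(looks_bad(mid)) {
            bad = mid;      /* failure is at or before mid */
        } else {
            good = mid;     /* failure is after mid */
        }
    }
    return bad;
}

/* Hypothetical stand-in for a manual check in gdb. */
int
bad_from_6789(long n)
{
    return n >= 6789;
}
```

Starting from the bracket in the text, first_bad_hit(4000, 8000, bad_from_6789) returns 6789. Each pass halves the bracket, so a width of 4000 shrinks to 1 in about a dozen experiments.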
The key to all debugging is knowing what your code is supposed to do. If you don't know this, you can't tell the lunatic who thinks he's Napoleon from the lunatic who really is Napoleon. If you're confused about what your code is supposed to be doing, you need to figure out what exactly you want it to do. If you can figure that out, often it will be obvious what is going wrong. If it isn't obvious, you can always go back to gdb.
When you are programming, you will make mistakes. If you program long enough, these will eventually include true acts of boneheadedness like accidentally deleting all of your source files. You are also likely to spend some of your time trying out things that don't work, at the end of which you'd like to go back to the last version of your program that did work. All these problems can be solved by using a version control system.
We recommend using CVS, the Concurrent Versions System. CVS has a number of very complicated features designed to allow large teams of programmers to work together on a project without clobbering each other's code, but the main thing that it does is allow you to keep around a history of every significant change that you have made to your files.
CVS stores a master copy of each of your files in a repository, a special directory somewhere that you should probably never touch directly. You can tell CVS where to find the repository by setting the CVSROOT environment variable; if you are using bash as your shell, the command for doing this would be something like this:
export CVSROOT=/c/cs223/class/your-name/cvsroot
This will tell CVS where you want to put the repository; it's probably best if the directory you specify does not already exist. To avoid having to set this variable all the time, you should put the command in your ~/.bashrc file, which contains commands that are executed every time bash starts.
To initialize CVS's internal data, run
cvs init
This should create your CVSROOT directory and populate it with various bookkeeping files that CVS uses for its own nefarious purposes (which will in fact be in a subdirectory that is also, confusingly, called CVSROOT). Assuming everything went right, you now have an empty CVS repository.
In order to use your new CVS repository, you need to create at least one module, which is what CVS calls a subdirectory of its root directory. There is a right way and a wrong way to do this:
To edit the files in a module, you must first check it out. This is true even if you just imported it from somewhere (in fact, once you do an import, the directory you were in is useless to CVS). Change to some convenient directory and type:
cvs checkout hw1
This should create a new directory hw1 with whatever files you imported when you created the module, along with a subdirectory called CVS. Don't change anything in the CVS subdirectory.
You now have up to three copies of your directory. There's the original directory you did the cvs import in, there's the directory in the repository, and there's the directory you just checked out. You can safely throw away the original directory; you don't need it any more. You can also safely throw away the checked-out directory, since you can get it back at any time by running cvs checkout again. If you never damage your repository, your module is now immortal. But it's probably nearly empty, so let's talk about how to add and edit files.
You can edit files in the checked-out directory normally. These changes do not affect the repository until you check them back in, using cvs commit. During the commit you will be asked for a log message that describes the changes you made.
A special case of editing is adding or removing files. To add a file to the module, create the file (say, foo.c), and run cvs add foo.c. This tells CVS you would like to add the file to the repository, but doesn't actually do it yet. To remove a file, use cvs remove -f foo.c; this will delete the file from the checked-out directory, but has no immediate effect on the repository. In both cases, the repository is only updated when you run cvs commit.
You can also type cvs add subdirectory to add an existing subdirectory (which you should create first using mkdir). This takes effect immediately.
The command cvs log filename (or just cvs log to get all files) will show you a list of all the different versions you have checked in to the repository, along with their log messages. If you want to know when particular lines in the current file showed up, try cvs annotate filename. As you can see, CVS keeps every version of every file you ever check in, forever. This is very useful, since it means that you can always go back and see what you did.
If you want to get an old version back, do
cvs update -r[version-number] filename
where [version-number] will be something like 1.3. Warning: this will throw away any changes that you haven't checked in yet. You can also use dates:
cvs update -Dyesterday somefile
cvs update -D"01 March 1993"
After you do the update, your checked-out files will be in a special "sticky" state that will prevent you from doing new commits (after all, you can't change the past). To get all your files back to the present, use cvs update -A.
If you want to undo a bunch of checked-in changes, the easiest way to do it is probably to use cvs update -D to go back to the previous version you liked, copy the files somewhere else, do cvs update -A, and then copy the files back and do a commit. It's also possible to do this more directly using the -j flag to cvs update, but that approach can often go wrong in confusing ways.
If you want to undo some changes that you haven't checked in yet, just delete the offending files and run cvs update. I often use this trick when I realize that I didn't want to do what I just did, to go back to the last (hopefully stable) commit.
cvs diff
Change a few files around and then try it! Some people like the output from cvs diff -c better.
To get rid of a checked-out directory that you no longer need, change to its parent directory and type cvs release -d directory-name. This is safer than just deleting the directory, because it will check to see if anything still needs to be checked in to the repository. (You can always tell it to go ahead and delete the directory anyway.)
For more information about CVS, type info cvs, or, better yet, see the book Open Source Development With CVS. You can find a local copy of the on-line version of this book at http://www.cs.yale.edu/~aspnes/cvsbook.html.
The submit command is found in /c/cs223/bin on the Zoo. Here is the documentation (taken directly from the Perl script itself):
submit assignment-number file(s)
unsubmit assignment-number file(s)
check assignment-number
makeit assignment-number [file]
protect assignment-number file(s)
unprotect assignment-number file(s)
The submit program can be invoked in six different ways:
/c/cs223/bin/submit 1 Makefile tokenize.c unique.c time.log
submits the named source files as your solution to Homework #1;
/c/cs223/bin/check 2
lists the files that you have submitted for Homework #2;
/c/cs223/bin/unsubmit 3 error.submit bogus.solution
deletes the named files that you had submitted previously for Homework #3
(i.e., withdraws them from submission, which is useful if you accidentally
submit the wrong file);
/c/cs223/bin/makeit 4 tokenize unique
runs "make" on the files that you submitted previously for Homework #4;
/c/cs223/bin/protect 5 tokenize.c time.log
protects the named files that you submitted previously for Homework #5 (so
they cannot be deleted accidentally); and
/c/cs223/bin/unprotect 6 unique.c time.log
unprotects the named files that you submitted previously for Homework #6
(so they can be deleted).
The submit program will only work if there is a directory with your name and login on it under /c/cs223/class. If there is no such directory, you need to make sure that you have correctly signed up for CS223 using the web form.
You can find out your officially entered assignment and test grades using Grade-o-Matic, our half-baked WWW-accessible grade database. You should be sent a login name and password by email when you have completed the first homework assignment.
Fri 03 May 2002 23:06:16 EDT howto.tyx Copyright © 1998-2002 by Jim Aspnes