CPSC 426/526: Building Decentralized Systems, Fall 2010
- MW 2:30-3:45 PM, Room 500 AKW
- Bryan Ford,
Room 211 AKW, 432-1055
- Office hours: Mon & Wed 4:00-5:00 PM, or by appointment.
- Teaching Assistant:
- Michael "Fitz" Nowlan, Room 305 AKW
- Office hours: Tue & Thu 3:30-4:30 PM, or by appointment.
- URL: http://zoo.cs.yale.edu/classes/cs426
This course explores practical principles and techniques
for building decentralized systems.
What is a decentralized system?
Basically, a distributed system that works although no one is in charge.
For purposes of this course,
a distributed system is a set of computers
that are physically distributed but can communicate via some form of network.
A decentralized system is a particular kind of distributed system:
namely, a set of computers that are under the control
of multiple separate owners or administrative authorities, not just one,
who wish to use their networked computing facilities to communicate,
organize, or share computing resources,
even if not all the machines' owners are guaranteed
to trust or “play nicely” with each other.
The course takes a hands-on approach,
emphasizing learning by doing.
The course's lectures will present and discuss challenges,
known techniques, and open questions
in decentralized system design and implementation.
Lectures will often be driven by examination
of real decentralized systems with various purposes
in widespread use the past or present,
such as UseNet, IRC, FreeNet, and Tor.
Throughout the course we will explore
fundamental security and usability challenges
such as decentralized identification and authentication,
denial-of-service and Sybil attacks,
and maintenance of decentralized structures undergoing rapid changes (churn).
During the semester,
students will develop a small but usable peer-to-peer communication application
that reflects a few of the important design principles and techniques
to be explored in the course,
such as gossip, social trust networks, distributed hash tables,
and byzantine consensus algorithms.
The labs will designed so that solutions can initially be tested individually
on private, virtual networks running on one machine (e.g., a zoo machine),
then tested collectively by attempting
to make different students' solutions interoperate on a real network.
Introduction to Systems Programming and Computer Organization.
- Communicating: Networking Foundations.
Addressing, forwarding, routing.
Connection-oriented versus connectionless modes.
Client/server versus peer-to-peer communication.
Firewalls, NATs, traversal.
- Gossip: a foundation for decentralized systems.
UseNet: technical, security, and social lessons.
Randomized rumor-mongering and anti-entropy algorithms.
- Communicating Securely: Basic Cryptographic Tools.
Symmetric-key encryption. Hash functions, message authentication.
Diffie-Hellman key exchange.
Decentralized identity via public-key encryption, digital signatures.
- Trust and Reputation.
Authorities, trust networks.
Sybil attacks and defenses.
- Finding Stuff in a Big World: Naming and Lookup.
Just ask around: request flooding.
Hierarchical directories and landmark structures.
Distributed hash tables.
- Following a Moving Target.
Location services, reference points, forwarding.
- Anonymous Communication.
Onion routing, mix networks.
Voting, verifiable shuffles, homomorphic encryption.
- Fireproofing Alexandria: Decentralized Storage.
Parity, erasure coding.
- Content Distribution.
Opportunistic caching (FreeNet).
Content integrity: hash trees, hash file systems.
Swarming downloads: BitTorrent.
- Gaining perspective.
Spam, malicious content.
Review/moderation and reputation systems.
Leveraging social networks (Peerspective).
Balancing local and global viewpoints.
- Decentralized Collaboration.
Network file systems, version management. Consistency.
Disconnected operation, weak consistency, conflict resolution.
- Distributed Consensus.
Paxos. Accountability (PeerReview). Byzantine fault tolerance.
- Mobile Code and Agents.
Privacy: trusted computing, fully homomorphic encryption.
Franchises, Hosting Incentives.
Decentralized virtual organizations.
All of the required reference materials for this course
are available on-line on the reference page
(also available in the quick links at the top).
You will be using the Intel Linux PCs
in the Zoo
You may access them either locally on the third floor of Watson Hall,
or remotely via the following command,
which will log you into a randomly-chosen Zoo machine
in order to balance load on the cluster:
To access these PCs, you can either directly login from their consoles
in the Zoo, or just remotely login from other
machines across the campus.
If you plan to take the course for credit, you should get an account
on these machines in the first week. Please also visit the following web
site to create a cs426 class directory (or just to sign up for a zoo account):
Do not allow anyone else to use your accounts for any purpose.
They are for your use alone, and you are responsible for any misuse.
Your passwords control access to your accounts and should be kept secret.
Your grade will be calculated as follows:
These weights are subject to minor variation.
- Lecture preparation homework and in-class participation: 15%
- Midterm exam: 15% (open books, open notes, no electronics)
- Final exam: 20%
- Programming labs: 50%
Some lectures will have associated preparatory homework assignments,
labeled PREP: in the schedule.
Homeworks will typically involve
and answering a few questions about assigned reading material.
Written homework assignments must be turned in
at the beginning of the associated lecture;
late homework will be accepted only during
the first two-week "shopping period," until Wednesday September 15.
The midterm exam is scheduled in class on Monday October 18, 2010. The midterm is open books/notes, but no electronics (eg. laptops).
The final exam is scheduled for Friday December 17,
from 2 to 5PM (we may not use the full period).
Unless prior arrangements are made,
a grade of zero will be recorded for a missed exam.
Programming, like composition, is an individual creative process.
Individuals must reach their own understanding of the problem
and discover a path to its solution. During this time,
discussions with friends are encouraged.
However, when the time comes to write the code that solves the problem,
such discussions are no longer appropriate:
each student's code must be the work of the members of that student alone
(although you may ask teaching assistants or lab assistants
for help in debugging).
In your coding you are encouraged to adopt ideas suggested
by classmates or other reference sources,
but must carefully acknowledge the sources of those ideas
in your own code and/or documentation.
Do not, under any circumstances, copy
another student's code. Writing code for use by another or using
another's code in any form violates the University's academic regulations and
will be dealt with harshly.
For each lab, the TA will create a directory named
in which you are to copy your lab solution into a subdirectory
named according to your NetID.
At the lab deadline, the TA will freeze
the labN.ontime directory
and create a labN.late1 directory
in which to submit labs that will be considered one day late.
This process will continue as needed
with directories named labN.lateD
for labs D days late.
Please submit your entire Git working directory
(including the .git subdirectory) as a tarball,
using a command of this form:
$ tar cvzf /c/cs426/SUBMIT/labN.ontime/netid.tar.gz .
You will be using the Git
version control system to manage source code in your programming labs,
as will be laid out in Lab 1.
When you turn in a lab, you are to include
your entire Git repository,
including the .git directory and all its contents.
Attendance at lectures is expected but will not be recorded. Students are,
however, fully responsible for all material presented in lectures, even
if some of it does not appear in the "official" lecture notes.
Class attendance is recommended strongly.
Lecture notes will be made available,
though they are by no means guaranteed to be a complete record of the class
and cannot substitute for class attendance.
The best way to contact the instructor and
the TA is by electronic mail.
To get help quickly,
your best bet is to send email to
where it will be seen only by the instructor and TA,
email@example.com, where your
message will also be forwarded to every student in the class.
Use of the whole-class mailing list is encouraged
especially in the case of clarifications or debugging questions,
since it is likely that other teams will be encountering
the same or similar difficulties that you are
and may offer the quickest answer.
All the course-related information will be kept on the web
Copyright (c) 2000-2010
Department of Computer Science,