Paper review:
Hash-Based IP Traceback
Reviewer:
Mike Liu
- State the problem the paper is trying to solve.
The main problem the paper is trying to solve is how to reliably identify the
originator of an IP packet.
- State the main contribution of the paper: solving a new problem, proposing a
new algorithm, or presenting a new evaluation (analysis). If a new problem, why
was the problem important? Is the problem still important today? Will the
problem be important tomorrow? If a new algorithm or new
evaluation (analysis), what are the improvements over previous algorithms or
evaluations? How do they come up with the new algorithm or evaluation?
The main contribution of this paper is that it proposes an efficient and
scalable hash-based technique for IP traceback. This is a solution to a
relatively new problem, and it remains important today: as the Internet
becomes increasingly ubiquitous, a myriad of new security concerns arise.
Today's Internet infrastructure is extremely vulnerable to
motivated and well-equipped attackers. The problem will continue to be
important as the Internet becomes more widespread and offers more and more
services.
- Summarize the (at most) 3 key main ideas (each in 1 sentence.)
The three key ideas are:
(1) The authors developed the Source Path Isolation Engine (SPIE) to enable
IP traceback, the ability to identify the source of a particular IP packet
given a copy of the packet, its destination, and an approximate time of
receipt; SPIE can also trace packets through valid transformations, such as
packet encapsulation or packet generation.
(2) Historically, tracing individual packets has required prohibitive amounts
of memory; one of SPIE's key innovations is to reduce the memory requirement
(down to 0.5% of link bandwidth per unit time) through the use of Bloom
Filters.
(3) To determine the optimal amount of resources to dedicate to SPIE on an
individual router or on the network as a whole, the authors analyze the
tradeoff between resource requirements (the number of packet digest
functions and the amount of memory used to store packet digests) and
performance (the length of time for which packet digests are kept and the
accuracy of the candidate attack graphs): the more memory available for
storing digests, the longer the window in which queries can be issued, and
digest tables with lower false-positive rates yield more accurate attack
graphs.
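To make the Bloom filter idea concrete, here is a minimal sketch of a
packet digest table in Python. The hash construction (salted SHA-256), the
table sizes, and the method names are illustrative assumptions on my part,
not SPIE's actual digest functions, which hash the invariant IP header
fields plus a prefix of the payload:

```python
import hashlib

class PacketDigestTable:
    """Illustrative Bloom filter for packet digests (a sketch, not
    SPIE's real construction)."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _indices(self, packet: bytes):
        # Derive k bit positions from salted SHA-256 digests; SPIE would
        # instead use independent hashes over invariant header fields.
        for salt in range(self.num_hashes):
            h = hashlib.sha256(bytes([salt]) + packet).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def insert(self, packet: bytes):
        # Record that this packet was forwarded: set all k bits.
        for idx in self._indices(packet):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def query(self, packet: bytes) -> bool:
        # True means "possibly forwarded" (false positives possible);
        # False means "definitely not forwarded".
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indices(packet))
```

Because only fixed-size bit arrays are stored rather than the packets
themselves, memory grows with link bandwidth rather than with per-packet
state, which is the source of the roughly 0.5%-of-link-bandwidth figure
cited above.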
- Critique the main contribution
- Rate the significance of the paper on a scale of 5
(breakthrough), 4 (significant contribution), 3 (modest contribution), 2
(incremental contribution), 1 (no contribution or negative contribution).
Explain your rating in a sentence or two.
I give this paper a rating of 4 because it presents a strong first step
toward an efficient, scalable, and implementable system for IP traceback.
- Rate how convincing the methodology is: how do the authors
justify the solution approach or evaluation? Do the authors use arguments,
analyses, experiments, simulations, or a combination of them? Do the claims
and conclusions follow from the arguments, analyses or experiments? Are the
assumptions realistic (at the time of the research)? Are the assumptions still
valid today? Are the experiments well designed? Are there different
experiments that would be more convincing? Are there other alternatives the
authors should have considered? (And, of course, is the paper free of
methodological errors.)
The authors' methodology is to present the complex problem of IP traceback
on the Internet, especially given the various transformations packets may
undergo, and to demonstrate how their solution addresses the problem in an
implementable way. They support the claim of implementability with both
analytic and simulation results, along with an implemented FreeBSD
prototype. For their analysis, they discuss the tradeoffs involved in
determining the optimal amount of resources to dedicate to SPIE on an
individual router or on the network as a whole, such as the number of
packet digest functions and the amount of memory used to store packet
digests. Performance is then measured by the length of time for which
packet digests are kept and the accuracy of the candidate attack graphs.
They present theoretical bounds and compare them to simulation results
using a topology snapshot and average link-utilization data from a national
tier-one ISP network backbone over a week-long period at the end of 2000.
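The memory-versus-accuracy tradeoff in that analysis can be illustrated
with the standard Bloom filter false-positive approximation
p ≈ (1 − e^(−kn/m))^k, for m bits of memory, n stored digests, and k hash
functions. The parameter values below are arbitrary illustrations of the
trend, not figures taken from the paper:

```python
import math

def bloom_false_positive_rate(m_bits, n_packets, k_hashes):
    # Standard Bloom filter approximation: p ~= (1 - e^(-k*n/m))^k.
    return (1 - math.exp(-k_hashes * n_packets / m_bits)) ** k_hashes

# With the digest count held fixed, doubling the memory budget sharply
# lowers the false-positive rate, i.e. yields more accurate attack graphs
# (illustrative numbers only).
p_tight = bloom_false_positive_rate(m_bits=5_000_000,
                                    n_packets=1_000_000, k_hashes=3)
p_roomy = bloom_false_positive_rate(m_bits=10_000_000,
                                    n_packets=1_000_000, k_hashes=3)
```

The same formula also captures the other axis of the tradeoff: for a fixed
memory budget, letting the table fill with more digests (a longer query
window) raises n and hence the false-positive rate.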
- What is the most important limitation of the approach?
The most important limitation of their approach is that, because they
arrived at their results through experiments run on a simulator, the
results are only as good as the simulator and its inputs. If the network
topology differs substantially from the late-2000 snapshot they used, the
results may not necessarily hold; and if the trace is not an accurate
representation of a more complex network topology, it does not show that
the system will work in such an environment. Even so, the trace was fairly
comprehensive, and it is about the best one can do when evaluating a
system's performance on the Internet.
- What lessons should researchers and builders take away from this work. What
(if any) questions does this work leave open?
The lesson researchers should take away from this work is that it is
possible to build an efficient, scalable, and implementable system for IP
traceback, despite the seemingly prohibitive memory requirements. That
potential memory roadblock can be overcome using Bloom filters, and the
system remains effective for tracing packets even when they undergo
transformations.