Paper Review:
Hash-based IP Traceback
Reviewer: Mark Meras (mm446)
Main Contribution
This paper presents a method for tracing IP packets back to their source
in a network that may contain multiple malicious sources and compromised
routers. It demonstrates that even a single IP packet can be traced back
to its source by having routers record packet digests in Bloom filters.
Key Points
- Network security is a pressing concern, as the many recent examples of
Distributed Denial of Service attacks show. In these and other types of
attacks, it is crucial to be able to trace packets back to their source.
- Traceback is not as easy as looking up the source IP address in the
packet. Attackers often forge (spoof) the source address to disguise their
actual "location", so new methods are needed to counteract this. In
addition, routers in the network may be accomplices of the attacker.
- The paper describes a new Source Path Isolation Engine (SPIE) that
can trace back a packet (i.e. identify its true source) given the packet
itself, its destination, and an approximate time of receipt. SPIE masks
out the IP header fields that change as the packet is forwarded, and then
hashes the masked header along with the first 8 bytes of the payload.
- SPIE uses multiple independent hash functions, each producing an n-bit
digest with a collision probability approaching 2^(-n). Each router
records the digests in a Bloom filter by setting the corresponding bits of
a bit array. Separately, it keeps a Transform Lookup Table for packets
that were transformed in transit (e.g. fragmented or tunneled), with each
entry occupying only 64 bits. This keeps the memory requirements low.
- To trace back a packet, an attack path is generated using SPIE
Collection and Reduction (SCAR) agents, each of which is responsible for a
set of routers. The agents check which of their routers recorded the
packet's digest and search the router tree hop by hop toward the origin.
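The key points above can be illustrated with a minimal sketch. It is not
the paper's implementation: all names (make_header, digest_input,
BloomFilter, trace_back) are hypothetical, the masked fields (Type of
Service, TTL, header checksum) are an assumption chosen for illustration,
salted SHA-256 stands in for the paper's hash functions, and a static
neighbor map stands in for the SCAR agents.

```python
import hashlib
import struct


def make_header(ttl: int) -> bytes:
    """Hypothetical helper: build a 20-byte IPv4 header with the given TTL."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 1, 0, ttl, 17, 0,
                       bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))


def digest_input(header: bytes, payload: bytes) -> bytes:
    """Zero the fields assumed to change per hop, then append 8 payload bytes."""
    masked = bytearray(header[:20])
    masked[1] = 0                  # Type of Service (assumed masked)
    masked[8] = 0                  # TTL, decremented at every hop
    masked[10:12] = b"\x00\x00"    # header checksum, recomputed at every hop
    return bytes(masked) + payload[:8].ljust(8, b"\x00")


class BloomFilter:
    """Bit array of 2**n bits written by k salted hash functions."""

    def __init__(self, n: int = 16, k: int = 3):
        self.n, self.k = n, k
        self.bits = bytearray((1 << n) // 8)

    def _indexes(self, data: bytes):
        for salt in range(self.k):
            h = hashlib.sha256(bytes([salt]) + data).digest()
            yield int.from_bytes(h[:8], "big") % (1 << self.n)

    def add(self, data: bytes) -> None:
        for idx in self._indexes(data):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, data: bytes) -> bool:
        return all(self.bits[idx >> 3] & (1 << (idx & 7))
                   for idx in self._indexes(data))


def trace_back(victim_router, neighbors, tables, key):
    """Walk upstream from the victim's router through routers that saw the packet."""
    path, frontier, seen = [], [victim_router], {victim_router}
    while frontier:
        router = frontier.pop()
        if key not in tables[router]:
            continue               # this router never recorded the digest
        path.append(router)
        for upstream in neighbors.get(router, []):
            if upstream not in seen:
                seen.add(upstream)
                frontier.append(upstream)
    return path


# Packet travels A -> B -> C; router D is adjacent to C but never sees it.
payload = b"attack-payload"
neighbors = {"C": ["B", "D"], "B": ["A"], "A": [], "D": []}
tables = {name: BloomFilter() for name in neighbors}
for hops, name in enumerate(["A", "B", "C"]):
    tables[name].add(digest_input(make_header(ttl=64 - hops), payload))

received = digest_input(make_header(ttl=61), payload)
print(trace_back("C", neighbors, tables, received))
```

Because the TTL is masked before hashing, every router on the path records
the same digest even though the header it saw differs. A real deployment
would also have to bound the Bloom filter's false-positive rate, which
this sketch ignores.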
Critique of Contribution
The contribution seems very relevant. Its greatest achievement is the
ability to trace a packet back to its source using only that one packet.
Many other traceback techniques use a probabilistic approach that requires
many packets to identify the source of a flow.
Open Question
The choice of which fields are masked seems somewhat arbitrary. How would
this technique scale to IPv6? It seems the efficiency would go down quite
a bit due to the much greater storage/memory space required for each
router to keep the Transform Lookup Table.