Paper Review: Hash-based IP Traceback

Reviewer: Mark Meras (mm446)

Main Contribution

This paper presents a new method for tracing the source of IP packets in a network that may contain multiple malicious sources and routers. It demonstrates that even a single IP packet can be traced back to its source by having routers record packet digests in Bloom filters.

Key Points

Network security is an important topic, as the many recent Distributed Denial of Service attacks show. In these and other types of attacks, it is crucial to be able to trace packets back to their source.
Traceback is not as easy as looking up the source IP address on the packet. Attackers often forge that address or use other techniques to disguise their actual "location", so new methods are needed to counteract them. In addition, routers in the system may be accomplices of the attacker.
The paper describes a new Source Path Isolation Engine (SPIE) that can trace back a packet (i.e., identify its true source) given the packet itself, its destination, and an approximate time of receipt. SPIE masks out some fields of the IP header, and then hashes the masked header along with the first 8 bytes of the payload.
SPIE uses multiple independent hash functions, each with a collision probability approaching 2^(-n) for an n-bit digest, and combines their outputs in a Bloom filter. For packets that are transformed in transit, it also keeps a Transform Lookup Table whose entries each occupy only 64 bits. Together these keep the memory requirements at each router modest.
To trace back a packet, SPIE Collection and Reduction (SCAR) agents, each responsible for a set of routers, generate an attack path. The agents search the router tree, working back toward the origin of the packet.
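
The digest and traceback steps above can be sketched in Python. This is only an illustration under stated assumptions, not the paper's implementation: the masked byte offsets, the salted SHA-256 hashes, the table size, and the names masked_prefix, DigestTable, trace_back and make_packet are all hypothetical choices made for this sketch, and it assumes a 20-byte IPv4 header without options.

```python
import hashlib

# Assumption: offsets of IPv4 header bytes that may change in transit
# (type of service, TTL, header checksum). Illustrative choice only.
MASKED_OFFSETS = (1, 8, 10, 11)


def masked_prefix(packet: bytes) -> bytes:
    """Zero the masked bytes of a 20-byte header, then append 8 payload bytes."""
    header = bytearray(packet[:20])
    for offset in MASKED_OFFSETS:
        header[offset] = 0
    return bytes(header) + packet[20:28]


class DigestTable:
    """Bloom filter over packet digests: k salted hashes index a 2**n-bit array."""

    def __init__(self, n_bits: int = 16, k: int = 3):
        self.size = 1 << n_bits
        self.k = k
        self.bits = bytearray(self.size // 8)

    def _indexes(self, packet: bytes):
        data = masked_prefix(packet)
        for salt in range(self.k):
            # Assumption: salted SHA-256 stands in for k independent hashes.
            digest = hashlib.sha256(bytes([salt]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def record(self, packet: bytes) -> None:
        for i in self._indexes(packet):
            self.bits[i // 8] |= 1 << (i % 8)

    def may_have_seen(self, packet: bytes) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(packet))


def trace_back(victim_router, upstream, tables, packet):
    """Walk upstream from the victim's router, keeping routers whose table matches."""
    path, frontier, visited = [], [victim_router], set()
    while frontier:
        router = frontier.pop()
        if router in visited or not tables[router].may_have_seen(packet):
            continue
        visited.add(router)
        path.append(router)
        frontier.extend(upstream.get(router, []))
    return path


def make_packet(ttl: int) -> bytes:
    """Hypothetical helper: a minimal 20-byte header plus a short payload."""
    header = bytearray(20)
    header[0] = 0x45                      # IPv4, header length of 5 words
    header[8] = ttl
    header[12:16] = bytes([10, 0, 0, 1])  # source address (may be forged)
    header[16:20] = bytes([10, 0, 0, 9])  # destination address
    return bytes(header) + b"attack payload"


# Toy topology: the packet travels A -> B -> C; D also feeds C but never sees it.
tables = {name: DigestTable() for name in "ABCD"}
for hop, name in enumerate("ABC"):
    tables[name].record(make_packet(ttl=64 - hop))  # TTL drops at every hop
upstream = {"C": ["B", "D"], "B": ["A"]}
path = trace_back("C", upstream, tables, make_packet(ttl=61))
print(path)
```

Because the TTL byte is masked before hashing, every router on the path computes the same digest even though the packet differs slightly at each hop. The search relies only on the digest tables, never on the (possibly forged) source address.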

Critique of Contribution

The contribution seems very relevant. Its greatest achievement is its ability to trace a packet back to its source using only that one packet. Many other traceback techniques take a probabilistic approach that requires many packets to identify the source of a flow.

Open Question

The choice of which fields are masked seems somewhat arbitrary. How would this technique scale to IPv6? It seems the efficiency would drop considerably, since each router would need much more memory to keep the Transform Lookup Table.