Paper Review: Hash-Based IP Traceback

Reviewer: Kenneth Chin

This paper introduces an improved version of audit-based IP traceback. The improvement mainly concerns the storage requirement: the authors use hashing to dramatically reduce the amount of memory required for IP traceback. The underlying idea is that, instead of storing a whole packet, a router stores only a digest of the packet. The digest is computed by a uniform hash function whose input is 3 header fields (20 bytes in total) and the first 8 bytes of the payload. Furthermore, the digests are stored in Bloom filters, a probabilistic data structure that records set membership in a compact bit array at the cost of occasional false positives.

As for the traceback process, the victim sends a request to the SPIE Traceback Manager (STM), specifying the packet in question, the time of the attack, and the victim's own identification. Upon receiving the request, the STM sends a request to the SPIE Collection and Reduction Agents (SCARs) in its local administration, asking for attack graphs. Each SCAR in turn asks the Data Generation Agents (DGAs) in its connected routers for attack paths. The attack paths reported by the DGAs form each SCAR's attack graph, and the STM combines these graphs into the final attack graph. This is an overview of how hash-based IP traceback works.
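The digest storage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name DigestTable, the helper digest_input, and the use of salted BLAKE2b as the hash family are my own assumptions, and the sketch assumes the caller has already reduced the header to the 20 bytes that are fed to the hash.

```python
import hashlib
from typing import Iterator


def digest_input(header: bytes, payload: bytes) -> bytes:
    """Build the 28-byte hash input: 20 header bytes plus the first 8 payload bytes.

    Assumption: `header` already holds only the 20 bytes used for hashing.
    Short payloads are zero-padded to 8 bytes (a choice made for this sketch).
    """
    if len(header) < 20:
        raise ValueError("expected at least 20 header bytes")
    return header[:20] + payload[:8].ljust(8, b"\x00")


class DigestTable:
    """Toy Bloom filter that records packet digests (hypothetical name)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive k bit positions by salting one hash function with an index.
        # BLAKE2b is a stand-in here, not the hash family used in the paper.
        for i in range(self.num_hashes):
            h = hashlib.blake2b(key, digest_size=8, salt=i.to_bytes(2, "big"))
            yield int.from_bytes(h.digest(), "big") % self.num_bits

    def add(self, key: bytes) -> None:
        # Record a packet by setting its k bits; the packet itself is not kept.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: bytes) -> bool:
        # True means "possibly seen" (false positives can occur);
        # False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )
```

A router would call add(digest_input(header, payload)) for each forwarded packet, and a traceback query would test the same 28 bytes with the in operator. Because only bits are set, the memory per packet is fixed by the table size rather than by the packet size.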

There are several issues with the hash-based IP traceback mechanism.

It is extremely time critical; the suggested amount of time to identify an attack packet and initiate a traceback is one minute.
The traceback process is sequential: a SCAR sends a request to its neighbouring SCAR for further investigation only after it has determined that the source is not in its local administration.
The amount of storage is still huge and depends on a number of factors, such as the link speed and the storage access time.
The traceback requests or queries may be delayed or lost due to unstable network behavior or heavy congestion.
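The dependence of storage on link speed can be made concrete with a back-of-envelope estimate. The figures below (a 1 Gbit/s link, 1000-byte average packets, a 60-second window, 16 bits of table space per packet) are placeholders chosen for illustration, not numbers taken from the paper, and the sketch does not model storage access time.

```python
def packets_per_second(link_bits_per_second: float, avg_packet_bytes: float) -> float:
    """Packet rate of a fully loaded link for a given average packet size."""
    return link_bits_per_second / (avg_packet_bytes * 8)


def digest_storage_bytes(
    pps: float, window_seconds: float, bits_per_packet: float
) -> float:
    """Memory needed to keep digests for every packet seen in the window."""
    return pps * window_seconds * bits_per_packet / 8


# Illustrative values only (assumptions, see the text above).
rate = packets_per_second(1e9, 1000)
print(rate)                                # 125000.0
print(digest_storage_bytes(rate, 60, 16))  # 15000000.0
```

The estimate grows linearly with both the link speed and the length of the window that must be kept, which is why a faster link or a longer retention period translates directly into more memory per router.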

My view of the storage requirement of IP traceback is that it will always be there. As technology advances, link speed, memory capacity, and access time will advance in parallel, keeping the gap between them more or less the same. I very much appreciate the idea of using a digest to represent the whole packet, and the idea of storing the digests probabilistically. However, it seems to me that it may still be a good idea to store the information on an inexpensive storage medium such as a tape drive, while deploying a cache system to bridge the gap between fast incoming packets and the slow archiving process. In this way, the traceback process can be done entirely offline, which removes the time-critical requirement that I think is the root of the problem. Also, an administrative party is needed to ensure that all routers cooperate in order for any IP traceback scheme to succeed.
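The cache-plus-archive arrangement suggested above could look roughly like the following sketch. It is only an illustration of my suggestion, not something described in the paper: the class DigestCache, its parameters, and the archive callback are all hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, List


class DigestCache:
    """Buffers digests in memory and hands them to slow storage in batches."""

    def __init__(
        self,
        capacity: int,
        batch_size: int,
        archive: Callable[[List[bytes]], None],
    ) -> None:
        self.capacity = capacity      # how many digests fit in the cache
        self.batch_size = batch_size  # how many digests go to storage per write
        self.archive = archive        # stand-in for the slow medium (e.g. tape)
        self.pending: Deque[bytes] = deque()
        self.dropped = 0

    def record(self, digest: bytes) -> None:
        # Fast path: runs once per packet, so it only touches memory.
        if len(self.pending) >= self.capacity:
            self.dropped += 1  # cache full: the archive is not keeping up
            return
        self.pending.append(digest)

    def drain(self) -> int:
        # Slow path: called by the archiver whenever the medium is ready.
        count = min(self.batch_size, len(self.pending))
        batch = [self.pending.popleft() for _ in range(count)]
        if batch:
            self.archive(batch)
        return len(batch)
```

For example, with tape = [] and cache = DigestCache(3, 2, tape.extend), recording four digests keeps three and counts one as dropped, and each call to drain() moves up to two digests into tape. The dropped counter makes one limit of the idea visible: the cache can absorb bursts, but the archive must still keep up with the average packet rate.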

This paper makes a decent contribution to IP traceback, so it deserves a 3rd grade.