Also available at ftp://ftp.netcom.com/pub/hb/hbaker/AB-mod-N.html
Describes efficient ways to compute A*B (mod N), where A, B, and N are built-in integer types (32 bits).
This article mentions that the ANSI C specification requires that the product of two long integers not generate an overflow, and that the result must be the least significant 32 bits of the actual product. This is important because it means that a double-size accumulator is not required for some algorithms.
Introduces the Barrett modular reduction algorithm.
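As a point of comparison, here is a minimal sketch of one well-known way to compute A*B (mod N) without a double-size accumulator: binary shift-and-add with a modular addition that never forms a sum larger than N. It is not necessarily the method the article describes, and the names `add_mod` and `mul_mod` are hypothetical. Python integers do not overflow, so the 32-bit limit is respected only by construction of the algorithm.

```python
def add_mod(x, y, n):
    """Return (x + y) mod n for 0 <= x, y < n without forming x + y."""
    # x + y >= n exactly when x >= n - y, and n - y never exceeds n.
    if x >= n - y:
        return x - (n - y)
    return x + y


def mul_mod(a, b, n):
    """Return (a * b) mod n using only values smaller than n."""
    a %= n
    b %= n
    result = 0
    while b:
        if b & 1:                 # this bit of b contributes the current a
            result = add_mod(result, a, n)
        a = add_mod(a, a, n)      # a <- 2a (mod n)
        b >>= 1
    return result
```

The loop runs once per bit of B, so a 32-bit multiplier costs at most 32 iterations; the trade is speed for not needing a 64-bit intermediate.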
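For orientation, a simplified sketch of the idea behind Barrett reduction: precompute an approximation of 1/N once, then replace each division by N with a multiplication and a shift. This is a textbook-style variant, not the exact formulation of the cited paper, and the names `barrett_setup` and `barrett_reduce` are hypothetical.

```python
def barrett_setup(n):
    """Precompute k = bit length of n and mu = floor(4**k / n)."""
    k = n.bit_length()
    mu = (1 << (2 * k)) // n
    return k, mu


def barrett_reduce(x, n, k, mu):
    """Return x mod n for 0 <= x < 4**k (in particular for x < n*n)."""
    # q underestimates floor(x / n) by a small constant at most,
    # so r starts non-negative and needs only a few subtractions.
    q = (x * mu) >> (2 * k)
    r = x - q * n
    while r >= n:
        r -= n
    return r
```

The single division in `barrett_setup` is amortized over every later reduction by the same modulus, which is why the method pays off in modular exponentiation.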
This contains a good description of some of the problems with the standard allocator (malloc, new), and how to build a custom allocator. Many of his design decisions parallel mine, but the end result is of course different. Even though this article was written quite some time after the ln3 allocator was finished, it is still useful as an overview of the issues and gives some idea of the design decisions involved.
Compares the traditional algorithm (interleaved multiplication and reduction), Barrett's algorithm, and Montgomery's algorithm.
Among other things, this describes the implementation of Montgomery's modular multiplication algorithm.
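A minimal single-word sketch of Montgomery's modular multiplication may help here. It assumes R = 2^32 and an odd modulus N < R; the function names are hypothetical, the cited work's implementation may differ, and `pow(n, -1, R)` needs Python 3.8 or later.

```python
R_BITS = 32
R = 1 << R_BITS
MASK = R - 1


def montgomery_setup(n):
    """For odd n < R, return (n_prime, r2) with n*n_prime = -1 (mod R) and r2 = R*R mod n."""
    n_prime = (-pow(n, -1, R)) % R
    r2 = (R * R) % n
    return n_prime, r2


def redc(t, n, n_prime):
    """Montgomery reduction: return t / R (mod n) for 0 <= t < n*R."""
    m = ((t & MASK) * n_prime) & MASK   # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> R_BITS           # exact division by R
    return u - n if u >= n else u


def mont_mul(a, b, n):
    """Return (a * b) mod n for 0 <= a, b < n, via the Montgomery domain."""
    n_prime, r2 = montgomery_setup(n)
    a_bar = redc(a * r2, n, n_prime)    # a*R mod n
    b_bar = redc(b * r2, n, n_prime)    # b*R mod n
    c_bar = redc(a_bar * b_bar, n, n_prime)
    return redc(c_bar, n, n_prime)      # leave the Montgomery domain
```

Converting in and out of the Montgomery domain costs extra reductions, so in practice the conversion is done once and many multiplications are chained in between, as in modular exponentiation.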
As an application of exact division, he describes a method for doing modular inversion for m = 2^n.
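Assuming the modulus meant here is a power of two, 2^n, the following sketch shows one standard way to invert an odd number modulo 2^n: a Newton (Hensel) iteration that doubles the number of correct low bits per step. It is offered only as an illustration and is not necessarily the method the paper describes; the name `inverse_mod_pow2` is hypothetical.

```python
def inverse_mod_pow2(a, n):
    """Return x with (a * x) % 2**n == 1, for odd a and n >= 1."""
    if a % 2 == 0:
        raise ValueError("a must be odd to be invertible modulo a power of two")
    mask = (1 << n) - 1
    x = a & mask          # a*a = 1 (mod 8), so a is its own inverse to 3 bits
    correct_bits = 3
    while correct_bits < n:
        # Newton step: if a*x = 1 (mod 2**k) then a*x*(2 - a*x) = 1 (mod 2**(2k)).
        x = (x * (2 - a * x)) & mask
        correct_bits *= 2
    return x
```

For a 32-bit word the loop runs four times (3, 6, 12, 24, then 48 correct bits), using only multiplication, subtraction, and masking.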