[Baker]
H.Baker.  Computing A*B (mod N) Efficiently in ANSI C.  ACM SIGPLAN Notices.  Vol 27, No. 1 (January 1992), pp. 95-98.

Also available at ftp://ftp.netcom.com/pub/hb/hbaker/AB-mod-N.html

Describes efficient ways to compute A*B (mod N), where A, B, and N are values of a built-in 32-bit integer type.

This article mentions that the ANSI C specification requires that the product of two unsigned long integers not generate an overflow: the result must be the least significant 32 bits of the actual product (for a 32-bit unsigned long).  This is important because it means a double-size accumulator is not required for some algorithms.
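As an illustration of why the wraparound guarantee matters, here is a minimal C sketch (not the code from Baker's paper) that computes a*b mod n using only 32-bit unsigned multiplication plus a double-precision estimate of the quotient. It assumes C99 `<stdint.h>` for a fixed-width `uint32_t` and a modulus 0 < n < 2^30, which keeps the final correction step unambiguous; `mulmod32` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: a*b mod n without a 64-bit accumulator.
   Assumes 0 < n < 2^30 (hypothetical restriction chosen for this sketch). */
static uint32_t mulmod32(uint32_t a, uint32_t b, uint32_t n)
{
    assert(n > 0 && n < (UINT32_C(1) << 30));
    a %= n;
    b %= n;

    /* Estimate q = floor(a*b / n) in floating point.  With a, b < n < 2^30
       the rounding error is far below 1, so q is off by at most one. */
    uint32_t q = (uint32_t)((double)a * (double)b / (double)n);

    /* Unsigned arithmetic wraps modulo 2^32, so this yields the exact low
       32 bits of the true value a*b - q*n, which lies in [-n, 2n). */
    uint32_t r = a * b - q * n;

    if (r >= (UINT32_C(1) << 31))
        r += n;          /* true value was negative: q was one too large */
    else if (r >= n)
        r -= n;          /* true value was in [n, 2n): q was one too small */
    return r;
}
```

The floating-point quotient may be off by one in either direction, but the wrapped low 32 bits of a*b - q*n are still exact, so one conditional add or subtract repairs the result.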

[Barrett]
P.D.Barrett.  "Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor"  Advances in Cryptology, Proceedings of Crypto '86, LNCS 263, A.M.Odlyzko, Ed.  Springer-Verlag, 1987, pp.311-323.

Introduces the Barrett modular reduction algorithm.
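For orientation, here is a minimal single-word sketch of Barrett reduction in C. It follows the binary-base textbook formulation in [Menezes,van Oorschot,Vanstone] rather than Barrett's DSP implementation. The precomputed value mu = floor(2^(2k)/n) replaces division by n with multiplications and shifts. The sketch assumes 0 < n < 2^31 so that every intermediate fits in `uint64_t`. The names `barrett_t`, `barrett_setup`, `barrett_reduce` and `barrett_mulmod` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical context for one modulus; assumes 0 < n < 2^31. */
typedef struct {
    uint64_t n;    /* modulus */
    unsigned k;    /* bit length of n */
    uint64_t mu;   /* floor(2^(2k) / n), computed once per modulus */
} barrett_t;

static barrett_t barrett_setup(uint32_t n)
{
    barrett_t ctx;
    assert(n > 0 && n < (UINT32_C(1) << 31));
    ctx.n = n;
    ctx.k = 0;
    while ((n >> ctx.k) != 0)   /* k = number of significant bits of n */
        ctx.k++;
    ctx.mu = (UINT64_C(1) << (2 * ctx.k)) / n;   /* 2k <= 62, so no overflow */
    return ctx;
}

/* Reduce x < n^2 modulo n using shifts and multiplications only. */
static uint32_t barrett_reduce(const barrett_t *ctx, uint64_t x)
{
    /* Estimated quotient: never too large, and at most 2 below floor(x/n). */
    uint64_t q = ((x >> (ctx->k - 1)) * ctx->mu) >> (ctx->k + 1);
    uint64_t r = x - q * ctx->n;   /* 0 <= r < 3n */
    while (r >= ctx->n)
        r -= ctx->n;               /* at most two subtractions */
    return (uint32_t)r;
}

static uint32_t barrett_mulmod(const barrett_t *ctx, uint32_t a, uint32_t b)
{
    assert(a < ctx->n && b < ctx->n);
    return barrett_reduce(ctx, (uint64_t)a * b);
}
```

The one-time cost of computing mu is amortized over every reduction by the same modulus, which is the situation in RSA exponentiation.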

[Becker]
P.Becker.  Memory and Object Management, Part 2.  C/C++ Users Journal.   Vol 17, No. 8 (August 1999), pp. 71-77.

This contains a good description of some of the problems with the standard allocators (malloc, new), and of how to build a custom allocator.  Many of his design decisions parallel mine, but the end result is of course different.  Although this article was written quite some time after the ln3 allocator was finished, it is still useful as an overview of the issues and of the design decisions involved.

[Bosselaers,Govaerts,Vandewalle]
A.Bosselaers, R.Govaerts, J.Vandewalle.  Comparison of three modular reduction functions.  Crypto '93, pp. 175-186.

Compares the traditional algorithm (interleaved multiplication and reduction), Barrett's algorithm, and Montgomery's algorithm.

[Cohen]
What is the reference?
[Dusse and Kaliski]
S.Dussé, B.Kaliski Jr.  A Cryptographic Library for the Motorola DSP56000.  Advances in Cryptology, EUROCRYPT '90 (21-24 May 1990), Lecture Notes in Computer Science, Vol. 473, Springer-Verlag, 1991, pp. 230-244.

Among other things, this describes the implementation of Montgomery's modular multiplication algorithm.
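Montgomery's method [Montgomery] avoids trial division by working with residues scaled by R = 2^32. The C sketch below is a single-word illustration under stated assumptions: an odd modulus n < 2^31, so that T + m*n fits in 64 bits. It is not the multi-precision DSP56000 code of Dussé and Kaliski. The names `mont_t`, `mont_setup`, `mont_redc`, `mont_mul`, `to_mont` and `from_mont` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-modulus context; assumes n odd and n < 2^31. */
typedef struct {
    uint32_t n;      /* odd modulus */
    uint32_t ninv;   /* -n^(-1) mod 2^32 */
    uint32_t r2;     /* R^2 mod n, used to enter Montgomery form */
} mont_t;

static mont_t mont_setup(uint32_t n)
{
    mont_t m;
    uint32_t inv = n;            /* n*n == 1 (mod 8) for odd n: 3 correct bits */
    uint64_t r_mod_n;
    int i;
    assert((n & 1u) && n < (UINT32_C(1) << 31));
    for (i = 0; i < 4; i++)
        inv *= 2u - n * inv;     /* Newton step: 6, 12, 24, 48 correct bits */
    m.n = n;
    m.ninv = (uint32_t)(0u - inv);
    r_mod_n = (UINT64_C(1) << 32) % n;
    m.r2 = (uint32_t)(r_mod_n * r_mod_n % n);
    return m;
}

/* REDC: for T < n * 2^32, return T * 2^(-32) mod n. */
static uint32_t mont_redc(const mont_t *m, uint64_t T)
{
    uint32_t q = (uint32_t)T * m->ninv;            /* T + q*n == 0 (mod 2^32) */
    uint64_t t = (T + (uint64_t)q * m->n) >> 32;   /* exact division by R; t < 2n */
    return (uint32_t)(t >= m->n ? t - m->n : t);
}

/* Product of two residues already in Montgomery form (a*R, b*R -> a*b*R). */
static uint32_t mont_mul(const mont_t *m, uint32_t a, uint32_t b)
{
    return mont_redc(m, (uint64_t)a * b);
}

static uint32_t to_mont(const mont_t *m, uint32_t a)
{
    return mont_mul(m, a % m->n, m->r2);   /* a * R^2 * R^(-1) = a*R mod n */
}

static uint32_t from_mont(const mont_t *m, uint32_t a)
{
    return mont_redc(m, a);                /* a * R^(-1) mod n */
}
```

Conversion into and out of Montgomery form costs an extra reduction each way, so the method pays off when many multiplications share one modulus, as in modular exponentiation.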

[Flath]
Number theory text
[Ireland and Rosen]
K.Ireland, M.Rosen.  A Classical Introduction to Modern Number Theory, 2nd ed.  Springer-Verlag, New York, 1990.
[Jebelean 92a]
T.Jebelean.  An algorithm for exact division, RISC-Linz Report 92-35, May 1992, submitted to Journal of Symbolic Computation.

As an application of exact division, he describes a method for doing modular inversion for m = 2^n.
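For a power-of-two modulus, one standard way to invert an odd number is Newton (Hensel) lifting. It is shown here as a C sketch for m = 2^32 and is not necessarily the method in Jebelean's report. The second function shows why such an inverse is useful for exact division: when d is odd and divides a, a/d equals a * d^(-1) mod 2^32. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Inverse of an odd d modulo 2^32 by Newton/Hensel lifting. */
static uint32_t inv_mod_2_32(uint32_t d)
{
    uint32_t x = d;          /* d*d == 1 (mod 8) for odd d: 3 correct bits */
    int i;
    assert(d & 1u);
    for (i = 0; i < 4; i++)
        x *= 2u - d * x;     /* if d*x == 1 mod 2^j, now d*x == 1 mod 2^(2j) */
    return x;                /* 3 -> 6 -> 12 -> 24 -> 48 >= 32 correct bits */
}

/* Exact division: valid only when d is odd and d divides a exactly.
   The quotient is recovered from the low 32 bits alone, with no division. */
static uint32_t exact_div(uint32_t a, uint32_t d)
{
    return a * inv_mod_2_32(d);
}
```

If d does not divide a, `exact_div` still returns a value, but it is not the truncated quotient, so the precondition matters.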

[Knuth2]
D.Knuth.  The Art of Computer Programming.   Vol. 2.  Seminumerical Algorithms, 2nd ed.   Addison-Wesley, Reading, Mass., 1981.
[Montgomery]
P.Montgomery.  Modular Multiplication Without Trial Division.   Mathematics of Computation, 44(170), pp. 519-521, April 1985.
[Menezes,van Oorschot,Vanstone]
A.J.Menezes, P.C.van Oorschot, S.A.Vanstone. Handbook of Applied Cryptography. CRC Press, 1997.
[Meyer and Sorenson]
S.Meyer, J.Sorenson.  Efficient Algorithms for Computing the Jacobi Symbol.  Second International Algorithmic Number Theory Symposium (ANTS), 1996.

[Sorenson1]
J.Sorenson.  Two Fast GCD Algorithms.  Journal of Algorithms, 16, pp. 110-144, 1994.
[Shallit]
Article comparing Jacobi algorithms
[Weber]
K.Weber.  The Accelerated Integer GCD Algorithm.  ACM Transactions on Mathematical Software.  21(1):111-122, March 1995.