Bibliography
The course textbook is
Information Theory, Inference, and Learning Algorithms, by D. J. C. MacKay (2003, C.U.P.)
(Rayleigh library: 39 M 20; Betty & Gordon Moore Library: Q360.M33 2003).
This 640-page textbook covers the whole course, and a whole lot more.
All students are strongly encouraged to buy or borrow this textbook.
If you buy it at the CUP bookshop and show your University ID, you can get a 20% discount.
You may download the book for free too.
Guarantee:
If you buy the book, then decide that you don't want to keep it,
I will buy it from you for a good price and sell it on to a future student.
Other online resources

A nice summary of graphical models by Kevin Murphy

Introduction to Statistical Thinking

Michael Lavine offers a free book on statistics, emphasizing the likelihood
principle.

Computer vision/image analysis/imaging books online
Textbook recommendations
Other highly recommended books are as follows; I especially
recommend Goldie and Pinch
(1991), Bishop (1995), and Sivia (1996), which are all reasonably priced.
For information theory and coding theory, excellent
texts are McEliece (1977) and
the original book by Shannon and Weaver (1949), but these are hard to find (ask your library
to get Shannon (1993)). Three excellent alternatives are Hamming (1986), Goldie and Pinch
(1991), and Welsh (1988). Golomb et al. (1994) is readable and discusses the practical side of
coding theory as well as information theory. Gallager (1968) is similar and goes into more depth;
it's a good book. Cover and Thomas (1991) is also good, though their approach is theoretical
rather than practical. An important journal paper on arithmetic coding is Witten et al. (1987)
(available in the Pt II/III library).
For neural networks and pattern recognition, an excellent text is Bishop (1995); Ripley
(1996) is also recommended. Ripley's book is encyclopaedic, covering a wide range of statistical
models and giving large numbers of citations of the original literature; he includes a set of
practical data sets which are referred to frequently throughout the book, and he also goes into
some theoretical depth. Ripley's coverage is from the point of view of the statistician. Bishop's
perspective is that of the Physicist-Engineer. Real data sets do not appear in Bishop's book,
but simple examples are given throughout, and Bishop includes exercises too. An alternative
text which emphasises connections between neural networks and statistical physics is Hertz et al.
(1991). This text discusses Hopfield networks at length, unlike Bishop (1995) and Ripley (1996).
An older text on pattern recognition is Duda and Hart (1973), recently republished (Duda et al.,
2000), which is recommended. An older book on neural networks which was written at the start of the
latest craze of neural nets is Rumelhart and McClelland (1986). It's an exciting read. An
excellent book on the state of the art in supervised neural network methods is Neal (1996).
For pure statistical inference, I highly recommend Sivia (1996); Berger (1985) and Bretthorst (1988) (now out of print) are also very good. Jeffreys (1939) is an important classic, and
Box and Tiao (1973) is well worth reading too. Connections between statistical inference and
statistical physics are explored in the essential essays of Jaynes (Rosenkrantz, 1983). For further
reading on graphical models and Bayesian belief networks, which have widespread importance
in the Artificial Intelligence community, Jensen (1996) is recommended; it includes a floppy disc
with the Hugin software for simulating Bayesian networks. A more theoretical text on graphical
models is Lauritzen (1996). For further reading about probabilistic modelling of proteins and
nucleic acids, I highly recommend Durbin et al. (1998).
Berger, J. (1985) Statistical Decision Theory and Bayesian Analysis. Springer.
Bishop, C. M. (1995) Neural Networks for Pattern Recognition. Oxford University Press.
Box, G. E. P., and Tiao, G. C. (1973) Bayesian Inference in Statistical Analysis. Addison-Wesley.
Bretthorst, G. (1988) Bayesian spectrum analysis and parameter estimation. Springer. Also
available at bayes.wustl.edu.
Cover, T. M., and Thomas, J. A. (1991) Elements of Information Theory. New York: Wiley.
Duda, R. O., and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley.
Duda, R. O., Hart, P. E., and Stork, D. G. (2000) Pattern Classification. New York:
Wiley. 2nd Edition.
Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. (1998) Biological Sequence
Analysis. Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
Gallager, R. G. (1968) Information Theory and Reliable Communication. New York: Wiley.
Goldie, C. M., and Pinch, R. G. E. (1991) Communication theory. Cambridge: Cambridge
University Press.
Golomb, S. W., Peile, R. E., and Scholtz, R. A. (1994) Basic Concepts in Information
Theory and Coding: The Adventures of Secret Agent 00111. New York: Plenum Press.
Hamming, R. W. (1986) Coding and Information Theory. Englewood Cliffs, NJ: Prentice-Hall,
2nd edition.
Hertz, J., Krogh, A., and Palmer, R. G. (1991) Introduction to the Theory of Neural
Computation. Addison-Wesley.
Jeffreys, H. (1939) Theory of Probability. Oxford Univ. Press. 3rd edition reprinted 1985.
Jensen, F. V. (1996) An Introduction to Bayesian Networks. London: UCL Press.
Lauritzen, S. L. (1996) Graphical Models. Number 17 in Oxford Statistical Science Series.
Oxford: Clarendon Press.
McEliece, R. J. (1977) The Theory of Information and Coding: A Mathematical Framework
for Communication. Reading, Mass.: Addison-Wesley. Recently reprinted in a 2nd edition by C.U.P.
Neal, R. M. (1996) Bayesian Learning for Neural Networks. Number 118 in Lecture Notes in
Statistics. New York: Springer.
Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.
Rosenkrantz, R. D. (1983) E.T. Jaynes. Papers on Probability, Statistics and Statistical
Physics. Kluwer.
Rumelhart, D. E., and McClelland, J. L. (1986) Parallel Distributed Processing. Cambridge, Mass.: MIT Press.
Shannon, C. E. (1993) Collected Papers. New York: IEEE Press. Edited by N. J. A. Sloane
and A. D. Wyner.
Shannon, C. E., and Weaver, W. (1949) The Mathematical Theory of Communication.
Urbana: Univ. of Illinois Press.
Sivia, D. S. (1996) Data Analysis. A Bayesian Tutorial. Oxford University Press.
Welsh, D. (1988) Codes and Cryptography. Clarendon Press.
Witten, I. H., Neal, R. M., and Cleary, J. G. (1987) Arithmetic coding for data compression. Communications of the ACM 30 (6): 520-540.