David MacKay

The course textbook is Information Theory, Inference, and Learning Algorithms, by D. J. C. MacKay (2003, C.U.P.) (Rayleigh library: 39 M 20; Betty & Gordon Moore Library: Q360.M33 2003). This 640-page textbook covers the whole course, and a whole lot more. All students are strongly encouraged to buy or borrow it. If you buy it at the CUP bookshop and show your University ID, you get a 20% discount. You may also download the book for free. Guarantee: if you buy the book and then decide that you don't want to keep it, I will buy it from you for a good price and sell it on to a future student.

Other online resources

  1. A nice summary of graphical models by Kevin Murphy
  2. Introduction to Statistical Thinking - Michael Lavine offers a free book on statistics, emphasizing the likelihood principle.
  3. Computer vision, image analysis, and imaging books online

Textbook recommendations

Other highly recommended books are listed below; I especially recommend Goldie and Pinch (1991), Bishop (1995), and Sivia (1996), which are all reasonably priced.

For information theory and coding theory, excellent texts are McEliece (1977) and the original book by Shannon and Weaver (1949), but these are hard to find; ask your library to get Shannon (1993). Three excellent alternatives are Hamming (1986), Goldie and Pinch (1991), and Welsh (1988). Golomb et al. (1994) is readable and discusses the practical side of coding theory as well as information theory. Gallager (1968) is similar and goes into more depth; it's a good book. Cover and Thomas (1991) is also good, though their approach is theoretical rather than practical. An important journal paper on arithmetic coding is Witten et al. (1987) (available in the Pt II/III library).

For neural networks and pattern recognition, an excellent text is Bishop (1995); Ripley (1996) is also recommended. Ripley's book is encyclopaedic, covering a wide range of statistical models and giving many citations of the original literature; he includes a set of practical data sets that are referred to frequently throughout the book, and he also goes into some theoretical depth. Ripley writes from the point of view of the statistician; Bishop's perspective is that of the physicist-engineer. Real data sets do not appear in Bishop's book, but simple examples are given throughout, and Bishop includes exercises too. An alternative text that emphasises connections between neural networks and statistical physics is Hertz et al. (1991); unlike Bishop (1995) and Ripley (1996), it discusses Hopfield networks at length. An older text on pattern recognition is Duda and Hart (1973), recently republished in a new edition (Duda et al., 2000), which is recommended. An older book on neural networks, written at the start of the latest neural-net craze, is Rumelhart and McClelland (1986). It's an exciting read. An excellent book on the state of the art in supervised neural network methods is Neal (1996).

For pure statistical inference, I highly recommend Sivia (1996); Berger (1985) and Bretthorst (1988, now out of print) are also very good. Jeffreys (1939) is an important classic, and Box and Tiao (1973) is well worth reading too. Connections between statistical inference and statistical physics are explored in the essential essays of Jaynes (Rosenkrantz, 1983). For further reading on graphical models and Bayesian belief networks, which have widespread importance in the artificial intelligence community, Jensen (1996) is recommended; it includes a floppy disc with the Hugin software for simulating Bayesian networks. A more theoretical text on graphical models is Lauritzen (1996). For further reading about probabilistic modelling of proteins and nucleic acids, I highly recommend Durbin et al. (1998).

Berger, J. (1985) Statistical Decision Theory and Bayesian Analysis. Springer.

Bishop, C. M. (1995) Neural Networks for Pattern Recognition. Oxford University Press.

Box, G. E. P., and Tiao, G. C. (1973) Bayesian Inference in Statistical Analysis. Addison-Wesley.

Bretthorst, G. (1988) Bayesian Spectrum Analysis and Parameter Estimation. Springer. Also available at bayes.wustl.edu.

Cover, T. M., and Thomas, J. A. (1991) Elements of Information Theory. New York: Wiley.

Duda, R. O., and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley.

Duda, R. O., Hart, P. E., and Stork, D. G. (2000) Pattern Classification. New York: Wiley. 2nd Edition.

Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.

Gallager, R. G. (1968) Information Theory and Reliable Communication. New York: Wiley.

Goldie, C. M., and Pinch, R. G. E. (1991) Communication Theory. Cambridge: Cambridge University Press.

Golomb, S. W., Peile, R. E., and Scholtz, R. A. (1994) Basic Concepts in Information Theory and Coding: The Adventures of Secret Agent 00111. New York: Plenum Press.

Hamming, R. W. (1986) Coding and Information Theory. Englewood Cliffs, NJ: Prentice-Hall, 2nd edition.

Hertz, J., Krogh, A., and Palmer, R. G. (1991) Introduction to the Theory of Neural Computation. Addison-Wesley.

Jeffreys, H. (1939) Theory of Probability. Oxford Univ. Press. 3rd edition reprinted 1985.

Jensen, F. V. (1996) An Introduction to Bayesian Networks. London: UCL Press.

Lauritzen, S. L. (1996) Graphical Models. Number 17 in Oxford Statistical Science Series. Oxford: Clarendon Press.

McEliece, R. J. (1977) The Theory of Information and Coding: A Mathematical Framework for Communication. Reading, Mass.: Addison-Wesley. Recently reprinted in a 2nd edition by C.U.P.

Neal, R. M. (1996) Bayesian Learning for Neural Networks. Number 118 in Lecture Notes in Statistics. New York: Springer.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

Rosenkrantz, R. D., ed. (1983) E. T. Jaynes: Papers on Probability, Statistics and Statistical Physics. Kluwer.

Rumelhart, D. E., and McClelland, J. L. (1986) Parallel Distributed Processing. Cambridge, Mass.: MIT Press.

Shannon, C. E. (1993) Collected Papers. New York: IEEE Press. Edited by N. J. A. Sloane and A. D. Wyner.

Shannon, C. E., and Weaver, W. (1949) The Mathematical Theory of Communication. Urbana: Univ. of Illinois Press.

Sivia, D. S. (1996) Data Analysis: A Bayesian Tutorial. Oxford University Press.

Welsh, D. (1988) Codes and Cryptography. Clarendon Press.

Witten, I. H., Neal, R. M., and Cleary, J. G. (1987) Arithmetic coding for data compression. Communications of the ACM 30 (6): 520-540.

Site last modified Sun Aug 31 18:51:05 BST 2014