

Information Theory, Pattern Recognition and Neural Networks
Part III Physics Course: Minor option. 12 lectures.
What's new, 2009:
 Lecture times: Mondays and Wednesdays, 2pm, starting 26th January.
Pippard Lecture Theatre

Course synopsis:  Handout 1 (2 pages postscript) 
pdf 

Handout 2:  Handout 2 (2 pages postscript) 
pdf 
What's new, 2008:

Course synopsis:  Handout 1 (2 pages postscript) 
pdf 

Handout 2:  Handout 2 (2 pages postscript) 
pdf 
What's new, 2007:

Course synopsis:  Handout 1 (2 pages postscript) 
pdf 

Handout 2:  Handout 2 (2 pages postscript) 
pdf 

Handout 3:  Handout 3 (1 pages postscript) 
pdf 

PAST PAPERS:
 Papers from 2004, 2005, 2006,
with some worked solutions (postscript)
(pdf) 
 The whole 2006 paper (pdf) 
 2001 questions
 & solutions 

 solutions for 2007 

Slides for lectures 112 are available on line.

perl program for Huffman algorithm:
huffman.p
python programs for Huffman algorithm:
huffman10.py
Bent coin:
a sparse file containing N=10000 bits of which roughly 0.01 are 1s.
The main thing at this site is the free online course textbook
Information Theory, Inference and Learning Algorithms, (which also has its own website).
An old (2006) incarnation of this website is here
Want to ask a question?
 please click to see the FAQ about
the book,
the course,
or the software.
Want to give feedback on the book, or report typos?
Great, to ask a question about the book please use this FAQ;
to report a typo, mail me. THANK YOU!
List of corrections already reported

Mathematical prerequisites for ITPRNN course
The level of Mathematics assumed is 1A Maths for Natural Scientists.
All the following topics will be reviewed.
 Probability and statistics:
 definition of conditional probability,
joint probability, marginal probability;
 Bayes' theorem;
 definition
of independence;
 mean
and standard deviation of a sum of independent random
variables; law of large numbers.
 Linear algebra:
 inner products between vectors;
 Taylor
expansions and gradients of functions over vector spaces;
 eigenvectors and eigenvalues of symmetric matrices.
 Differentiation using Chain rule.
 Statistical physics:
 definition of partition function;
 obtaining
thermodynamic functions from the partition function;
 entropy
of a set of N binary spins as a function of the fraction x of
spins that are +1;
 relationship between canonical
ensemble and microcanonical ensemble, and equivalence of the
Boltzmann and Gibbs entropies S=k_B log and
S= p_i 1/p_i (for large systems).

Course Summary

Information Theory, Pattern Recognition and Neural Networks
Minor Option [16 lecture synopsis]
(from 2006, the course is reduced to 12 lectures)
Lecturer: David MacKay

Introduction to information theory [1]
 The possibility of reliable communication over unreliable channels. The (7,4) Hamming
code and repetition codes.

Entropy and data compression [3]
 Entropy, conditional entropy, mutual information, Shannon information content. The idea
of typicality and the use of typical sets for source coding. Shannon's source coding
theorem. Codes for data compression. Uniquely decodeable codes and the KraftMacMillan
inequality. Completeness of a symbol code. Prefix codes. Huffman codes. Arithmetic
coding.

Communication over noisy channels [3]
 Definition of channel capacity. Capacity of binary symmetric channel; of binary erasure
channel; of Z channel. Joint typicality, random codes, and Shannon's noisy channel coding
theorem. Real channels and practical errorcorrecting codes. Hash codes.

Statistical inference, data modelling and pattern recognition [2]

The likelihood function and Bayes' theorem. Clustering as an example

Approximation of probability distributions [2]
 Laplace's method. (Approximation of probability distributions by Gaussian distributions.)
Monte Carlo methods: Importance sampling, rejection sampling, Gibbs sampling, Metropolis method. (Slice sampling, Hybrid Monte Carlo, Overrelaxation, exact sampling. *)
Variational methods and mean field theory. Ising models.

Neural networks and contentaddressable memories [2]

The Hopfield network.
[* = nonexaminable]
Bibliography

Back to Synopsis page 1  Lecture and reading summary
Bibliography
The course textbook is
Information theory, inference, and learning algorithms, by D.J.C.MacKay (2003, C.U.P.)
(Rayleigh library: 39 M 20; Betty & Gordon Moore Library: Q360.M33 2003).
This 640page textbook covers the whole course, and a whole lot more.
All students are strongly encouraged to buy or borrow this textbook.
If you buy it at the CUP bookshop and show your University ID, you can get a 20% discount.
You may download the book for free too.
Guarantee:
If you buy the book, then decide that you don't want to keep it,
I will buy it from you for a good price and sell it on to a future student.
Other online resources

A nice summary of graphical models by Kevin Murphy

Introduction to Statistical Thinking

Michael Lavine offers a free book on statistics, emphasizing the likelihood
principle.

computer vision/image analysis/imaging books online
Textbook recommendations
Other highly recommended books are as follows; I especially
recommend Goldie and Pinch
(1991), Bishop (1995), and Sivia (1996), which are all reasonably priced.
For information theory and coding theory, excellent
texts are McEliece (1977) and
the original book by Shannon and Weaver (1949), but these are hard to find (ask your library
to get Shannon (1993)). Three excellent alternatives are Hamming (1986), Goldie and Pinch
(1991), and Welsh (1988). Golomb et al. (1994) is readable and discusses the practical side of
coding theory as well as information theory. Gallager (1968) is similar and goes into more depth;
it's a good book. Cover and Thomas (1991) is also good, though their approach is theoretical
rather than practical. An important journal paper on Arithmetic coding is Witten et al. (1987)
(available in the Pt II/III library).
For neural networks and pattern recognition, an excellent text is Bishop (1995); Ripley
(1996) is also recommended. Ripley's book is encyclopaedic, covering a wide range of statistical
models and giving large numbers of citations of the original literature; he includes a set of
practical data sets which are referred to frequently throughout the book, and he also goes into
some theoretical depth. Ripley's coverage is from the point of view of the statistician. Bishop's
perspective is that of the PhysicistEngineer. Real data sets do not appear in Bishop's book,
but simple examples are given throughout, and Bishop includes exercises too. An alternative
text which emphasises connections between neural networks and statistical physics is Hertz et al.
(1991). This text discusses Hopfield networks at length, unlike Bishop (1995) and Ripley (1996).
An older text on pattern recognition is Duda and Hart (1973), recently republished (Duda et al.,
2000)  recommended. An older book on neural networks which was written at the start of the
latest craze of neural nets is Rumelhart and McClelland (1986). It's an exciting read. An
excellent book on the state of the art in supervised neural network methods is Neal (1996).
For pure statistical inference, I highly recommend Sivia (1996); Berger (1985) and Bretthorst (1988) (now out of print) are also very good. Jeffreys (1939) is an important classic, and
Box and Tiao (1973) is well worth reading too. Connections between statistical inference and
statistical Physics are explored in the essential essays of Jaynes (Rosenkrantz, 1983). For further
reading on graphical models and Bayesian belief networks, which have widespread importance
in the Artifical Intelligence community, Jensen (1996) is recommended; it includes a floppy disc
with the Hugin software for simulating Bayesian networks. A more theoretical text on graphical
models is Lauritzen (1996). For further reading about probabilistic modelling of proteins and
nucleic acids, I highly recommend Durbin et al. (1998).
Berger, J. (1985) Statistical Decision theory and Bayesian Analysis. Springer.
Bishop, C. M. (1995) Neural Networks for Pattern Recognition. Oxford University Press.
Box, G. E. P., and Tiao, G. C. (1973) Bayesian inference in statistical analysis. AddisonWesley.
Bretthorst, G. (1988) Bayesian spectrum analysis and parameter estimation. Springer. Also
available at bayes.wustl.edu.
Cover, T. M., and Thomas, J. A. (1991) Elements of Information Theory. New York: Wiley.
Duda, R. O., and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley.
Duda, R. O., Hart, P. E., and Stork, D. G. (2000) Pattern Classification. New York:
Wiley. 2nd Edition.
Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. (1998) Biological Sequence
Analysis. Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
Gallager, R. G. (1968) Information Theory and Reliable Communication. New York: Wiley.
Goldie, C. M., and Pinch, R. G. E. (1991) Communication theory. Cambridge: Cambridge
University Press.
Golomb, S. W., Peile, R. E., and Scholtz, R. A. (1994) Basic Concepts in Information
Theory and Coding: The Adventures of Secret Agent 00111 . New York: Plenum Press.
Hamming, R. W. (1986) Coding and Information Theory. Englewood Cliffs, NJ: PrenticeHall,
2nd edition.
Hertz, J., Krogh, A., and Palmer, R. G. (1991) Introduction to the Theory of Neural
Computation. AddisonWesley.
Jeffreys, H. (1939) Theory of Probability. Oxford Univ. Press. 3rd edition reprinted 1985.
Jensen, F. V. (1996) An Introduction to Bayesian Networks. London: UCL press.
Lauritzen, S. L. (1996) Graphical Models. Number 17 in Oxford Statistical Science Series.
Oxford: Clarendon Press.
McEliece, R. J. (1977, recently reprinted in 2nd edn by C.U.P.) The Theory of Information and Coding: A Mathematical Framework
for Communication. Reading, Mass.: AddisonWesley.
Neal, R. M. (1996) Bayesian Learning for Neural Networks. Number 118 in Lecture Notes in
Statistics. New York: Springer.
Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.
Rosenkrantz, R. D. (1983) E.T. Jaynes. Papers on Probability, Statistics and Statistical
Physics. Kluwer.
Rumelhart, D. E., and McClelland, J. E. (1986) Parallel Distributed Processing. Cambridge Mass.: MIT Press.
Shannon, C. E. (1993) Collected Papers. New York: IEEE Press. Edited by N. J. A. Sloane
and A. D. Wyner.
Shannon, C. E., and Weaver, W. (1949) The Mathematical Theory of Communication.
Urbana: Univ. of Illinois Press.
Sivia, D. S. (1996) Data Analysis. A Bayesian Tutorial. Oxford University Press.
Welsh, D. (1988) Codes and Cryptography. Clarendon press.
Witten, I. H., Neal, R. M., and Cleary, J. G. (1987) Arithmetic coding for data compression. Communications of the ACM 30 (6): 520540.

In 2012, EmliMari Nel kindly organised the filming of 16 lectures.
From 2nd November 2012,
You can view the videos online (synchronized with snapshots and slides) at the
video lectures website.
(The original videos of the lectures and slides are also available at
http://www.inference.org.uk/itprnn_lectures)



Supervision information
Supervisions are on
Thursdays and Fridays.
Thursdays in TCM seminar room (top floor, Mott building) 
Fridays 1.452.45pm in Mott seminar room (Room 531) 
Fridays 34pm in Mott seminar room, mostly 
Thu 5th February 4pm5pm  Fri 6 Feb  Fri 6 Feb pippard 
Thu 12 Feb 4pm5pm  Fri 13 Feb  Fri 13 Feb 
Thu 19 Feb 5pm6pm  Fri 20 Feb  Fri 20 Feb 
Thu 26 Feb 5pm6pm  Fri 27 Feb  Fri 27 Feb 
Thu 5 March 5pm6pm  Fri 6 Mar  Fri 6 Mar pippard 
Thu 12 March 5pm6pm  Fri 13 Mar  Fri 13 Mar 
What work to do
The book contains numerous exercises, complete with worked solutions.
A subset of these exercises will be designated as the exercises
you should work on each week. [Generally,
these exercises are the ones marked `highly recommended' by a marginal rat.]
I encourage you not to look at the worked solutions
for these exercises before you have attempted them yourself.
In the supervisions, we will focus in detail on just a few
of the exercises.
(You should work on all the exercises yourselves before supervisions.)
So far
The first week's recommended exercises:
1.3 (p.8), 1.57 (p.13), 1.9, & 1.11 (p.14).

Information Theory, Inference, and Learning Algorithms
(Hardback, 640 pages, Published September 2003)
Order your copy
Price: £35.00 / $60.00 from CUP UK/USA amazon.co.uk/.com/.ca/.co.jp  Barnes & Noble USA  booksprice.  fetchbook.info  allbookstores  biggerbooks  blackwells  directtextbook  kalahari.net (South Africa)  Special Paperback edition for South Asia.
Download the book too
You can browse and search the book on Google
books.
You may download
The book in one file
(640 pages):
Notes:
 Version 6.0 was released Thu 26/6/03; the book is finished.
You are welcome to view the book onscreen. Version 6.0 was used for the first printing,
published by C.U.P. September 2003.

Version 6.6 was released Mon 22/12/03; it will be used for the second printing,
to be released January 2004.
In this second printing,
a small number of typographical errors were corrected,
and the design of the book was altered slightly.
Pagenumbering generally remains unchanged,
except in chapters 1, 6, and 28,
where
a few paragraphs, figures, and equations have
moved around.
All equation, section, and exercise numbers are unchanged.
 Version 7.0 is the third printing (November 2004).
Its only differences from the 2nd printing are a number of corrections,
and the renaming of the chapter `Correlated Random Variables' to `Dependent Random Variables'.
Copyright issues: The book is copyright (c) Cambridge University Press.
It has been available in bookstores since September 2003. The cover price in 2003 was 30 pounds (UK)
and $50 (USA); in 2006, 35 pounds and $60 (USA).
Now the book is published, these files will remain viewable on
this website. The same copyright rules will apply to the online copy of the
book as apply to normal books. [e.g., copying the whole book onto paper is
not permitted.]
History:
Draft 1.1.1  March 14 1997.
Draft 1.2.1  April 4 1997.
Draft 1.2.3  April 9 1997.
Draft 1.2.4  April 10 1997. Margins altered so as to print better
on Northamerican paper
Draft 1.3.0  December 23 1997.
Draft 1.9.0  Feb 1 1999.
Draft 2.0.0  Jan 6 2000. New page layout.
Draft 2.2.0  Dec 23 2000. Fresh draft.
Draft 3.1415  Jan 12 2003. Nearly finished.
Draft 4.0  April 15 2003. Chapter sequence finalized.
Draft 4.1  April 18 2003. Adding new frontmatter (Preface etc)to book. Corrected printing of letter version.
Version 6.0  Thu 26 June 2003. Final version.
Version 6.6  Mon 22 December 2003. Second printing.
 Here is my method for converting to twoup under linux:
pstops '4:0L@.67(20cm,1cm)+1L@.67(20cm,15cm),3R@.67(1cm,15.25cm)\
+2R@.67(1cm,29.25cm)' $*.ps $*.dps

Errors in the book
I would be
grateful to hear about errors in my lecture notes.
A list of corrections is provided.
Please send me new corrections by email. Thank you!
If you have a query about the book that you think
other people might also ask, please use one of the following links
to submit your query through metaFAQ; you may find the answer
is already there:

Query categories
Any other query


Software, demonstrations, etc.
Hint: To force download to file, use shiftclick
The tcl programs and other links marked (*) can be
run directly in netscape if you have got the tcl/tk
plugin.
You can also run the tcl programs in the normal way if you have a
linux machine with X windows and tcl and you download the tcl files
then execute them (this is preferable to the netscape approach, I think).
[The top four lines of each tcl file contain flags which change
the program from working with the netscape plugin to working
alone under X.]
The other programs can be used by people on unix systems that
have the relevant software (perl, gnuplot, octave.)
Please select a software category from the sidebar.

Hint: To force download to file, use shiftclick
The tcl programs and other links marked (*) can be
run directly in netscape if you have got the tcl/tk
plugin.
You can also run the tcl programs in the normal way if you have a
linux machine with X windows and tcl and you download the tcl files
then execute them (this is preferable to the netscape approach, I think).
[The top four lines of each tcl file contain flags which change
the program from working with the netscape plugin to working
alone under X.]
The other programs can be used by people on unix systems that
have the relevant software (perl, gnuplot, octave.)

Inference methods
Hint: To force download to file, use shiftclick
The tcl programs and other links marked (*) can be
run directly in netscape if you have got the tcl/tk
plugin.
You can also run the tcl programs in the normal way if you have a
linux machine with X windows and tcl and you download the tcl files
then execute them (this is preferable to the netscape approach, I think).
[The top four lines of each tcl file contain flags which change
the program from working with the netscape plugin to working
alone under X.]
The other programs can be used by people on unix systems that
have the relevant software (perl, gnuplot, octave.)

Online Hamiltonian Monte Carlo Demo
(formerly known as `Hybrid Monte Carlo')
Task: sample from the posterior distribution of a bivariate Gaussian
distribution given ten data points.
The lefthand image shows the details underlying a sequence of
HMC transitions. The righthand image shows 50 of the successive states
produced by the Markov chain  the endpoints of the trajectories.
fit.tar tar file giving octave source code.
There is also an introductory
octave DEMO
that explains Hamiltonian Monte Carlo
and contrasts it with the Metropolis method,
using the same "stronglycorrelated Gaussian"
example as is used in the
Adler Overrelaxation demo.

Hint: To force download to file, use shiftclick
The tcl programs and other links marked (*) can be
run directly in netscape if you have got the tcl/tk
plugin.
You can also run the tcl programs in the normal way if you have a
linux machine with X windows and tcl and you download the tcl files
then execute them (this is preferable to the netscape approach, I think).
[The top four lines of each tcl file contain flags which change
the program from working with the netscape plugin to working
alone under X.]
The other programs can be used by people on unix systems that
have the relevant software (perl, gnuplot, octave.)

Hint: To force download to file, use shiftclick
The tcl programs and other links marked (*) can be
run directly in netscape if you have got the tcl/tk
plugin.
You can also run the tcl programs in the normal way if you have a
linux machine with X windows and tcl and you download the tcl files
then execute them (this is preferable to the netscape approach, I think).
[The top four lines of each tcl file contain flags which change
the program from working with the netscape plugin to working
alone under X.]
The other programs can be used by people on unix systems that
have the relevant software (perl, gnuplot, octave.)

Sparse Graph Codes

Other files
 Images for Chromatic abberation demo:
 1
 2

Links to other people's software
Haskell (Naive) Bayes library by Frederik Eaton

Site last modified Sun Aug 31 18:51:05 BST 2014

