David MacKay

# Information Theory, Pattern Recognition and Neural Networks

## Part III Physics Course: Minor option. 12 lectures.

### January 1998-2006.


#### What's new, 2006:

1. Supervisions are Thursdays at 2pm and 5.30pm starting Feb 2nd 2006.
2. | Handout 1 (2 pages postscript) | pdf |
3. Slides for lecture 2 are available online. Here is the problem to think about before lecture 3.
4. Slides for lectures 3-5 are available online (Tue 7/2/06).
5. Supervision page updated (Tue 7/2/06).
6. Slides for lectures 1-12 online Fri 3/3/06.
7. | Handout 2 (2 pages postscript) | pdf |
8. | Handout 3 (1 page postscript) | pdf |

The main thing at this site is the free on-line course textbook Information Theory, Inference and Learning Algorithms, (which now has its own website).

There is also an old Summary of the course. And, for 2005:

- Handout 1 (two pages, postscript) | (pdf)
- Handout 2 (two pages, postscript) | pdf
- Handout 3 (1 page, postscript) | pdf
- A sermon on Chi-squared for "hypothesis testing" | postscript | pdf
- Slides from lectures

### Want to ask a question?

- please click to see the FAQ about the book, the course, or the software.

### Want to give feedback on the book, or report typos?

Great! To ask a question about the book, please use this FAQ; to report a typo, mail me. Thank you! A list of corrections already reported is available.

# Mathematical prerequisites for ITPRNN course

The level of Mathematics assumed is 1A Maths for Natural Scientists.

All the following topics will be reviewed.
- Probability and statistics:
  - definition of conditional probability, joint probability, marginal probability;
  - Bayes' theorem;
  - definition of independence;
  - mean and standard deviation of a sum of independent random variables; law of large numbers.
- Linear algebra:
  - inner products between vectors;
  - Taylor expansions and gradients of functions over vector spaces;
  - eigenvectors and eigenvalues of symmetric matrices.
- Differentiation using the chain rule.
- Statistical physics:
  - definition of the partition function;
  - obtaining thermodynamic functions from the partition function;
  - entropy of a set of N binary spins as a function of the fraction x of spins that are +1;
  - relationship between the canonical and microcanonical ensembles, and equivalence of the Boltzmann and Gibbs entropies S = k_B ln Ω and S = k_B ∑_i p_i ln(1/p_i) (for large systems).
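The last statistical-physics item can be checked numerically: the entropy per spin of N binary spins with a fraction x of spins up is (1/N) ln C(N, xN), which approaches H(x) = -x ln x - (1-x) ln(1-x) for large N. A quick sketch (standard-library Python, my own illustration rather than course material):

```python
from math import comb, log

def h2(x):
    """Binary entropy in nats: H(x) = -x ln x - (1-x) ln(1-x)."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log(x) - (1 - x) * log(1 - x)

# Entropy per spin (in units of k_B) of N spins with a fraction x up:
# (1/N) ln C(N, xN) -> H(x) as N -> infinity.
N, x = 10_000, 0.3
exact = log(comb(N, int(x * N))) / N
print(f"(1/N) ln C(N, xN) = {exact:.4f},  H(x) = {h2(x):.4f}")
```

For N = 10,000 the two values already agree to about three decimal places, which is the Stirling-approximation argument made quantitative.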

### Information Theory, Pattern Recognition and Neural Networks

Minor Option [16 lecture synopsis]
(from 2006, the course is reduced to 12 lectures)
Lecturer: David MacKay
Introduction to information theory [1]
The possibility of reliable communication over unreliable channels. The (7,4) Hamming code and repetition codes.
Entropy and data compression [3]
Entropy, conditional entropy, mutual information, Shannon information content. The idea of typicality and the use of typical sets for source coding. Shannon's source coding theorem. Codes for data compression. Uniquely decodeable codes and the Kraft-MacMillan inequality. Completeness of a symbol code. Prefix codes. Huffman codes. Arithmetic coding.
Communication over noisy channels [3]
Definition of channel capacity. Capacity of binary symmetric channel; of binary erasure channel; of Z channel. Joint typicality, random codes, and Shannon's noisy channel coding theorem. Real channels and practical error-correcting codes. Hash codes.
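For concreteness, the three capacities named above can be computed directly. This sketch (NumPy; my own illustration, not course code) uses the closed forms C_BSC = 1 - H_2(f) and C_BEC = 1 - f, and maximises the mutual information numerically for the Z channel:

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

f = 0.1  # flip / erasure probability

# Closed forms:
c_bsc = 1 - h2(f)   # binary symmetric channel
c_bec = 1 - f       # binary erasure channel

# Z channel (input 1 flips to 0 with probability f): maximise
# I(X;Y) = H2(p(1-f)) - p H2(f) over the input probability p = P(x=1).
ps = np.linspace(1e-6, 1 - 1e-6, 10001)
c_z = (h2(ps * (1 - f)) - ps * h2(f)).max()

print(c_bsc, c_bec, c_z)
```

For f = 0.1 this gives C_BSC ≈ 0.53, C_Z ≈ 0.76, C_BEC = 0.9: the Z channel, with only one noisy input, sits between the other two.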
Statistical inference, data modelling and pattern recognition [2]
The likelihood function and Bayes' theorem. Clustering as an example.
Approximation of probability distributions [2]
Laplace's method. (Approximation of probability distributions by Gaussian distributions.)
Monte Carlo methods: Importance sampling, rejection sampling, Gibbs sampling, Metropolis method. (Slice sampling, Hybrid Monte Carlo, Overrelaxation, exact sampling. *)
Variational methods and mean field theory. Ising models.
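A minimal illustration of the Metropolis method listed above (a sketch in standard-library Python; the Gaussian target and unit step size are arbitrary choices of mine):

```python
import math
import random

def metropolis(log_p, x0, step, n):
    """Random-walk Metropolis: propose x' ~ N(x, step^2) and accept
    with probability min(1, p(x')/p(x))."""
    x, samples = x0, []
    for _ in range(n):
        y = x + random.gauss(0.0, step)
        a = log_p(y) - log_p(x)
        if a >= 0 or random.random() < math.exp(a):
            x = y  # accept the proposal
        samples.append(x)
    return samples

random.seed(0)
# Target: standard Gaussian, log p(x) = -x^2/2 up to an additive constant.
xs = metropolis(lambda t: -0.5 * t * t, 0.0, 1.0, 50_000)
mean = sum(xs) / len(xs)
var = sum(t * t for t in xs) / len(xs) - mean ** 2
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

The sample mean and variance converge to the target's 0 and 1; only the ratio p(x')/p(x) is needed, so the normalising constant never has to be computed.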
Data modelling with neural networks [2]
Interpolation and classification using a single neuron. (Multilayer perceptrons. *) Backpropagation algorithm. Learning algorithms viewed in terms of inference.
Neural networks as information storage devices [2]
Capacity of a single neuron. Hopfield network and its relationship to spin glasses. Hopfield network for optimization problems, e.g., travelling salesman problem. Boltzmann machine. Hopfield network as a mean field approximation to the Boltzmann machine. (Boltzmann machine learning algorithm.*) [* = non-examinable]
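To make the Hopfield-network material concrete, here is a sketch (NumPy; a toy example of my own, not from the lectures) that stores five random patterns with the Hebb rule and recovers one from a corrupted cue:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 5                      # 100 neurons, 5 stored +/-1 patterns
patterns = rng.choice([-1, 1], size=(K, N))
W = (patterns.T @ patterns) / N    # Hebbian weight matrix
np.fill_diagonal(W, 0)             # no self-connections

cue = patterns[0].copy()
flip = rng.choice(N, size=15, replace=False)
cue[flip] *= -1                    # corrupt 15 of the 100 bits

x = cue.astype(float)
for _ in range(5):                 # asynchronous sweeps; energy never increases
    for i in range(N):
        x[i] = 1.0 if W[i] @ x >= 0 else -1.0

recovered = int((x == patterns[0]).sum())
print(f"{recovered} of {N} bits recovered")
```

With only 5 patterns in 100 neurons the load is far below the ~0.138N capacity, so the dynamics fall into the stored pattern's basin of attraction and the cue is cleaned up.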

### Bibliography

The course textbook is Information theory, inference, and learning algorithms, by D.J.C.MacKay (2003, C.U.P.) (Rayleigh library: 39 M 20; Betty & Gordon Moore Library: Q360.M33 2003). This 640-page textbook covers the whole course, and a whole lot more. All students are strongly encouraged to buy this textbook. It costs £30; if you buy it at the CUP bookshop and show your University ID, you can get a 20% discount, so it costs £24. You may view the book for free on line. Guarantee: If you buy the book, then decide that you don't want to keep it, I will buy it from you for a good price and sell it on to a future student.

#### Textbook recommendations

Other highly recommended books are as follows; I especially recommend Goldie and Pinch (1991), Bishop (1995), and Sivia (1996), which are all reasonably priced.

For information theory and coding theory, excellent texts are McEliece (1977) and the original book by Shannon and Weaver (1949), but these are hard to find (ask your library to get Shannon (1993)). Three excellent alternatives are Hamming (1986), Goldie and Pinch (1991), and Welsh (1988). Golomb et al. (1994) is readable and discusses the practical side of coding theory as well as information theory. Gallager (1968) is similar and goes into more depth; it's a good book. Cover and Thomas (1991) is also good, though their approach is theoretical rather than practical. An important journal paper on Arithmetic coding is Witten et al. (1987) (available in the Pt II/III library).

For neural networks and pattern recognition, an excellent text is Bishop (1995); Ripley (1996) is also recommended. Ripley's book is encyclopaedic, covering a wide range of statistical models and giving large numbers of citations of the original literature; he includes a set of practical data sets which are referred to frequently throughout the book, and he also goes into some theoretical depth. Ripley's coverage is from the point of view of the statistician. Bishop's perspective is that of the Physicist-Engineer. Real data sets do not appear in Bishop's book, but simple examples are given throughout, and Bishop includes exercises too. An alternative text which emphasises connections between neural networks and statistical physics is Hertz et al. (1991). This text discusses Hopfield networks at length, unlike Bishop (1995) and Ripley (1996). An older text on pattern recognition is Duda and Hart (1973), recently republished (Duda et al., 2000) - recommended. An older book on neural networks which was written at the start of the latest craze of neural nets is Rumelhart and McClelland (1986). It's an exciting read. An excellent book on the state of the art in supervised neural network methods is Neal (1996).

For pure statistical inference, I highly recommend Sivia (1996); Berger (1985) and Bretthorst (1988) (now out of print) are also very good. Jeffreys (1939) is an important classic, and Box and Tiao (1973) is well worth reading too. Connections between statistical inference and statistical Physics are explored in the essential essays of Jaynes (Rosenkrantz, 1983). For further reading on graphical models and Bayesian belief networks, which have widespread importance in the Artificial Intelligence community, Jensen (1996) is recommended; it includes a floppy disc with the Hugin software for simulating Bayesian networks. A more theoretical text on graphical models is Lauritzen (1996). For further reading about probabilistic modelling of proteins and nucleic acids, I highly recommend Durbin et al. (1998).

Berger, J. (1985) Statistical Decision theory and Bayesian Analysis. Springer.

Bishop, C. M. (1995) Neural Networks for Pattern Recognition. Oxford University Press.

Box, G. E. P., and Tiao, G. C. (1973) Bayesian inference in statistical analysis. Addison-Wesley.

Bretthorst, G. (1988) Bayesian spectrum analysis and parameter estimation. Springer. Also available at bayes.wustl.edu.

Cover, T. M., and Thomas, J. A. (1991) Elements of Information Theory. New York: Wiley.

Duda, R. O., and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley.

Duda, R. O., Hart, P. E., and Stork, D. G. (2000) Pattern Classification. New York: Wiley. 2nd Edition.

Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. (1998) Biological Sequence Analysis. Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.

Gallager, R. G. (1968) Information Theory and Reliable Communication. New York: Wiley.

Goldie, C. M., and Pinch, R. G. E. (1991) Communication theory. Cambridge: Cambridge University Press.

Golomb, S. W., Peile, R. E., and Scholtz, R. A. (1994) Basic Concepts in Information Theory and Coding: The Adventures of Secret Agent 00111. New York: Plenum Press.

Hamming, R. W. (1986) Coding and Information Theory. Englewood Cliffs, NJ: Prentice-Hall, 2nd edition.

Hertz, J., Krogh, A., and Palmer, R. G. (1991) Introduction to the Theory of Neural Computation. Addison-Wesley.

Jeffreys, H. (1939) Theory of Probability. Oxford Univ. Press. 3rd edition reprinted 1985.

Jensen, F. V. (1996) An Introduction to Bayesian Networks. London: UCL press.

Lauritzen, S. L. (1996) Graphical Models. Number 17 in Oxford Statistical Science Series. Oxford: Clarendon Press.

McEliece, R. J. (1977, recently reprinted in 2nd edn by C.U.P.) The Theory of Information and Coding: A Mathematical Framework for Communication. Reading, Mass.: Addison-Wesley.

Neal, R. M. (1996) Bayesian Learning for Neural Networks. Number 118 in Lecture Notes in Statistics. New York: Springer.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

Rosenkrantz, R. D. (1983) E.T. Jaynes. Papers on Probability, Statistics and Statistical Physics. Kluwer.

Rumelhart, D. E., and McClelland, J. E. (1986) Parallel Distributed Processing. Cambridge Mass.: MIT Press.

Shannon, C. E. (1993) Collected Papers. New York: IEEE Press. Edited by N. J. A. Sloane and A. D. Wyner.

Shannon, C. E., and Weaver, W. (1949) The Mathematical Theory of Communication. Urbana: Univ. of Illinois Press.

Sivia, D. S. (1996) Data Analysis. A Bayesian Tutorial. Oxford University Press.

Welsh, D. (1988) Codes and Cryptography. Oxford: Clarendon Press.

Witten, I. H., Neal, R. M., and Cleary, J. G. (1987) Arithmetic coding for data compression. Communications of the ACM 30 (6): 520-540.

### Slides

See also The Information Theory, Inference, and Learning Algorithms website, which includes all the figures from the book, for use in teaching.

### OLD Summary of the course

##### Lecture plan, suggested reading, and suggested questions
NB, these suggestions are old, so don't map perfectly onto the 2005 book. (see also supervision recommendations here)
- Week 1 (Friday): Lecture notes: Ch 1, Intro to information theory. Exercise to do before Wednesday: Ex 4.2 (p.71). Suggested reading: Ch 2, Probabilities and Inference. Suggested examples: Ex 1.2 (p.13), 1.8 (p.19).
- Week 1 (Wednesday): Lecture notes: Ch 5 (now Ch 4), Source Coding Theorem. Suggested reading: rest of Ch 5 (this is the toughest bit of the course). Suggested examples: Ex 2.14, 2.15, 2.18, 2.20, 2.28 (p.40).
- Week 2: Lecture notes: Ch 6 (now Ch 5), Symbol Codes. Suggested examples: Ex 6.19, 6.20, 6.25 (p.111-2).
- Week 3: Lecture notes: Ch 7 (now Ch 6), Stream Codes (Lempel-Ziv not examinable). Reading: Ch 9 (now 8), Correlated random variables. Suggested examples: Ex 7.4, 7.8 (p.131), 8.3, 8.5 (p.147), 9.1, 9.5, 9.7, 9.8.
- Week 4: Lecture notes: Ch 10-11 (now 9-10), Communication over a noisy channel; Channel coding theorem. Suggested examples: Ex 10.12, 10.13, 10.15.
- Week 5: Reading: Ch 3, More on inference; Ch 12 (12.1-12.2) (now 11.1-11.2), Inference for Gaussian channels. Suggested examples: Ex 3.3, 10.19, 10.20, 11.4. Lecture notes: Chs 24, 25*, 27* (now 29, 30, 32), Monte Carlo methods.
- Week 6: Lecture notes: Ch 28 (now 33), Variational methods. Reading: Ch 13 (now 12), Hash codes, efficient information retrieval; Ch 22, Inference; Ch 26 (now 31), Ising models. (Ch 13 is not examinable, but I want you to think about the question `how to make a content-addressable memory?') Suggested examples: Ex 24.5, 24.9, 26.3 (p.363).
- Week 7: Lecture notes: Ch 31, 32, 34 (now 38, 39, 41), Neuron. Reading: Ch 33 (details optional). Suggested examples: Ex 28.2, 30.1 (p.383), 32.2 (p.402), 32.5 (p.407).
- Week 8: Lecture notes: Ch 35, 36 (now 42, 43), Hopfield networks and Boltzmann machines. Suggested examples: Automatic clustering: Ex 22.3 (p.304); 35.3, 35.4 (p.441).

### Supervision information

Supervisions are Thursdays at 2pm and 5.30pm starting Feb 2nd 2006.

#### What work to do

The book contains numerous exercises, complete with worked solutions. A subset of these exercises will be designated as the exercises you should work on each week. [Generally, these exercises are the ones marked `highly recommended' by a marginal rat.] I encourage you not to look at the worked solutions for these exercises before you have attempted them yourself.

In the supervisions, we will focus in detail on just a few of the exercises. (You should work on all the exercises yourselves before supervisions.)

#### About the first two weeks of the course

The first week's recommended exercises were: 1.3 (p.8), 1.5-7 (p.13), 1.9, & 1.11 (p.14).

The first supervision focused on exercises 1.9 and 1.11 (p.14). You all did a great job on this!

Recommended reading for the first four lectures was: Chapters 1, 2, 4, and 5.

Make sure you are happy with the facts that
a) entropy is maximized by the uniform distribution (exercise 2.25, page 37; solution on page 43);
b) the Gibbs inequality (section 2.6) is true (exercise 2.26; solution on p.44).

#### About the next two weeks of supervisions

Recommended exercises before the 2nd supervision are: 5.22, 5.26, 5.27 (pages 102-3).

The second and third supervision will focus on the question:

Invent a compressor and uncompressor for a source file of N=10,000 bits, each having probability f=0.01 of being a 1. Implement your inventions and/or estimate how well your method works (ex 5.29, page 103).
If it's helpful to you, here's a Huffman algorithm in perl, and here's a Huffman algorithm in python.
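Before implementing anything, it is worth computing the target that Shannon's source coding theorem sets for this file: on average no scheme can compress below N H_2(f), which is roughly 808 bits for N = 10,000 and f = 0.01. A quick check (plain Python; my own sketch, not part of the assignment's worked solutions):

```python
from math import log2

N, f = 10_000, 0.01
h2 = -f * log2(f) - (1 - f) * log2(1 - f)   # binary entropy, bits per source bit
print(f"compressibility limit: {N * h2:.0f} bits "
      f"(about {h2:.3f} bits per source bit)")
```

A good arithmetic coder gets within a few bits of this limit; comparing your compressor's output size to it tells you how much room for improvement remains.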

Recommended exercises before the third supervision are: 6.7, 6.17.

Recommended reading: Chapter 6 (sec 6.1-6.3)

Advance reading: I highly recommend that you read ahead into chapters 8, 9, 10, in order to prepare for the lectures on noisy-channel coding.

If any of the recommended exercises cause insurmountable difficulties, please (1) ask a friend; (2) ask me in supervisions; or (3) ask me some other time.

##### (arrangements for 2005)

Supervisions will be at 3pm and 5pm

```Dates:

Week 3	 3 Feb
Week 4	 10 Feb
Week 5	 17 Feb       **
Week 7	 3 Mar
Week 8	 10 Mar
Week 9	 17 Mar

** Note the exclusion of
Week 6	 24 Feb
which is intended to reduce your 6th week blues.
```
All supervisions will be in the Ryle seminar room (Rutherford building, upper floor), except that the 10th February supervision at 3pm will be in the Pippard Lecture Theatre.

### Suggested exercises during week 5

Ex 22.5, mixture of 2 gaussians.

### Suggestions for supervision number 4

Ex 3.10 (p57) (children)
8.10, black and white cards (mutual information I(X;Y))
9.19 TWOS
9.20, birthday problem
15.5, 15.6, (233) magic trick
22.8 photon counter

8.3 (140), 8.7, 22.4, (mu/sigma)

### Suggestions for supervision number 5

Main activities of the supervision may be: Ex 27.1 (Laplace approximation) and | Handout 2 (two pages postscript) | pdf |

Also recommended: Exercises 29.14, 33.5, 33.7.
22.11 sailor
22.14 maximize density

### Compression challenge - compress me!

One of the main assignments for this course is to write compression and uncompression algorithms to compress a sparse file containing N=10000 bits of which roughly 0.01 are 1s; your algorithm should work well on all such files. Please email to djcm1 the compressed size that your algorithm achieves, and maybe write a brief webpage summarising your method. If you'd like to use the Huffman algorithm in constructing your solution, here's one in perl. [Worked solutions: C programs for compression]

Another of the three main assignments is this:

• implement or describe a method to distinguish, based on a string of successive characters in a document:
  • whether the document is an English-language document or a totally random document;
  • OR whether the document is an English-language document or German;
  • OR whether the document is an English-language document or backwards English (siht ekil);
• estimate how many successive characters your method needs in order to work well.

English and German letter frequencies are provided below, in case you find them useful.
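As a sketch of the "English vs totally random" variant (my own illustration, not the official solution): each observed character c contributes log2[P(c|English)/P(c|random)] bits of evidence, which on English text accumulates at D_KL(e||u) ≈ 0.6 bits per character on average, so roughly 10/0.6 ≈ 17 characters give 10 bits (about 1000:1) of evidence. The simulation below uses the English frequencies tabulated on this page:

```python
import math
import random

random.seed(4)

# English letter frequencies from the table on this page (A..Z then '-'),
# normalised; the 'totally random' model is uniform over the 27 symbols.
e_freq = [0.07, 0.01, 0.03, 0.03, 0.1, 0.02, 0.02, 0.05, 0.06, 0.001,
          0.006, 0.03, 0.02, 0.06, 0.06, 0.02, 0.0009, 0.05, 0.05, 0.08,
          0.02, 0.008, 0.02, 0.002, 0.01, 0.0008, 0.17]
Z = sum(e_freq)
p_e = [p / Z for p in e_freq]
p_u = [1.0 / 27] * 27

def chars_needed(threshold_bits=10.0):
    """Characters of simulated English text needed before the accumulated
    log-likelihood ratio reaches the evidence threshold."""
    total, n = 0.0, 0
    while abs(total) < threshold_bits:
        c = random.choices(range(27), weights=p_e)[0]
        total += math.log2(p_e[c] / p_u[c])
        n += 1
    return n

trials = [chars_needed() for _ in range(200)]
print(f"mean characters needed: {sum(trials) / len(trials):.1f}")
```

The English-vs-German variant works the same way but accumulates evidence at only D_KL ≈ 0.12-0.16 bits per character, so it needs several times more text.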

An outline of the main topics to be covered in each of the six supervisions is as follows:

1. Design your own code; typicality; Gibbs's inequality
2. Symbol codes, Arithmetic codes [Worked solutions: C programs for compression]
3. Capacity of channel; noisy channel coding theorem
4. Bayesian inference
5. Monte Carlo methods, variational methods
6. Single neuron, Hopfield network

First supervision

``` Some topics for discussion:
1.9      [p14]  make your own code
2.28     [p38]  efficient computation of entropy
2.36     [p39]  Bayes's theorem
```
A list of all the recommended exercises for the course is being refined below.

Before each supervision, please hand in a sheet of paper, with a summary of your REACTIONS to the chapters of the book. Please write one paragraph on each chapter, giving the parts that were difficult to follow, the examples that were helpful, the examples that were not included that should have been, anything that's still unclear or confusing. These reactions will help guide the supervision onto worthwhile topics of discussion. Reactions may be sent by email.
Particularly recommended exercises for 2004 are as follows (with bold highlighting exercises to be discussed in supervisions):
| Exercise | Purpose |
|---|---|
| 1.3 (p.7) | familiarity with repetition code, and approximation |
| 1.5, 1.6, 1.7 (p.13) | familiarity with Hamming code |
| 1.9 | make your own code (very likely to be an exam question) |
| 2.4, 2.5 (p.27) | familiarity with binomial distribution, how to compute variances |
| 2.21-2.26 (p.37) | working with probabilities, proving inequalities |
| 2.28 | efficient computation of entropy |
| 2.16 (p.36) | law of large numbers |
| 2.37 | Bayes's theorem |
| 2.17 | manipulating probabilities |
| 2.20 | life in high dimensions |
| 3.3 (p.47) | Bayes's theorem |
| (no specific exercise) | Typicality |
| 5.21, 5.22 (p.102) | Make optimal symbol code |
| 5.29 | Creative use of Huffman algorithm (likely to be part of an exam question) |
| 6.2, 6.3, 6.9 | Arithmetic coding (likely to be part of an exam question) |

[From here onwards are the old exercises for 2003; watch this space for updates to 2004.]

| Exercise | Purpose |
|---|---|
| 8.3 (p.157), 8.4, 8.6, 8.7, 8.10 (p.159) | Two-variable entropy expressions |
| 9.17 (p.175), 10.12 (p.196), 15.10, 15.11 (p.262), 15.13 | Computing capacity of a channel |
| 9.20 (p.176) | "Birthday problem": collisions between random strings |
| 9.19 | Modelling real information sources and channels |
| 15.3 (p.260), 15.4, 15.5 | Counting arguments, computing mutual information |
| 24.4, 24.5 (p.332), 24.8 (p.337) (cut from list: 24.12) | Finding Maximum Likelihood parameters |
| 24.14 | life in high dimensions; atypicality of the most probable state |
| 24.15 | occasional silliness of Maximum Likelihood answers |
| 29.1 (p.377) | Laplace approximation |
| 31.13, 31.14 (p.415) | Metropolis method |
| 31.4 (p.404), 31.5 | Gibbs sampling |
| 33.3 (p.447) | Think creatively about making a content-addressable memory |
| 35.5, 35.6, 35.7 | Variational methods |
| 37.5 (p.484) | Inference problems |
| 42.1 (p.510) | Perceiving connections between topics |
| 42.2 (p.513) | Deriving simple learning algorithms |
| 42.5 (p.519) | Inferences as "neurons" |
| 43.1 (p.529) | Counting (this topic is probably not examinable) |
| 43.3 | Information in the real world |
| 45.3, 45.4 (p.551), 45.9 (p.560) | Hopfield net |

For further suggested reading and examples before the supervision, see the course summary page.

English and German letter frequencies
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z -
ENGLISH (e) 0.07 0.01 0.03 0.03 0.1 0.02 0.02 0.05 0.06 0.001 0.006 0.03 0.02 0.06 0.06 0.02 0.0009 0.05 0.05 0.08 0.02 0.008 0.02 0.002 0.01 0.0008 0.17
GERMAN (g) 0.06 0.02 0.03 0.04 0.15 0.01 0.03 0.04 0.07 0.002 0.01 0.03 0.02 0.08 0.02 0.007 0.0002 0.06 0.06 0.05 0.04 0.006 0.02 0.0003 0.0003 0.01 0.14
The entropies of these two distributions are H(e) = 4.1 bits and H(g) = 4.1 bits; the relative entropies between them are D_KL(e||g) = 0.16 bits and D_KL(g||e) = 0.12 bits. The relative entropies between the uniform distribution u and the English distribution e are D_KL(e||u) ~= 0.6 bits and D_KL(u||e) ~= 1 bit.
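These figures can be checked against the tables above. This sketch (plain Python, my own check) normalises the rounded frequencies and recomputes the entropies and relative entropies; because the tables are rounded, the last digit can differ slightly from the quoted values:

```python
from math import log2

# Frequencies from the tables above (A..Z then '-').
e = [0.07, 0.01, 0.03, 0.03, 0.1, 0.02, 0.02, 0.05, 0.06, 0.001, 0.006,
     0.03, 0.02, 0.06, 0.06, 0.02, 0.0009, 0.05, 0.05, 0.08, 0.02, 0.008,
     0.02, 0.002, 0.01, 0.0008, 0.17]
g = [0.06, 0.02, 0.03, 0.04, 0.15, 0.01, 0.03, 0.04, 0.07, 0.002, 0.01,
     0.03, 0.02, 0.08, 0.02, 0.007, 0.0002, 0.06, 0.06, 0.05, 0.04, 0.006,
     0.02, 0.0003, 0.0003, 0.01, 0.14]

# Normalise the rounded tables so each sums to exactly 1.
Ze, Zg = sum(e), sum(g)
e = [p / Ze for p in e]
g = [p / Zg for p in g]

H = lambda p: -sum(x * log2(x) for x in p)                 # entropy, bits
D = lambda p, q: sum(x * log2(x / y) for x, y in zip(p, q))  # relative entropy

print(f"H(e)={H(e):.2f}  H(g)={H(g):.2f}  "
      f"D(e||g)={D(e, g):.2f}  D(g||e)={D(g, e):.2f}")
```

Note that D_KL is asymmetric: the cost of modelling English with the German distribution differs from the reverse.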

### Information Theory, Inference, and Learning Algorithms

(Hardback, 640 pages, Published September 2003)

Price: £35.00 / $60.00 from |CUP UK/USA| |amazon.co.uk/.com/.ca/.co.jp| |Barnes and Noble USA| booksprice | fetchbook.info | allbookstores | biggerbooks | blackwells | kalahari.net (South Africa) | Special Paperback edition for South Asia.|

You can browse and search the book on Google books.

| | U.K. | Canada | South Africa |
|---|---|---|---|
| PDF (A4) | pdf (9M) (fourth printing, March 2005) | pdf | pdf |
| Postscript (A4) | postscript (fourth printing, March 2005) (5M) | postscript | postscript |
| DJVU | djvu file (6M) (2nd printing) | djvu file | djvu file |
| Just the words (latex) [provided for convenient searching] | latex (2.4M) | latex | latex |
| Just the figures NEW [provided for use of teachers] | all in one file: postscript (2M), pdf (5M), ps.gz, pdf; in individual eps files | | |
| Individual chapters | postscript and pdf available from this page | mirror | mirror |

(djvu information | Download djView)
• Version 6.0 was released Thu 26/6/03; the book is finished. You are welcome to view the book on-screen. Version 6.0 was used for the first printing, published by C.U.P. September 2003.
• Version 6.6 was released Mon 22/12/03; it will be used for the second printing, to be released January 2004. In this second printing, a small number of typographical errors were corrected, and the design of the book was altered slightly. Page-numbering generally remains unchanged, except in chapters 1, 6, and 28, where a few paragraphs, figures, and equations have moved around. All equation, section, and exercise numbers are unchanged.
• Version 7.0 is the third printing (November 2004). Its only differences from the 2nd printing are a number of corrections, and the renaming of the chapter `Correlated Random Variables' to `Dependent Random Variables'.

Copyright issues: The book is copyright (c) Cambridge University Press. It has been available in bookstores since September 2003. The cover price in 2003 was £30 (UK) and $50 (USA); in 2006, £35 and $60 (USA).

Now the book is published, these files will remain viewable on this website. The same copyright rules will apply to the online copy of the book as apply to normal books. [e.g., copying the whole book onto paper is not permitted.]

History:
Draft 1.1.1 - March 14 1997.
Draft 1.2.1 - April 4 1997.
Draft 1.2.3 - April 9 1997.
Draft 1.2.4 - April 10 1997. Margins altered so as to print better on North American paper.
Draft 1.3.0 - December 23 1997.
Draft 1.9.0 - Feb 1 1999.
Draft 2.0.0 - Jan 6 2000. New page layout.
Draft 2.2.0 - Dec 23 2000. Fresh draft.
Draft 3.1415 - Jan 12 2003. Nearly finished.
Draft 4.0 - April 15 2003. Chapter sequence finalized.
Draft 4.1 - April 18 2003. Added new frontmatter (Preface, etc.) to the book. Corrected printing of letter version.
Version 6.0 - Thu 26 June 2003. Final version.
Version 6.6 - Mon 22 December 2003. Second printing.

• Here is my method for converting to two-up under linux:
```pstops '4:0L@.67(20cm,1cm)+1L@.67(20cm,15cm),3R@.67(1cm,15.25cm)\
+2R@.67(1cm,29.25cm)' $*.ps $*.dps ```

#### Errors in the book

I would be grateful to hear about errors in my lecture notes. Prizes are awarded to those who find errors that have not already been detected.

Please send me new corrections by email. Thank you!

If you have a query about the book that you think other people might also ask, please submit it through metaFAQ; you may find the answer is already there.

Draft covers for the book by DJCM.

### `Random numbers are valuable!'

For interesting discussion of the Marsaglia CD-ROM see true_rng.html.

### Introduction to Statistical Thinking

-- Michael Lavine offers a free book on statistics, emphasizing the likelihood principle.

### Software, demonstrations, etc.

The tcl programs and other links marked (*) can be run directly in netscape if you have got the tcl/tk plugin. You can also run the tcl programs in the normal way if you have a linux machine with X windows and tcl and you download the tcl files then execute them (this is preferable to the netscape approach, I think). [The top four lines of each tcl file contain flags which change the program from working with the netscape plugin to working alone under X.] The other programs can be used by people on unix systems that have the relevant software (perl, gnuplot, octave.)

Please select a software category from the sidebar.


### Inference methods


### Online Hamiltonian Monte Carlo Demo

(formerly known as `Hybrid Monte Carlo')

Task: sample from the posterior distribution of a bivariate Gaussian distribution given ten data points. The left-hand image shows the details underlying a sequence of HMC transitions. The right-hand image shows 50 of the successive states produced by the Markov chain - the endpoints of the trajectories.

fit.tar tar file giving octave source code.
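The tar file above contains octave source; as a language-agnostic sketch of the same idea, here is a minimal HMC sampler of my own (leapfrog dynamics with a Metropolis accept step), run on an assumed strongly-correlated bivariate Gaussian rather than the demo's exact posterior:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target: zero-mean bivariate Gaussian with correlation 0.95.
Sigma = np.array([[1.0, 0.95], [0.95, 1.0]])
A = np.linalg.inv(Sigma)               # energy E(x) = x^T A x / 2, grad E = A x

def hmc_step(x, eps=0.1, L=20):
    """One HMC transition: draw momentum, leapfrog L steps, accept/reject."""
    p = rng.standard_normal(2)
    H0 = 0.5 * x @ A @ x + 0.5 * p @ p
    xn, pn = x.copy(), p.copy()
    pn = pn - 0.5 * eps * (A @ xn)     # initial half step for momentum
    for _ in range(L - 1):
        xn = xn + eps * pn
        pn = pn - eps * (A @ xn)
    xn = xn + eps * pn                 # final position step
    pn = pn - 0.5 * eps * (A @ xn)     # final half step for momentum
    H1 = 0.5 * xn @ A @ xn + 0.5 * pn @ pn
    return xn if np.log(rng.random()) < H0 - H1 else x  # Metropolis accept

x = np.zeros(2)
samples = np.empty((5000, 2))
for t in range(5000):
    x = hmc_step(x)
    samples[t] = x

print("mean:", samples.mean(axis=0), "corr:", np.corrcoef(samples.T)[0, 1])
```

Because the leapfrog trajectories follow the gradient, HMC glides along the narrow correlated ridge where a random-walk Metropolis sampler would diffuse slowly.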


## Other files

• Images for Chromatic aberration demo: | 1 | 2 |

#### Links to other people's software

Haskell (Naive) Bayes library by Frederik Eaton

## Further references

#### Bayesian classification, clustering

Cheeseman, P., Stutz, J., Self, M., Taylor, W., Goebel, J., Volk, K., and Walker, H. 1989. Automatic Classification of Spectra From the Infrared Astronomical Satellite (IRAS), NASA Reference Publication #1217, National Technical Information Service, Springfield, Virginia.

Goebel, J., Volk, K., Walker, H., Gerbault, F., Cheeseman, P., Self, M., Stutz, J., and Taylor, W.: 1989, `A Bayesian Classification of the IRAS LRS Atlas', Astron.Astrophys., 222, L5.

### Administrative stuff from 1999, 2000

synopsis (ftp from cambridge) 2 pages.
Postscript version of synopsis, including updated bibliography. (the 1997 version of the synopsis is available as a text file)
Old exam questions with worked solutions
(Many more such questions are also included in the book.)
