Using BayeSys: Notes by David MacKay
Mon 29/3/04
Here are my notes on implementing a factor analysis model in BayeSys. I started from the BayeSys3 tar file provided in the test directory.
    // Initialise Likelihood
    Common->Ndata = Ndata;   // # data
    Common->Data  = Data;    // data [Ndata]
    Common->Acc   = Acc;     // accuracies = 1/sigma [Ndata]

which link the Data array so that it's accessible from the Common structure.
In order to write my own likelihood, I first need to instantiate the parameters in such a way that the prior is uniform. This means that we need inverse distribution functions.
In a factor analysis model I will need Gaussian and Gamma priors.
    x ~ dnorm( mu , tau )

can be mapped from a uniformly distributed "Cube" variable c via BayeShape Shape=5 `Bell'.
Generating a sample from a Gamma is not simple. In /home/mackay/_doc/dirichlet/sample/sample.p I get Gamma variates using

    $command = sprintf ( "/home/mackay/ansi/ranlib.C/test/tst %d %f %s%s  " , $n , $uo, $seed.$r );

which calls /home/mackay/ansi/ranlib.C/src/ranlib.c; that code uses various methods, including rejection sampling, depending on the mean of the standard gamma distribution required.
The cdf of the standard Gamma (f(x) = x^{gamma-1} e^{-x} / Gamma(gamma)) is F(x) = Gamma_x(gamma)/Gamma(gamma), where Gamma_x is the lower incomplete gamma function.
Hmm, let's read the manual. On page 23, the BayeShape facility is described; it doesn't mention Gamma explicitly. Perhaps the simplest course of action is to replace Gamma priors by lognormal.
[I see that Shape 2, Simplex volume, has a connection to incomplete Gamma.]
First, a one-Gaussian test:

    mu ~ dnorm(0,1)
    Data ~ dnorm(mu,1)

which should give (since the variance of D is 2.0)

    Evidence = log P(D) = -D**2/4 - log(sqrt(2*pi * 2)) = -2.2655 (for D = 2).

I ran the model using bayestoy.c (modified by me) and onegaussian.c (modified from bayesapp.c) (Makefile). I tried setting the Data and Accuracy to various values, checking the resulting values of <Mock> and Evidence. For extreme data values, the answers were incorrect. (Which seems reasonable!) Here are two examples: a moderately extreme case with a perfect answer, and a really extreme case with a wrong answer. All results came out much the same for any value of rate from 0.1 down to 0.0001.
    Data, Acc   Correct <Mock> and Evidence   Results <Mock> and Evidence
    4, 10       3.96 and -8.8447              3.96 and -8.75      Good.
    20, 1       10 and -101.2655              6.3 and -116.44     Bad. (Result did not improve if a small rate was used)
Further investigation indicates that "Bad" results can occur when a datum requires a parameter to take on a 7-sigma outlying value; 6-sigma outliers seem to be all right.
This is exactly what's expected: real numbers are represented in 32 bits, so anything beyond mass 2^{-32} in the tail of the prior will be ill-represented. Now, at 7 sigma the tail probability is roughly exp(-49/2), say 1e-12, which is 2^{-40}. So in fact we should not expect a 7-sigma event to be represented at all; 6 sigma corresponds to 30 bits.
I made a sequence of factor analysis models. They are available in fa.c, fa2.c, fa3.c, (mainfa.c, mainfa2.c, mainfa3.c), and have increasing numbers of parameters of an increasing number of types. To do simple Bayesian models like FA I force the number of "atoms" to be 1. I encountered a bug in my first attempt, in which BayeSys requested the likelihood for an object that had Natoms=0. I was not expecting this. I fixed the problem by modifying UserBuild so that it returns "0" instead of "1" in this situation.
fa3.c implements a 3-dimensional factor analysis model in which there are six parameters: three components V[1], V[2], V[3] for the loading vector, which come from a unit normal prior, and three diagonal noise variances D[1], D[2], D[3], which have a lognormal prior. The generative model for x (a 3-component vector) is
    P(x[1..3]) = Normal( 0, C )

where the covariance matrix C is

    C = D + s^{2} V V^{T}

where D is a diagonal matrix made from D[1], D[2], D[3], and s^{2} is a fixed variance parameter.
Rather than explicitly enumerate N data points x, I work with the sufficient statistics, {N, <x x^{T}> } and write down the likelihood analytically. The parameter Ndata counts the number of sufficient stats, not the number of data points. The number of data points is placed in the UserCommonStr which is defined in userstr3.h, along with other essential fixed model parameters such as the variance s^{2} and the fixed hyperparameters of the lognormal prior.
In the output vector Mock, I keep track of what the sufficient statistics would be if perfect data were generated from the present parameters.
Results look fine. Mock matches Data nicely.
Aside: I notice that the results obtained differ depending on which of my Linux machines runs the program: a Red Hat machine and a Debian machine give slightly different answers.
In fa4.c, (mainfa4.c), I make the parameter s^{2} a free parameter with a lognormal prior. When it has a fairly narrow prior (an s.d. of +/-2 over log(s^{2})), the evidence changes almost negligibly,

    from Evidence = -6843.40 (fa3) to Evidence = -6845.81 (fa4)

which is very appropriate. Increasing the prior width over log(s^{2}) by a factor of e^{2} should simply decrease the evidence by 2.0 nats. Check:

    Evidence = -6848.04

Bingo.
In fa5.c, (mainfa5.c), I reduce the number of free parameters again (in order to carry out model comparison between two factor analysis models, one of which claims to know the loading vector V).
I set the known V to (1,1,1), which is not well-matched to the data. The evidence drops to

    Evidence = -6980.84

(a drop of about 130, with a data set of 1000 points, so a KL of 0.13 per data point, which seems reasonable).
Here are the sufficient statistics for a real data set, aa.dat:

    # sufficient stats for aa.dat
    # N = 152
    # sums:
    # 0.00037766 0.0002665 0.0002048
    #
    # Covariance sums:
    3.2469e+06 2.81641e+06 5.69258e+06 3.75222e+06 4.88881e+06 2.13158e+07
    Evidence = -3004.90 (log[e])    Information = 42.62 (log[e])

            Data          Mock
      3246900.00    3203229.85
      2816410.00    3202008.81
      5692580.00    3202008.81
      3752220.00    4547676.00
      4888810.00    3202008.81
     21315800.00   16026978.76
    Evidence = -3015.90 (log[e])    Information = 53.98 (log[e])

            Data          Mock
      3246900.00    3167927.62
      2816410.00    2741839.82
      5692580.00    4119870.49
      3752220.00    3701758.68
      4888810.00    4680292.26
     21315800.00   20394359.60
Conclusion: while neither model reproduces the six data values perfectly, the evidence is exp(11) in favour of the (1,1,1) hypothesis.
For realistic problems, we want more factors. The next step is to rewrite the likelihood routine for the general case.