PJ Steiner:CRFall
From OpenWetWare
Notes on using Kevin Murphy's CRF matlab toolbox.
- Kevin Murphy's page: http://www.cs.ubc.ca/~murphyk/Software/CRF/crf.html
- Another page with some notes: http://people.cs.uchicago.edu/~dinoj/crfallnotes.html (I'm not sure it's necessary to add a feature w/ value = 1.0 to every feature vector as done there.)
Contents |
Getting it to work
To get testCRF
to work, you'll probably need to recompile repmatC.c
in the directory KPMtools
. (It won't work if you're on *nix, and it won't be found if you're on OS X. I haven't tried on Windows.) Do this using mex
. It may be on your path in *nix:
mex /path/to/CRFall/KPMtools/repmatC.c
It won't be on your path in OS X:
/Applications/MATLAB_R2008b.app/bin/mex /path/to/CRFall/KPMTools/repmatC.c
Linear Chain CRFs (crfchain
)
I'm only working with the functions for linear chain CRFs. I think what follows is correct.
Limitations
You don't define 'feature functions' normally --- you don't define arbitrary functions of the form each with a weight λ_{k}.
Instead, you compute a 'feature vector' for each position in the sequence which is input to both the training and inference functions. Note that the symbol at position i has to be part of this feature vector. (For DNA, I defined four features corresponding to {A, T, G, C} and set the appropriate feature to one for each position.)
This effectively restricts you to two types of feature functions:
- f_{k}(y_{t − 1},y_{t})
- These are potentials between states, and they're constant (can't be sequence dependent). The code learns these automatically.
- These are the features in the feature vectors. Their values depend only on the input sequence. That sounds bad, but, the code learns weights λ_{kq} --- each is specific to a feature k and a state q.
I don't think this is as powerful as a general linear chain CRF, but it's still more powerful than an HMM.
Training
To train, you need cell array of matrices X{s}(f,t)
and a cell array of row vectors Y{s}(t)
. X{s}
is a matrix --- each row corresponds to a feature, and each column corresponds to position in the sequence. Y{s}
is a row vector containing the correct state labels for the sequence. (I believe state labels must be integers --- they're used to index matrices.)
The function crfchaintrain
does the training:
chain = crfchaintrain(chain, X, Y, ... 'gradAlgo', 'scg', ... 'checkGrad', 'on', ... 'MaxIter', max_iter, ... 'verbose', 1);
Only chain
, X
, and Y
are required arguments; the others are options.
Note: I've found that increasing the number of states (i.e. labels) can really increase running time.
Inference
To infer hidden states using a trained CRF chain, you need a matrix X_test(f,t)
of feature vectors for the unlabeled sequence. (As above, rows correspond to individual features, columns correspond to elements in the sequence.) You infer like so:
bel_chain = crfchaininfer(chain, X_test); [V, I] = max(bel_chain);
crfchaininfer
computes probabilities for each state at each symbol in the sequence. (Rows are states, columns are symbols.) The call to max
gives you two row vectors. V
has the biggest energy/probability for each element in the sequence; I
has the index into each column (i.e. the state) corresponding to that biggest value.