Hidden Markov Models

Introduction and Usage

Hidden Markov Models are used in many areas of Machine Learning, such as speech recognition, handwritten letter recognition, and natural language processing.

Formal Definition

A Hidden Markov Model (HMM) is a statistical model of a process consisting of two (in our case discrete) random variables O and Y, which change their state sequentially. The variable Y with states {y_1, … , y_n} is called the “hidden variable”, since its state is not directly observable. The state of Y changes sequentially with a so-called (in our case first-order) Markov property: the next state of Y depends only on its current state, not on any earlier history. The variable O with states {o_1, … , o_m} is called the “observable variable”; at each time step it emits a state with a probability that depends only on the current state of Y.

Formally, an HMM is defined as a tuple M=(n,m,P,A,B), where n is the number of hidden states, m is the number of observable states, P is an n-dimensional vector containing the initial hidden state probabilities, A is the n×n “transition matrix” containing the transition probabilities such that A[i,j] = P(Y(t)=y_j | Y(t-1)=y_i), and B is the n×m “emission matrix” containing the observation probabilities such that B[i,j] = P(O(t)=o_j | Y(t)=y_i). Each row of A and B is a probability distribution; this matches the matrices printed in the example below.
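To make the definition concrete, the tuple can be written out as plain data. A minimal Python sketch of a toy model with n=2 and m=3 (all numbers here are invented for illustration, not taken from Mahout):

n, m = 2, 3                  # 2 hidden states, 3 observable states

P = [0.6, 0.4]               # initial hidden state probabilities

A = [[0.7, 0.3],             # A[i][j] = P(Y(t)=y_j | Y(t-1)=y_i)
     [0.4, 0.6]]

B = [[0.5, 0.4, 0.1],        # B[i][j] = P(O(t)=o_j | Y(t)=y_i)
     [0.1, 0.3, 0.6]]

# Sanity check: P and every row of A and B must be probability distributions.
assert abs(sum(P) - 1.0) < 1e-9
assert all(abs(sum(row) - 1.0) < 1e-9 for row in A + B)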

Problems

Rabiner [1] defined three main problems for HMMs:

  1. Evaluation: Given a sequence O of observations and a model M, what is the probability P(O|M) that O was generated by M? The Evaluation problem can be solved efficiently with the Forward algorithm (sketched after this list).
  2. Decoding: Given a sequence O of observations and a model M, what is the most likely sequence Y*=argmax(Y) P(Y|O,M) of hidden states to have generated O? The Decoding problem can be solved efficiently with the Viterbi algorithm (also sketched below).
  3. Learning: Given a sequence O of observations, what is the most likely model M*=argmax(M) P(O|M) to have generated it? The Learning problem can be solved with the Baum-Welch algorithm, which converges to a locally optimal model (a sketch of its re-estimation step appears in the example below).
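The first two problems have short dynamic-programming solutions. Below is a minimal Python sketch of the Forward and Viterbi algorithms under the row-stochastic conventions defined above; it is illustrative only, and a production implementation would work in log space or rescale to avoid numerical underflow on long sequences.

def forward(P, A, B, obs):
    """Evaluation: return P(O|M) by summing over all hidden paths."""
    n = len(P)
    # alpha[i] = P(O(1..t), Y(t)=y_i | M), initialized for t = 1
    alpha = [P[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)

def viterbi(P, A, B, obs):
    """Decoding: return the most likely hidden state sequence Y*."""
    n = len(P)
    delta = [P[i] * B[i][obs[0]] for i in range(n)]   # best path scores
    back = []                                         # back-pointers
    for o in obs[1:]:
        ptr = [max(range(n), key=lambda i: delta[i] * A[i][j])
               for j in range(n)]
        delta = [delta[ptr[j]] * A[ptr[j]][j] * B[j][o] for j in range(n)]
        back.append(ptr)
    # Recover the best path by walking the back-pointers in reverse.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

For example, forward(P, A, B, [0, 2, 1]) evaluates the toy model defined earlier on a three-symbol sequence, and viterbi(P, A, B, [0, 2, 1]) returns the corresponding most likely hidden path.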

Example

To build a Hidden Markov Model and use it to generate predictions, try a simple example like this:

Create an input file to train the model. Here we have a training sequence drawn from the set of observable states 0, 1, 2, and 3, separated by spaces.

$ echo "0 1 2 2 2 1 1 0 0 3 3 3 2 1 2 1 1 1 1 2 2 2 0 0 0 0 0 0 2 2 2 0 0 0 0 0 0 2 2 2 3 3 3 3 3 3 2 3 2 3 2 3 2 1 3 0 0 0 1 0 1 0 2 1 2 1 2 1 2 3 3 3 3 2 2 3 2 1 1 0" > hmm-input

Now set MAHOUT_LOCAL to true so that Mahout uses your local file system, then run the baumwelch job to train the model. Here -nh and -no give the number of hidden and observable states (3 and 4), -e the convergence threshold, and -m the maximum number of training iterations.

$ export MAHOUT_LOCAL=true
$ $MAHOUT_HOME/bin/mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000

Output like the following should appear in the console.

Initial probabilities: 
0 1 2 
1.0 0.0 3.5659361683006626E-251 
Transition matrix:
  0 1 2 
0 6.098919959130616E-5 0.9997275322964165 2.1147850399214744E-4 
1 7.404648706054873E-37 0.9086408633885092 0.09135913661149081 
2 0.2284374545687356 7.01786289571088E-11 0.7715625453610858 
Emission matrix: 
  0 1 2 3 
0 0.9999997858591223 2.0536163836449762E-39 2.1414087769942127E-7 1.052441093535389E-27 
1 7.495656581383351E-34 0.2241269055449904 0.4510889999455847 0.32478409450942497 
2 0.815051477991782 0.18494852200821799 8.465660634827592E-33 2.8603899591778015E-36 
14/03/22 09:52:21 INFO driver.MahoutDriver: Program took 180 ms (Minutes: 0.003)
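For intuition, each Baum-Welch iteration runs a forward and a backward pass over the training sequence, computes the expected usage of every transition and emission, and re-estimates P, A, and B from those expected counts; the -e and -m options above control when this iteration stops. A minimal unscaled Python sketch of one iteration (illustrative only: real implementations rescale or use log space, and guard against zero counts):

def baum_welch_step(P, A, B, obs):
    """One Baum-Welch (EM) re-estimation step; returns updated (P, A, B)."""
    n, T = len(P), len(obs)
    # Forward pass: alpha[t][i] = P(O(1..t+1), Y(t+1)=y_i | M)
    alpha = [[P[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([B[j][obs[t]] *
                      sum(alpha[t - 1][i] * A[i][j] for i in range(n))
                      for j in range(n)])
    # Backward pass: beta[t][i] = P(O(t+2..T) | Y(t+1)=y_i, M)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(n)) for i in range(n)]
    total = sum(alpha[T - 1])                      # = P(O | M)
    # gamma[t][i] = expected occupancy of hidden state i at time t
    gamma = [[alpha[t][i] * beta[t][i] / total for i in range(n)]
             for t in range(T)]
    new_P = gamma[0][:]
    # new_A[i][j] = expected i->j transitions / expected visits to i
    new_A = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                  for t in range(T - 1)) / total
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    # new_B[i][k] = expected emissions of symbol k from i / visits to i
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(len(B[0]))] for i in range(n)]
    return new_P, new_A, new_B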

The trained model is now stored in the file ‘hmm-model’, which we can use to generate a predicted sequence; the -l option sets the length of the sequence to generate.

$ $MAHOUT_HOME/bin/mahout hmmpredict -m hmm-model -o hmm-predictions -l 10

To see the predictions:

$ cat hmm-predictions 
0 1 3 3 2 2 2 2 1 2
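hmmpredict generates a sequence of the requested length from the model: conceptually, it samples a hidden-state path from P and A, emitting an observable symbol from B at each step. A minimal sketch of that generative process (a sketch of the idea, not Mahout's actual code):

import random

def generate(P, A, B, length):
    """Sample an observation sequence of the given length from the HMM."""
    def draw(dist):
        r, acc = random.random(), 0.0
        for i, p in enumerate(dist):
            acc += p
            if r < acc:
                return i
        return len(dist) - 1          # guard against rounding error
    state = draw(P)                   # initial hidden state from P
    obs = []
    for _ in range(length):
        obs.append(draw(B[state]))    # emit a symbol from B's row
        state = draw(A[state])        # then move to the next hidden state
    return obs

With the toy model from the definition section, generate(P, A, B, 10) would yield a ten-symbol sequence analogous to the predictions above.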

Resources

[1] Lawrence R. Rabiner (February 1989). “A tutorial on Hidden Markov Models and selected applications in speech recognition”. Proceedings of the IEEE 77 (2): 257-286. doi:10.1109/5.18626.