---
layout: default
title: Hidden Markov Models
---
# Hidden Markov Models
## Introduction and Usage
Hidden Markov Models are used in multiple areas of Machine Learning, such
as speech recognition, handwritten letter recognition or natural language
processing.
## Formal Definition
A Hidden Markov Model (HMM) is a statistical model of a process consisting
of two (in our case discrete) random variables O and Y, which change their
state sequentially. The variable Y with states \{y_1, ... , y_n\} is called
the "hidden variable", since its state is not directly observable. The
state of Y changes sequentially with a so called - in our case first-order
- Markov Property. This means, that the state change probability of Y only
depends on its current state and does not change in time. Formally we
write: P(Y(t+1)=y_i|Y(0)...Y(t)) = P(Y(t+1)=y_i|Y(t)) = P(Y(2)=y_i|Y(1)).
The variable O with states \{o_1, ... , o_m\} is called the "observable
variable", since its state can be directly observed. O does not have a
Markov Property, but its state probability depends statically on the
current state of Y.
Formally, an HMM is defined as a tuple M=(n,m,P,A,B), where n is the number of hidden states, m is the number of observable states, P is an n-dimensional vector containing initial hidden state probabilities, A is the nxn-dimensional "transition matrix" containing the transition probabilities such that A\[i,j\](i,j\.html)
=P(Y(t)=y_i|Y(t-1)=y_j) and B is the mxn-dimensional "emission matrix"
containing the observation probabilities such that B\[i,j\]=
P(O=o_i|Y=y_j).
## Problems
Rabiner \[1\](1\.html)
defined three main problems for HMM models:
1. Evaluation: Given a sequence O of observations and a model M, what is
the probability P(O|M) that sequence O was generated by model M. The
Evaluation problem can be efficiently solved using the Forward algorithm
2. Decoding: Given a sequence O of observations and a model M, what is
the most likely sequence Y*=argmax(Y) P(O|M,Y) of hidden variables to
generate this sequence. The Decoding problem can be efficiently solved
using the Viterbi algorithm.
3. Learning: Given a sequence O of observations, what is the most likely
model M*=argmax(M)P(O|M) to generate this sequence. The Learning problem
can be efficiently solved using the Baum-Welch algorithm.
## Example
To build a Hidden Markov Model and use it to build some predictions, try a simple example like this:
Create an input file to train the model. Here we have a sequence drawn from the set of states 0, 1, 2, and 3, separated by space characters.
$ echo "0 1 2 2 2 1 1 0 0 3 3 3 2 1 2 1 1 1 1 2 2 2 0 0 0 0 0 0 2 2 2 0 0 0 0 0 0 2 2 2 3 3 3 3 3 3 2 3 2 3 2 3 2 1 3 0 0 0 1 0 1 0 2 1 2 1 2 1 2 3 3 3 3 2 2 3 2 1 1 0" > hmm-input
Now run the baumwelch job to train your model, after first setting MAHOUT_LOCAL to true, to use your local file system.
$ export MAHOUT_LOCAL=true
$ $MAHOUT_HOME/bin/mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000
Output like the following should appear in the console.
Initial probabilities:
0 1 2
1.0 0.0 3.5659361683006626E-251
Transition matrix:
0 1 2
0 6.098919959130616E-5 0.9997275322964165 2.1147850399214744E-4
1 7.404648706054873E-37 0.9086408633885092 0.09135913661149081
2 0.2284374545687356 7.01786289571088E-11 0.7715625453610858
Emission matrix:
0 1 2 3
0 0.9999997858591223 2.0536163836449762E-39 2.1414087769942127E-7 1.052441093535389E-27
1 7.495656581383351E-34 0.2241269055449904 0.4510889999455847 0.32478409450942497
2 0.815051477991782 0.18494852200821799 8.465660634827592E-33 2.8603899591778015E-36
14/03/22 09:52:21 INFO driver.MahoutDriver: Program took 180 ms (Minutes: 0.003)
The model trained with the input set now is in the file 'hmm-model', which we can use to build a predicted sequence.
$ $MAHOUT_HOME/bin/mahout hmmpredict -m hmm-model -o hmm-predictions -l 10
To see the predictions:
$ cat hmm-predictions
0 1 3 3 2 2 2 2 1 2
## Resources
\[1\]
Lawrence R. Rabiner (February 1989). "A tutorial on Hidden Markov Models
and selected applications in speech recognition". Proceedings of the IEEE
77 (2): 257-286. doi:10.1109/5.18626.