The HMM is trained by means of two procedures, a forward and a backward pass. The model itself consists of two matrices and a prior vector. The state transition matrix $A$ is an $M \times M$ matrix giving the likelihood of transitioning to state $X_{t+1}$ at time $t+1$ from state $X_t$ at time $t$. The entry $a_{ij}$ gives the probability of moving from state $x_i$ to state $x_j$, so the matrix forms the likelihood $P(X_{t+1}|X_t)$ where
$$
a_{ij} = P(x_j | x_i)
$$
The second matrix $B$ is the state evidence matrix. Given $M$ states and $N$ evidence variables, $B$ is indexed by evidence variable $e_i$ and state $x_j$, where
$$
b_{ij} = P(x_j | e_i)
$$
Additionally there is a prior probability for each state $X$, provided by a vector $\pi$ of length $M$ where
$$
\pi_i = P(x_i)
$$
The hidden Markov model comprises the triple
$$
HMM = \lambda = (A, B, \pi)
$$
Learning the hidden Markov model amounts to estimating the posterior matrices $A$ and $B$ given an initial condition. This implementation addresses the discrete model rather than the continuous model. It learns the matrix $A$ relating state $X$ to the other states $x$, and the matrix $B$ relating state $X$ to the evidence variables $e$. Given the stream of previous events at times $1:t$, it uses the posterior model to predict the next state at time $t+k+1$ from the prior states of the sequential model at time $t+k$:
$$
P(X_{t+k+1}|e_{1:t}) = \sum_{x_{t+k}} P(X_{t+k+1}|x_{t+k}) P(x_{t+k}|e_{1:t})
$$
Note that both $A$ and $B$ act as the likelihood model, while $\pi$ is initialised as the prior; all three are estimated from the training data. The raw format of the training data is a jagged array of type
List[List[String]]
Where the last item in the list represents a discrete state label for the domain.
Each item preceding that represents a discrete token for an evidence label in the domain.
So each row in the jagged array contains a variable-length sequence of evidence labels and terminates with the state label emitted by that sequence.
Both the domain of evidence labels and state labels are deterministic in this model.
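The one-step prediction equation above is a sum over states, which reduces to a matrix-vector product against the transition matrix. The sketch below (not the library implementation; the function name and plain-`Vector` representation are illustrative assumptions) shows the computation for a row-stochastic $A$ where `a(i)(j)` holds $P(X_{t+k+1}=x_j|x_i)$ and `p(i)` holds $P(x_i|e_{1:t})$.

```scala
// Sketch of the one-step prediction sum: result(j) = sum_i a(i)(j) * p(i).
// Assumes a square, row-stochastic transition matrix a.
def predictNext(a: Vector[Vector[Double]], p: Vector[Double]): Vector[Double] =
  a.indices.map(j => a.indices.map(i => a(i)(j) * p(i)).sum).toVector

val a = Vector(Vector(0.7, 0.3), Vector(0.4, 0.6))
val p = Vector(0.5, 0.5)
val next = predictNext(a, p) // Vector(0.55, 0.45)
```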
For example, if we have a simple weather world and we observe two evidence variables:
Umbrella, NoUmbrella
And we have two state variables,
Raining,Sunny
We can describe a sequence that leads up to Raining as
Umbrella,Raining
Or
Umbrella,Umbrella,Raining
And a sequence that terminates in Sunny might be
Umbrella,Umbrella,NoUmbrella,Sunny
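The weather sequences above can be written directly in the jagged `List[List[String]]` format described earlier; each row is a variable-length run of evidence tokens ending with a state label (the literal values here are just the example sequences, not real training data):

```scala
// Jagged training data: evidence tokens followed by a terminal state label.
val data: List[List[String]] = List(
  List("Umbrella", "Raining"),
  List("Umbrella", "Umbrella", "Raining"),
  List("Umbrella", "Umbrella", "NoUmbrella", "Sunny")
)
```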
The matrix $A$ is estimated by identifying the states observed at steps $n$ and $n+1$ and counting the frequency
of the transitions from $x_n$ to $x_{n+1}$.
$$
a_{ij} = \frac{c_{ij}}{\sum_k c_{ik}}
$$
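The count-and-normalise estimate of $a_{ij}$ can be sketched as follows. This is not the library's `SequenceEstimation` code; it is a minimal illustration that assumes the observed states have already been flattened into a single sequence, counts adjacent pairs, and divides each count by its row total.

```scala
// Sketch: a_ij = c_ij / sum_k c_ik, from adjacent state pairs.
def estimateTransitions(stateSeq: List[String]): Map[(String, String), Double] = {
  // count each observed transition (x_n, x_{n+1})
  val counts = stateSeq.sliding(2).collect { case List(a, b) => (a, b) }
    .toList.groupBy(identity).map { case (k, v) => k -> v.size.toDouble }
  // total outgoing count per source state
  val rowTotals = counts.groupBy(_._1._1).map { case (i, m) => i -> m.values.sum }
  counts.map { case ((i, j), c) => (i, j) -> c / rowTotals(i) }
}
```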
The evidence matrix $B$ is estimated by counting, for each evidence label $e_i$ in the training data,
how often it occurs in rows terminating in state $x_j$, normalised over all terminal states.
$$
b_{ij} = \frac{ c(e_i, x_j) }{ \sum_k c(e_i, x_k) }
$$
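That estimate can be sketched directly against the jagged training format. Again this is an illustration rather than the library's implementation: it pairs every evidence token in a row with the row's terminal state, counts the pairs, and normalises each evidence label's counts over the states it co-occurs with.

```scala
// Sketch: b_ij = c(e_i, x_j) / sum_k c(e_i, x_k), from jagged training rows
// where each row is evidence tokens followed by a terminal state label.
def estimateEvidence(data: List[List[String]]): Map[(String, String), Double] = {
  val pairs = for {
    row <- data
    ev  <- row.init // every evidence token pairs with the row's final state
  } yield (ev, row.last)
  val counts = pairs.groupBy(identity).map { case (k, v) => k -> v.size.toDouble }
  val evTotals = counts.groupBy(_._1._1).map { case (e, m) => e -> m.values.sum }
  counts.map { case ((e, x), c) => (e, x) -> c / evTotals(e) }
}
```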
The class [[au.id.cxd.math.data.SequenceEstimation]] is used in this library to estimate these original
likelihoods
and the prior distribution for $X$.
$$
\pi_i = \frac{c_i}{\sum_k c_k}
$$
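The prior estimate above is the relative frequency of each terminal state across the training rows. A minimal sketch (not the `SequenceEstimation` implementation, and using the same jagged format as before):

```scala
// Sketch: pi_i = c_i / sum_k c_k, counting terminal state labels.
def estimatePriors(data: List[List[String]]): Map[String, Double] = {
  val counts = data.map(_.last).groupBy(identity).map { case (k, v) => k -> v.size.toDouble }
  val total = counts.values.sum
  counts.map { case (s, c) => s -> c / total }
}
```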
import java.io.File

val fileName = "data/example_train_data.csv"
val file = new File(fileName)
val reader = SequenceReader()
// the data set (jagged array)
val data = reader.readSequences(file)
// unique set of states
val states = reader.readStates(data)
// the unique labels for evidence variables
val evidenceVars = reader.readEvidenceVars(data)
After which the sequence estimator is used to generate the initial parameters of the model.
val estimator = SequenceEstimation()
val pi = estimator.statePriors(data)(states)
val A = estimator.stateTransitions(data)(states)
val Bk = estimator.avgPriorEvidences(data)(evidenceVars)(states)
val input = InputModel(pi, A, List(Bk), states, evidenceVars)
The input model is then used to train the HMM
val model = HiddenMarkovModel.train(input)(data)(0.00001)(50)
The parameters include the threshold for convergence and maximum iterations.
Once trained the model can be used for prediction.
The example below is from the "CTI" world where telephony events result in a call state.
Refer also to the TestTrainRainModel for the weather world example.
val test = List("Ringing(inbound)",
  "UserEvent(Start)",
  "UserEvent(Stop)",
  "OffHook",
  "Established",
  "Held",
  "Dialing(Consult)")
val predict1 = HiddenMarkovModel.viterbiPredict(model)(test)
The string of events that are sent to the predict model do not include the end state.
The prediction model will return a result for each evidence transition in the input chain.
For example, from the input above, each step in the sequence above is categorised
with a corresponding state label, and the probability of that state label.
List({
prob: 6.60507635396296E-44,
state: OnCall
evidence: Ringing(inbound)
T: 1
success: true
}, {
prob: 7.758309697332233E-70,
state: Started
evidence: UserEvent(Start)
T: 2
success: true
}, {
prob: 3.2302220856424765E-94,
state: Paused
evidence: UserEvent(Stop)
T: 3
success: true
}, {
prob: 1.6782307523456403E-112,
state: OnCall
evidence: OffHook
T: 4
success: true
}, {
prob: 1.1980508062616898E-131,
state: OnCall
evidence: Established
T: 5
success: true
}, {
prob: 1.0950613538482917E-145,
state: OnHold
evidence: Held
T: 6
success: true
}, {
prob: 1.0950613538482917E-145,
state: Consult
evidence: Dialing(Consult)
T: 7
success: true
})
For further reading refer to: