“This per-state normalization of transition scores implies a “conservation of score mass” (Bottou, 1991) whereby all the mass that arrives at a state must be distributed among the possible successor states. An observation can affect which destination states get the mass, but not how much total mass to pass on. This causes a bias toward states with fewer outgoing transitions. In the extreme case, a state with a single outgoing transition effectively ignores the observation. In those cases, unlike in HMMs, Viterbi decoding cannot downgrade a branch based on observations after the branch point, and models with state-transition structures that have sparsely connected chains of states are not properly handled. The Markovian assumptions in MEMMs and similar state-conditional models insulate decisions at one state from future decisions in a way that does not match the actual dependencies between consecutive states.”
From the “Label Bias Problem” section of Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (Lafferty, McCallum, and Pereira, 2001).
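
To make the quoted argument concrete, below is a minimal numerical sketch in Python of the rib/rob automaton the paper uses to illustrate label bias. The state numbering, transition table, and probability values are hypothetical choices for demonstration only; since this little machine has just two complete paths, comparing their scores is what Viterbi decoding reduces to here.

# Per-state conditional distributions of a hypothetical MEMM-style model:
# P(next_state | current_state, observation).  Because scores are
# normalized per state, each row below sums to 1 over its successors.
TRANS = {
    # State 0 branches on the first letter; suppose "rob" was more frequent
    # in training, so observing 'r' favors the rob branch (state 4).
    (0, "r"): {1: 0.4, 4: 0.6},
    # States 1 and 4 each have a single outgoing transition, so all incoming
    # mass is passed on no matter what is observed ("conservation of score
    # mass"): the middle letter is effectively ignored.
    (1, "i"): {2: 1.0},
    (1, "o"): {2: 1.0},
    (4, "i"): {5: 1.0},
    (4, "o"): {5: 1.0},
    # The final 'b' takes both branches to the accepting state 3.
    (2, "b"): {3: 1.0},
    (5, "b"): {3: 1.0},
}

def path_score(states, obs):
    """Product of per-step conditional probabilities along a state path."""
    score = 1.0
    for (s, s_next), o in zip(zip(states, states[1:]), obs):
        score *= TRANS.get((s, o), {}).get(s_next, 0.0)
    return score

RIB_PATH = [0, 1, 2, 3]   # state path intended to spell "rib"
ROB_PATH = [0, 4, 5, 3]   # state path intended to spell "rob"

for word in ("rib", "rob"):
    rib, rob = path_score(RIB_PATH, word), path_score(ROB_PATH, word)
    winner = "rib path" if rib > rob else "rob path"
    print(f"observation {word!r}: rib-path {rib:.2f}, rob-path {rob:.2f} "
          f"-> decoder prefers the {winner}")

# For the observation 'rib' the rob path still wins (0.6 vs 0.4): the 'i'
# at the second position cannot downgrade the branch chosen at the first,
# which is exactly the behavior the quoted passage describes.

Note that the mass leaving states 1 and 4 is fixed at 1.0 by per-state normalization, so only the first transition, conditioned on the shared letter 'r', decides the outcome.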