Hello,
I want to design a hidden Markov model (HMM) to correct typos in text without using a dictionary.
In this problem, a state is the correct letter that should have been typed, and an observation is the letter that was actually typed. Given a sequence of outputs/observations (i.e., the letters actually typed), the task is to reconstruct the hidden state sequence (i.e., the intended letters). The data therefore look like this, where each pair is (typed letter, intended letter):
[('t', 't'), ('h', 'h'), ('w', 'e'), ('k', 'm')]
[('f', 'f'), ('o', 'o'), ('r', 'r'), ('m', 'm')]
The first example is misspelled: the observation is thwk while the correct word is them. The second example is correctly typed.
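Because each pair is (typed, intended), the emission probabilities P(typed | intended) can be estimated by simple counting. The sketch below shows one way to do that. The function name `estimate_emissions`, the lowercase-only alphabet and the add-one (`alpha=1.0`) smoothing are assumptions for illustration, not necessarily what my model uses.

```python
import string
from collections import Counter, defaultdict


def estimate_emissions(examples, alphabet=string.ascii_lowercase, alpha=1.0):
    """Estimate P(typed | intended) from (typed, intended) pairs.

    Hypothetical helper: uses add-alpha smoothing so that unseen
    substitutions still get a small non-zero probability.
    """
    counts = defaultdict(Counter)  # counts[intended][typed]
    for example in examples:
        for typed, intended in example:
            counts[intended][typed] += 1

    emit = {}
    for intended in alphabet:
        total = sum(counts[intended].values()) + alpha * len(alphabet)
        emit[intended] = {
            typed: (counts[intended][typed] + alpha) / total for typed in alphabet
        }
    return emit


examples = [
    [('t', 't'), ('h', 'h'), ('w', 'e'), ('k', 'm')],
    [('f', 'f'), ('o', 'o'), ('r', 'r'), ('m', 'm')],
]
emit = estimate_emissions(examples)
print(emit['e']['w'], emit['m']['k'])
```

The transition table P(intended_t | intended_t-1) comes from the same data by counting consecutive intended letters instead.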
To do this, I trained first-order and second-order HMMs.
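For concreteness, a first-order version of this setup can be sketched as letter-bigram transitions plus Viterbi decoding in log space. Everything here is an illustrative assumption: the helper names (`train_transitions`, `noise_log_emission`, `viterbi`, `path_log_prob`), the tiny `corpus`, and the uniform substitution noise model with `p_correct`, which stands in for a learned emission table. The corpus is far too small to expect real corrections; the point is only the mechanics, and like any such model it can only substitute letters, never insert or delete them.

```python
import math
import string
from collections import Counter, defaultdict

ALPHABET = string.ascii_lowercase


def train_transitions(words, alphabet=ALPHABET, alpha=1.0):
    """Hypothetical helper: add-alpha smoothed log P(first letter) and log P(next | prev)."""
    start_counts = Counter(w[0] for w in words if w)
    bigram_counts = defaultdict(Counter)
    for w in words:
        for prev, nxt in zip(w, w[1:]):
            bigram_counts[prev][nxt] += 1
    n = len(alphabet)
    start_total = sum(start_counts.values()) + alpha * n
    log_start = {s: math.log((start_counts[s] + alpha) / start_total) for s in alphabet}
    log_trans = {}
    for p in alphabet:
        total = sum(bigram_counts[p].values()) + alpha * n
        log_trans[p] = {s: math.log((bigram_counts[p][s] + alpha) / total) for s in alphabet}
    return log_start, log_trans


def noise_log_emission(p_correct=0.9, alphabet=ALPHABET):
    """Assumed noise model: intended letter typed with p_correct, else any other letter uniformly."""
    wrong = (1.0 - p_correct) / (len(alphabet) - 1)
    return {s: {o: math.log(p_correct if o == s else wrong) for o in alphabet} for s in alphabet}


def viterbi(observed, log_start, log_trans, log_emit):
    """Most likely intended letters for the typed letters (observed letters must be in the alphabet)."""
    if not observed:
        return "", 0.0
    states = list(log_start)
    # best[s]: highest log-probability of any state path ending in s at the current position
    best = {s: log_start[s] + log_emit[s][observed[0]] for s in states}
    backpointers = []
    for o in observed[1:]:
        prev_best = best
        best, pointers = {}, {}
        for s in states:
            p = max(states, key=lambda q: prev_best[q] + log_trans[q][s])
            best[s] = prev_best[p] + log_trans[p][s] + log_emit[s][o]
            pointers[s] = p
        backpointers.append(pointers)
    # Follow the backpointers from the best final state to recover the path
    last = max(states, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path)), best[last]


def path_log_prob(states_seq, observed, log_start, log_trans, log_emit):
    """Log-probability of one candidate intended string for the typed string."""
    score = log_start[states_seq[0]] + log_emit[states_seq[0]][observed[0]]
    for prev, s, o in zip(states_seq, states_seq[1:], observed[1:]):
        score += log_trans[prev][s] + log_emit[s][o]
    return score


# Hypothetical toy corpus of correctly typed words
corpus = ["them", "then", "there", "form", "from"]
log_start, log_trans = train_transitions(corpus)
log_emit = noise_log_emission(p_correct=0.9)
corrected, score = viterbi("thwk", log_start, log_trans, log_emit)
print(corrected, score)
```

Since every hidden state emits exactly one observed letter, the decoded string always has the same length as the typed one, which is exactly why this structure cannot represent insertions or omissions.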
However, they only handle substitution errors. How can I extend this model to also handle spurious inserted characters and omitted characters (i.e., identify and restore the omitted characters)?
Thank you