I want to design a hidden Markov model (HMM) to correct typos in text without a dictionary.
In this problem, a state refers to the correct letter that should have been typed, and an observation refers to the actual letter that is typed. Given a sequence of outputs/observations (i.e., actually typed letters), the problem is to reconstruct the hidden state sequence (i.e., the intended sequence of letters). Thus, data for this problem looks like:
[('t', 't'), ('h', 'h'), ('w', 'e'), ('k', 'm')]
[('f', 'f'), ('o', 'o'), ('r', 'r'), ('m', 'm')]
Each pair is (typed letter, intended letter). The first example is misspelled: the observation is thwk while the correct word is them. The second example is correctly typed.
To do so, I trained first-order and second-order HMMs.
But they only handle substitution errors. How can I extend this model to also handle noisy insertion of characters and omitted characters (that is, identify and restore the omitted characters)?
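A minimal sketch of what such a first-order model might look like, assuming lowercase a-z only, training pairs in the (typed, intended) order shown above, and add-one smoothing. The names `train`, `viterbi`, and `ALPHABET` are illustrative choices, not from any particular library, and this is not necessarily how the models mentioned here were built.

```python
import math
from collections import defaultdict

# Assumption: states and observations are both lowercase ASCII letters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def train(words, alpha=1.0):
    """Estimate start, transition and emission log-probabilities by counting.

    `words` is a list of words, each a list of (typed, intended) pairs.
    `alpha` is the add-alpha smoothing constant (an assumed choice).
    """
    start = defaultdict(float)
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    for word in words:
        prev = None
        for typed, intended in word:
            if prev is None:
                start[intended] += 1
            else:
                trans[prev][intended] += 1
            emit[intended][typed] += 1
            prev = intended

    def normalize(counts):
        # Smoothed log-probabilities over the whole alphabet.
        total = sum(counts.values()) + alpha * len(ALPHABET)
        return {c: math.log((counts[c] + alpha) / total) for c in ALPHABET}

    return (
        normalize(start),
        {s: normalize(trans[s]) for s in ALPHABET},
        {s: normalize(emit[s]) for s in ALPHABET},
    )


def viterbi(typed, start, trans, emit):
    """Return the most likely intended string for the typed string.

    Every character of `typed` must be in ALPHABET.
    """
    if not typed:
        return ""
    score = {s: start[s] + emit[s][typed[0]] for s in ALPHABET}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for ch in typed[1:]:
        new_score, pointers = {}, {}
        for s in ALPHABET:
            best = max(ALPHABET, key=lambda p: score[p] + trans[p][s])
            new_score[s] = score[best] + trans[best][s] + emit[s][ch]
            pointers[s] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(ALPHABET, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))
```

Calling `train` on a list of words in the format above and then `viterbi("thwk", start, trans, emit)` returns a string of the same length as the input. That fixed one-letter-per-letter alignment is exactly why a model of this shape can only express substitutions.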
I have no idea, but when you get it working I would really like to use it for my forum posts.
You are asking in the wrong place. You need help with HMMs themselves -- once you know how to build it, and can explain clearly what needs to be done, we can help if you choose to implement this process in LabVIEW (is this really the optimal language for such a problem?).