

Has anyone been successful in developing a robust OCR application capable of reading multiple fonts?

Hi again,

 

I need to build an application capable of performing an OCR task on food labels. The application must read the name and the expiration date printed on the label. I know that illumination must be controlled, and we are thinking about a way to do this. My question is: can the OCR module in LabVIEW perform the task that I need, or do I need to look for something in another programming language (maybe a neural network)? Can someone give me an estimate of the number of examples per character needed to correctly train the LabVIEW OCR module?

0 Kudos
Message 1 of 13
(6,878 Views)

If you have significantly different fonts, it might be better to create an OCR library for each font, then try to read the label using each library in succession.

 

The problem is that some characters in one font might look like different characters in another font, which confuses the OCR system.  For example, B and 8 are always troublesome.
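
One way around those confusable pairs is to check the result against the expected format of the field afterwards. A quick plain-Python sketch of the idea (my own illustration; the substitution table and the date pattern are assumptions, nothing built into the OCR module):

import re

CONFUSABLE = {"B": "8", "O": "0", "S": "5", "I": "1"}
DATE_PATTERN = re.compile(r"^\d{2}/\d{2}/\d{4}$")  # assumed expiration-date format

def fix_confusables(raw):
    # If the field must be a date, map letter look-alikes to digits and
    # keep the result only when it then matches the expected pattern.
    candidate = "".join(CONFUSABLE.get(c, c) for c in raw)
    return candidate if DATE_PATTERN.match(candidate) else raw

print(fix_confusables("2B/O4/2O24"))  # -> 28/04/2024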

 

Bruce

Bruce Ammons
Ammons Engineering
Message 2 of 13
(6,863 Views)

Thank you, Bruce, I understand what you mean. Anyway, I have done some tests, and it seems to me that the most difficult part is correctly separating the characters. Am I doing something wrong, or is this a limit of LabVIEW's OCR system? If it is the latter, can you give me any advice?

 

FM

0 Kudos
Message 3 of 13
(6,851 Views)

That's another thing.  For each font, there is usually a way to separate the characters, but the settings can be very different for each font.  You would probably need to have separate settings for each font when you loop through the reading steps.
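
In pseudocode terms, the loop I have in mind might look like this (a Python-style sketch; read_ocr and the setting names are placeholders standing in for the LabVIEW OCR VIs, not a real NI Vision API):

# Placeholder: in LabVIEW this would be the OCR read step configured with a
# character set file and its matching segmentation settings.
def read_ocr(image, settings):
    return "", 0.0  # stub; returns (text, confidence score)

FONT_SETTINGS = [
    {"charset": "font_a.abc", "min_char_spacing": 2, "char_size": (8, 14)},
    {"charset": "font_b.abc", "min_char_spacing": 1, "char_size": (10, 20)},
]

def read_with_fallback(image):
    # Try every per-font configuration and keep the highest-confidence read.
    best_text, best_score = "", 0.0
    for settings in FONT_SETTINGS:
        text, score = read_ocr(image, settings)
        if score > best_score:
            best_text, best_score = text, score
    return best_text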

 

If you can post a couple of sample images, we might be able to make suggestions on the settings for separating the characters.

 

Bruce

Bruce Ammons
Ammons Engineering
0 Kudos
Message 4 of 13
(6,835 Views)

Here is an example taken from Google. I find it difficult to separate "fo", "ro", and "zy". Any advice?

 

Thank you for your time!

[attached image: sample food label]

0 Kudos
Message 5 of 13
(6,795 Views)

 


OK, I think the problem in the example you are showing is tokenization.

 

There are a number of steps involved in every OCR application: segmentation, tokenization, training, reading, and verification. Segmentation is the problem of distinguishing noise, background, and signal; it can mostly be solved with proper imaging techniques (proper light, often filters, etc. ... mostly hardware, but some things can be done in software, too. Looking at you, lovely new flat-field correction in Vision 2015). A small segmentation sketch follows the list of steps below.

Tokenization is the task of separating particles into single characters.

Training is using machine learning algorithms to teach LabVIEW to distinguish between those characters.
Reading is applying that stuff to unknown data.
And verification is grading the quality of such a reading.
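
To make the segmentation step concrete, here is a minimal sketch (using OpenCV in Python rather than NI Vision, purely for illustration): a crude software flat-field correction followed by Otsu thresholding.

import cv2

img = cv2.imread("label.png", cv2.IMREAD_GRAYSCALE)

# Crude software flat-field correction: divide by a heavily blurred copy
# of the image, which approximates the illumination background.
background = cv2.GaussianBlur(img, (51, 51), 0)
corrected = cv2.divide(img, background, scale=255)

# Otsu picks a global threshold; THRESH_BINARY_INV turns dark print into
# white particles for the later steps.
_, binary = cv2.threshold(corrected, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2.imwrite("label_binary.png", binary)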

 

Tokenization was quite tricky in the early days of machine vision, and it depends strongly on what kind of font is used. Often, fonts are chosen to make it easy for the algorithms: monospaced fonts, where every character has the same width. NI Vision provides some basic algorithms, but they struggle with more complex typography, especially with tails in fonts and with kerning.

 

Let me give you an example:

kerning_example.PNG

This is the same text twice, at the same font size, but with the letters arranged differently.

Look, for example, at the Ps:

kerning_nice_pink.PNG

This is done sometimes by designers, sometimes by software, to create a more aesthetically pleasing effect. The point is, the NI Vision library can't do this.

Look at what the automatic tokenization does:

Kerning_ROI.PNG

 

 

 

You can fiddle with the settings as much as you want; it will change a few things, but it won't be able to cope with this, because it assumes that characters fit into a rectangular (or skewed) box and that there is only one character in there.

But... actually, your segmentation is quite good, and the classification would work if only it had a chance. It's just that the tokenization fails, and it fails badly. At this point, you need to start thinking about what OCR actually does: it takes the tokenized character boxes and puts them into some machine-learning-based classification. NI isn't too forthcoming about which parameters they use, but you don't need to care at this point anyway.

What you need to understand is that they take everything inside the red ROI boxes and put it into the classifier. They count that as one character, and because a lot of the next character is in there too, you have no chance.

The library needs some sort of box approach, because some letters, like the i, are not one connected particle; they are two particles. And if you look at other fonts, stencil-like fonts or Braille for instance, that is entirely true. Also, that's how typesetting normally works... or used to work, before it met designers...

 

You, however, can probably define better constraints on your tokenization, because you know more about your specific problem and don't need a generic approach.
I don't know what kind of font you are using, but in my case, I will pretend that my font consists only of connected particles. Yes, the letter i has a dot; I will either recognize it as its own character or kick it out based on its size, as in the sketch below.
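
A sketch of that constraint in OpenCV/Python (again just an illustration of the idea, not the NI Vision implementation; MIN_AREA is an assumed size threshold):

import cv2

binary = cv2.imread("label_binary.png", cv2.IMREAD_GRAYSCALE)
n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)

MIN_AREA = 30  # assumption: anything smaller is an i-dot or noise
particles = []
for i in range(1, n):  # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] >= MIN_AREA:
        # keep bounding box and center of mass for each character particle
        particles.append((stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP],
                          stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT],
                          tuple(centroids[i])))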

 

I treat the segmented characters as particles. In the real world, I'd have to kick out some noise, but I don't need to here. I would probably just kick out the i-dots with a particle filter, too.
I can use Particle Analysis to get the center of mass of those particles. I can then use a bit of math to order them in a grid-like pattern (with multiple lines, you need a little modulo trickery, but it's not hard), and then I classify the characters one after another.
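
The grid ordering could look like this (continuing the sketch above; LINE_HEIGHT is an assumption about the label layout):

LINE_HEIGHT = 40  # assumed pixel height of one text line

def reading_order(particles):
    # Integer division of the centroid's y by the line height buckets each
    # particle into a text line (the "modulo trickery" for multi-line
    # labels); within a line, sort left to right by x.
    return sorted(particles, key=lambda p: (int(p[4][1]) // LINE_HEIGHT, p[4][0]))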

 

I have two ways to do this classification: either I use masking to extract just the letter particle's shape from the image and feed it to the good old OCR (which is in fact a particle classifier supposedly optimized for OCR), or I use the particle classification tool itself (a binary classifier that uses circularity, elongation, convexity, number of holes, spread, slenderness, etc.). Or I define my own classifier. There is not much information about them, which is a pity, so I can't tell you which one is best, but in my experience, all of these approaches work pretty well. At least particle classification on our sample picture looks like this:

 

classifier.PNG
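
For a feel of what such a binary particle classifier works with, here is how a few of those shape features could be computed (an OpenCV sketch; the exact feature set NI uses is not documented, so this is an approximation):

import math
import cv2

def particle_features(mask):
    # mask: binary image assumed to contain one white particle
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    outer = max(contours, key=cv2.contourArea)
    area = cv2.contourArea(outer)
    perimeter = cv2.arcLength(outer, True)
    hull_area = cv2.contourArea(cv2.convexHull(outer))
    holes = sum(1 for h in hierarchy[0] if h[3] != -1)  # contours with a parent
    return {
        "circularity": 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0,
        "convexity": area / hull_area if hull_area else 0.0,
        "holes": holes,
    }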

 

 

The result has to be ordered, of course (as stated above, with a grid extracted from the particle centers of mass), and then you need to clean up a bit (for example, two adjacent ' particles have to be merged into a ", and the i-dot has to be removed).

But the problem is solved. In this case.

 

Summing up: in the end, when an algorithm in an image processing library "fails" or doesn't perform as expected, you need to figure out where it is going wrong. In our case, kerning doesn't work with the "boxes" approach.

In your application: Well... your job to figure it out 🙂

Message 8 of 13
(6,761 Views)

Actually, as an afterthought: you are probably better off using NI's OCR engine instead of the particle classifier. As far as I remember, the particle classifier is fairly invariant to rotation, and the mirroring setting doesn't work properly, and that's not what you want here. Your samples are unscaled and unrotated; with a rotation-invariant classifier you wouldn't be able to distinguish between b and d, or u and n, in some fonts.

0 Kudos
Message 9 of 13
(6,744 Views)

Thank you for your reply, b.ploetzeneder.

 

The problem is: I can't make assumptions about the fonts I need to handle. We are trying to build a food-label reader, so we need to manage an unknown range of fonts. I am beginning to think that LabVIEW's OCR is simply not suitable for the task. We are thinking of switching to something else, maybe an integration between LabVIEW and Tesseract OCR. Do you think we have a chance of getting results that way?
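
For reference, this is roughly the call we would be wrapping (a sketch using the pytesseract Python wrapper; from LabVIEW we would presumably invoke the Tesseract executable via System Exec or its C API through a DLL call):

import pytesseract
from PIL import Image

img = Image.open("label.png")
# --psm 6 tells Tesseract to treat the image as a single uniform block of text
text = pytesseract.image_to_string(img, config="--psm 6")
print(text)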

 

Francesco

0 Kudos
Message 10 of 13
(6,726 Views)