Petru Tarabuta, Third year MEng Electronic and Electrical Engineering, expected graduation: 2016
University of Sheffield, United Kingdom
Email Address: firstname.lastname@example.org
Submission Language: English
This is a software project that performs hand-written digit recognition using an artificial neural network (ANN). The dataset for the neural network was captured by the author. The system was inspired by the free online Machine Learning course offered by Stanford University on Coursera.org.
Attached below is the LabVIEW project saved for LabVIEW 2014 and 2011. The zip files contain the images used in the training and test sets, as well as the text files that store the ANN weights matrices Theta1 and Theta2. It should be possible to select one of the images, run the VI and see the predictions and confidence levels being overlaid on the front panel display.
No hardware was used in the project.
The aim of the project was to perform hand-written digit recognition, which is a type of Intelligent Character Recognition (ICR). ICR is a close cousin of optical character recognition (OCR), the difference being that ICR aims to recognize hand-written text or digits, while OCR aims to interpret typewritten characters.
OCR and ICR are important because they are used in applications such as automatic car number plate recognition, automatic information extraction from hand-written forms or documents, making scanned images of printed text searchable (searchable PDF books) and production line automated testing.
This project aimed to explore the introductory concepts in Intelligent Character Recognition by using an artificial neural network to identify digits hand-written by the author. Moreover, the project aimed to record how the performance of the system varied as a function of the regularization parameter, λ, and to use learning curves to determine whether the system has high bias or high variance. The performance of the system was defined to be the ratio of correctly recognised digits over the total number of digits in the test set.
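For reference, training such a network typically minimizes the regularized cross-entropy cost below (this is the standard form presented in the Coursera course material, not taken from the project files). The λ term penalizes large weights, so larger values of λ push the system towards higher bias and smaller values towards higher variance:

```latex
J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}
  \left[ -y_k^{(i)} \log\!\left( (h_\Theta(x^{(i)}))_k \right)
       - \left(1 - y_k^{(i)}\right) \log\!\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right]
  + \frac{\lambda}{2m} \sum_{l,i,j} \left( \Theta^{(l)}_{j,i} \right)^2
```

Here m is the number of training examples, K = 10 is the number of output classes, and the regularization sum runs over all non-bias weights in the matrices Theta1 and Theta2.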
Figure 1: Main application front panel
Figure 2: ANN's predictions and confidence level in each prediction (%)
The system’s design and operation are outlined below. The system took around three weeks to build, not including the time taken to study the machine learning concepts.
1. Creating the dataset.
The author hand-wrote 2990 digits: 299 examples of each of the ten possible digits, zero through to nine. The sheets on which the digits were written were then scanned using a regular office scanner. One of the aims of this project was to become familiar with using IMAQ functions and image processing concepts in LabVIEW. Building a new dataset of hand-written digits, rather than using one of the freely available datasets online, provided this opportunity.
The IMAQ palette made it straightforward and intuitive to perform the image processing and analysis required for this project. Moreover, the NI IMAQ Concepts manual was an invaluable resource when learning about image processing and computer vision.
2. Converting images to training examples.
In order for a hand-written digit to become part of the dataset, the LabVIEW VI:
1. Detects each digit’s bounding rectangle and overlays the bounding rectangles on the front panel image display.
2. The inside of each bounding rectangle becomes a region of interest (ROI). Each ROI now encloses one and only one hand-written digit.
3. Each ROI is resized to a 20-by-20 pixel image. As can be seen in Figure 3 below, the 20x20 pixel image is quite pixelated and loses some of the original digit information. (A higher resolution for the individual digit images would increase the system's performance but slow down the training of the ANN, as the number of ANN input layer units is equal to the number of pixels in each image; each pixel becomes an input node to the ANN. Currently, training the ANN takes around 10 minutes on a training set of 2990 examples of 20x20 pixel images.)
4. The 20-by-20 pixel images are converted to numeric 2D arrays with 20 rows and 20 columns. Each 20x20 2D array is flattened into a 400x1 vector; each such vector contains the information for one hand-written digit.
5. The dataset is built by stacking together all the 400x1 vectors, resulting in a 2D array with 400 columns and m rows, where m is the number of hand-written digits scanned by the LabVIEW VI.
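The flatten-and-stack steps above can be sketched in Python, with NumPy standing in for the LabVIEW array operations (function and variable names are illustrative):

```python
import numpy as np

def build_dataset(digit_images):
    """digit_images: list of m arrays, each 20x20 (one digit per array).
    Returns the dataset as an (m, 400) array, one flattened digit per row."""
    rows = [img.reshape(400) for img in digit_images]  # 20*20 -> 400x1 vector
    return np.vstack(rows)                             # stack into (m, 400)

# Example with two dummy "digits": an all-black and an all-white image.
imgs = [np.zeros((20, 20)), np.ones((20, 20))]
X = build_dataset(imgs)  # X.shape == (2, 400)
```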
Figure 3: Detecting the digits in the image and resampling each one into a 20x20 pixel image
3. Training the ANN
The dataset is then used to train the artificial neural network. This is done in MATLAB, as the program that trains the ANN was inherited from the fourth programming exercise of the aforementioned Machine Learning course, and time constraints prevented porting it to a MathScript node. The outputs of this script are two text files containing the Theta1 and Theta2 matrices, which hold the weights, or parameters, of the ANN.
The ANN’s structure consists of an input layer with 400 units (corresponding to the 20x20 resolution of the input images), an output layer with 10 units (corresponding to the ten possible digits, zero through to nine) and a single hidden layer of 25 units.
Figure 4: Structure of the artificial neural network
4. Test system performance
The text files containing the Theta1 and Theta2 matrices are read by LabVIEW, and a MathScript node uses the weights matrices to generate the ANN's predictions when it is shown test set images containing hand-written digits. The predictions, alongside the ANN's confidence level in each, are overlaid on the front panel display for easy visualization of the system's ability to recognize digits.
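What the MathScript node computes can be sketched in Python as a standard feed-forward pass (assuming the 25x401 and 10x26 weight shapes implied by the 400-25-10 structure with bias units, as in the Coursera exercise; how the ten output units map to the digits 0-9 depends on how the labels were encoded during training):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(Theta1, Theta2, X):
    """Feed-forward pass for a 400-25-10 network.
    Theta1: (25, 401), Theta2: (10, 26), X: (m, 400).
    Returns the winning output unit and its confidence (%) for each row."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])   # add bias unit to input layer
    a2 = sigmoid(a1 @ Theta1.T)            # hidden layer activations, (m, 25)
    a2 = np.hstack([np.ones((m, 1)), a2])  # add bias unit to hidden layer
    a3 = sigmoid(a2 @ Theta2.T)            # output layer activations, (m, 10)
    pred = a3.argmax(axis=1)               # index of the most active output unit
    conf = a3.max(axis=1) * 100            # its activation as a confidence in %
    return pred, conf
```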
Figure 5: MathScript node that creates the predictions based on the ANN weights matrices Theta1 and Theta2
The ANN has been shown to correctly identify 85% of the hand-written digits in the test set. The ANN seems to suffer from high bias, also known as under-fitting the data. Therefore, increasing the number of hidden units from the current 25 should help reduce the bias and increase the accuracy; however, this was left as a future expansion.
In-depth video description:
Possibilities for improvement and expansion:
a) Increase the number of hidden layer units, from 25 currently
b) Increase the number of input layer units, from 400 currently
c) Expand to hand-written letter recognition