GitHub - HadokenCode/ocr: Neural network OCR.
Trains a multi-layer perceptron (MLP) neural network to perform optical character recognition (OCR).
The training set is automatically generated using a heavily modified version of the captcha-generator node-captcha. Support for the MNIST handwritten digit database has been added recently (see performance section).
The network takes a one-dimensional binary array (default 20 * 20 = 400-bit) as input and outputs a 10-element array of probabilities, which can be converted into a character code. Initial performance measurements show promising success rates.
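To make the input and output shapes concrete, here is a minimal sketch of a forward pass through a perceptron with one hidden layer. It is not the code this repository generates: the sigmoid activation, the weight layout (bias stored as the last element of each row) and the names `forward`, `layer`, `hiddenWeights` and `outputWeights` are all assumptions chosen for illustration.

```javascript
// Sketch of an MLP forward pass (hypothetical, not the generated ./ocr.js).
// Assumes sigmoid activations and one weight row per neuron, with the bias
// stored as the last element of each row.

function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Computes the activations of one fully connected layer.
function layer(inputs, weights) {
  return weights.map(function (row) {
    // row holds inputs.length weights followed by one bias term
    let sum = row[inputs.length];
    for (let i = 0; i < inputs.length; i++) {
      sum += row[i] * inputs[i];
    }
    return sigmoid(sum);
  });
}

// Runs input -> hidden -> output and returns one score per class.
function forward(input, hiddenWeights, outputWeights) {
  return layer(layer(input, hiddenWeights), outputWeights);
}

// Builds a rows x cols matrix filled with zeros.
function zeros(rows, cols) {
  return Array.from({ length: rows }, function () {
    return new Array(cols).fill(0);
  });
}

// Usage with all-zero weights: every output is sigmoid(0) = 0.5.
const input = new Array(400).fill(0);      // 20 * 20 binary pixels
const hiddenWeights = zeros(40, 400 + 1);  // 40 hidden neurons
const outputWeights = zeros(10, 40 + 1);   // 10 output classes
const scores = forward(input, hiddenWeights, outputWeights);
console.log(scores.length); // 10
```

A trained network would replace the zero matrices with learned weights; the shapes stay the same.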
After training, the network is saved as a standalone module to ./ocr.js, which can then be used in your project like this (from test.js):

    var predict = require('./ocr.js');

    // a binary array that we want to predict
    var one = [
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    ];

    // the prediction is an array of probabilities
    var prediction = predict(one);

    // the index with the maximum probability is the best guess
    console.log('prediction:', prediction.indexOf(Math.max.apply(null, prediction)));
    // will hopefully output 1 if trained with 0-9 :)

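The example above reports the index of the largest probability. When the network is trained on a custom character set, that index still has to be mapped back to a character. The sketch below shows one way to do it. The helper `decode` is hypothetical, and it assumes that output index i corresponds to the i-th character of the `text` string used for training, which this README does not state explicitly.

```javascript
// Hypothetical helper: maps a prediction array to a character.
// Assumes output index i corresponds to the i-th character of `text`.
function decode(prediction, text) {
  if (prediction.length !== text.length) {
    throw new Error('expected ' + text.length + ' outputs, got ' + prediction.length);
  }
  // Find the index of the largest score.
  let best = 0;
  for (let i = 1; i < prediction.length; i++) {
    if (prediction[i] > prediction[best]) best = i;
  }
  return { character: text.charAt(best), confidence: prediction[best] };
}

// Usage with a made-up three-class prediction.
const result = decode([0.02, 0.91, 0.07], 'abc');
console.log(result.character, result.confidence);
```

Returning the winning score alongside the character lets the caller reject low-confidence guesses instead of always accepting the maximum.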
Clone this repository. The script uses canvas, so you'll need to install the Cairo rendering engine first. On OS X, assuming you have Homebrew installed, use the install command listed in the canvas README.
All runs below were performed on a MacBook Pro (Retina, 13-inch, Early 2015) with 8 GB RAM.
To test with the MNIST dataset: click on the title above, download the 4 data files and put them in a folder called mnist in the root directory of this repository.
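The MNIST image files are stored in the IDX binary format: a 16-byte header of four big-endian 32-bit integers (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. The sketch below shows how such a file could be turned into the binary arrays the network expects. It is not this repository's loader: the function name `parseIdxImages` and the binarization cutoff are assumptions, and the buffer is assumed to hold an already decompressed file.

```javascript
// Sketch of an IDX image parser (not this repository's loader).
// Header: four big-endian 32-bit integers (magic 2051, count, rows, columns),
// followed by count * rows * columns unsigned bytes.
function parseIdxImages(buffer, cutoff) {
  const magic = buffer.readUInt32BE(0);
  if (magic !== 2051) {
    throw new Error('not an IDX image file, magic number was ' + magic);
  }
  const count = buffer.readUInt32BE(4);
  const rows = buffer.readUInt32BE(8);
  const cols = buffer.readUInt32BE(12);
  const size = rows * cols;
  const images = [];
  for (let n = 0; n < count; n++) {
    const start = 16 + n * size;
    const bits = new Array(size);
    for (let i = 0; i < size; i++) {
      // Assumed binarization: pixel values at or above the cutoff become 1.
      bits[i] = buffer[start + i] >= cutoff ? 1 : 0;
    }
    images.push(bits);
  }
  return { rows: rows, cols: cols, images: images };
}

// Usage with a synthetic buffer holding one 2 x 2 image.
const header = Buffer.alloc(16);
header.writeUInt32BE(2051, 0); // magic number for image files
header.writeUInt32BE(1, 4);    // image count
header.writeUInt32BE(2, 8);    // rows
header.writeUInt32BE(2, 12);   // columns
const sample = Buffer.concat([header, Buffer.from([0, 200, 255, 10])]);
const parsed = parseIdxImages(sample, 128);
console.log(parsed.images[0]); // [ 0, 1, 1, 0 ]
```

With a real file, the buffer would come from `fs.readFileSync` on one of the image files in the mnist folder.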

    // config.json
    {
      "mnist": true,
      "network": {
        "hidden": 160,
        "learning_rate": 0.03
      }
    }


    // config.json
    {
      "mnist": false,
      "text": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
      "fonts": [
        "sans-serif",
        "serif"
      ],
      "training_set": 2000,
      "testing_set": 1000,
      "image_size": 16,
      "threshold": 400,
      "network": {
        "hidden": 60,
        "learning_rate": 0.1,
        "output": 62
      }
    }


    // config.json
    {
      "mnist": false,
      "text": "abcdefghijklmnopqrstuvwxyz",
      "fonts": [
        "sans-serif",
        "serif"
      ],
      "training_set": 2000,
      "testing_set": 1000,
      "image_size": 16,
      "threshold": 400,
      "network": {
        "hidden": 40,
        "learning_rate": 0.1,
        "output": 26
      }
    }


    // config.json
    {
      "mnist": false,
      "text": "0123456789",
      "fonts": [
        "sans-serif",
        "serif"
      ],
      "training_set": 2000,
      "testing_set": 1000,
      "image_size": 16,
      "threshold": 400,
      "network": {
        "hidden": 40,
        "learning_rate": 0.1
      }
    }

Tweak the network for your needs by editing the config.json file located in the main folder. Pasted below is the default config file.

    // config.json
    {
      "mnist": false,
      "text": "0123456789",
      "fonts": [
        "sans-serif",
        "serif"
      ],
      "training_set": 2000,
      "testing_set": 1000,
      "image_size": 16,
      "threshold": 400,
      "network": {
        "hidden": 40,
        "learning_rate": 0.1
      }
    }

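When editing the character set, it is easy to leave `text` and `network.output` out of step. The sketch below is a small sanity check that could be run before training. It is not part of this repository: the function `checkConfig` is hypothetical, and it rests on the assumption that the network needs one output per distinct character in `text`.

```javascript
// Hypothetical sanity check for a config object (not part of this repository).
// Assumes one output neuron per distinct character in `text`.
function checkConfig(config) {
  const problems = [];
  const text = config.text || '';
  const distinct = new Set(text.split('')).size;
  if (distinct !== text.length) {
    problems.push('text contains repeated characters');
  }
  const output = config.network.output;
  // Only compare when an explicit output count is given and MNIST is off.
  if (!config.mnist && output !== undefined && output !== distinct) {
    problems.push('network.output is ' + output + ' but text has ' +
      distinct + ' distinct characters');
  }
  if (!(config.network.learning_rate > 0)) {
    problems.push('learning_rate should be a positive number');
  }
  return problems;
}

// Usage with the relevant fields of the default config shown above.
const config = {
  mnist: false,
  text: '0123456789',
  network: { hidden: 40, learning_rate: 0.1 }
};
console.log(checkConfig(config)); // []
```

An empty array means no problems were found; each string in a non-empty result describes one mismatch.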