Sunday, June 19, 2016
Neural Net to recognize words
Experiment
I wrote a neural net to recognize words. The input has 26 nodes (one per letter) for each of up to 25 letter positions, for a total of 650 input nodes. The output is a one-hot array with one entry for each word. I trained in batches of 1000 and tried different word lists. The first had 9 words, and the NN was 100 percent right. The next had 1015 words, and the neural net was almost 100 percent wrong. When I analyzed this, it turned out that it only got 15 words right, and they were the last 15 words. Then I tried 999 words, and the neural net was 100 percent right again.
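To make the input layout concrete, here is a small sketch of the encoding described above. The helper name `encode_word` is made up for this illustration, and it assumes lowercase a-z words of at most 25 letters.

```python
import numpy as np

NUMBER_OF_POSITIONS = 25   # longest word the input can hold
NUMBER_OF_LETTERS = 26     # a-z

def encode_word(word):
    """Return a 650-element vector with one 1 per letter of the word.

    Hypothetical helper; assumes lowercase a-z and len(word) <= 25.
    """
    vec = np.zeros(NUMBER_OF_POSITIONS * NUMBER_OF_LETTERS)
    for position, ch in enumerate(word):
        # Each position owns a block of 26 slots; the letter picks the slot.
        vec[position * NUMBER_OF_LETTERS + ord(ch) - ord("a")] = 1.0
    return vec

vec = encode_word("cab")
print(vec.shape, np.flatnonzero(vec))
```

For "cab" the set slots are 2 (c in position 0), 26 (a in position 1) and 53 (b in position 2); every other slot stays 0.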
Lesson
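I have learned that the structure of the training is very important. The code I wrote did the training in batches of 1000 and finished up with whatever was left over. This meant that in the 1015-word case the last 15 words were trained in a batch of their own. Each batch gets a single gradient step and the loss is averaged over the batch, so each of those 15 words probably pulled on the weights far harder than any word in the 1000-word batch, which would explain why only they were recognized. Next I will explore how the training affects the results.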
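The batch arithmetic for the three word lists can be sketched as follows. The function `batch_sizes` is a hypothetical helper written for this note, not part of the program. The "share" is simply 1 / batch size, which is how much one word counts when the loss is a mean over the batch.

```python
def batch_sizes(number_of_words, batch_size=1000):
    """Sizes of the batches one pass over the list produces, leftover last."""
    full, leftover = divmod(number_of_words, batch_size)
    return [batch_size] * full + ([leftover] if leftover else [])

for n in (9, 999, 1015):
    sizes = batch_sizes(n)
    # With a loss averaged over the batch, one word's weight in an update is 1/size.
    shares = [1.0 / size for size in sizes]
    print(n, sizes, shares)
```

Only the 1015-word list produces two batches, 1000 and 15, so it is the only case where words in different batches carry very different weight in their single update.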
Source Code
# Usage: <words file> [details]
# Uses the TensorFlow graph API (tf.placeholder, sessions).
"""
For each position there is a set of 26 letters. Output is a one-hot vector of words.
"""
import sys

import numpy as np
import tensorflow as tf
words_file = "words.txt"
if len(sys.argv) > 1:
    words_file = sys.argv[1]
show_details = (len(sys.argv) > 2)
# One word per line. The encoding below assumes lowercase a-z words of at
# most 25 letters. Blank lines (such as the trailing one) are skipped.
with open(words_file, "r") as words_txt:
    words = [line for line in words_txt.read().split('\n') if line]
number_of_positions = 25
number_of_letters = 26
number_of_inputs = number_of_positions * number_of_letters
number_of_words = len(words)
number_of_outputs = len(words)
x = tf.placeholder(tf.float32, shape=[None, number_of_inputs])
y_ = tf.placeholder(tf.float32, shape=[None, number_of_outputs])
W = tf.Variable(tf.zeros([number_of_inputs, number_of_outputs]))
b = tf.Variable(tf.zeros([number_of_outputs]))
y = tf.nn.softmax(tf.matmul(x,W) + b)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
batch_size = 1000
x_array = np.zeros(shape=(min(number_of_words, batch_size), number_of_inputs), dtype=float)
y_array = np.zeros(shape=(min(number_of_words, batch_size), number_of_outputs), dtype=float)
def word_to_train(word, inputs):
    # Set one input per letter: block `index` of 26 slots, slot chosen by the letter.
    for index, ch in enumerate(word):
        inputs[index*number_of_letters + ord(ch) - ord('a')] = 1
index = 0
for word_number, word in enumerate(words):
    word_to_train(word, x_array[index])
    y_array[index][word_number] = 1
    index += 1
    # Train once the batch is full, or on whatever is left over at the end.
    # Each batch gets exactly one gradient step.
    if index == batch_size or word_number+1 == number_of_words:
        print("Doing batch: %d\n" % word_number)
        index = 0
        sess.run(train_step, feed_dict={x:x_array, y_: y_array})
        to_do = number_of_words - word_number - 1
        x_array = np.zeros(shape=(min(batch_size, to_do), number_of_inputs), dtype=float)
        y_array = np.zeros(shape=(min(batch_size, to_do), number_of_outputs), dtype=float)
x_array = None
y_array = None
n_right = 0
# Build the argmax op once instead of adding a new node to the graph per word.
prediction_op = tf.argmax(y,1)
for word_number, word in enumerate(words):
    x_array = np.zeros(shape=(1, number_of_inputs), dtype=float)
    word_to_train(word, x_array[0])
    prediction = sess.run(prediction_op, feed_dict={x: x_array})
    is_right = (prediction[0] == word_number)
    if show_details:
        if is_right:
            print("Right %s" % words[word_number])
        else:
            print("Wrong %s found %s" % (words[word_number], words[prediction[0]]))
    if is_right:
        n_right += 1
print("Accuracy: %d / %d" % (n_right, number_of_words))
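For experimenting with the batching without TensorFlow, the same kind of model (a single softmax layer, zero-initialized, one gradient step per batch at learning rate 0.5) can be sketched in plain NumPy. This is a rough re-creation under those assumptions, not the program above. The names `encode`, `softmax` and `single_pass_accuracy` are made up for the sketch, and the three-word list is only a placeholder.

```python
import numpy as np

N_POSITIONS, N_LETTERS = 25, 26

def encode(words):
    """One row per word, one 1 per (position, letter); assumes lowercase a-z."""
    x = np.zeros((len(words), N_POSITIONS * N_LETTERS))
    for row, word in enumerate(words):
        for pos, ch in enumerate(word):
            x[row, pos * N_LETTERS + ord(ch) - ord("a")] = 1.0
    return x

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def single_pass_accuracy(words, batch_size, learning_rate=0.5):
    """Train with one gradient step per batch, then score on the same words."""
    x = encode(words)
    y = np.eye(len(words))                    # one-hot target per word
    w = np.zeros((x.shape[1], len(words)))
    b = np.zeros(len(words))
    for start in range(0, len(words), batch_size):
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]
        # Gradient of the batch-mean cross-entropy with respect to the logits.
        grad_logits = (softmax(xb @ w + b) - yb) / len(xb)
        w -= learning_rate * (xb.T @ grad_logits)
        b -= learning_rate * grad_logits.sum(axis=0)
    predictions = np.argmax(x @ w + b, axis=1)
    return float(np.mean(predictions == np.arange(len(words))))

print(single_pass_accuracy(["apple", "banana", "cherry"], batch_size=1000))
```

Calling it with a word list slightly longer than `batch_size` is one way to check whether the leftover-batch effect shows up here as well.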