Sunday, June 19, 2016

Neural Net to recognize words


Experiment

I wrote a neural net to recognize words. The input is 26 nodes per letter position, for words up to 25 letters long, giving 650 input nodes in total. The output is a one-hot array with one entry for each word. I trained in batches of 1000 and tried it with different word lists. The first had 9 words, and the NN was 100 percent right. The next had 1015 words, and the neural net was almost 100 percent wrong. When I analyzed the failures, it turned out the net got only 15 words right, and they were exactly the last 15 words in the list. Then I tried 999 words, and the neural net was 100 percent right again.
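
To make the encoding concrete, here is a minimal standalone sketch (not part of the program below) of how a word maps to the input nodes that get set to 1:

# Each letter position owns a block of 26 inputs, so a word turns on
# one input per letter. For "cat": 'c' at position 0 -> index 2,
# 'a' at position 1 -> 26 + 0 = 26, 't' at position 2 -> 2*26 + 19 = 71.
def active_indices(word, number_of_letters=26):
  return [i * number_of_letters + ord(ch) - ord('a')
          for i, ch in enumerate(word)]

print(active_indices("cat"))  # prints [2, 26, 71]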

Lesson

I have learned that the structure of the training is very important. The code I wrote did the training in batches of 1000 and finished up with whatever was left over. For the 1015-word case that meant the last 15 words landed in a tiny batch of their own, and since each batch gets only one gradient step, that final update apparently overwrote what the first batch had taught the network. Next I will explore how the training structure affects the results.
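
As a hypothetical starting point for that exploration, here is a sketch of a batching scheme that shuffles the words and makes several passes, so no fixed group of words is always trained last. The names x_all and y_all stand for the full input and label matrices and are not variables from the program below:

import numpy as np

def shuffled_batches(x_all, y_all, batch_size, epochs, seed=0):
  # Reshuffle every epoch so the leftover partial batch is a
  # different set of words on each pass
  rng = np.random.RandomState(seed)
  n = x_all.shape[0]
  for _ in range(epochs):
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
      idx = order[start:start + batch_size]
      yield x_all[idx], y_all[idx]

# Usage would look something like:
# for x_batch, y_batch in shuffled_batches(x_all, y_all, 1000, 50):
#   sess.run(train_step, feed_dict={x: x_batch, y_: y_batch})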

Source Code

# Usage: <words file> [details]
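# For example (script name assumed): python word_nn.py words.txt details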

import sys

import tensorflow as tf
import numpy as np


"""
For each position there is a set of 26 letters. Output is one hot vector of words
"""


words_file = "words.txt"
if len(sys.argv) > 1:
  words_file = sys.argv[1]
show_details = (len(sys.argv) > 2)


with open(words_file, "r") as words_txt:
  words = words_txt.read().split('\n')
words.pop() # the file ends in a newline, so the last entry is empty


number_of_positions = 25
number_of_letters = 26
number_of_inputs = number_of_positions * number_of_letters
number_of_words = len(words)
number_of_outputs = len(words)


# Placeholders for a batch of encoded words (x) and one-hot labels (y_)
x = tf.placeholder(tf.float32, shape=[None, number_of_inputs])
y_ = tf.placeholder(tf.float32, shape=[None, number_of_outputs])

# A single softmax layer: weights, bias, prediction, and cross-entropy loss
W = tf.Variable(tf.zeros([number_of_inputs, number_of_outputs]))
b = tf.Variable(tf.zeros([number_of_outputs]))

y = tf.nn.softmax(tf.matmul(x, W) + b)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())

batch_size = 1000
# Buffers for one batch of inputs and their one-hot labels
x_array = np.zeros(shape=(min(number_of_words, batch_size), number_of_inputs), dtype=float)
y_array = np.zeros(shape=(min(number_of_words, batch_size), number_of_outputs), dtype=float)


def word_to_train(word, inputs):
  # Each letter position owns a block of 26 inputs; set the one
  # matching this letter (assumes lowercase a-z words)
  for index, ch in enumerate(word):
    inputs[index*number_of_letters + ord(ch) - ord('a')] = 1


index = 0
# Train in batches: fill the buffers, then take one gradient step per batch
for word_number, word in enumerate(words):
  word_to_train(word, x_array[index])
  y_array[index][word_number] = 1
  index += 1
  if index == batch_size or word_number+1 == number_of_words:
    print "Doing batch: %d\n" % word_number
    index = 0
    sess.run(train_step, feed_dict={x:x_array, y_: y_array})
    to_do = number_of_words - word_number - 1
    x_array = np.zeros(shape=(min(batch_size, to_do), number_of_inputs), dtype=float)
    y_array = np.zeros(shape=(min(batch_size, to_do), number_of_outputs), dtype=float)
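
# Note: each batch gets exactly one gradient step, so whichever batch
# runs last makes the final update to the weights (see Lesson above)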


# Evaluation: run every word through the net and check that the
# highest-scoring output is that word's own entry
n_right = 0

prediction_op = tf.argmax(y, 1) # index of the highest-scoring word
for word_number, word in enumerate(words):
  x_array = np.zeros(shape=(1, number_of_inputs), dtype=float)
  word_to_train(word, x_array[0])
  prediction = sess.run(prediction_op, feed_dict={x: x_array})
  is_right = (prediction[0] == word_number)
  if show_details:
    if is_right:
      print "Right %s" % words[word_number]
    else:
      print "Wrong %s found %s" % (words[word_number], words[prediction[0]])
  if is_right:
    n_right += 1


print "Accuracy: %d / %d" % (n_right, number_of_words)