I think using grammars to parse language is a dead end. I think language should instead be parsed using a generalization of the models we use to parse arithmetic expressions. Here is an example for the sentence "The man bought a car", using these operator definitions:
the - prefix priority(400)
man - constant
bought - infix priority(200)
a - prefix priority(400)
car - constant
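Here is a minimal sketch of how these definitions could drive a parse. It assumes that a higher priority binds first and that an operator only fires once the words it needs beside it are already evaluated. Both are assumptions of this sketch, and the names `OPERATORS` and `parse` are made up for illustration.

```python
# Operator table from the definitions above: word -> (kind, priority).
# Any word that is not listed is treated as a constant.
OPERATORS = {
    "the": ("prefix", 400),
    "a": ("prefix", 400),
    "bought": ("infix", 200),
}


def parse(sentence):
    # Each item is [tree, op]; op is None once the item is fully evaluated.
    items = [[w, OPERATORS.get(w)] for w in sentence.lower().split()]

    def is_value(i):
        return 0 <= i < len(items) and items[i][1] is None

    def ready(i):
        # A prefix operator needs a value on its right, an infix on both sides.
        kind = items[i][1][0]
        return is_value(i + 1) and (kind == "prefix" or is_value(i - 1))

    while True:
        candidates = [i for i, (_, op) in enumerate(items) if op and ready(i)]
        if not candidates:
            return [tree for tree, _ in items]
        # Assumption: highest priority fires first, leftmost wins ties.
        i = max(candidates, key=lambda k: items[k][1][1])
        lo = i if items[i][1][0] == "prefix" else i - 1
        tree = tuple(t for t, _ in items[lo:i + 2])
        items[lo:i + 2] = [[tree, None]]
```

With this sketch, `parse("The man bought a car")` returns `[(("the", "man"), "bought", ("a", "car"))]`: the two determiners are applied first, then the verb.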
The sentence can then be parsed by applying the operators in the normal way. An interesting generalization is to allow an operator to evaluate to another operator. For example, prepositions evaluate to postfix operators. Consider the sentence "The person from France bought a car".
from - prefix priority(300) -> evaluates to a postfix operator priority(100)
The sentence can now be parsed, with "from France" applied as a postfix operator to "The person".
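A sketch of an operator that evaluates to another operator could look like this. It assumes that the highest-priority operator fires first, and that an operator waits until the items it needs beside it are plain values. That waiting rule is what lets the priority 100 postfix operator attach to "the person" before "bought" fires. The third field in each `OPERATORS` entry is a hypothetical way to record what the operator evaluates to.

```python
# word -> (kind, priority, operator it evaluates to, or None for a plain value)
OPERATORS = {
    "the": ("prefix", 400, None),
    "a": ("prefix", 400, None),
    "bought": ("infix", 200, None),
    "from": ("prefix", 300, ("postfix", 100, None)),
}


def parse(sentence):
    # Each item is [tree, op]; op is None once the item is a plain value.
    items = [[w, OPERATORS.get(w)] for w in sentence.lower().split()]

    def is_value(i):
        return 0 <= i < len(items) and items[i][1] is None

    def ready(i):
        kind = items[i][1][0]
        needs_left = kind in ("infix", "postfix")
        needs_right = kind in ("infix", "prefix")
        return ((not needs_left or is_value(i - 1))
                and (not needs_right or is_value(i + 1)))

    while True:
        candidates = [i for i, (_, op) in enumerate(items) if op and ready(i)]
        if not candidates:
            return [tree for tree, _ in items]
        # Assumption: highest priority fires first, leftmost wins ties.
        i = max(candidates, key=lambda k: items[k][1][1])
        kind, _, result = items[i][1]
        lo = i - 1 if kind in ("infix", "postfix") else i
        hi = i + 1 if kind in ("infix", "prefix") else i
        tree = tuple(t for t, _ in items[lo:hi + 1])
        # The result may itself be an operator, as with "from France".
        items[lo:hi + 1] = [[tree, result]]
```

Here `parse("The person from France bought a car")` returns `[((("the", "person"), ("from", "france")), "bought", ("a", "car"))]`, with "from France" grouped under the subject.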
Another generalization is to regard verbs as infix operators that have named arguments. Consider the sentence "The man bought a car on Saturday". Update our definition of "bought":
bought - infix priority(200) named args: on
on - see from
The sentence can now be parsed with the "on Saturday" phrase being part of the evaluation of the verb "bought". Interestingly, for the sentence "The man on the boat bought a car", with the same definitions, "on the boat" is evaluated as a postfix operator applied to "the man". No extra work.
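One possible reading of named arguments, as a sketch: when a postfix phrase lands on an already evaluated verb clause and the verb lists that preposition as a named argument, the phrase is folded into the clause instead of wrapping it. The `NAMED_ARGS` table and the `attach` helper are hypothetical names, and the firing rule (highest priority first, an operator waits for plain values beside it) is an assumption of the sketch.

```python
# word -> (kind, priority, operator it evaluates to, or None for a plain value)
OPERATORS = {
    "the": ("prefix", 400, None),
    "a": ("prefix", 400, None),
    "bought": ("infix", 200, None),
    "on": ("prefix", 300, ("postfix", 100, None)),  # "on - see from"
}
NAMED_ARGS = {"bought": {"on"}}


def attach(target, modifier):
    # target is the tree on the left; modifier looks like ("on", "saturday").
    is_clause = isinstance(target, tuple) and len(target) >= 3
    verb = target[1] if is_clause else None
    if modifier[0] in NAMED_ARGS.get(verb, ()):
        return target + (modifier,)  # becomes a named argument of the verb
    return (target, modifier)  # ordinary postfix modifier


def parse(sentence):
    # Each item is [tree, op]; op is None once the item is a plain value.
    items = [[w, OPERATORS.get(w)] for w in sentence.lower().split()]

    def is_value(i):
        return 0 <= i < len(items) and items[i][1] is None

    def ready(i):
        kind = items[i][1][0]
        needs_left = kind in ("infix", "postfix")
        needs_right = kind in ("infix", "prefix")
        return ((not needs_left or is_value(i - 1))
                and (not needs_right or is_value(i + 1)))

    while True:
        candidates = [i for i, (_, op) in enumerate(items) if op and ready(i)]
        if not candidates:
            return [tree for tree, _ in items]
        # Assumption: highest priority fires first, leftmost wins ties.
        i = max(candidates, key=lambda k: items[k][1][1])
        kind, _, result = items[i][1]
        lo = i - 1 if kind in ("infix", "postfix") else i
        hi = i + 1 if kind in ("infix", "prefix") else i
        trees = [t for t, _ in items[lo:hi + 1]]
        tree = attach(*trees) if kind == "postfix" else tuple(trees)
        items[lo:hi + 1] = [[tree, result]]
```

With the same table, "on Saturday" ends up inside the "bought" clause in the first sentence, while "on the boat" ends up attached to "the man" in the second, because "bought" cannot fire until the phrase on its left has been applied.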
What about conjunctions? Consider the sentence "The man and woman bought and sold a car and went on a trip". We need two generalizations to handle this. The first is to make a verb into a prefix operator that evaluates to a postfix operator. The second is to make a conjunction into a list terminator whose priority is not a number; instead, it evaluates as soon as the term before the operator and the term after it have the same type. The result of the evaluation has that same type. The parsing would look like this:
Start:
"the man and woman bought and sold a car and went on a trip"
The types before and after the "and" are the same.
"the (man and woman) (bought and sold) a car and went on a trip"
Apply determiners
"(the (man and woman)) (bought and sold) (a car) and went on (a trip)"
Prefix prepositions
"(the (man and woman)) (bought and sold) (a car) and went (on (a trip))"
Prefix verbs
"(the (man and woman)) ((bought and sold) (a car)) and (went (on (a trip)))"
Apply conjunctions because types match (both are postfix operators of the same priority)
"(the (man and woman)) (((bought and sold) (a car)) and (went (on (a trip))))"
Apply postfix verbs
"((the (man and woman)) (((bought and sold) (a car)) and (went (on (a trip)))))"
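The type-matching rule for conjunctions can be sketched in isolation. This covers only the first step of the walkthrough above, joining neighbours whose types already match. A full parser would have to re-check the rule as other operators evaluate and types change. The `TYPES` table and the function name are assumptions made for illustration.

```python
# Hypothetical word types; only what the example sentence needs.
TYPES = {
    "the": "determiner", "a": "determiner",
    "man": "noun", "woman": "noun", "car": "noun", "trip": "noun",
    "bought": "verb", "sold": "verb", "went": "verb",
    "on": "preposition",
}


def join_conjunctions(sentence):
    # Each item is (tree, type); "and" itself has no type.
    items = [(w, TYPES.get(w)) for w in sentence.lower().split()]
    i = 1
    while i < len(items) - 1:
        (left, left_type), (word, _), (right, right_type) = items[i - 1:i + 2]
        if word == "and" and left_type is not None and left_type == right_type:
            # The joined term keeps the shared type, so it can be joined again.
            items[i - 1:i + 2] = [((left, "and", right), left_type)]
        else:
            i += 1
    return [tree for tree, _ in items]
```

On the example sentence this groups "man and woman" and "bought and sold", and leaves the third "and" alone because "car" and "went" have different types at that point.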
This also has a beautiful noise-handling feature that is difficult to obtain with grammar-based parsers. Imagine this as input: "blah blah xhsh the man bought a car tree pick the woman bought a boat aa ashsh1334h". Parsing this would recognize "the man bought a car" and "the woman bought a boat" as phrases, in addition to the other single-word parses. The noise is ignored naturally by the parser. Sweet!
Let's take my neural net expression parser and apply that to this problem. The code is at https://github.com/cooledge/nn/blob/master/expressions/expressions2.py
Here is sample output that converts sentences to JSON.
Enter an sentence if you dare: h sadhfashdf asd joe bought a car shdhf
Input Expression: ['h', 'sadhfashdf', 'asd', 'joe', 'bought', 'a', 'car', 'shdhf']
Output Expression: {'buyer': 'joe', 'action': 'buy', 'thing': {'determiner': 'a', 'thing': 'car'}}
Enter an sentence if you dare: h sadhfashdf asd joe bought a car shdhf sally bought a jeep
Input Expression: ['h', 'sadhfashdf', 'asd', 'joe', 'bought', 'a', 'car', 'shdhf', 'sally', 'bought', 'a', 'jeep']
Output Expression: {'buyer': 'joe', 'action': 'buy', 'thing': {'determiner': 'a', 'thing': 'car'}}
Output Expression: {'buyer': 'sally', 'action': 'buy', 'thing': {'determiner': 'a', 'thing': 'jeep'}}
Enter an sentence if you dare: h sadhfashdf asd joe bought a car shdhf sally bought a jeep adsafhsdhdhd ddd move the tank to france
Input Expression: ['h', 'sadhfashdf', 'asd', 'joe', 'bought', 'a', 'car', 'shdhf', 'sally', 'bought', 'a', 'jeep', 'adsafhsdhdhd', 'ddd', 'move', 'the', 'tank', 'to', 'france']
Output Expression: {'buyer': 'joe', 'action': 'buy', 'thing': {'determiner': 'a', 'thing': 'car'}}
Output Expression: {'buyer': 'sally', 'action': 'buy', 'thing': {'determiner': 'a', 'thing': 'jeep'}}
Output Expression: {'to': 'france', 'action': 'move', 'thing': {'determiner': 'the', 'thing': 'tank'}}