I am still trying to train this with minimal data. The hierarchy model trains on tuples of the form (child, parent): for example, a bat is a mammal, or a guppy is a fish. The neural net that maps to fly or not fly takes a one-hot vector of classes. I trained it on the exception nodes (bat, penguin, ostrich), their parents, and the root. That was sufficient to get the correct output when combined with the hierarchy model. So the neural net outputs that a whale cannot fly even though it was only trained that mammals cannot fly. The hierarchy model supplies the generalisation.
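To make the data setup concrete, here is a minimal symbolic sketch of the idea, not the neural net itself. It assumes the hierarchy is a plain list of (child, parent) tuples and that the fly labels exist only for the exception nodes, their parents, and the root. The names `HIERARCHY`, `FLY_LABELS`, `ancestors`, `can_fly` and `one_hot` are hypothetical, the label given to the root is my assumption, and the lookup of the most specific labelled ancestor is a stand-in for what the two trained models do together.

```python
# Hypothetical (child, parent) tuples; "vertibrate" is spelled as in the sample output.
HIERARCHY = [
    ("vertibrate", "animal"),
    ("mammal", "vertibrate"),
    ("bird", "vertibrate"),
    ("fish", "vertibrate"),
    ("bat", "mammal"),
    ("whale", "mammal"),
    ("penguin", "bird"),
    ("ostrich", "bird"),
    ("guppy", "fish"),
]

# Labels only for the exceptions, their parents, and the root.
# The root label (animal -> cannot fly) is an assumption.
FLY_LABELS = {
    "bat": True,
    "penguin": False,
    "ostrich": False,
    "mammal": False,
    "bird": True,
    "animal": False,
}

PARENT = dict(HIERARCHY)
CLASSES = sorted({name for pair in HIERARCHY for name in pair})


def one_hot(name):
    """Encode a class name as a one-hot list over CLASSES."""
    return [1 if cls == name else 0 for cls in CLASSES]


def ancestors(name):
    """Return name followed by its chain of parents up to the root."""
    chain = [name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def can_fly(name):
    """Use the label of the most specific labelled node on the path to the root."""
    for node in ancestors(name):
        if node in FLY_LABELS:
            return FLY_LABELS[node]
    raise KeyError(f"no labelled ancestor for {name!r}")


for query in ("whale", "penguin", "bird", "bat", "mammal"):
    print(query, "FLY" if can_fly(query) else "NOT FLY", ancestors(query))
```

In this sketch a whale has no label of its own, so the answer comes from mammal, which is the same kind of generalisation the hierarchy model provides for the net.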
Here is a TensorBoard graph of the model.
Here is sample output:
Enter type type: whale
predict: [[ 0.98915392 0.01084609]]
NOT FLY
mammal has prob 1.0
vertibrate has prob 1.0
animal has prob 1.0
Enter type type: penguin
predict: [[ 0.99651057 0.00348945]]
NOT FLY
bird has prob 1.0
penguin has prob 1.0
vertibrate has prob 1.0
animal has prob 1.0
Enter type type: bird
predict: [[ 0.01308843 0.98691154]]
FLY
bird has prob 1.0
vertibrate has prob 1.0
animal has prob 1.0
Enter type type: bat
predict: [[ 0.01074084 0.98925918]]
FLY
bat has prob 1.0
mammal has prob 1.0
vertibrate has prob 1.0
animal has prob 1.0
Enter type type: mammal
predict: [[ 0.98915392 0.01084609]]
NOT FLY
mammal has prob 1.0
vertibrate has prob 1.0
animal has prob 1.0
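For reading the `predict` lines: judging from the output above, index 0 appears to be the NOT FLY score and index 1 the FLY score. A small sketch of that decoding step, where `LABELS` and `decode` are hypothetical names and the index order is inferred from the sample output rather than taken from the model code:

```python
# Index order inferred from the sample output: [NOT FLY score, FLY score].
LABELS = ("NOT FLY", "FLY")


def decode(predict):
    """Return the label with the highest score in a 1 x 2 prediction."""
    row = predict[0]
    best = max(range(len(row)), key=lambda i: row[i])
    return LABELS[best]


print(decode([[0.98915392, 0.01084609]]))  # whale row from the output above
print(decode([[0.01308843, 0.98691154]]))  # bird row from the output above
```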
This is a quick drawing of how I am visualising the model
The result is a single neural net that entirely represents the classic "birds can fly but penguins cannot" problem.