The type of weights in the tflow model is not compatible with that in the normal NN model

Issue #138 new
Yin-Jia Zhang created an issue

In the tflow model, if you set the weights to the ones extracted from a normal NN model, an error is raised at the following location in the code. It is caused by a data-type difference: weights[element][1] from the normal NN model is a list, while the tflow model requires a numpy array here.

    if weights is not None:
        self.elementFingerprintLengths = {}
        self.elements = weights.keys()
        for element in self.elements:
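            # Fails when weights[element][1] is a plain Python list:
            # lists have no .shape attribute; a numpy array is required.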
            self.elementFingerprintLengths[element] =\
                weights[element][1].shape[0] - 1
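
A possible workaround on the caller's side is to convert each layer's weights to a numpy array before handing the dictionary to the tflow model. A minimal sketch, assuming (from the indexing above) that the dictionary maps element symbols to per-layer matrices keyed by layer number; nn_weights is a hypothetical name for the weights taken from the normal NN model:

    import numpy as np

    # nn_weights: {element: {layer_number: weight_matrix_as_list, ...}, ...}
    # (structure assumed from the indexing weights[element][1] above)
    tf_weights = {element: {layer: np.asarray(matrix)
                            for layer, matrix in layers.items()}
                  for element, layers in nn_weights.items()}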

Also, in the parameter descriptions, the keyword 'weights' is described as a numpy array. It should be a dictionary. ;)

Comments (6)

  1. andrew_peterson repo owner

    I think there is one other slight difference in how the network is specified that makes the two not quite identical. (The difference is accidental, I think; we could make them identical.) If you specify hidden layers as (20, 20), for example, in one of the implementations we get a neural network like

    Fingerprint -> 20 -> 20 -> 1 -> Energy

    And in the other:

    Fingerprint -> 20 -> 20 -> Energy

    That is, I think one uses an extra parameter or two at the end to scale the neural-net output up to an energy, and the other does this directly with the neural-network parameters. I may not be remembering this right. @akhorshi or @zachary_ulissi would know. But my point is that we may need to reconcile how the networks are structured before we can transfer parameters between them.
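
    To make the bookkeeping concrete, here is an illustrative sketch (not Amp code, and just one way to read the difference) counting parameters for hidden layers (20, 20):

        def n_parameters(layer_sizes):
            # Each fully connected layer has (n_in + 1) * n_out parameters:
            # a weight per input plus one bias per output node.
            return sum((n_in + 1) * n_out
                       for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

        F = 20  # hypothetical fingerprint length
        direct = n_parameters((F, 20, 20, 1))  # Fingerprint -> 20 -> 20 -> Energy:
                                               # the last linear combination is the energy
        scaled = direct + 2                    # Fingerprint -> 20 -> 20 -> 1 -> Energy:
                                               # plus a slope/intercept pair scaling the
                                               # single output node up to an energy
        print(direct, scaled)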

  2. Zachary Ulissi

    Yeah, it's whether the output is just a linear combination of the output-layer nodes or whether it goes through a final activation function. I think the tflow one is more general, but I don't think it really matters.

    On that note, check the tflow force/energy test case, which shows how to set weights directly to make things identical between the two modules.
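
    To illustrate the distinction (a minimal sketch, not Amp code; tanh is just a stand-in for whatever activation function is configured):

        import numpy as np

        hidden = np.tanh(np.random.rand(20))  # activations of the last hidden layer
        w = np.random.rand(20)                # output-layer weights
        b = 0.1                               # output-layer bias

        energy_linear = hidden @ w + b              # plain linear combination
        energy_activated = np.tanh(hidden @ w + b)  # with a final activation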

  3. Yin-Jia Zhang reporter

    Thanks a lot for the information! I just want to test how the same weights and the same training set behave in the two models. It would be nice if we could make such a comparison in the future.

  4. Zachary Ulissi

    Same weights and same activation functions should yield identical results between the two models (that's basically the energy/force test case, and it currently passes). If they're not identical, there's a problem.

  5. andrew_peterson repo owner

    Yin-Jia was having convergence issues with the tf version. If I understand correctly, with the regular neural network it was slowly converging, while with the tf version it was quickly not converging, using the same training data and the same hidden-layers specification. So I am guessing she wants to try both from the same guess of initial NN parameters to figure out what's going on... e.g., whether one is just making a better initial guess than the other.

    Without modifying the current version, is there a way to make the NN architectures identical? E.g., could we specify (20, 20) in one and (20, 20, 1) in the other, as in the sketch below?
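
    A sketch of that idea (untested; the import paths, class names, and the hiddenlayers keyword are from memory and should be checked against the current modules):

        # All names below are assumptions, not verified against the code:
        from amp.model.neuralnetwork import NeuralNetwork
        from amp.model.tflow import tfAmpNN

        regular = NeuralNetwork(hiddenlayers=(20, 20))
        tflow_equiv = tfAmpNN(hiddenlayers=(20, 20, 1))  # extra single-node layer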

  6. Yin-Jia Zhang reporter

    Thanks, Andy, for the explanation. That's what I am struggling with. Is there a good way to make the comparison?
