Check initial weights / scalings

andrew_peterson reporter

According to Andrew Ng's Coursera course, a good choice for the random initial weights is to draw them from the uniform distribution [-eps, +eps], where eps is

eps = sqrt(6) / sqrt(L_in + L_out)

where L_in and L_out are the numbers of units in the two layers connected by the coefficient matrix Theta.
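A minimal sketch of this initialization in plain Python. The function names `init_epsilon` and `random_weights` are hypothetical, and the matrix layout (L_out rows by L_in + 1 columns, with column 0 holding the bias weights) is an assumption borrowed from the course's convention, not necessarily how any particular code stores its parameters. The `zero_bias` flag is also hypothetical; it only illustrates the two bias choices discussed in this thread.

```python
import math
import random


def init_epsilon(l_in, l_out):
    """Half-width of the uniform interval: sqrt(6) / sqrt(L_in + L_out)."""
    return math.sqrt(6.0) / math.sqrt(l_in + l_out)


def random_weights(l_in, l_out, seed=None, zero_bias=False):
    """Draw Theta (l_out rows x l_in + 1 columns) from U[-eps, +eps].

    Column 0 holds the bias weights (assumed layout). If zero_bias is
    True, that column is set to zero instead of being drawn randomly.
    """
    rng = random.Random(seed)
    eps = init_epsilon(l_in, l_out)
    theta = []
    for _ in range(l_out):
        row = [rng.uniform(-eps, eps) for _ in range(l_in + 1)]
        if zero_bias:
            row[0] = 0.0  # bias weight starts at exactly zero
        theta.append(row)
    return theta


# Example: a 10-unit layer feeding a 5-unit layer.
theta = random_weights(10, 5, seed=42)
print(round(init_epsilon(10, 5), 4))  # prints 0.6325
```

Because eps shrinks as the layers get wider, larger layers start with smaller weights, which keeps the initial activations from saturating.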

This could potentially help convergence. But we should probably wait until just after v0.5 to implement it, so that we can test it for a while before it enters a release (in case it makes convergence harder).

2017-02-17T14:15:11+00:00

Alireza Khorshidi

Looking at your commit 3bb542c, you made one more change besides changing the weight domain: I was setting the initial guess for the biases to zero here, but you reverted that.

Regarding both changes (the new domain for the initial weights and undoing the zero biases), I assume there was some reason behind your commit, so why shouldn't we do the same in v0.5? In any case, it will not be the end of the world: we can make a new tag 0.5.1 after the release if we come to a different conclusion. But I expect Andrew Ng's domain and non-zero biases are more likely to be our preference (as was the case for v0.4), so why not go with the more probable case in the v0.5 release?

Anyhow, if you conclude that we should make the same change as 3bb542c in the master branch, I have the commit ready; just let me know.

2017-02-23T16:24:30+00:00

andrew_peterson reporter

I want to wait on this -- we think it will work better, but we haven't tested it. So there's some chance that if we implemented this change we could be adding something that makes it work a lot worse! This is the same strategy as with changing the cutoff function; let's do both immediately after v0.5 so that they are available in master.

2017-02-23T17:40:55+00:00

andrew_peterson reporter

changed status to resolved

This is done.

2018-03-28T17:32:33+00:00
