Adding a method to calculate the RMSE of a validation set

After training an NNP on a set of training images, I need to find which checkpoint (training iteration) gives the lowest RMSE on a validation set, so that it can be chosen as the final model. I currently do this manually with the following code:

import json
import math
import os

import numpy as np
from amp import Amp  # import paths assumed from the Amp package layout
from amp.analysis import plot_parity_and_error

# cores, dblabel and validation_set_filename_output are defined elsewhere.
nn_iteration_calculation_frequency = 10  # evaluate every 10th checkpoint
energy_data_iter_num = []
energy_data_RMSE = []
RMSE_validation = []
num_checkpoints = len(os.listdir('amp-checkpoints'))
for nn_iter in range(0, num_checkpoints, nn_iteration_calculation_frequency):
    energy_data_iter_num.append(nn_iter)
    calc = Amp.load('amp-checkpoints/'+str(nn_iter)+'.amp', cores=cores, dblabel=dblabel)
    energy_data = plot_parity_and_error(calc, validation_set_filename_output, dblabel=dblabel, plot_forces=False, label_parity='parity-validation-'+str(nn_iter), label_error='error-validation-'+str(nn_iter), returndata=True)
    energy_data_RMSE.append(energy_data)
    # Rewrite the output files on every pass so partial results survive an interruption.
    with open('energy_data_RMSE.json', 'w') as fout:
        json.dump(energy_data_RMSE, fout)
    # Entry [3] of each image's record is used as that image's energy error.
    err_sq = 0
    for j in energy_data:
        err_sq += (energy_data[j][3])**2
    RMSE_validation.append(math.sqrt(err_sq / len(energy_data)))
    np.savetxt('RMSE-vs-iteration-validation.txt', np.column_stack((energy_data_iter_num, RMSE_validation)), header='Iteration, RMSE')
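
Once the loop has finished, picking the final checkpoint is a matter of finding the row with the smallest RMSE. A minimal sketch, assuming the two-column layout written by np.savetxt above; the helper name best_iteration is hypothetical and not part of Amp:

```python
import numpy as np


def best_iteration(iterations, rmses):
    """Return (iteration, rmse) for the smallest validation RMSE.

    Hypothetical helper, not part of Amp. NaN entries are ignored.
    """
    iterations = np.asarray(iterations)
    rmses = np.asarray(rmses, dtype=float)
    idx = int(np.nanargmin(rmses))  # index of the lowest non-NaN RMSE
    return int(iterations[idx]), float(rmses[idx])


# Example usage with the file written by the loop above:
# data = np.loadtxt('RMSE-vs-iteration-validation.txt', ndmin=2)
# it, rmse = best_iteration(data[:, 0], data[:, 1])
```

np.loadtxt skips the '#'-prefixed header line that np.savetxt writes, and ndmin=2 keeps the array two-dimensional even if only one checkpoint has been evaluated so far.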

This code takes about 2 minutes to calculate the RMSE of the validation set for each checkpoint in one particular implementation, so calculating it for all checkpoints (say, around 1000) takes quite a long time. I'm sure this can be sped up, since the training itself calculates the RMSE of the training set much faster: for the same implementation, Amp both came up with new neural net parameters and calculated the RMSE of the training data in about 5 seconds per training step.

I should be able to figure this out by looking into the difference between what the code does while training the NNP and what it does inside plot_parity_and_error(). Having a separate method that does this kind of thing on its own might be beneficial for a lot of users. It's on my "to do" list.
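
As a starting point, such a method could look roughly like the sketch below. It is not Amp's implementation: the name validation_rmse is hypothetical, and it assumes only the ASE-style interface in which calc.get_potential_energy(atoms) returns the predicted energy and each validation image carries its reference energy. Whether this is actually faster than plot_parity_and_error() has not been measured.

```python
import math


def validation_rmse(calc, images, per_atom=True):
    """Energy RMSE of `calc` over `images`, without any plotting.

    Hypothetical sketch, not part of Amp. Assumes an ASE-style interface:
    calc.get_potential_energy(atoms) gives the predicted energy and
    atoms.get_potential_energy(apply_constraint=False) gives the reference.
    """
    err_sq = 0.0
    count = 0
    for atoms in images:
        predicted = calc.get_potential_energy(atoms)
        reference = atoms.get_potential_energy(apply_constraint=False)
        error = predicted - reference
        if per_atom:
            error /= len(atoms)  # normalize by the number of atoms in the image
        err_sq += error ** 2
        count += 1
    if count == 0:
        raise ValueError('validation set is empty')
    return math.sqrt(err_sq / count)
```

Calling this once per loaded checkpoint would replace the plotting call and the manual error summation in the loop above, leaving the plots to be generated only for the checkpoint that is finally selected.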
