IOError: [Errno 2] No such file or directory: 'tfAmpNN-checkpoint' with tensorflow 0.12.1
I installed the latest TensorFlow version (0.12.1) using pip and tried to work on #111 on my laptop. There were some problems because of the latest cores/parallel implementation. I was able to fix them in a local branch; the attached diff (tflowfix.diff) shows the changes made.
However, there is the following problem now:
/home/muammar/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/muammar/.local/lib/python2.7/site-packages/numpy/core/numeric.py:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  a = empty(shape, dtype, order)
Traceback (most recent call last):
  File "test_gaussian_tflow.py", line 63, in <module>
    train_test()
  File "test_gaussian_tflow.py", line 56, in train_test
    calc.train(images=train_images,)
  File "/home/muammar/quimica_pura/posgrado/Postdoc/brown/git/amp/amp/__init__.py", line 308, in train
    parallel=self._parallel)
  File "/home/muammar/quimica_pura/posgrado/Postdoc/brown/git/amp/amp/model/tflow.py", line 654, in fit
    with open('tfAmpNN-checkpoint') as fhandle:
IOError: [Errno 2] No such file or directory: 'tfAmpNN-checkpoint'
I am trying to understand why tfAmpNN-checkpoint is not being created with TensorFlow 0.12.1. Going back to TensorFlow 0.10.0 makes the problem go away:
pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.10.0-cp27-none-linux_x86_64.whl
Comments (7)
reporter - attached tflow_1.0.diff
Patch to make tflow.py work with TensorFlow 1.0.
-
I tried adding this import: from tensorflow.core.protobuf import saver_pb2. After that I could use TensorFlow 1.0.
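A guarded version of that import might look like the sketch below. It assumes that older TensorFlow releases may lack the SaverDef.V1 constant (and that their tf.train.Saver may not accept write_version), so it falls back to no extra options in that case. saver_options is a hypothetical helper name, not something that exists in amp or TensorFlow.

```python
def saver_options():
    """Return extra keyword arguments for tf.train.Saver.

    Hypothetical helper: yields {'write_version': SaverDef.V1} when the
    installed TensorFlow exposes that constant, and {} otherwise.
    """
    try:
        from tensorflow.core.protobuf import saver_pb2
    except ImportError:
        # TensorFlow (or this protobuf module) is not available.
        return {}
    # Assumption: releases without the V1 constant do not need the option.
    v1 = getattr(saver_pb2.SaverDef, 'V1', None)
    if v1 is None:
        return {}
    return {'write_version': v1}
```

The result could then be passed along as tf.train.Saver(trainvarlist, **saver_options()).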
-
reporter I have made these changes to the code, which include importing the module saver_pb2:

diff --git a/amp/model/__init__.py b/amp/model/__init__.py
index 69ad0f9..733e2c2 100644
--- a/amp/model/__init__.py
+++ b/amp/model/__init__.py
@@ -344,7 +344,7 @@ class LossFunction:
         if fingerprintprimes is not None:
             descriptor.fingerprintprimes = fingerprintprimes
 
-    def _initialize(self, args):
+    def _initialize(self, args=None):
         """Procedures to be run on the first call only, such as establishing
         SSH sessions, etc."""
         if self._initialized is True:
diff --git a/amp/model/tflow.py b/amp/model/tflow.py
index 48e2d53..99218f1 100644
--- a/amp/model/tflow.py
+++ b/amp/model/tflow.py
@@ -15,6 +15,7 @@ import uuid
 
 from . import LossFunction
 from ..utilities import ConvergenceOccurred
+from tensorflow.core.protobuf import saver_pb2
 try:
     import tensorflow as tf
 
@@ -672,7 +673,7 @@ class NeuralNetwork:
         # loss function, as included in model/__init__.py
         self.energy_loss = tf.reduce_sum(
-            tf.square(tf.div(tf.sub(self.energy, self.y_),
+            tf.square(tf.div(tf.subtract(self.energy, self.y_),
                              self.nAtoms_in)))
         # Define the training step for energy training.
@@ -682,7 +683,7 @@ class NeuralNetwork:
         # force loss function, as included in model/__init__.py
         if self.parameters['relativeForceCutoff'] is None:
             self.force_loss = tf.reduce_sum(
-                tf.div(tf.square(tf.sub(self.forces_in, self.forces)),
+                tf.div(tf.square(tf.subtract(self.forces_in, self.forces)),
                        self.nAtoms_forces)) / 3.
             # tf.reduce_sum(tf.div(
             # tf.reduce_mean(tf.square(tf.sub(self.forces_in,
@@ -692,7 +693,7 @@ class NeuralNetwork:
             self.force_loss = \
                 tf.reduce_sum(tf.div(tf.div(
                     tf.square(
-                        tf.sub(
+                        tf.subtract(
                             self.forces_in, self.forces)),
                     tf.square(
                         self.forces_in) +
@@ -707,9 +708,9 @@ class NeuralNetwork:
         # Define max residuals
         self.energy_maxresid = tf.reduce_max(
-            tf.abs(tf.div(tf.sub(self.energy, self.y_), self.nAtoms_in)))
+            tf.abs(tf.div(tf.subtract(self.energy, self.y_), self.nAtoms_in)))
         self.force_maxresid = tf.reduce_max(
-            tf.abs(tf.sub(self.forces_in, self.forces)))
+            tf.abs(tf.subtract(self.forces_in, self.forces)))
         # Define the training step for force training.
         if self.parameters['regularization_strength'] is not None:
@@ -1512,14 +1513,14 @@ def model(x, segmentinds, keep_prob, input_keep_prob, batchsize,
     # multiply through with the dg/dx tensor, and sum along the components of g
     # to get a tensor of dE/dx (one row per atom considered, second dim =3)
-    dEdx = tf.reduce_sum(tf.mul(dEdg_arranged_tile, dgdx), 1)
+    dEdx = tf.reduce_sum(tf.multiply(dEdg_arranged_tile, dgdx), 1)
     # this should be a tensor of size (total atoms in training set)x3,
     # representing the contribution of each atom to the total energy via
     # interactions with elements of the current atom type
     dEdx_arranged = tf.unsorted_segment_sum(dEdx, dgdx_Xindices, totalNumAtoms)
-    return tf.mul(reducedSum, mask), dEdx_arranged, l2_regularization
+    return tf.multiply(reducedSum, mask), dEdx_arranged, l2_regularization
     # dEg
     # dEjdgj1 = tf.expand_dims(dEjdgj, 1)
     # dEjdgj2 = tf.expand_dims(dEjdgj1, 1)
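If the same code has to run on TensorFlow releases from both before and after the renames in the diff (tf.sub to tf.subtract, tf.mul to tf.multiply), one option is to look the op up by name at import time. This is only a sketch of that idea, not what the patch does; resolve_op is a hypothetical helper and the module argument stands in for the imported tf module.

```python
def resolve_op(module, new_name, old_name):
    """Return module.<new_name> if it exists, otherwise module.<old_name>.

    Hypothetical helper for illustration; raises AttributeError when the
    module provides neither name.
    """
    op = getattr(module, new_name, None)
    if op is None:
        # Fall back to the pre-rename spelling.
        op = getattr(module, old_name)
    return op


# Example use (assumes `import tensorflow as tf` succeeded earlier):
# subtract = resolve_op(tf, 'subtract', 'sub')
# multiply = resolve_op(tf, 'multiply', 'mul')
```

The patch above simply switches to the new names, which is the simpler choice when support for the old release is not needed.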
Still fails because of the checkpoints not being created:
± % python test_gaussian_tflow.py !10078
/home/muammar/brown/git/ase/ase/lattice/surface.py:17: UserWarning: Moved to ase.build
  warnings.warn('Moved to ase.build')
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
/home/muammar/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From /home/muammar/brown/git/amp/amp/model/tflow.py:317: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
Traceback (most recent call last):
  File "test_gaussian_tflow.py", line 84, in <module>
    train_test()
  File "test_gaussian_tflow.py", line 77, in train_test
    calc.train(images=train_images,)
  File "/home/muammar/brown/git/amp/amp/__init__.py", line 311, in train
    parallel=self._parallel)
  File "/home/muammar/brown/git/amp/amp/model/tflow.py", line 925, in fit
    with open('tfAmpNN-checkpoint') as fhandle:
IOError: [Errno 2] No such file or directory: 'tfAmpNN-checkpoint'
This is the case for both TensorFlow 1.0 and the latest 1.4.0.
-
Here: self.saver = tf.train.Saver(trainvarlist, write_version=saver_pb2.SaverDef.V1)
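A likely reason this helps: starting with TensorFlow 0.12 the default Saver format is V2, which writes files such as tfAmpNN-checkpoint.index and tfAmpNN-checkpoint.data-00000-of-00001 instead of a single file named tfAmpNN-checkpoint, so a plain open('tfAmpNN-checkpoint') finds nothing. The sketch below checks which layout is on disk. checkpoint_format is a hypothetical helper, not part of amp or TensorFlow, and it only looks at file names.

```python
import os


def checkpoint_format(prefix):
    """Guess which Saver format wrote the checkpoint at `prefix`.

    Returns 'V2', 'V1', or None when no matching files are found.
    Hypothetical helper: it inspects file names only, not file contents.
    """
    directory, base = os.path.split(os.path.abspath(prefix))
    names = os.listdir(directory) if os.path.isdir(directory) else []
    # V2 layout: an index file plus one or more data shards.
    has_index = (base + '.index') in names
    has_data = any(name.startswith(base + '.data-') for name in names)
    if has_index and has_data:
        return 'V2'
    # V1 layout: a single file named exactly like the prefix.
    if os.path.isfile(os.path.join(directory, base)):
        return 'V1'
    return None
```

Calling checkpoint_format('tfAmpNN-checkpoint') after a training run would show whether the file that tflow.py tries to open can exist at all.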
-
reporter @sunshine233 that did the trick. Thanks for pointing it out.
-
reporter - changed status to resolved
- edited description
The problem is still there for TensorFlow 1.0. The attached patch is needed to port tflow.py.