save and load petsc matrix in compressed format
I want to store some very large matrices in compressed form using PETSc. To illustrate the problem, I use the demo file matvecio.py, which works perfectly fine as shipped. However, when I change the filename from 'matrix-A.dat' to 'matrix-A.dat.gz' so that the file is stored compressed, running python matvecio.py produces this error:
Traceback (most recent call last):
File "matvecio.py", line 40, in <module>
B = PETSc.Mat().load(viewer)
File "PETSc/Mat.pyx", line 642, in petsc4py.PETSc.Mat.load (src/petsc4py.PETSc.c:118243)
petsc4py.PETSc.Error: error code 65
[0] MatLoad() line 1013 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/mat/interface/matrix.c
[0] MatLoad_SeqAIJ() line 4144 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/mat/impls/aij/seq/aij.c
[0] PetscViewerSetUp() line 336 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/sys/classes/viewer/interface/view.c
[0] PetscViewerSetUp_Binary() line 1387 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/sys/classes/viewer/impls/binary/binv.c
[0] PetscViewerFileSetUp_Binary() line 1249 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/sys/classes/viewer/impls/binary/binv.c
[0] Unable to open file
[0] Cannot open file cp: ‘matrix-A.dat.gz’ and ‘./matrix-A.dat.gz’ are the same file for reading
A simple workaround is to replace 'matrix-A.dat.gz' with 'somedir/matrix-A.dat.gz', but then the script produces an unwanted uncompressed file 'matrix-A.dat'.
Any ideas?
Comments (8)
-
-
@balay Any idea what's going on? I have never used this autocompress feature on binary viewers. Is it supposed to automagically work by just adding .gz to the filename?
-
my e-mail reply appears to have disappeared... here is the copy/paste from it.. And I hate this tiny window provided by bitbucket to type in text :(
The 'autocompress' feature is very crude. When writing, it first writes the file and then invokes 'gzip' to compress it.
When reading, it runs 'gzip -d -c' to create the uncompressed file and then reads in the uncompressed file [i.e. both files will exist on the disk]. If the uncompressed file already exists, it reads that file [and won't run 'gzip -d -c' again].
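The write-then-gzip and gunzip-then-read steps described above can also be done by hand around the viewer calls, which gives control over where the uncompressed copy lives. The following is only a minimal sketch using the Python standard library; the helper names gzip_in_place and gunzip_to_temp are hypothetical and are not part of PETSc or petsc4py.

```python
import contextlib
import gzip
import os
import shutil
import tempfile


def gzip_in_place(path):
    """Compress `path` to `path + '.gz'` and remove the uncompressed original.

    Hypothetical helper: call it after the binary viewer has written `path`.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in blocks, not all at once
    os.remove(path)
    return gz_path


@contextlib.contextmanager
def gunzip_to_temp(gz_path):
    """Yield the path of a temporary uncompressed copy; delete it on exit.

    Hypothetical helper: open the binary viewer on the yielded path.
    """
    tmp_dir = tempfile.mkdtemp()
    name = os.path.basename(gz_path)
    if name.endswith(".gz"):
        name = name[:-3]
    tmp_path = os.path.join(tmp_dir, name)
    try:
        with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        yield tmp_path
    finally:
        # Remove the uncompressed copy so only the .gz file stays on disk.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

With petsc4py, the yielded path would presumably be passed to PETSc.Viewer().createBinary(path, 'r') before PETSc.Mat().load(viewer), as matvecio.py does with its plain filename; that part is an assumption and is not shown here. The uncompressed data still touches the disk temporarily, so this avoids the leftover file but not the peak disk usage.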
I don't understand the error message.
[0] PetscViewerFileSetUp_Binary() line 1249 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/sys/classes/viewer/impls/binary/binv.c
[0] Unable to open file
[0] Cannot open file cp: ‘matrix-A.dat.gz’ and ‘./matrix-A.dat.gz’ are the same file for reading
The line that prints this message is:
if ((vbinary->fdes = open(fname,O_RDONLY,0)) == -1) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_FILE_OPEN,"Cannot open file %s for reading",fname);
So somehow this routine is getting
fname = 'cp: ‘matrix-A.dat.gz’ and ‘./matrix-A.dat.gz’ are the same file'
string as the filename? Is it coming from PetscFileRetrieve()?
Anyhow, when I try to run this test code [with petsc-3.7 and --download-petsc4py], I can't reproduce the problem.
balay@asterix /home/balay/download-pine/test $ ls -lt
total 4
-rw-rw-r--. 1 balay balay 1103 Mar 28 10:09 matvecio.py
balay@asterix /home/balay/download-pine/test $ PYTHONPATH=/home/balay/petsc/arch-py/lib python matvecio.py
balay@asterix /home/balay/download-pine/test $ ls -lt
total 36
-rw-r--r--. 1 balay balay 4104 Mar 28 10:10 vector-x.dat
-rw-r--r--. 1 balay balay 22 Mar 28 10:10 vector-x.dat.info
-rw-r--r--. 1 balay balay 4104 Mar 28 10:10 vector-y.dat
-rw-r--r--. 1 balay balay 22 Mar 28 10:10 vector-y.dat.info
-rw-r--r--. 1 balay balay 22 Mar 28 10:10 matrix-A.dat.info
-rw-r--r--. 1 balay balay 3223 Mar 28 10:10 matrix-A.dat.gz
-rw-rw-r--. 1 balay balay 1103 Mar 28 10:09 matvecio.py
balay@asterix /home/balay/download-pine/test $ PYTHONPATH=/home/balay/petsc/arch-py/lib python matvecio.py
balay@asterix /home/balay/download-pine/test $ ls -lt
total 36
-rw-r--r--. 1 balay balay 4104 Mar 28 10:11 vector-x.dat
-rw-r--r--. 1 balay balay 22 Mar 28 10:11 vector-x.dat.info
-rw-r--r--. 1 balay balay 4104 Mar 28 10:11 vector-y.dat
-rw-r--r--. 1 balay balay 22 Mar 28 10:11 vector-y.dat.info
-rw-r--r--. 1 balay balay 22 Mar 28 10:11 matrix-A.dat.info
-rw-r--r--. 1 balay balay 3223 Mar 28 10:11 matrix-A.dat.gz
-rw-rw-r--. 1 balay balay 1103 Mar 28 10:09 matvecio.py
balay@asterix /home/balay/download-pine/test $
cc: @BarryFSmith
-
reporter Thanks for your answer. I'm not sure why you can't reproduce the problem; maybe I did something wrong during the installation. My understanding is that I should not use that feature. Any hints on how I should store very large matrices?
-
Sorry - I don't have a good answer.
cc: @BarryFSmith, @knepley @jedbrown
Perhaps this code should be replaced with something that uses zlib?
-
Satish is correct. The right thing to do is to call zlib on the data before writing. It's no problem calling it on chunks, because gzip chunks internally anyway. We would gratefully accept contributions for this feature.
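To make the "zlib on chunks" idea concrete, here is a rough Python sketch of streaming compression with the standard zlib module. It is not PETSc code and not a proposed patch; the function names compress_stream and decompress_stream and the 1 MiB chunk size are assumptions chosen for illustration. Passing wbits=31 asks zlib for a gzip-format container, so the output should be readable by gzip tools.

```python
import zlib

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary illustrative size


def compress_stream(src, dst, chunk_size=CHUNK):
    """Read `src` in chunks and write gzip-format compressed data to `dst`."""
    # wbits=31 selects a gzip header and trailer instead of the raw zlib one.
    comp = zlib.compressobj(6, zlib.DEFLATED, 31)
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        dst.write(comp.compress(block))
    dst.write(comp.flush())  # emit whatever is still buffered


def decompress_stream(src, dst, chunk_size=CHUNK):
    """Read gzip-format data from `src` in chunks and write the original bytes."""
    decomp = zlib.decompressobj(31)
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        dst.write(decomp.decompress(block))
    dst.write(decomp.flush())
```

Both functions take any binary file-like objects, so only one chunk of uncompressed data needs to be in memory at a time and no uncompressed file has to exist on disk.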
-
Specifically.
import gzip

# Binary mode with a bytes payload works on both Python 2 and Python 3.
with gzip.open('file.gz', 'wb') as f:
    f.write(b'data to compress')
results in a compressed file.
-
- changed status to closed
Closing this issue as it is not petsc4py's business.