Save and load PETSc matrix in compressed format

Issue #90 closed
daniel created an issue

I want to store some very large matrices in compressed form using PETSc. To illustrate my problem, I use the demo file matvecio.py, which works perfectly fine as shipped. If I change the filename from 'matrix-A.dat' to 'matrix-A.dat.gz' so that the file is stored compressed, the command python matvecio.py produces this error:

Traceback (most recent call last):
  File "matvecio.py", line 40, in <module>
    B = PETSc.Mat().load(viewer)
  File "PETSc/Mat.pyx", line 642, in petsc4py.PETSc.Mat.load (src/petsc4py.PETSc.c:118243)
petsc4py.PETSc.Error: error code 65
[0] MatLoad() line 1013 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/mat/interface/matrix.c
[0] MatLoad_SeqAIJ() line 4144 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/mat/impls/aij/seq/aij.c
[0] PetscViewerSetUp() line 336 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/sys/classes/viewer/interface/view.c
[0] PetscViewerSetUp_Binary() line 1387 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/sys/classes/viewer/impls/binary/binv.c
[0] PetscViewerFileSetUp_Binary() line 1249 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/sys/classes/viewer/impls/binary/binv.c
[0] Unable to open file
[0] Cannot open file cp: ‘matrix-A.dat.gz’ and ‘./matrix-A.dat.gz’ are the same file for reading

A simple workaround is to change 'matrix-A.dat.gz' to 'somedir/matrix-A.dat.gz', but then the script leaves behind an unwanted uncompressed file 'matrix-A.dat'.

Any ideas?

Comments (8)

  1. Lisandro Dalcin

    @balay Any idea what's going on? I never used this autocompress feature on binary viewers. Is it supposed to automagically work by just adding .gz to the filename?

  2. Satish Balay

    My e-mail reply appears to have disappeared, so here is a copy/paste of it. And I hate this tiny window provided by bitbucket to type in text :(

    The 'autocompress' feature is very crude. When writing - it first writes the file - and then invokes 'gzip' to compress.

    And when reading - it does a 'gzip -d -c' to create the uncompressed file - and then reads in the uncompressed file [i.e. both files will exist on the disk]. Here - if the uncompressed file already exists - it will read in the uncompressed file [and won't do the 'gzip -d -c' again].
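The write-then-compress and decompress-then-read behavior described above can be sketched in Python. This is only an illustration under stated assumptions, not the PETSc implementation: it uses Python's gzip module as a stand-in for the external gzip command, and the helper names write_compressed and read_compressed are hypothetical.

```python
import gzip
import os
import shutil


def write_compressed(path_gz, payload):
    """Write payload to the uncompressed name, then compress it to path_gz."""
    plain = path_gz[:-len(".gz")]
    with open(plain, "wb") as f:
        f.write(payload)
    # Stand-in for invoking the external 'gzip' command on the finished file.
    with open(plain, "rb") as src, gzip.open(path_gz, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # The gzip command-line tool replaces the original file by default.
    os.remove(plain)


def read_compressed(path_gz):
    """Decompress to the uncompressed name unless it already exists, then read it."""
    plain = path_gz[:-len(".gz")]
    if not os.path.exists(plain):
        # Stand-in for 'gzip -d -c'; the .gz file is left in place.
        with gzip.open(path_gz, "rb") as src, open(plain, "wb") as dst:
            shutil.copyfileobj(src, dst)
    # From here on, both the compressed and the uncompressed file are on disk.
    with open(plain, "rb") as f:
        return f.read()
```

After read_compressed returns, the uncompressed copy stays on disk, which matches the leftover 'matrix-A.dat' the reporter observed with the subdirectory workaround.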

    I don't understand the error message.

    [0] PetscViewerFileSetUp_Binary() line 1249 in /gpfs2/soft/Python-2.7.8/thirdparty/petsc-3.7.4/src/sys/classes/viewer/impls/binary/binv.c
    [0] Unable to open file
    [0] Cannot open file cp: ‘matrix-A.dat.gz’ and ‘./matrix-A.dat.gz’ are the same file for reading
    

    The line that prints this message is:

          if ((vbinary->fdes = open(fname,O_RDONLY,0)) == -1) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_FILE_OPEN,"Cannot open file %s for reading",fname);
    

    So somehow this routine is getting fname = 'cp: ‘matrix-A.dat.gz’ and ‘./matrix-A.dat.gz’ are the same file' string as filename? Is it coming from PetscFileRetrieve()?

    Anyhow - when I try to run this test code [with petsc-3.7 and --download-petsc4py] - I can't reproduce the problem.

    balay@asterix /home/balay/download-pine/test
    $ ls -lt
    total 4
    -rw-rw-r--. 1 balay balay 1103 Mar 28 10:09 matvecio.py
    balay@asterix /home/balay/download-pine/test
    $ PYTHONPATH=/home/balay/petsc/arch-py/lib python matvecio.py 
    balay@asterix /home/balay/download-pine/test
    $ ls -lt
    total 36
    -rw-r--r--. 1 balay balay 4104 Mar 28 10:10 vector-x.dat
    -rw-r--r--. 1 balay balay   22 Mar 28 10:10 vector-x.dat.info
    -rw-r--r--. 1 balay balay 4104 Mar 28 10:10 vector-y.dat
    -rw-r--r--. 1 balay balay   22 Mar 28 10:10 vector-y.dat.info
    -rw-r--r--. 1 balay balay   22 Mar 28 10:10 matrix-A.dat.info
    -rw-r--r--. 1 balay balay 3223 Mar 28 10:10 matrix-A.dat.gz
    -rw-rw-r--. 1 balay balay 1103 Mar 28 10:09 matvecio.py
    balay@asterix /home/balay/download-pine/test
    $ PYTHONPATH=/home/balay/petsc/arch-py/lib python matvecio.py 
    balay@asterix /home/balay/download-pine/test
    $ ls -lt
    total 36
    -rw-r--r--. 1 balay balay 4104 Mar 28 10:11 vector-x.dat
    -rw-r--r--. 1 balay balay   22 Mar 28 10:11 vector-x.dat.info
    -rw-r--r--. 1 balay balay 4104 Mar 28 10:11 vector-y.dat
    -rw-r--r--. 1 balay balay   22 Mar 28 10:11 vector-y.dat.info
    -rw-r--r--. 1 balay balay   22 Mar 28 10:11 matrix-A.dat.info
    -rw-r--r--. 1 balay balay 3223 Mar 28 10:11 matrix-A.dat.gz
    -rw-rw-r--. 1 balay balay 1103 Mar 28 10:09 matvecio.py
    balay@asterix /home/balay/download-pine/test
    $ 
    

    cc: @BarryFSmith

  3. daniel reporter

    Thanks for your answer. I'm not sure why you can't reproduce the problem; maybe I did something wrong during the installation. My understanding is that I should not use that feature. Any hints on how I should store very large matrices?

  4. Satish Balay

    Sorry - I don't have a good answer.

    cc: @BarryFSmith, @knepley @jedbrown

    Perhaps this code should be replaced with something that uses zlib?

  5. Matthew Knepley

    Satish is correct. The right thing to do is call zlib on the data before writing. It's no problem calling it on chunks because gzip chunks internally anyway. We would gratefully accept contributions for this feature.
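To illustrate what calling zlib on chunks could look like, here is a minimal Python sketch. It is not PETSc code, and the function names compress_chunks and decompress_chunks are hypothetical. Note that zlib.compressobj with default settings emits a zlib stream rather than a .gz file.

```python
import zlib


def compress_chunks(chunks, level=6):
    """Yield compressed pieces for an iterable of bytes chunks."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:  # the compressor may buffer and return nothing yet
            yield piece
    # Emit whatever is still buffered and terminate the stream.
    yield compressor.flush()


def decompress_chunks(pieces):
    """Yield decompressed data for an iterable of compressed pieces."""
    decompressor = zlib.decompressobj()
    for piece in pieces:
        data = decompressor.decompress(piece)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail
```

Because the compressor keeps its state between calls, the concatenated output is a single valid stream no matter how the input was split into chunks.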

  6. Jed Brown

    Specifically.

    import gzip
    # 'wb' opens the file in binary mode, so the payload must be bytes
    with gzip.open('file.gz', 'wb') as f:
        f.write(b'data to compress')
    

    results in a compressed file.
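The same module can be applied to a file that already exists on disk, for example one written by a binary viewer. The following is a sketch under assumptions, not a tested PETSc workflow: gzip_file and gunzipped are hypothetical helper names, and any companion .info file is not handled.

```python
import contextlib
import gzip
import os
import shutil
import tempfile


def gzip_file(path, keep_original=False):
    """Compress path to path + '.gz' and return the new name."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in blocks, not all in memory
    if not keep_original:
        os.remove(path)
    return gz_path


@contextlib.contextmanager
def gunzipped(gz_path):
    """Yield the path of a temporary uncompressed copy, removed on exit."""
    tmpdir = tempfile.mkdtemp()
    try:
        name = os.path.basename(gz_path)
        if name.endswith(".gz"):
            name = name[:-len(".gz")]
        plain = os.path.join(tmpdir, name)
        with gzip.open(gz_path, "rb") as src, open(plain, "wb") as dst:
            shutil.copyfileobj(src, dst)
        yield plain
    finally:
        # Clean up so no uncompressed copy is left behind.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

Inside a `with gunzipped('matrix-A.dat.gz') as name:` block, `name` could be handed to a binary viewer opened for reading, as matvecio.py does with the plain filename. The temporary copy disappears when the block ends.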
