t3_verify.py::test_retrieve sometimes fails

Issue #272 open
Nikolaus Rath
repo owner created an issue

The t3_verify.py::test_retrieve test sometimes fails with:

=================================== FAILURES ===================================
________________________________ test_retrieve _________________________________
Traceback (most recent call last):
  File "/home/travis/build/s3ql/s3ql/tests/t3_verify.py", line 96, in test_retrieve
    thread_count=1, full=False)
  File "/opt/python/3.6.2/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/travis/build/s3ql/s3ql/tests/pytest_checklogs.py", line 71, in assert_logs
    % (count, pattern, handler.count))
AssertionError: Expected to catch 1 '^Object %d is corrupted' messages, but got only 0
----------------------------- Captured stdout call -----------------------------

..processed 0 objects (0.00%) so far..
------------------------------ Captured log call -------------------------------
verify.py                  118 INFO     Reading all objects...
verify.py                  215 WARNING  Backend seems to have lost object 22
verify.py                  181 INFO     Verified all 3 storage objects.
!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!
==================== 1 failed, 267 passed in 50.04 seconds =====================

Comments (6)

  1. Daniel Jagszent

    I just added some prints:

    diff --git a/tests/t3_verify.py b/tests/t3_verify.py
    index c2548cc5..747b500c 100755
    --- a/tests/t3_verify.py
    +++ b/tests/t3_verify.py
    @@ -92,6 +92,8 @@ def test_retrieve(backend, db):
    
         with assert_logs('^Backend seems to have lost', count=1, level=logging.WARNING), \
              assert_logs('^Object %d is corrupted', count=1, level=logging.WARNING):
    +        print("missing: %s" % missing_fh.getvalue())
    +        print("corrupted: %s" % corrupted_fh.getvalue())
             verify.retrieve_objects(db, backend_factory, corrupted_fh, missing_fh,
                                     thread_count=1, full=False)
         assert missing_fh.getvalue() == 's3ql_data_%d\n' % obj_ids[0]
    

    The result of

    ( SECONDS=0; while python -m pytest tests/ -k 'test_retrieve' -s; do clear; done; duration=$SECONDS; echo "tried $(($duration / 60)) minutes and $(($duration % 60)) seconds to produce failure."; for I in 0.5 0.5 0; do echo -ne '\007'; sleep $I; done; )
    

    is

    ========================================================================================================== test session starts ===========================================================================================================
    platform darwin -- Python 3.6.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0 -- /Users/dj/workspace/venvs/s3ql/bin/python
    cachedir: tests/.cache
    rootdir: /Users/dj/workspace/s3ql-git/tests, inifile: pytest.ini
    plugins: catchlog-1.2.2
    collected 350 items                                                                                                                                                                                                                       
    
    tests/t3_verify.py::test_retrieve missing: 
    corrupted: 
    ..processed 0 objects (0.00%) so far..
    FAILED
    
    ================================================================================================================ FAILURES ================================================================================================================
    _____________________________________________________________________________________________________________ test_retrieve ______________________________________________________________________________________________________________
    Traceback (most recent call last):
      File "/Users/dj/workspace/s3ql-git/tests/t3_verify.py", line 98, in test_retrieve
        thread_count=1, full=False)
      File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 88, in __exit__
        next(self.gen)
      File "/Users/dj/workspace/s3ql-git/tests/pytest_checklogs.py", line 71, in assert_logs
        % (count, pattern, handler.count))
    AssertionError: Expected to catch 1 '^Object %d is corrupted' messages, but got only 0
    ----------------------------------------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------------------------------------
    verify.py                  118 INFO     Reading all objects...
    verify.py                  201 DEBUG    reading object 22
    verify.py                  215 WARNING  Backend seems to have lost object 22
    verify.py                  201 DEBUG    reading object 25
    verify.py                  201 DEBUG    reading object 30
    verify.py                  201 DEBUG    reading object 31
    verify.py                  181 INFO     Verified all 3 storage objects.
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ========================================================================================================== 349 tests deselected ==========================================================================================================
    ================================================================================================ 1 failed, 349 deselected in 2.20 seconds ================================================================================================
    tried 5 minutes and 29 seconds to produce failure.
    

    So it does not look like a pytest-catchlog problem since missing_fh and corrupted_fh are empty.

  2. Daniel Jagszent

    Next step – capture the database and objects when error occurs:

    --- a/tests/t3_verify.py
    +++ b/tests/t3_verify.py
    @@ -24,28 +24,38 @@ import logging
     import shutil
     import tempfile
     import pytest
    +import os
    
     @pytest.yield_fixture()
     def backend():
    -    backend_dir = tempfile.mkdtemp(prefix='s3ql-backend-')
    +    backend_dir = '/tmp/test_retrieve'
    +    try:
    +        shutil.rmtree(backend_dir)
    +    except:
    +        pass
    +    os.mkdir(backend_dir)
         be = local.Backend('local://' + backend_dir, None, None)
         try:
             yield be
         finally:
             be.close()
    -        shutil.rmtree(backend_dir)
    
     @pytest.yield_fixture()
     def db():
    -    dbfile = tempfile.NamedTemporaryFile()
    -    db = Connection(dbfile.name)
    +    dbfile = '/tmp/test_retrieve.db'
    +    try:
    +        os.remove(dbfile)
    +    except:
    +        pass
    +    with open(dbfile, 'w'):
    +        pass
    +    db = Connection(dbfile)
         create_tables(db)
         init_tables(db)
         try:
             yield db
         finally:
             db.close()
    -        dbfile.close()
    
     def test_retrieve(backend, db):
         plain_backend = backend
    

    Find the result attached.

    It looks like the metadata corruption in s3ql_data_30 was not persisted. The data corruption of s3ql_data_25 was persisted.

    To follow along:

    cd /tmp
    tar xf /path/to/test_retrieve.tar
    cd test_retrieve
    hexdump -C s3ql_data_/s3ql_data_30
    
  3. Log in to comment