Bandersnatch sometimes hangs on teardown

Issue #52 resolved
Jeremy Stanley
created an issue

I don't have a lot to go on, so this is as much a request for ideas on how to better debug... Sometimes, seemingly at random, a run of bandersnatch 1.5 completes but never actually exits. We usually discover this by noticing we have a stale mirror, and upon investigation find a hung bandersnatch mirror process on the system. That run has logged that it reached the serial it desired (and so no other output from it it expected, but has still for whatever reason not terminated):

2014-09-16 11:40:02,144 INFO: bandersnatch/1.5 (CPython 2.7.3-final0, Linux 3.2.0-68-generic x86_64)
2014-09-16 11:40:02,144 INFO: Syncing with https://pypi.python.org.
2014-09-16 11:40:02,144 INFO: Current mirror serial: 1226428
2014-09-16 11:40:02,144 INFO: Syncing based on changelog.
2014-09-16 11:40:02,369 INFO: Trying to reach serial: 1226432
2014-09-16 11:40:02,369 INFO: 2 packages to sync.
2014-09-16 11:40:02,370 INFO: Syncing package: django-media-library (serial 1226430)
2014-09-16 11:40:02,370 DEBUG: Getting /pypi/django-media-library/json (serial 1226430)
2014-09-16 11:40:02,371 INFO: Syncing package: oar-docker (serial 1226432)
2014-09-16 11:40:02,371 DEBUG: Getting /pypi/oar-docker/json (serial 1226432)
2014-09-16 11:40:02,420 INFO: Syncing index page: django-media-library
2014-09-16 11:40:02,421 DEBUG: Getting /simple/django-media-library/ (serial 1226430)
2014-09-16 11:40:02,430 DEBUG: Getting /serversig/django-media-library/ (serial 1226430)
2014-09-16 11:40:02,460 INFO: Downloading: https://pypi.python.org/packages/source/o/oar-docker/oar-docker-0.1.2.tar.gz
2014-09-16 11:40:02,460 DEBUG: Getting https://pypi.python.org/packages/source/o/oar-docker/oar-docker-0.1.2.tar.gz (serial None)
2014-09-16 11:40:02,487 INFO: Syncing index page: oar-docker
2014-09-16 11:40:02,488 DEBUG: Getting /simple/oar-docker/ (serial 1226432)
2014-09-16 11:40:02,502 DEBUG: Getting /serversig/oar-docker/ (serial 1226432)
2014-09-16 11:40:02,517 INFO: Generating global index page.
2014-09-16 11:40:02,814 INFO: New mirror serial: 1226432

Attaching to the process with strace shows what seems to be in some sort of select/poll loop:

gettimeofday({1410961910, 563804}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
gettimeofday({1410961910, 614091}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
gettimeofday({1410961910, 664326}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
gettimeofday({1410961910, 714626}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 33214}) = 0 (Timeout)
gettimeofday({1410961910, 748154}, NULL) = 0
gettimeofday({1410961910, 748243}, NULL) = 0
gettimeofday({1410961910, 748321}, NULL) = 0
gettimeofday({1410961910, 748384}, NULL) = 0
gettimeofday({1410961910, 748444}, NULL) = 0
gettimeofday({1410961910, 748493}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
gettimeofday({1410961910, 749718}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
gettimeofday({1410961910, 751927}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
gettimeofday({1410961910, 756149}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
gettimeofday({1410961910, 764421}, NULL) = 0

Killing the running process results in no additional output, and on a subsequent run we see it thinks it was interrupted with a leftover todo list:

2014-09-17 13:55:01,667 INFO: bandersnatch/1.5 (CPython 2.7.3-final0, Linux 3.2.0-68-generic x86_64)
2014-09-17 13:55:01,668 INFO: Syncing with https://pypi.python.org.
2014-09-17 13:55:01,668 INFO: Current mirror serial: 1226432
2014-09-17 13:55:01,668 INFO: Resuming interrupted sync from local todo list.
2014-09-17 13:55:01,668 INFO: Trying to reach serial: 1226437
2014-09-17 13:55:01,668 INFO: 0 packages to sync.
2014-09-17 13:55:01,669 INFO: Generating global index page.
2014-09-17 13:55:02,005 INFO: New mirror serial: 1226437

Thereafter, it runs normally.

Comments (10)

  1. Jeremy Stanley reporter

    It's not something I'm able to recreate at will (since I have no idea what triggers the issue), and I don't have an easy way to upgrade all of our mirror servers to an unreleased version either. Is there a new release due out any time soon?

  2. Christian Theune

    Ah. I'm happy to make a release soon. I'm currently discussing a few important changes that I might want to also pick up. If that doesn't happen soon I'll just make a release with what I have.

  3. Log in to comment