Reboot problem on Sabre

Issue #14 resolved
Yuguang Zhang repo owner created an issue

On Sabre3, the worker did not reboot. 6x_bootscript is set correctly to point at the benchmark partition, but the logs end at the reboot step:

/tmp $ ls -l
total 572
-rw-r--r-- 1 root root        0 Jun 11 22:27 benchmark.log
-rw-r--r-- 1 root root     2532 Jun 11 22:43 controller.log
-rw-r--r-- 1 root root       32 Jun 11 22:28 controller.log.md5
-rw-r--r-- 1 root root     4527 Jun 11 22:43 controller_stdout.log
-rw-r--r-- 1 root root        0 Jun 11 22:27 datamill_common.log
-rw-r--r-- 1 root root       32 Jun 11 22:28 datamill_common.log.md5
-rw-r--r-- 1 root root        0 Jun 11 22:27 datamill_test.log
-rw-r--r-- 1  240 daemon      0 Jun 11 20:54 distccd.log
-rw-r--r-- 1 root root        0 Jun 11 22:27 stderr.log
-rw------- 1 root root      965 Jun 11 22:28 tmpA83cSK
-rw-r--r-- 1 root root       32 Jun 11 22:28 tmpA83cSK.md5
-rw------- 1 root root        0 Jun 11 22:28 tmps9nQDQ
-rw-r--r-- 1 root root       32 Jun 11 22:28 tmps9nQDQ.md5
-rw------- 1 root root        0 Jun 11 22:43 tmpT6QDsI
-rw-r--r-- 1 root root       32 Jun 11 22:43 tmpT6QDsI.md5
-rw------- 1 root root   541659 Jun 11 22:28 tmpUU3CHl
-rw-r--r-- 1 root root       32 Jun 11 22:28 tmpUU3CHl.md5
drwx------ 1 root root      236 Jun 11 22:43 tmpYugi6n
-rw-r--r-- 1 root root        0 Jun 11 22:27 watchdog.log

/tmp $ tail *log
==> benchmark.log <==

==> controller.log <==

 * IMPORTANT: 4 news items need reading for repository 'gentoo'.
 * Use eselect news to read news items.


06/11/2015 10-43-12PM:INFO:worker.py:290:get_job:installing job
06/11/2015 10-43-12PM:INFO:worker.py:292:get_job:running preboot hooks
06/11/2015 10-43-12PM:INFO:worker.py:294:get_job:configuring bootloader
06/11/2015 10-43-13PM:INFO:worker.py:299:get_job:notifying master that the job is proceeding
06/11/2015 10-43-13PM:INFO:controller.py:59:<module>:got work, rebooting
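
The controller log records above appear to follow a fixed `timestamp:LEVEL:file:line:function:message` layout. A minimal sketch, assuming that layout holds for every record, of parsing one line so the last step before the reboot can be checked programmatically. The helper name `parse_record` is hypothetical and not part of the worker code.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the log excerpt above:
# MM/DD/YYYY HH-MM-SS(AM|PM):LEVEL:file:line:function:message
RECORD = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}-\d{2}-\d{2}[AP]M):"
    r"(?P<level>[A-Z]+):(?P<file>[^:]+):(?P<line>\d+):"
    r"(?P<func>[^:]+):(?P<msg>.*)$"
)

def parse_record(line):
    """Return a dict for one log record, or None if the line does not match."""
    m = RECORD.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # 12-hour clock with dashes instead of colons, followed by AM/PM.
    rec["ts"] = datetime.strptime(rec["ts"], "%m/%d/%Y %I-%M-%S%p")
    rec["line"] = int(rec["line"])
    return rec

rec = parse_record(
    "06/11/2015 10-43-13PM:INFO:controller.py:59:<module>:got work, rebooting"
)
print(rec["ts"].isoformat(), rec["file"], rec["msg"])
# prints: 2015-06-11T22:43:13 controller.py got work, rebooting
```

Lines that are not log records, such as the Gentoo news notice, return `None` and can be skipped.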

Comments (6)

  1. Yuguang Zhang reporter

    The board hits a boot error when it can't mount the benchmark partition:

    ...
    VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
     mmcblk0: p1 p2
    Bus freq driver module loaded
    Bus freq driver Enabled
    mxc_dvfs_core_probe
    DVFS driver module loaded
    snvs_rtc snvs_rtc.0: setting system clock to 1970-01-01 00:00:00 UTC (0)
    Waiting for root device /dev/mmcblk1p1...
    mmc1: new high speed SDHC card at address 1234
    mmcblk1: mmc1:1234 SA08G 7.21 GiB 
     mmcblk1: p1 p2
    EXT3-fs (mmcblk1p1): error: couldn't mount because of unsupported optional fea)
    EXT2-fs (mmcblk1p1): error: couldn't mount because of unsupported optional fea)
    EXT4-fs (mmcblk1p1): mounted filesystem with ordered data mode. Opts: (null)
    VFS: Mounted root (ext4 filesystem) on device 179:9.
    Freeing init memory: 208K
    
  2. Yuguang Zhang reporter

    The log files indicate that most of the Sabre boards have problems booting into the benchmark partition:

    ==> sabre1/benchmark.log <==
    ==> sabre1/controller.log <==
    06/22/2015 12-05-52PM:INFO:emerge.py:167:_layman_sync_call:Executing layman -S
    06/22/2015 12-06-13PM:INFO:controller.py:40:<module>:updating backup tarball
    06/22/2015 12-16-18PM:WARNING:controller.py:67:<module>:Controller exception: Command '
            tar caf /datamill/backups/unfinished.tar.gz * --anchored --exclude='proc' --exclude='sys' --exclude='dev' --exclude='datamill' --exclude='etc/local.d' --exclude='usr/portage/distfiles' --exclude='usr/portage/packages' --exclude='datamill/backups/unfinished.tar.gz'
            ' returned non-zero exit status 2
    06/22/2015 12-21-19PM:INFO:emerge.py:167:_layman_sync_call:Executing layman -S
    06/22/2015 12-21-35PM:INFO:controller.py:40:<module>:updating backup tarball
    06/22/2015 12-30-39PM:WARNING:controller.py:67:<module>:Controller exception: Command '
            tar caf /datamill/backups/unfinished.tar.gz * --anchored --exclude='proc' --exclude='sys' --exclude='dev' --exclude='datamill' --exclude='etc/local.d' --exclude='usr/portage/distfiles' --exclude='usr/portage/packages' --exclude='datamill/backups/unfinished.tar.gz'
            ' returned non-zero exit status 2
    ==> sabre1/controller_stdout.log <==
            tar caf /datamill/backups/unfinished.tar.gz * --anchored --exclude='proc' --exclude='sys' --exclude='dev' --exclude='datamill' --exclude='etc/local.d' --exclude='usr/portage/distfiles' --exclude='usr/portage/packages' --exclude='datamill/backups/unfinished.tar.gz'
            ' returned non-zero exit status 2
    Image Name:   boot script
    Created:      Mon Jun 22 12:30:39 2015
    Image Type:   ARM Linux Script (uncompressed)
    Data Size:    1438 Bytes = 1.40 kB = 0.00 MB
    Load Address: 00000000
    Entry Point:  00000000
    Contents:
       Image 0: 1430 Bytes = 1.40 kB = 0.00 MB
    ==> sabre1/datamill_common.log <==
    ==> sabre1/datamill_test.log <==
    ==> sabre1/distccd.log <==
    ==> sabre1/stderr.log <==
    ==> sabre1/watchdog.log <==
    ==> sabre2/benchmark.log <==
    ==> sabre2/controller.log <==
    ==> sabre2/controller_stdout.log <==
    ==> sabre2/datamill_common.log <==
    ==> sabre2/datamill_test.log <==
    ==> sabre2/distccd.log <==
    ==> sabre2/stderr.log <==
    ==> sabre2/watchdog.log <==
    ==> sabre3/benchmark.log <==
    ==> sabre3/controller.log <==
     * IMPORTANT: 4 news items need reading for repository 'gentoo'.
     * Use eselect news to read news items.
    06/25/2015 03-18-56AM:INFO:worker.py:290:get_job:installing job
    06/25/2015 03-18-57AM:INFO:worker.py:292:get_job:running preboot hooks
    06/25/2015 03-18-57AM:INFO:worker.py:294:get_job:configuring bootloader
    06/25/2015 03-18-57AM:INFO:worker.py:299:get_job:notifying master that the job is proceeding
    06/25/2015 03-18-57AM:INFO:controller.py:59:<module>:got work, rebooting
    ==> sabre3/controller_stdout.log <==
    Image Name:   boot script
    Created:      Thu Jun 25 03:18:57 2015
    Image Type:   ARM Linux Script (uncompressed)
    Data Size:    1438 Bytes = 1.40 kB = 0.00 MB
    Load Address: 00000000
    Entry Point:  00000000
    Contents:
       Image 0: 1430 Bytes = 1.40 kB = 0.00 MB
    06/25/2015 03-18-57AM:INFO:worker.py:299:get_job:notifying master that the job is proceeding
    06/25/2015 03-18-57AM:INFO:controller.py:59:<module>:got work, rebooting
    ==> sabre3/datamill_common.log <==
    ==> sabre3/datamill_test.log <==
    ==> sabre3/distccd.log <==
    ==> sabre3/stderr.log <==
    ==> sabre3/watchdog.log <==
    ==> sabre4/benchmark.log <==
    ==> sabre4/controller.log <==
     * IMPORTANT: 4 news items need reading for repository 'gentoo'.
     * Use eselect news to read news items.
    
    
    06/13/2015 07-33-07AM:INFO:worker.py:290:get_job:installing job
    06/13/2015 07-33-07AM:INFO:worker.py:292:get_job:running preboot hooks
    06/13/2015 07-33-07AM:INFO:worker.py:294:get_job:configuring bootloader
    06/13/2015 07-33-07AM:INFO:worker.py:299:get_job:notifying master that the job is proceeding
    06/13/2015 07-33-08AM:INFO:controller.py:59:<module>:got work, rebooting
    
    ==> sabre4/controller_stdout.log <==
    Image Name:   boot script
    Created:      Sat Jun 13 07:33:07 2015
    Image Type:   ARM Linux Script (uncompressed)
    Data Size:    1438 Bytes = 1.40 kB = 0.00 MB
    Load Address: 00000000
    Entry Point:  00000000
    Contents:
       Image 0: 1430 Bytes = 1.40 kB = 0.00 MB
    06/13/2015 07-33-07AM:INFO:worker.py:299:get_job:notifying master that the job is proceeding
    06/13/2015 07-33-08AM:INFO:controller.py:59:<module>:got work, rebooting
    
    ==> sabre4/datamill_common.log <==
    
    ==> sabre4/datamill_test.log <==
    
    ==> sabre4/distccd.log <==
    
    ==> sabre4/stderr.log <==
    
    ==> sabre4/watchdog.log <==
    
    ==> sabre5/benchmark.log <==
    
    ==> sabre5/controller.log <==
    07/01/2015 03-01-59PM:INFO:controller.py:62:<module>:nothing to do, sleeping
    07/01/2015 03-06-59PM:INFO:emerge.py:167:_layman_sync_call:Executing layman -S
    07/01/2015 03-07-10PM:INFO:worker.py:267:get_job:getting job from master
    07/01/2015 03-07-10PM:INFO:controller.py:62:<module>:nothing to do, sleeping
    07/01/2015 03-12-10PM:INFO:emerge.py:167:_layman_sync_call:Executing layman -S
    07/01/2015 03-12-21PM:INFO:worker.py:267:get_job:getting job from master
    07/01/2015 03-12-21PM:INFO:controller.py:62:<module>:nothing to do, sleeping
    07/01/2015 03-17-22PM:INFO:emerge.py:167:_layman_sync_call:Executing layman -S
    07/01/2015 03-17-38PM:INFO:worker.py:267:get_job:getting job from master
    07/01/2015 03-17-38PM:INFO:controller.py:62:<module>:nothing to do, sleeping
    
    ==> sabre5/controller_stdout.log <==
     * Syncing selected overlays,...
     * Running Rsync... # /usr/bin/rsync -rlptDvz --progress --delete --delete-after --timeout=180 --exclude=distfiles/* --exclude=local/* --exclude=packages/* rsync://mini.resl.uwaterloo.ca/datamill/ /var/lib/layman/datamill
     *
     * Succeeded:
     * ------
     * Successfully synchronized overlay "datamill".
     *
    
    07/01/2015 03-17-38PM:INFO:worker.py:267:get_job:getting job from master
    07/01/2015 03-17-38PM:INFO:controller.py:62:<module>:nothing to do, sleeping
    
    ==> sabre5/datamill_common.log <==
    
    ==> sabre5/datamill_test.log <==
    
    ==> sabre5/distccd.log <==
    
    ==> sabre5/stderr.log <==
    
    ==> sabre5/watchdog.log <==
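
The per-board comparison above can be sketched in Python. This is an illustration only: it assumes each board's `controller.log` text has already been collected into a dict, and it treats a board as stuck when the last non-empty line of that log ends with the `got work, rebooting` message. The function name `stuck_at_reboot` is hypothetical.

```python
def stuck_at_reboot(logs):
    """Return the boards whose controller.log ends at the reboot step.

    logs maps a board name to the full text of its controller.log.
    """
    stuck = []
    for board, text in sorted(logs.items()):
        # Ignore blank lines so trailing newlines do not hide the last record.
        records = [ln.strip() for ln in text.splitlines() if ln.strip()]
        if records and records[-1].endswith(":got work, rebooting"):
            stuck.append(board)
    return stuck

# Sample input built from log lines quoted in this thread.
logs = {
    "sabre3": (
        "06/25/2015 03-18-57AM:INFO:worker.py:294:get_job:configuring bootloader\n"
        "06/25/2015 03-18-57AM:INFO:controller.py:59:<module>:got work, rebooting\n"
    ),
    "sabre5": (
        "07/01/2015 03-17-38PM:INFO:worker.py:267:get_job:getting job from master\n"
        "07/01/2015 03-17-38PM:INFO:controller.py:62:<module>:nothing to do, sleeping\n"
    ),
    "sabre2": "",  # empty log, as in the listing above
}
print(stuck_at_reboot(logs))
# prints: ['sabre3']
```

An empty log, as on sabre2, is not reported here; it would need a separate check, since it says nothing about whether the reboot was attempted.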
    