ROME with MIDAS produces timeout after upgrade

Issue #8 new
Ryan Loots created an issue

Hi

After upgrading our MIDAS version to the latest Bitbucket release, we encountered this bug when running rome3 as our analyser.

[RomeAnalyser,ERROR] [system.c:4710:ss_recv_net_command,ERROR] timeout receiving network command header
[RomeAnalyser,ERROR] [midas.c:11120:rpc_call,ERROR] routine "bm_receive_event": timeout waiting for reply, program abort

[midas server in debug mode]

[RomeAnalyser,ERROR] [system.c:4422:send_tcp,ERROR] send(socket=6,size=24) returned -1, errno: 104 (Connection reset by peer)
[RomeAnalyser,ERROR] [system.c:4422:send_tcp,ERROR] send(socket=6,size=24) returned -1, errno: 32 (Broken pipe)
[RomeAnalyser,ERROR] [system.c:4422:send_tcp,ERROR] send(socket=6,size=8) returned -1, errno: 32 (Broken pipe)
[RomeAnalyser,INFO] client [yyyy.tlabs.ac.za]RomeAnalyser failed watchdog test after 10 sec

On further investigation:

ss_socket_wait: millisec 10000, tv_sec 9, tv_usec 999998, isset 1, status 1, errno 0 (Success)
recv_tcp2: 0x11d32c0+992 bytes, returned 992, errno 0 (Success)
recv_tcp2: 0x7fffffffd7c0+8 bytes, timeout 10000 ms!

ss_socket_wait: millisec 10000, tv_sec 0, tv_usec 0, isset 0, status 0, errno 0 (Success)
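
For illustration, this trace matches a select()-based receive with a deadline: the first call returns the 992 bytes already buffered, and the second select() then waits the full 10 s for the remaining 8 bytes, which never arrive (presumably because the peer died). A minimal standalone sketch of that pattern follows; it is not the actual MIDAS source, and recv_with_timeout is a made-up name:

    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    /* Sketch only: wait up to 'millisec' for data on 'sock', then read.
       Returns bytes read, 0 on timeout, -1 on error. */
    static int recv_with_timeout(int sock, void *buf, int size, int millisec)
    {
       fd_set readfds;
       struct timeval timeout;
       int status;

       FD_ZERO(&readfds);
       FD_SET(sock, &readfds);
       timeout.tv_sec  = millisec / 1000;
       timeout.tv_usec = (millisec % 1000) * 1000;

       status = select(sock + 1, &readfds, NULL, NULL, &timeout);
       if (status == 0) {
          fprintf(stderr, "recv_with_timeout: %d bytes, timeout %d ms!\n",
                  size, millisec);
          return 0;   /* nothing arrived before the deadline */
       }
       if (status < 0)
          return -1;  /* select() failed */

       return (int) recv(sock, buf, (size_t) size, 0);
    }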

Does anyone have any idea why this is happening? Is it related to MIDAS, or is it within the rome3 event loop? I tested with ROOTANA against the same MIDAS version and do not see the same behaviour.

Comments (12)

  1. Ryan Loots reporter

    The output from gdb for the rome3 build analyser:

    #0  0x00000033dd0324f5 in raise () from /lib64/libc.so.6
    #1  0x00000033dd033cd5 in abort () from /lib64/libc.so.6
    #2  0x00000000008e967b in rpc_call (routine_id=<value optimized out>) at src/midas.c:11131
    #3  0x0000000000690bcc in ROMEMidasDAQ::ReadOnlineEvent (localThis=<value optimized out>) at /opt/rome/rome_svn/src/ROMEMidasDAQ.cpp:1467
    #4  0x0000000000692ded in ROMEMidasDAQ::Event (this=0x156aff0, event=<value optimized out>) at /opt/rome/rome_svn/src/ROMEMidasDAQ.cpp:353
    #5  0x0000000000699c45 in ROMEDAQSystem::EventDAQ (this=0x156aff0, event=607918) at /opt/rome/rome_svn/src/ROMEDAQSystem.cpp:42
    #6  0x0000000000688d7b in ROMEEventLoop::DAQEvent (this=0x1591c30) at /opt/rome/rome_svn/src/ROMEEventLoop.cpp:876
    #7  0x000000000068bde5 in ROMEEventLoop::RunEvent (this=0x1591c30) at /opt/rome/rome_svn/src/ROMEEventLoop.cpp:487
    #8  0x000000000068c84d in ROMEEventLoop::ExecuteTask (this=0x1591c30, option=<value optimized out>) at /opt/rome/rome_svn/src/ROMEEventLoop.cpp:281
    #9  0x000000000068618d in ROMEAnalyzer::Start (this=0x14fe480, argc=4, argv=0x7fffffffe338) at /opt/rome/rome_svn/src/ROMEAnalyzer.cpp:361
    #10 0x00000000005101e7 in main (argc=4, argv=0x7fffffffe338) at src/generated/main.cpp:146
    
  2. Ryan Loots reporter

    Setting OnlineThread to false fixes the timeout, but now the analyser lags behind the event count:

    // subGroup->GetLastParameter()->AddSetLine("   gAnalyzer->GetMidasDAQ()->SetOnlineThread(## == \"true\");");
    subGroup->GetLastParameter()->AddSetLine("   gAnalyzer->GetMidasDAQ()->SetOnlineThread(## == \"false\");");
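
    A plausible explanation for the lag (illustration only, not ROME's actual code): with the reader thread off, fetching and analysing share one loop, so whenever the analysis is slower than the event rate the backlog grows without bound. A minimal standalone sketch of that arithmetic, with made-up rates:

    #include <stdio.h>

    /* Hypothetical illustration (not ROME code): if each event takes
       longer to analyse than the spacing between events, the gap
       between produced and analysed events keeps growing. */
    int main(void)
    {
       const double event_interval_ms = 10.0; /* assumed: DAQ emits events at 100 Hz */
       const double analysis_time_ms  = 15.0; /* assumed: 15 ms to analyse one event */
       double elapsed_ms = 0.0;
       long produced, analysed = 0;
       int i;

       for (i = 0; i < 200; i++) {
          elapsed_ms += analysis_time_ms;     /* "analyse" one event */
          analysed++;
       }
       produced = (long) (elapsed_ms / event_interval_ms);
       printf("DAQ produced ~%ld events, analyser handled %ld (lag %ld)\n",
              produced, analysed, produced - analysed);
       return 0;
    }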
    
  3. Ryan Loots reporter

    @Ryu Sawada ... Yip, I recompiled from the latest MIDAS/ROME checkouts as well as the latest ROOT checkout.

    The problem is reproducible across OSes, from CentOS (6.5) to Debian (testing).

    As mentioned, the problem appeared when I upgraded our MIDAS DAQ server to the latest checkout of MIDAS.

  4. Ryu Sawada

    Hi,

    I tried to reproduce the issue using the setup described at https://bitbucket.org/muegamma/rome3/wiki/Midas. Because I don't have a real MIDAS DAQ, I used the frontend in $MIDASSYS/examples/experiment. Unfortunately, I couldn't reproduce the problem; the ROME analyzer runs fine reading events from MIDAS in online mode with a thread. So it is rather hard for me to investigate the problem further.

    One thing I noticed is that the directory name of your ROME is /opt/rome/rome_svn. If you are really using a copy from SVN, it may be worth trying a download from this git repository: git clone https://bitbucket.org/muegamma/rome3.git

    Ryu

  5. Ryan Loots reporter

    Hi.

    I tested with rome3 from Bitbucket, the latest checkout. The MIDAS frontend example should be fine, but did you try a remote connection through the mserver?

    The physicist in question ran an older ROME version (2.xx), and the only change was to MIDAS. We had pinned ROOT to a specific version for his analyser, but I also tested with the latest ROOT, the latest rome3, and the latest MIDAS, with the same config etc. that we've been using for a few years.

    Apart from the physicist's own analysis additions, I could replicate this issue across ROME versions, across OSes, and across ROOT versions.

  6. Ryu Sawada

    Hi,

    Yes, I tested with a remote connection: the mserver and the frontend run on host A, and the ROME analyser runs on host B. It works fine for me. I tested with midas v2.1-1658-ga53d611 (recent develop branch) and v3.2.12-4-g2cf42c0 (recent master branch), versions taken from "git describe", using $MIDASSYS/examples/experiment and $ROMESYS/examples/midas/. I changed the ODB parameters /Experiment/Security/Enable non-localhost RPC and Disable RPC hosts check, both to 'y'. Nothing else was changed by hand.
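
    For reference, a minimal C sketch of setting those two ODB flags programmatically rather than through odbedit. It assumes the classic midas.h calls (cm_connect_experiment, db_set_value) and assumes the second key also lives under /Experiment/Security; verify both paths against your ODB:

    #include "midas.h"

    int main(void)
    {
       HNDLE hDB;
       BOOL  yes = TRUE;

       /* Connect to the experiment given by the environment/exptab */
       if (cm_connect_experiment("", "", "set_rpc_flags", NULL) != CM_SUCCESS)
          return 1;
       cm_get_experiment_database(&hDB, NULL);

       /* Allow clients on other hosts to connect through the mserver */
       db_set_value(hDB, 0, "/Experiment/Security/Enable non-localhost RPC",
                    &yes, sizeof(yes), 1, TID_BOOL);
       /* Assumed path: same Security directory */
       db_set_value(hDB, 0, "/Experiment/Security/Disable RPC hosts check",
                    &yes, sizeof(yes), 1, TID_BOOL);

       cm_disconnect_experiment();
       return 0;
    }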

    Ryu

  7. dd1

    Hi, Konstantin here. I think what we see is an mserver crash (I think you figured this out already). If I remember right, I have seen a problem with timeouts in bm_receive_event() when running through the mserver. I do not remember the details, though. K.O.

  8. dd1

    I am presently reviewing the midas event buffer code and have just finished with the buffer-write side. Next I will look at the event-receive side, and I will definitely verify that bm_receive_event() has correctly working timeouts when connected through the mserver. This work is happening on the midas release candidate branch feature/midas-2018-12. K.O.

  9. dd1

    Just in case this happens to be a recently introduced problem, you could try using the mserver and the midas library from the last midas release candidate, feature/midas-2017-10. K.O.
