ROME with MIDAS produces a timeout after upgrade
Hi
After upgrading our MIDAS version to the latest Bitbucket release, we encountered this bug when running rome3 as our analyser:
[RomeAnalyser,ERROR] [system.c:4710:ss_recv_net_command,ERROR] timeout receiving network command header
[RomeAnalyser,ERROR] [midas.c:11120:rpc_call,ERROR] routine "bm_receive_event": timeout waiting for reply, program abort

[midas server in debug mode]
[RomeAnalyser,ERROR] [system.c:4422:send_tcp,ERROR] send(socket=6,size=24) returned -1, errno: 104 (Connection reset by peer)
[RomeAnalyser,ERROR] [system.c:4422:send_tcp,ERROR] send(socket=6,size=24) returned -1, errno: 32 (Broken pipe)
[RomeAnalyser,ERROR] [system.c:4422:send_tcp,ERROR] send(socket=6,size=8) returned -1, errno: 32 (Broken pipe)
[RomeAnalyser,INFO] client [yyyy.tlabs.ac.za]RomeAnalyser failed watchdog test after 10 sec
On further investigation:
ss_socket_wait: millisec 10000, tv_sec 9, tv_usec 999998, isset 1, status 1, errno 0 (Success)
recv_tcp2: 0x11d32c0+992 bytes, returned 992, errno 0 (Success)
recv_tcp2: 0x7fffffffd7c0+8 bytes, timeout 10000 ms!
ss_socket_wait: millisec 10000, tv_sec 0, tv_usec 0, isset 0, status 0, errno 0 (Success)
Does anyone have any idea why this is happening? Is it related to MIDAS, or is it within the rome3 event loop? I tested with ROOTANA using the same MIDAS version and do not get the same behaviour.
Comments (12)
-
reporter: Setting OnlineThread to false fixes the timeout, but now the analyser lags behind the event count.
//subGroup->GetLastParameter()->AddSetLine(" gAnalyzer->GetMidasDAQ()->SetOnlineThread(## == \"true\");");
subGroup->GetLastParameter()->AddSetLine(" gAnalyzer->GetMidasDAQ()->SetOnlineThread(## == \"false\");");
-
Did you try recompiling from fresh copies of MIDAS, ROME, and your analyzer?
Ryu
-
reporter: @Ryu Sawada ... Yep, I recompiled from the latest MIDAS/ROME checkouts as well as the latest ROOT checkout.
The problem is reproducible across OSes, from CentOS 6.5 to Debian testing.
As mentioned, the problem appeared when I upgraded our MIDAS DAQ server to the latest checkout of MIDAS.
-
Hi,
I tried to reproduce the issue using the setup described at https://bitbucket.org/muegamma/rome3/wiki/Midas. Because I don't have a real MIDAS DAQ, I used a frontend from $MIDASSYS/examples/experiment. Unfortunately, I couldn't reproduce the problem: the ROME analyzer runs fine reading events from MIDAS in online mode with a thread. So it is rather hard for me to investigate further.
One thing I realised is that the directory name of your ROME is /opt/rome/rome_svn. If you are really using a copy from SVN, it may be worth trying a download from this git repository: git clone https://bitbucket.org/muegamma/rome3.git
Ryu
-
reporter: Hi.
I tested with rome3 from Bitbucket, the latest checkout. The MIDAS frontend example should be fine, but did you try a remote connection with mserver?
The physicist in question ran an older ROME version (2.xx), and the only change was to MIDAS. We had pinned our ROOT version for his analyser, but I also tested with the latest ROOT, latest rome3, and latest MIDAS, with the config etc. that we've been using for a few years now.
Apart from the physicist's own analysis additions, I could replicate this issue across ROME versions, across OSes, and across ROOT versions.
-
Hi,
Yes, I tested with a remote connection: mserver and the frontend run on host A, and the ROME analyser runs on host B. It works fine for me. I tested with midas v2.1-1658-ga53d611 (recent develop branch) and v3.2.12-4-g2cf42c0 (recent master branch), as reported by "git describe". I used $MIDASSYS/examples/experiment and $ROMESYS/examples/midas/. I changed the ODB parameters /Experiment/Security/Enable non-localhost RPC and Disable RPC hosts check, both to 'y'. Nothing else was changed by hand.
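For reference, those two ODB flags can be set non-interactively with odbedit. A sketch, assuming odbedit is on your PATH and that both keys live under /Experiment/Security as the first one does (key names may differ between MIDAS versions):

```shell
# Allow RPC connections from non-localhost clients
# (needed when the ROME analyser runs on another host).
odbedit -c 'set "/Experiment/Security/Enable non-localhost RPC" y'
# Skip the RPC host allow-list check (convenient for testing, less secure).
odbedit -c 'set "/Experiment/Security/Disable RPC hosts check" y'
```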
Ryu
-
Hi, Konstantin here. I think what we see is an mserver crash (I think you figured this out already). If I remember right, I have seen a problem with timeouts in bm_receive_event() when running through the mserver. I do not remember the details, though. K.O.
-
I am presently reviewing the midas event buffer code and have just finished with the buffer-write side. Next I will look at the event-receive side, and I will definitely verify that bm_receive_event() has correctly working timeouts when connected through the mserver. This work is happening on the midas release-candidate branch feature/midas-2018-12. K.O.
-
Just in case this is a recently introduced problem, you could try the mserver and the midas library from the previous midas release candidate, feature/midas-2017-10. K.O.
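A sketch of checking out that release-candidate branch, assuming the standard tmidas repository URL and a plain make build (older MIDAS branches built with make; newer ones use cmake):

```shell
git clone https://bitbucket.org/tmidas/midas.git
cd midas
git checkout feature/midas-2017-10
make
```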
-
No feedback in almost a year, so this must be a dead bug. I am closing the corresponding bug in midas. If the bug is still important, please reopen it. When you test, I suggest you use midas-2019-09. https://bitbucket.org/tmidas/midas/issues/154/possible-bug-with-upgrade-of-midas-and
K.O.
The output from gdb for the rome3-built analyser: