CloudSim stops sending some topics

Issue #261 new
Martin Dlouhy created an issue

CloudSim or the bridge sometimes stops sending some topics. This issue was reported about two months ago(?) in the context of two running ROS nodes. Do you have some reference counters on the bridge to check?

Here is the complete Python3 dump for robot A, final run 10:

0:00:01.874250 (0.0 0.0 0.0) []
Using times [10, 2400]
Artifacts before start: []
0:01:00.113899 (15.2 -3.6 0.0) [('acc', 1031), ('origin', 178), ('pose2d', 1029), ('rot', 1031), ('scan', 206), ('sim_time_sec', 21)]
0:02:00.041882 (37.3 -3.7 0.0) [('acc', 1109), ('pose2d', 1109), ('rot', 1109), ('scan', 222), ('sim_time_sec', 22)]
0:03:00.025751 (59.8 -3.6 0.0) [('acc', 1123), ('pose2d', 1123), ('rot', 1123), ('scan', 224), ('sim_time_sec', 23)]
0:04:00.039811 (74.7 3.7 0.0) [('acc', 1152), ('pose2d', 1153), ('rot', 1152), ('scan', 231), ('sim_time_sec', 23)]
0:05:00.087357 (74.7 27.0 0.0) [('acc', 1165), ('pose2d', 1164), ('rot', 1165), ('scan', 233), ('sim_time_sec', 23)]
0:06:00.032591 (58.9 33.6 0.0) [('acc', 1160), ('pose2d', 1161), ('rot', 1160), ('scan', 232), ('sim_time_sec', 23)]
0:07:00.091972 (37.8 28.3 0.0) [('acc', 1178), ('pose2d', 1177), ('rot', 1178), ('scan', 235), ('sim_time_sec', 24)]
0:08:00.037849 (36.8 9.7 -0.0) [('acc', 1167), ('pose2d', 1168), ('rot', 1167), ('scan', 234), ('sim_time_sec', 23)]
0:09:00.124006 (34.8 26.2 -0.0) [('acc', 1163), ('pose2d', 1162), ('rot', 1163), ('scan', 232), ('sim_time_sec', 24)]
0:10:00.099786 (51.8 36.4 -0.0) [('acc', 1143), ('pose2d', 1143), ('rot', 1143), ('scan', 229), ('sim_time_sec', 22)]
0:11:00.064612 (74.2 36.4 -0.0) [('acc', 1120), ('pose2d', 1120), ('rot', 1120), ('scan', 224), ('sim_time_sec', 23)]
0:12:00.037038 (74.8 58.0 -0.0) [('acc', 1147), ('pose2d', 1147), ('rot', 1147), ('scan', 229), ('sim_time_sec', 23)]
0:13:00.078897 (74.8 81.3 -0.0) [('acc', 1165), ('pose2d', 1165), ('rot', 1165), ('scan', 233), ('sim_time_sec', 23)]
0:14:00.084369 (77.4 67.2 -0.0) [('acc', 1170), ('pose2d', 1170), ('rot', 1170), ('scan', 234), ('sim_time_sec', 23)]
0:15:00.036357 (77.4 43.9 -0.0) [('acc', 1162), ('pose2d', 1163), ('rot', 1162), ('scan', 233), ('sim_time_sec', 24)]
0:16:00.101717 (92.1 36.2 -0.0) [('acc', 1157), ('pose2d', 1157), ('rot', 1157), ('scan', 231), ('sim_time_sec', 23)]
0:17:00.174657 (115.5 36.3 -0.0) [('acc', 1175), ('pose2d', 1174), ('rot', 1175), ('scan', 235), ('sim_time_sec', 23)]
0:18:00.071696 (139.0 36.3 -0.0) [('acc', 1174), ('pose2d', 1173), ('rot', 1174), ('scan', 235), ('sim_time_sec', 24)]
0:19:00.167036 (154.7 47.1 -0.0) [('acc', 1166), ('pose2d', 1167), ('rot', 1166), ('scan', 233), ('sim_time_sec', 23)]
0:20:00.112116 (154.7 70.5 -0.0) [('acc', 1153), ('pose2d', 1171), ('rot', 1153), ('scan', 234), ('sim_time_sec', 24)]
0:21:00.103524 (155.1 92.8 -0.0) [('pose2d', 1164), ('scan', 233), ('sim_time_sec', 23)]
0:22:00.040778 (155.6 115.8 -0.0) [('pose2d', 1152), ('scan', 230), ('sim_time_sec', 23)]
0:23:00.225396 (156.0 139.0 -0.0) [('pose2d', 1159), ('scan', 232), ('sim_time_sec', 23)]
0:24:00.096273 (156.5 162.0 -0.0) [('pose2d', 1151), ('scan', 231), ('sim_time_sec', 23)]
0:25:00.087319 (156.9 185.3 -0.0) [('pose2d', 1165), ('scan', 232), ('sim_time_sec', 23)]
0:26:00.017386 (157.4 208.3 -0.0) [('pose2d', 1147), ('scan', 230), ('sim_time_sec', 23)]
0:27:00.039352 (157.8 231.5 -0.0) [('pose2d', 1163), ('scan', 233), ('sim_time_sec', 24)]
0:28:00.071255 (158.3 255.2 -0.0) [('pose2d', 1185), ('scan', 236), ('sim_time_sec', 23)]
0:29:00.015956 (158.7 278.4 -0.0) [('pose2d', 1157), ('scan', 232), ('sim_time_sec', 23)]
0:30:00.025776 (159.2 301.8 -0.0) [('pose2d', 1173), ('scan', 234), ('sim_time_sec', 24)]
0:31:00.053610 (159.6 324.9 -0.0) [('pose2d', 1152), ('scan', 231), ('sim_time_sec', 23)]
0:32:00.156292 (160.1 347.5 -0.0) [('pose2d', 1130), ('scan', 226), ('sim_time_sec', 22)]
0:33:00.095004 (160.5 369.7 -0.0) [('pose2d', 1110), ('scan', 222), ('sim_time_sec', 23)]
0:34:00.020327 (160.9 391.6 -0.0) [('pose2d', 1096), ('scan', 219), ('sim_time_sec', 22)]
0:35:00.034189 (161.3 413.5 -0.0) [('pose2d', 1097), ('scan', 220), ('sim_time_sec', 22)]
0:36:00.085905 (161.8 435.2 -0.0) [('pose2d', 1085), ('scan', 216), ('sim_time_sec', 21)]
0:37:00.052622 (162.2 457.0 -0.0) [('pose2d', 1090), ('scan', 218), ('sim_time_sec', 22)]
0:38:00.038468 (162.6 478.4 -0.0) [('pose2d', 1070), ('scan', 215), ('sim_time_sec', 21)]
0:39:00.048765 (163.0 499.6 -0.0) [('pose2d', 1061), ('scan', 212), ('sim_time_sec', 22)]
0:40:00.166637 (163.4 520.7 -0.0) [('pose2d', 1055), ('scan', 211), ('sim_time_sec', 21)]
0:41:00.116588 (163.8 541.6 -0.0) [('pose2d', 1043), ('scan', 208), ('sim_time_sec', 21)]
0:42:00.005091 (164.2 562.5 -0.0) [('pose2d', 1047), ('scan', 210), ('sim_time_sec', 21)]
0:43:00.057229 (164.6 582.8 -0.0) [('pose2d', 1013), ('scan', 202), ('sim_time_sec', 20)]
0:44:00.086354 (165.0 603.6 -0.0) [('pose2d', 1040), ('scan', 208), ('sim_time_sec', 21)]
0:45:00.079366 (165.4 624.1 -0.0) [('pose2d', 1023), ('scan', 205), ('sim_time_sec', 20)]
0:46:00.094935 (165.8 645.1 -0.0) [('pose2d', 1053), ('scan', 210), ('sim_time_sec', 21)]
0:47:00.031867 (166.2 666.1 -0.0) [('pose2d', 1049), ('scan', 210), ('sim_time_sec', 21)]
0:48:00.119254 (166.6 687.0 -0.0) [('pose2d', 1046), ('scan', 209), ('sim_time_sec', 21)]
0:49:00.007596 (167.0 707.8 -0.0) [('pose2d', 1037), ('scan', 208), ('sim_time_sec', 21)]
0:50:00.054670 (167.5 728.9 -0.0) [('pose2d', 1058), ('scan', 211), ('sim_time_sec', 21)]
0:51:00.166188 (167.9 749.9 -0.0) [('pose2d', 1047), ('scan', 210), ('sim_time_sec', 21)]
0:52:00.006919 (168.3 770.5 -0.0) [('pose2d', 1030), ('scan', 206), ('sim_time_sec', 20)]
0:53:00.057601 (168.7 791.2 -0.0) [('pose2d', 1038), ('scan', 207), ('sim_time_sec', 21)]
0:54:00.026260 (169.1 811.5 -0.0) [('pose2d', 1049), ('scan', 210), ('sim_time_sec', 21)]
0:55:00.071333 (169.5 832.3 -0.0) [('pose2d', 1040), ('scan', 208), ('sim_time_sec', 21)]
0:56:00.042111 (169.9 853.2 -0.0) [('pose2d', 1046), ('scan', 209), ('sim_time_sec', 21)]
0:57:00.013416 (170.3 874.3 -0.0) [('pose2d', 1054), ('scan', 211), ('sim_time_sec', 21)]
0:58:00.119442 (170.7 895.7 -0.0) [('pose2d', 1070), ('scan', 214), ('sim_time_sec', 21)]
0:59:00.086211 (170.8 900.1 -0.0) [('pose2d', 220), ('scan', 216), ('sim_time_sec', 5)]
1:00:00.056517 (170.8 900.1 -0.0) [('scan', 214)]
1:01:00.139475 (170.8 900.1 -0.0) [('scan', 214)]
1:02:00.042254 (170.8 900.1 -0.0) [('scan', 213)]
1:03:00.064216 (170.8 900.1 -0.0) [('scan', 218)]
1:04:00.009180 (170.8 900.1 -0.0) [('scan', 219)]
1:05:00.098884 (170.8 900.1 -0.0) [('scan', 220)]
1:06:00.197615 (170.8 900.1 -0.0) [('scan', 225)]
1:07:00.028404 (170.8 900.1 -0.0) [('scan', 222)]
1:08:00.147307 (170.8 900.1 -0.0) [('scan', 220)]
1:09:00.194171 (170.8 900.1 -0.0) [('scan', 220)]
1:10:00.149403 (170.8 900.1 -0.0) [('scan', 219)]
1:11:00.247164 (170.8 900.1 -0.0) [('scan', 224)]
1:12:00.074530 (170.8 900.1 -0.0) [('scan', 224)]
1:13:00.038614 (170.8 900.1 -0.0) [('scan', 227)]
1:14:00.014625 (170.8 900.1 -0.0) [('scan', 223)]
1:15:00.061669 (170.8 900.1 -0.0) [('scan', 222)]
1:16:00.107241 (170.8 900.1 -0.0) [('scan', 219)]
1:17:00.246864 (170.8 900.1 -0.0) [('scan', 217)]
1:18:00.093432 (170.8 900.1 -0.0) [('scan', 217)]
1:19:00.180346 (170.8 900.1 -0.0) [('scan', 214)]
1:20:00.196161 (170.8 900.1 -0.0) [('scan', 215)]
1:21:00.276935 (170.8 900.1 -0.0) [('scan', 214)]
1:22:00.145624 (170.8 900.1 -0.0) [('scan', 213)]
1:23:00.139866 (170.8 900.1 -0.0) [('scan', 216)]
1:24:00.086690 (170.8 900.1 -0.0) [('scan', 215)]
1:25:00.112151 (170.8 900.1 -0.0) [('scan', 216)]
1:26:00.242472 (170.8 900.1 -0.0) [('scan', 217)]
1:27:00.093402 (170.8 900.1 -0.0) [('scan', 216)]
1:28:00.044074 (170.8 900.1 -0.0) [('scan', 215)]
1:29:00.065824 (170.8 900.1 -0.0) [('scan', 218)]
1:30:00.198382 (170.8 900.1 -0.0) [('scan', 218)]
1:31:00.064302 (170.8 900.1 -0.0) [('scan', 215)]
1:32:00.156794 (170.8 900.1 -0.0) [('scan', 213)]
1:33:00.116688 (170.8 900.1 -0.0) [('scan', 218)]
1:34:00.245220 (170.8 900.1 -0.0) [('scan', 216)]
1:35:00.120979 (170.8 900.1 -0.0) [('scan', 216)]
1:36:00.112766 (170.8 900.1 -0.0) [('scan', 216)]
1:37:00.024872 (170.8 900.1 -0.0) [('scan', 214)]
1:38:00.079595 (170.8 900.1 -0.0) [('scan', 215)]
1:39:00.269596 (170.8 900.1 -0.0) [('scan', 216)]
1:40:00.148613 (170.8 900.1 -0.0) [('scan', 211)]
1:41:00.064843 (170.8 900.1 -0.0) [('scan', 210)]
1:42:00.268545 (170.8 900.1 -0.0) [('scan', 216)]
1:43:00.125946 (170.8 900.1 -0.0) [('scan', 215)]
1:44:00.016775 (170.8 900.1 -0.0) [('scan', 208)]
1:45:00.108588 (170.8 900.1 -0.0) [('scan', 213)]
1:46:00.123156 (170.8 900.1 -0.0) [('scan', 213)]
1:47:00.198297 (170.8 900.1 -0.0) [('scan', 206)]
1:48:00.057000 (170.8 900.1 -0.0) [('scan', 207)]
1:49:00.021796 (170.8 900.1 -0.0) [('scan', 211)]
1:50:00.067160 (170.8 900.1 -0.0) [('scan', 212)]
1:51:00.106854 (170.8 900.1 -0.0) [('scan', 208)]
1:52:00.312889 (170.8 900.1 -0.0) [('scan', 205)]
1:53:00.120459 (170.8 900.1 -0.0) [('scan', 208)]
1:54:00.220170 (170.8 900.1 -0.0) [('scan', 207)]
1:55:00.211964 (170.8 900.1 -0.0) [('scan', 212)]
1:56:00.013743 (170.8 900.1 -0.0) [('scan', 206)]
1:57:00.140487 (170.8 900.1 -0.0) [('scan', 197)]
1:58:00.078270 (170.8 900.1 -0.0) [('scan', 203)]
1:59:00.022021 (170.8 900.1 -0.0) [('scan', 209)]
2:00:00.118374 (170.8 900.1 -0.0) [('scan', 212)]
2:01:00.015736 (170.8 900.1 -0.0) [('scan', 212)]
2:02:00.103999 (170.8 900.1 -0.0) [('scan', 210)]
2:03:00.195317 (170.8 900.1 -0.0) [('scan', 214)]
2:04:00.069948 (170.8 900.1 -0.0) [('scan', 212)]
2:05:00.144081 (170.8 900.1 -0.0) [('scan', 213)]
2:06:00.255350 (170.8 900.1 -0.0) [('scan', 210)]
2:07:00.038883 (170.8 900.1 -0.0) [('scan', 211)]
2:08:00.244349 (170.8 900.1 -0.0) [('scan', 208)]
2:09:00.220553 (170.8 900.1 -0.0) [('scan', 208)]
2:10:00.012854 (170.8 900.1 -0.0) [('scan', 210)]
2:11:00.182522 (170.8 900.1 -0.0) [('scan', 205)]
2:12:00.317423 (170.8 900.1 -0.0) [('scan', 211)]
2:13:00.329937 (170.8 900.1 -0.0) [('scan', 204)]
2:14:00.237473 (170.8 900.1 -0.0) [('scan', 211)]
2:15:00.102629 (170.8 900.1 -0.0) [('scan', 208)]
2:16:00.311819 (170.8 900.1 -0.0) [('scan', 202)]
2:17:00.064137 (170.8 900.1 -0.0) [('scan', 210)]
2:18:00.297227 (170.8 900.1 -0.0) [('scan', 211)]
2:19:00.011858 (170.8 900.1 -0.0) [('scan', 205)]
2:20:00.041207 (170.8 900.1 -0.0) [('scan', 208)]
2:21:00.306937 (170.8 900.1 -0.0) [('scan', 209)]
2:22:00.014030 (170.8 900.1 -0.0) [('scan', 207)]
2:23:00.057761 (170.8 900.1 -0.0) [('scan', 211)]
2:24:00.078889 (170.8 900.1 -0.0) [('scan', 211)]
2:25:00.271348 (170.8 900.1 -0.0) [('scan', 210)]
2:26:00.023680 (170.8 900.1 -0.0) [('scan', 207)]
2:27:00.206565 (170.8 900.1 -0.0) [('scan', 209)]
2:28:00.196093 (170.8 900.1 -0.0) [('scan', 214)]
2:29:00.190260 (170.8 900.1 -0.0) [('scan', 212)]
2:30:00.132740 (170.8 900.1 -0.0) [('scan', 213)]
2:31:00.065569 (170.8 900.1 -0.0) [('scan', 214)]
2:32:00.212321 (170.8 900.1 -0.0) [('scan', 213)]
2:33:00.196690 (170.8 900.1 -0.0) [('scan', 215)]
2:34:00.186527 (170.8 900.1 -0.0) [('scan', 215)]
2:35:00.349303 (170.8 900.1 -0.0) [('scan', 215)]
2:36:00.063184 (170.8 900.1 -0.0) [('scan', 210)]
2:37:00.077993 (170.8 900.1 -0.0) [('scan', 210)]
2:38:00.253989 (170.8 900.1 -0.0) [('scan', 213)]
2:39:00.205328 (170.8 900.1 -0.0) [('scan', 206)]
2:40:00.139102 (170.8 900.1 -0.0) [('scan', 204)]
2:41:00.172922 (170.8 900.1 -0.0) [('scan', 209)]
2:42:00.261287 (170.8 900.1 -0.0) [('scan', 210)]
2:43:00.193495 (170.8 900.1 -0.0) [('scan', 212)]
2:44:00.104197 (170.8 900.1 -0.0) [('scan', 209)]
2:45:00.166760 (170.8 900.1 -0.0) [('scan', 213)]
2:46:00.088262 (170.8 900.1 -0.0) [('scan', 208)]
2:47:00.194601 (170.8 900.1 -0.0) [('scan', 209)]
2:48:00.158934 (170.8 900.1 -0.0) [('scan', 211)]
2:49:00.034882 (170.8 900.1 -0.0) [('scan', 208)]
2:50:00.161464 (170.8 900.1 -0.0) [('scan', 202)]
2:51:00.113854 (170.8 900.1 -0.0) [('scan', 211)]
2:52:00.143288 (170.8 900.1 -0.0) [('scan', 211)]
2:53:00.111603 (170.8 900.1 -0.0) [('scan', 201)]
2:54:00.109043 (170.8 900.1 -0.0) [('scan', 208)]

Comments (42)

  1. Martin Dlouhy reporter

    p.s.2: in case Python3 itself is suspected, here are also some outputs from the C++ code talking to the bridge:

    ...
    87.228000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:313(Update)] [topics: ] MD robot pose -0.962074 0.291239 dist=1.01041
    87.828000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:321(Update)] [topics: ] Arrived at entrance!
    89.128000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:76(imuCallback)] [topics: ] received Imu 500 
    89.128000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 200
    89.176000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 500
    91.128000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:76(imuCallback)] [topics: ] received Imu 600 
    91.176000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 600
    93.128000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:76(imuCallback)] [topics: ] received Imu 700 
    93.176000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 700
    94.128000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 300
    95.128000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:76(imuCallback)] [topics: ] received Imu 800 
    95.176000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 800
    97.148000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:76(imuCallback)] [topics: ] received Imu 900 
    97.148000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 900
    99.148000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:76(imuCallback)] [topics: ] received Imu 1000 
    99.148000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 400
    99.148000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 1000
    99.800000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: ] Python3: stdout 0:01:00.113899 (15.2 -3.6 0.0) [('acc', 1031), ('origin', 178), ('pose2d', 1029), ('rot'
    ...
    539.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: ] Python3: stdout 0:20:00.112116 (154.7 70.5 -0.0) [('acc', 1153), ('pose2d', 1171), ('rot', 1153), ('scan', 234), ('sim_time_sec', 24)]
    539.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 9200
    539.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23000
    541.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23100
    543.188000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23200
    544.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 9300
    545.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23300
    547.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23400
    549.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 9400
    549.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23500
    551.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23600
    553.144000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23700
    554.140000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 9500
    555.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23800
    557.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 23900
    559.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 9600
    559.136000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 24000
    561.184000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 24100
    562.436000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: ] Python3: stdout 0:21:00.103524 (155.1 92.8 -0.0) [('pose2d', 1164), ('scan', 233), ('sim_time_sec', 23)] 
    ...
    1366.052000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: ] Python3: stdout 0:58:00.119442 (170.7 895.7 -0.0) [('pose2d', 1070), ('scan', 214), ('sim_time_sec', 21)]
    1367.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 64400
    1369.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 25800
    1369.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: ] received Odom 64500
    1374.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 25900
    1379.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 26000
    1384.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 26100
    1387.652000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: ] Python3: stdout 0:59:00.086211 (170.8 900.1 -0.0) [('pose2d', 220), ('scan', 216), ('sim_time_sec', 5)]
    1389.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 26200
    1394.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 26300
    1399.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 26400
    1404.152000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: ] received Scan 26500
    1409.052000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: ] Python3: stdout 1:00:00.056517 (170.8 900.1 -0.0) [('scan', 214)] 
    ...
    

  2. Nate Koenig

    Martin,

    Can you please provide a complete reproducible example of the problem you are experiencing, with step-by-step instructions that demonstrate the behavior you are seeing?

  3. Martin Dlouhy reporter

    Hello Nate,

    I am afraid that it is not easy to reproduce. The first time I observed something similar was #202, when I ran two nodes (the original subt_seed code + wait_for_sensors.py). Here it was only one node, and it failed after a longer simulation. We could try to reproduce another situation that looked repeatable (12/12) for World B/Tree of Life, when it is available. Can you add it to the existing worlds on AWS?

  4. Sophisticated Engineering

    We are investigating a problem in the logs from the Tunnel Circuit. We are not sure whether this fits the problem reported here, but it seems that we did not get any feedback from a node for a long time. Would such an example help?

  5. Sophisticated Engineering

    In Tunnel Circuit run 04: in the log rosout.log of robot X2n3 there are no messages between 677.440 and 1314.008. At minimum, messages from one topic were not received between 642 and 1437. We can see that because we do not get new position messages (“aktuelle Position”, i.e. “current position”), but we do get other messages from the same node into the log from 642 to 644 and from 1314 to 1437. From 1437 on, everything seems to have worked again. So the nodes did not crash.

    So it seems that there was not only a logging issue, but also a problem with topics, or the nodes were paused.

  6. Nate Koenig

    In the logs for X2n3 I see the following:

    677.412000000 INFO /X2n3_robot_controller [/home/developer/subt_solution/src/subt_seed/src/robot_Controller.cpp:6312(main)] [topics: /rosout, /X2n3/cmd_vel, /X2n3/wand_visualization_marker] Ende SpinOnce
    1314.008000000 INFO /X2n3_robot_controller [/home/developer/subt_solution/src/subt_seed/src/robot_Controller.cpp:6314(main)] [topics: /rosout, /X2n3/cmd_vel, /X2n3/wand_visualization_marker] Ende sleep
    1314.008000000 WARN /X2n3/robot_localization [/tmp/binarydeb/ros-melodic-robot-localization-2.6.5/src/ros_filter.cpp:1819(RosFilter<T>::periodicUpdate)] [topics: /rosout, /diagnostics, /tf, /X2n3/odometry/filtered] Failed to meet update rate! Took 636.57600000000002183
    

    At 677.412 there is an "Ende SpinOnce". This is followed by an "Ende sleep" at 1314.008. Without access to your code, I would assume that you have a 637-second sleep somewhere in your code.

  7. Sophisticated Engineering

    I don't think so. The code between these two ROS_INFO commands is only loop_rate.sleep();

    And loop_rate is only initialized once: loop_rate(10);

    Another point is that not only this node was sleeping. There were at minimum 3 nodes frozen until they woke up later.

  8. Sophisticated Engineering

    P.S.: one message was repeated 1475 times at 677.44. And another message was later repeated about 20000 times at 1620.331!

    Another idea: Could it be possible that the clock node did not work correctly?

  9. Nate Koenig

    Let me know if you come up with an example of messages not transmitting. So far I have been unable to reproduce this problem.

  10. Sophisticated Engineering

    I don't think that it is easily reproducible. In the Tunnel Circuit I found it in one of the 15 runs with the same configuration and the same programs.

    Could you see anything abnormal in the logs of the Ignition server? Would it make sense to add additional logging to the server to have more information when we have another case of this kind?

  11. Nate Koenig

    There is nothing abnormal in Ignition, and nothing in the bridge. The state logs and server logs all look fine. We have been running our own solutions on Cloudsim without any problems as well.

    At this point, I really need a docker image that can reliably reproduce the situation you are describing.

  12. Reid Sawtell

    What Sophisticated Engineering is describing sounds similar to what I encountered shortly before the competition in our catkin workspace (issue #221). The clock would stop posting, so all topics would be delayed for an indeterminate time until the clock started again, at which point they would resume, but with a massive jump in time that usually mucks everything up. I noticed the problem more frequently in smaller testing tunnels (possibly related to the real-time factor? the problem started after the cap at 1 was implemented), but it did happen in the larger tunnels as well.

  13. Martin Dlouhy reporter

    @Nate Koenig - can you please direct us to the testing solution docker image and explain how we could extend the current test suite? I.e., how to report success and failure. Thanks

  14. Nate Koenig

    What I'm looking for is a docker image and source code that reproduces the problem described in this issue.

  15. Arthur Schang

    @Martin Dlouhy can you explain what these fields are in this message?

    0:01:00.113899 (15.2 -3.6 0.0) [('acc', 1031), ('origin', 178), ('pose2d', 1029), ('rot', 1031), ('scan', 206), ('sim_time_sec', 21)]

    for example:

    0:01:00.113899 is wall time

    ('sim_time_sec', 21) is the simulation time seconds and ???

    ('scan', 206) is the scan topic and sequence number???

  16. Martin Dlouhy reporter

    @Arthur Schang these are our internal reports for every minute of real time, so there were 21 simulated seconds and we received 206 messages with lidar scans.
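    Such a per-minute counter can be sketched as below (a minimal stand-in for robotika's internal reporting, not the actual code; all names are hypothetical, and the injectable clock exists only to make the sketch testable):

```python
import time
from collections import Counter

class TopicWatchdog:
    """Count messages per topic and report the totals for each
    wall-clock interval (hypothetical sketch of the per-minute dump)."""

    def __init__(self, interval_sec=60.0, clock=time.monotonic):
        self.interval_sec = interval_sec
        self.clock = clock                 # injectable for testing
        self.counts = Counter()
        self.window_start = clock()

    def on_message(self, topic):
        # Call this from every subscriber callback.
        self.counts[topic] += 1

    def poll(self):
        """Return (and reset) the sorted counts once the interval
        has elapsed; otherwise return None."""
        now = self.clock()
        if now - self.window_start < self.interval_sec:
            return None
        report = sorted(self.counts.items())
        self.counts.clear()
        self.window_start = now
        return report
```

    A topic vanishing from consecutive reports (as `acc`, `rot`, and `origin` do in the dump above) is exactly the symptom under discussion.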

  17. Arthur Schang

    So in the case of this message:

    1:02:00.042254 (170.8 900.1 -0.0) [('scan', 213)]

    You received ~200 scan messages and nothing else for that minute of wall time?

    What does ('sim_time_sec', 20) indicate? Is that 20 messages on the /clock topic or something else?

  18. Arthur Schang

    The logs you've provided are in an unsupported form. What would be required is a rosbag log tracking the observed dropout through remapping of topics. Investigative attempts on CloudSim have provided no concrete evidence indicating a timing issue, whereas there is plenty of anecdotal evidence of timing issues. Until we are able to reproduce these issues consistently through a standardized logging procedure, it will be incredibly difficult to solve them.

  19. Martin Dlouhy reporter

    @Arthur Schang my colleague is testing Urban Qualification with remote control (from console, ROS keyboard control), visualizing lidar scans and images via RViz. It sometimes works and sometimes does not, and in RViz you can see that the data are sometimes not coming … maybe it is the same issue?

  20. Martin Dlouhy reporter

    p.s. he is using teleop_twist_keyboard and it typically fails before you reach the rails

  21. GoRobotGo

    I have been trying to create a repeatable test case, but have not succeeded yet. The results of the five runs so far:

    8fb4e416-b125-4628-bca2-07d6bbc8e9f1 - failed to run - admin review

    c02f8fa8-6c59-46bc-8087-98865670f608 - all data stopped after a minute - not exactly the same, but similar

    78f46732-d285-4235-be35-4ffb49834fc9 - no issues seen

    9410d118-8241-4ff1-9e3c-70630b0c6888 - no issues seen

    eac91df0-fbbc-4b55-a775-e76e963a133e - no issues seen

    The approach was to modify subt_seed to log the number of messages received each second and to add some processing so that it used more of the CPU.

  22. Sarah Kitchen

    @GoRobotGo , we have added something similar to our solution, with 5-second wall-clock intervals and counting /clock messages. Of the 6 runs we submitted over the last two days (3 in Urban Simple 1 and 3 in Urban Practice 1), I am seeing evidence of issues in several runs - I have not quantified the severity or effect of the issue yet, just comparing the number of messages per 5-second interval vs. RTF to check for consistency.

    I am preparing information to send to the subt-help email. I have already sent the image URL I am using.
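    The consistency check described above could be sketched as follows. Note the assumption: `clock_hz` (how many /clock messages correspond to one simulated second) depends on the simulator configuration, so the value here is only a placeholder.

```python
def rtf_per_interval(clock_msg_counts, clock_hz=100.0, interval_sec=5.0):
    """Estimate the real-time factor for each wall-clock interval from
    the number of /clock messages counted in that interval."""
    return [n / (clock_hz * interval_sec) for n in clock_msg_counts]

def flag_stalls(rtfs, threshold=0.05):
    """Indices of intervals whose estimated RTF is suspiciously low,
    i.e. candidate windows where the clock (and hence topics) stalled."""
    return [i for i, r in enumerate(rtfs) if r < threshold]
```

    Comparing these per-interval estimates against the RTF reported by the simulator makes the stall windows stand out.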

  23. Nate Koenig

    I believe the underlying issue can be traced to TCP buffer overruns. Basically, a TCP buffer can overflow if messages are sent too fast and/or not read fast enough on the other side. We have reduced the likelihood of this in https://bitbucket.org/ignitionrobotics/ign-transport/pull-requests/416/subt-flood-testing/diff. This fix has been deployed to cloudsim.

    However, it's also important to not block transport callbacks in your solution. Doing so will cause you to miss messages. Take a look at this test code for some guidance.
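    The "don't block transport callbacks" advice boils down to the classic hand-off pattern below (a transport-agnostic sketch, not the test code referenced above): the callback only enqueues the message, and a separate worker thread does the slow processing, so the receiving thread is never blocked.

```python
import queue
import threading

class NonBlockingSubscriber:
    """Sketch: the transport callback returns immediately after
    enqueueing, so slow processing cannot back up the transport-side
    receive buffer."""

    def __init__(self, handler, maxsize=100):
        self.q = queue.Queue(maxsize=maxsize)
        self.handler = handler
        self.dropped = 0
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def callback(self, msg):
        # Called from the transport thread: must not block.
        try:
            self.q.put_nowait(msg)
        except queue.Full:
            self.dropped += 1   # dropping here is one possible policy

    def _run(self):
        while True:
            msg = self.q.get()
            if msg is None:     # sentinel to shut down
                break
            self.handler(msg)   # the slow work happens here
            self.q.task_done()

    def stop(self):
        self.q.put(None)
        self.worker.join()
```

    The bounded queue makes the trade-off explicit: under overload you drop messages in your own code (and can count them) instead of silently overflowing a TCP buffer.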

  24. Martin Dlouhy reporter

    This problem is back again (robotika, Ver45p1, 3765be69-bb4b-46f7-b63f-715b861b0a26). I checked the camera of robot B, which can see the start of robot A: it stops receiving data shortly after A crashes (yes, this robotika release was not great), and then B no longer has camera data (note that B had not moved yet).

    0:01:00.103055 (0.0 0.0 0.0) [('acc', 672), ('artf', 3), ('image', 268), ('origin', 269), ('pose2d', 667), ('rot', 672), ('scan', 93), ('sim_time_sec', 14)]
    0:02:00.148753 (0.0 0.0 0.0) [('acc', 683), ('image', 273), ('origin', 273), ('pose2d', 682), ('rot', 683), ('sim_time_sec', 14)]
    0:03:00.148457 (0.0 0.0 0.0) [('acc', 667), ('image', 267), ('origin', 267), ('pose2d', 668), ('rot', 667), ('sim_time_sec', 13)]
    0:04:00.037466 (0.0 0.0 0.0) [('acc', 661), ('image', 264), ('origin', 265), ('pose2d', 661), ('rot', 661), ('sim_time_sec', 13)]
    0:05:00.223220 (0.0 0.0 0.0) [('acc', 667), ('image', 108), ('origin', 266), ('pose2d', 667), ('rot', 667), ('sim_time_sec', 14)]
    0:06:00.098936 (0.0 0.0 0.0) [('acc', 669), ('origin', 266), ('pose2d', 669), ('rot', 669), ('sim_time_sec', 13)]
    0:07:00.142890 (0.0 0.0 0.0) [('acc', 673), ('origin', 268), ('pose2d', 673), ('rot', 673), ('sim_time_sec', 14)]
    0:08:00.103073 (0.0 0.0 0.0) [('acc', 671), ('origin', 267), ('pose2d', 482), ('rot', 671), ('sim_time_sec', 13)]
    0:09:00.211639 (0.0 0.0 0.0) [('acc', 682), ('origin', 273), ('rot', 682), ('sim_time_sec', 14)]
    0:10:00.137003 (0.0 0.0 0.0) [('acc', 681), ('origin', 271), ('rot', 681), ('sim_time_sec', 13)]
    

    Note that, for example, robot E runs fine (all 5 robots have the same docker image).

  25. Martin Dlouhy reporter

    the last run seems to be partially related to our bug (scan data) - the scans are counted on input (C++ code), but not at the very end (Python3 output of my previous post). But the issue is still valid for “Image” (last report):

    157.076000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:116(imageCallback)] [topics: ] received Image 2100 
    

    and the reason could be that it was not read & processed in time.

  26. Martin Dlouhy reporter

    I would just add a note, which I also sent to subt-help@, that this problem is back again … or is it observable under heavy load? Some runs are totally unusable, while in others the robots score 5 points (ver51p1 vs. ver51p1b - the same robotika docker image). I am just curious whether other teams observe a similar issue. Is it possible to check on the CloudSim side (or better, on the bridge side) that the sensor data are no longer re-transmitted? Also, it would be nice to know what the control program could do in such a fatal case. thanks m.

  27. GoRobotGo

    I can confirm that this bug is back in force. All four runs that I attempted yesterday were taken out by this bug or a variant.

    2c05f538-2b55-4f91-b4a2-717537db0898 - image data stopped after about 80 seconds

    41a9398c-2816-450c-9a15-a1dc95a9b505 - IMU data stopped after 500 seconds

    a68ab8f4-cc4f-459b-9290-f0bcc6f2ccdf - no data appears to have arrived

    28be546f-d924-4e56-957d-c17f76e3df8b - no image data

  28. Reid Sawtell

    We had this problem yesterday as well. We have a node whose only job is to monitor the clock topic and compare it to system time, and on several of our runs there were long gaps with no clock messages received at all. For example, in one run we stopped getting clock messages at ROS time 2556.108, and they did not resume until ROS time 3829.104.
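    A monitor along those lines can be kept ROS independent: feed it the wall-clock arrival time of each clock message and flag any gap longer than a threshold. A sketch (the 1-second threshold and the sample timestamps are arbitrary choices for illustration):

```python
class ClockWatchdog:
    """Records gaps where no clock message arrived for longer than a threshold."""
    def __init__(self, max_gap_sec=1.0):
        self.max_gap_sec = max_gap_sec
        self.last_wall = None
        self.gaps = []  # list of (gap_start_wall_time, gap_length_sec)

    def on_clock(self, wall_time_sec):
        if self.last_wall is not None:
            gap = wall_time_sec - self.last_wall
            if gap > self.max_gap_sec:
                self.gaps.append((self.last_wall, gap))
        self.last_wall = wall_time_sec

# Messages arrive every 0.1 s, then stop for 5 s, then resume.
wd = ClockWatchdog(max_gap_sec=1.0)
for t in [0.0, 0.1, 0.2, 5.2, 5.3]:
    wd.on_clock(t)
print(wd.gaps)  # one gap of ~5 s starting at wall time 0.2
```

    In a real node the same logic would run on a wall-clock timer, so a stalled topic is detected even while no callback fires.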

  29. Nate Koenig

    GoRobotGo: When you say "image data stopped" or "imu data stopped", do you mean that at a specific time those topics never sent another message while other topics continued sending data?

  30. Zbyněk Winkler

    That is what is happening to us: some topics keep being received while others stop at a certain point and maybe reappear later. You can check our rosout log. We print an info line for every 100 messages received on a topic. See for example http://osgar.robotika.cz/subt/virtual/ver53p1/B-rosout.log:

    210.332000000 INFO [subt_seed_node.cc:125(odomCallback)] [] received Odom 600
    211.388000000 INFO [subt_seed_node.cc:107(scanCallback)] [] received Scan 200
    220.864000000 INFO [subt_seed_node.cc:125(odomCallback)] [] received Odom 700
    220.872000000 INFO [subt_seed_node.cc:125(odomCallback)] [] received Odom 800
    220.872000000 INFO [subt_seed_node.cc:125(odomCallback)] [] received Odom 900
    220.872000000 INFO [subt_seed_node.cc:107(scanCallback)] [] received Scan 300
    220.876000000 INFO [subt_seed_node.cc:125(odomCallback)] [] received Odom 1000
    221.072000000 INFO [subt_seed_node.cc:98(imuCallback)] [] received Imu 400 
    221.072000000 INFO [subt_seed_node.cc:116(imageCallback)] [] received Image 0
    222.628000000 INFO [subt_seed_node.cc:134(gasCallback)] [] received Gas 200
    224.232000000 INFO [subt_seed_node.cc:98(imuCallback)] [] received Imu 500 
    225.068000000 INFO [subt_seed_node.cc:98(imuCallback)] [] received Imu 600 
    225.176000000 INFO [subt_seed_node.cc:89(clockCallback)] [] received Clock 3000 
    227.084000000 INFO [subt_seed_node.cc:98(imuCallback)] [] received Imu 700 
    229.100000000 INFO [subt_seed_node.cc:98(imuCallback)] [] received Imu 800 
    229.852000000 INFO [subt_seed_node.cc:89(clockCallback)] [] received Clock 4000 
    231.200000000 INFO [subt_seed_node.cc:98(imuCallback)] [] received Imu 900 
    233.084000000 INFO [subt_seed_node.cc:98(imuCallback)] [] received Imu 1000 
    234.960000000 INFO [subt_seed_node.cc:89(clockCallback)] [] received Clock 5000 
    235.112000000 INFO [subt_seed_node.cc:98(imuCallback)] [] received Imu 1100 
    237.112000000 INFO [subt_seed_node.cc:98(imuCallback)] [] received Imu 1200 
    

    As you can see from this excerpt, the topics come and go at random; note, for example, how many other messages we had received before Clock appeared again, how Odom and Scan disappeared, etc.

  31. Zbyněk Winkler

    The number at the end of the lines is the total number of messages received from a given topic.
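    The counting scheme behind those log lines is easy to replicate in any node. A ROS-independent sketch (the topic name and the 100-message interval follow the log excerpt above; everything else is illustrative):

```python
class TopicCounter:
    """Counts messages per topic and emits a report line every Nth message,
    mirroring the 'received Imu 400' style lines in the rosout log."""
    def __init__(self, every=100):
        self.every = every
        self.counts = {}  # topic name -> total messages received

    def on_message(self, topic):
        n = self.counts.get(topic, 0)   # count before this message
        self.counts[topic] = n + 1
        if n % self.every == 0:
            return f"received {topic} {n}"  # in a ROS node: ROS_INFO(...)
        return None

tc = TopicCounter(every=100)
lines = [tc.on_message("Imu") for _ in range(201)]
print([l for l in lines if l])
# ['received Imu 0', 'received Imu 100', 'received Imu 200']
```

    Logging only every Nth message keeps the overhead negligible even on high-rate topics such as clock or imu.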

  32. GoRobotGo

    For 2c05f538-2b55-4f91-b4a2-717537db0898 - yes image data stopped, but IMU data continued

    For 41a9398c-2816-450c-9a15-a1dc95a9b505 - maybe. Looking at it more closely, image data dropped out early once, then again at 419, and then IMU at 504. After the IMU dropped it was hard to tell.

    For the other 2 my logs didn’t have enough data to tell.

  33. Sophisticated Engineering

    We also had 3 of 4 runs which might have the same problem; only one showed results as expected. But we did not have time to check in more detail.

  34. Malcolm Stagg

    In case it helps support the hypothesis that this occurs under heavy load: I had 6 simulations all auto-restart at the same time last night, and I believe all 6 hit this issue. An earlier simulation, started on its own, appeared to be fine. I haven’t investigated much yet to figure out exactly what was going on, but whatever it was caused move_base to fail. I will update if I find out anything else that’s useful.

  35. Zbyněk Winkler

    I am trying to confirm the hypothesis that the problem happens under heavy load. TL;DR: the load must be really, really heavy, but I was able to reproduce it locally.

    Our solution consists of two processes: the first is a ROS C++ node that subscribes to all the topics we need and forwards all received data to the second process, which is a Python implementation of our controller (completely ROS independent). On my notebook the ROS node takes around 40% CPU and the Python process around 300% (I have a total of 8 logical cores at 3.8 GHz).

    In order to simulate the problem I used taskset to move both processes (with all their threads) to a single CPU core. The Python part contains some adaptation to load, so the result was 40% for the ROS proxy and 60% for the Python controller. However, the controller was still driving the robot around just fine.

    Then I reniced the ROS proxy to 19, and only then did the problems appear. In order to better see what is going on, it is beneficial to enable ROS DEBUG logging (see https://github.com/robotika/osgar/pull/303/commits). I then get logs of messages being dropped due to queues being full. The fastest topics go down first, e.g. clock, then imu, etc.

    So my hypothesis is that when CloudSim runs many simulations in parallel, the controllers get significantly less CPU than before, when almost nothing else was running. It would be nice to check whether this is indeed what is happening.
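    For reference, the CPU-starvation setup described above can be reproduced with standard Linux tools; the PIDs below are placeholders for the two processes:

```shell
# Pin both processes, including all their threads (-a), to a single CPU core (core 0).
taskset -a -c -p 0 <ros_node_pid>
taskset -a -c -p 0 <controller_pid>

# Then lower the ROS node's scheduling priority to the minimum.
renice -n 19 -p <ros_node_pid>
```

    With ROS DEBUG logging enabled, the client library then reports messages being dropped because of full subscriber queues.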
