Local/AWS docker simulation does not always start

Issue #225 new
Martin Dlouhy created an issue

I originally suspected wrong configuration selection (#217), but I am much more careful now and never the less robot sometimes does not react??

I use 3 docker debug setup: sim, bridge and solution:

gpu@gpu-desktop:~/subt_testbed$ ./run.bash osrf/subt-virtual-testbed:cloudsim_sim_latest cloudsim_sim.ign worldName:=simple_tunnel_02 robotName1:=X2L robotConfig1:=X2_SENSOR_CONFIG_4

gpu@gpu-desktop:~/subt_testbed$ ./run.bash osrf/subt-virtual-testbed:cloudsim_bridge_latest worldName:=simple_tunnel_02 robotName1:=X2L robotConfig1:=X2_SENSOR_CONFIG_4

and 3rd is our docker based on subt_seed example.

On the simulation terminal I see:

[Msg] Scoring has Started

but nothing more although the robot should be already moving.

Bridge reports:

ROS_MASTER_URI=http://localhost:11311

No processes to monitor
shutting down processing monitor...
... shutting down processing monitor complete
process[X2L/ros1_ign_bridge_imu-1]: started with pid [119]
process[tf_world_static-4]: started with pid [122]
[Dbg] [Manager.cc:421] Death of process[52] with name[x2_description].
process[X2L/ros1_ign_bridge_pose-2]: started with pid [153]
process[X2L/ros1_ign_bridge_twist-3]: started with pid [171]
process[X2L/ros1_ign_bridge_camera-4]: started with pid [184]
process[X2L/ros1_ign_image_camera-5]: started with pid [197]
process[X2L/ros1_ign_bridge_gpu_lidar-6]: started with pid [210]
process[X2L/ros1_ign_bridge_odom-7]: started with pid [222]
process[X2L/ros1_ign_bridge_battery_state-8]: started with pid [235]
process[X2L/pose_tf_broadcaster-9]: started with pid [248]

and the solution reports receiving of data:

[ INFO] [1570181442.395531202, 957.000000000]: received Imu 3500
[ INFO] [1570181442.395660705, 957.000000000]: received Odom 3500
[ INFO] [1570181442.473386446, 957.048000000]: received Scan 1400
[ INFO] [1570181442.473581647, 957.048000000]: received Image 1400
[ INFO] [1570181445.649678831, 959.000000000]: received Imu 3600
[ INFO] [1570181445.649816540, 959.000000000]: received Odom 3600
0:02:00.069263 (0.0 0.0 0.0) [('acc', 1825), ('image', 730), ('origin', 730), ('pose2d', 1825), ('rot', 1825), ('scan', 365), ('sim_time_sec', 36)]
0:02:00.069263 []
[ INFO] [1570181447.128823277, 959.904000000]: Python3: stdout 0:02:00.069263 (0.0 0.0 0.0) [('acc', 1825), ('image', 730), ('origin', 730), ('pose2d', 1825), ('rot', 1825), ('scan', 365), ('sim_time_sec', 36)]

but the robot is still at (0, 0, 0) even after 2 minutes of “simulation”??

Any hint what I should check? This happened to me alreaday say 5 times? I am in particular “scared” of necessary sleep 30 in example solution “to wait for bridge” … isn’t there also some other “timeout” when the solution has to respond after sim and bridge are started?

p.s. the robot is sending turn command since the beginning:

0:00:02.305059 20 b'cmd_vel 0.000000 -0.785398'
0:00:02.354023 20 b'cmd_vel 0.000000 -0.785398'

Comments (28)

  1. Martin Dlouhy reporter

    In working example has the bridge this output:

    ROS_MASTER_URI=http://localhost:11311
    
    No processes to monitor
    shutting down processing monitor...
    process[subt_ros_relay-3]: started with pid [98]
    ... shutting down processing monitor complete
    process[X2L/ros1_ign_bridge_imu-1]: started with pid [121]
    [Dbg] [Manager.cc:421] Death of process[52] with name[x2_description].
    process[tf_world_static-4]: started with pid [130]
    process[X2L/ros1_ign_bridge_pose-2]: started with pid [152]
    process[X2L/ros1_ign_bridge_twist-3]: started with pid [171]
    process[X2L/ros1_ign_bridge_camera-4]: started with pid [184]
    process[X2L/ros1_ign_image_camera-5]: started with pid [197]
    process[X2L/ros1_ign_bridge_gpu_lidar-6]: started with pid [210]
    process[X2L/ros1_ign_bridge_odom-7]: started with pid [227]
    process[X2L/ros1_ign_bridge_battery_state-8]: started with pid [236]
    process[X2L/pose_tf_broadcaster-9]: started with pid [249]
    

    … I just re-run all 3 dockers, no other change (I run them in sequence sim, bridge, solution, without too much delay) … here I can see different order or processes, does it matter? Is subt_ros_relay-3 important for something?

  2. Martin Dlouhy reporter

    p.s. looking again into broken run and there is also the “relay”, but sooner:

        ros1_ign_bridge_pose (ros1_ign_bridge/parameter_bridge)
        ros1_ign_bridge_twist (ros1_ign_bridge/parameter_bridge)
        ros1_ign_image_camera (ros1_ign_image/image_bridge)
    
    ROS_MASTER_URI=http://localhost:11311
    
    process[subt_ros_relay-3]: started with pid [97]
    started roslaunch server http://gpu-desktop:45041/
    
    SUMMARY
    ========
    
    PARAMETERS
     * /X2L/robot_description: <?xml version="1....
     * /rosdistro: melodic
     * /rosversion: 1.14.3
    
    NODES
    
    ROS_MASTER_URI=http://localhost:11311
    
    No processes to monitor
    shutting down processing monitor...
    

  3. Martin Dlouhy reporter

    When comparing the two bridge stdout I see a small difference - for some reason there are two masters started in both cases, but in the bad one the ouput is “interlaced”:

    auto-starting new master
    process[master]: started with pid [68]
    master has started, initiating launch
    master has started, initiating launch
    ROS_MASTER_URI=http://localhost:11311
    
    setting /run_id to fcbcc4b0-e686-11e9-a476-7071bc510133
    ... logging to /home/developer/.ros/log/fcbcc4b0-e686-11e9-a476-7071bc510133/roslaunch-gpu-desktop-53.log
    Checking log directory for disk usage. This may take awhile.
    Press Ctrl-C to interrupt
    ... logging to /home/developer/.ros/log/fcbcc4b0-e686-11e9-a476-7071bc510133/roslaunch-gpu-desktop-52.log
    Checking log directory for disk usage. This may take awhile.
    Press Ctrl-C to interrupt
    Done checking log file disk usage. Usage is <1GB.
    
    Done checking log file disk usage. Usage is <1GB.
    

    and in the good case it is not:

    auto-starting new master
    process[master]: started with pid [68]
    master has started, initiating launch
    master has started, initiating launch
    ROS_MASTER_URI=http://localhost:11311
    
    setting /run_id to e9ba5674-e68c-11e9-9b0c-7071bc510133
    ... logging to /home/developer/.ros/log/e9ba5674-e68c-11e9-9b0c-7071bc510133/roslaunch-gpu-desktop-53.log
    Checking log directory for disk usage. This may take awhile.
    Press Ctrl-C to interrupt
    Done checking log file disk usage. Usage is <1GB.
    
    ... logging to /home/developer/.ros/log/e9ba5674-e68c-11e9-9b0c-7071bc510133/roslaunch-gpu-desktop-52.log
    Checking log directory for disk usage. This may take awhile.
    Press Ctrl-C to interrupt
    Done checking log file disk usage. Usage is <1GB.
    

    but this could be just coincidence …

  4. Alfredo Bencomo

    I use 3 docker debug setup: sim, bridge and solution

    Martin, I’m going to try to reproduce it, but is the above with the seed solution or your solution?

  5. Martin Dlouhy reporter

    It is our docker solution, but the beginning is identical to seed example so it may happen there too. At the moment it is not fatal, I just have to watch it and make sure it really starts. Thanks m.

    p.s. you can try to remove the 30s sleep - on AWS it failed (again I am not 100% sure it was related, but when I returned the sleep there it worked again)

  6. Martin Dlouhy reporter

    It just happened again 😞

    [ INFO] [1570278211.349145962, 51.680000000]: Python3: stdout 0:01:00.018846 (0.0 0.0 0.0) [('acc', 1666), ('image', 666), ('origin', 670), ('pose2d', 1664), ('rot', 1666), ('scan', 333), ('sim_time_sec', 34)]

    i.e. not moving for one minute …

    (actually 3 already, but … shortly it is not moving).

    The bridge has “normal separated start”:

    auto-starting new master
    process[master]: started with pid [69]
    ROS_MASTER_URI=http://localhost:11311
    
    setting /run_id to bf72d498-e76a-11e9-b646-7071bc510133
    master has started, initiating launch
    master has started, initiating launch
    ... logging to /home/developer/.ros/log/bf72d498-e76a-11e9-b646-7071bc510133/roslaunch-gpu-desktop-53.log
    Checking log directory for disk usage. This may take awhile.
    Press Ctrl-C to interrupt
    Done checking log file disk usage. Usage is <1GB.
    
    ... logging to /home/developer/.ros/log/bf72d498-e76a-11e9-b646-7071bc510133/roslaunch-gpu-desktop-54.log
    Checking log directory for disk usage. This may take awhile.
    Press Ctrl-C to interrupt
    Done checking log file disk usage. Usage is <1GB.
    
    process[rosout-1]: started with pid [84]
    started core service [/rosout]
    started roslaunch server http://gpu-desktop:43715/
    

  7. Martin Dlouhy reporter

    p.s. this time the simulation docker proceeded:

    [Msg] Publishing laser scans on [world/simple_tunnel_02/model/X0_40_100_400L/lin
    k/base_link/sensor/front_laser/scan]
    [GUI] [Msg] Loading plugin [ignition-rendering2-ogre2]
    [GUI] [Dbg] [RenderUtil.cc:837] Create scene [scene]
    [Msg] Scoring has Started
    [Dbg] [LinearBatteryPlugin.cc:333] [Battery Plugin] Battery drain: 0 minutes passed.
    
    [Dbg] [LinearBatteryPlugin.cc:333] [Battery Plugin] Battery drain: 1 minutes passed.
    [Dbg] [LinearBatteryPlugin.cc:430] Battery: linear_battery
    [Dbg] [LinearBatteryPlugin.cc:431] PowerLoads().size(): 1
    [Dbg] [LinearBatteryPlugin.cc:433] voltage: 12.6591
    [Dbg] [LinearBatteryPlugin.cc:434] state of charge: 0.909999 (q 17.9838)
    
    [Dbg] [LinearBatteryPlugin.cc:333] [Battery Plugin] Battery drain: 2 minutes passed.
    [Dbg] [LinearBatteryPlugin.cc:333] [Battery Plugin] Battery drain: 3 minutes passed.
    [Dbg] [LinearBatteryPlugin.cc:430] Battery: linear_battery
    [Dbg] [LinearBatteryPlugin.cc:431] PowerLoads().size(): 1
    

  8. Sophisticated Engineering

    Since this morning I have much problems starting the osrf/subt-virtual-testbed docker image. It takes up to 10 minutes. Maybe it downloads the model data (fuel…).

    Unfortunately I deleted the older version of the image. Is there any chance to again pull an older version of the docker image?

  9. Alfredo Bencomo

    Can both of you send me the console output from these two commands?

    $ docker images | grep srf/subt-virtual-testbed
    
    $ docker ps -a
    
    $ docker version
    
    $ docker-compose version
    

  10. Alfredo Bencomo

    Can you reproduce it all the time?

    How long during the run when it happens?

    Is the vehicle inside the tunnel? How many?

    Post also the exact vehicles configurations when this occur.

  11. Martin Dlouhy reporter

    It happens only “once a while” and it always happens at the beginning. The vehicle does not move so it is at the start position outside the tunnel. The configuration is X2_SENSOR_CONFIG_4

    gpu@gpu-desktop:~$ docker images | grep srf/subt-virtual-testbed
    osrf/subt-virtual-testbed                                    subt_solution_latest          bc0376256a85        2 weeks ago         3.36GB
    osrf/subt-virtual-testbed                                    cloudsim_sim_latest           dac9410605e6        2 weeks ago         4.55GB
    osrf/subt-virtual-testbed                                    cloudsim_bridge_latest        d8c6a32fef70        2 weeks ago         3.41GB
    osrf/subt-virtual-testbed                                    latest                        5198acd449d2        3 weeks ago         4.49GB
    osrf/subt-virtual-testbed                                    <none>                        66da30f8d158        3 weeks ago         4.49GB
    osrf/subt-virtual-testbed                                    <none>                        a28b5c00f6ab        4 weeks ago         4.49GB
    osrf/subt-virtual-testbed                                    <none>                        515cf3d05a94        5 weeks ago         4.49GB
    osrf/subt-virtual-testbed                                    <none>                        c44241493b84        5 weeks ago         3.21GB
    gpu@gpu-desktop:~$ docker ps -a
    CONTAINER ID        IMAGE                                              COMMAND                  CREATED             STATUS              PORTS               NAMES
    9a8e511ae360        osrf/subt-virtual-testbed:cloudsim_bridge_latest   "./run_bridge.bash w…"   5 minutes ago       Up 5 minutes                            frosty_jang
    934268f6fcc5        osrf/subt-virtual-testbed:cloudsim_sim_latest      "./run_sim.bash clou…"   5 minutes ago       Up 5 minutes                            nervous_ritchie
    cdb07335caeb        sln_bash                                           "/bin/bash"              2 hours ago         Up 2 hours                              distracted_joliot
    gpu@gpu-desktop:~$ docker version
    Client:
     Version:           18.09.7
     API version:       1.39
     Go version:        go1.10.8
     Git commit:        2d0083d
     Built:             Thu Jun 27 17:56:23 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    
    Server: Docker Engine - Community
     Engine:
      Version:          18.09.7
      API version:      1.39 (minimum version 1.12)
      Go version:       go1.10.8
      Git commit:       2d0083d
      Built:            Thu Jun 27 17:23:02 2019
      OS/Arch:          linux/amd64
      Experimental:     false
    gpu@gpu-desktop:~$ docker-compose version
    docker-compose version 1.24.1, build 4667896b
    docker-py version: 3.7.3
    CPython version: 3.6.8
    OpenSSL version: OpenSSL 1.1.0j  20 Nov 2018
    

  12. Sophisticated Engineering

    Here are the results:

    $ docker images | grep srf/subt-virtual-testbed
    osrf/subt-virtual-testbed                                                        latest                        4b30c527463d        4 days ago          4.51GB
    osrf/subt-virtual-testbed                                                        subt_solution_latest          bc0376256a85        2 weeks ago         3.36GB
    osrf/subt-virtual-testbed                                                        cloudsim_sim_latest           dac9410605e6        2 weeks ago         4.55GB
    osrf/subt-virtual-testbed                                                        cloudsim_bridge_latest        d8c6a32fef70        2 weeks ago         3.41GB
    
    
    $ docker ps -a
    CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS                  PORTS               NAMES
    4459bfbd4668        sopheng_solution_devel   "/bin/bash"              6 hours ago         Up 6 hours                                  fervent_solomon
    28be2cba9fa7        2f4ca2a5c282             "/bin/sh -c '/bin/ba…"   3 days ago          Exited (1) 3 days ago                       affectionate_bardeen
    adf95a8ad95c        2f4ca2a5c282             "/bin/sh -c '/bin/ba…"   3 days ago          Exited (1) 3 days ago                       gallant_euler
    8d1260cd9841        4d399cc50169             "/bin/sh -c '/bin/rm…"   6 days ago          Exited (1) 6 days ago                       great_boyd
    d91dc7eb0e1d        4d399cc50169             "/bin/sh -c '/bin/rm…"   6 days ago          Exited (1) 6 days ago                       inspiring_kare
    
    
    $ docker version
    Client: Docker Engine - Community
     Version:           19.03.2
     API version:       1.40
     Go version:        go1.12.8
     Git commit:        6a30dfc
     Built:             Thu Aug 29 05:29:11 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    
    Server: Docker Engine - Community
     Engine:
      Version:          19.03.2
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.12.8
      Git commit:       6a30dfc
      Built:            Thu Aug 29 05:27:45 2019
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.2.6
      GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
     runc:
      Version:          1.0.0-rc8
      GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
     docker-init:
      Version:          0.18.0
      GitCommit:        fec3683
    
    
    $ docker-compose version
    docker-compose version 1.24.1, build 4667896b
    docker-py version: 3.7.3
    CPython version: 3.6.8
    OpenSSL version: OpenSSL 1.1.0j  20 Nov 2018
    

    I could reproduce it. It happend every startup during the last 8 hours. After starting the docker container with 3 Robots (one X2 one X3 and one X4) it took some minutes until the robots were shown in the Subt Simulator.

    The last time now it was really quick. After 38 seconds all 3 robots and the tunnel are shown in the Subt Simultator. Thanks, for me it currently works. I will come back if it reoccurs.

  13. Martin Dlouhy reporter

    🙂 I see! I mean I run everything remotely so I do not see any Subt Simulator so the robots are maybe not there yet, very good point! Thank you Sophisticated Engineering 🙂

    … so is there a way how to check that the robots are already “visible” over CLI?

  14. Alfredo Bencomo

    Martin, are you saying that you use ssh to run your solution on a remote machine?

    Are you running with docker-compose or via 3 terminals to launch the sim, the bridge, and the solution?

  15. Martin Dlouhy reporter

    I use ssh to remote machine, but I use 3 dockers sim, bridge, solution, not the docker-compose.

  16. Martin Dlouhy reporter

    This happened just on AWS, Simple Tunnel 01, robotika, 1dfdc090-6495-437e-9e38-72ad9ce01660, robots A0F40F100F400L and B30F40F100F400R.

    I was able to extract OSGAR logs from B and I see that robot A never moved. Note, that A started immediately (A0F) while B is supposed to wait 30 (B30F). And for A I see:

    0.000000000  Node Startup
    0.000000000 INFO /tf_world_static [/tmp/binarydeb/ros-melodic-tf2-ros-0.6.5/src/static_transform_broadcaster_program.cpp:142(main)] [topics: /rosout, /tf_static] Spinning until killed publishing world to simple_tunnel_01
    0.000000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:204(Controller)] [topics: /rosout] Waiting for /clock, /subt/start, and /subt/pose_from_artifact
    137.748000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:212(Controller)] [topics: /rosout] Using robot name[A0F40F100F400L]
    
    137.752000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:247(Update)] [topics: /rosout] Sent start signal.
    137.796000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout SubT Challenge Ver2!
    137.796000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Waiting for robot_name ...
    137.796000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:313(Update)] [topics: /rosout, /A0F40F100F400L/cmd_vel] MD robot pose -6 5 dist=61
    137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout 0:00:01.950919 (0.0 0.0 0.0) []
    137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout robot_name: A0F40F100F400L
    137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Use right wall: False
    137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Using times [0, 40, 100, 400]
    137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Artifacts before start: []
    137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Waiting for origin ...
    137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Origin: [-6.0, 5.0, 0.0615] A0F40F100F400L
    137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Offset: (-6.0, 5.0, 0.0615)
    137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout heading -0.0 angle -39.80557109226519 dist 7.810249675906654
    137.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Odom 0
    137.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:76(imuCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Imu 0 
    137.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Scan 0
    137.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:94(imageCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Image 0
    138.000000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Image resolution (960, 1280, 3)
    138.000000000 INFO /A0F40F100F400L/ros1_ign_bridge_twist [/tmp/ros1-ign-bridge-release/include/ros1_ign_bridge/factory.hpp:137(Factory<ROS1_T, IGN_T>::ros1_callback)] [topics: /rosout] Passing message from ROS1 geometry_msgs/Twist to Ignition ignition.msgs.Twist (showing msg only once per type
    139.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Odom 100
    1
    ...
    213.200000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout 0:02:00.015000 (0.0 0.0 0.0) [('acc', 2058), ('image', 823), ('origin', 823), ('pose2d', 2057), ('rot', 2058), ('scan', 412), ('sim_time_sec', 42)]
    ...
    391.200000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout 0:07:00.014022 (0.0 0.0 0.0) [('acc', 1750), ('image', 700), ('origin', 700), ('pose2d', 1750), ('rot', 1750), ('scan', 350), ('sim_time_sec', 35)]
    ...
    

    simply it never moves.

    Could it be related to

    138.000000000 INFO /A0F40F100F400L/ros1_ign_bridge_twist [/tmp/ros1-ign-bridge-release/include/ros1_ign_bridge/factory.hpp:137(Factory<ROS1_T, IGN_T>::ros1_callback)] [topics: /rosout] Passing message from ROS1 geometry_msgs/Twist to Ignition ignition.msgs.Twist (showing msg only once per type 
    

    ?

    on the “B” robot it was sooner:

    136.564000000 INFO /B30F40F100F400R/ros1_ign_bridge_twist [/tmp/ros1-ign-bridge-release/include/ros1_ign_bridge/factory.hpp:137(Factory<ROS1_T, IGN_T>::ros1_callback)] [topics: /rosout] Passing message from ROS1 geometry_msgs/Twist to Ignition ignition.msgs.Twist (showing msg only once per type

    as I said this code is based on subt_seed example

  17. Arthur Schang

    Martin, in your velocity controller/issuer is there any sort of logic that will time-out if your autonomy stack does not see your robot respond to velocity commands it is issuing (this is the case for move-base). Would your velocity controller continue sending commands throughout the entire simulation duration irregardless of whether or not the robot had responded to commands in the past? It seems the velocity controller startup is delayed as you have pointed out and that could be an issue. To clarify, robot B moved but robot A did not move?

  18. Martin Dlouhy reporter

    The controller (Python3) is sending commands on timer and the desired speed change is triggered by odometry readings. If the robot does not turn it will permanently send commands to turn (that was the command mentioned at the beginning:

    0:00:02.305059 20 b'cmd_vel 0.000000 -0.785398'
    

    I do not have the log from AWS unfortunately as the robot dumps it when the task is over.

    On the simulation side, if there is any delay, I suppose that there is some watchdog and the robot stops, but re-starts with new velocity command, right?

  19. Arthur Schang

    As long as you’re sending velocity commands to the correct topic, the robot should move. Is there any way your Python script failed to pass a blocking statement before sending commands? Do you have a printout to see when it begins sending commands to the robot?

  20. Martin Dlouhy reporter

    I do not have any printout of sending commands from C++ code … I can add one for debug, or counter how many messages were sent.

    My colleague just suggested to join the running solution docker and run rostopic info/echo for velocity commands … that I should see if anybody else is listening. Locally I can get to this state by starting the simulation and bridge sooner then the solution docker … so hopefully I will know more say in 1 hour.

    thanks m.

  21. Arthur Schang

    It looks like your most recent run ended, sorry I wasn’t able to get on Cloudsim in time to check. Did it work as intended or are you still having the same issue?

  22. Martin Dlouhy reporter

    If it was “Virtual STIX” then the result was the same (robot A did not move, B moved as intended).

  23. Arthur Schang

    Wait, did you submit on Virtual STIX earlier on the previous run? The logs you posted above are from simple tunnel 01.

    I’ll respond to your email sent to subt-help@googlegroups.com. I should just jump into your instance while it’s running and poke around to see if I can figure out if there’s an issue on the Cloudsim side.

    Do vehicles A and B ALWAYS move as expected with a local Cloudsim run? If it does not work locally there is even less chance of it working remotely.

  24. Martin Dlouhy reporter

    I am waiting for “SubT Portal Upgrade” as mentioned in e-mail 2 hours ago. I noticed this problem on AWS for the first time and it was “Virtual STIX”. As soon as AWS is available again, I can re-run it on other world(s). I added in robot namecode extra 10s (simulated) wait even for the first robot … but I may intentionally revert it … also note, that the first number after 1st letter (A, B, C, D, E?) is time how long the robot has to wait before it can start navigation. So if there is A100F*R it is expected that it will wait 100s before it moves. The failing case was A0F*

    As usually Murphy would be satisfied … I am currently not able to force it to fail locally … when everything is working fine, then:

    developer@gpu-desktop:~/subt_solution$ rostopic info /A0F100R/cmd_vel
    Type: geometry_msgs/Twist
    
    Publishers:
     * /subt_solution (http://gpu-desktop:37551/)
    
    Subscribers:
     * /A0F100R/ros1_ign_bridge_twist (http://gpu-desktop:32957/)
    

  25. Martin Dlouhy reporter

    OK, the very last simulation before upgrade (robotika, ver24p01x4, 5d281e38-5571-4653-9847-3b4838eb59a2, Practice Tunnel 01) failed robot C (C120F100F400F1000L). There were no odometry data and after 35 minutes there were no lidar scans

    1. it is not related to Virtual STIX
    2. it can happen even after several minutes of waiting (here 120s of simulation)
    3. it is probably related to missing odometry data

  26. Martin Dlouhy reporter

    only robot D survived and recorded the other robots … you can see lidar trace of entering robot A and B and C not moving even after 6 minutes from start (real time)

  27. Martin Dlouhy reporter

    Robot C:

    447.932000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /C120F100F400F1000L/cmd_vel] Python3: stdout 0:35:00.182576 (0.0 0.0 0.0) [('acc', 575), ('image', 230), ('origin', 230), ('rot', 575), ('scan', 89)] 
    
    459.332000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /C120F100F400F1000L/cmd_vel] Python3: stdout 0:36:00.194315 (0.0 0.0 0.0) [('acc', 570), ('image', 227), ('origin', 228), ('rot', 570)]
    

    and the lost of scans partially (1 minute) correlates with hard crash of robot D - acc data:

    0:36:20.722400 16 [236, 1742, 9837]
    0:36:20.738244 16 [-176, 6251, 15514]
    0:36:20.909687 16 [148, 3073, 9173]
    

    but maybe it is just “acc jitter” described in #166

  28. Log in to comment