Gazebo issues on Windows Subsystem for Linux (WSL)

Issue #2351 resolved
Silvio Traversaro
created an issue

I recently discovered that a few users of gazebo-yarp-plugins are running Gazebo on WSL (a.k.a. Ubuntu on Windows, https://msdn.microsoft.com/en-us/commandline/wsl/about ). See https://bitbucket.org/osrf/gazebo_tutorials/pull-requests/364/tutorial-for-installing-on-ubuntu-on/diff for a tutorial on how to run Gazebo on WSL.

For middleware that supports Windows, such as YARP, this is an interesting alternative way to get Gazebo running on Windows: only the minimum amount of Gazebo-related code needs to run under WSL, while the rest of the software runs on the actual Windows system and communicates with the WSL processes over regular network sockets.

However, I noticed some WSL-specific problems in Gazebo that I think are worth reporting (even if most of them are actually WSL bugs).

Error setting socket option (IP_MULTICAST_IF)

Related WSL issue: https://github.com/Microsoft/BashOnWindows/issues/990

Problem

All code using ignition-transport (including Gazebo 8 and 9) fails on the regular WSL with the following error:

Error setting socket option (IP_MULTICAST_IF).
Error setting socket option (IP_MULTICAST_IF).
Did you set the environment variable IGN_IP with a correct IP address?
  [192.168.1.100] seems an invalid local IP address.
  Using 127.0.0.1 as hostname.
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Aborted (core dumped)

This happens because the IP_MULTICAST_IF socket option is not supported in the released version of WSL. Support for it was introduced only in Windows build 16176 ( https://github.com/Microsoft/BashOnWindows/issues/990 ), which is currently available only in the "Insider" version of Windows.
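To check whether a given WSL build is affected, a small probe along these lines may help. It is only a sketch, not code taken from ignition-transport: `ProbeMulticastIf` is a hypothetical name, and the probe simply tries to set IP_MULTICAST_IF on a fresh UDP socket and reports the resulting error code.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical probe: returns 0 if IP_MULTICAST_IF can be set on a UDP socket
// for the given local IPv4 address, otherwise the errno of the failing step.
int ProbeMulticastIf(const char *localIp)
{
  assert(localIp != nullptr);

  in_addr ifAddr{};
  if (inet_pton(AF_INET, localIp, &ifAddr) != 1)
    return EINVAL;  // not a valid dotted-quad IPv4 address

  int sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (sock < 0)
    return errno;

  // This is the call that is expected to fail on WSL builds without support.
  int result = 0;
  if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, &ifAddr, sizeof(ifAddr)) != 0)
    result = errno;  // saved before close() can overwrite errno

  close(sock);
  return result;
}
```

Calling `ProbeMulticastIf` with the address of a local interface should return 0 on a system where the option is supported, and a non-zero error code otherwise.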

Possible solutions

I am not an expert on the Windows update schedule, but I think waiting for the fix to be released is the easiest option. In the meantime, people interested in running the latest version of Gazebo can update to the "Insider" version of Windows.

clock_nanosleep with clockid CLOCK_REALTIME fails with error EINVAL

Related WSL issue: https://github.com/Microsoft/BashOnWindows/issues/2503

Related Gazebo issue: https://bitbucket.org/osrf/gazebo/issues/2058/use-clock_monotonic-in-sleep-and-timer

Problem

Both the gazebo::common::Time::Sleep(...) and ign::common::Time::Sleep(...) methods use the system call clock_nanosleep(CLOCK_REALTIME, ...) to put the current thread to sleep. However, clock_nanosleep in WSL works only with the clock CLOCK_MONOTONIC. I opened an issue for this on the WSL issue tracker ( https://github.com/Microsoft/BashOnWindows/issues/2503 ), but I don't think it will be solved anytime soon.

Possible solutions

One possible solution is to check during the Time class initialization whether clock_nanosleep(CLOCK_REALTIME, ...) is supported, and to fall back to clock_nanosleep(CLOCK_MONOTONIC, ...) if it is not. A more complicated solution is to migrate the Time classes to the new C++11 <chrono> functions, which seem to work fine on WSL as well (even if I imagine that they are internally implemented using the clock_nanosleep system call). Apparently, switching to CLOCK_MONOTONIC was already planned to avoid problems with system clock resets (see https://bitbucket.org/osrf/gazebo/issues/2058/use-clock_monotonic-in-sleep-and-timer ), so that would be the easier solution.
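The first option could be sketched roughly as follows. This is a minimal illustration, not the actual Gazebo code: `PickSleepClock` and `SleepFor` are hypothetical names, and the probe assumes that a zero-length relative sleep is enough to make an unsupported clock fail with EINVAL (or ENOTSUP).

```cpp
#include <cassert>
#include <cerrno>
#include <ctime>

// Hypothetical helper: returns CLOCK_REALTIME if clock_nanosleep accepts it,
// otherwise falls back to CLOCK_MONOTONIC.
clockid_t PickSleepClock()
{
  timespec zero{};  // zero-length relative sleep, used only as a probe
  // clock_nanosleep returns the error number directly; it does not set errno.
  int rc = clock_nanosleep(CLOCK_REALTIME, 0, &zero, nullptr);
  return (rc == EINVAL || rc == ENOTSUP) ? CLOCK_MONOTONIC : CLOCK_REALTIME;
}

// Hypothetical helper: relative sleep on the given clock, restarted with the
// remaining time whenever a signal interrupts it. Returns 0 on success or the
// error number reported by clock_nanosleep.
int SleepFor(clockid_t clock, long nanoseconds)
{
  assert(nanoseconds >= 0);

  timespec req{};
  req.tv_sec = nanoseconds / 1000000000L;
  req.tv_nsec = nanoseconds % 1000000000L;

  timespec rem{};
  int rc;
  while ((rc = clock_nanosleep(clock, 0, &req, &rem)) == EINTR)
    req = rem;
  return rc;
}
```

The clock would be picked once, for example during initialization, and then reused for every sleep: `SleepFor(PickSleepClock(), 1000000)` sleeps for about one millisecond on whichever clock the platform supports.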

Comments (8)

  1. Silvio Traversaro reporter

    I found some notes on this from last September. Given that I am not working on this, I will copy them here so they can be useful to anyone who wants to work on this issue.

    400 ns resolution for CLOCK_MONOTONIC

    Even after switching from CLOCK_REALTIME to CLOCK_MONOTONIC, there is still a major difference between Ubuntu on Windows and native Ubuntu: the resolution of the clock. This is the output of the example program https://gist.github.com/traversaro/e031b324dd278acbd033059e3604ed3f on native Ubuntu running on an Intel Core i7-6500U CPU:

    traversaro@turati:~/src/time_res_check/build$ ./time_res_check 
    --> Testing POSIX APIs
        Error macros values: 
        EINTR : 4
        EINVAL: 22
        ENOTSUP: 95
        Testing CLOCK_MONOTONIC 
        clock_getres return value: 0 with errno 0, 
        resolution: 1 nanoseconds.
        Testing CLOCK_REALTIME 
        clock_getres return value: 0 with errno 0, 
        resolution: 1 nanoseconds.
        Testing CLOCK_PROCESS_CPUTIME_ID 
        clock_getres return value: 0 with errno 0, 
        resolution: 1 nanoseconds.
        Testing CLOCK_THREAD_CPUTIME_ID  
        clock_getres return value: 0 with errno 0, 
        resolution: 1 nanoseconds.
    --> Testing C++ APIs
        Testing std::chrono::high_resolution_clock 
        resolution: 1 nanoseconds.
        Testing std::chrono::system_clock 
        resolution: 1 nanoseconds.
        Testing std::chrono::steady_clock 
        resolution: 1 nanoseconds.
    

    The same program running on WSL on the same processor returns:

    traversaro@LAPTOP-TO4SAKLB:~/src/time_res_check$ ./build/time_res_check
    --> Testing POSIX APIs
        Error macros values:
        EINTR : 4
        EINVAL: 22
        ENOTSUP: 95
        Testing CLOCK_MONOTONIC
        clock_getres return value: 0 with errno 0,
        resolution: 400 nanoseconds.
        Testing CLOCK_REALTIME
        clock_getres return value: 0 with errno 0,
        resolution: 400 nanoseconds.
        Testing CLOCK_PROCESS_CPUTIME_ID
        clock_getres return value: -1 with errno 22,
        resolution: 0 nanoseconds.
        Testing CLOCK_THREAD_CPUTIME_ID
        clock_getres return value: -1 with errno 22,
        resolution: 0 nanoseconds.
    --> Testing C++ APIs
        Testing std::chrono::high_resolution_clock
        resolution: 1 nanoseconds.
        Testing std::chrono::system_clock
        resolution: 1 nanoseconds.
        Testing std::chrono::steady_clock
        resolution: 1 nanoseconds.
    

    In a nutshell, the clock resolution is reported as 1 ns on native Linux and as 400 ns on Linux on Windows. Interestingly, the C++11 classes report a period of 1 ns in both cases, but I still have to investigate this. Back in September 2017, this was producing a lot of warnings similar to "warning [Time.cc.205] Sleep time is larger than clock resolution, skipping sleep", due to this line: https://bitbucket.org/osrf/gazebo/src/01c7f8b1d68448bc618b575ad1c7ec13fee2b87f/gazebo/common/Time.cc#lines-451 .
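For reference, the two kinds of measurement above can be reproduced with a sketch along these lines. It is not the gist itself, and `PosixResolutionNs` and `ChronoPeriodNs` are hypothetical names. It also hints at a likely explanation for the constant 1 ns: `Clock::period` in `<chrono>` is a compile-time `std::ratio` describing the tick unit of the clock's representation, so it cannot reflect what the kernel reports at runtime through clock_getres.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ctime>

// Runtime resolution reported by the kernel for a POSIX clock, in
// nanoseconds, or -1 if clock_getres fails (as for the CPU-time clocks on WSL).
std::int64_t PosixResolutionNs(clockid_t clock)
{
  timespec res{};
  if (clock_getres(clock, &res) != 0)
    return -1;
  assert(res.tv_sec >= 0 && res.tv_nsec >= 0);  // sanity check on the answer
  return static_cast<std::int64_t>(res.tv_sec) * 1000000000LL + res.tv_nsec;
}

// Tick period of a std::chrono clock, in nanoseconds. This value is fixed at
// compile time by the standard library, independently of the running kernel.
template <typename Clock>
constexpr double ChronoPeriodNs()
{
  using Period = typename Clock::period;
  return 1e9 * Period::num / Period::den;
}
```

Comparing `PosixResolutionNs(CLOCK_MONOTONIC)` with `ChronoPeriodNs<std::chrono::steady_clock>()` on both systems should show the first value changing while the second stays the same.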

    Related ROS2 issue: https://github.com/ros2/rcutils/issues/43#issuecomment-320954506 .

  2. Silvio Traversaro reporter

    November 2018 Update:

    Given that we are currently using Gazebo on WSL in my group, I just opened two PRs to solve the most pressing problems:

    Once these PRs are merged, I think we can close this issue.

  3. Silvio Traversaro reporter

    The PRs https://bitbucket.org/osrf/gazebo/pull-requests/3036 and https://bitbucket.org/osrf/gazebo/pull-requests/3037 have been merged. There are still some marginal problems. For example, the "Sleep time is larger than clock resolution, skipping sleep" messages still spam the log files instead of the command line, both on WSL and on native Windows builds (cc @Sean Yen, who could be interested in this), but I think we can open a new issue for that and close this one.

    Thanks a lot to the reviewers!
