Issue #63 resolved
Lucas Chiesa created an issue

On a freshly installed Ubuntu Trusty, running:

$ gzsdf print tutorial.urdf

gives:

Segmentation fault (core dumped)

I have ROS Indigo installed, but it was not sourced in the shell where I ran this.

This happens with every URDF I have tried. If the URDF is malformed (for example, a joint is missing), gzsdf complains about the bad structure instead of crashing.

$ gdb /usr/bin/gzsdf
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
This GDB was configured as "x86_64-linux-gnu".
Reading symbols from /usr/bin/gzsdf...(no debugging symbols found)...done.
(gdb) set args print tutorial.urdf
(gdb) run
Starting program: /usr/bin/gzsdf print tutorial.urdf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe8316700 (LWP 3334)]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff76c7d0b in boost::detail::shared_count::shared_count(boost::detail::weak_count const&, boost::detail::sp_nothrow_tag) () from /usr/lib/x86_64-linux-gnu/libsdformat.so.1
(gdb) bt
#0  0x00007ffff76c7d0b in boost::detail::shared_count::shared_count(boost::detail::weak_count const&, boost::detail::sp_nothrow_tag) () from /usr/lib/x86_64-linux-gnu/libsdformat.so.1
#1  0x00007ffff76c64a3 in ReduceFixedJoints(TiXmlElement*, boost::shared_ptr<urdf::Link>) ()
   from /usr/lib/x86_64-linux-gnu/libsdformat.so.1
#2  0x00007ffff76c6f12 in sdf::URDF2SDF::InitModelString(std::string const&, bool) ()
   from /usr/lib/x86_64-linux-gnu/libsdformat.so.1
#3  0x00007ffff76c7514 in sdf::URDF2SDF::InitModelDoc(TiXmlDocument*) ()
   from /usr/lib/x86_64-linux-gnu/libsdformat.so.1
#4  0x00007ffff76c7678 in sdf::URDF2SDF::InitModelFile(std::string const&) ()
   from /usr/lib/x86_64-linux-gnu/libsdformat.so.1
#5  0x00007ffff76a1aa3 in sdf::readFile(std::string const&, boost::shared_ptr<sdf::SDF>) ()
   from /usr/lib/x86_64-linux-gnu/libsdformat.so.1
#6  0x0000000000403e60 in main ()

If I rebuild sdformat with debug symbols, it no longer crashes, so I can't get a better backtrace.

I've pasted the valgrind output of the non-segfaulting build at https://gist.github.com/tulku/c0670b0df1f40b4396d7 in case it helps.

The crash appears to happen inside the ReduceFixedJoints function.

Comments (26)

  1. Jonathan Binney

    This happens with the gzsdf tool, and also when launching Gazebo and spawning a URDF model with spawn_model.

    To be clear, this happens with the gazebo/sdformat versions from the ROS repositories:

    ii  gazebo2                 2.2.2-4~trusty
    ii  libsdformat-dev:amd64   1.4.11-1
    ii  libsdformat1:amd64      1.4.11-1

  2. Steven Peters

    I tested gzsdf print on a URDF with ros-indigo-gazebo on Saucy, and it works. On Trusty it fails. If I build from source, it works on Trusty too.

    I noticed that on Trusty libsdformat.so.1.4.11 links against liburdfdom_model, while on Saucy it does not. I'm guessing this is the source of the error. @Jose Luis Rivero, what do you think?

    $ ldd /usr/lib/x86_64-linux-gnu/libsdformat.so.1.4.11 | sort
        /lib64/ld-linux-x86-64.so.2 (0x00007ff728890000)
        libboost_filesystem.so.1.54.0 => /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.54.0 (0x00007ff727f78000)
        libboost_regex.so.1.54.0 => /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.54.0 (0x00007ff727c70000)
        libboost_system.so.1.54.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0 (0x00007ff728190000)
        libconsole_bridge.so.0.2 => /usr/lib/x86_64-linux-gnu/libconsole_bridge.so.0.2 (0x00007ff726298000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff726e50000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff724820000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff727218000)
        libicudata.so.52 => /usr/lib/x86_64-linux-gnu/libicudata.so.52 (0x00007ff724a28000)
        libicui18n.so.52 => /usr/lib/x86_64-linux-gnu/libicui18n.so.52 (0x00007ff7264a8000)
        libicuuc.so.52 => /usr/lib/x86_64-linux-gnu/libicuuc.so.52 (0x00007ff7268b0000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff727430000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff726c30000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff727738000)
        libtinyxml.so.2.6.2 => /usr/lib/x86_64-linux-gnu/libtinyxml.so.2.6.2 (0x00007ff728398000)
        liburdfdom_model.so.0.2 => /usr/lib/x86_64-linux-gnu/liburdfdom_model.so.0.2 (0x00007ff727a40000)
        linux-vdso.so.1 =>  (0x00007fff76200000)
    
    
  3. Jose Luis Rivero

    libsdformat1 for Trusty is provided directly by Ubuntu, and that build does not use the embedded copy of urdf that the OSRF repository packages (Saucy) use. So yes, this could well be a source of differences and bugs.

    To mimic the Ubuntu/Debian build, pass the following flags to cmake:

    cmake .. \
      -DUSE_EXTERNAL_URDF:BOOL=True \
      -DUSE_UPSTREAM_CFLAGS:BOOL=False \
      -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo
    
  4. Jose Luis Rivero

    At first I could not reproduce the error with the example attached here, but Steve pointed me to the nao description, and I ended up discovering that a basic example with a fixed joint is enough to trigger the segfault:

    <robot name="test_robot">
      <link name="link1" />
      <link name="link2" />
    
      <joint name="joint1" type="fixed">
        <parent link="link1"/>
        <child link="link2"/>
      </joint>
    </robot>
    

    After building the sdformat packages with the Debug CMake build type, some more interesting details appeared:

    Program received signal SIGSEGV, Segmentation fault.
    ReduceFixedJoints (_root=_root@entry=0x7724c0, _link=...) at /tmp/sdformat-1.4.11/src/parser_urdf.cc:389
    389     if (_link->child_links[i]->parent_joint->type == urdf::Joint::FIXED)
    (gdb) bt
    #0  ReduceFixedJoints (_root=_root@entry=0x7724c0, _link=...) at /tmp/sdformat-1.4.11/src/parser_urdf.cc:389
    #1  0x00007ffff76c6fa2 in sdf::URDF2SDF::InitModelString (this=this@entry=0x7fffffffdc7d, _urdfStr=..., _enforceLimits=_enforceLimits@entry=true)
        at /tmp/sdformat-1.4.11/src/parser_urdf.cc:2667
    #2  0x00007ffff76c75a4 in sdf::URDF2SDF::InitModelDoc (this=this@entry=0x7fffffffdc7d, _xmlDoc=_xmlDoc@entry=0x7fffffffdbb0) at /tmp/sdformat-1.4.11/src/parser_urdf.cc:2706
    #3  0x00007ffff76c7708 in sdf::URDF2SDF::InitModelFile (this=this@entry=0x7fffffffdc7d, _filename=...) at /tmp/sdformat-1.4.11/src/parser_urdf.cc:2715
    #4  0x00007ffff76a1aa3 in sdf::readFile (_filename=..., _sdf=...) at /tmp/sdformat-1.4.11/src/parser.cc:265
    #5  0x0000000000403e60 in main ()
    

    When the segfault occurs, child_links.size() returns a garbage value. If we build the sdformat packages with the embedded copy of urdfdom, the problem is not present, so the cause should be somewhere in the code of the urdfdom packages.

    After this debugging session, the example attached here, which did not segfault before, now segfaults too, in the same place.

  5. Jose Luis Rivero

    After some struggle:

    (trusty)jrivero@nium ~ $ gzsdf print tutorial2.urdf 
    <sdf version='1.4'>
      <model name='test_robot'>
        <link name='__default__'>
          <velocity_decay>
            <linear>0</linear>
            <angular>0</angular>
          </velocity_decay>
        </link>
      </model>
    </sdf>
    (trusty)jrivero@nium ~ $ . /opt/ros/indigo/setup.bash 
    (trusty)jrivero@nium ~ $ gzsdf print tutorial2.urdf 
    Segmentation fault (core dumped)
    

    Probably a symbol collision.

    UPDATE:

    (trusty)jrivero@nium ~ $ ldd /usr/bin/gzsdf | grep urdf
        liburdfdom_model.so.0.2 => /usr/lib/x86_64-linux-gnu/liburdfdom_model.so.0.2 (0x00007fedf0916000)
    (trusty)jrivero@nium ~ $ . /opt/ros/indigo/setup.bash 
    (trusty)jrivero@nium ~ $ ldd /usr/bin/gzsdf | grep urdf
        liburdfdom_model.so.0.2 => /opt/ros/indigo/lib/liburdfdom_model.so.0.2 (0x00007ff88faae000)
    

    Workaround: modify LD_LIBRARY_PATH so that the system library directory is searched first, for example:

    LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:/opt/ros/indigo/lib:/opt/ros/indigo/lib/x86_64-linux-gnu gzsdf print tutorial2.urdf

  6. Jonathan Binney

    @ipa_fxm you can use Gazebo with Indigo on 14.04 now if you build it from source. Just make sure that you apt-get remove liburdfdom, libsdformat, and gazebo2 before you build it.

  7. Martin Pecka

    @ipa_fxm you don't have to rebuild Gazebo from source; @jrivero's workaround is fine. Just prepend

    LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH"
    

    to every Gazebo command you run, and you're okay.

    For example:

    LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH" rosrun gazebo_ros gzserver
    

    You may also want to set the environment variable in your .bashrc until the bug gets fixed.
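    Why prepending /usr/lib/x86_64-linux-gnu/ works can be modeled in a few lines of C++. The sketch below is a deliberately simplified model of how the dynamic loader resolves a bare soname such as liburdfdom_model.so.0.2: LD_LIBRARY_PATH entries are tried left to right, then a list of default directories, and the first directory containing the file wins. It ignores DT_RPATH/DT_RUNPATH, /etc/ld.so.cache and the other details of the real ld.so; the ResolveLibrary function and its "existing files" input are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Simplified model of ld.so lookup for a bare soname (hypothetical helper).
// Search order: LD_LIBRARY_PATH entries left to right, then defaultDirs.
// Ignores DT_RPATH/DT_RUNPATH, ld.so.cache and hwcap subdirectories.
std::string ResolveLibrary(const std::string &soname,
                           const std::string &ldLibraryPath,
                           const std::vector<std::string> &defaultDirs,
                           const std::set<std::string> &existingFiles) {
  std::vector<std::string> searchDirs;
  std::stringstream ss(ldLibraryPath);
  std::string dir;
  // Empty entries (which the real loader treats as the current directory)
  // are skipped here for simplicity.
  while (std::getline(ss, dir, ':'))
    if (!dir.empty()) searchDirs.push_back(dir);
  searchDirs.insert(searchDirs.end(), defaultDirs.begin(), defaultDirs.end());

  for (std::string d : searchDirs) {
    if (d.back() != '/') d += '/';
    const std::string candidate = d + soname;
    if (existingFiles.count(candidate) != 0) return candidate;  // first match wins
  }
  return "";  // not found
}
```

    With the two copies of liburdfdom_model.so.0.2 seen in the ldd output above, this model resolves to the system copy when LD_LIBRARY_PATH is empty, to the ROS copy once /opt/ros/indigo/lib is on LD_LIBRARY_PATH (as the ldd output after sourcing setup.bash suggests), and back to the system copy when /usr/lib/x86_64-linux-gnu/ is prepended.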

  8. Jose Luis Rivero

    Problem description:

    After fixing the problem with the ROS urdfdom package, @jbinney found a different error:

    jbinney@h-> gzsdf print nao.urdf 
    *** Error in `gzsdf': free(): invalid pointer: 0x0000000001ea4b70 ***
    Aborted (core dumped)
    
    jbinney@h-> which gzsdf
    /usr/bin/gzsdf
    
    jbinney@h-> ldd /usr/bin/gzsdf | grep urdf
            liburdfdom_model.so.0.2 => /usr/lib/x86_64-linux-gnu/liburdfdom_model.so.0.2 (0x00007f1cb8542000)
    
    jbinney@h-> dpkg -S /usr/lib/x86_64-linux-gnu/liburdfdom_model.so.0.2
    liburdfdom-model0.2:amd64: /usr/lib/x86_64-linux-gnu/liburdfdom_model.so.0.2
    

    I ran gdb against it, and it looks like the same issue we were fixing before.

    (gdb) run print nao.urdf
    Starting program: /usr/bin/gzsdf print nao.urdf
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    [New Thread 0x7fffe8338700 (LWP 4525)]
    *** Error in `/usr/bin/gzsdf': free(): invalid pointer: 0x00000000008ad5f0 ***
    
    Program received signal SIGABRT, Aborted.
    0x00007ffff6db6f79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
    (gdb) bt
    #0  0x00007ffff6db6f79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    #1  0x00007ffff6dba388 in __GI_abort () at abort.c:89
    #2  0x00007ffff6df41d4 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff6f02a10 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
    #3  0x00007ffff6e004ae in malloc_printerr (ptr=<optimized out>, str=0x7ffff6efeb03 "free(): invalid pointer", action=1) at malloc.c:4996
    #4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
    #5  0x00007ffff768549e in boost::detail::sp_counted_base::release (this=0x8b8df0) at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
    #6  0x00007ffff76bb53d in ~shared_count (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:371
    #7  ~shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:328
    #8  ReduceVisualToParent (_link=..., _groupName=..., _visual=...) at /build/buildd/sdformat-1.4.11/src/parser_urdf.cc:378
    #9  0x00007ffff76bbba4 in ReduceVisualsToParent (_link=...) at /build/buildd/sdformat-1.4.11/src/parser_urdf.cc:887
    #10 0x00007ffff76c671f in ReduceFixedJoints (_root=_root@entry=0x870a30, _link=...) at /build/buildd/sdformat-1.4.11/src/parser_urdf.cc:405
    #11 0x00007ffff76c6467 in ReduceFixedJoints (_root=_root@entry=0x870a30, _link=...) at /build/buildd/sdformat-1.4.11/src/parser_urdf.cc:390
    #12 0x00007ffff76c6f12 in sdf::URDF2SDF::InitModelString (this=this@entry=0x7fffffffde6d, _urdfStr=..., _enforceLimits=_enforceLimits@entry=true) at /build/buildd/sdformat-1.4.11/src/parser_urdf.cc:2667
    #13 0x00007ffff76c7514 in sdf::URDF2SDF::InitModelDoc (this=this@entry=0x7fffffffde6d, _xmlDoc=_xmlDoc@entry=0x7fffffffdda0) at /build/buildd/sdformat-1.4.11/src/parser_urdf.cc:2706
    #14 0x00007ffff76c7678 in sdf::URDF2SDF::InitModelFile (this=this@entry=0x7fffffffde6d, _filename=...) at /build/buildd/sdformat-1.4.11/src/parser_urdf.cc:2715
    #15 0x00007ffff76a1aa3 in sdf::readFile (_filename=..., _sdf=...) at /build/buildd/sdformat-1.4.11/src/parser.cc:265
    #16 0x0000000000403e60 in main ()
    

    Problem explanation:

    Some details of the problem:

    • The Ubuntu sdformat 1.4.11 package was built on 2013-11-07
    • The Ubuntu 1.4.11 package includes the patch from PR 77, which was merged on 2013-12-17
    • The OSRF/ROS 1.4.11 deb packages do not include PR 77

    Because of this patching, the OSRF/ROS and Ubuntu packages are not identical: the Ubuntu ones include PR 77.

    My hypothesis is that we wrote PR 77 against urdfdom-0.3.0. It looks like there was a change in how the visual and collision arrays are handled:

    -  /// if more than one collision element is specified, all collision elements are placed in this array (the collision member will be NULL)
    +  /// if more than one collision element is specified, all collision elements are placed in this array (the collision member points to the first element of the array)
       std::vector<boost::shared_ptr<Collision> > collision_array;
    
    ... same for visual array
    

    So we introduce the bug when building against urdfdom-0.2.3 with the USE_EXTERNAL_URDF flag enabled.
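    To make the semantic change concrete, here is a minimal C++ sketch of code that gathers a link's visuals correctly under either documented convention. The Visual and Link structs are hypothetical stand-ins modeled only on the two doc comments quoted above (with std::shared_ptr in place of boost::shared_ptr); they are not the real urdfdom headers, and CollectVisuals is an illustrative helper, not code from sdformat. It treats visual_array as the source of truth and adds the single visual member only when the array does not already hold it, so the 0.3.0 aliasing does not produce duplicates and a NULL member under 0.2.x is simply skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for urdf::Visual.
struct Visual {
  std::string name;
};

// Hypothetical stand-in for urdf::Link, modeling only the two members
// whose documented meaning changed between urdfdom 0.2.x and 0.3.0.
struct Link {
  // 0.2.x comment: NULL when several visuals are placed in visual_array.
  // 0.3.0 comment: points to the first element of visual_array.
  std::shared_ptr<Visual> visual;
  std::vector<std::shared_ptr<Visual>> visual_array;
};

// Returns every visual of the link exactly once, under either convention.
std::vector<std::shared_ptr<Visual>> CollectVisuals(const Link &link) {
  std::vector<std::shared_ptr<Visual>> out;
  for (const auto &v : link.visual_array)
    if (v) out.push_back(v);  // skip empty slots defensively
  // Add the single member only if the array does not already contain it
  // (under 0.3.0 it aliases visual_array[0]).
  if (link.visual &&
      std::find(out.begin(), out.end(), link.visual) == out.end())
    out.push_back(link.visual);
  return out;
}
```

    The same pattern applies to visual's counterpart, collision and collision_array.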

    Workaround:

    As a quick test, I generated an sdformat-1.4.11-1osrf1 package with the PR 77 code removed. Things work after just doing:

    apt-get install gazebo2
    dpkg -i libsdformat1_1.4.11-1osrf1_amd64.deb
    

    Proper fix:

    I need to open a bug in Ubuntu to get rid of the patch in the 1.4.11 package, so that we get the fix from the official repositories as soon as possible. I also need to generate a gazebo2 version that depends on this patched version or later.

  9. Steven Peters

    I just added a new test to capture these failures in b825fee0002d. We already had a test for URDFs with fixed joints and visuals, but in it the parent link of the fixed joint had a visual, and the segfaults seem to occur when the parent link is empty. So I added test/integration/fixed_joint_reduction_visual.urdf, a slight variant of test/integration/fixed_joint_reduction.urdf without the parent link visuals.
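    The case the new test targets can be illustrated with a simplified model of fixed-joint reduction. The C++ sketch below is not sdformat's ReduceFixedJoints and does not reproduce the crash: the Node type and ReduceFixed function are hypothetical, and poses, inertia and collisions are ignored. It only shows the intended outcome when a fixed-joint child's visuals are merged into a parent link that starts with no visuals, with any grandchildren re-attached to that parent.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified link tree: no poses, inertia or collisions.
struct Node {
  std::string name;
  bool fixedToParent = false;        // joint to the parent is "fixed"
  std::vector<std::string> visuals;  // may be empty, e.g. a bare parent link
  std::vector<std::unique_ptr<Node>> children;
};

// Folds every fixed-joint child into its parent: the child's visuals are
// appended to the parent's (possibly empty) list and the child's own
// children are re-attached to the parent and examined in turn.
void ReduceFixed(Node &link) {
  std::vector<std::unique_ptr<Node>> kept;
  std::vector<std::unique_ptr<Node>> pending = std::move(link.children);
  while (!pending.empty()) {
    std::unique_ptr<Node> child = std::move(pending.back());
    pending.pop_back();
    if (child->fixedToParent) {
      link.visuals.insert(link.visuals.end(), child->visuals.begin(),
                          child->visuals.end());
      for (auto &grandchild : child->children)
        pending.push_back(std::move(grandchild));
    } else {
      ReduceFixed(*child);  // non-fixed joints survive; recurse below them
      kept.push_back(std::move(child));
    }
  }
  link.children = std::move(kept);
}
```

    For a two-link model like the minimal URDF earlier in this thread, with a visual added only to the child, the reduced tree is a single link carrying the child's visual.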

    This test currently fails for me on trusty with the following packages:

    ii  liburdfdom-model0.2:amd64                             0.2.10+dfsg-1                                       amd64        URDF DOM - model library
    ii  liburdfdom-headers-dev                                0.2.3+dfsg-1                                        all          URDF DOM - header files
    
  10. Jonathan Binney

    Thanks for figuring this out! Based on the comments above, I've added collision and visual geometries to every link that is the fixed-joint parent of a link with collision or visual geometries, and now I can use gzsdf/Gazebo with our robot again.

  11. Jose Luis Rivero

    Thanks, John and Steve, for PR 107. It looks like the proper way to get things fixed and tested, and to make sure the problem does not come back. Let's continue with it. However, it was critical for ROS to have a new package ready, so I went ahead with a minimal patch to the Trusty sdformat version.

    I've released sdformat 1.4.11-1osrf1 in the packages.osrfoundation.org repository. It should automatically upgrade the current 1.4.11-1, while still sorting lower than a hypothetical 1.4.11-1ubuntu0.1.
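    The ordering claim above follows from Debian's version comparison rules. The C++ sketch below ports the comparison algorithm dpkg uses: epoch first, then upstream version, then revision; within each part, non-digit runs are compared character by character (letters before other symbols, and '~' before everything, even the end of the string), and digit runs are compared numerically. It is for illustration only; the function names are chosen for this sketch, and on a real system dpkg --compare-versions is the authoritative check.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace {
bool IsDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool IsAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }

// Sort weight of one character; '\0' stands for "end of string".
int Order(char c) {
  if (IsDigit(c)) return 0;
  if (IsAlpha(c)) return c;
  if (c == '~') return -1;
  if (c != '\0') return c + 256;
  return 0;
}

// Character at position i, or '\0' past the end.
char At(const std::string &s, std::size_t i) { return i < s.size() ? s[i] : '\0'; }
}  // namespace

// Compares one version part (upstream or revision): <0, 0 or >0.
int VerRevCmp(const std::string &a, const std::string &b) {
  std::size_t i = 0, j = 0;
  while (i < a.size() || j < b.size()) {
    // Leading non-digit run, character by character.
    while ((i < a.size() && !IsDigit(a[i])) || (j < b.size() && !IsDigit(b[j]))) {
      const int ac = Order(At(a, i)), bc = Order(At(b, j));
      if (ac != bc) return ac - bc;
      ++i;
      ++j;
    }
    // Following digit run, numerically (leading zeros ignored).
    while (At(a, i) == '0') ++i;
    while (At(b, j) == '0') ++j;
    int firstDiff = 0;
    while (IsDigit(At(a, i)) && IsDigit(At(b, j))) {
      if (firstDiff == 0) firstDiff = a[i] - b[j];
      ++i;
      ++j;
    }
    if (IsDigit(At(a, i))) return 1;  // a has more digits: larger number
    if (IsDigit(At(b, j))) return -1;
    if (firstDiff != 0) return firstDiff;
  }
  return 0;
}

// Full comparison of "[epoch:]upstream[-revision]" strings.
int CompareVersions(const std::string &a, const std::string &b) {
  struct Parts { long epoch; std::string upstream, revision; };
  auto parse = [](const std::string &v) {
    Parts p{0, v, ""};
    const auto colon = v.find(':');
    if (colon != std::string::npos) {
      p.epoch = std::stol(v.substr(0, colon));
      p.upstream = v.substr(colon + 1);
    }
    const auto dash = p.upstream.rfind('-');  // revision follows the last '-'
    if (dash != std::string::npos) {
      p.revision = p.upstream.substr(dash + 1);
      p.upstream.erase(dash);
    }
    return p;
  };
  const Parts x = parse(a), y = parse(b);
  if (x.epoch != y.epoch) return x.epoch < y.epoch ? -1 : 1;
  const int r = VerRevCmp(x.upstream, y.upstream);
  return r != 0 ? r : VerRevCmp(x.revision, y.revision);
}
```

    With this ordering, 1.4.11-1osrf1 sorts above 1.4.11-1 because its revision continues with letters after the shared leading 1, and below 1.4.11-1ubuntu0.1 because 'o' sorts before 'u'. The same checks can be made with dpkg --compare-versions 1.4.11-1osrf1 gt 1.4.11-1.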

    I'm now building gazebo2 (2.2.2-5) depending on sdformat >= 1.4.11-1osrf1 to make sure we are using a safe version. I will post a new comment when it is ready.

    Tomorrow I will start the process of patching the problem in Ubuntu.
