Potential Issues Building On ARM64?

Issue #375 resolved
Mike Sheffler created an issue

I'm building gtsam 4.0 on a NVIDIA Jetson TX2 (that's an ARM64 target -- not Intel)

I did

$ mkdir build
$ cd build
$ cmake ..
$ make check

and got

92/221 Test  #92: testAdaptAutoDiff ......................***Failed    0.02 sec
/home/nvidia/Desktop/gtsam/gtsam/nonlinear/tests/testAdaptAutoDiff.cpp:240: Failure: "expected 328 but was: 336" 
There were 1 failures

and

144/221 Test #144: testGaussianBayesTreeB .................***Failed    0.02 sec
not equal:
expected = [
      5   0   6;
          0 -11 -12
  ]
actual = [
     -5   0  -6;
          0 -11 -12
  ]
actual - expected = [
             -10            0          -12;
                   0 -1.77636e-15  3.55271e-15
  ]
/home/nvidia/Desktop/gtsam/tests/testGaussianBayesTreeB.cpp:322: Failure: "assert_equal(expectedJointJ, actualJointJ)" 
There were 1 failures
99% tests passed, 2 tests failed out of 221

Total Test time (real) =   9.58 sec

The following tests FAILED:
     92 - testAdaptAutoDiff (Failed)
    144 - testGaussianBayesTreeB (Failed)

I don't think I've ever seen make check complain before for GTSAM. I don't believe there were any compile / link errors. How suspicious is this?

Update: 1630 2018-08-17 @dellaert , I just built again on Intel and saw the same warnings (no errors) as the build on the TX2, but there were no problems with the unit tests on Intel.

I changed the title of the issue because I'm guessing this is something more related to architecture than to the specific unit tests. I'm still investigating.

Comments (20)

  1. Mike Sheffler reporter

    Below is my CMake configuration. I believe it is 'out of the box', with the exception of having turned off GTSAM_BUILD_WRAP, GTSAM_WITH_EIGEN_MKL, GTSAM_WITH_EIGEN_MKL_OPENMP, GTSAM_WITH_TBB, and GTSAM_WRAP_SERIALIZATION (this is true for both ARM64 and Intel)

    GTSAM_SOURCE_ROOT_DIR: [/home/nvidia/Desktop/gtsam]
    Boost version: 1.58.0
    Found the following Boost libraries:
      serialization
      system
      filesystem
      thread
      program_options
      date_time
      timer
      chrono
      atomic
    Ignoring Boost restriction on optional lvalue assignment from rvalues
    Found Intel TBB
    Could NOT find MKL (missing:  MKL_INCLUDE_DIR MKL_LIBRARIES) 
    Building 3rdparty
    checking for thread-local storage - found
    Could NOT find GeographicLib (missing:  GeographicLib_LIBRARY_DIRS GeographicLib_LIBRARIES GeographicLib_INCLUDE_DIRS) 
    Building base
    Building geometry
    Building inference
    Building symbolic
    Building discrete
    Building linear
    Building nonlinear
    Building sam
    Building sfm
    Building slam
    Building smart
    Building navigation
    GTSAM Version: 4.0.0
    Install prefix: /usr/local
    Building GTSAM - shared
    Building base_unstable
    Building geometry_unstable
    Building linear_unstable
    Building discrete_unstable
    Building dynamics_unstable
    Building nonlinear_unstable
    Building slam_unstable
    GTSAM_UNSTABLE Version: 4.0.0
    Install prefix: /usr/local
    Building GTSAM_UNSTABLE - shared
    Wrote /home/nvidia/Desktop/gtsam/build/GTSAMConfig.cmake
    Could NOT find Doxygen (missing:  DOXYGEN_EXECUTABLE) 
    ===============================================================
    ================  Configuration Options  ======================
      CMAKE_CXX_COMPILER_ID type     : GNU
      CMAKE_CXX_COMPILER_VERSION     : 5.4.0
    Build flags                                               
      Build Tests                    : Enabled
      Build examples with 'make all' : Enabled
      Build timing scripts with 'make all': Disabled
      Build static GTSAM library instead of shared: Disabled
      Put build type in library name : Enabled
      Build libgtsam_unstable        : Enabled
      Build type                     : Release
      C compilation flags            :  -std=c11   -Wall -O3 -DNDEBUG    -O3 -DNDEBUG
      C++ compilation flags          :  -std=c++11 -Wall -O3 -DNDEBUG    -O3 -DNDEBUG
      Use System Eigen               : No
      Use Intel TBB                  : TBB found but GTSAM_WITH_TBB is disabled
      Eigen will use MKL             : MKL not found
      Eigen will use MKL and OpenMP  : OpenMP found but GTSAM_WITH_EIGEN_MKL is disabled
      Default allocator              : STL
    Packaging flags                                               
      CPack Source Generator         : TGZ
      CPack Generator                : TGZ
    GTSAM flags                                               
      Quaternions as default Rot3     : Disabled
      Runtime consistency checking    : Disabled
      Rot3 retract is full ExpMap     : Disabled
      Pose3 retract is full ExpMap    : Disabled
      Deprecated in GTSAM 4 allowed   : Enabled
      Point3 is typedef to Vector3    : Disabled
      Metis-based Nested Dissection   : Enabled
      Use tangent-space preintegration: Enabled
    MATLAB toolbox flags                                      
      Install matlab toolbox         : Disabled
      Build Wrap                     : Disabled
    Python module flags                                       
      Build python module            : Disabled
    Cython toolbox flags                                      
      Install Cython toolbox         : Disabled
      Build Wrap                     : Disabled
    ===============================================================
    Configuring done
    
  2. Jing Dong

    Hi @mikesheffler , would you like to figure out this and possibly make a pull request to fix this? We don't have a TX2 so we cannot reproduce the problem on our side... I tried this on a Respberry Pi 3B+ with arm7hf but did not see any issue, so looks like it's an arm64 only issue

  3. Mike Sheffler reporter

    I'll take a look. Before I bothered to dive in, I wanted to make sure this wasn't something that was already on your radar. (side note: from the /home/nvidia path present in the warning, I think #373 was also likely building on a TX2 or TX1).

    The only thing I've tried so far was adding

    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsigned-char")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsigned-char")
    

    which didn't make a difference. My thinking was that if a char was being used somewhere as an 8-bit numeric type, the jump to a different architecture might highlight a signed / unsigned comparison problem.

    If I figure it out, I'll put together a pull request. Either way, I'll report back with whatever I do or don't learn.

  4. Mike Sheffler reporter

    Okay, I've made some headway. I'm going to do this in two comments -- one for each of the two tests

    First:

    92/221 Test  #92: testAdaptAutoDiff ......................***Failed    0.02 sec
    /home/nvidia/Desktop/gtsam/gtsam/nonlinear/tests/testAdaptAutoDiff.cpp:240: Failure: "expected 328 but was: 336" 
    There were 1 failures
    

    The offending test is

    // Test AutoDiff wrapper Snavely
    TEST(AdaptAutoDiff, AdaptAutoDiff)
    

    it's a short test. We have

      Expression<Vector9> P(1);
      Expression<Vector3> X(2);
      typedef AdaptAutoDiff<SnavelyProjection, 2, 9, 3> Adaptor;
      Expression<Vector2> expression(Adaptor(), P, X);
      EXPECT_LONGS_EQUAL(
          sizeof(internal::BinaryExpression<Vector2, Vector9, Vector3>::Record),
          expression.traceSize());
    

    so, expression.traceSize() is 336, while sizeof(internal::BinaryExpression<Vector2, Vector9, Vector3>::Record) is 328. I would argue that the test is misleading, if not necessarily wrong.

    See, the members of internal::BinaryExpression<Vector2, Vector9, Vector3>::Record add up to 144 + 48 + 16 + 16 + 72 + 24 = 320, and Record inherits from a struct with virtual methods, so I think the vtable is the source of the missing 8 bytes for a sizeof of 328. Meanwhile, when expression is constructed, the code on

    https://bitbucket.org/gtborg/gtsam/src/develop/gtsam/nonlinear/internal/ExpressionNode.h, line 344 is going to get called:

    this->traceSize_ = //
            upAligned(sizeof(Record)) + e1.traceSize() + e2.traceSize();
    

    e1.traceSize() and e2.traceSize() are both 0, so the entire traceSize is going to be the value of upAligned(sizeof(Record)). The alignment that is forced is 16 bytes, and 328 % 16 = 8, so the upAligned value is 336.

    It seems that, on x86_64, the size of the struct is already 336, so the test passes. The alignment on ARM is different, leading to a size of 328. Since the members of Record are all composite types (not int',double`, etc.), I'm not 100% sure why the alignment is different, but that seems to be the case.

    I would argue that testing against

    upAlign(sizeof(internal::BinaryExpression<Vector2, Vector9, Vector3>::Record))) + P.traceSize() + X.traceSize()
    

    is what should appear instead of

    sizeof(internal::BinaryExpression<Vector2, Vector9, Vector3>::Record)
    

    I haven't tried that specific code to make sure the methods have the right visibility, etc., but that seems to be the heart of the matter. If I've interpreted correctly, I can make a pull request.

    @dellaert was the last one to touch that code (part of this commit), and the comparison value in question had been a literal previously (of 336, and a different literal before that, depending on whether or not quaternions were being used).

  5. Frank Dellaert

    Mike, it’s been a while since I was rooting around in this code, but your excellent post explains it well, and I agree the proposed fix looks right. If you make a PR, please comment on whether tests pass on both x86 and ARM for you now?

  6. Mike Sheffler reporter

    Will do. I'm writing up my comments on the other test right now (spoiler alert: I haven't figured that one out yet). Depending on a few things, I'm hoping to make a PR this weekend. If not, it'll probably be the weekend after that. I have the appropriate machines in my office, so I'll test on both architectures and mention the results in the request.

  7. Mike Sheffler reporter

    The second test:

    144/221 Test #144: testGaussianBayesTreeB .................***Failed    0.02 sec
    not equal:
    expected = [
          5   0   6;
              0 -11 -12
      ]
    actual = [
         -5   0  -6;
              0 -11 -12
      ]
    actual - expected = [
                 -10            0          -12;
                       0 -1.77636e-15  3.55271e-15
      ]
    /home/nvidia/Desktop/gtsam/tests/testGaussianBayesTreeB.cpp:322: Failure: "assert_equal(expectedJointJ, actualJointJ)" 
    There were 1 failures
    

    I don't have this one figured out yet. What's crazy is that it only happens in release -- not in debug. I checked both release and debug on x86_64, and they were both fine.

    The problematic test is

    TEST(GaussianBayesTree, shortcut_overlapping_separator)
    

    The line

    GaussianFactorGraph joint = *bt.joint(1,2, EliminateQR);
    

    produces a problematic value for joint, so that Matrix actualJointJ = joint.augmentedJacobian(); produces a value that is different than the matrix literal coded into the test.

    With a debugger only on the side that passes, it's down to print()-like statements everywhere I can have them to try to figure out when the codes are actually producing different values. I'm writing this down for my own sanity, because I can't remember the whole stack for very long and can't really follow it statically because of the templates.

    So,

    gtsam/inference/BayesTree-inst.h: joint() goes to

    gtsam/inference/BayesTree-inst.h: jointBayesNet() goes to

    gtsam/inference/EliminateableFactorGraph-inst.h:marginalMultifrontalBayesNet (a few times, recursively) then goes to

    gtsam/inference/EliminateableFactorGraph-inst.h: eliminatePartialMultifrontal() (the first one, with ordering as the first argument) goes to

    gtsam/inference/ClusterTree-inst.h: eliminate() goes to

    gtsam/base/treeTraversal-inst.h: DepthFirstForest() (the third one, with int problemSizeThreshold = 10 as the final argument). I turned off GTSAM_USE_TBB (that doesn't make a difference -- I checked), so that goes to

    gtsam/base/treeTraversal-inst.h: DepthFirstForest() (the first one). As near as I can tell, something happens differently here between debug and release, because, when I get back to

    gtsam/inference/ClusterTree-inst.h: eliminate(), I print rootsContainer.childFactors:

    for (const sharedFactor& factor : rootsContainer.childFactors)
    {
      if (factor)
      {
        std::cout << "factor" << std::endl;
        factor->print();
      }
    }
    

    and get

    A[1] = [
            5;
            0;
    ]
    A[2] = [
            -1.14375e-31;
                    -11
    ]
    b = [     6 -12 ]
     Noise model: unit (2)
    

    in debug, and

    A[1] = [
            -5;
            0;
    ]
    A[2] = [
            0;
           -11
    ]
    b = [     -6 -12 ]
     Noise model: unit (2)
    

    in release. Of all the variables I know how to interrogate, this is the first time I can see them be different on the two paths. I'm not sure what is happening in DepthFirstForest() because I don't know how to formulate print statements for what's going on in there yet.

    Of all of the arguments passed in to

    DepthFirstForest(*this, rootsContainer, Data::EliminationPreOrderVisitor, visitorPost, 10)
    

    only rootsContainer appears to be different between debug and release.

    If there is a member variable that is being quietly mutated and it hasn't popped up on my radar (I have investigated some this variables and a few other obvious spots), I'll probably never find the real first difference. From what is being passed around, I'm inclined to think I'm in the ballpark. I'm going to have another guy take a look at it later this week because he's more handy with gdb / cgdb than I am and is much more of a template wizard than I am.

  8. Frank Dellaert

    So, I don’t know why, but it seems like it is just a sign change on a Jacobian row, and this would not affect the joint density. If you confirm this, a quick fix is to flip sign of rows and RHS on expected value to signs of RHS on the actual value. Of course, knowing why they are different would be better, but my gut feeling says the signs are immaterial.

  9. Mike Sheffler reporter

    a quick fix is to flip sign of rows and RHS on expected value to signs of RHS on the actual value ... my gut feeling says the signs are immaterial

    I agree. If that sign appears on both sides of the equation that is implied, then I guess it's okay? Interestingly, I fired up the old time machine, and it looks like the signs and rows used to be swapped:

    -    0, 11, 12,
    -    -5, 0, -6
    
    +    5, 0, 6,
    +    0, -11, -12
    

    That was five years ago, though, and the comment was 'Fixed some unit tests', so ...

    I'd still like to figure out why the flip happens. Like I said, I'm going to have someone else take a look at it in a couple / few days time and I'll report back.

  10. Mike Sheffler reporter

    @dellaert, I submitted pull request #314 to address the first part of this issue:

    Test 92: testAdaptAutoDiff failing on AArch64 (ARM64)

    This fix builds and runs make check without the error. Tested debug and release builds on x86_64 and AArch64 (ARM64).

    I'm still trying to learn what's going on (I haven't looked at it since the last time I posted here) for

    Test 144: testGaussianBayesTreeB .................***Failed

  11. Mike Sheffler reporter

    Not yet. It's still on my radar, but I haven't really spent any time on it since I submitted the pull request for the other issue; hopefully, I will be looking at it this week.

    Here's what I was thinking about: I'd like to know why the matrix I'm outputting is different than the matrix that is coded into the test, but, as we were discussing, there isn't necessarily anything wrong with the matrix I'm computing, so I'm trying to figure out how to rejigger the test if I don't wind up seeing anything objectionable when I do some more debugging. I guess I could put the matrix into reduced row echelon form? What do you think?

  12. Frank Dellaert

    Might be overkill. As I said above: it seems like it is just a sign change on a Jacobian row, and this would not affect the joint density. If you confirm this, a quick fix is to flip sign of rows and RHS on expected value to signs of RHS on the actual value.

  13. Mike Sheffler reporter

    Okay, I was taking a look at the code, and at section 3 in

    Kaess, M., Ila, V., Roberts, R. & Dellaert, F. The Bayes tree: An algorithmic foundation for probabilistic robot mapping. in Springer Tracts in Advanced Robotics 68, 157–173 (2010).

    (like I mentioned before, I'm having trouble following a little bit of the code, so I was looking for hints in the variable names wrt the write up) and I think the joint will wind up not being affected.

    I will put together the suggested change to the test and look at debug / release on my x64_86 and AArch64 systems. If everything looks okay, I'll submit a pull request.

  14. Mike Sheffler reporter

    I made the changes, synced my fork, and am testing all of the builds. If everything passes, I'll try to put together a PR tonight.

  15. Mike Sheffler reporter

    @dellaert, I submitted pull request #315 to address the testGaussianBayesTreeB failure. As discussed, the difference appears to be a non-issue, so I updated the test to allow for sign reversal at the row level in the augmented Jacobian.

    Edit: I checked, and all

    make -j2 check

    tests pass in debug and release on x86_64 and AArch64.

  16. Log in to comment