- changed title to Potential Issues Building On ARM64?
- edited description
Potential Issues Building On ARM64?
I'm building gtsam 4.0 on a NVIDIA Jetson TX2 (that's an ARM64 target -- not Intel)
I did
$ mkdir build
$ cd build
$ cmake ..
$ make check
and got
92/221 Test #92: testAdaptAutoDiff ......................***Failed 0.02 sec
/home/nvidia/Desktop/gtsam/gtsam/nonlinear/tests/testAdaptAutoDiff.cpp:240: Failure: "expected 328 but was: 336"
There were 1 failures
and
144/221 Test #144: testGaussianBayesTreeB .................***Failed 0.02 sec
not equal:
expected = [
5 0 6;
0 -11 -12
]
actual = [
-5 0 -6;
0 -11 -12
]
actual - expected = [
-10 0 -12;
0 -1.77636e-15 3.55271e-15
]
/home/nvidia/Desktop/gtsam/tests/testGaussianBayesTreeB.cpp:322: Failure: "assert_equal(expectedJointJ, actualJointJ)"
There were 1 failures
99% tests passed, 2 tests failed out of 221
Total Test time (real) = 9.58 sec
The following tests FAILED:
92 - testAdaptAutoDiff (Failed)
144 - testGaussianBayesTreeB (Failed)
I don't think I've ever seen make check
complain before for GTSAM. I don't believe there were any compile / link errors. How suspicious is this?
Update: 1630 2018-08-17 @dellaert , I just built again on Intel and saw the same warnings (no errors) as the build on the TX2, but there were no problems with the unit tests on Intel.
I changed the title of the issue because I'm guessing this is something more related to architecture than to the specific unit tests. I'm still investigating.
Comments (20)
-
reporter -
reporter Below is my CMake configuration. I believe it is 'out of the box', with the exception of having turned off
GTSAM_BUILD_WRAP
,GTSAM_WITH_EIGEN_MKL
,GTSAM_WITH_EIGEN_MKL_OPENMP
,GTSAM_WITH_TBB
, andGTSAM_WRAP_SERIALIZATION
(this is true for both ARM64 and Intel)GTSAM_SOURCE_ROOT_DIR: [/home/nvidia/Desktop/gtsam] Boost version: 1.58.0 Found the following Boost libraries: serialization system filesystem thread program_options date_time timer chrono atomic Ignoring Boost restriction on optional lvalue assignment from rvalues Found Intel TBB Could NOT find MKL (missing: MKL_INCLUDE_DIR MKL_LIBRARIES) Building 3rdparty checking for thread-local storage - found Could NOT find GeographicLib (missing: GeographicLib_LIBRARY_DIRS GeographicLib_LIBRARIES GeographicLib_INCLUDE_DIRS) Building base Building geometry Building inference Building symbolic Building discrete Building linear Building nonlinear Building sam Building sfm Building slam Building smart Building navigation GTSAM Version: 4.0.0 Install prefix: /usr/local Building GTSAM - shared Building base_unstable Building geometry_unstable Building linear_unstable Building discrete_unstable Building dynamics_unstable Building nonlinear_unstable Building slam_unstable GTSAM_UNSTABLE Version: 4.0.0 Install prefix: /usr/local Building GTSAM_UNSTABLE - shared Wrote /home/nvidia/Desktop/gtsam/build/GTSAMConfig.cmake Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) =============================================================== ================ Configuration Options ====================== CMAKE_CXX_COMPILER_ID type : GNU CMAKE_CXX_COMPILER_VERSION : 5.4.0 Build flags Build Tests : Enabled Build examples with 'make all' : Enabled Build timing scripts with 'make all': Disabled Build static GTSAM library instead of shared: Disabled Put build type in library name : Enabled Build libgtsam_unstable : Enabled Build type : Release C compilation flags : -std=c11 -Wall -O3 -DNDEBUG -O3 -DNDEBUG C++ compilation flags : -std=c++11 -Wall -O3 -DNDEBUG -O3 -DNDEBUG Use System Eigen : No Use Intel TBB : TBB found but GTSAM_WITH_TBB is disabled Eigen will use MKL : MKL not found Eigen will use MKL and OpenMP : OpenMP found but GTSAM_WITH_EIGEN_MKL is disabled Default allocator : STL Packaging flags CPack Source Generator : TGZ CPack Generator : TGZ GTSAM flags Quaternions as default Rot3 : Disabled Runtime consistency checking : Disabled Rot3 retract is full ExpMap : Disabled Pose3 retract is full ExpMap : Disabled Deprecated in GTSAM 4 allowed : Enabled Point3 is typedef to Vector3 : Disabled Metis-based Nested Dissection : Enabled Use tangent-space preintegration: Enabled MATLAB toolbox flags Install matlab toolbox : Disabled Build Wrap : Disabled Python module flags Build python module : Disabled Cython toolbox flags Install Cython toolbox : Disabled Build Wrap : Disabled =============================================================== Configuring done
-
Hi @mikesheffler , would you like to figure out this and possibly make a pull request to fix this? We don't have a TX2 so we cannot reproduce the problem on our side... I tried this on a Respberry Pi 3B+ with arm7hf but did not see any issue, so looks like it's an arm64 only issue
-
reporter I'll take a look. Before I bothered to dive in, I wanted to make sure this wasn't something that was already on your radar. (side note: from the /home/nvidia path present in the warning, I think
#373was also likely building on a TX2 or TX1).The only thing I've tried so far was adding
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsigned-char") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsigned-char")
which didn't make a difference. My thinking was that if a
char
was being used somewhere as an 8-bit numeric type, the jump to a different architecture might highlight a signed / unsigned comparison problem.If I figure it out, I'll put together a pull request. Either way, I'll report back with whatever I do or don't learn.
-
reporter Okay, I've made some headway. I'm going to do this in two comments -- one for each of the two tests
First:
92/221 Test #92: testAdaptAutoDiff ......................***Failed 0.02 sec /home/nvidia/Desktop/gtsam/gtsam/nonlinear/tests/testAdaptAutoDiff.cpp:240: Failure: "expected 328 but was: 336" There were 1 failures
The offending test is
// Test AutoDiff wrapper Snavely TEST(AdaptAutoDiff, AdaptAutoDiff)
it's a short test. We have
Expression<Vector9> P(1); Expression<Vector3> X(2); typedef AdaptAutoDiff<SnavelyProjection, 2, 9, 3> Adaptor; Expression<Vector2> expression(Adaptor(), P, X); EXPECT_LONGS_EQUAL( sizeof(internal::BinaryExpression<Vector2, Vector9, Vector3>::Record), expression.traceSize());
so,
expression.traceSize()
is 336, whilesizeof(internal::BinaryExpression<Vector2, Vector9, Vector3>::Record)
is 328. I would argue that the test is misleading, if not necessarily wrong.See, the members of
internal::BinaryExpression<Vector2, Vector9, Vector3>::Record
add up to 144 + 48 + 16 + 16 + 72 + 24 = 320, andRecord
inherits from a struct withvirtual
methods, so I think the vtable is the source of the missing 8 bytes for asizeof
of 328. Meanwhile, whenexpression
is constructed, the code onhttps://bitbucket.org/gtborg/gtsam/src/develop/gtsam/nonlinear/internal/ExpressionNode.h, line 344 is going to get called:
this->traceSize_ = // upAligned(sizeof(Record)) + e1.traceSize() + e2.traceSize();
e1.traceSize()
ande2.traceSize()
are both 0, so the entiretraceSize
is going to be the value ofupAligned(sizeof(Record))
. The alignment that is forced is 16 bytes, and 328 % 16 = 8, so theupAligned
value is 336.It seems that, on x86_64, the size of the struct is already 336, so the test passes. The alignment on ARM is different, leading to a size of 328. Since the members of
Record
are all composite types (notint',
double`, etc.), I'm not 100% sure why the alignment is different, but that seems to be the case.I would argue that testing against
upAlign(sizeof(internal::BinaryExpression<Vector2, Vector9, Vector3>::Record))) + P.traceSize() + X.traceSize()
is what should appear instead of
sizeof(internal::BinaryExpression<Vector2, Vector9, Vector3>::Record)
I haven't tried that specific code to make sure the methods have the right visibility, etc., but that seems to be the heart of the matter. If I've interpreted correctly, I can make a pull request.
@dellaert was the last one to touch that code (part of this commit), and the comparison value in question had been a literal previously (of 336, and a different literal before that, depending on whether or not quaternions were being used).
-
Mike, it’s been a while since I was rooting around in this code, but your excellent post explains it well, and I agree the proposed fix looks right. If you make a PR, please comment on whether tests pass on both x86 and ARM for you now?
-
reporter Will do. I'm writing up my comments on the other test right now (spoiler alert: I haven't figured that one out yet). Depending on a few things, I'm hoping to make a PR this weekend. If not, it'll probably be the weekend after that. I have the appropriate machines in my office, so I'll test on both architectures and mention the results in the request.
-
reporter The second test:
144/221 Test #144: testGaussianBayesTreeB .................***Failed 0.02 sec not equal: expected = [ 5 0 6; 0 -11 -12 ] actual = [ -5 0 -6; 0 -11 -12 ] actual - expected = [ -10 0 -12; 0 -1.77636e-15 3.55271e-15 ] /home/nvidia/Desktop/gtsam/tests/testGaussianBayesTreeB.cpp:322: Failure: "assert_equal(expectedJointJ, actualJointJ)" There were 1 failures
I don't have this one figured out yet. What's crazy is that it only happens in release -- not in debug. I checked both release and debug on x86_64, and they were both fine.
The problematic test is
TEST(GaussianBayesTree, shortcut_overlapping_separator)
The line
GaussianFactorGraph joint = *bt.joint(1,2, EliminateQR);
produces a problematic value for
joint
, so thatMatrix actualJointJ = joint.augmentedJacobian();
produces a value that is different than the matrix literal coded into the test.With a debugger only on the side that passes, it's down to
print()
-like statements everywhere I can have them to try to figure out when the codes are actually producing different values. I'm writing this down for my own sanity, because I can't remember the whole stack for very long and can't really follow it statically because of the templates.So,
gtsam/inference/BayesTree-inst.h: joint()
goes togtsam/inference/BayesTree-inst.h: jointBayesNet()
goes togtsam/inference/EliminateableFactorGraph-inst.h:marginalMultifrontalBayesNet
(a few times, recursively) then goes togtsam/inference/EliminateableFactorGraph-inst.h: eliminatePartialMultifrontal()
(the first one, withordering
as the first argument) goes togtsam/inference/ClusterTree-inst.h: eliminate()
goes togtsam/base/treeTraversal-inst.h: DepthFirstForest()
(the third one, withint problemSizeThreshold = 10
as the final argument). I turned offGTSAM_USE_TBB
(that doesn't make a difference -- I checked), so that goes togtsam/base/treeTraversal-inst.h: DepthFirstForest()
(the first one). As near as I can tell, something happens differently here between debug and release, because, when I get back togtsam/inference/ClusterTree-inst.h: eliminate()
, I printrootsContainer.childFactors
:for (const sharedFactor& factor : rootsContainer.childFactors) { if (factor) { std::cout << "factor" << std::endl; factor->print(); } }
and get
A[1] = [ 5; 0; ] A[2] = [ -1.14375e-31; -11 ] b = [ 6 -12 ] Noise model: unit (2)
in debug, and
A[1] = [ -5; 0; ] A[2] = [ 0; -11 ] b = [ -6 -12 ] Noise model: unit (2)
in release. Of all the variables I know how to interrogate, this is the first time I can see them be different on the two paths. I'm not sure what is happening in
DepthFirstForest()
because I don't know how to formulate print statements for what's going on in there yet.Of all of the arguments passed in to
DepthFirstForest(*this, rootsContainer, Data::EliminationPreOrderVisitor, visitorPost, 10)
only
rootsContainer
appears to be different between debug and release.If there is a member variable that is being quietly mutated and it hasn't popped up on my radar (I have investigated some
this
variables and a few other obvious spots), I'll probably never find the real first difference. From what is being passed around, I'm inclined to think I'm in the ballpark. I'm going to have another guy take a look at it later this week because he's more handy with gdb / cgdb than I am and is much more of a template wizard than I am. -
So, I don’t know why, but it seems like it is just a sign change on a Jacobian row, and this would not affect the joint density. If you confirm this, a quick fix is to flip sign of rows and RHS on expected value to signs of RHS on the actual value. Of course, knowing why they are different would be better, but my gut feeling says the signs are immaterial.
-
Issue
#373was marked as a duplicate of this issue. -
reporter a quick fix is to flip sign of rows and RHS on expected value to signs of RHS on the actual value ... my gut feeling says the signs are immaterial
I agree. If that sign appears on both sides of the equation that is implied, then I guess it's okay? Interestingly, I fired up the old time machine, and it looks like the signs and rows used to be swapped:
- 0, 11, 12, - -5, 0, -6 + 5, 0, 6, + 0, -11, -12
That was five years ago, though, and the comment was 'Fixed some unit tests', so ...
I'd still like to figure out why the flip happens. Like I said, I'm going to have someone else take a look at it in a couple / few days time and I'll report back.
-
reporter @dellaert, I submitted pull request #314 to address the first part of this issue:
Test 92: testAdaptAutoDiff failing on AArch64 (ARM64)
This fix builds and runs
make check
without the error. Tested debug and release builds on x86_64 and AArch64 (ARM64).I'm still trying to learn what's going on (I haven't looked at it since the last time I posted here) for
Test 144: testGaussianBayesTreeB .................***Failed
-
Any progress on BayesTree issue?
-
reporter Not yet. It's still on my radar, but I haven't really spent any time on it since I submitted the pull request for the other issue; hopefully, I will be looking at it this week.
Here's what I was thinking about: I'd like to know why the matrix I'm outputting is different than the matrix that is coded into the test, but, as we were discussing, there isn't necessarily anything wrong with the matrix I'm computing, so I'm trying to figure out how to rejigger the test if I don't wind up seeing anything objectionable when I do some more debugging. I guess I could put the matrix into reduced row echelon form? What do you think?
-
Might be overkill. As I said above: it seems like it is just a sign change on a Jacobian row, and this would not affect the joint density. If you confirm this, a quick fix is to flip sign of rows and RHS on expected value to signs of RHS on the actual value.
-
reporter That sounds more reasonable. Okay, I'll poke around and get this moving again.
-
reporter Okay, I was taking a look at the code, and at section 3 in
Kaess, M., Ila, V., Roberts, R. & Dellaert, F. The Bayes tree: An algorithmic foundation for probabilistic robot mapping. in Springer Tracts in Advanced Robotics 68, 157–173 (2010).
(like I mentioned before, I'm having trouble following a little bit of the code, so I was looking for hints in the variable names wrt the write up) and I think the joint will wind up not being affected.
I will put together the suggested change to the test and look at debug / release on my x64_86 and AArch64 systems. If everything looks okay, I'll submit a pull request.
-
reporter I made the changes, synced my fork, and am testing all of the builds. If everything passes, I'll try to put together a PR tonight.
-
reporter @dellaert, I submitted pull request #315 to address the
testGaussianBayesTreeB
failure. As discussed, the difference appears to be a non-issue, so I updated the test to allow for sign reversal at the row level in the augmented Jacobian.Edit: I checked, and all
make -j2 check
tests pass in debug and release on x86_64 and AArch64.
-
- changed status to resolved
Fixed by PR #315
- Log in to comment