-
assigned issue to
Possibly incorrect computation of steepest descent direction
I suspect that the computation of the steepest descent direction used for Powell’s dog leg method in iSAM2 (in GTSAM 4.0.0) is incorrect. For my particular problem, which is a planar problem with 2 vehicles using odometry measurements and a range measurement between them, I often get the error “Warning: Dog leg stopping because cannot decrease error with minimum delta” when I take a range measurement. I get much more reasonable results when I just use the GaussNewtonOptimizer instead, but I would like to use the dog leg optimizer to potentially improve convergence speed, and since the book “Factor Graphs for Robot Perception” (Dellaert and Kaess, 2017) states that “Powell’s dog leg can provide improved efficiency, and, as we will later see, is essential when incrementally updating matrix factorizations”.
When I enable verbose output of the dog leg optimizer, I can see that even very close to the linearization point (with delta approaching 1e-5 or less), the model function increases in the steepest descent direction (even though the nonlinear error function actually decreases). Normally, the model function should decrease in the steepest descent direction, but the nonlinear function may or may not decrease depending on the quality of the model. This implies that the model is a poor approximation and perhaps incorrect.
In reading the document trustregion.pdf included in the GTSAM doc directory, it states that the gradient, g, is computed as R'*(R*x-d), where R is an upper triangular factor of A that can be found using QR factorization of the measurement Jacobian A, x is a column vector of the states, and d=Q*b, where b is the column vector of measurement errors (normalized by standard deviations). I can see how this is derived by essentially taking the derivative of the model function with respect to x, but it ignores the fact that R is also a function of x (this may not be the best explanation, but something doesn’t seem right here). The correct gradient, as given in the 2017 book in section 2.5.1, or in Dennis and Schnabel (1996), is g=A'*b. However, if GTSAM actually implements what is stated in trustregion.pdf, it may be incorrect.
I have constructed a small Matlab example that provides further evidence. I did some of the math by hand, so it is possible there are errors though.
% The purpose of this script is to numerically investigate the validity of the
% steepest descent direction used for gtsam. The scenario is a planar problem
% with 2 vehicles using odometry and a range measurement. The simulation
% propagates one step in time, and heading is assumed to be known exactly.
% Measurements and standard deviations
sigma_dx = 0.01;
sigma_dy = 0.01;
sigma_rho = 0.01;
dx1 = 1;
dy1 = 0;
dx2 = 1;
dy2 = 0;
rho = 0.99; % range measurement between vehicles at time 2
% Linearization point
x1 = [0 1]; % x position of vehicle 1 at times 1 and 2
y1 = [0 0];
x2 = [0 1];
y2 = [1 1];
% Range model
rho_hat = sqrt((x2(2)-x1(2))^2 + (y2(2)-y1(2))^2);
% Measurement vector
b = [(x1(2) - x1(1) - dx1)/sigma_dx;
(y1(2) - y1(1) - dy1)/sigma_dy;
(x2(2) - x2(1) - dx2)/sigma_dx;
(y2(2) - y2(1) - dy2)/sigma_dy;
(rho_hat - rho)/sigma_rho];
% Measurement Jacobian
A = [1/sigma_dx, 0, 0, 0;
0, 1/sigma_dy, 0, 0;
0, 0, 1/sigma_dx, 0;
0, 0, 0, 1/sigma_dy;
-(x2(2)-x1(2))/rho_hat/sigma_rho, -(y2(2)-y1(2))/rho_hat/sigma_rho, (x2(2)-x1(2))/rho_hat/sigma_rho, (y2(2)-y1(2))/rho_hat/sigma_rho];
% Gradient according to Dennis and Schnabel (1996), or Dellaert and Kaess (2017)
A'*b
% QR factorization of the measurement Jacobian
[Q,R]=qr(A,0);
d=Q'*b;
% Gradient according to trustregion.pdf
R'*(R*[x1(2);y1(2);x2(2);y2(2)]-d)
The two gradients do not agree. The first one gives a gradient of [0;-100;0;100], and the second gives [10000;-9900;10000;19900]. The first one makes sense to me. Since the range measurement should be 1 to agree with the odometry, it would make sense to have to move the vehicles closer to each other (in the negative of the gradient) in the y direction to reduce error in the range estimate. If I set rho=1, the gradient given by the first method is indeed zero (as it should be), but the second one is not.
Comments (21)
-
reporter -
Thanks ! Did you try this with a factor graph and GaussNewton? One issue I see immediately that example above is singular, with 8 unknowns and only 5 measurements.
-
reporter Hi @dellaert , thanks for the quick reply, I appreciate any help you can provide.
The example I posted above has 8 variables, but I am treating the initial position of each vehicle as being known exactly, so there are really only 4 unknowns. This results in a 5x4 measurement Jacobian that is full rank (since the upper square is diagonal). I didn't compute derivatives with respect to the known variables.
I haven’t implemented this particular example with a factor graph in GTSAM, but I have run several cases that were much longer (1500 time steps) for a paper I presented at the IEEE PLANS conference in 2016. For those runs, I computed a full batch solution with the LevenbergMarquardtOptimizer initialized with a slight perturbation of the true values (but with precisely known position and heading at the initial time), but I think I also tried the GaussNewtonOptimizer and got similar results.
Tomorrow, I hope to write up a similar example using GTSAM.
-
Sure. Note that
GaussianFactorGraph
has a methodgradientAtZero
, and it is wrapped, so you can check straight in MATLAB. In fact, I just did it quickly:% Test gradient % Linearization point T11 = gtsam.Pose2(0,0,0); T12 = gtsam.Pose2(1,0,0); T21 = gtsam.Pose2(0,1,0); T22 = gtsam.Pose2(1,1,0); % Factor graph graph = gtsam.NonlinearFactorGraph(); % Priors prior = gtsam.noiseModel.Isotropic.Sigma(3, 1); graph.add(gtsam.PriorFactorPose2(11, T11, prior)); graph.add(gtsam.PriorFactorPose2(21, T21, prior)); % Odometry model = gtsam.noiseModel.Diagonal.Sigmas([0.01; 0.01; 1e6]); graph.add(gtsam.BetweenFactorPose2(11, 12, T11.between(T12), model)); graph.add(gtsam.BetweenFactorPose2(21, 22, T21.between(T22), model)); % Range model_rho = gtsam.noiseModel.Isotropic.Sigma(1, 0.01); graph.add(gtsam.RangeFactorPose2(12, 22, 1.0, model_rho)); % Print graph graph values = gtsam.Values(); values.insert(11, T11); values.insert(12, T12); values.insert(21, T21); values.insert(22, T22); linearized = graph.linearize(values); % Get Jacobian ordering = gtsam.Ordering(); ordering.push_back(11); ordering.push_back(21); ordering.push_back(12); ordering.push_back(22); A = linearized.jacobian(ordering); % Print Jacobian on real unknowns A(7:13,7:12) linearized.gradientAtZero()
The output is:
size: 5 Factor 0: PriorFactor on 11 prior mean: (0, 0, 0) noise model: unit (3) Factor 1: PriorFactor on 21 prior mean: (0, 1, 0) noise model: unit (3) Factor 2: BetweenFactor(11,12) measured: (1, 0, 0) noise model: diagonal sigmas[0.01; 0.01; 1000000]; Factor 3: BetweenFactor(21,22) measured: (1, 0, 0) noise model: diagonal sigmas[0.01; 0.01; 1000000]; Factor 4: RangeFactor Factor 4: keys = { 12 22 } isotropic dim=1 sigma=0.01 ExpressionFactor with measurement: 1.000000 ans = 100.0000 0 0 0 0 0 0 100.0000 0 0 0 0 0 0 0.0000 0 0 0 0 0 0 100.0000 0 0 0 0 0 0 100.0000 0 0 0 0 0 0 0.0000 0 -100.0000 0 0 100.0000 0 : 4 elements 11: 0 0 0 12: 0 0 0 21: 0 0 0 22: 0 0 0
It makes sense the gradient is zero, as we are at the global minimum. Note that
b
should be the prediction error, which I think here would be zero, unless I’m misunderstanding the example. Digest, play, let me know. -
Any more thoughts? Note, a great way to diagnose crisply would be to add an example, e.g. the one above or whatever you think does not work, to
testDoglegOptimizer.cpp
that fails. If it does not fail, also good PR One can never have too much butter, uhm, tests. -
reporter I am actively working on this, but I don’t have a clean example yet.
In response to your post yesterday, it appears you understand my (previously posted) example perfectly. The prediction error should indeed be zero when the range measurement is 1.
So I took your code, changed the range measurement from 1.0 to 0.99 (to be away from the global minimum) and attempted to optimize the factor graph with three different optimizers: Levenberg-Marquardt, Gauss-Newton, and dog leg. With the uncertainty parameters you provided, only the Levenberg-Marquardt optimizer produced a result. If I changed standard deviation in the heading odometry from 1e6 to 0, then I got equivalent results with all three optimizers. This makes sense to me since there are now two more unknowns with the effectively unknown heading changes (with a standard deviation of 1e6) , but there are no additional measurements to help observe those unknowns. The Levenberg-Marquardt algorithm can still handle singular systems though. When the relative changes in heading are known (with a standard deviation of 0), I am effectively back down to 4 unknowns with 5 constraints.
I am currently adapting your code to be closer to the example that led me down this path in the first place. I actually have an example (that I almost submitted yesterday) whereby the Levenberg-Marquardt and Gauss -Newton optimizers produce good results, but the dog leg optimizer does not. Unfortunately, I had an error in how I was creating the initial guess. Interestingly, the LM and GN optimizers were able to recover from the poor initial guess, but the DL optimizer was not. Once I fixed the initial guess to be more reasonable, all three optimizers worked well (but the DL optimizer was always able to take the GN step).
My current thought is that in my original code (which is actually in C++), I either have a bad initial guess, or there actually is a problem with the DL optimizer. I think I am fairly close to having a working example using the dog leg optimizer with ISAM2, but I will need to return to working on it next week, or perhaps over the weekend if I can find some time.
-
reporter I have isolated a case that fails to decrease in the steepest descent direction as described in my original post (using GTSAM 3.2.1, as this is the version I have set up for Matlab). The scenario here is still a planar cooperative problem with odometry and inter-vehicle range, but it propagates for 5 timesteps. The initial guess has a fair bit of error, but the Levenberg-Marquardt and Gauss-Newton optimizers handle this case easily. The dog leg optimizer fails, as shown in the verbose output below (a summary of the results is at the bottom).
size: 14 factor 0: PriorFactor on 100001 prior mean: (0, 0, 0) noise model: constrained sigmas[0; 0; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 1: PriorFactor on 200001 prior mean: (600, 0, 0) noise model: constrained sigmas[0; 0; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 2: BetweenFactor(100001,100002) measured: (-0.00611769713, 3.00272473, 0) noise model: constrained sigmas[0.01; 0.01; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 3: BetweenFactor(200001,200002) measured: (-0.00726979015, 3.00305988, -0) noise model: constrained sigmas[0.01; 0.01; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 4: RangeFactor, range = 599.993166 keys = { 100002 200002 } noise model: isotropic sigma 0.01 factor 5: BetweenFactor(100002,100003) measured: (-0.00911369363, 2.98321507, -0) noise model: constrained sigmas[0.01; 0.01; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 6: BetweenFactor(200002,200003) measured: (0.0149394677, 3.00521545, -0) noise model: constrained sigmas[0.01; 0.01; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 7: RangeFactor, range = 600.013327 keys = { 100003 200003 } noise model: isotropic sigma 0.01 factor 8: BetweenFactor(100003,100004) measured: (-0.0068554029, 2.99257656, -0) noise model: constrained sigmas[0.01; 0.01; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 9: BetweenFactor(200003,200004) measured: (0.0203433896, 2.97994098, 0) noise model: constrained sigmas[0.01; 0.01; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 10: RangeFactor, range = 599.995367 keys = { 100004 200004 } noise model: isotropic sigma 0.01 factor 11: BetweenFactor(100004,100005) measured: (-0.00254725018, 2.98969401, -0) noise model: constrained sigmas[0.01; 0.01; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 12: BetweenFactor(200004,200005) measured: (-0.000920557164, 3.00307973, 0) noise model: constrained sigmas[0.01; 0.01; 0]; noise model: constrained mu[1000; 1000; 1000]; factor 13: RangeFactor, range = 599.996946 keys = { 100005 200005 } noise model: isotropic sigma 0.01
I generated this case with the following script:
% Test optimizers % Clear screen clc % Factor graph graph = gtsam.NonlinearFactorGraph(); values = gtsam.Values(); % Noise models sigma_dx = 0.01; sigma_dy = 0.01; sigma_dtheta = 0; sigma_rho = 0.01; model_odom = gtsam.noiseModel.Diagonal.Sigmas([sigma_dx; sigma_dy; sigma_dtheta]); model_rho = gtsam.noiseModel.Isotropic.Sigma(1, sigma_rho); % Priors prior = gtsam.noiseModel.Diagonal.Sigmas([0; 0; 0]); key1 = 1e5+1; key2 = 2e5+1; T1 = gtsam.Pose2(0,0,0); % true pose of vehicle 1 T2 = gtsam.Pose2(600,0,0); graph.add(gtsam.PriorFactorPose2(key1, T1, prior)); graph.add(gtsam.PriorFactorPose2(key2, T2, prior)); values.insert(key1, T1); values.insert(key2, T2); for i=2:5 % Pose guess (perturbed truth) key1 = 1e5+i; key2 = 2e5+i; T1 = gtsam.Pose2(0+randn,3*(i-1)+randn,0); T2 = gtsam.Pose2(600+randn,3*(i-1)+randn,0); % Odometry (corrupted by noise) odom1 = gtsam.Pose2(sigma_dx*randn,3+sigma_dy*randn,sigma_dtheta*randn); odom2 = gtsam.Pose2(sigma_dx*randn,3+sigma_dy*randn,sigma_dtheta*randn); graph.add(gtsam.BetweenFactorPose2(key1-1, key1, odom1, model_odom)); graph.add(gtsam.BetweenFactorPose2(key2-1, key2, odom2, model_odom)); % Range (corrupted by noise) graph.add(gtsam.RangeFactorPose2(key1, key2, 600+sigma_rho*randn, model_rho)); % Values (use perturbed truth as initial guess for optimizers) values.insert(key1, T1); values.insert(key2, T2); end % Print graph graph % Print initial guess values %disp(sprintf('\nInitial guess:')) %values % Optimize disp(sprintf('\nLevenberg-Marquardt optimizer result:')) lm_params = gtsam.LevenbergMarquardtParams(); lm_params.setVerbosity('VALUES') lm_optimizer = gtsam.LevenbergMarquardtOptimizer(graph, values, lm_params); result = lm_optimizer.optimizeSafely() disp(sprintf('Optimizer error: %f', lm_optimizer.error())) disp(sprintf('Levenberg-Marquardt optimizer converged in %d iterations', lm_optimizer.iterations())) disp(sprintf('\nGauss-Newton optimizer result:')) gn_params = gtsam.GaussNewtonParams(); gn_params.setVerbosity('VALUES') gn_optimizer = gtsam.GaussNewtonOptimizer(graph, values, gn_params); result = gn_optimizer.optimizeSafely() disp(sprintf('Optimizer error: %f', gn_optimizer.error())) disp(sprintf('Gauss-Newton optimizer converged in %d iterations', gn_optimizer.iterations())) disp(sprintf('\nPowell dog leg optimizer result:')) dogleg_params = gtsam.DoglegParams(); dogleg_params.setVerbosity('VALUES'); dogleg_params.setVerbosityDL('VERBOSE'); dl_optimizer = gtsam.DoglegOptimizer(graph, values, dogleg_params); result = dl_optimizer.optimizeSafely() disp(sprintf('Optimizer error: %f', dl_optimizer.error())) disp(sprintf('Dog leg optimizer converged in %d iterations', dl_optimizer.iterations())) disp(sprintf('\nLevenberg-Marguardt optimizer error: %f (%d iterations)', lm_optimizer.error(), lm_optimizer.iterations())) disp(sprintf('Gauss-Newton optimizer error: %f (%d iterations)', gn_optimizer.error(), gn_optimizer.iterations())) disp(sprintf('Dog leg optimizer error: %f (%d iterations)', dl_optimizer.error(), dl_optimizer.iterations())) if(dl_optimizer.error() > gn_optimizer.error()+1e-3) linearized = graph.linearize(result); disp('Gradient at final dog leg result') linearized.gradientAtZero() end
Note that this script will generate a different initial guess each time it runs. Most of the time, all three optimizers succeed, but the dog leg optimizer almost always takes more iterations. However, the dog leg optimizer fails on approximately 1% of the cases (anecdotally speaking, I haven't done a Monte Carlo analysis), usually accompanied by a warning about a failure to decrease the objective function, but sometimes terminating at a poor solution due to relative function decrease being below the threshold. I have never seen the others optimizers fail over multiple runs of this example.
I could understand getting caught in a local minimum, but I don't think that is what's happening here, since the gradient at the final result for the dog leg optimizer is not zero (although, I don’t understand why the gradient output has three columns).
-
That’s great work. Indeed, because the dogleg optimizer is really switching between GN and gradient descent (just like LM, but in a different way) it should succeed whenever GN succeeds. @kaess might weigh in on this, he did the original code and can confirm my understanding. Not sure whether he monitors Bitbucket mail but let's give hime a chance to weigh in.
-
reporter Looking back at my previous post, I realize that the optimizer results got cut off. I have included them here for completeness (again, the summary of the results is at the bottom):
Levenberg-Marquardt optimizer result: Initial valuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-2.31996895, 1.93364125, 0) Value 100003: (-0.625587382, 6.18857792, 0) Value 100004: (-0.84568354, 8.96342818, 0) Value 100005: (0.551152193, 13.0344137, 0) Value 200001: (600, 0, 0) Value 200002: (601.596888, 2.80988045, 0) Value 200003: (601.916594, 6.59647362, 0) Value 200004: (599.298225, 8.9870857, 0) Value 200005: (598.533201, 12.7405703, 0) Initial error: 259426.136 newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-0.00265411902, 3.00271523, 0) Value 100003: (-0.00642747226, 5.98592353, 0) Value 100004: (0.000213748629, 8.97849884, 0) Value 100005: (0.00103177622, 11.9681912, 0) Value 200001: (600, 0, 0) Value 200002: (599.989267, 3.00306938, 0) Value 200003: (599.998866, 6.00829161, 0) Value 200004: (600.005713, 8.98823384, 0) Value 200005: (600.001427, 11.9913152, 0) newError: 3.24317791 errorThreshold: 3.24317791 > 0 absoluteDecrease: 259422.892406 >= 1e-05 relativeDecrease: 0.999987498646 >= 1e-05 newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-0.00247563827474, 3.00272472013, 0) Value 100003: (-0.00634458490105, 5.98593978019, 0) Value 100004: (0.000244059247352, 8.97851663487, 0) Value 100005: (0.00106996413503, 11.968210776, 0) Value 200001: (600, 0, 0) Value 200002: (599.989088151, 3.00305989179, 0) Value 200003: (599.998782872, 6.00827535653, 0) Value 200004: (600.005682214, 8.98821604247, 0) Value 200005: (600.001388502, 11.9912956414, 0) newError: 3.24191625764 errorThreshold: 3.24191625764 > 0 absoluteDecrease: 0.00126164867332 >= 1e-05 relativeDecrease: 0.000389016177888 >= 1e-05 newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-0.00247563827474, 3.00272472013, 0) Value 100003: (-0.00634458490105, 5.98593978019, 0) Value 100004: (0.000244059247352, 8.97851663487, 0) Value 100005: (0.00106996413503, 11.968210776, 0) Value 200001: (600, 0, 0) Value 200002: (599.989088151, 3.00305989179, 0) Value 200003: (599.998782872, 6.00827535653, 0) Value 200004: (600.005682214, 8.98821604247, 0) Value 200005: (600.001388502, 11.9912956414, 0) newError: 3.24191625764 errorThreshold: 3.24191625764 > 0 absoluteDecrease: 0 < 1e-05 relativeDecrease: 0 < 1e-05 converged errorThreshold: 3.24191625764 <? 0 absoluteDecrease: 0 <? 1e-05 relativeDecrease: 0 <? 1e-05 iterations: 3 >? 100 Values with 10 values: Value 100001: (0, 0, 0) Value 100002: (-0.00247563827474, 3.00272472013, 0) Value 100003: (-0.00634458490105, 5.98593978019, 0) Value 100004: (0.000244059247352, 8.97851663487, 0) Value 100005: (0.00106996413503, 11.968210776, 0) Value 200001: (600, 0, 0) Value 200002: (599.989088151, 3.00305989179, 0) Value 200003: (599.998782872, 6.00827535653, 0) Value 200004: (600.005682214, 8.98821604247, 0) Value 200005: (600.001388502, 11.9912956414, 0) Optimizer error: 3.241916 Levenberg-Marquardt optimizer converged in 3 iterations Gauss-Newton optimizer result: Initial valuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-2.31996894643, 1.93364124892, 0) Value 100003: (-0.625587382398, 6.18857792394, 0) Value 100004: (-0.845683539749, 8.9634281823, 0) Value 100005: (0.551152193177, 13.0344137125, 0) Value 200001: (600, 0, 0) Value 200002: (601.596888244, 2.80988045076, 0) Value 200003: (601.916593571, 6.59647362149, 0) Value 200004: (599.298224949, 8.98708569826, 0) Value 200005: (598.533200855, 12.7405703085, 0) Initial error: 259426.135584 newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-0.00265411746744, 3.00271523455, 0) Value 100003: (-0.00642747024626, 5.98592353112, 0) Value 100004: (0.000213751452157, 8.9784988381, 0) Value 100005: (0.00103177913082, 11.9681911955, 0) Value 200001: (600, 0, 0) Value 200002: (599.98926663, 3.00306937738, 0) Value 200003: (599.998865757, 6.0082916056, 0) Value 200004: (600.005712522, 8.98823383924, 0) Value 200005: (600.001426687, 11.9913152219, 0) newError: 3.24317789784 errorThreshold: 3.24317789784 > 0 absoluteDecrease: 259422.892406 >= 1e-05 relativeDecrease: 0.999987498646 >= 1e-05 newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-0.00247563827473, 3.00272472013, 0) Value 100003: (-0.00634458490103, 5.98593978019, 0) Value 100004: (0.000244059247353, 8.97851663487, 0) Value 100005: (0.00106996413502, 11.968210776, 0) Value 200001: (600, 0, 0) Value 200002: (599.989088151, 3.00305989179, 0) Value 200003: (599.998782872, 6.00827535653, 0) Value 200004: (600.005682214, 8.98821604247, 0) Value 200005: (600.001388502, 11.9912956414, 0) newError: 3.24191625763 errorThreshold: 3.24191625763 > 0 absoluteDecrease: 0.00126164021014 >= 1e-05 relativeDecrease: 0.000389013569369 >= 1e-05 newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-0.00247563827454, 3.00272471981, 0) Value 100003: (-0.00634458490063, 5.9859397795, 0) Value 100004: (0.00024405924787, 8.97851663335, 0) Value 100005: (0.00106996413566, 11.9682107742, 0) Value 200001: (600, 0, 0) Value 200002: (599.989088151, 3.00305989211, 0) Value 200003: (599.998782872, 6.00827535722, 0) Value 200004: (600.005682214, 8.98821604398, 0) Value 200005: (600.001388502, 11.9912956432, 0) newError: 3.24191625765 errorThreshold: 3.24191625765 > 0 absoluteDecrease: -1.83888460015e-11 < 1e-05 relativeDecrease: -5.67221499265e-12 < 1e-05 Warning: stopping nonlinear iterations because error increased errorThreshold: 3.24191625765 <? 0 absoluteDecrease: -1.83888460015e-11 <? 1e-05 relativeDecrease: -5.67221499265e-12 <? 1e-05 iterations: 3 >? 100 Values with 10 values: Value 100001: (0, 0, 0) Value 100002: (-0.00247563827454, 3.00272471981, 0) Value 100003: (-0.00634458490063, 5.9859397795, 0) Value 100004: (0.00024405924787, 8.97851663335, 0) Value 100005: (0.00106996413566, 11.9682107742, 0) Value 200001: (600, 0, 0) Value 200002: (599.989088151, 3.00305989211, 0) Value 200003: (599.998782872, 6.00827535722, 0) Value 200004: (600.005682214, 8.98821604398, 0) Value 200005: (600.001388502, 11.9912956432, 0) Optimizer error: 3.241916 Gauss-Newton optimizer converged in 3 iterations Powell dog leg optimizer result: Initial valuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-2.31996894643, 1.93364124892, 0) Value 100003: (-0.625587382398, 6.18857792394, 0) Value 100004: (-0.845683539749, 8.9634281823, 0) Value 100005: (0.551152193177, 13.0344137125, 0) Value 200001: (600, 0, 0) Value 200002: (601.596888244, 2.80988045076, 0) Value 200003: (601.916593571, 6.59647362149, 0) Value 200004: (599.298224949, 8.98708569826, 0) Value 200005: (598.533200855, 12.7405703085, 0) Initial error: 259426.135584 Steepest descent magnitude 1.95480862697, Newton's method magnitude 4.35379659726 In steepest descent region with fraction 0.511559027417 of steepest descent magnitude Delta = 1, dx_d_norm = 1 f error: 259426.135583571 -> 185971.053351668 M error: 259422.875503615 -> 129765.355265682 rho = 0.566531598761918 newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-1.86412553822045, 2.06832781531011, -0.453938439195871) Value 100003: (-0.590444576823901, 6.10318345382433, 0.0862414921921389) Value 100004: (-0.744594506742141, 9.03799077081437, -0.0274186355258822) Value 100005: (0.355180272916884, 12.972405548984, -0.181737521283955) Value 200001: (600, 0, 0) Value 200002: (601.297052546576, 2.86549217858808, 0.387416112407595) Value 200003: (601.602479616415, 6.51769823947892, -0.0712516588538513) Value 200004: (599.397299365984, 9.06398466783818, 0.0207463642783543) Value 200005: (598.692708404883, 12.6974410312305, 0.252457540272867) newError: 185971.053351668 errorThreshold: 185971.053351668 > 0 absoluteDecrease: 73455.0822319 >= 1e-05 relativeDecrease: 0.283144495317 >= 1e-05 Steepest descent magnitude 1.69218450879, Newton's method magnitude 3.94978241016 In steepest descent region with fraction 0.590952106468 of steepest descent magnitude Delta = 1, dx_d_norm = 1 f error: 185971.053351668 -> 147461.672510673 M error: 116438.808120394 -> 47434.8028409954 rho = 0.558074573860903 newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-1.45847824652871, 2.18804108058623, -0.849078234211621) Value 100003: (-0.417761833172634, 6.01567138137945, 0.190674226241442) Value 100004: (-0.570994832658775, 9.1190302810607, -0.109147169130491) Value 100005: (0.166626390248096, 12.8843152592579, -0.148312531162434) Value 200001: (600, 0, 0) Value 200002: (601.014634033161, 2.84235501567348, 0.727695629583124) Value 200003: (601.108216937119, 6.42983435830157, -0.00251625576581953) Value 200004: (599.519101941525, 9.1717313246385, 0.0435747838360083) Value 200005: (598.841464230881, 12.6342797874151, 0.403706578505043) newError: 147461.672510673 errorThreshold: 147461.672510673 > 0 absoluteDecrease: 38509.380841 >= 1e-05 relativeDecrease: 0.20707190795 >= 1e-05 Steepest descent magnitude 1.39713448715, Newton's method magnitude 3.2439018278 In steepest descent region with fraction 0.715750709182 of steepest descent magnitude Delta = 1, dx_d_norm = 1 f error: 147461.672510673 -> 141060.170419056 M error: 42110.9601290933 -> 9251.50939491742 rho = 0.194814640798574 newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-1.26714361378349, 2.06261316294581, -1.10388919339331) Value 100003: (-0.084334535948957, 6.14617486173199, 0.248927195164887) Value 100004: (-0.340276429581535, 9.24667383940122, -0.159575195667903) Value 100005: (-0.000681850951073648, 12.7295430081767, -0.0307387462930205) Value 200001: (600, 0, 0) Value 200002: (600.836788514223, 2.65530056990016, 0.931482861696576) Value 200003: (600.460705644938, 6.43928728088233, 0.161151832307827) Value 200004: (599.624409546913, 9.27518406487952, 0.0679752639479036) Value 200005: (598.991044473568, 12.5489552000355, 0.487120130218038) newError: 141060.170419056 errorThreshold: 141060.170419056 > 0 absoluteDecrease: 6401.50209162 >= 1e-05 relativeDecrease: 0.0434112944918 >= 1e-05 Steepest descent magnitude 0.987694213444, Newton's method magnitude 3.65449294076 In steepest descent region with fraction 0.506229552825 of steepest descent magnitude Delta = 0.5, dx_d_norm = 0.5 f error: 141060.170419056 -> 160690.359410671 M error: 19224.4729042793 -> 8292.52982292943 rho = -1.79567244775589 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.25311477641263 of steepest descent magnitude Delta = 0.25, dx_d_norm = 0.25 f error: 141060.170419056 -> 149646.228609192 M error: 19224.4729042793 -> 12832.309412885 rho = -1.34321629941029 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.126557388206315 of steepest descent magnitude Delta = 0.125, dx_d_norm = 0.125 f error: 141060.170419056 -> 145046.056040865 M error: 19224.4729042793 -> 15796.8431709023 rho = -1.16286936800573 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.0632786941031576 of steepest descent magnitude Delta = 0.0625, dx_d_norm = 0.0625 f error: 141060.170419056 -> 142976.34644219 M error: 19224.4729042793 -> 17452.7710406708 rho = -1.08154541263031 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.0316393470515788 of steepest descent magnitude Delta = 0.03125, dx_d_norm = 0.03125 f error: 141060.170419056 -> 141999.069408146 M error: 19224.4729042793 -> 18324.1502232451 rho = -1.04284720230691 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.0158196735257894 of steepest descent magnitude Delta = 0.015625, dx_d_norm = 0.015625 f error: 141060.170419056 -> 141524.823010372 M error: 19224.4729042793 -> 18770.6936264547 rho = -1.02396167921865 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.0079098367628947 of steepest descent magnitude Delta = 0.0078125, dx_d_norm = 0.0078125 f error: 141060.170419056 -> 141291.297534087 M error: 19224.4729042793 -> 18996.6787810401 rho = -1.01463159692137 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.00395491838144735 of steepest descent magnitude Delta = 0.00390625, dx_d_norm = 0.00390625 f error: 141060.170419056 -> 141175.434187135 M error: 19224.4729042793 -> 19110.349721578 rho = -1.00999433551364 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.00197745919072367 of steepest descent magnitude Delta = 0.001953125, dx_d_norm = 0.001953125 f error: 141060.170419056 -> 141117.727356451 M error: 19224.4729042793 -> 19167.3547826582 rho = -1.00768260163478 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.000988729595361837 of steepest descent magnitude Delta = 0.0009765625, dx_d_norm = 0.0009765625 f error: 141060.170419056 -> 141088.930151186 M error: 19224.4729042793 -> 19195.8997109011 rho = -1.00652845305704 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.000494364797680919 of steepest descent magnitude Delta = 0.00048828125, dx_d_norm = 0.00048828125 f error: 141060.170419056 -> 141074.545600989 M error: 19224.4729042793 -> 19210.1827744483 rho = -1.00595180755867 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.000247182398840459 of steepest descent magnitude Delta = 0.000244140625, dx_d_norm = 0.000244140625 f error: 141060.170419056 -> 141067.356838993 M error: 19224.4729042793 -> 19217.3269560783 rho = -1.00566359222216 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 0.00012359119942023 of steepest descent magnitude Delta = 0.0001220703125, dx_d_norm = 0.0001220703125 f error: 141060.170419056 -> 141063.763336268 M error: 19224.4729042793 -> 19220.8997093574 rho = -1.00551951144341 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 6.17955997101148e-05 of steepest descent magnitude Delta = 6.103515625e-05, dx_d_norm = 6.103515625e-05 f error: 141060.170419056 -> 141061.96680447 M error: 19224.4729042793 -> 19222.686251613 rho = -1.00544747624829 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 3.08977998550574e-05 of steepest descent magnitude Delta = 3.0517578125e-05, dx_d_norm = 3.0517578125e-05 f error: 141060.170419056 -> 141061.068593469 M error: 19224.4729042793 -> 19223.5795641448 rho = -1.00541146498946 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 1.54488999275287e-05 of steepest descent magnitude Delta = 1.52587890625e-05, dx_d_norm = 1.52587890625e-05 f error: 141060.170419056 -> 141060.619501689 M error: 19224.4729042793 -> 19224.0262307617 rho = -1.00539345964191 Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576 In steepest descent region with fraction 7.72444996376435e-06 of steepest descent magnitude Delta = 7.62939453125e-06, dx_d_norm = 7.62939453125e-06 f error: 141060.170419056 -> 141060.39495923 M error: 19224.4729042793 -> 19224.2495666579 rho = -1.00538446001464 Warning: Dog leg stopping because cannot decrease error with minimum Delta newValuesValues with 10 values: Value 100001: (0, 0, 0) Value 100002: (-1.26714361378349, 2.06261316294581, -1.10388919339331) Value 100003: (-0.084334535948957, 6.14617486173199, 0.248927195164887) Value 100004: (-0.340276429581535, 9.24667383940122, -0.159575195667903) Value 100005: (-0.000681850951073648, 12.7295430081767, -0.0307387462930205) Value 200001: (600, 0, 0) Value 200002: (600.836788514223, 2.65530056990016, 0.931482861696576) Value 200003: (600.460705644938, 6.43928728088233, 0.161151832307827) Value 200004: (599.624409546913, 9.27518406487952, 0.0679752639479036) Value 200005: (598.991044473568, 12.5489552000355, 0.487120130218038) newError: 141060.170419056 errorThreshold: 141060.170419056 > 0 absoluteDecrease: 0 < 1e-05 relativeDecrease: 0 < 1e-05 converged errorThreshold: 141060.170419 <? 0 absoluteDecrease: 0 <? 1e-05 relativeDecrease: 0 <? 1e-05 iterations: 4 >? 100 Values with 10 values: Value 100001: (0, 0, 0) Value 100002: (-1.26714361378, 2.06261316295, -1.10388919339) Value 100003: (-0.084334535949, 6.14617486173, 0.248927195165) Value 100004: (-0.340276429582, 9.2466738394, -0.159575195668) Value 100005: (-0.000681850951074, 12.7295430082, -0.030738746293) Value 200001: (600, 0, 0) Value 200002: (600.836788514, 2.6553005699, 0.931482861697) Value 200003: (600.460705645, 6.43928728088, 0.161151832308) Value 200004: (599.624409547, 9.27518406488, 0.0679752639479) Value 200005: (598.991044474, 12.5489552, 0.487120130218) Optimizer error: 141060.170419 Dog leg optimizer converged in 4 iterations Levenberg-Marguardt optimizer error: 3.241916 (3 iterations) Gauss-Newton optimizer error: 3.241916 (3 iterations) Dog leg optimizer error: 141060.170419 (4 iterations) Gradient at final dog leg result : 10 elements 100001: 14071.0781863 -7028.82432574 -20115.5573379 100002: -16247.2849703 2245.51456129 -111925.762288 100003: -41300.1163485 1802.83273811 16350.3090428 100004: 8313.66096417 -3907.30104762 -8697.10444118 100005: 7891.30154475 5334.76614485 0.128836449375 200001: -7826.81638299 -4698.72760465 16849.7772284 200002: 4080.60037322 2272.1817879 107220.992762 200003: 37156.8892367 -5207.4242453 -11577.9236275 200004: 764.645730704 -1578.68635549 -16011.0962793 200005: -12967.4866694 7769.32009792 0.41914486627
Now, you can really see how the dogleg optimizer is attempting to take smaller and smaller steps in the steepest descent direction until it quits due to insufficient decrease of the objective function. Also, I now realize why the gradient has 3 columns; there are three states for each key (x, y, and heading).
-
Interesting. Hypothesis: in this example dogleg, by taking a step in between first and second order updates, moves to a different local minimum region it can’t escape from. Can you initialize LM with these values but using a starting lambda that is very high? We don’t want LM to accidentally jump out the basin, just gradient-descent its way out (if it can).
-
reporter Great idea! This is very easy to do.
In the last code I posted, I just modified the “if” block to start from the last dog leg result, as you suggest. I also start from the graph and initial values I posted above, rather than generating a new random case.
Unfortunately, I can't exactly emulate a pure steepest descent algorith with the Levenberg-Marquardt optimizer because the step length goes to zero as the direction approaches the steepest descent direction (which is what is supposed to happen, since LM creates a "hook" between the steepest descent and GN directions for varying lambda).
When I set the initial lambda to 1e6, the Levenberg-Marquardt optimizer converges in 8 iterations starting from the dog leg result. The result has an error of 3.2419, which is the same as the original results with the LM and GN optimizers. I also see that lambda decreases by a factor of 10 on each iteration. With an
initial lambda of 1e10, it takes 12 iterations, but it still reaches the correct result. If I increase the initial lambda to 1e11, the relative decrease in the cost function drops below 1e-5 after the second iteration (due to the small step size), and the optimizer fails.
The step length of the dog leg optimizer doesn't necessarily have to approach zero along the steepest descent direction, since it uses the dog leg to approximate the hook. However, for some reason, the dog leg optimizer wants to use a really small step in the steepest descent direction for this case.
if(dl_optimizer.error() > gn_optimizer.error()+1e-3) linearized = graph.linearize(dl_result); disp('Gradient at final dog leg result') linearized.gradientAtZero() lm_params = gtsam.LevenbergMarquardtParams(); lm_params.setlambdaInitial(1e10); lm_params.setVerbosity('VALUES') lm_params.setVerbosityLM('TRYLAMBDA') lm_optimizer = gtsam.LevenbergMarquardtOptimizer(graph, dl_result, lm_params); result = lm_optimizer.optimizeSafely() disp(sprintf('Optimizer error: %f', lm_optimizer.error())) disp(sprintf('Levenberg-Marquardt optimizer converged in %d iterations', lm_optimizer.iterations())) end
-
reporter I have coded a pure steepest descent optimization algorithm in Matlab. For the case I posted on 2019-05-06, it reports an initial error of 259426, which matches what is being reported by GTSAM. The steepest descent algorithm takes about 200 iterations to settle down to an error of about 3.93 (I tried 1000 iterations with little improvement after 200). Starting from the final dog leg optimizer result, it can also achieve an error of 3.93 after about 100 iterations.
Interestingly, this error of 3.93 is higher than the error of 3.24 achieved by the LM and GN optimizers. If I use my code to start from the LM result, it also reports an initial error of 3.24, but climbs up to 3.93 after about 7 iterations. If I use GTSAM dogleg optimizer to start from the pure steepest descent result, it can achieve an error of 3.24, but I can see that takes steps in the Gauss-Newton direction instead of the steepest descent direction after only a couple of iterations.
These results make me want to try some other implementations of the dogleg algorithm, such as scipy.optimize or Google Ceres, since they vary in implementation details. It seems that the full GN step should be accepted on the first iteration of the dogleg algorithm since the objective function decreases from 259426 to 3.24 after the first iteration of GN, but I think there are some criteria based on the model mismatch that are causing this step to be rejected.
Here’s the code I used for steepest descent:
% The purpose of this script is to test a pure steepest descent optimization % algorithm on a cooperative navigation problem. This particular problem % leads to a failure of Powell's dog leg algorithm in GTSAM. The scenario is % a planar problem with 2 vehicles using odometry and inter-vehicle range % measurements. The simulation propagates four steps in time, and heading is % assumed to be known exactly. % Measurement standard deviations sigma_dx = 0.01; sigma_dy = 0.01; sigma_rho = 0.01; % Initial values % (x1, y1) is the position of vehicle 1, (x2, y2) is for vehicle 2 % Original problem x1 = [0 -2.31996895 -0.625587382 -0.84568354 0.551152193]; y1 = [0 1.93364125 6.18857792 8.96342818 13.0344137]; x2 = [600 601.596888 601.916594 599.298225 598.533201]; y2 = [0 2.80988045 6.59647362 8.9870857 12.7405703]; % dl_result of orignal problem % x1 = [0 -1.26714358918 -0.0843344751289 -0.340276382986 -0.000681825384076]; % y1 = [0 2.06261313863 6.14617487076 9.24667380744 12.729543035]; % x2 = [600 600.836788356 600.460705882 599.624409558 598.991044587]; % y2 = [0 2.65530066601 6.43928724172 9.27518405959 12.5489552084]; % dl_result of solution from steepest descent % x1 = [0 -0.0024754923585 -0.00634447874762 0.000244165574022 0.00106994187677]; % y1 = [0 3.00272471646 5.98593977381 8.97851662639 11.9682107662]; % x2 = [600 599.989088005 599.998782766 600.005682108 600.001388524]; % y2 = [0 3.00305989354 6.00827535619 8.98821604361 11.9912956438]; % Measurements % The first four elements are the x1(1), y1(1), x2(1), and y2(1) % After that, the order is dx1(i), dy1(i), dx2(i), dy2(i), rho(i) % for i from 2 to 5 z = [ 0; 0; 600; 0; -0.00611769713; 3.00272473; -0.00726979015; 3.00305988; 599.993166; -0.00911369363; 2.98321507; 0.0149394677; 3.00521545; 600.013327; -0.0068554029; 2.99257656; 0.0203433896; 2.97994098; 599.995367; -0.00254725018; 2.98969401; -0.000920557164; 3.00307973; 599.996946]; % Steepest descent optimization loop for k=1:200 % Construct the error vector and measurement Jacobian b = []; for i=2:5 dx1 = z(5*i-5); dy1 = z(5*i-4); dx2 = z(5*i-3); dy2 = z(5*i-2); rho = z(5*i-1); rho_hat = hypot(x2(i)-x1(i), y2(i)-y1(i)); % Augment the error vector for this time step b = [b; (x1(i) - x1(i-1) - dx1)/sigma_dx; (y1(i) - y1(i-1) - dy1)/sigma_dy; (x2(i) - x2(i-1) - dx2)/sigma_dx; (y2(i) - y2(i-1) - dy2)/sigma_dy; (rho_hat - rho)/sigma_rho]; % Augment the measurement Jacobian for this time step A(5*i-9:5*i-5,4*i-7:4*i) = [-1/sigma_dx, 0, 0, 0, 1/sigma_dx, 0, 0, 0; 0, -1/sigma_dy, 0, 0, 0, 1/sigma_dy, 0, 0; 0, 0, -1/sigma_dx, 0, 0, 0, 1/sigma_dx, 0; 0, 0, 0, -1/sigma_dy, 0, 0, 0, 1/sigma_dy; 0, 0, 0, 0, -(x2(i)-x1(i))/rho_hat/sigma_rho, -(y2(i)-y1(i))/rho_hat/sigma_rho, (x2(i)-x1(i))/rho_hat/sigma_rho, (y2(i)-y1(i))/rho_hat/sigma_rho]; end % Remove first 4 columns of A since initial positions are known % The first 4 columns correspond to x1(1), y1(1), x2(1), y2(1) A = A(:,5:end); % Compute least squares error and display output e(k) = b'*b/2; disp(sprintf('Iteration %d error: %f', k, e(k))); % Compute steepest descent direction and step size [Q,R] = qr(A,0); d = Q'*b; g = -R'*d; Rg = R*g; delta = (g'*g/(Rg'*Rg))*g; % update values % (the first value of each isn't updated since it is known) x1(2:end) = x1(2:end) + delta(1:4:end)'; y1(2:end) = y1(2:end) + delta(2:4:end)'; x2(2:end) = x2(2:end) + delta(3:4:end)'; y2(2:end) = y2(2:end) + delta(4:4:end)'; end
-
OK, how can gradient descent go up? Something must be wrong with the gradient. How can I best help debug this issue?
-
One immediate idea is to check the gradient with numerical derivative. Did you do this already?
-
reporter Indeed, there was an error with my Jacobian. I didn’t clear the variables before running the script, and I had some values in A in places I shouldn’t, probably from incorrect indexing on a previous run.
I can run this script, beginning with the dl_result that followed the steepest descent result, and the gradient is fairly small (with the largest element of 1.3e-5), but probably not as small as it should be. The error is fairly stable for a few iterations, but after 100 iterations, the error creeps up to 3.38 (after starting at 3.24). I could see this happening if it takes too large of steps in the steepest descent direction. If I begin with the poor initial guess that gives an error of 259426, it takes my steepest descent script about 7000 iterations to get to an error of 3.94, but it never makes it down to 3.24. I’m not convinced that the script is 100% correct, but I can’t find the error.
Perhaps I should try to implement a Matlab steepest descent algorithm using GTSAM computed Jacobians and gradients? In my script, the vehicles are only 2 DOF, so perhaps it’s not a good comparison, even when I set heading uncertainties to zero in GTSAM. How does GTSAM handle zero noise models? It seems this would cause a division by zero problem when computing the measurement error, unless these terms are explicitly ignored.
-
reporter It seems to be rather difficult to approximate the gradient for this problem using a forward difference approximation of the derivative. I started by just perturbing one value at a time to monitor it’s effect on the error function, but the numerical derivative didn’t seem to converge on a value. So I rewrote this in Python to use scipy.optimize.approx_fprime. I experimented with some functions that I could easily differentiate by hand to make sure I was using it properly, and then I tried applying it to my problem. I also verified that the value of the error function is the same as my Matlab implementation.
The approx_fprime function requires supplying a finite difference size, and I tried values from 1e-3 to 1e-12, decreasing by a factor of 10 each time, and couldn’t get a consistent answer with any two finite difference sizes. My best guess is that the function is highly nonlinear, hence relatively large finite difference sized don’t work, but trying to use small finite difference sizes is causing numerical problems. I could try doing the complex step trick, but I’m not sure the best way to proceed with that idea at the moment.
I still want to try some other implementations of Powell’s dogleg, but if you have any other suggestions, I’m certainly open to them.
Here’s the Python code I tried:
import numpy as np import matplotlib.pyplot as plt from scipy.optimize import approx_fprime sigma_dx = 0.01 sigma_dy = 0.01 sigma_rho = 0.01 def error(x,z): b = np.array([]) x1 = x[0:20:4] y1 = x[1:20:4] x2 = x[2:20:4] y2 = x[3:20:4] for i in range(1,5): dx1 = z[5*i-1] dy1 = z[5*i] dx2 = z[5*i+1] dy2 = z[5*i+2] rho = z[5*i+3] rho_hat = np.hypot(x2[i]-x1[i], y2[i]-y1[i]) b = np.append(b,[(x1[i]-x1[i-1]-dx1)/sigma_dx, (y1[i]-y1[i-1]-dy1)/sigma_dy, (x2[i]-x2[i-1]-dx2)/sigma_dx, (y2[i]-y2[i-1]-dy2)/sigma_dy, (rho_hat-rho)/sigma_rho]) return b.dot(b)/2 x = np.array([-0.0024754923585,3.00272471646,599.989088005,3.00305989354, -0.00634447874762,5.98593977381,599.998782766,6.00827535619, 0.000244165574022,8.97851662639,600.005682108,8.98821604361, 0.00106994187677,11.9682107662,600.001388524,11.9912956438]) z = np.array([0,0,600,0, -0.00611769713,3.00272473,-0.00726979015,3.00305988,599.993166, -0.00911369363,2.98321507,0.0149394677,3.00521545,600.013327, -0.0068554029,2.99257656,0.0203433896,2.97994098,599.995367, -0.00254725018,2.98969401,-0.000920557164,3.00307973,599.996946]) f = (lambda x : error(np.append([0,0,600,0],x),z)) e = f(x) grad_f = approx_fprime(x,f,1e-3)
And here’s the output, varying the finite difference size from 1e-3 to 1e-12, decreasing by a factor of 10 each time:
In [124]: run test_approx_fprime.py In [125]: grad_f Out[125]: array([ 15.00000078, 9.9999867 , 14.99999302, 9.99998659, 14.99999537, 9.99993164, 15.00001382, 9.99993173, 14.99999916, 10.0000836 , 15.00000059, 10.00008428, 10.00000268, 5.00002845, 9.99999377, 5.00002778]) In [126]: run test_approx_fprime.py In [127]: grad_f Out[127]: array([ 1.50000078, 0.99999872, 1.49999302, 0.99999862, 1.49999532, 0.99999315, 1.50001376, 0.99999333, 1.49999917, 1.00000806, 1.5000006 , 1.00000888, 1.00000267, 0.50000313, 0.99999376, 0.50000247]) In [128]: run test_approx_fprime.py In [129]: grad_f Out[129]: array([ 0.15000074, 0.09999984, 0.14999302, 0.09999979, 0.14999523, 0.09999958, 0.15001469, 0.10000042, 0.1499994 , 0.10000188, 0.1500006 , 0.10000155, 0.10000237, 0.05000083, 0.0999938 , 0.04999956]) In [130]: run test_approx_fprime.py In [131]: grad_f Out[131]: array([ 0.01500075, 0.01000021, 0.01499302, 0.00999797, 0.01499523, 0.01000518, 0.01501376, 0.01000415, 0.0149994 , 0.01000188, 0.0150006 , 0.01000957, 0.01000275, 0.00499814, 0.0099938 , 0.00500186]) In [132]: run test_approx_fprime.py In [133]: grad_f Out[133]: array([ 0.00149528, 0.0009911 , 0.00149302, 0.00099068, 0.00146726, 0.00102382, 0.00151376, 0.0010694 , 0.00153375, 0.00102478, 0.0015006 , 0.00097522, 0.00097591, 0.00049431, 0.0009938 , 0.00050569]) In [134]: run test_approx_fprime.py In [135]: grad_f Out[135]: array([ 1.63513647e-04, 9.11271059e-05, 1.43041134e-04, -7.32747196e-05, -7.21644966e-04, -1.55786495e-04, 1.63780101e-04, 1.28803634e-03, 6.92779167e-05, 5.82778270e-04, 1.50635060e-04, -3.82716081e-04, 7.59392549e-05, -1.85718108e-04, 9.37916411e-05, -9.76996262e-05]) In [136]: run test_approx_fprime.py In [137]: grad_f Out[137]: array([ -1.53654867e-04, 1.33226763e-06, 8.43769499e-06, 1.90958360e-05, -8.56203997e-04, 6.27942143e-03, 2.93098879e-05, 3.06243919e-03, 1.07913678e-03, 1.63780101e-03, 1.59872116e-05, -1.61737290e-03, -3.46522810e-03, 1.30340183e-03, 4.44089210e-06, -1.29318778e-03]) In [138]: run test_approx_fprime.py In [139]: grad_f Out[139]: array([ -1.10933485e-02, -4.44089210e-06, 0.00000000e+00, 1.33226763e-05, -5.67990099e-02, -3.05089287e-03, 1.77635684e-05, 3.05533376e-03, -4.47286652e-02, 1.62980740e-03, 4.44089210e-06, -1.62536651e-03, -1.49746882e-02, 1.30562228e-03, 0.00000000e+00, -1.29674049e-03]) In [140]: run test_approx_fprime.py In [141]: grad_f Out[141]: array([ 7.10542736e-03, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -8.95727936e-01, -3.01980663e-03, 4.44089210e-05, 3.10862447e-03, -4.47197834e-02, 1.64313008e-03, 0.00000000e+00, -1.59872116e-03, -1.49658064e-02, 1.33226763e-03, 0.00000000e+00, -1.24344979e-03]) In [142]: run test_approx_fprime.py In [143]: grad_f Out[143]: array([ 3.72146758e-01, 4.44089210e-04, 0.00000000e+00, 0.00000000e+00, -7.42073070e+00, -2.66453526e-03, 0.00000000e+00, 3.55271368e-03, -2.33502107e+00, 1.77635684e-03, 0.00000000e+00, -1.33226763e-03, -7.81597009e-01, 1.77635684e-03, 4.44089210e-04, -8.88178420e-04]) In [144]:
Note that at first, the gradient appears to decrease by a factor of the step size each time the step size is decreased, but that trend eventually breaks down.
-
So, I did my own experiments, with your example but coded with GTSAM factors, and found that dogleg performs surprisingly well. In fact, I found that by changing the
deltaInitial
default to 10 it consistently outperforms LM, again, for this example. I created a pull-request PR #424 which you can peruse. It has nice graphs.Let me know if you still want to keep this issue open or want to experiment on your own to produce a failing unit test on the GTSAM gradient that you think proves it to be definitely wrong.
-
As far as the gradient goes, you could sanity-check the GTSAM gradient using your example, which I merged as a test in
testNonlinearFactorGraph.cpp
, at the bottom. You can create a boost::function and usenumericalGradient
innumericalDerivative.h
. You can sanity-check that withlinearized.gradientAtZero()
, which is what LM/Dogleg would use. -
reporter Thanks! I’ll have to take a look at the pull request, and perhaps I’ll check out the numerical derivative in C++.
I realized yesterday after my last post that the behavior of the forward difference gradient that I observed is approximately what should happen for a quadratic function near a minimum. Thus, to get a good numerical derivative, I should try a central difference approximation.
It turns out that my analytical appears to be correct, as can be seen in the output below. There are three columns, the first is the analytical derivative, the second is the central difference approximation with a finite difference size of 1e-4, and the third is simply the difference between the two.
In [68]: run test_approx_fprime.py In [69]: g_comparison Out[69]: array([[ 7.85591176e-07, 7.81574805e-07, -4.01637124e-09], [ 5.21561095e-08, 4.52038407e-08, -6.95226879e-09], [ -6.97945615e-06, -6.97945701e-06, -8.66862138e-13], [ -5.21605503e-08, -4.52060611e-08, 6.95448923e-09], [ -4.62303356e-06, -4.58793892e-06, 3.50946436e-08], [ -4.23609354e-08, -9.17887988e-08, -4.94278634e-08], [ 1.38153594e-05, 1.38057232e-05, -9.63613545e-09], [ 4.23609354e-08, 9.17887988e-08, 4.94278634e-08], [ -8.38471834e-07, -8.26350099e-07, 1.21217347e-08], [ -3.35789793e-07, -4.08248990e-07, -7.24591968e-08], [ 5.86626811e-07, 5.99786887e-07, 1.31600757e-08], [ 3.35807557e-07, 4.08268974e-07, 7.24614172e-08], [ 2.69162961e-06, 2.69487099e-06, 3.24138185e-09], [ 3.35735195e-07, 3.26929595e-07, -8.80560022e-09], [ -6.22339483e-06, -6.23677554e-06, -1.33807134e-08], [ -3.35748517e-07, -3.26942917e-07, 8.80560022e-09]])
The behavior of my steepest descent algorithm may be related to a sloppy implementation, which doesn’t have stopping criteria and all the nice checks that a good implementation would have. Also, the really slow convergence I reported on 2019-05-14 was actually related to me manually reducing the step size to attempt to get the error to decrease.
In any case, I do believe the steepest descent direction is correct in GTSAM. Going back to my original post, I think I was confused about notation. Certainly, when x is zero, then R'*(R*x-d) is just -R'*d, which is just -A'*b, which is the correct steepest descent direction. Here though, I think x really means a change in x rather than the values themselves.
I will try adjusting the deltaInitial for the dogleg for the problem that led me to report this issue in the first place, and hopefully I can have some success. I certainly understand the motivation for using the dogleg versus LM (or, in iSAM2, since there is no LM, I want a more robust alternative to GN).
At this point, I think the issue can be closed. I still want to investigate the various criteria of the dogleg a little more, but I don’t think the title of this issue makes sense for that. If I notice a problem, I’ll post a different issue with a more appropriate title, and reference this one if necessary.
Thanks for all your help!
-
Great. Thank you for digging! It led me to change the dogleg default. I just pushed another change where you can play with the script using CL arguments. I’ll close the issue, but if you see any issues with gradients, feel free to re-open a new issue.
-
- changed status to closed
Merged in feature/dogleg_test (pull request #424) Close issue
#452Experiments and new default for Dogleg→ <<cset dc774ed8e93e>>
- Log in to comment
I suspect that the computation of the steepest descent direction used for Powell’s dog leg method in iSAM2 (in GTSAM 4.0.0) is incorrect. For my particular problem, which is a planar problem with 2 vehicles using odometry measurements and a range measurement between them, I often get the error “Warning: Dog leg stopping because cannot decrease error with minimum delta” when I take a range measurement. I get much more reasonable results when I just use the GaussNewtonOptimizer instead, but I would like to use the dog leg optimizer to potentially improve convergence speed, and since the book “Factor Graphs for Robot Perception” (Dellaert and Kaess, 2017) states that “Powell’s dog leg can provide improved efficiency, and, as we will later see, is essential when incrementally updating matrix factorizations”.
When I enable verbose output of the dog leg optimizer, I can see that even very close to the linearization point (with delta approaching 1e-5 or less), the model function increases in the steepest descent direction (even though the nonlinear error function actually decreases). Normally, the model function should decrease in the steepest descent direction, but the nonlinear function may or may not decrease depending on the quality of the model. This implies that the model is a poor approximation and perhaps incorrect.
In reading the document trustregion.pdf included in the GTSAM doc directory, it states that the gradient, g, is computed as R'*(R*x-d), where R is an upper triangular factor of A that can be found using QR factorization of the measurement Jacobian A, x is a column vector of the states, and d=Q*b, where b is the column vector of measurement errors (normalized by standard deviations). I can see how this is derived by essentially taking the derivative of the model function with respect to x, but it ignores the fact that R is also a function of x (this may not be the best explanation, but something doesn’t seem right here). The correct gradient, as given in the 2017 book in section 2.5.1, or in Dennis and Schnabel (1996), is g=A'*b. However, if GTSAM actually implements what is stated in trustregion.pdf, it may be incorrect.
I have constructed a small Matlab example that provides further evidence. I did some of the math by hand, so it is possible there are errors though.
The two gradients do not agree. The first one gives a gradient of [0;-100;0;100], and the second gives [10000;-9900;10000;19900]. The first one makes sense to me. Since the range measurement should be 1 to agree with the odometry, it would make sense to have to move the vehicles closer to each other (in the negative of the gradient) in the y direction to reduce error in the range estimate. If I set rho=1, the gradient given by the first method is indeed zero (as it should be), but the second one is not.