Possibly incorrect computation of steepest descent direction

Issue #452 closed
Adam Rutkowski created an issue

I suspect that the computation of the steepest descent direction used for Powell’s dog leg method in iSAM2 (in GTSAM 4.0.0) is incorrect. For my particular problem, which is a planar problem with 2 vehicles using odometry measurements and a range measurement between them, I often get the error “Warning: Dog leg stopping because cannot decrease error with minimum delta” when I take a range measurement. I get much more reasonable results when I just use the GaussNewtonOptimizer instead, but I would like to use the dog leg optimizer to potentially improve convergence speed, and since the book “Factor Graphs for Robot Perception” (Dellaert and Kaess, 2017) states that “Powell’s dog leg can provide improved efficiency, and, as we will later see, is essential when incrementally updating matrix factorizations”.

When I enable verbose output of the dog leg optimizer, I can see that even very close to the linearization point (with delta approaching 1e-5 or less), the model function increases in the steepest descent direction (even though the nonlinear error function actually decreases). Normally, the model function should decrease in the steepest descent direction, but the nonlinear function may or may not decrease depending on the quality of the model. This implies that the model is a poor approximation and perhaps incorrect.

In reading the document trustregion.pdf included in the GTSAM doc directory, it states that the gradient, g, is computed as R'*(R*x-d), where R is an upper triangular factor of A that can be found using QR factorization of the measurement Jacobian A, x is a column vector of the states, and d=Q*b, where b is the column vector of measurement errors (normalized by standard deviations). I can see how this is derived by essentially taking the derivative of the model function with respect to x, but it ignores the fact that R is also a function of x (this may not be the best explanation, but something doesn’t seem right here). The correct gradient, as given in the 2017 book in section 2.5.1, or in Dennis and Schnabel (1996), is g=A'*b. However, if GTSAM actually implements what is stated in trustregion.pdf, it may be incorrect.

I have constructed a small Matlab example that provides further evidence. I did some of the math by hand, so it is possible there are errors though.

% The purpose of this script is to numerically investigate the validity of the 
% steepest descent direction used for gtsam. The scenario is a planar problem
% with 2 vehicles using odometry and a range measurement. The simulation 
% propagates one step in time, and heading is assumed to be known exactly.

% Measurements and standard deviations
sigma_dx = 0.01;
sigma_dy = 0.01;
sigma_rho = 0.01;
dx1 = 1;
dy1 = 0;
dx2 = 1;
dy2 = 0;
rho = 0.99;  % range measurement between vehicles at time 2

% Linearization point
x1 = [0 1];  % x position of vehicle 1 at times 1 and 2
y1 = [0 0];
x2 = [0 1];
y2 = [1 1];

% Range model
rho_hat = sqrt((x2(2)-x1(2))^2 + (y2(2)-y1(2))^2);

% Measurement vector
b = [(x1(2) - x1(1) - dx1)/sigma_dx;
     (y1(2) - y1(1) - dy1)/sigma_dy;
     (x2(2) - x2(1) - dx2)/sigma_dx;
     (y2(2) - y2(1) - dy2)/sigma_dy;
     (rho_hat - rho)/sigma_rho];

% Measurement Jacobian
A = [1/sigma_dx, 0, 0, 0;
     0, 1/sigma_dy, 0, 0;
     0, 0, 1/sigma_dx, 0;
     0, 0, 0, 1/sigma_dy;
     -(x2(2)-x1(2))/rho_hat/sigma_rho, -(y2(2)-y1(2))/rho_hat/sigma_rho, (x2(2)-x1(2))/rho_hat/sigma_rho, (y2(2)-y1(2))/rho_hat/sigma_rho];

% Gradient according to Dennis and Schnabel (1996), or Dellaert and Kaess (2017)
A'*b

% QR factorization of the measurement Jacobian
[Q,R]=qr(A,0);
d=Q'*b;

% Gradient according to trustregion.pdf
R'*(R*[x1(2);y1(2);x2(2);y2(2)]-d) 

The two gradients do not agree. The first one gives a gradient of [0;-100;0;100], and the second gives [10000;-9900;10000;19900]. The first one makes sense to me. Since the range measurement should be 1 to agree with the odometry, it would make sense to have to move the vehicles closer to each other (in the negative of the gradient) in the y direction to reduce error in the range estimate. If I set rho=1, the gradient given by the first method is indeed zero (as it should be), but the second one is not.

Comments (21)

  1. Adam Rutkowski reporter

    I suspect that the computation of the steepest descent direction used for Powell’s dog leg method in iSAM2 (in GTSAM 4.0.0) is incorrect. For my particular problem, which is a planar problem with 2 vehicles using odometry measurements and a range measurement between them, I often get the error “Warning: Dog leg stopping because cannot decrease error with minimum delta” when I take a range measurement. I get much more reasonable results when I just use the GaussNewtonOptimizer instead, but I would like to use the dog leg optimizer to potentially improve convergence speed, and since the book “Factor Graphs for Robot Perception” (Dellaert and Kaess, 2017) states that “Powell’s dog leg can provide improved efficiency, and, as we will later see, is essential when incrementally updating matrix factorizations”.

    When I enable verbose output of the dog leg optimizer, I can see that even very close to the linearization point (with delta approaching 1e-5 or less), the model function increases in the steepest descent direction (even though the nonlinear error function actually decreases). Normally, the model function should decrease in the steepest descent direction, but the nonlinear function may or may not decrease depending on the quality of the model. This implies that the model is a poor approximation and perhaps incorrect.

    In reading the document trustregion.pdf included in the GTSAM doc directory, it states that the gradient, g, is computed as R'*(R*x-d), where R is an upper triangular factor of A that can be found using QR factorization of the measurement Jacobian A, x is a column vector of the states, and d=Q*b, where b is the column vector of measurement errors (normalized by standard deviations). I can see how this is derived by essentially taking the derivative of the model function with respect to x, but it ignores the fact that R is also a function of x (this may not be the best explanation, but something doesn’t seem right here). The correct gradient, as given in the 2017 book in section 2.5.1, or in Dennis and Schnabel (1996), is g=A'*b. However, if GTSAM actually implements what is stated in trustregion.pdf, it may be incorrect.

    I have constructed a small Matlab example that provides further evidence. I did some of the math by hand, so it is possible there are errors though.

    % The purpose of this script is to numerically investigate the validity of the 
    % steepest descent direction used for gtsam. The scenario is a planar problem
    % with 2 vehicles using odometry and a range measurement. The simulation 
    % propagates one step in time, and heading is assumed to be known exactly.
    
    % Measurements and standard deviations
    sigma_dx = 0.01;
    sigma_dy = 0.01;
    sigma_rho = 0.01;
    dx1 = 1;
    dy1 = 0;
    dx2 = 1;
    dy2 = 0;
    rho = 0.99;  % range measurement between vehicles at time 2
    
    % Linearization point
    x1 = [0 1];  % x position of vehicle 1 at times 1 and 2
    y1 = [0 0];
    x2 = [0 1];
    y2 = [1 1];
    
    % Range model
    rho_hat = sqrt((x2(2)-x1(2))^2 + (y2(2)-y1(2))^2);
    
    % Measurement vector
    b = [(x1(2) - x1(1) - dx1)/sigma_dx;
         (y1(2) - y1(1) - dy1)/sigma_dy;
         (x2(2) - x2(1) - dx2)/sigma_dx;
         (y2(2) - y2(1) - dy2)/sigma_dy;
         (rho_hat - rho)/sigma_rho];
    
    % Measurement Jacobian
    A = [1/sigma_dx, 0, 0, 0;
         0, 1/sigma_dy, 0, 0;
         0, 0, 1/sigma_dx, 0;
         0, 0, 0, 1/sigma_dy;
         -(x2(2)-x1(2))/rho_hat/sigma_rho, -(y2(2)-y1(2))/rho_hat/sigma_rho, (x2(2)-x1(2))/rho_hat/sigma_rho, (y2(2)-y1(2))/rho_hat/sigma_rho];
    
    % Gradient according to Dennis and Schnabel (1996), or Dellaert and Kaess (2017)
    A'*b
    
    % QR factorization of the measurement Jacobian
    [Q,R]=qr(A,0);
    d=Q'*b;
    
    % Gradient according to trustregion.pdf
    R'*(R*[x1(2);y1(2);x2(2);y2(2)]-d) 
    

    The two gradients do not agree. The first one gives a gradient of [0;-100;0;100], and the second gives [10000;-9900;10000;19900]. The first one makes sense to me. Since the range measurement should be 1 to agree with the odometry, it would make sense to have to move the vehicles closer to each other (in the negative of the gradient) in the y direction to reduce error in the range estimate. If I set rho=1, the gradient given by the first method is indeed zero (as it should be), but the second one is not.

  2. Frank Dellaert

    Thanks ! Did you try this with a factor graph and GaussNewton? One issue I see immediately that example above is singular, with 8 unknowns and only 5 measurements.

  3. Adam Rutkowski reporter

    Hi @dellaert , thanks for the quick reply, I appreciate any help you can provide.

    The example I posted above has 8 variables, but I am treating the initial position of each vehicle as being known exactly, so there are really only 4 unknowns. This results in a 5x4 measurement Jacobian that is full rank (since the upper square is diagonal). I didn't compute derivatives with respect to the known variables.

    I haven’t implemented this particular example with a factor graph in GTSAM, but I have run several cases that were much longer (1500 time steps) for a paper I presented at the IEEE PLANS conference in 2016. For those runs, I computed a full batch solution with the LevenbergMarquardtOptimizer initialized with a slight perturbation of the true values (but with precisely known position and heading at the initial time), but I think I also tried the GaussNewtonOptimizer and got similar results.

    Tomorrow, I hope to write up a similar example using GTSAM.

  4. Frank Dellaert

    Sure. Note that GaussianFactorGraph has a method gradientAtZero, and it is wrapped, so you can check straight in MATLAB. In fact, I just did it quickly:

    % Test gradient
    
    % Linearization point
    T11 = gtsam.Pose2(0,0,0);
    T12 = gtsam.Pose2(1,0,0);
    T21 = gtsam.Pose2(0,1,0);
    T22 = gtsam.Pose2(1,1,0);
    
    % Factor graph
    graph = gtsam.NonlinearFactorGraph();
    
    % Priors
    prior = gtsam.noiseModel.Isotropic.Sigma(3, 1);
    graph.add(gtsam.PriorFactorPose2(11, T11, prior));
    graph.add(gtsam.PriorFactorPose2(21, T21, prior));
    
    % Odometry
    model = gtsam.noiseModel.Diagonal.Sigmas([0.01; 0.01; 1e6]);
    graph.add(gtsam.BetweenFactorPose2(11, 12, T11.between(T12), model));
    graph.add(gtsam.BetweenFactorPose2(21, 22, T21.between(T22), model));
    
    % Range
    model_rho = gtsam.noiseModel.Isotropic.Sigma(1, 0.01);
    graph.add(gtsam.RangeFactorPose2(12, 22, 1.0, model_rho));
    
    % Print graph
    graph
    
    values = gtsam.Values();
    values.insert(11, T11);
    values.insert(12, T12);
    values.insert(21, T21);
    values.insert(22, T22);
    linearized = graph.linearize(values);
    
    % Get Jacobian
    ordering = gtsam.Ordering();
    ordering.push_back(11);
    ordering.push_back(21);
    ordering.push_back(12);
    ordering.push_back(22);
    A = linearized.jacobian(ordering);
    
    % Print Jacobian on real unknowns
    A(7:13,7:12)
    linearized.gradientAtZero()
    

    The output is:

    size: 5
    
    Factor 0: PriorFactor on 11
      prior mean: (0, 0, 0)
      noise model: unit (3) 
    
    Factor 1: PriorFactor on 21
      prior mean: (0, 1, 0)
      noise model: unit (3) 
    
    Factor 2: BetweenFactor(11,12)
      measured: (1, 0, 0)
      noise model: diagonal sigmas[0.01; 0.01; 1000000];
    
    Factor 3: BetweenFactor(21,22)
      measured: (1, 0, 0)
      noise model: diagonal sigmas[0.01; 0.01; 1000000];
    
    Factor 4: RangeFactor
    Factor 4:   keys = { 12 22 }
    isotropic dim=1 sigma=0.01
    ExpressionFactor with measurement: 1.000000
    
    
    ans =
    
      100.0000         0         0         0         0         0
             0  100.0000         0         0         0         0
             0         0    0.0000         0         0         0
             0         0         0  100.0000         0         0
             0         0         0         0  100.0000         0
             0         0         0         0         0    0.0000
             0 -100.0000         0         0  100.0000         0
    
    : 4 elements
      11: 0 0 0
      12: 0 0 0
      21: 0 0 0
      22: 0 0 0
    

    It makes sense the gradient is zero, as we are at the global minimum. Note that b should be the prediction error, which I think here would be zero, unless I’m misunderstanding the example. Digest, play, let me know.

  5. Frank Dellaert

    Any more thoughts? Note, a great way to diagnose crisply would be to add an example, e.g. the one above or whatever you think does not work, to testDoglegOptimizer.cpp that fails. If it does not fail, also good PR 🙂 One can never have too much butter, uhm, tests.

  6. Adam Rutkowski reporter

    I am actively working on this, but I don’t have a clean example yet.

    In response to your post yesterday, it appears you understand my (previously posted) example perfectly. The prediction error should indeed be zero when the range measurement is 1.

    So I took your code, changed the range measurement from 1.0 to 0.99 (to be away from the global minimum) and attempted to optimize the factor graph with three different optimizers: Levenberg-Marquardt, Gauss-Newton, and dog leg. With the uncertainty parameters you provided, only the Levenberg-Marquardt optimizer produced a result. If I changed standard deviation in the heading odometry from 1e6 to 0, then I got equivalent results with all three optimizers. This makes sense to me since there are now two more unknowns with the effectively unknown heading changes (with a standard deviation of 1e6) , but there are no additional measurements to help observe those unknowns. The Levenberg-Marquardt algorithm can still handle singular systems though. When the relative changes in heading are known (with a standard deviation of 0), I am effectively back down to 4 unknowns with 5 constraints.

    I am currently adapting your code to be closer to the example that led me down this path in the first place. I actually have an example (that I almost submitted yesterday) whereby the Levenberg-Marquardt and Gauss -Newton optimizers produce good results, but the dog leg optimizer does not. Unfortunately, I had an error in how I was creating the initial guess. Interestingly, the LM and GN optimizers were able to recover from the poor initial guess, but the DL optimizer was not. Once I fixed the initial guess to be more reasonable, all three optimizers worked well (but the DL optimizer was always able to take the GN step).

    My current thought is that in my original code (which is actually in C++), I either have a bad initial guess, or there actually is a problem with the DL optimizer. I think I am fairly close to having a working example using the dog leg optimizer with ISAM2, but I will need to return to working on it next week, or perhaps over the weekend if I can find some time.

  7. Adam Rutkowski reporter

    I have isolated a case that fails to decrease in the steepest descent direction as described in my original post (using GTSAM 3.2.1, as this is the version I have set up for Matlab). The scenario here is still a planar cooperative problem with odometry and inter-vehicle range, but it propagates for 5 timesteps. The initial guess has a fair bit of error, but the Levenberg-Marquardt and Gauss-Newton optimizers handle this case easily. The dog leg optimizer fails, as shown in the verbose output below (a summary of the results is at the bottom).

    size: 14
    factor 0: PriorFactor on 100001
      prior mean: (0, 0, 0)
      noise model: constrained sigmas[0; 0; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 1: PriorFactor on 200001
      prior mean: (600, 0, 0)
      noise model: constrained sigmas[0; 0; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 2: BetweenFactor(100001,100002)
      measured: (-0.00611769713, 3.00272473, 0)
      noise model: constrained sigmas[0.01; 0.01; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 3: BetweenFactor(200001,200002)
      measured: (-0.00726979015, 3.00305988, -0)
      noise model: constrained sigmas[0.01; 0.01; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 4: RangeFactor, range = 599.993166
      keys = { 100002 200002 }
      noise model: isotropic sigma  0.01
    factor 5: BetweenFactor(100002,100003)
      measured: (-0.00911369363, 2.98321507, -0)
      noise model: constrained sigmas[0.01; 0.01; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 6: BetweenFactor(200002,200003)
      measured: (0.0149394677, 3.00521545, -0)
      noise model: constrained sigmas[0.01; 0.01; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 7: RangeFactor, range = 600.013327
      keys = { 100003 200003 }
      noise model: isotropic sigma  0.01
    factor 8: BetweenFactor(100003,100004)
      measured: (-0.0068554029, 2.99257656, -0)
      noise model: constrained sigmas[0.01; 0.01; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 9: BetweenFactor(200003,200004)
      measured: (0.0203433896, 2.97994098, 0)
      noise model: constrained sigmas[0.01; 0.01; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 10: RangeFactor, range = 599.995367
      keys = { 100004 200004 }
      noise model: isotropic sigma  0.01
    factor 11: BetweenFactor(100004,100005)
      measured: (-0.00254725018, 2.98969401, -0)
      noise model: constrained sigmas[0.01; 0.01; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 12: BetweenFactor(200004,200005)
      measured: (-0.000920557164, 3.00307973, 0)
      noise model: constrained sigmas[0.01; 0.01; 0];
      noise model: constrained mu[1000; 1000; 1000];
    factor 13: RangeFactor, range = 599.996946
      keys = { 100005 200005 }
      noise model: isotropic sigma  0.01
    

    I generated this case with the following script:

    % Test optimizers
    
    % Clear screen
    clc
    
    % Factor graph
    graph = gtsam.NonlinearFactorGraph();
    values = gtsam.Values();
    
    % Noise models
    sigma_dx = 0.01;
    sigma_dy = 0.01;
    sigma_dtheta = 0;
    sigma_rho = 0.01;
    model_odom = gtsam.noiseModel.Diagonal.Sigmas([sigma_dx; sigma_dy; sigma_dtheta]);
    model_rho = gtsam.noiseModel.Isotropic.Sigma(1, sigma_rho);
    
    % Priors
    prior = gtsam.noiseModel.Diagonal.Sigmas([0; 0; 0]);
    key1 = 1e5+1;
    key2 = 2e5+1;
    T1 = gtsam.Pose2(0,0,0); % true pose of vehicle 1
    T2 = gtsam.Pose2(600,0,0);
    graph.add(gtsam.PriorFactorPose2(key1, T1, prior));
    graph.add(gtsam.PriorFactorPose2(key2, T2, prior));
    values.insert(key1, T1);
    values.insert(key2, T2);
    
    for i=2:5
        % Pose guess (perturbed truth)
        key1 = 1e5+i;
        key2 = 2e5+i;
        T1 = gtsam.Pose2(0+randn,3*(i-1)+randn,0);
        T2 = gtsam.Pose2(600+randn,3*(i-1)+randn,0);
    
        % Odometry (corrupted by noise)
        odom1 = gtsam.Pose2(sigma_dx*randn,3+sigma_dy*randn,sigma_dtheta*randn);
        odom2 = gtsam.Pose2(sigma_dx*randn,3+sigma_dy*randn,sigma_dtheta*randn);
        graph.add(gtsam.BetweenFactorPose2(key1-1, key1, odom1, model_odom));
        graph.add(gtsam.BetweenFactorPose2(key2-1, key2, odom2, model_odom));
    
        % Range (corrupted by noise)
        graph.add(gtsam.RangeFactorPose2(key1, key2, 600+sigma_rho*randn, model_rho));
    
        % Values (use perturbed truth as initial guess for optimizers)
        values.insert(key1, T1);
        values.insert(key2, T2);
    
    end
    
    % Print graph
    graph
    
    % Print initial guess values
    %disp(sprintf('\nInitial guess:'))
    %values
    
    % Optimize
    disp(sprintf('\nLevenberg-Marquardt optimizer result:'))
    lm_params = gtsam.LevenbergMarquardtParams();
    lm_params.setVerbosity('VALUES')
    lm_optimizer = gtsam.LevenbergMarquardtOptimizer(graph, values, lm_params);
    result = lm_optimizer.optimizeSafely()
    disp(sprintf('Optimizer error: %f', lm_optimizer.error()))
    disp(sprintf('Levenberg-Marquardt optimizer converged in %d iterations', lm_optimizer.iterations())) 
    
    disp(sprintf('\nGauss-Newton optimizer result:'))
    gn_params = gtsam.GaussNewtonParams();
    gn_params.setVerbosity('VALUES')
    gn_optimizer = gtsam.GaussNewtonOptimizer(graph, values, gn_params);
    result = gn_optimizer.optimizeSafely()
    disp(sprintf('Optimizer error: %f', gn_optimizer.error()))
    disp(sprintf('Gauss-Newton optimizer converged in %d iterations', gn_optimizer.iterations())) 
    
    disp(sprintf('\nPowell dog leg optimizer result:'))
    dogleg_params = gtsam.DoglegParams();
    dogleg_params.setVerbosity('VALUES');
    dogleg_params.setVerbosityDL('VERBOSE');
    dl_optimizer = gtsam.DoglegOptimizer(graph, values, dogleg_params);
    result = dl_optimizer.optimizeSafely()
    disp(sprintf('Optimizer error: %f', dl_optimizer.error()))
    disp(sprintf('Dog leg optimizer converged in %d iterations', dl_optimizer.iterations())) 
    
    disp(sprintf('\nLevenberg-Marguardt optimizer error: %f (%d iterations)', lm_optimizer.error(), lm_optimizer.iterations()))
    disp(sprintf('Gauss-Newton optimizer error: %f (%d iterations)', gn_optimizer.error(), gn_optimizer.iterations()))
    disp(sprintf('Dog leg optimizer error: %f (%d iterations)', dl_optimizer.error(), dl_optimizer.iterations()))
    
    if(dl_optimizer.error() > gn_optimizer.error()+1e-3)
        linearized = graph.linearize(result);
        disp('Gradient at final dog leg result')
        linearized.gradientAtZero()
    end
    

    Note that this script will generate a different initial guess each time it runs. Most of the time, all three optimizers succeed, but the dog leg optimizer almost always takes more iterations. However, the dog leg optimizer fails on approximately 1% of the cases (anecdotally speaking, I haven't done a Monte Carlo analysis), usually accompanied by a warning about a failure to decrease the objective function, but sometimes terminating at a poor solution due to relative function decrease being below the threshold. I have never seen the others optimizers fail over multiple runs of this example.

    I could understand getting caught in a local minimum, but I don't think that is what's happening here, since the gradient at the final result for the dog leg optimizer is not zero (although, I don’t understand why the gradient output has three columns).

  8. Frank Dellaert

    That’s great work. Indeed, because the dogleg optimizer is really switching between GN and gradient descent (just like LM, but in a different way) it should succeed whenever GN succeeds. @kaess might weigh in on this, he did the original code and can confirm my understanding. Not sure whether he monitors Bitbucket mail but let's give hime a chance to weigh in.

  9. Adam Rutkowski reporter

    Looking back at my previous post, I realize that the optimizer results got cut off. I have included them here for completeness (again, the summary of the results is at the bottom):

    Levenberg-Marquardt optimizer result:
    Initial valuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-2.31996895, 1.93364125, 0)
    Value 100003: (-0.625587382, 6.18857792, 0)
    Value 100004: (-0.84568354, 8.96342818, 0)
    Value 100005: (0.551152193, 13.0344137, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (601.596888, 2.80988045, 0)
    Value 200003: (601.916594, 6.59647362, 0)
    Value 200004: (599.298225, 8.9870857, 0)
    Value 200005: (598.533201, 12.7405703, 0)
    Initial error: 259426.136
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-0.00265411902, 3.00271523, 0)
    Value 100003: (-0.00642747226, 5.98592353, 0)
    Value 100004: (0.000213748629, 8.97849884, 0)
    Value 100005: (0.00103177622, 11.9681912, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (599.989267, 3.00306938, 0)
    Value 200003: (599.998866, 6.00829161, 0)
    Value 200004: (600.005713, 8.98823384, 0)
    Value 200005: (600.001427, 11.9913152, 0)
    newError: 3.24317791
    errorThreshold: 3.24317791 > 0
    absoluteDecrease: 259422.892406 >= 1e-05
    relativeDecrease: 0.999987498646 >= 1e-05
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-0.00247563827474, 3.00272472013, 0)
    Value 100003: (-0.00634458490105, 5.98593978019, 0)
    Value 100004: (0.000244059247352, 8.97851663487, 0)
    Value 100005: (0.00106996413503, 11.968210776, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (599.989088151, 3.00305989179, 0)
    Value 200003: (599.998782872, 6.00827535653, 0)
    Value 200004: (600.005682214, 8.98821604247, 0)
    Value 200005: (600.001388502, 11.9912956414, 0)
    newError: 3.24191625764
    errorThreshold: 3.24191625764 > 0
    absoluteDecrease: 0.00126164867332 >= 1e-05
    relativeDecrease: 0.000389016177888 >= 1e-05
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-0.00247563827474, 3.00272472013, 0)
    Value 100003: (-0.00634458490105, 5.98593978019, 0)
    Value 100004: (0.000244059247352, 8.97851663487, 0)
    Value 100005: (0.00106996413503, 11.968210776, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (599.989088151, 3.00305989179, 0)
    Value 200003: (599.998782872, 6.00827535653, 0)
    Value 200004: (600.005682214, 8.98821604247, 0)
    Value 200005: (600.001388502, 11.9912956414, 0)
    newError: 3.24191625764
    errorThreshold: 3.24191625764 > 0
    absoluteDecrease: 0 < 1e-05
    relativeDecrease: 0 < 1e-05
    converged
    errorThreshold: 3.24191625764 <? 0
    absoluteDecrease: 0 <? 1e-05
    relativeDecrease: 0 <? 1e-05
    iterations: 3 >? 100
    Values with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-0.00247563827474, 3.00272472013, 0)
    Value 100003: (-0.00634458490105, 5.98593978019, 0)
    Value 100004: (0.000244059247352, 8.97851663487, 0)
    Value 100005: (0.00106996413503, 11.968210776, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (599.989088151, 3.00305989179, 0)
    Value 200003: (599.998782872, 6.00827535653, 0)
    Value 200004: (600.005682214, 8.98821604247, 0)
    Value 200005: (600.001388502, 11.9912956414, 0)
    Optimizer error: 3.241916
    Levenberg-Marquardt optimizer converged in 3 iterations
    
    Gauss-Newton optimizer result:
    Initial valuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-2.31996894643, 1.93364124892, 0)
    Value 100003: (-0.625587382398, 6.18857792394, 0)
    Value 100004: (-0.845683539749, 8.9634281823, 0)
    Value 100005: (0.551152193177, 13.0344137125, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (601.596888244, 2.80988045076, 0)
    Value 200003: (601.916593571, 6.59647362149, 0)
    Value 200004: (599.298224949, 8.98708569826, 0)
    Value 200005: (598.533200855, 12.7405703085, 0)
    Initial error: 259426.135584
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-0.00265411746744, 3.00271523455, 0)
    Value 100003: (-0.00642747024626, 5.98592353112, 0)
    Value 100004: (0.000213751452157, 8.9784988381, 0)
    Value 100005: (0.00103177913082, 11.9681911955, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (599.98926663, 3.00306937738, 0)
    Value 200003: (599.998865757, 6.0082916056, 0)
    Value 200004: (600.005712522, 8.98823383924, 0)
    Value 200005: (600.001426687, 11.9913152219, 0)
    newError: 3.24317789784
    errorThreshold: 3.24317789784 > 0
    absoluteDecrease: 259422.892406 >= 1e-05
    relativeDecrease: 0.999987498646 >= 1e-05
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-0.00247563827473, 3.00272472013, 0)
    Value 100003: (-0.00634458490103, 5.98593978019, 0)
    Value 100004: (0.000244059247353, 8.97851663487, 0)
    Value 100005: (0.00106996413502, 11.968210776, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (599.989088151, 3.00305989179, 0)
    Value 200003: (599.998782872, 6.00827535653, 0)
    Value 200004: (600.005682214, 8.98821604247, 0)
    Value 200005: (600.001388502, 11.9912956414, 0)
    newError: 3.24191625763
    errorThreshold: 3.24191625763 > 0
    absoluteDecrease: 0.00126164021014 >= 1e-05
    relativeDecrease: 0.000389013569369 >= 1e-05
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-0.00247563827454, 3.00272471981, 0)
    Value 100003: (-0.00634458490063, 5.9859397795, 0)
    Value 100004: (0.00024405924787, 8.97851663335, 0)
    Value 100005: (0.00106996413566, 11.9682107742, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (599.989088151, 3.00305989211, 0)
    Value 200003: (599.998782872, 6.00827535722, 0)
    Value 200004: (600.005682214, 8.98821604398, 0)
    Value 200005: (600.001388502, 11.9912956432, 0)
    newError: 3.24191625765
    errorThreshold: 3.24191625765 > 0
    absoluteDecrease: -1.83888460015e-11 < 1e-05
    relativeDecrease: -5.67221499265e-12 < 1e-05
    Warning:  stopping nonlinear iterations because error increased
    errorThreshold: 3.24191625765 <? 0
    absoluteDecrease: -1.83888460015e-11 <? 1e-05
    relativeDecrease: -5.67221499265e-12 <? 1e-05
    iterations: 3 >? 100
    Values with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-0.00247563827454, 3.00272471981, 0)
    Value 100003: (-0.00634458490063, 5.9859397795, 0)
    Value 100004: (0.00024405924787, 8.97851663335, 0)
    Value 100005: (0.00106996413566, 11.9682107742, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (599.989088151, 3.00305989211, 0)
    Value 200003: (599.998782872, 6.00827535722, 0)
    Value 200004: (600.005682214, 8.98821604398, 0)
    Value 200005: (600.001388502, 11.9912956432, 0)
    Optimizer error: 3.241916
    Gauss-Newton optimizer converged in 3 iterations
    
    Powell dog leg optimizer result:
    Initial valuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-2.31996894643, 1.93364124892, 0)
    Value 100003: (-0.625587382398, 6.18857792394, 0)
    Value 100004: (-0.845683539749, 8.9634281823, 0)
    Value 100005: (0.551152193177, 13.0344137125, 0)
    Value 200001: (600, 0, 0)
    Value 200002: (601.596888244, 2.80988045076, 0)
    Value 200003: (601.916593571, 6.59647362149, 0)
    Value 200004: (599.298224949, 8.98708569826, 0)
    Value 200005: (598.533200855, 12.7405703085, 0)
    Initial error: 259426.135584
    Steepest descent magnitude 1.95480862697, Newton's method magnitude 4.35379659726
    In steepest descent region with fraction 0.511559027417 of steepest descent magnitude
    Delta = 1, dx_d_norm = 1
    f error: 259426.135583571 -> 185971.053351668
    M error: 259422.875503615 -> 129765.355265682
    rho = 0.566531598761918
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-1.86412553822045, 2.06832781531011, -0.453938439195871)
    Value 100003: (-0.590444576823901, 6.10318345382433, 0.0862414921921389)
    Value 100004: (-0.744594506742141, 9.03799077081437, -0.0274186355258822)
    Value 100005: (0.355180272916884, 12.972405548984, -0.181737521283955)
    Value 200001: (600, 0, 0)
    Value 200002: (601.297052546576, 2.86549217858808, 0.387416112407595)
    Value 200003: (601.602479616415, 6.51769823947892, -0.0712516588538513)
    Value 200004: (599.397299365984, 9.06398466783818, 0.0207463642783543)
    Value 200005: (598.692708404883, 12.6974410312305, 0.252457540272867)
    newError: 185971.053351668
    errorThreshold: 185971.053351668 > 0
    absoluteDecrease: 73455.0822319 >= 1e-05
    relativeDecrease: 0.283144495317 >= 1e-05
    Steepest descent magnitude 1.69218450879, Newton's method magnitude 3.94978241016
    In steepest descent region with fraction 0.590952106468 of steepest descent magnitude
    Delta = 1, dx_d_norm = 1
    f error: 185971.053351668 -> 147461.672510673
    M error: 116438.808120394 -> 47434.8028409954
    rho = 0.558074573860903
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-1.45847824652871, 2.18804108058623, -0.849078234211621)
    Value 100003: (-0.417761833172634, 6.01567138137945, 0.190674226241442)
    Value 100004: (-0.570994832658775, 9.1190302810607, -0.109147169130491)
    Value 100005: (0.166626390248096, 12.8843152592579, -0.148312531162434)
    Value 200001: (600, 0, 0)
    Value 200002: (601.014634033161, 2.84235501567348, 0.727695629583124)
    Value 200003: (601.108216937119, 6.42983435830157, -0.00251625576581953)
    Value 200004: (599.519101941525, 9.1717313246385, 0.0435747838360083)
    Value 200005: (598.841464230881, 12.6342797874151, 0.403706578505043)
    newError: 147461.672510673
    errorThreshold: 147461.672510673 > 0
    absoluteDecrease: 38509.380841 >= 1e-05
    relativeDecrease: 0.20707190795 >= 1e-05
    Steepest descent magnitude 1.39713448715, Newton's method magnitude 3.2439018278
    In steepest descent region with fraction 0.715750709182 of steepest descent magnitude
    Delta = 1, dx_d_norm = 1
    f error: 147461.672510673 -> 141060.170419056
    M error: 42110.9601290933 -> 9251.50939491742
    rho = 0.194814640798574
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-1.26714361378349, 2.06261316294581, -1.10388919339331)
    Value 100003: (-0.084334535948957, 6.14617486173199, 0.248927195164887)
    Value 100004: (-0.340276429581535, 9.24667383940122, -0.159575195667903)
    Value 100005: (-0.000681850951073648, 12.7295430081767, -0.0307387462930205)
    Value 200001: (600, 0, 0)
    Value 200002: (600.836788514223, 2.65530056990016, 0.931482861696576)
    Value 200003: (600.460705644938, 6.43928728088233, 0.161151832307827)
    Value 200004: (599.624409546913, 9.27518406487952, 0.0679752639479036)
    Value 200005: (598.991044473568, 12.5489552000355, 0.487120130218038)
    newError: 141060.170419056
    errorThreshold: 141060.170419056 > 0
    absoluteDecrease: 6401.50209162 >= 1e-05
    relativeDecrease: 0.0434112944918 >= 1e-05
    Steepest descent magnitude 0.987694213444, Newton's method magnitude 3.65449294076
    In steepest descent region with fraction 0.506229552825 of steepest descent magnitude
    Delta = 0.5, dx_d_norm = 0.5
    f error: 141060.170419056 -> 160690.359410671
    M error: 19224.4729042793 -> 8292.52982292943
    rho = -1.79567244775589
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.25311477641263 of steepest descent magnitude
    Delta = 0.25, dx_d_norm = 0.25
    f error: 141060.170419056 -> 149646.228609192
    M error: 19224.4729042793 -> 12832.309412885
    rho = -1.34321629941029
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.126557388206315 of steepest descent magnitude
    Delta = 0.125, dx_d_norm = 0.125
    f error: 141060.170419056 -> 145046.056040865
    M error: 19224.4729042793 -> 15796.8431709023
    rho = -1.16286936800573
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.0632786941031576 of steepest descent magnitude
    Delta = 0.0625, dx_d_norm = 0.0625
    f error: 141060.170419056 -> 142976.34644219
    M error: 19224.4729042793 -> 17452.7710406708
    rho = -1.08154541263031
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.0316393470515788 of steepest descent magnitude
    Delta = 0.03125, dx_d_norm = 0.03125
    f error: 141060.170419056 -> 141999.069408146
    M error: 19224.4729042793 -> 18324.1502232451
    rho = -1.04284720230691
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.0158196735257894 of steepest descent magnitude
    Delta = 0.015625, dx_d_norm = 0.015625
    f error: 141060.170419056 -> 141524.823010372
    M error: 19224.4729042793 -> 18770.6936264547
    rho = -1.02396167921865
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.0079098367628947 of steepest descent magnitude
    Delta = 0.0078125, dx_d_norm = 0.0078125
    f error: 141060.170419056 -> 141291.297534087
    M error: 19224.4729042793 -> 18996.6787810401
    rho = -1.01463159692137
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.00395491838144735 of steepest descent magnitude
    Delta = 0.00390625, dx_d_norm = 0.00390625
    f error: 141060.170419056 -> 141175.434187135
    M error: 19224.4729042793 -> 19110.349721578
    rho = -1.00999433551364
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.00197745919072367 of steepest descent magnitude
    Delta = 0.001953125, dx_d_norm = 0.001953125
    f error: 141060.170419056 -> 141117.727356451
    M error: 19224.4729042793 -> 19167.3547826582
    rho = -1.00768260163478
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.000988729595361837 of steepest descent magnitude
    Delta = 0.0009765625, dx_d_norm = 0.0009765625
    f error: 141060.170419056 -> 141088.930151186
    M error: 19224.4729042793 -> 19195.8997109011
    rho = -1.00652845305704
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.000494364797680919 of steepest descent magnitude
    Delta = 0.00048828125, dx_d_norm = 0.00048828125
    f error: 141060.170419056 -> 141074.545600989
    M error: 19224.4729042793 -> 19210.1827744483
    rho = -1.00595180755867
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.000247182398840459 of steepest descent magnitude
    Delta = 0.000244140625, dx_d_norm = 0.000244140625
    f error: 141060.170419056 -> 141067.356838993
    M error: 19224.4729042793 -> 19217.3269560783
    rho = -1.00566359222216
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 0.00012359119942023 of steepest descent magnitude
    Delta = 0.0001220703125, dx_d_norm = 0.0001220703125
    f error: 141060.170419056 -> 141063.763336268
    M error: 19224.4729042793 -> 19220.8997093574
    rho = -1.00551951144341
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 6.17955997101148e-05 of steepest descent magnitude
    Delta = 6.103515625e-05, dx_d_norm = 6.103515625e-05
    f error: 141060.170419056 -> 141061.96680447
    M error: 19224.4729042793 -> 19222.686251613
    rho = -1.00544747624829
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 3.08977998550574e-05 of steepest descent magnitude
    Delta = 3.0517578125e-05, dx_d_norm = 3.0517578125e-05
    f error: 141060.170419056 -> 141061.068593469
    M error: 19224.4729042793 -> 19223.5795641448
    rho = -1.00541146498946
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 1.54488999275287e-05 of steepest descent magnitude
    Delta = 1.52587890625e-05, dx_d_norm = 1.52587890625e-05
    f error: 141060.170419056 -> 141060.619501689
    M error: 19224.4729042793 -> 19224.0262307617
    rho = -1.00539345964191
    Steepest descent magnitude 0.987694213444289, Newton's method magnitude 3.65449294075576
    In steepest descent region with fraction 7.72444996376435e-06 of steepest descent magnitude
    Delta = 7.62939453125e-06, dx_d_norm = 7.62939453125e-06
    f error: 141060.170419056 -> 141060.39495923
    M error: 19224.4729042793 -> 19224.2495666579
    rho = -1.00538446001464
    Warning:  Dog leg stopping because cannot decrease error with minimum Delta
    newValuesValues with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-1.26714361378349, 2.06261316294581, -1.10388919339331)
    Value 100003: (-0.084334535948957, 6.14617486173199, 0.248927195164887)
    Value 100004: (-0.340276429581535, 9.24667383940122, -0.159575195667903)
    Value 100005: (-0.000681850951073648, 12.7295430081767, -0.0307387462930205)
    Value 200001: (600, 0, 0)
    Value 200002: (600.836788514223, 2.65530056990016, 0.931482861696576)
    Value 200003: (600.460705644938, 6.43928728088233, 0.161151832307827)
    Value 200004: (599.624409546913, 9.27518406487952, 0.0679752639479036)
    Value 200005: (598.991044473568, 12.5489552000355, 0.487120130218038)
    newError: 141060.170419056
    errorThreshold: 141060.170419056 > 0
    absoluteDecrease: 0 < 1e-05
    relativeDecrease: 0 < 1e-05
    converged
    errorThreshold: 141060.170419 <? 0
    absoluteDecrease: 0 <? 1e-05
    relativeDecrease: 0 <? 1e-05
    iterations: 4 >? 100
    Values with 10 values:
    Value 100001: (0, 0, 0)
    Value 100002: (-1.26714361378, 2.06261316295, -1.10388919339)
    Value 100003: (-0.084334535949, 6.14617486173, 0.248927195165)
    Value 100004: (-0.340276429582, 9.2466738394, -0.159575195668)
    Value 100005: (-0.000681850951074, 12.7295430082, -0.030738746293)
    Value 200001: (600, 0, 0)
    Value 200002: (600.836788514, 2.6553005699, 0.931482861697)
    Value 200003: (600.460705645, 6.43928728088, 0.161151832308)
    Value 200004: (599.624409547, 9.27518406488, 0.0679752639479)
    Value 200005: (598.991044474, 12.5489552, 0.487120130218)
    Optimizer error: 141060.170419
    Dog leg optimizer converged in 4 iterations
    
    Levenberg-Marguardt optimizer error: 3.241916 (3 iterations)
    Gauss-Newton optimizer error: 3.241916 (3 iterations)
    Dog leg optimizer error: 141060.170419 (4 iterations)
    Gradient at final dog leg result
    : 10 elements
      100001:  14071.0781863 -7028.82432574 -20115.5573379
      100002: -16247.2849703  2245.51456129 -111925.762288
      100003: -41300.1163485  1802.83273811  16350.3090428
      100004:  8313.66096417 -3907.30104762 -8697.10444118
      100005:  7891.30154475  5334.76614485 0.128836449375
      200001: -7826.81638299 -4698.72760465  16849.7772284
      200002: 4080.60037322  2272.1817879 107220.992762
      200003:  37156.8892367  -5207.4242453 -11577.9236275
      200004:  764.645730704 -1578.68635549 -16011.0962793
      200005: -12967.4866694  7769.32009792  0.41914486627
    

    Now, you can really see how the dogleg optimizer is attempting to take smaller and smaller steps in the steepest descent direction until it quits due to insufficient decrease of the objective function. Also, I now realize why the gradient has 3 columns; there are three states for each key (x, y, and heading).

  10. Frank Dellaert

    Interesting. Hypothesis: in this example dogleg, by taking a step in between first and second order updates, moves to a different local minimum region it can’t escape from. Can you initialize LM with these values but using a starting lambda that is very high? We don’t want LM to accidentally jump out the basin, just gradient-descent its way out (if it can).

  11. Adam Rutkowski reporter

    Great idea! This is very easy to do.

    In the last code I posted, I just modified the “if” block to start from the last dog leg result, as you suggest. I also start from the graph and initial values I posted above, rather than generating a new random case.

    Unfortunately, I can't exactly emulate a pure steepest descent algorith with the Levenberg-Marquardt optimizer because the step length goes to zero as the direction approaches the steepest descent direction (which is what is supposed to happen, since LM creates a "hook" between the steepest descent and GN directions for varying lambda).

    When I set the initial lambda to 1e6, the Levenberg-Marquardt optimizer converges in 8 iterations starting from the dog leg result. The result has an error of 3.2419, which is the same as the original results with the LM and GN optimizers. I also see that lambda decreases by a factor of 10 on each iteration. With an

    initial lambda of 1e10, it takes 12 iterations, but it still reaches the correct result. If I increase the initial lambda to 1e11, the relative decrease in the cost function drops below 1e-5 after the second iteration (due to the small step size), and the optimizer fails.

    The step length of the dog leg optimizer doesn't necessarily have to approach zero along the steepest descent direction, since it uses the dog leg to approximate the hook. However, for some reason, the dog leg optimizer wants to use a really small step in the steepest descent direction for this case.

    if(dl_optimizer.error() > gn_optimizer.error()+1e-3)
        linearized = graph.linearize(dl_result);
        disp('Gradient at final dog leg result')
        linearized.gradientAtZero()
    
        lm_params = gtsam.LevenbergMarquardtParams();
        lm_params.setlambdaInitial(1e10);
        lm_params.setVerbosity('VALUES')
        lm_params.setVerbosityLM('TRYLAMBDA')
        lm_optimizer = gtsam.LevenbergMarquardtOptimizer(graph, dl_result, lm_params);
        result = lm_optimizer.optimizeSafely()
    
        disp(sprintf('Optimizer error: %f', lm_optimizer.error()))
        disp(sprintf('Levenberg-Marquardt optimizer converged in %d iterations', lm_optimizer.iterations())) 
    end
    
  12. Adam Rutkowski reporter

    I have coded a pure steepest descent optimization algorithm in Matlab. For the case I posted on 2019-05-06, it reports an initial error of 259426, which matches what is being reported by GTSAM. The steepest descent algorithm takes about 200 iterations to settle down to an error of about 3.93 (I tried 1000 iterations with little improvement after 200). Starting from the final dog leg optimizer result, it can also achieve an error of 3.93 after about 100 iterations.

    Interestingly, this error of 3.93 is higher than the error of 3.24 achieved by the LM and GN optimizers. If I use my code to start from the LM result, it also reports an initial error of 3.24, but climbs up to 3.93 after about 7 iterations. If I use GTSAM dogleg optimizer to start from the pure steepest descent result, it can achieve an error of 3.24, but I can see that takes steps in the Gauss-Newton direction instead of the steepest descent direction after only a couple of iterations.

    These results make me want to try some other implementations of the dogleg algorithm, such as scipy.optimize or Google Ceres, since they vary in implementation details. It seems that the full GN step should be accepted on the first iteration of the dogleg algorithm since the objective function decreases from 259426 to 3.24 after the first iteration of GN, but I think there are some criteria based on the model mismatch that are causing this step to be rejected.

    Here’s the code I used for steepest descent:

    % The purpose of this script is to test a pure steepest descent optimization
    % algorithm on a cooperative navigation problem. This particular problem
    % leads to a failure of Powell's dog leg algorithm in GTSAM. The scenario is 
    % a planar problem with 2 vehicles using odometry and inter-vehicle range 
    % measurements. The simulation propagates four steps in time, and heading is 
    % assumed to be known exactly.
    
    % Measurement standard deviations
    sigma_dx = 0.01;
    sigma_dy = 0.01;
    sigma_rho = 0.01;
    
    % Initial values
    % (x1, y1) is the position of vehicle 1, (x2, y2) is for vehicle 2
    % Original problem
    x1 = [0 -2.31996895 -0.625587382 -0.84568354 0.551152193];
    y1 = [0 1.93364125 6.18857792 8.96342818 13.0344137];
    x2 = [600 601.596888 601.916594 599.298225 598.533201];
    y2 = [0 2.80988045 6.59647362 8.9870857 12.7405703];
    
    % dl_result of orignal problem
    % x1 = [0 -1.26714358918 -0.0843344751289 -0.340276382986 -0.000681825384076];
    % y1 = [0 2.06261313863 6.14617487076 9.24667380744 12.729543035];
    % x2 = [600 600.836788356 600.460705882 599.624409558 598.991044587];
    % y2 = [0 2.65530066601 6.43928724172 9.27518405959 12.5489552084];
    
    % dl_result of solution from steepest descent
    % x1 = [0 -0.0024754923585 -0.00634447874762 0.000244165574022 0.00106994187677];
    % y1 = [0 3.00272471646 5.98593977381 8.97851662639 11.9682107662];
    % x2 = [600 599.989088005 599.998782766 600.005682108 600.001388524];
    % y2 = [0 3.00305989354 6.00827535619 8.98821604361 11.9912956438];
    
    % Measurements
    % The first four elements are the x1(1), y1(1), x2(1), and y2(1)
    % After that, the order is dx1(i), dy1(i), dx2(i), dy2(i), rho(i)
    % for i from 2 to 5
    z = [   0;
            0;
          600;
            0;
           -0.00611769713;
            3.00272473;
           -0.00726979015;
            3.00305988;
          599.993166;
           -0.00911369363;
            2.98321507;
            0.0149394677;
            3.00521545;
          600.013327;
           -0.0068554029;
            2.99257656;
            0.0203433896;
            2.97994098;
          599.995367;
           -0.00254725018;
            2.98969401;
           -0.000920557164;
            3.00307973;
          599.996946];
    
    % Steepest descent optimization loop
    for k=1:200
    
        % Construct the error vector and measurement Jacobian
        b = [];
        for i=2:5
            dx1 = z(5*i-5);
            dy1 = z(5*i-4);
            dx2 = z(5*i-3);
            dy2 = z(5*i-2);
            rho = z(5*i-1);   
            rho_hat = hypot(x2(i)-x1(i), y2(i)-y1(i));
    
            % Augment the error vector for this time step
            b = [b;
                (x1(i) - x1(i-1) - dx1)/sigma_dx;
                (y1(i) - y1(i-1) - dy1)/sigma_dy;
                (x2(i) - x2(i-1) - dx2)/sigma_dx;
                (y2(i) - y2(i-1) - dy2)/sigma_dy;
                (rho_hat - rho)/sigma_rho];
    
            % Augment the measurement Jacobian for this time step
            A(5*i-9:5*i-5,4*i-7:4*i) = [-1/sigma_dx, 0, 0, 0, 1/sigma_dx, 0, 0, 0;
                                          0, -1/sigma_dy, 0, 0, 0, 1/sigma_dy, 0, 0;
                                          0, 0, -1/sigma_dx, 0, 0, 0, 1/sigma_dx, 0;
                                          0, 0, 0, -1/sigma_dy, 0, 0, 0, 1/sigma_dy;
                                          0, 0, 0, 0, -(x2(i)-x1(i))/rho_hat/sigma_rho, -(y2(i)-y1(i))/rho_hat/sigma_rho, (x2(i)-x1(i))/rho_hat/sigma_rho, (y2(i)-y1(i))/rho_hat/sigma_rho];
        end
    
        % Remove first 4 columns of A since initial positions are known
        % The first 4 columns correspond to x1(1), y1(1), x2(1), y2(1)
        A = A(:,5:end);
    
        % Compute least squares error and display output
        e(k) = b'*b/2;
        disp(sprintf('Iteration %d error: %f', k, e(k)));
    
        % Compute steepest descent direction and step size
        [Q,R] = qr(A,0);
        d = Q'*b;
        g = -R'*d;
        Rg = R*g;
        delta = (g'*g/(Rg'*Rg))*g;
    
        % update values
        % (the first value of each isn't updated since it is known)
        x1(2:end) = x1(2:end) + delta(1:4:end)';
        y1(2:end) = y1(2:end) + delta(2:4:end)';
        x2(2:end) = x2(2:end) + delta(3:4:end)';
        y2(2:end) = y2(2:end) + delta(4:4:end)';
    
    end
    
  13. Frank Dellaert

    OK, how can gradient descent go up? Something must be wrong with the gradient. How can I best help debug this issue?

  14. Frank Dellaert

    One immediate idea is to check the gradient with numerical derivative. Did you do this already?

  15. Adam Rutkowski reporter

    Indeed, there was an error with my Jacobian. I didn’t clear the variables before running the script, and I had some values in A in places I shouldn’t, probably from incorrect indexing on a previous run.

    I can run this script, beginning with the dl_result that followed the steepest descent result, and the gradient is fairly small (with the largest element of 1.3e-5), but probably not as small as it should be. The error is fairly stable for a few iterations, but after 100 iterations, the error creeps up to 3.38 (after starting at 3.24). I could see this happening if it takes too large of steps in the steepest descent direction. If I begin with the poor initial guess that gives an error of 259426, it takes my steepest descent script about 7000 iterations to get to an error of 3.94, but it never makes it down to 3.24. I’m not convinced that the script is 100% correct, but I can’t find the error.

    Perhaps I should try to implement a Matlab steepest descent algorithm using GTSAM computed Jacobians and gradients? In my script, the vehicles are only 2 DOF, so perhaps it’s not a good comparison, even when I set heading uncertainties to zero in GTSAM. How does GTSAM handle zero noise models? It seems this would cause a division by zero problem when computing the measurement error, unless these terms are explicitly ignored.

  16. Adam Rutkowski reporter

    It seems to be rather difficult to approximate the gradient for this problem using a forward difference approximation of the derivative. I started by just perturbing one value at a time to monitor it’s effect on the error function, but the numerical derivative didn’t seem to converge on a value. So I rewrote this in Python to use scipy.optimize.approx_fprime. I experimented with some functions that I could easily differentiate by hand to make sure I was using it properly, and then I tried applying it to my problem. I also verified that the value of the error function is the same as my Matlab implementation.

    The approx_fprime function requires supplying a finite difference size, and I tried values from 1e-3 to 1e-12, decreasing by a factor of 10 each time, and couldn’t get a consistent answer with any two finite difference sizes. My best guess is that the function is highly nonlinear, hence relatively large finite difference sized don’t work, but trying to use small finite difference sizes is causing numerical problems. I could try doing the complex step trick, but I’m not sure the best way to proceed with that idea at the moment.

    I still want to try some other implementations of Powell’s dogleg, but if you have any other suggestions, I’m certainly open to them.

    Here’s the Python code I tried:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import approx_fprime
    
    sigma_dx = 0.01
    sigma_dy = 0.01
    sigma_rho = 0.01
    
    def error(x,z):
        b = np.array([])
        x1 = x[0:20:4]
        y1 = x[1:20:4]
        x2 = x[2:20:4]
        y2 = x[3:20:4]
    
        for i in range(1,5):
            dx1 = z[5*i-1]
            dy1 = z[5*i]
            dx2 = z[5*i+1]
            dy2 = z[5*i+2]
            rho = z[5*i+3]
            rho_hat = np.hypot(x2[i]-x1[i], y2[i]-y1[i])
            b = np.append(b,[(x1[i]-x1[i-1]-dx1)/sigma_dx,
                             (y1[i]-y1[i-1]-dy1)/sigma_dy,
                             (x2[i]-x2[i-1]-dx2)/sigma_dx,
                             (y2[i]-y2[i-1]-dy2)/sigma_dy,
                             (rho_hat-rho)/sigma_rho])
    
        return b.dot(b)/2
    
    x = np.array([-0.0024754923585,3.00272471646,599.989088005,3.00305989354,
                  -0.00634447874762,5.98593977381,599.998782766,6.00827535619,
                  0.000244165574022,8.97851662639,600.005682108,8.98821604361,
                  0.00106994187677,11.9682107662,600.001388524,11.9912956438])
    z = np.array([0,0,600,0,
                  -0.00611769713,3.00272473,-0.00726979015,3.00305988,599.993166,
                  -0.00911369363,2.98321507,0.0149394677,3.00521545,600.013327,
                  -0.0068554029,2.99257656,0.0203433896,2.97994098,599.995367,
                  -0.00254725018,2.98969401,-0.000920557164,3.00307973,599.996946])
    f = (lambda x : error(np.append([0,0,600,0],x),z))
    e = f(x)
    grad_f = approx_fprime(x,f,1e-3)
    

    And here’s the output, varying the finite difference size from 1e-3 to 1e-12, decreasing by a factor of 10 each time:

    In [124]: run test_approx_fprime.py
    
    In [125]: grad_f
    Out[125]:
    array([ 15.00000078,   9.9999867 ,  14.99999302,   9.99998659,
            14.99999537,   9.99993164,  15.00001382,   9.99993173,
            14.99999916,  10.0000836 ,  15.00000059,  10.00008428,
            10.00000268,   5.00002845,   9.99999377,   5.00002778])
    
    In [126]: run test_approx_fprime.py
    
    In [127]: grad_f
    Out[127]:
    array([ 1.50000078,  0.99999872,  1.49999302,  0.99999862,  1.49999532,
            0.99999315,  1.50001376,  0.99999333,  1.49999917,  1.00000806,
            1.5000006 ,  1.00000888,  1.00000267,  0.50000313,  0.99999376,
            0.50000247])
    
    In [128]: run test_approx_fprime.py
    
    In [129]: grad_f
    Out[129]:
    array([ 0.15000074,  0.09999984,  0.14999302,  0.09999979,  0.14999523,
            0.09999958,  0.15001469,  0.10000042,  0.1499994 ,  0.10000188,
            0.1500006 ,  0.10000155,  0.10000237,  0.05000083,  0.0999938 ,
            0.04999956])
    
    In [130]: run test_approx_fprime.py
    
    In [131]: grad_f
    Out[131]:
    array([ 0.01500075,  0.01000021,  0.01499302,  0.00999797,  0.01499523,
            0.01000518,  0.01501376,  0.01000415,  0.0149994 ,  0.01000188,
            0.0150006 ,  0.01000957,  0.01000275,  0.00499814,  0.0099938 ,
            0.00500186])
    
    In [132]: run test_approx_fprime.py
    
    In [133]: grad_f
    Out[133]:
    array([ 0.00149528,  0.0009911 ,  0.00149302,  0.00099068,  0.00146726,
            0.00102382,  0.00151376,  0.0010694 ,  0.00153375,  0.00102478,
            0.0015006 ,  0.00097522,  0.00097591,  0.00049431,  0.0009938 ,
            0.00050569])
    
    In [134]: run test_approx_fprime.py
    
    In [135]: grad_f
    Out[135]:
    array([  1.63513647e-04,   9.11271059e-05,   1.43041134e-04,
            -7.32747196e-05,  -7.21644966e-04,  -1.55786495e-04,
             1.63780101e-04,   1.28803634e-03,   6.92779167e-05,
             5.82778270e-04,   1.50635060e-04,  -3.82716081e-04,
             7.59392549e-05,  -1.85718108e-04,   9.37916411e-05,
            -9.76996262e-05])
    
    In [136]: run test_approx_fprime.py
    
    In [137]: grad_f
    Out[137]:
    array([ -1.53654867e-04,   1.33226763e-06,   8.43769499e-06,
             1.90958360e-05,  -8.56203997e-04,   6.27942143e-03,
             2.93098879e-05,   3.06243919e-03,   1.07913678e-03,
             1.63780101e-03,   1.59872116e-05,  -1.61737290e-03,
            -3.46522810e-03,   1.30340183e-03,   4.44089210e-06,
            -1.29318778e-03])
    
    In [138]: run test_approx_fprime.py
    
    In [139]: grad_f
    Out[139]:
    array([ -1.10933485e-02,  -4.44089210e-06,   0.00000000e+00,
             1.33226763e-05,  -5.67990099e-02,  -3.05089287e-03,
             1.77635684e-05,   3.05533376e-03,  -4.47286652e-02,
             1.62980740e-03,   4.44089210e-06,  -1.62536651e-03,
            -1.49746882e-02,   1.30562228e-03,   0.00000000e+00,
            -1.29674049e-03])
    
    In [140]: run test_approx_fprime.py
    
    In [141]: grad_f
    Out[141]:
    array([  7.10542736e-03,   0.00000000e+00,   0.00000000e+00,
             0.00000000e+00,  -8.95727936e-01,  -3.01980663e-03,
             4.44089210e-05,   3.10862447e-03,  -4.47197834e-02,
             1.64313008e-03,   0.00000000e+00,  -1.59872116e-03,
            -1.49658064e-02,   1.33226763e-03,   0.00000000e+00,
            -1.24344979e-03])
    
    In [142]: run test_approx_fprime.py
    
    In [143]: grad_f
    Out[143]:
    array([  3.72146758e-01,   4.44089210e-04,   0.00000000e+00,
             0.00000000e+00,  -7.42073070e+00,  -2.66453526e-03,
             0.00000000e+00,   3.55271368e-03,  -2.33502107e+00,
             1.77635684e-03,   0.00000000e+00,  -1.33226763e-03,
            -7.81597009e-01,   1.77635684e-03,   4.44089210e-04,
            -8.88178420e-04])
    
    In [144]:
    

    Note that at first, the gradient appears to decrease by a factor of the step size each time the step size is decreased, but that trend eventually breaks down.

  17. Frank Dellaert

    So, I did my own experiments, with your example but coded with GTSAM factors, and found that dogleg performs surprisingly well. In fact, I found that by changing the deltaInitial default to 10 it consistently outperforms LM, again, for this example. I created a pull-request PR #424 which you can peruse. It has nice graphs.

    Let me know if you still want to keep this issue open or want to experiment on your own to produce a failing unit test on the GTSAM gradient that you think proves it to be definitely wrong.

  18. Frank Dellaert

    As far as the gradient goes, you could sanity-check the GTSAM gradient using your example, which I merged as a test in testNonlinearFactorGraph.cpp, at the bottom. You can create a boost::function and use numericalGradient in numericalDerivative.h. You can sanity-check that with linearized.gradientAtZero(), which is what LM/Dogleg would use.

  19. Adam Rutkowski reporter

    Thanks! I’ll have to take a look at the pull request, and perhaps I’ll check out the numerical derivative in C++.

    I realized yesterday after my last post that the behavior of the forward difference gradient that I observed is approximately what should happen for a quadratic function near a minimum. Thus, to get a good numerical derivative, I should try a central difference approximation.

    It turns out that my analytical appears to be correct, as can be seen in the output below. There are three columns, the first is the analytical derivative, the second is the central difference approximation with a finite difference size of 1e-4, and the third is simply the difference between the two.

    In [68]: run test_approx_fprime.py
    
    In [69]: g_comparison
    Out[69]:
    array([[  7.85591176e-07,   7.81574805e-07,  -4.01637124e-09],
           [  5.21561095e-08,   4.52038407e-08,  -6.95226879e-09],
           [ -6.97945615e-06,  -6.97945701e-06,  -8.66862138e-13],
           [ -5.21605503e-08,  -4.52060611e-08,   6.95448923e-09],
           [ -4.62303356e-06,  -4.58793892e-06,   3.50946436e-08],
           [ -4.23609354e-08,  -9.17887988e-08,  -4.94278634e-08],
           [  1.38153594e-05,   1.38057232e-05,  -9.63613545e-09],
           [  4.23609354e-08,   9.17887988e-08,   4.94278634e-08],
           [ -8.38471834e-07,  -8.26350099e-07,   1.21217347e-08],
           [ -3.35789793e-07,  -4.08248990e-07,  -7.24591968e-08],
           [  5.86626811e-07,   5.99786887e-07,   1.31600757e-08],
           [  3.35807557e-07,   4.08268974e-07,   7.24614172e-08],
           [  2.69162961e-06,   2.69487099e-06,   3.24138185e-09],
           [  3.35735195e-07,   3.26929595e-07,  -8.80560022e-09],
           [ -6.22339483e-06,  -6.23677554e-06,  -1.33807134e-08],
           [ -3.35748517e-07,  -3.26942917e-07,   8.80560022e-09]])
    

    The behavior of my steepest descent algorithm may be related to a sloppy implementation, which doesn’t have stopping criteria and all the nice checks that a good implementation would have. Also, the really slow convergence I reported on 2019-05-14 was actually related to me manually reducing the step size to attempt to get the error to decrease.

    In any case, I do believe the steepest descent direction is correct in GTSAM. Going back to my original post, I think I was confused about notation. Certainly, when x is zero, then R'*(R*x-d) is just -R'*d, which is just -A'*b, which is the correct steepest descent direction. Here though, I think x really means a change in x rather than the values themselves.

    I will try adjusting the deltaInitial for the dogleg for the problem that led me to report this issue in the first place, and hopefully I can have some success. I certainly understand the motivation for using the dogleg versus LM (or, in iSAM2, since there is no LM, I want a more robust alternative to GN).

    At this point, I think the issue can be closed. I still want to investigate the various criteria of the dogleg a little more, but I don’t think the title of this issue makes sense for that. If I notice a problem, I’ll post a different issue with a more appropriate title, and reference this one if necessary.

    Thanks for all your help!

  20. Frank Dellaert

    Great. Thank you for digging! It led me to change the dogleg default. I just pushed another change where you can play with the script using CL arguments. I’ll close the issue, but if you see any issues with gradients, feel free to re-open a new issue.

  21. Log in to comment