Calling trac_ik in parallel means valid solutions are not found due to timeouts

Issue #63 closed
Austin Gregg-Smith created an issue

Hi,

I have been using trac_ik in “speed” mode to return 1000 ik solutions each started from a random seed.

I am using an OpenMP for loop to run those 1000 queries in parallel. This has been working fine because an OpenMP for loop spawns only one thread per processor by default, so each trac_ik instance has a dedicated core to run on.
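
Roughly, the loop looks like the sketch below (the link names, target pose, and seeding details are illustrative placeholders rather than my actual code, and it assumes a robot_description is loaded on the parameter server):

    #include <trac_ik/trac_ik.hpp>
    #include <kdl/chain.hpp>
    #include <kdl/jntarray.hpp>
    #include <kdl/frames.hpp>
    #include <ros/ros.h>
    #include <random>
    #include <vector>

    int main(int argc, char** argv)
    {
      ros::init(argc, argv, "parallel_ik_example");

      const int N = 1000;
      const double timeout = 0.005, eps = 1e-5;

      // One throwaway solver, used only to fetch the chain and joint limits
      // from the URDF on the parameter server.
      TRAC_IK::TRAC_IK setup("base_link", "tool0", "/robot_description",
                             timeout, eps, TRAC_IK::Speed);
      KDL::Chain chain;
      KDL::JntArray lb, ub;
      setup.getKDLChain(chain);
      setup.getKDLLimits(lb, ub);
      const unsigned int nj = chain.getNrOfJoints();

      KDL::Frame target(KDL::Rotation::RPY(0, 0, 0), KDL::Vector(0.4, 0.0, 0.4));
      std::vector<int> found(N, 0);

      // OpenMP spawns one thread per core by default, so each solver gets a
      // dedicated core while this is the only parallel region running.
      #pragma omp parallel for
      for (int i = 0; i < N; ++i)
      {
        // Fresh solver per query, never shared, discarded after one solve.
        TRAC_IK::TRAC_IK ik(chain, lb, ub, timeout, eps, TRAC_IK::Speed);

        // Random seed drawn uniformly within the joint limits.
        std::mt19937 rng(i);
        KDL::JntArray seed(nj), result(nj);
        for (unsigned int j = 0; j < nj; ++j)
        {
          std::uniform_real_distribution<double> dist(lb(j), ub(j));
          seed(j) = dist(rng);
        }

        found[i] = (ik.CartToJnt(seed, target, result) >= 0) ? 1 : 0;
      }
      return 0;
    }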

However, I noticed that if I spawn two threads, where each thread makes 1000 IK calls, then trac_ik fails to find the majority of solutions for either thread. If I run the calls serially it works, of course.

I had a look and I assume this is due to the timeout. If there is contention for the CPU time, then the timeout runs out before trac_ik has run enough iterations to find a solution.

Is it possible to set an iteration count rather than a timeout, so that the number of found IK solutions is the same regardless of how many threads call the function? (I know there might be reduced time performance because of the contention, but this matters less than failing to find an IK solution that exists.)

If that is not possible (and from looking at the codebase it looks like it will be a bit difficult to extract the timeouts and replace them with iteration counts), am I doing something incorrectly? Is it possible to call trac_ik in parallel and get the same results as calling it serially?

Thanks,

Austin

Comments (13)

  1. Austin Gregg-Smith reporter

    I create a new instance of trac_ik for each iteration that is never shared, so I don’t think that should be an issue. I.e., calling 1000 IK queries from each of two threads = 2000 individual trac_ik initialisations that are thrown away immediately after their first solve. I have been running the code for a long time and never had any thread-crashing problems.

    If I want to fix this, would you recommend just going through the codebase and replacing all the timeouts with iteration counts, or is there a better way?

    Thanks.

  2. Patrick Beeson

    I see. I didn’t read your original post carefully enough.

    Have you tried simply doubling the timeout? It won’t use the full timeout in Speed mode if an answer is found: once it finds an answer it will return immediately. So doubling the timeout shouldn’t double the actual execution time.
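
    For example, something like this (just a sketch; the chain, limits, seed, and target are whatever you already have):

        #include <trac_ik/trac_ik.hpp>
        #include <kdl/chain.hpp>
        #include <kdl/jntarray.hpp>
        #include <kdl/frames.hpp>

        // Illustrative: solve one query with a doubled time budget. Speed mode
        // returns as soon as it finds an answer, so the larger timeout only
        // costs time on the queries that were previously timing out.
        bool solve_with_budget(const KDL::Chain& chain,
                               const KDL::JntArray& lb, const KDL::JntArray& ub,
                               const KDL::JntArray& seed, const KDL::Frame& target,
                               KDL::JntArray& out)
        {
          const double timeout = 0.010;  // e.g. 2x the 0.005 s default
          TRAC_IK::TRAC_IK ik(chain, lb, ub, timeout, 1e-5, TRAC_IK::Speed);
          return ik.CartToJnt(seed, target, out) >= 0;
        }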

  3. Austin Gregg-Smith reporter

    Thanks, I will try and benchmark that solution.

    I suspect that it will not be that reliable though, because I cannot control how many times my code is called in parallel. E.g. it could easily be called 100x in parallel by another system, at which point I would have to increase the timeout by 100x.

  4. Patrick Beeson

    So you cannot programmatically query the number of threads that are going to be generated and multiply the timeout by N for each instance?
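
    I.e., something along these lines (the names and the exact scaling rule are just illustrative):

        #include <algorithm>
        #include <thread>

        // Scale the per-call IK timeout by the expected oversubscription factor.
        // `concurrent_callers` is whatever the application knows about how many
        // solver loops will run at once; hardware_concurrency() reports the
        // number of hardware threads available.
        double scaled_timeout(double base_timeout, unsigned int concurrent_callers)
        {
          const unsigned int cores =
              std::max(1u, std::thread::hardware_concurrency());
          const double oversubscription =
              static_cast<double>(concurrent_callers) / static_cast<double>(cores);
          return (oversubscription > 1.0) ? base_timeout * oversubscription
                                          : base_timeout;
        }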

  5. Austin Gregg-Smith reporter

    It would be complicated to do so and also would be unreliable. For example:

    • A user calls my planning code 3x in parallel so they increase the timeout by 3x.
    • One of the plans ends early and gives some information which causes the user of the code to want to spawn another 3 requests.
    • They could set the timeout for the next 3 requests to 5x, but the two remaining plans would now have bad performance because their timeout is set too low for the new load conditions.
    • The two original plans finish, so the later 3 plans now have a timeout that is too large, which will negatively affect planning performance if they are in a hard region where a large proportion of IK will (correctly) fail to find a solution.

    Another use case is where I run optimisers on the planning code. Often I will start them from completely different code bases which have no knowledge of each other. I also don’t have fine-grained control over how the optimisers spawn the planner in parallel, so I can’t do any clever tricks to set the timeout accordingly or dynamically change the timeout based on load.

    This is further complicated by the fact I am running the optimisation on a shared machine so it will often have unpredictable load from other users that I cannot control.

    I think an iteration-based stopping condition would fix this without any weird performance caveats or complications to my users’ code. I am willing to implement it. Is there anything I should know before I try to implement it?

  6. Austin Gregg-Smith reporter

    I have some data based on your suggestion.

    I call a function which does IK 1000 times while modifying 3 variables:

    1. isViable: a solvable IK request vs. an unsolvable IK request
    2. ikTimeout: ranging between 1 and 50 ms in 10 increments
    3. run serially vs. run in parallel (the parallel run oversubscribes the CPU by 12x; I ran it on a machine with 12 cores and a simple parallel for).

    For each function call I record 2 variables:

    1. the percentage of IK solutions that are found (for isViable = 1 the answer should be 100%, for isViable = 0 the answer should be 0%)
    2. the average time taken for each IK call (total function time / 1000)

    For each set of variables I run the function 20 times so that the mean and standard deviation can be seen on the plots.
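
    Roughly, the harness looks like this (run_batch() is a hypothetical stand-in for the 1000-call loop from my first post, not the actual code):

        #include <cmath>
        #include <cstdio>
        #include <tuple>
        #include <utility>
        #include <vector>

        // Hypothetical helper: runs 1000 IK calls with the given timeout against
        // either a reachable or an unreachable target, returning
        // {found percentage, mean time per call in ms}.
        std::pair<double, double> run_batch(double timeout_s, bool is_viable);

        // Sweep the timeout from 1 ms to 50 ms in 10 increments, repeating each
        // configuration 20 times so a mean and standard deviation can be plotted.
        // The serial vs. parallel comparison comes from running 1 vs. 12 copies
        // of this sweep at once on the 12-core machine (12x oversubscription).
        void sweep(bool is_viable)
        {
          const int repeats = 20;
          for (int step = 0; step < 10; ++step)
          {
            const double timeout_ms = 1.0 + step * (50.0 - 1.0) / 9.0;
            std::vector<double> pct(repeats), ms(repeats);
            for (int r = 0; r < repeats; ++r)
              std::tie(pct[r], ms[r]) = run_batch(timeout_ms / 1000.0, is_viable);

            // Mean and standard deviation of the found-percentage samples,
            // plus the mean per-call time across the 20 repeats.
            double mean = 0.0;
            for (double p : pct) mean += p;
            mean /= repeats;
            double var = 0.0;
            for (double p : pct) var += (p - mean) * (p - mean);
            const double stddev = std::sqrt(var / repeats);
            double mean_ms = 0.0;
            for (double t : ms) mean_ms += t;
            mean_ms /= repeats;

            std::printf("viable=%d timeout=%.1f ms: %.1f%% found (sd %.1f), %.2f ms/call\n",
                        is_viable, timeout_ms, mean, stddev, mean_ms);
          }
        }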

    Serial Run: [plots attached]

    Parallel Run: [plots attached]

    Analysis:

    Serial run:

    • graph 1/4: the IK found percentage increases up to around 15 ms. Maybe the 5 ms default is too low for me anyway.
    • graph 2/4: for the unsolvable IK, the performance penalty is much lower than the timeout (as you said). It looks like the penalty is around 1/10th of the timeout.

    Parallel run:

    • graph 1/4: the IK found percentage tends to 100% as the timeout increases, and the variance also reduces.
    • graph 2/4: the penalty for an unsolvable request is around 6 ms vs. 4 ms for serial, with a timeout of 50 ms.
    • graph 4/4: the average IK time is about 2x the serial averages.

    Conclusion:

    You may be right that just setting a higher timeout is a reasonable interim solution. It still makes me uneasy, though, as I don’t think I have captured enough information to characterise the performance penalty of this approach.

    I have made a version that runs on iteration counts, but I have not got it to match the master-branch performance yet (even with lots of iterations it only solves about 70% of the solvable queries).

  7. Patrick Beeson

    The issue with # iterations is that it doesn’t really guarantee a solution is found either. I guess it might be a bit more stable than a duration.

    So I don’t know exactly what your application is that needs to run all these solvers in parallel, but if you know beforehand that the Cartesian goal pose is DEFINITELY reachable, and you want to ensure that a solution is found, then you can make the timeout any huge number. This will guarantee a solution is found (in the limit) and will return immediately upon finding a solution given the Speed condition.

  8. Austin Gregg-Smith reporter

    Thanks, but I don’t understand why a max iteration count is not equivalent to timeout.

    From my understanding of how the code is behaving, it will quit early if a solution is found but otherwise continue until the timeout. The iteration version would be equivalent: it would quit early if a solution is found and otherwise run x iterations. There are some complications because you have 2 different methods that might require a different number of iterations, but that can be worked around.

    Your second point is interesting, but I’m not sure if I can take advantage of it. I am abusing the ros descartes planner to use trac_ik rather than ikfast because I need to be able to plan for arbitrary robot types that ikfast does not support. The upshot is I don’t know if poses are reachable or not beforehand.

  9. Patrick Beeson

    Max iteration count is certainly not equivalent to timeout but, as I said, probably more stable across platforms. The only point I was trying to make is that a sufficient # of iterations is going to vary wildly across robot configurations. This is the same for time as well, but I chose time, as 1) MoveIt! people already used time, and 2) it’s easier to understand for most application developers.

    However, as you note, iterations would be “deterministic” (disregarding the random number generation in the IK solver itself) across # of threads, # of processors, CPU speed, etc. Thus, I think adding a max_iterations flag that can be used in parallel with timeout is fine. That is, I wouldn’t replace timeout, only make the system stop if max_iterations OR timeout is met. Then default max_iterations to UINT_MAX or something, unless the user passes in a value. In your case, you might have a timeout of, let’s say, 5 seconds or something wildly large, and a reasonable max_iterations based on experiments on a particular robot configuration.
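
    As a sketch of the stopping test I have in mind (illustrative only, not the actual trac_ik internals):

        #include <chrono>
        #include <climits>

        // Illustrative stop test for a solver loop: give up when either the
        // wall-clock budget or the iteration budget is exhausted, whichever
        // comes first. With max_iterations defaulting to UINT_MAX, the existing
        // timeout-only behaviour is unchanged unless an iteration cap is set.
        struct StopCondition
        {
          double timeout_s = 0.005;
          unsigned int max_iterations = UINT_MAX;

          bool expired(std::chrono::steady_clock::time_point start,
                       unsigned int iterations_done) const
          {
            const double elapsed = std::chrono::duration<double>(
                std::chrono::steady_clock::now() - start).count();
            return elapsed >= timeout_s || iterations_done >= max_iterations;
          }
        };

    A run that finds a solution would still return immediately as it does now; this only decides when an unsuccessful run gives up.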

    As you hinted at, KDL and NLOpt will have different numbers of iterations, so you might just choose to use max_iterations in the NLOpt thread, ignoring it for KDL until it’s time to stop. If this looks straightforward, I’d have no issues with a PR adding in this feature.
