Investigate timings for the Gao algorithm

Issue #108 resolved
David Lucas repo owner created an issue

I ran timing tests on the Gao algorithm. I picked a GRS code of length i and dimension i/2 over GF(1009), iterating on i from 100 to 1000 in steps of 100. At each iteration I measured the timings of 10 successive decodings (each on a random codeword corrupted by as many errors as half the minimum distance) and plotted the timing graph below, taking the median of the 10 timings measured at each step. Weak Popov form (wPf) is in red, partial xgcd in green.

timings_gao.png

It's not really regular, which seems a bit weird...
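
For reference, here is a rough sketch of the kind of timing loop this experiment corresponds to in Sage. The decoder name "Gao", the channel constructor and the timing calls are assumptions for illustration, not the exact code that was run:

    # Sketch: GRS codes of length i, dimension i/2 over GF(1009),
    # median of 10 decodings per length.
    import time
    from statistics import median

    F = GF(1009)
    results = []
    for n in range(100, 1001, 100):
        k = n // 2
        C = codes.GeneralizedReedSolomonCode(F.list()[:n], k)
        D = C.decoder("Gao")                    # assumed decoder name
        t = (C.minimum_distance() - 1) // 2     # decoding radius, i.e. half the minimum distance
        Chan = channels.StaticErrorRateChannel(C.ambient_space(), t)
        times = []
        for _ in range(10):
            y = Chan.transmit(C.random_element())
            start = time.process_time()
            D.decode_to_code(y)
            times.append(time.process_time() - start)
        results.append((n, median(times)))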

Comments (12)

  1. David Lucas reporter

    I did my timings again, with a GRS code built as explained above. This time, though, it's 100 decodings per iteration instead of 10 (still random codewords with as many errors as floor((n-k)/2)). Weak Popov form is still in red, partial xgcd in green. The y scale is in seconds.

    And this time, only the algorithms themselves were timed, not the whole decoding process.

    tmp_f7G3wZ.png

  2. Johan Rosenkilde

    Wow, that's a surprisingly big difference. Are the timings again over a whole decoding, or only the Euclidean-algorithm part? Also, did you compare with the inbuilt xgcd, to see how much could potentially be gained by optimising/cythonizing your partial xgcd implementation?

  3. David Lucas reporter

    Surprising indeed.

    It's only over the Euclidean part.

    What you have is:

        generate random codeword
        perform pre-computations (Lagrange interpolation, ...)
        create matrix for wPf computation and apply weights

        measure time:
            xgcd
        measure time:
            wPf

    Comparison with the inbuilt xgcd is on the way; I'll update this thread as soon as I have the results.

  4. David Lucas reporter

    Here it is. Green is my implementation, red is the inbuilt one.

    I used the same method as before to build the code. For the partial xgcd, the stop occurs when the degree of one of the Euclidean remainders drops below (n + k)/2.

    For each code length, I picked 100 pairs of random polynomials of degree 1000 (coefficients in GF(1009)) and performed the two xgcds on these polynomials.

    tmp_PATQ9Q.png
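
    For context, a minimal sketch of what such a partial xgcd with early termination can look like (an illustration of the technique, not necessarily the implementation being benchmarked here; the stopping test is on the remainder's degree, and only one Bézout coefficient is tracked):

        def partial_xgcd(a, b, stop_degree):
            # Run the Euclidean algorithm on (a, b), keeping only one Bézout
            # coefficient, and stop as soon as the degree of the remainder
            # drops below stop_degree.  Returns (r, v) with r = u*a + v*b.
            r0, r1 = a, b
            v0, v1 = a.parent().zero(), a.parent().one()
            while r1.degree() >= stop_degree:
                q, r = r0.quo_rem(r1)
                r0, r1 = r1, r
                v0, v1 = v1, v0 - q * v1
            return r1, v1

        # Example: random polynomials over GF(1009), as in the test above.
        R.<x> = GF(1009)[]
        a, b = R.random_element(1000), R.random_element(999)
        r, v = partial_xgcd(a, b, 500)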

  5. Johan Rosenkilde

    Why is your code getting faster with longer polynomials?

    I don't understand what you wrote: I guess you don't pick random polynomials of degree 1000, but of degree = code length? Be aware that the polynomials arising in decoding don't behave like random polynomials, so the speed profiles might be somewhat different for random polynomials. However, your test here is perfectly fine since we're just trying to get an impression of the potential speed gains.

    Btw, does the inbuilt xgcd call a low-level library xgcd? Note that this question might not have a unique answer, since different libraries are used for different fields (e.g. GF(2) is different from GF(2^n), which is different from GF(p) for p > 2, etc.), and for some of these libraries a low-level xgcd might be employed, while for others not.

  6. David Lucas reporter

    No, I was always picking polynomials of degree 1000. That wasn't smart, so I redid the timings with degree = code length; plot below.

    Regarding your question about the inbuilt xgcd, it's a different xgcd implementation depending on the nature of the input. One is written in rings/integer.pyx, several others in rings/polynomial/ (one using FLINT, another using NTL, another calling the generic one for univariate polynomials). I'm not sure it depends on the nature of the coefficients; I'd tend to say it depends on the library used to create the polynomials. The generic one for univariate polynomials is in rings/ring.pyx and is called _xgcd_univariate_polynomial.

    tmp_iC9ddG.png
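
    To see which backend (and hence which library's xgcd) a given polynomial ring dispatches to, one can simply inspect the element class; the class names in the comments below are examples and may differ between Sage versions:

        R.<x> = GF(1009)[]
        print(type(R.an_element()))   # e.g. ...polynomial_zmod_flint.Polynomial_zmod_flint (FLINT)
        S.<y> = GF(2)[]
        print(type(S.an_element()))   # e.g. ...polynomial_gf2x.Polynomial_GF2X (NTL)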

  7. Johan Rosenkilde

    Ok, good job. These timings are very encouraging: basically, it's not worth it (IMHO) to spend too much time chasing after that factor <2. Thus we can just be happy with your partial xgcd implementation :-)

    What you wrote, "it depends on the library used to create the polynomials", is exactly what I meant. It speaks to the efficiency of Sage/Python's layer over C/C++ that there is so little loss in speed between your Sage implementation and the library implementation.

  8. Clément Pernet

    Very nice work, David. How does the comparison go for larger instances (maybe unrealistic in coding theory), say length 10^5 or 10^6?

    I kind of disagree with Johan: if the two implementations behave similarly (say, within a factor < 2), then let's use the one that is already there, in the right place, and not duplicate code for some specific instance of a computation! This way we ensure that if someone shows up with drastically faster code for xgcd in the future, we'll silently and smoothly benefit from the upgrade.

  9. Johan Rosenkilde

    The problem is that the inbuilt xgcd does not support early termination, and it's a lot of work to retrofit that into all the polynomial libraries that are used. And I don't know whether this particular variant of xgcd (early termination + computing only the "left" auxiliary polynomial) is generally useful.

  10. David Lucas reporter

    And they're all written in Cython, which implies extra work.

    Anyway, here are the timings Clément requested! I did only 10 instances of each xgcd per step; I assume you'll guess why. It's over GF(100003), and the code length increases by 10^4 at each step.

    tmp_LTj9Ea.png

  11. Johan Rosenkilde

    Ah, ok: for large degrees and over the prime field, the underlying polynomial library has a d*log(d) algorithm for gcd. David could also quite easily implement that if we wanted :-)
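
    A quick way to get a feeling for that asymptotic behaviour (purely illustrative, over the same prime field as in the plot above):

        # Time the built-in xgcd on random polynomials of increasing degree.
        R.<x> = GF(100003)[]
        for d in [10^4, 5*10^4, 10^5]:
            a, b = R.random_element(d), R.random_element(d - 1)
            t = walltime()
            a.xgcd(b)
            print(d, walltime(t))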
