Slow random_element method for codes
I noticed the following : with (not so) big codes, random_elements really slows down.
It's probably because of the implementations, which works exactly as the previous __contains__
, namely it creates a subspace of the code's ambient space spanned by the rows of the code's generator matrix and calls random_element
over this subspace.
Timings follows.
Comments (11)
-
reporter -
I still find it slow. What about making a random message an encoding it ?
-
reporter Faster indeed.
-
By "random linear combination", did you mean "vector-matrix multiplication"? That's exactly the implementation of the encoder, right? So there should be no time difference. If you meant that you literally do a loop over the rows of the generator matrix and sum with random coefficients, then, sure, this is going to be (a constant factor) slower than the vector-matrix multiplication.
But implementing
random_element
using the encoder is a much nicer idea. This has the advantage that if the default encoder does not construct the generator matrix, this doesn't need to happen.One thing, though. Earlier, we were testing
syndrome
and caching the vector space in the background. Forrandom_element
we have the same possibility: cache the vector space in the background and callrandom_element
on that (what's the speed of this?).Multiple uses of the cached vector space in the background strengthens the argument for doing it like this: instead of duplicating vector space functionality,
AbstractLinearCode
could call the existing functions whenever possible.I don't think we should immediately do this, but it is interesting to keep in mind.
-
reporter I was indeed doing a loop over the rows.
I was considering the same idea about caching the vector space in the background. I ran the same computations as before, and here are the timings:
It's way better than anything else I tested so far...
-
reporter I have a counter-argument though : caching the vector space helps if you want to perform a lot of computations. However, for big codes (I'm currently testing with
n = 2^13, k = 2^12, over GF(2^13)
) first call to the function to build the vector space is really, really slow ( > 2min)... I doubt one likes to have this kind of computation time if he wants to create only a few random elements. And consider that if you create a cyclic code from a code using probabilistic version of the algorithm to extract the generator polynomial, which performs calls torandom_element()
, the slowdown will be insane with large codes. -
Ok, so it seems to be the same issue as with
syndrome
: the vector space functions are indeed faster, but one of the reasons is that they compute the reduced row-echelon form of the generator matrix upon construction. This means certain functions can be slightly optimised (but most of the gain probably comes from being written in C), but construction takes a lot of time.I agree with you that it is unacceptable that the first call to
random_element
should be so expensive. So the vector-matrix product implementation (i.e. use Encoder) is best. -
Be careful with doing testing over only one field. Different fields are implemented differently. There are at least 4 distinct categories:
GF(2)
,GF(2^n)
,GF(p)
forp
odd prime, andGF(p^n)
. -
Indeed, the difference is not persistent across field types. Binary extension fields are where
RowSpace.random_element()
performs the fastest compared to vector-matrix product, while for prime fields, it's the other way around! The dimension also plays a role. It was exactly the same thing withsyndrome
.Not using the Encoder framework or anything, I just quickly wrote the following code which compares the
RowSpace.random_element()
with a vector-matrix product. Play around with theR
and the field to see the effect.import time def timeit(f, N=10): times = [] for i in range(N): before = time.time() c = f() times.append(time.time() - before) return median(times) def vectorspace(M): V = M.row_space() def f(): return V.random_element() return f def product(M): k = M.nrows() K = M.base_ring() def f(): m = random_vector(K, k) return m*M return f def getmatrix(F,n,k): return random_matrix(F, k, k) F = GF(59,'a') R = .2 times_vs, times_pr = [] , [] for n in [ 2^i for i in range(3, 12) ]: print "Testing size ", n M = random_matrix(F, n, ceil(R*n)) times_vs.append((n, timeit(vectorspace(M)))) times_pr.append((n, timeit(product(M)))) g = list_plot(times_vs, color='red') g+= list_plot(times_pr, color='blue') show(g)
-
reporter Thanks for the input, I did not think about these differences.
I'll check the differences using your code, thanks for the implementation!
-
reporter - changed status to resolved
- Log in to comment
Timings for a new implementation performing a random linear combination over all rows of the generator matrix (computation time of the generator matrix not considered in the timings)