Wiki

Clone wiki

sage_coding_project / Examples

An overview of the new library

We will present a few methods to exemplify and get started with the new design. In the first section, you will learn how to:

  • Create a Generalized Reed-Solomon code
  • Check its parameters
  • Encode a vector from its message space into a codeword
  • Put some errors in this codeword using a brand new Channel system
  • Correct the errors and get back into the message space.

If you want to learn more about the new Encoder and Decoder system, you can go to Section 2.

If you want to learn more about the Channels, you can check Section 3.

An advanced example going through the McEliece cryptosystem is provided in Section 4.

##Section 1: First steps with the new code tools ##

First of all, we want to create a new code to experiment with. We choose to create the [40, 12, 29] Generalized Reed-Solomon code over a Finite Field of size 59. To do this, we need up to 3 elements :

  • The dimension
  • The list of evaluation points
  • And, optionally, the list of column multipliers

Note that the length of our code is inferred from the length of the list of evaluation points. The finite field is inferred from this list too.

#!python

sage: F = GF(59)
sage: length = 40
sage: dimension = 12
sage: eval_pts = F.list()[:length]
sage: col_mults = F.list()[1:length+1]
sage: C = codes.GeneralizedReedSolomonCode(eval_pts, dimension, col_mults)

Our code is now created. We can check for its parameters and its minimum distance.

#!python

sage: C.length()
40
sage: C.dimension()
12
sage: C.base_field()
Finite Field of size 59
sage: C.minimum_distance()
29

All these parameters are summarized inside the string representation of our code :

#!python

sage: C
[40, 12, 29] Generalized Reed-Solomon Code over Finite Field of size 59
Now we want to generate a codeword and try to play around with it. We can do that in two ways :

  • We can just generate a random element of our code, like in the following example.

    sage: c = C.random_element()
    sage: c # random
    (51, 53, 11, 11, 27, 15, 45, 37, 42, 36, 38, 1, 54, 45, 2, 0, 52, 51, 23, 25, 44, 31, 42, 18, 48, 38, 23, 35, 10, 51, 23, 38, 56, 37, 23, 35, 47, 21, 42, 58)
    sage: c in C
    True
    
  • Alternatively, we can create a "message" and then encode it into a codeword:

    sage: msg = random_vector(GF(59), C.dimension()) #random
    (0, 57, 13, 11, 34, 47, 52, 18, 20, 43, 37, 32)
    sage: c = C.encode(msg)
    (0, 20, 4, 38, 35, 28, 34, 41, 26, 41, 15, 34, 15, 48, 6, 23, 44, 37, 14, 6, 30, 39, 7, 35, 21, 58, 9, 30, 36, 34, 36, 0, 41, 49, 37, 15, 10, 58, 10, 14)
    sage: c in C
    True
    

Either way, we obtained a codeword. So, we might want to put some errors in it, and try to correct these errors afterwards. We can obviously do it by changing randomly some random positions of our codeword, but we propose here something more general: communication channels. Channel objects are meant as abstractions for communication channels and for manipulation of data representation. In this case, we want to emulate a communication channel which adds some, but not too many, errors to a transmitted word:

#!python

sage: err = 14
sage: Chan = channels.StaticErrorRateChannel(C.ambient_space(), err)
sage: Chan
Static error rate channel creating 14 error(s)
sage: r = Chan.transmit(c)
sage: c - r
sage: len((c-r).nonzero_positions())
14 #so we get 14 errors as expected

r is now a "received word", obtained as a codeword with errors on it. We can try to correct the errors and recover the original codeword.

#!python

sage: c_dec = C.decode_to_code(r)
sage: c_dec == c
True

Perhaps we rather want the original message back, rather than the codeword. All we then have to do is to unencode it back to the message space:

#!python

sage: m_unenc = C.unencode(c_dec)
sage: m_unenc == msg
True

It is also possible to do the two previous operations (correct the errors and recover the original message) in one line, as illustrated below:

#!python

sage: m_unenc2 = C.decode_to_message(r)
sage: m_unenc2 == msg
True

Section 2: A deeper view of the Encoder and Decoder structure

In the previous section, we saw that encoding, decoding and unencoding a vector can be easily done using methods directly on the code object. These methods are actually shortcuts, added for usability, for when the user does not care more specifically about how encoding and decoding takes place. At some point, however, he might need more control. That's what this section will present.

At the core, the three mentioned operations are handled by Encoders and Decoders objects. These objects possess their own methods to operate on words. When we call (as seen above) C.encode(msg) we actually call the method encode on the default encoder of C. Every code object possess a list of encoders and decoders it can use. Let us see how one can explore this:

#!python

sage: C = codes.GeneralizedReedSolomonCode(GF(59).list()[:40], 12, GF(59).list()[1:41])
sage: C.encoders_available()
['EvaluationPolynomial', 'EvaluationVector']
sage: C.decoders_available()
['BerlekampWelch',
 'Gao',
 'ErrorErasure',
 'GuruswamiSudan',
 'KeyEquationSyndrome']

We got a list of the available encoders and decoders for our GRS code. Rather than using the default ones as we did before, we can now ask for specific encoder and decoder:

#!python

sage: Evect = C.encoder("EvaluationVector")
sage: Evect
Evaluation vector-style encoder for the [40, 12, 29] Generalized Reed-Solomon Code over Finite Field of size 59
sage: type(Evect)
<class `sage.coding.grs.EncoderGRSEvaluationVector`>
sage: msg = random_vector(GF(59), C.dimension()) #random
sage: c = Evect.encode(msg)
sage: BW = C.decoder("BerlekampWelch")
sage: BW
Berlekamp-Welch decoder for the [40, 12, 29] Generalized Reed-Solomon Code over Finite Field of size 59

The method C.encoder(encoder_name) is a short-hand for constructing the encoder manually, by calling the constructor for EncoderGRSEvaluationVector yourself. If you don't supply encoder_name, you get the default encoder for the code. C.encoder() also has an important side-effect: it caches the constructed encoder before returning it. This means that each time we will access the same EvaluationVector encoder for C.

#!python

sage: Evect == C.encoder()
True

(since Encoder does not override __eq__, the above tests memory address equality.)

All the above things are similar for Decoders. This reinforces that Encoders and Decoders are rarely constructed but used many times, which allows them to perform expensive precomputation at construction or first use, for the benefit of future use.

Message spaces

The point of an Encoder is to encode messages into the code. These messages are often just vectors over the field of the code and whose length match its dimension. But it could be anything: vectors over other fields, polynomials or something quite different. Therefore, each Encoder has a message_space(). For instance, we saw earlier that our GRS code has two possible encoders; let us investigate the other

#!python

sage: Epoly = C.encoder("EvaluationPolynomial")
sage: Epoly
Evaluation polynomial-style encoder for the [40, 12, 29] Generalized Reed-Solomon Code over Finite Field of size 59
sage: Epoly.message_space()
Univariate Polynomial Ring in x over Finite Field of size 59
sage: msg_p = Epoly.message_space().random_element(degree=C.dimension()-1); msg_p #random
31*x^11 + 49*x^10 + 56*x^9 + 31*x^8 + 36*x^6 + 58*x^5 + 9*x^4 + 17*x^3 + 29*x^2 + 50*x + 46

Epoly reflects that GRS codes are often construed as evaluations of polynomials, and that a natural way to consider their messages are as polynomials of degree at most k-1, where k is the dimension of the code. Notice that the message space of Epoly is all univariate polynomials: message_space is the ambient space of the messages, and sometimes an Encoder demands that the messages are actually picked from a subspace hereof.

The default encoder for a code is always one with vector behaviour, so when we call decode_to_message or unencode methods on the code itself, as illustrated on the first example, this will always return vectors whose length are the dimension of the code.

Generator matrices

Whenever the message space of an Encoder is a vector space and it encodes using a linear map, then the Encoder will possess a generator matrix (note that this notion does not make sense for other types of encoders), which specifies that linear map.

Generator matrices have been placed on Encoder objects since a code has many generator matrices, and each of these will encode messages differently. A code itself also has a generator_matrix() method, but this is again simply a convenience method which forwards the query to the default encoder.

Let us see this in Sage, using the first encoder we constructed:

#!python

sage: Evect.message_space()
Vector space of dimension 12 over Finite Field of size 59
sage: Evect.generator_matrix()
<large matrix>
sage: Evect.generator_matrix() == C.generator_matrix()
True

Decoding to codewords

As we saw previously, a code has a decode_to_codeword and a decode_to_message. Every Decoder also has these two methods, and the methods on a code simply forward the calls to the default decoder.

There are two reasons for having these two methods: convenience and speed. Convenience is clear: having both methods provides a useful shortcut depending on the user's needs. Concerning speed, some decoders naturally decode directly to a codeword, while others directly to a message space. Supporting both methods therefore avoids unnecessary work in encoding and unencoding.

However, decode_to_message implies that there is a message space and an encoding from that space to the code behind the scenes. A Decoder has methods message_space and connected_encoder to inform about this. Let us illustrate that by a long example :

#!python

#Make a new quite small code. We take a random element in C
sage: C = codes.GeneralizedReedSolomonCode(GF(59).list()[1:41], 3, GF(59).list()[1:41])
sage: c = C.random_element()
sage: c # random
(55, 37, 27, 47, 1, 29, 35, 41, 10, 23, 43, 33, 15, 11, 43, 15, 8, 44, 27, 38, 40, 55, 46, 35, 44, 36, 33, 57, 12, 38, 39, 37, 54, 53, 56, 26, 44, 14, 17, 16)

#Create two decoders: Syndrome and Gao
sage: Syn = C.decoder("KeyEquationSyndrome")
sage: Gao = C.decoder("Gao")

#Check their message spaces
sage: Syn.message_space()
Vector space of dimension 3 over Finite Field of size 59
sage: Gao.message_space()
Univariate Polynomial Ring in x over Finite Field of size 59

#and now we unencode
sage: Syn.decode_to_message(c)
(55,9,43)

sage: Gao.decode_to_message(c)
43*x^2 + 9*x + 55

##Section 3: A deeper look at channels ##

In Section 1, we briefly introduced the new Channel objects as a new way to put errors in a word. In this section, we will look deeper at their functionality and introduce a second Channel.

Consider again the ChannelStaticErrorRate from previous. The idea is a channel that places errors in the transmitted vector but within limits. We can describe these limits in two ways:

  • The first one was illustrated in Section 1 and consists in passing an integer, as shown below :

    sage: C = codes.GeneralizedReedSolomonCode(GF(59).list()[:40], 12)
    sage: t = 14
    sage: Chan = channels.StaticErrorRateChannel(C.ambient_space(), t)
    sage: Chan
    Static error rate channel creating 14 error(s)
    
  • We can also pass a tuple of two integers, the first smaller than the second. Then each time a word is transmitted, a random number of errors in the range of these two integers will be added:

    sage: t = (1, 14)
    sage: Chan = channels.StaticErrorRateChannel(C.ambient_space(), t)
    sage: Chan
    Static error rate channel creating between 1 and 14 errors
    

We already know that a channel has a transmit method which will perform transmission over the channel; in this case it will return the transmitted word with some errors in it. The transmit method will always check if the provided word belongs to the input space of the channel. In a case we are 100% sure that our word is in the input space, we might want to avoid this check, if one is simulating millions of transmissions. For this usage there is transmit_unsafe which does the same as transmit but without checking the input.

#!python

sage: c = C.random_element()
sage: c in C
True
sage: c_trans = Chan.transmit_unsafe(c)

A channel for errors and erasures

Let us introduce a new Channel object which adds errors and erasures. When it transmits a word, it both adds some errors as well as erase some positions.

#!python

sage: Chan = channels.ErrorErasureChannel(C.ambient_space(), 3, 4)
sage: Chan
Error-and-erasure channel creating 3 error(s) and 4 erasure(s)

The first parameter is the input space of the channel. The next two are (respectively) the number of errors and the number or erasures. Each of these can be tuples too, just as with StaticErrorRateChannel.

As opposed to the StaticErrorRateChannel, the output of the ErrorErasureChannel is not the same as its input space, i.e. th ambient space of C. Rather, it will return two vectors: the first is the transmitted word with the errors added and erased positions set to 0. The second one is the erasure vector over which has 1 on the erased positions and 0 elsewhere. This is reflected in the channel's output_space():

#!python

sage: Chan.output_space()
Cartesian product of Vector space of dimension 40 over Finite Field of size 59, Vector space of dimension 40 over Finite Field of size 2
sage: c = C.random_element()
sage: Chan.transmit(c) # random
((27, 50, 44, 43, 22, 15, 56, 2, 5, 40, 0, 0, 28, 30, 56, 53, 25, 15, 30, 13, 0, 52, 56, 38, 31, 10, 9, 3, 26, 53, 0, 19, 26, 55, 22, 0, 24, 9, 9, 58),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0))

It is guaranteed that errors and erasures will never overlap, so when you ask for e errors and t erasures, you will always receive a vector with e errors and t erased positions.

##Section 4: Going further: a toy McEliece cryptosystem ##

We illustrate how our framework can be used to build a generic draft implementation of the McEliece cryptosystem. We will then instantiate this with a Generalized Reed-Solomon code (which is, of course, broken by Sidelnikov and Chestakov).

For those unfamiliar with this cryptosystem, first a quick overview. This is an asymmetric encryption algorithm, which works in three steps.

  • Key generation. Alice chooses a code which admits a decoding algorithm, and computes a generator matrix G for this code. She then performs some scrambling operations on G which are supposedly difficult to reverse to obtain G_pub, a different generator matrix for the code. She keeps the specification of the code and G as her private key, while the public key is G_pub as well as the maximum number of errors tau that her decoding algorithm can correct.

  • Encryption. Bob wishes to send a message to Alice. He uses the matrix G_pub from her public key to encode his message, and adds tau errors to it afterwards. This he sends to Alice.

  • Decryption. Alice uses her algorithm to decode the error-added word back into the message space to obtain Bob's message.

This can be easily implemented generically in our new structure.

Let us first write the key generation method from a generic code C.

#!python

def mc_eliece_key_gen(C):
    G = C.generator_matrix()
    S = random_matrix(C.base_field(), C.dimension(), C.dimension())
    while S.is_singular():
        S = random_matrix(C.base_field(), C.dimension(), C.dimension())
    sigma = range(0, C.length())
    shuffle(sigma)
    P = matrix(C.base_field(), C.length(), C.length())
    for i in sigma:
        P[i,sigma[i]] = 1
    G_pub = S*G*P
    t = C.decoder().decoding_radius()
    pubkey = (G_pub, t)
    privkey = (S.inverse(), G, P.inverse(), C)
    return pubkey, privkey

then we write the encryption function:

#!python
def mc_eliece_encrypt(m, pubkey):
    G_pub, t = pubkey
    c = vector(m) * G_pub
    ambient = VectorSpace(G_pub.base_ring(), G_pub.ncols())
    channel = channels.StaticErrorRateChannel(ambient, t)
    y = channel.transmit(c)
    return y

Finally, the decryption method :

#!python

def mc_eliece_decrypt(y, privkey):
    S, G, P, C = privkey
    y = y * P
    m = C.decode_to_message(y)
    m = m * S
    return m

Once again, the decoding structure brings versatility to this method. We can use the crypto system with any code having a decoding algorithm. Below, a GRS code is used:

#!python
sage: F = GF(59)
sage: n, k =  40, 12
sage: C = codes.GeneralizedReedSolomonCode(F.list()[:n], k, F.list()[1:n+1])
sage: pub, priv = mc_eliece_key_gen(C)
sage: plaintext = vector(GF(59), (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37))
sage: ciphertext = mc_eliece_encrypt(plaintext, pub)
sage: deciphered = mc_eliece_decrypt(ciphertext, priv)
sage: plaintext == deciphered
True

Note that, of course, the plain decoder from the GeneralizedReedSolomonCode fails

#!python
sage: C.decode_to_message(ciphertext)
DecodingFailure                           Traceback (most recent call last)
  ...

sage: C.decode_to_code(ciphertext)
DecodingFailure                           Traceback (most recent call last)
  ...

Updated