Overview

Bitsliced Trivium in CorePy

Paul Crowley, paul@lshift.net, 2008

Copyright Paul Crowley, LShift Ltd, 2008, released under MIT licence

http://hg.opensource.lshift.net/trivium-corepy/

This implementation of Trivium computes 128 independent streams of
output simultaneously at under 1 cycle/byte - over four times faster
than standard implementations - using the SSE2 instruction set.  This
is not particularly useful for normal use, but is very handy for
cryptanalytic applications, and in particular for investigating the
"cube attack" on Trivium.

Rather than using the traditional combination of C and assembler, this
implementation is based on the tool CorePy, which makes it easy to
create and call sequences of machine instructions from Python code.  I
commend this tool to crypto implementers: it made implementation much
faster and I believe resulted in clearer code.  You will need to fetch
and build CorePy-1.0 to use this code, and change the path in the
"sys.path.append" line in source files to refer to it.  The URL is at
the end of this README.

Note that the overhead of executing assembly or accessing memory is
very high - run "accesstest" to measure it on your system.  In order
to make the speed gained in assembly useful, I have moved much of the
work of summing a cube into an assembler inner loop, resulting in a
tremendous speedup.  On my 2.4 GHz Intel Core 2 system, I can sum a
30-dimensional cube using output bit 771 in around 43 seconds.

Please email me if you find yourself looking at this code, I am very
keen to hear from you.

The following files can be found here:

cubeattack-test
    Tests a "maxterm" listed in the tables at the end of the cube
    attack paper.

maxterms.txt
    List of maxterms in the Dinur and Shamir paper on the cube attack.

2008-12-23-maxterms.txt
    Maxterms emailed to me by Itai Dinur to replace the problematic
    ones in the paper.
    
convert-dinurshamir-maxterms
    Utility to convert maxterms copy-pasted from the PDF of the paper
    to the maxterms.txt format.

find-key-bits
    Find correct key polynomial for a maxterm
    
linearity-test
    Crudely test a candidate maxterm for linearity

paralleltrivium-test
    A short test to exercise our SSE2 implementation, and compare the
    results to the simple implementation.

benchmark
    Tests the speed of this implementation and reports in clocks/byte.
    Note that the clock speed is hardwired and may need to change on
    your system.
    
printcode
    Prints an assembly representation of our implementation.

accesstimer
    Times access to various CorePy primitives.

REFERENCES:

CorePy:
    http://www.corepy.org/

Trivium: 
    http://en.wikipedia.org/wiki/Trivium_(cipher)
    http://www.ecrypt.eu.org/stream/trivium.html

The "cube attack":
    http://en.wikipedia.org/wiki/Cube_attack
    http://eprint.iacr.org/2008/385
    
This code:
    http://www.lshift.net/blog/2008/12/09/trivium-sse2-corepy-and-the-cube-attack