HTTPS SSH

FTensor

FTensor is a set of C++ classes that allow a great deal of abstraction when dealing with tensors, yet delivers uncompromising efficiency. It uses template expressions to provide expressiveness and speed.

FTensor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. A copy of the license should be included in the file LICENSE.

FTensor uses template expressions to optimize code. I owe a huge debt to Todd Veldhuizen who originally used template expressions in developing the Blitz library (see www.oonumerics.org). FTensor uses many of the ideas from that library.

FTensor's biggest claim to fame is that it handles implicit summation. Thus you can write

A(i,j) = B(i,k)*C(k,j)

instead of having to write

A = sum(B(i,k)*C(k,j),k)

Also, the result is strongly typed by the indices, so you can't write

A(i,k) = B(i,k)*C(k,j)

or even

A = B(i,k)*C(k,j)

It has Tensor0, Tensor1, Tensor2, Tensor2_symmetric, Dg (Rank 3 symmetric on the first two indices), Tensor3_antisymmetric (antisymmetric on the last two indices), Christof (Rank 3 symmetric on the last two indices), Ddg (Rank 4 symmetric on the first two, and last two, indices), Riemann (Rank 4 antisymmetric on the first two, and last two, indices, and symmetric under cyclic permutation of the last three indices) and Tensor4. This eclectic choice of tensors happened because the library was originally developed for a General Relativity code.

The dimension of the tensors are determined by a template parameter. You can construct a Tensor like this.

Tensor1<double,5> T1 {0,1,2,3,4};

The order of the elements in the constructors follow row-major convention. For example in a Tensor3, data elements are as follow:

Tensor3<int,N,M,P> {d000, d001, ..., d00P,
                    d010, d011, ..., d0MP,
                    d100, d101, ..., dNMP}

If you want to turn on bounds checking (so that, for example, T1(5) will give you a run time error) then compile everything with FTENSOR_DEBUG defined.

It can handle pointers. So you could write

double a0[10000], a1[10000], a2[10000], b0[10000], b1[10000], b2[10000],
 c0[10000], c1[10000], c2[10000];

Tensor1<double*,3> A(a0,a1,a2), B(b0,b1,b2), C(c0,c1,c2);

Index<'i'> i;
for(int a=0;a<10000;a++)
{
  A(i)=B(i)+C(i);
  ++A;
  ++B;
  ++C;
}

If you are familiar with Blitz, it also uses template expressions to optimize code. However, Blitz optimizes one expression at a time. So, for example, if you want to invert a 3x3 matrix, you can write it like

det=a(0,0)*a(1,1)*a(2,2) 
    + a(1,0)*a(2,1)*a(0,2) 
    + a(2,0)*a(0,1)*a(1,2)
    - a(0,0)*a(2,1)*a(1,2)
    - a(1,0)*a(0,1)*a(2,2)
    - a(2,0)*a(1,1)*a(0,2);
inverse(0,0)= (a(1,1)*a(2,2) - a(1,2)*a(1,2))/det;
inverse(0,1)= (a(0,2)*a(1,2) - a(0,1)*a(2,2))/det;
inverse(0,2)= (a(0,1)*a(1,2) - a(0,2)*a(1,1))/det;
inverse(1,1)= (a(0,0)*a(2,2) - a(0,2)*a(0,2))/det;
inverse(1,2)= (a(0,2)*a(0,1) - a(0,0)*a(1,2))/det;
inverse(2,2)= (a(1,1)*a(0,0) - a(1,0)*a(1,0))/det;

However, det is just going to be thrown away at the end. We don't need to store it for all (10,000 or whatever) points. We just need to compute it for one point, use it in six expressions, and forget it. The Blitz method makes you ship the memory of det in and out of the cache 6 times. A better way to do this is to put the whole inversion into one loop. I've seen a factor of 4 improvement doing it this way. The disadvantages, which are all-too-real, are that you have to manually start the loop, and you have to remember to increment the variables. In the case of the inversion, it ends up looking like

double det;
for(int i=0;i<10000;i++
{
    det=a(0,0)*a(1,1)*a(2,2) 
    + a(1,0)*a(2,1)*a(0,2) 
    + a(2,0)*a(0,1)*a(1,2)
    - a(0,0)*a(2,1)*a(1,2)
    - a(1,0)*a(0,1)*a(2,2)
    - a(2,0)*a(1,1)*a(0,2);
    inverse(0,0)= (a(1,1)*a(2,2) - a(1,2)*a(1,2))/det;
    inverse(0,1)= (a(0,2)*a(1,2) - a(0,1)*a(2,2))/det;
    inverse(0,2)= (a(0,1)*a(1,2) - a(0,2)*a(1,1))/det;
    inverse(1,1)= (a(0,0)*a(2,2) - a(0,2)*a(0,2))/det;
    inverse(1,2)= (a(0,2)*a(0,1) - a(0,0)*a(1,2))/det;
    inverse(2,2)= (a(1,1)*a(0,0) - a(1,0)*a(1,0))/det;
    ++a;
    ++inverse;
}

Forgetting to put in the ++ operators could result in subtle bugs. You could also have problems if you put in more than one loop:

for(int i=0;i<10000;i++)
{
    a(i,j)=...
    ++a;
}

for(int i=0;i<10000;i++)
{
    a(i,j)+=...
    ++a;
}

This will end up writing off of the end of a. Furthermore, I use the restrict keyword, so you might get some weird problems if you try to alias things. You might want to #define the restrict away. I found that it actually decreased performance for extremely complicated expressions.

Basically, you're giving up some expressive power.

It can handle quite complex expressions. As a real life example

K_new(i,j)=Lapse*(R(i,j) + Trace_K*K(i,j) - (2*K(i,k)^K_mix(k,j)) 
            - 0.5*matter_ADM*g(i,j) - S_ADM(i,j))
            + Shift_up(k)*dK(i,j,k)
            + (K(i,k)*dShift_up(k,j) || K(j,k)*dShift_up(k,i))
            - ddLapse(i,j) + (dLapse(k)*christof(k,i,j));

K_new is symmetric, the ^ operator means contract to make a Tensor2_symmetric, and the || means add to make a Tensor2_symmetric (it is not a symmetrizer, so it doesn't divide by 2). I had to use these operators (instead of * and +) to keep the compilers from making a Tensor2 instead. You can't assign a Tensor2 to a Tensor2_symmetric, so you have to explicitly request the symmetrized result.

Some compilers were able to optimize the entire expression. The page at

http://www.wlandry.net/Projects/FTensor

has a more complete discussion of compilers and performance.

There is a directory for tests with a README that should explain how to use them.

If you have any questions, feedback, bug reports, feature requests, etc., feel free to send me an email.

Enjoy, Walter Landry wlandry@caltech.edu