- removed comment
thorn Vectors does not provide a sum() reduction
It would be very useful for some applications (eg interpolation) if thorn Vectors provided a sum() function similar to the sum() member of vecmathlib
real_t sum() const {
// return (*this)[0] + (*this)[1] + (*this)[2] + (*this)[3];
// __m256d x = _mm256_hadd_pd(v, v);
// __m128d xlo = _mm256_extractf128_pd(x, 0);
// __m128d xhi = _mm256_extractf128_pd(x, 1);
realvec_t x = *this;
x = _mm256_hadd_pd(x.v, x.v);
return x[0] + x[2];
}
Most likely one can just copy and paste the code from vecmathlib. vecmathlib's license is not LGPL but permissive enough for inclusion and we can also ask Erik if one can use it and change the license of the affected code lines to LGPL in Cactus, the license is here:
Copyright (c) 2012, 2013 Erik Schnetter <eschnetter@gmail.com>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Keyword: Vectors
Comments (6)
-
-
reporter - removed comment
Thank you. I was looking at CarpetInterp2's lagrange interpolation. See https://bitbucket.org/eschnett/carpet/branch/rhaas/fasterp though this has not yet been speed tested compared to the previous implementation.
The sum() at the end is done via a loop over the elt function. I did a quick check to see what an omp simd reduction(+: var) would do and found that at least on my laptop it does a similar loop and no horizontal sum at all.
-
- removed comment
I would implement
sum
via a sequence of shuffle and vector-add instructions, where the last add can be a scalar add:// assuming vector size is 4, input is s4 t4 = __builtin_shuffle(s4, {1,0,3,2}); s2 = s4 + t4; t2 = __builtin_shuffle(s4, {2,3,0,1}); s1 = s2 + t2;
-
reporter - removed comment
You mean in vectors.h or in CarpetInterp2? In the latter I am limited to either functions from vectors.h or standards compliant code and
__builtin_shuffle
sounds like it will be compiler dependent. I will give things a try though. -
reporter - changed status to resolved
- removed comment
Provided (it seems) in git hash f9cbab7bd3fd534fec2d254665ce6228f6f3ef71 "Vectors: Provide vectorized reduction operators" of cactusutils.
-
reporter - edited description
- changed status to closed
- Log in to comment
A
sum
SIMD reduction is quite expensive, much more so (ten times?) than a vectorizedadd
. We can certainly copy the code over (I wrote it, so I can re-licence it), or we can switch Cactus to Vecmathlib in the long term.There might be another way to speed up an interpolation by preferring certain directions. I don't know whether this is already implemented -- did you think about a particular thorn?