Boolean Matrices

Issue #413 new
GK Palem created an issue

Could not find much information specific to the Boolean Matrices, hence posting this to know more about how to optimize the performance specifically for Boolean Matrices.

By Boolean Matrix, I refer to matrices that have all of their elements either zero (0) or one (1).

Would like to know if Blaze already considers Boolean matrices as special case, to get more higher performance than the regular int, float etc. matrix counterparts. For example, considering a Boolean Matrix M,

  • M x M to yield another Boolean Matrix (as matrix Multiplication result, as well as the element-wise multiplication variants)
  • sum() column-wise, row-wise to yield Boolean Vectors
  • clamping the values of a generic int matrix to make it a Boolean Matrix etc..

operations can take advantage of bit-packing, boolean AND, boolean OR etc. (instead of actual multiplication, or addition or otherwise)

Wondering if this is already done when we use `DynamicMatrix<bool>` types. If not is there a way we could get the maximum performance out of this knowledge that all elements are only 1 or 0 and hence their operations could be vastly optimized using the boolean arithmetic?

Thank you in advance.

Comments (2)

  1. Klaus Iglberger

    Hi GK!

    Blaze does not provide any special treatment for boolean matrices other than bitwise operations (i.e. &&, ||, etc.). Therefore what you suggest could be a reasonable addition. Thanks, we’ll take it into account for one of the next releases.

    Best regards,

    Klaus!

  2. GK Palem reporter

    Thank you Klaus.

    Below are some of the frequently used operators in this Boolean context, specifically on the Vector side.

    Considering a Boolean Vector V (that is either derived from rowSum, columnSum reduced from a Matrix, or created directly):

    • which.min(V == TRUE), which.max(V == FALSE) get the index of the lowest bit / highest bit that is True / False;
    • which(V == TRUE) get the list of indices for all the bits that are set to True/False; The return value is a vector (of non-boolean type holding the indices of the elements that satisfy the condition).
    • Iterate over the elements whose value is True/False.
    • Turn on all bits; Turn off all bits.
    • bitsum count of all bits that are set to True/False.

    I believe currently some of these are possible, but would be great if these methods are optimized specifically for Boolean vectors/matrices. Just writing it down here for the sake of completeness.

    Some time back I have gathered one small utility for bit vectors, that goes something like below.

        typedef unsigned long long bitunit;
    
        // Returns the 0-based position of the highest bit set. For example, HiBitPosition<64>==6 and HiBitPosition<63>==5;
        template<unsigned int x>
        struct HiBitPosition
        {
            static const unsigned int Pos = HiBitPosition<(x >> 1)>::Pos + 1;
        };
        template<>
        struct HiBitPosition<1>
        {
            static const unsigned int Pos = 0;
        };
    
        // Creates a bitunit with all the bits in it set to 1
        template<unsigned int x>
        struct TurnOnAllBits
        {
            static const bitunit Value = (TurnOnAllBits<(x-1)>::Value | ((bitunit)1 << (x-1)));
        };
        template<>
        struct TurnOnAllBits<1>
        {
            static const bitunit Value = (bitunit)1;
        };
    
        template<typename T> inline size_t bitsum(T bu) { return _bitsum<sizeof(T)>((const BYTE*)&bu); }
    
        /////////// bit count utilities /////////////////////
    
        inline size_t bitsum(BYTE b) { static const char *const _Bitsperbyte =
                                                    "\0\1\1\2\1\2\2\3\1\2\2\3\2\3\3\4"
                                                    "\1\2\2\3\2\3\3\4\2\3\3\4\3\4\4\5"
                                                    "\1\2\2\3\2\3\3\4\2\3\3\4\3\4\4\5"
                                                    "\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
                                                    "\1\2\2\3\2\3\3\4\2\3\3\4\3\4\4\5"
                                                    "\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
                                                    "\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
                                                    "\3\4\4\5\4\5\5\6\4\5\5\6\5\6\6\7"
                                                    "\1\2\2\3\2\3\3\4\2\3\3\4\3\4\4\5"
                                                    "\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
                                                    "\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
                                                    "\3\4\4\5\4\5\5\6\4\5\5\6\5\6\6\7"
                                                    "\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
                                                    "\3\4\4\5\4\5\5\6\4\5\5\6\5\6\6\7"
                                                    "\3\4\4\5\4\5\5\6\4\5\5\6\5\6\6\7"
                                                    "\4\5\5\6\5\6\6\7\5\6\6\7\6\7\7\x8";  return _Bitsperbyte[b]; }
        template<>
        inline size_t _bitsum<1>(const BYTE* pb) { return bitsum(pb[0]); }
        template<>
        inline size_t _bitsum<2>(const BYTE* pb) { return bitsum(pb[0]) + bitsum(pb[1]); }
        template<>
        inline size_t _bitsum<4>(const BYTE* pb) { return bitsum(pb[0]) + bitsum(pb[1]) + bitsum(pb[2]) + bitsum(pb[3]); }
        template<>
        inline size_t _bitsum<8>(const BYTE* pb) { return bitsum(pb[0]) + bitsum(pb[1]) + bitsum(pb[2]) + bitsum(pb[3]) + bitsum(pb[4]) + bitsum(pb[5]) + bitsum(pb[6]) + bitsum(pb[7]); }
        template<>
        inline size_t _bitsum<16>(const BYTE* pb){ return bitsum(pb[0]) + bitsum(pb[1]) + bitsum(pb[2]) + bitsum(pb[3]) + bitsum(pb[4]) + bitsum(pb[5]) + bitsum(pb[6]) + bitsum(pb[7]) + bitsum(pb[8]) + bitsum(pb[9]) + bitsum(pb[10]) + bitsum(pb[11]) + bitsum(pb[12]) + bitsum(pb[13]) + bitsum(pb[14]) + bitsum(pb[15]); }
    
        template<typename T>
        inline size_t bitsum(const std::vector<T>& advec)
        {
            return reduce<Red::toBitCount, Tr::Null> (advec, (size_t)0);
        }
    

    But it does not use vectorization, parallelization etc., and I did not really know how to do it for matrices.

    Looking forward to a great performance boosted boolean vector, matrix implementation from Blaze.

    Thank you for your great work till now and sharing it publicly. It is very helpful.

  3. Log in to comment