[AVX512] SIMD4: review implementation - remove blend operations

The SIMD4 implementation uses blend operations to emulate masked operations on instruction sets without AVX512VL. This most likely introduces overhead. The same result can be achieved by casting the 128b vectors to 512b vectors and using the masked operations supported by AVX512F.
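
A minimal sketch of the cast-based approach, assuming a 4 x float vector stored as `__m128` and a bit mask held in an integer. The names `masked_add4_avx512f` and `demo_masked_add4` are hypothetical and are not taken from the SIMD4 sources. The `__attribute__((target("avx512f")))` annotation is GCC/Clang-specific and is used here only so the snippet compiles without `-mavx512f`.

```cpp
#include <immintrin.h>
#include <cassert>

// Hypothetical helper (not from the SIMD4 sources): masked add of four floats
// using only AVX512F. Lanes whose mask bit is 0 keep the value from `src`.
__attribute__((target("avx512f")))
inline __m128 masked_add4_avx512f(__m128 src, unsigned mask, __m128 a, __m128 b)
{
    // The casts are compile-time reinterpretations and emit no instructions;
    // the upper 384 bits of each widened vector are undefined.
    __m512 src512 = _mm512_castps128_ps512(src);
    __m512 a512   = _mm512_castps128_ps512(a);
    __m512 b512   = _mm512_castps128_ps512(b);

    // Keep only the four low mask bits, so the undefined upper lanes are
    // never added. They are dropped by the narrowing cast below anyway.
    __mmask16 k = static_cast<__mmask16>(mask & 0xFu);

    // Single masked instruction: no separate add followed by a blend.
    __m512 r = _mm512_mask_add_ps(src512, k, a512, b512);
    return _mm512_castps512_ps128(r);
}

// Small self-check. The caller must first confirm that the CPU supports AVX512F.
__attribute__((target("avx512f")))
inline void demo_masked_add4()
{
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float out[4];

    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);

    // Mask 0b0101: update lanes 0 and 2, keep lanes 1 and 3 from `va`.
    _mm_storeu_ps(out, masked_add4_avx512f(va, 0x5u, va, vb));

    assert(out[0] == 11.0f);
    assert(out[1] == 2.0f);
    assert(out[2] == 33.0f);
    assert(out[3] == 4.0f);
}
```

On GCC or Clang, `demo_masked_add4()` should only be called after `__builtin_cpu_supports("avx512f")` returns non-zero. Whether this pattern is actually faster than the existing blend-based code would still need to be measured on the target hardware.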

The task is to review the SIMD4 code and make sure that all blend operations are removed.