A lot of test failures on Arch Linux (64-bit)

Issue #2 resolved
Ruben Van Boxem created an issue

I built clamp and tried running the tests. The disappointing results are attached.

I have pretty much every possible OpenCL backend installed (AMD CPU, NVIDIA GPU, Intel CPU, and pocl CPU); the clinfo output is also attached. There is only one GPU device though: NVIDIA OpenCL on a dedicated GPU. The CPU is an Intel.

I just followed the readme when building, passed no special options when configuring, and I have the OpenCL 1.2 headers installed in /usr/include. Suffice it to say, the GPU stuff works in other applications (I run CUDA simulations on this PC).

What might be going wrong and is there anything I need to do to (help) fix it? I can see it selecting OpenCL 1.2 and SPIR, but I have no clue which device the tests are actually running on.

Output of git branch -vv: AMP-439_on_master a5641ab [origin/AMP-439_on_master] [AMP-439] Fix rewriter build

Comments (26)

  1. Jack Chung

    In C++AMP, the first CL platform and its first CL device (GPU preferred, then CPU) are currently used. From your clinfo.txt it seems POCL is the first CL platform. I haven't spent any time on POCL yet. Would it be possible to change the order so that the NVIDIA or AMD platform is used?
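
    The selection rule described above can be sketched in shell. This is only an illustration, not the runtime's actual code: the function name `pick_device` and its input format (one device per line as "PLATFORM_INDEX TYPE NAME", in enumeration order) are invented for this example.

```shell
# Hypothetical model of the rule: only the first platform is considered,
# and within it a GPU is preferred over a CPU.
pick_device() {
  awk '
    NR == 1      { first = $1 }          # remember the first platform index
    $1 != first  { next }                # ignore devices on later platforms
    $2 == "GPU" && gpu == "" { gpu = $0 }
    $2 == "CPU" && cpu == "" { cpu = $0 }
    END {
      if (gpu != "")      print gpu
      else if (cpu != "") print cpu
      else                exit 1         # no usable device found
    }'
}

# With pocl enumerated first, the NVIDIA GPU on platform 1 is never reached.
printf '%s\n' '0 CPU pocl' '1 GPU NVIDIA' | pick_device   # prints "0 CPU pocl"
```

    Under this rule the order in which platforms are enumerated decides which device runs the tests, which would explain why a CPU platform can shadow the GPU.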

  2. Ruben Van Boxem reporter

    I uninstalled pocl, and now the AMD APP OpenCL 2.0 device is the first one in the clinfo output.

    There are slightly more test failures, but most of them seem to originate in some objcopy failure which may indicate some earlier problem (missing output section). Attached is the output of the first failing test's command, with "-v -save-temps" appended to the clang invocation:

    /home/ruben/Development/cppamp-3.5/build/compiler/bin/clang++ -v -save-temps -I/home/ruben/Development/cppamp-3.5/src/utils -I/usr/include/CL/../ -DGTEST_HAS_TR1_TUPLE=0 -stdlib=libc++ -std=c++amp -I/home/ruben/Development/cppamp-3.5/src/include -I/home/ruben/Development/cppamp-3.5/src/libc++/libcxx/include -std=c++amp -std=c++amp -L/home/ruben/Development/cppamp-3.5/build/build/Release/lib -L/home/ruben/Development/cppamp-3.5/build/libc++/libcxx/lib -L/home/ruben/Development/cppamp-3.5/build/libc++/libcxxrt/lib -Wl,--rpath=/home/ruben/Development/cppamp-3.5/build/build/Release/lib:/home/ruben/Development/cppamp-3.5/build/libc++/libcxx/lib:/home/ruben/Development/cppamp-3.5/build/libc++/libcxxrt/lib -lc++ -lcxxrt -ldl -lpthread -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive  -lpthread  /home/ruben/Development/cppamp-3.5/src/tests/Unit/AmpMath/amp_math_acos.cpp -o /home/ruben/Development/cppamp-3.5/build/tests/Unit/AmpMath/Output/amp_math_acos.cpp.tmp.out && /home/ruben/Development/cppamp-3.5/build/tests/Unit/AmpMath/Output/amp_math_acos.cpp.tmp.out
    clang version 3.5.0 (tags/RELEASE_350/final)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    Found candidate GCC installation: /usr/lib/gcc/x86_64-unknown-linux-gnu/4.4.7
    Found candidate GCC installation: /usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.4
    Found candidate GCC installation: /usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.2
    Found candidate GCC installation: /usr/lib64/gcc/x86_64-unknown-linux-gnu/4.4.7
    Found candidate GCC installation: /usr/lib64/gcc/x86_64-unknown-linux-gnu/4.7.4
    Found candidate GCC installation: /usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2
    Selected GCC installation: /usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2
    Candidate multilib: .;@m64
    Candidate multilib: 32;@m32
    Selected multilib: .;@m64
     "/home/ruben/Development/cppamp-3.5/build/compiler/bin/clang-3.5" -cc1 -triple x86_64-unknown-linux-gnu -E -disable-free -disable-llvm-verifier -main-file-name amp_math_acos.cpp -mrelocation-model static -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -v -dwarf-column-info -resource-dir /home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib/clang/3.5.0 -D GTEST_HAS_TR1_TUPLE=0 -I /home/ruben/Development/cppamp-3.5/src/utils -I /usr/include/CL/../ -I /home/ruben/Development/cppamp-3.5/src/include -I /home/ruben/Development/cppamp-3.5/src/libc++/libcxx/include -I/opt/intel/composerxe/ipp/include -I/opt/intel/composerxe/mkl/include -I/opt/intel/composerxe/tbb/include -internal-isystem /usr/include/c++/v1 -internal-isystem /usr/local/include -internal-isystem /home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib/clang/3.5.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -std=c++amp -fdeprecated-macro -fdebug-compilation-dir /home/ruben/Development/cppamp-3.5/build -ferror-limit 19 -fmessage-length 131 -mstackrealign -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o amp_math_acos.ii -x c++ /home/ruben/Development/cppamp-3.5/src/tests/Unit/AmpMath/amp_math_acos.cpp
    clang -cc1 version 3.5.0 based upon LLVM 3.5.0svn default target x86_64-unknown-linux-gnu
    ignoring nonexistent directory "/include"
    ignoring duplicate directory "/usr/include/CL/.."
      as it is a non-system directory that duplicates a system directory
    #include "..." search starts here:
    #include <...> search starts here:
     /home/ruben/Development/cppamp-3.5/src/utils
     /home/ruben/Development/cppamp-3.5/src/include
     /home/ruben/Development/cppamp-3.5/src/libc++/libcxx/include
     /opt/intel/composerxe/ipp/include
     /opt/intel/composerxe/mkl/include
     /opt/intel/composerxe/tbb/include
     /usr/include/c++/v1
     /usr/local/include
     /home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib/clang/3.5.0/include
     /usr/include/CL/..
    End of search list.
     "/home/ruben/Development/cppamp-3.5/build/compiler/bin/clang-3.5" -cc1 -triple x86_64-unknown-linux-gnu -S -disable-free -disable-llvm-verifier -main-file-name amp_math_acos.cpp -mrelocation-model static -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -v -dwarf-column-info -resource-dir /home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib/clang/3.5.0 -std=c++amp -fdeprecated-macro -fdebug-compilation-dir /home/ruben/Development/cppamp-3.5/build -ferror-limit 19 -fmessage-length 131 -mstackrealign -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o amp_math_acos.s -x c++-cpp-output amp_math_acos.ii
    clang -cc1 version 3.5.0 based upon LLVM 3.5.0svn default target x86_64-unknown-linux-gnu
    #include "..." search starts here:
    End of search list.
     "/home/ruben/Development/cppamp-3.5/build/compiler/bin/clang-3.5" -cc1as -triple x86_64-unknown-linux-gnu -filetype obj -main-file-name amp_math_acos.cpp -target-cpu x86-64 -mrelax-all -o amp_math_acos.o amp_math_acos.s
     "/home/ruben/Development/cppamp-3.5/build/compiler/bin/clang-3.5" -cc1 -triple x86_64-unknown-linux-gnu -E -disable-free -disable-llvm-verifier -main-file-name amp_math_acos.cpp -mrelocation-model static -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -v -dwarf-column-info -resource-dir /home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib/clang/3.5.0 -D GTEST_HAS_TR1_TUPLE=0 -I /home/ruben/Development/cppamp-3.5/src/utils -I /usr/include/CL/../ -I /home/ruben/Development/cppamp-3.5/src/include -I /home/ruben/Development/cppamp-3.5/src/libc++/libcxx/include -I/opt/intel/composerxe/ipp/include -I/opt/intel/composerxe/mkl/include -I/opt/intel/composerxe/tbb/include -internal-isystem /usr/include/c++/v1 -internal-isystem /usr/local/include -internal-isystem /home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib/clang/3.5.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -std=c++amp -fdeprecated-macro -fdebug-compilation-dir /home/ruben/Development/cppamp-3.5/build -ferror-limit 19 -fmessage-length 131 -mstackrealign -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o amp_math_acos.ii -x c++amp-kernel /home/ruben/Development/cppamp-3.5/src/tests/Unit/AmpMath/amp_math_acos.cpp
    clang -cc1 version 3.5.0 based upon LLVM 3.5.0svn default target x86_64-unknown-linux-gnu
    ignoring nonexistent directory "/include"
    ignoring duplicate directory "/usr/include/CL/.."
      as it is a non-system directory that duplicates a system directory
    #include "..." search starts here:
    #include <...> search starts here:
     /home/ruben/Development/cppamp-3.5/src/utils
     /home/ruben/Development/cppamp-3.5/src/include
     /home/ruben/Development/cppamp-3.5/src/libc++/libcxx/include
     /opt/intel/composerxe/ipp/include
     /opt/intel/composerxe/mkl/include
     /opt/intel/composerxe/tbb/include
     /usr/include/c++/v1
     /usr/local/include
     /home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib/clang/3.5.0/include
     /usr/include/CL/..
    End of search list.
     "/home/ruben/Development/cppamp-3.5/build/compiler/bin/clang-3.5" -cc1 -D __GPU__=1 -famp-is-device -fno-builtin -fno-common -O2 -triple i386-unknown-linux-gnu -S -disable-free -disable-llvm-verifier -main-file-name amp_math_acos.cpp -mrelocation-model static -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -v -resource-dir /home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib/clang/3.5.0 -std=c++amp -fdeprecated-macro -fdebug-compilation-dir /home/ruben/Development/cppamp-3.5/build -ferror-limit 19 -fmessage-length 131 -mstackrealign -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o amp_math_acos.s -x c++amp-kernel-cpp-output amp_math_acos.ii -emit-llvm-bc
    error: invalid value 'c++amp-kernel-cpp-output' in '-x c++amp-kernel-cpp-output'
    
  3. Jack Chung

    Hi rubenvb, thanks for your updates. Could you now help with the following?

    • Remove "-save-temps". I don't think my hack in the Clang driver supports that option yet, so it will likely break.
    • Could you confirm whether /usr/include/CL contains the OpenCL headers from AMD? Below is the build command for the same test case on my box; notice that the OpenCL headers point to /opt/AMDAPP/include/CL.
    /home/whchung/cppamp35/build/compiler/bin/clang++ -v -I/home/whchung/cppamp35/src/utils -I/opt/AMDAPP/include/CL/../ -DGTEST_HAS_TR1_TUPLE=0 -stdlib=libc++ -std=c++amp -I/home/whchung/cppamp35/src/include -I/home/whchung/cppamp35/src/libc++/libcxx/include -std=c++amp -std=c++amp -L/home/whchung/cppamp35/build/build/Release/lib -L/home/whchung/cppamp35/build/libc++/libcxx/lib -L/home/whchung/cppamp35/build/libc++/libcxxrt/lib -Wl,--rpath=/home/whchung/cppamp35/build/build/Release/lib:/home/whchung/cppamp35/build/libc++/libcxx/lib:/home/whchung/cppamp35/build/libc++/libcxxrt/lib -lc++ -lcxxrt -ldl -lpthread -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive  -lpthread  /home/whchung/cppamp35/src/tests/Unit/AmpMath/amp_math_acos.cpp -o /home/whchung/cppamp35/build/tests/Unit/AmpMath/Output/amp_math_acos.cpp.tmp.out && /home/whchung/cppamp35/build/tests/Unit/AmpMath/Output/amp_math_acos.cpp.tmp.out
    

    Also, could you help provide the version of objcopy you are using? I'm using "GNU objcopy (GNU Binutils for Ubuntu) 2.24".

  4. Ruben Van Boxem reporter

    OK, figured -save-temps might not work as intended with all the extra magic going on. The rest of the output (from where the error is in my previous message) is this:

    clang -cc1 version 3.5.0 based upon LLVM 3.5.0svn default target x86_64-unknown-linux-gnu
    ignoring nonexistent directory "/include"
    #include "..." search starts here:
    #include <...> search starts here:
     /home/ruben/Development/cppamp-3.5/src/utils
     /opt/AMDAPP/SDK/include/CL/..
     /home/ruben/Development/cppamp-3.5/src/include
     /home/ruben/Development/cppamp-3.5/src/libc++/libcxx/include
     /opt/intel/composerxe/ipp/include
     /opt/intel/composerxe/mkl/include
     /opt/intel/composerxe/tbb/include
     /usr/include/c++/v1
     /usr/local/include
     /home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib/clang/3.5.0/include
     /usr/include
    End of search list.
     "/home/ruben/Development/cppamp-3.5/build/compiler/bin/clamp-assemble" /tmp/amp_math_acos-a395cf.s /tmp/amp_math_acos-5e5ff1.o
     "/home/ruben/Development/cppamp-3.5/build/compiler/bin/clamp-link" --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o /home/ruben/Development/cppamp-3.5/build/tests/Unit/AmpMath/Output/amp_math_acos.cpp.tmp.out /usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../lib64/crt1.o /usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../lib64/crti.o /usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2/crtbegin.o -L/home/ruben/Development/cppamp-3.5/build/build/Release/lib -L/home/ruben/Development/cppamp-3.5/build/libc++/libcxx/lib -L/home/ruben/Development/cppamp-3.5/build/libc++/libcxxrt/lib -L/usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2/../../.. -L/home/ruben/Development/cppamp-3.5/build/compiler/bin/../lib -L/lib -L/usr/lib --rpath=/home/ruben/Development/cppamp-3.5/build/build/Release/lib:/home/ruben/Development/cppamp-3.5/build/libc++/libcxx/lib:/home/ruben/Development/cppamp-3.5/build/libc++/libcxxrt/lib -lc++ -lcxxrt -ldl -lpthread --whole-archive -lmcwamp --no-whole-archive -lpthread /tmp/amp_math_acos-9a88ea.o /tmp/amp_math_acos-5e5ff1.o -L/opt/intel/composerxe/compiler/lib/intel64 -L/opt/intel/composerxe/ipp/../compiler/lib/intel64 -L/opt/intel/composerxe/ipp/lib/intel64 -L/opt/intel/composerxe/compiler/lib/intel64 -L/opt/intel/composerxe/mkl/lib/intel64 -L/opt/intel/composerxe/tbb/lib/intel64/gcc4.4 -lc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2/crtend.o /usr/lib64/gcc/x86_64-unknown-linux-gnu/4.9.2/../../../../lib64/crtn.o
    objcopy: error: the input file '/tmp/tmp.sBlPYHG23f/amp_math_acos-5e5ff1.host.o' has no sections
    mv: cannot stat '/tmp/tmp.sBlPYHG23f/amp_math_acos-5e5ff1.host.o.new': No such file or directory
    Generating OpenCL SPIR kernel
    Generating OpenCL C kernel
    Use OpenCL 1.2 C++AMP runtime
    Use OpenCL SPIR kernel
    error   : Binary format for key='0', ident='' is not recognized
    

    /usr/include/CL contains the Khronos OpenCL 1.2 headers (installed from this package which fetches them from the OpenCL repository). Changing the include path passed to the command to /opt/AMDAPP/SDK/include does not change anything unfortunately.

    The version of objcopy I am using is the one from the Arch repositories: GNU objcopy (GNU Binutils) 2.25

    I don't think the OpenCL code is already compiled at this stage though, correct? So it might be something going wrong with the generation of the section of the object file which contains that code.

  5. Jack Chung

    Hi rubenvb, thanks for the information. Based on the error messages from you I can see where the script fails. It's located in clamp-link ( /home/ruben/Development/cppamp-3.5/build/compiler/bin/clamp-link ).

          KERNEL_FILE=$TEMP_DIR/$FILENAME.kernel.bc
          HOST_FILE=$TEMP_DIR/$FILENAME.host.o
    
          # extract kernel section
          objcopy -O binary -j .kernel $ARG $KERNEL_FILE
    
          # extract host section
          objcopy -R .kernel $ARG $HOST_FILE
    
          # strip all symbols specified in symbol.txt from $HOST_FILE
          # NOTICE these 2 lines went wrong!!!
          objcopy @$CXXAMP_SERIALIZE_SYMBOL_FILE $HOST_FILE $HOST_FILE.new
          mv $HOST_FILE.new $HOST_FILE
    
          # find cxxamp_serialize symbols and save them into symbol.txt
          objdump -t $HOST_FILE -j .text 2> /dev/null | grep "g.*__cxxamp_serialize" | awk '{print "-L"$6}' >> $CXXAMP_SERIALIZE_SYMBOL_FILE
    

    What the script does here is:

    1. extract the kernel code from an object (stored in the .kernel section) and save it into $KERNEL_FILE
    2. extract the host code from the object and save it into $HOST_FILE
    3. strip from $HOST_FILE the CPU serialization functions (functions matching the __cxxamp_serialize pattern, which push kernel arguments prior to kernel launch) listed in $CXXAMP_SERIALIZE_SYMBOL_FILE. This preserves the C++ ODR: if a kernel is referenced by multiple host translation units, multiple copies of identical CPU serialization functions would otherwise be emitted
    4. append the names of the CPU serialization functions that were not stripped to $CXXAMP_SERIALIZE_SYMBOL_FILE, so that they are stripped from the other objects
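
    To make steps 3 and 4 concrete, the sketch below runs the same grep/awk pipeline as the script on a single made-up `objdump -t` line. The function name `symbols_to_localize` and the sample symbol are assumptions for illustration. As far as I know, `-L` is objcopy's short form of `--localize-symbol` and `@file` makes objcopy read its options from that file, so each line written to the symbol file becomes one option on the next object's objcopy run.

```shell
# Same filter as in clamp-link: keep global __cxxamp_serialize symbols and
# turn the sixth column (the symbol name) into an objcopy "-L<name>" option.
symbols_to_localize() {
  grep "g.*__cxxamp_serialize" | awk '{print "-L"$6}'
}

# Hypothetical symbol-table line; real names are mangled and much longer.
sample='0000000000000000 g     F .text  0000000000000010 __cxxamp_serialize_example'

printf '%s\n' "$sample" | symbols_to_localize   # prints "-L__cxxamp_serialize_example"
```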

    Like you guessed, at this stage we are still dealing with LLVM IR. No OpenCL code is generated yet (it would be done in the later part of clamp-link).

    Could you change clamp-link in your build directory to save the temp files somewhere, and check what $HOST_FILE and $CXXAMP_SERIALIZE_SYMBOL_FILE contain? Sorry, this is the first time I've encountered this issue, so I'll need your help to diagnose and fix it.

    Finally, it seems you are using Arch, right? I'm using Ubuntu and have never tried it on Arch.

  6. Ruben Van Boxem reporter

    Here are the tempfiles and the modified clamp-link script. If you need anything else, please ask. I'm happy you're actively looking into this.

    I was planning on creating an Arch user package for clamp to promote it a bit once I get it working, and after that, see what it gives on Windows+MinGW-w64.

  7. Jack Chung

    Hi rubenvb, based on these temp files, could you try the following commands on your system? Both commands work on my Ubuntu box, and I assume one of them will fail on your side.

    objcopy @symbol.txt amp_math_acos-c3ca07.host.o amp_math_acos-c3ca07.host.o.new
    objcopy @symbol.txt amp_math_acos-fcd334.host.o amp_math_acos-fcd334.host.o.new
    

    I suspect this error is caused by different behavior of objcopy between Arch and Ubuntu. If that is the case, then I'll need to build an Arch box to fix it.

  8. Ruben Van Boxem reporter

    The second works without error. The first fails with:

    objcopy: error: the input file 'amp_math_acos-c3ca07.host.o' has no sections
    

    If I run nm on the first object file, I get only:

    0000000000008e84 A _binary__tmp_amp_math_acos_a0e64f_s_size
    

    The second object file has a load of symbols, which is more like a C++ object file.

    Unless you're relying on Ubuntu patches to binutils, I doubt it is the cause, as Arch's binutils is built from the unpatched sources of the release. Could it be a mismatch of LLVM 3.5.1 with Clamp's 3.5.0 version?

  9. Jack Chung

    Hi rubenvb, I think we're observing some discrepancies here. On my Ubuntu box both commands work fine. In fact, I can even use the object files you provided to create a working executable on my box!

    Looking more closely at clamp-link and your error log, I think there may be issues other than the objcopy one. Even though the objcopy/mv commands failed in clamp-link, the error log suggests the executable is produced anyway. So besides the compilation issue we may also have a runtime issue:

    No protocol specified
    error   : Binary format for key='0', ident='' is not recognized
    

    First, could you check whether you can find executable files with names like amp_math_acos.cpp.tmp.out under /home/ruben/Development/cppamp-3.5/build/tests/Unit/AmpMath/Output ? If they are there, could you find a box which has only ONE OpenCL SDK installed (e.g. AMD APP SDK or NVIDIA CUDA SDK) and try executing the binary?

  10. Ruben Van Boxem reporter

    Right, after removing the Intel and AMD OpenCL drivers from the system, the NVIDIA one is used (as now I can see that the OpenCL 1.1 implementation is used).

    Some tests still fail though, but that would be worth another bug report I suppose?

    Would it be possible to implement an OpenCL device/platform selector environment variable to remedy this in a more extensible manner? I can't imagine that requiring users to have a single OpenCL platform installed is realistic. Wouldn't a number (starting from 0), or two (e.g. 0:0), suffice when combined with clinfo output? Maybe you could wiggle the clinfo sources from AMD and put them next to our AMP runtime as a convenience. I'm rambling here. Perhaps a mailing list of sorts would be a better place to discuss this kind of thing. Is there one?
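
    A minimal sketch of how such a selector could be parsed, assuming the "platform:device" reading of "0:0". Nothing here exists in the runtime: the function `parse_device_selector` and the variable name `CLAMP_DEVICE` in the usage note are hypothetical.

```shell
# Parse "P" or "P:D" (zero-based indices) into "P D"; reject anything else.
parse_device_selector() {
  sel=${1:-0:0}                       # default: first platform, first device
  case $sel in
    *[!0-9:]*|*:*:*|:*|*:) return 1 ;;   # non-digits, extra colons, empty parts
  esac
  case $sel in
    *:*) platform=${sel%%:*}; device=${sel#*:} ;;
    *)   platform=$sel; device=0 ;;
  esac
  printf '%s %s\n' "$platform" "$device"
}

parse_device_selector 1:0   # prints "1 0"
```

    A runtime could read the value with something like `parse_device_selector "$CLAMP_DEVICE"` and fall back to its current default when the variable is unset; the indices would follow the order shown by clinfo.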

    Thanks for the help so far!

  11. Jack Chung

    Hi rubenvb, unfortunately there is no mailing list for CLAMP support, so let's abuse the bug tracking system for now. :-)

    Multiple GPU / multiple CL platform support is actually our next step. Ideally we can expect something within February.

  12. Ruben Van Boxem reporter

    All right. I'll be waiting (and experimenting with MS C++ AMP in the meantime, cause, well, I never really used it yet).

    I'll leave this report for the objcopy and multiple platform support and start a new "discussion issue" for the other questions I have.

    If I can help with anything else to help solve these issues, please ask.

  13. Marcin Copik

    I can reproduce this problem on my machine: Linux Mint 17.1 with three OpenCL platforms installed (AMD APP, Intel, CUDA). I'm pretty sure that it started right after installing CUDA (before that I had only CPU-based platforms: AMD and Intel). Compilation works fine, but the execution fails with:

    error   : Binary format for key='0', ident='' is not recognized
    

    It happens with both the previous version (based on Clang 3.3) and the current one. It seems that everything broke after installing CUDA.

    I attach my clinfo. Although AMD APP is the first platform, valgrind shows that the application is failing somewhere in the Intel library:

    ==16884== Invalid read of size 16
    ==16884==    at 0x40841DC: __intel_sse2_strrchr (in /usr/lib/intel/intel-opencl-1.2-4.6.0.92/opencl-1.2-4.6.0.92/lib64/libtbb.so.2)
    ==16884==    by 0x406C3D1: tbb::internal::init_dl_data() (dynamic_link.cpp:332)
    ==16884==    by 0x406C306: __sti__$E (dynamic_link.cpp:500)
    ==16884==    by 0x408E7C1: ??? (in /usr/lib/intel/intel-opencl-1.2-4.6.0.92/opencl-1.2-4.6.0.92/lib64/libtbb.so.2)
    ==16884==    by 0x4067572: ??? (in /usr/lib/intel/intel-opencl-1.2-4.6.0.92/opencl-1.2-4.6.0.92/lib64/libtbb.so.2)
    ==16884==    by 0xC39FA4F: ??? (in /usr/lib/intel/intel-opencl-1.2-4.6.0.92/opencl-1.2-4.6.0.92/lib64/libintelocl.so)
    ==16884==    by 0x40100FC: call_init.part.0 (dl-init.c:64)
    ==16884==    by 0x4010222: _dl_init (dl-init.c:36)
    ==16884==    by 0x4014C6F: dl_open_worker (dl-open.c:577)
    ==16884==    by 0x400FFF3: _dl_catch_error (dl-error.c:187)
    ==16884==    by 0x40143BA: _dl_open (dl-open.c:661)
    ==16884==    by 0x53F602A: dlopen_doit (dlopen.c:66)
    
  14. Jack Chung

    Hi Marcin, from your clinfo log it seems that although the CL platform supports SPIR, the only GPU instance is an NVIDIA one, which doesn't do SPIR. Could you check whether the issue goes away if you force SPIR off with "export CLAMP_NOSPIR=ON"?
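
    For a single run, the variable can also be set for one command only instead of exporting it for the whole shell session. The wrapper name `run_without_spir` below is just an illustration; only `CLAMP_NOSPIR=ON` comes from this thread.

```shell
# Run any command with SPIR forced off for that command only.
run_without_spir() {
  env CLAMP_NOSPIR=ON "$@"
}

# Example: show that the variable is visible to the child process.
run_without_spir sh -c 'echo "CLAMP_NOSPIR=$CLAMP_NOSPIR"'   # prints "CLAMP_NOSPIR=ON"
```

    If the setting takes effect, the runtime messages quoted earlier in this thread should presumably report an OpenCL C kernel rather than "Use OpenCL SPIR kernel".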

  15. Marcin Copik

    Yes, the problem is gone and it's executing the OpenCL C kernel. However, it seems that the kernel is not executed at all: the HelloWorld example outputs exactly the data from before the parallel for_each, and the matrix multiplication example ends with a data mismatch at the very first element.

  16. Marcin Copik

    OK, this issue appears only with clang-3.3. With 3.5 everything's fine. Thanks for such a quick reply!

  17. Marcin Copik

    Hi, everything broke on my PC after updating from the Kalmar repo yesterday. Previously, I was able to run the computations on CPU-based OpenCL implementations. Now (with CLAMP_NOSPIR=ON) I get this problem (even for the Kalmar samples! Here is Hello World):

    ==27798== Invalid read of size 16
    ==27798==    at 0x40841DC: __intel_sse2_strrchr (in /usr/lib/intel/intel-opencl-1.2-4.6.0.92/opencl-1.2-4.6.0.92/lib64/libtbb.so.2)
    ==27798==    by 0x406C3D1: tbb::internal::init_dl_data() (dynamic_link.cpp:332)
    ==27798==    by 0x406C306: __sti__$E (dynamic_link.cpp:500)
    ==27798==    by 0x408E7C1: ??? (in /usr/lib/intel/intel-opencl-1.2-4.6.0.92/opencl-1.2-4.6.0.92/lib64/libtbb.so.2)
    ==27798==    by 0x4067572: ??? (in /usr/lib/intel/intel-opencl-1.2-4.6.0.92/opencl-1.2-4.6.0.92/lib64/libtbb.so.2)
    ==27798==    by 0xC3A9A4F: ??? (in /usr/lib/intel/intel-opencl-1.2-4.6.0.92/opencl-1.2-4.6.0.92/lib64/libintelocl.so)
    ==27798==    by 0x40100FC: call_init.part.0 (dl-init.c:64)
    ==27798==    by 0x4010222: _dl_init (dl-init.c:36)
    ==27798==    by 0x4014C6F: dl_open_worker (dl-open.c:577)
    ==27798==    by 0x400FFF3: _dl_catch_error (dl-error.c:187)
    ==27798==    by 0x40143BA: _dl_open (dl-open.c:661)
    ==27798==    by 0x53F102A: dlopen_doit (dlopen.c:66)
    ==27798==  Address 0x67544c0 is 64 bytes inside a block of size 79 alloc'd
    ==27798==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==27798==    by 0x400630D: open_path (dl-load.c:2271)
    ==27798==    by 0x4008F90: _dl_map_object (dl-load.c:2456)
    ==27798==    by 0x400D601: openaux (dl-deps.c:63)
    ==27798==    by 0x400FFF3: _dl_catch_error (dl-error.c:187)
    ==27798==    by 0x400DD04: _dl_map_object_deps (dl-deps.c:254)
    ==27798==    by 0x4014AAA: dl_open_worker (dl-open.c:272)
    ==27798==    by 0x400FFF3: _dl_catch_error (dl-error.c:187)
    ==27798==    by 0x40143BA: _dl_open (dl-open.c:661)
    ==27798==    by 0x53F102A: dlopen_doit (dlopen.c:66)
    ==27798==    by 0x400FFF3: _dl_catch_error (dl-error.c:187)
    ==27798==    by 0x53F162C: _dlerror_run (dlerror.c:163)
    ==27798== 
    ==27798== Invalid read of size 8
    ==27798==    at 0xA3F359B: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.352.21)
    ==27798==    by 0x6A82C7E: Concurrency::OpenCLContext::OpenCLContext() (in /home/mcopik/Projekty/steiar/cppamp-driver-ng-35_build_new/build/Release/lib/libmcwamp_opencl.so)
    ==27798==    by 0x6A81B4B: _GLOBAL__sub_I_mcwamp_opencl.cpp (in /home/mcopik/Projekty/steiar/cppamp-driver-ng-35_build_new/build/Release/lib/libmcwamp_opencl.so)
    ==27798==    by 0x4010139: call_init.part.0 (dl-init.c:78)
    ==27798==    by 0x4010222: _dl_init (dl-init.c:36)
    ==27798==    by 0x4014C6F: dl_open_worker (dl-open.c:577)
    ==27798==    by 0x400FFF3: _dl_catch_error (dl-error.c:187)
    ==27798==    by 0x40143BA: _dl_open (dl-open.c:661)
    ==27798==    by 0x53F102A: dlopen_doit (dlopen.c:66)
    ==27798==    by 0x400FFF3: _dl_catch_error (dl-error.c:187)
    ==27798==    by 0x53F162C: _dlerror_run (dlerror.c:163)
    ==27798==    by 0x53F10C0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
    ==27798==  Address 0x40 is not stack'd, malloc'd or (recently) free'd
    ==27798== 
    ==27798== 
    ==27798== Process terminating with default action of signal 11 (SIGSEGV)
    ==27798==  Access not within mapped region at address 0x40
    ==27798==    at 0xA3F359B: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.352.21)
    ==27798==    by 0x6A82C7E: Concurrency::OpenCLContext::OpenCLContext() (in /home/mcopik/Projekty/steiar/cppamp-driver-ng-35_build_new/build/Release/lib/libmcwamp_opencl.so)
    ==27798==    by 0x6A81B4B: _GLOBAL__sub_I_mcwamp_opencl.cpp (in /home/mcopik/Projekty/steiar/cppamp-driver-ng-35_build_new/build/Release/lib/libmcwamp_opencl.so)
    ==27798==    by 0x4010139: call_init.part.0 (dl-init.c:78)
    ==27798==    by 0x4010222: _dl_init (dl-init.c:36)
    ==27798==    by 0x4014C6F: dl_open_worker (dl-open.c:577)
    ==27798==    by 0x400FFF3: _dl_catch_error (dl-error.c:187)
    ==27798==    by 0x40143BA: _dl_open (dl-open.c:661)
    ==27798==    by 0x53F102A: dlopen_doit (dlopen.c:66)
    ==27798==    by 0x400FFF3: _dl_catch_error (dl-error.c:187)
    ==27798==    by 0x53F162C: _dlerror_run (dlerror.c:163)
    ==27798==    by 0x53F10C0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
    ==27798==  If you believe this happened as a result of a stack
    ==27798==  overflow in your program's main thread (unlikely but
    ==27798==  possible), you can try to increase the size of the
    ==27798==  main thread stack using the --main-stacksize= flag.
    ==27798==  The main thread stack size used in this run was 8388608.
    

    It's weird, because two OpenCL implementations are referenced here: NVIDIA's and Intel's. However, when I tried to disable the NVIDIA OpenCL vendor, I got the error below, and the same thing happened on my laptop, where only one vendor (AMD) is provided:

    There is no device can be used to do the computation
    

    Is it a final change that only GPUs can be used to parallelize code with Kalmar?

  18. Marcin Copik

    I can see that the change was introduced in commit 1ec91493, with message "Add check to make sure Devices always has more than one devices that can do the computation".

  19. Marcin Copik

    Hi Jack, thanks for the reply. Yes, it helped - I'm able to run computations on CPU. Should I expect that Kalmar will support only GPUs in the future?

  20. Marcin Copik

    Have you provided full support for multiple OpenCL platforms? I feel that after recent updates things got worse: now I have to disable all OpenCL vendors except one, or I get a segfault, with Valgrind showing mixed usage of different libraries (e.g. AMD's and Intel's implementations). I won't bombard you with more logs if you are already aware of that issue ;-)
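
    For anyone checking which vendors are active: on Linux, the Khronos ICD loader usually discovers platforms through .icd files in /etc/OpenCL/vendors, each naming a vendor library. Assuming that layout (it may differ per distribution), the sketch below lists the registered vendors; `list_icds` is a made-up helper name.

```shell
# List each registered ICD file together with the library it points to.
list_icds() {
  dir=${1:-/etc/OpenCL/vendors}       # assumed default location
  for f in "$dir"/*.icd; do
    if [ -e "$f" ]; then              # the glob stays literal if nothing matches
      printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    fi
  done
  return 0
}

list_icds
```

    Temporarily moving one of these files out of the directory is a common way to hide a vendor from the loader without uninstalling its driver.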

  21. Jack Chung

    Marcin, unfortunately support for multiple OpenCL platforms has always been fairly low on our priority list. We are working on supporting multiple OpenCL devices, but all of our test hardware contains only one platform.
