Common subdirectories: magma/.git and magma-2/.git
Only in magma-2: .gitignore
Only in magma-2: HIP-notes.txt
diff magma/Makefile magma-2/Makefile
1a2,15
> # build process
> #
> # For hipMAGMA (branch of MAGMA that builds on HIP), the build process is basically:
> # 0: Clone the repo, or download a release
> # 1: Copy your `make.inc` for your specific platform
> # 2: If its the repo, then run `make -f make.gen.interface_hip`, and `make -f make.gen.magmablas_hip`,
> # and `make -f make.gen.testing_hip`
> # 3: Now, run `make`, like normal.
> # 4: You can make specific testers if the builds are failing
> #
>
>
>
> # ------------------------------------------------------------------------------
9,15c23,36
< # defaults if nothing else is given in make.inc
< CC ?= cc
< CXX ?= c++
< NVCC ?= nvcc
< FORT ?=
< ifeq ($(FORT),)
< $(warning No Fortran compiler was given in FORT in make.inc. Some testers will not be able to check their results.)
---
> # --------------------
> # configuration
>
> # should MAGMA be built on CUDA (NVIDIA only) or HIP (AMD or NVIDIA)
> # enter 'cuda' or 'hip' respectively
> BACKEND ?= cuda
>
> # set these to their real paths
> CUDADIR ?= /usr/local/cuda
> HIPDIR ?= /opt/rocm/hip
>
> # require either hip or cuda
> ifeq (,$(findstring $(BACKEND),"hip cuda"))
> $(error "'BACKEND' should be either 'cuda' or 'hip' (got '$(BACKEND)')")
18,20c39,40
< ARCH ?= ar
< ARCHFLAGS ?= cr
< RANLIB ?= ranlib
---
> # --------------------
> # programs
22,23c42,70
< # shared libraries require -fPIC
< #FPIC = -fPIC
---
> # set compilers
> CC ?= gcc
> CXX ?= g++
> FORT ?= gfortran
> HIPCC ?= hipcc
> NVCC ?= nvcc
> DEVCC ?= NONE
>
>
> # set from 'BACKEND'
> ifeq ($(BACKEND),cuda)
> DEVCC = $(NVCC)
> else ifeq ($(BACKEND),hip)
> DEVCC = $(HIPCC)
>
> # if we are using HIP, make sure generated sources are up to date
> # Technically, this 'recursive' make which we don't like to do, but also this is a simple solution
> # that allows that file to handle all code generation
> # Another reason is that I don't want to flood the namespace (for example, that file also
> # defines an 'all' and 'clean' target as phonies)
> # So, in the future that whole file may be integrated, but for now this seems simplest
> # Detect number of jobs here, so it runs at an appropriate speed
> MAKE_PID := $(shell echo $$PPID)
> JOB_FLAG := $(filter -j%, $(subst -j ,-j,$(shell ps T | grep "^\s*$(MAKE_PID).*$(MAKE)")))
> JOBS := $(subst -j,,$(JOB_FLAG))
> tmp := $(shell $(MAKE) -j$(JOBS) -f make.gen.hipMAGMA 1>&2)
> else
> $(warning BACKEND: $(BACKEND) not recognized)
> endif
25,31c72,75
< # may want -std=c99 for CFLAGS, -std=c++11 for CXXFLAGS
< CFLAGS ?= -O3 $(FPIC) -DADD_ -Wall -MMD
< CXXFLAGS ?= $(CFLAGS) -std=c++11
< NVCCFLAGS ?= -O3 -DADD_ -Xcompiler "$(FPIC) -Wall -Wno-unused-function" -std=c++11
< FFLAGS ?= -O3 $(FPIC) -DADD_ -Wall -Wno-unused-dummy-argument
< F90FLAGS ?= -O3 $(FPIC) -DADD_ -Wall -Wno-unused-dummy-argument
< LDFLAGS ?= -O3 $(FPIC)
---
> # and utilities
> ARCH ?= ar
> ARCHFLAGS ?= cr
> RANLIB ?= ranlib
33d76
< INC ?= -I$(CUDADIR)/include
35,36c78,79
< LIBDIR ?= -L$(CUDADIR)/lib64
< LIB ?= -lcudart -lcudadevrt -lcublas -lcusparse -llapack -lblas -lpthread -lm
---
> # --------------------
> # flags/settings
38c81,93
< GPU_TARGET ?= Kepler Maxwell Pascal
---
> # Use -fPIC to make shared (.so) and static (.a) library;
> # can be commented out if making only static library.
> FPIC ?= -fPIC
>
> # now, generate our flags
> CFLAGS ?= -O3 $(FPIC) -DNDEBUG -DADD_ -Wall -fopenmp -std=c99
> CXXFLAGS ?= -O3 $(FPIC) -DNDEBUG -DADD_ -Wall -fopenmp -std=c++11
> FFLAGS ?= -O3 $(FPIC) -DNDEBUG -DADD_ -Wall -Wno-unused-dummy-argument
> F90FLAGS ?= -O3 $(FPIC) -DNDEBUG -DADD_ -Wall -Wno-unused-dummy-argument -x f95-cpp-input
> LDFLAGS ?= $(FPIC) -fopenmp
>
> DEVCCFLAGS ?= -O3 -DNDEBUG -DADD_
> # DEVCCFLAGS are populated later in `backend-specific`
42a98
> # where to install to?
45d100
<
59,61d113
< CFLAGS += -DHAVE_CUBLAS
< CXXFLAGS += -DHAVE_CUBLAS
<
65c117
< codegen = python tools/codegen.py
---
> codegen = ./tools/codegen.py
66a119
> ifeq ($(BACKEND),cuda)
68,81c121,279
< # ------------------------------------------------------------------------------
< # NVCC options for the different cards
< # First, add smXX for architecture names
< ifneq ($(findstring Kepler, $(GPU_TARGET)),)
< GPU_TARGET += sm_30 sm_35
< endif
< ifneq ($(findstring Maxwell, $(GPU_TARGET)),)
< GPU_TARGET += sm_50
< endif
< ifneq ($(findstring Pascal, $(GPU_TARGET)),)
< GPU_TARGET += sm_60
< endif
< ifneq ($(findstring Volta, $(GPU_TARGET)),)
< GPU_TARGET += sm_70
---
> # ------------------------------------------------------------------------------
> # NVCC options for the different cards
> # First, add smXX for architecture names
> # Internal CUDA architectures we support
> # TODO: Filter on regex to discard the named architectures?
> CUDA_ARCH_ := $(GPU_TARGET)
> ifneq ($(findstring Kepler, $(GPU_TARGET)),)
> CUDA_ARCH_ += sm_30
> CUDA_ARCH_ += sm_35
> endif
> ifneq ($(findstring Maxwell, $(GPU_TARGET)),)
> CUDA_ARCH_ += sm_50
> endif
> ifneq ($(findstring Pascal, $(GPU_TARGET)),)
> CUDA_ARCH_ += sm_60
> endif
> ifneq ($(findstring Volta, $(GPU_TARGET)),)
> CUDA_ARCH_ += sm_70
> endif
> ifneq ($(findstring Turing, $(GPU_TARGET)),)
> CUDA_ARCH_ += sm_75
> endif
> ifneq ($(findstring Ampere, $(GPU_TARGET)),)
> CUDA_ARCH_ += sm_80
> endif
>
>
> # Remember to add to CMakeLists.txt too!
>
> # Next, add compile options for specific smXX
> # sm_xx is binary, compute_xx is PTX for forward compatability
> # MIN_ARCH is lowest requested version
> # Use it ONLY in magma_print_environment; elsewhere use __CUDA_ARCH__ or magma_getdevice_arch()
> # NV_SM accumulates sm_xx for all requested versions
> # NV_COMP is compute_xx for highest requested version
> #
> # See also $(info compile for ...) in Makefile
>
> ## Suggestion by Mark (from SLATE)
> # Valid architecture numbers
> # TODO: remove veryold ones?
> VALID_SMS = 30 32 35 37 50 52 53 60 61 62 70 72 75 80
>
> # code=sm_XX is binary, code=compute_XX is PTX
> GENCODE_SM = -gencode arch=compute_$(sm),code=sm_$(sm)
> GENCODE_COMP = -gencode arch=compute_$(sm),code=compute_$(sm)
>
> # Get gencode options for all sm_XX in cuda_arch_.
> NV_SM := $(filter %, $(foreach sm, $(VALID_SMS),$(if $(findstring sm_$(sm), $(CUDA_ARCH_)),$(GENCODE_SM))))
> NV_COMP := $(filter %, $(foreach sm, $(VALID_SMS),$(if $(findstring sm_$(sm), $(CUDA_ARCH_)),$(GENCODE_COMP))))
>
> ifeq ($(NV_SM),)
> $(error GPU_TARGET, currently $(GPU_TARGET), must contain one or more of Fermi, Kepler, Maxwell, Pascal, Volta, Turing, or valid sm_[0-9][0-9]. Please edit your make.inc file)
> else
> # Get last option (last 2 words) of nv_compute.
> nwords := $(words $(NV_COMP))
> nwords_1 := $(shell expr $(nwords) - 1)
> NV_COMP_LAST := $(wordlist $(nwords_1), $(nwords), $(NV_COMP))
> endif
>
> # Use all sm_XX (binary), and the last compute_XX (PTX) for forward compatibility.
> DEVCCFLAGS += $(NV_SM) $(NV_COMP_LAST)
> LIBS += -lcublas -lcudart
>
> # Get first (minimum) architecture
> MIN_ARCH := $(wordlist 1, 1, $(foreach sm, $(VALID_SMS),$(if $(findstring sm_$(sm), $(CUDA_ARCH_)),$(sm)0)))
> ifeq ($(MIN_ARCH),)
> $(error GPU_TARGET, currently $(GPU_TARGET), must contain one or more of Fermi, Kepler, Maxwell, Pascal, Volta, Turing, or valid sm_[0-9][0-9]. Please edit your make.inc file)
> endif
>
> DEVCCFLAGS += -DHAVE_CUDA -DHAVE_CUBLAS -DMIN_CUDA_ARCH=$(MIN_ARCH)
>
>
> CFLAGS += -DMIN_CUDA_ARCH=$(MIN_ARCH)
> CXXFLAGS += -DMIN_CUDA_ARCH=$(MIN_ARCH)
>
> CFLAGS += -DHAVE_CUDA -DHAVE_CUBLAS
> CXXFLAGS += -DHAVE_CUDA -DHAVE_CUBLAS
> else ifeq ($(BACKEND),hip)
>
> # ------------------------------------------------------------------------------
> # hipcc backend
> # Source: https://llvm.org/docs/AMDGPUUsage.html#target-triples
>
> # Filter our human readable names and replace with numeric names
> HIP_ARCH_ := $(GPU_TARGET)
> ifneq ($(findstring kaveri, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx700
> endif
> ifneq ($(findstring hawaii, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx701
> endif
> ifneq ($(findstring kabini, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx703
> endif
> ifneq ($(findstring mullins, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx703
> endif
> ifneq ($(findstring bonaire, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx704
> endif
> ifneq ($(findstring carrizo, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx801
> endif
> ifneq ($(findstring iceland, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx802
> endif
> ifneq ($(findstring tonga, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx802
> endif
> ifneq ($(findstring fiji, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx803
> endif
> # These are in the documentation, and the leftmost column *seems* like a continuation
> # of gfx803
> ifneq ($(findstring polaris10, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx803
> endif
> ifneq ($(findstring polaris11, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx803
> endif
>
> ifneq ($(findstring tongapro, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx805
> endif
> ifneq ($(findstring stoney, $(GPU_TARGET)),)
> HIP_ARCH_ += gfx810
> endif
>
> ## Suggestion by Mark (from SLATE)
> # Valid architecture numbers
> # TODO: remove veryold ones?
> VALID_GFXS = 600 601 602 700 701 702 703 704 705 801 802 803 805 810 900 902 904 906 908 909 90c 1010 1011 1012 1030 1031 1032 1033
>
>
> # Generated GFX option
> TARGET_GFX = --amdgpu-target=gfx$(gfx)
>
> # Get gencode options for all sm_XX in cuda_arch_.
> AMD_GFX := $(filter %, $(foreach gfx, $(VALID_GFXS),$(if $(findstring gfx$(gfx), $(HIP_ARCH_)),$(TARGET_GFX))))
>
> ifeq ($(AMD_GFX),)
> $(error GPU_TARGET, currently $(GPU_TARGET), must contain one or more of the targets for AMDGPUs (https://llvm.org/docs/AMDGPUUsage.html#target-triples), or valid gfx[0-9][0-9][0-9][0-9]?. Please edit your make.inc file)
> else
> endif
>
> # Use all sm_XX (binary), and the last compute_XX (PTX) for forward compatibility.
> DEVCCFLAGS += $(AMD_GFX)
>
> # Get first (minimum) architecture
> MIN_ARCH := $(wordlist 1, 1, $(foreach gfx, $(VALID_GFXS),$(if $(findstring gfx$(gfx), $(HIP_ARCH_)),$(gfx))))
> ifeq ($(MIN_ARCH),)
> $(error GPU_TARGET, currently $(GPU_TARGET), did not contain a minimum arch)
> endif
>
> # just so we know
> CFLAGS += -DHAVE_HIP
> CXXFLAGS += -DHAVE_HIP
> DEVCCFLAGS += -DHAVE_HIP
83,185d280
< ifneq ($(findstring Turing, $(GPU_TARGET)),)
< GPU_TARGET += sm_75
< endif
< ifneq ($(findstring Ampere, $(GPU_TARGET)),)
< GPU_TARGET += sm_80
< endif
< # Remember to add to CMakeLists.txt too!
<
<
< # Next, add compile options for specific smXX
< # sm_xx is binary, compute_xx is PTX for forward compatability
< # MIN_ARCH is lowest requested version
< # Use it ONLY in magma_print_environment; elsewhere use __CUDA_ARCH__ or magma_getdevice_arch()
< # NV_SM accumulates sm_xx for all requested versions
< # NV_COMP is compute_xx for highest requested version
< #
< # See also $(info compile for ...) in Makefile
< NV_SM :=
< NV_COMP :=
<
< ifneq ($(findstring sm_10, $(GPU_TARGET)),)
< $(warning CUDA arch 1.x is no longer supported by CUDA >= 6.x and MAGMA >= 2.0)
< endif
< ifneq ($(findstring sm_13, $(GPU_TARGET)),)
< $(warning CUDA arch 1.x is no longer supported by CUDA >= 6.x and MAGMA >= 2.0)
< endif
< ifneq ($(findstring sm_20, $(GPU_TARGET)),)
< MIN_ARCH ?= 200
< NV_SM += -gencode arch=compute_20,code=sm_20
< NV_COMP := -gencode arch=compute_20,code=compute_20
< $(warning CUDA arch 2.x is no longer supported by CUDA >= 9.x)
< endif
< ifneq ($(findstring sm_30, $(GPU_TARGET)),)
< MIN_ARCH ?= 300
< NV_SM += -gencode arch=compute_30,code=sm_30
< NV_COMP := -gencode arch=compute_30,code=compute_30
< endif
< ifneq ($(findstring sm_32, $(GPU_TARGET)),)
< MIN_ARCH ?= 320
< NV_SM += -gencode arch=compute_32,code=sm_32
< NV_COMP := -gencode arch=compute_32,code=compute_32
< endif
< ifneq ($(findstring sm_35, $(GPU_TARGET)),)
< MIN_ARCH ?= 350
< NV_SM += -gencode arch=compute_35,code=sm_35
< NV_COMP := -gencode arch=compute_35,code=compute_35
< endif
< ifneq ($(findstring sm_50, $(GPU_TARGET)),)
< MIN_ARCH ?= 500
< NV_SM += -gencode arch=compute_50,code=sm_50
< NV_COMP := -gencode arch=compute_50,code=compute_50
< endif
< ifneq ($(findstring sm_52, $(GPU_TARGET)),)
< MIN_ARCH ?= 520
< NV_SM += -gencode arch=compute_52,code=sm_52
< NV_COMP := -gencode arch=compute_52,code=compute_52
< endif
< ifneq ($(findstring sm_53, $(GPU_TARGET)),)
< MIN_ARCH ?= 530
< NV_SM += -gencode arch=compute_53,code=sm_53
< NV_COMP := -gencode arch=compute_53,code=compute_53
< endif
< ifneq ($(findstring sm_60, $(GPU_TARGET)),)
< MIN_ARCH ?= 600
< NV_SM += -gencode arch=compute_60,code=sm_60
< NV_COMP := -gencode arch=compute_60,code=compute_60
< endif
< ifneq ($(findstring sm_61, $(GPU_TARGET)),)
< MIN_ARCH ?= 610
< NV_SM += -gencode arch=compute_61,code=sm_61
< NV_COMP := -gencode arch=compute_61,code=compute_61
< endif
< ifneq ($(findstring sm_62, $(GPU_TARGET)),)
< MIN_ARCH ?= 620
< NV_SM += -gencode arch=compute_62,code=sm_62
< NV_COMP := -gencode arch=compute_62,code=compute_62
< endif
< ifneq ($(findstring sm_70, $(GPU_TARGET)),)
< MIN_ARCH ?= 700
< NV_SM += -gencode arch=compute_70,code=sm_70
< NV_COMP := -gencode arch=compute_70,code=compute_70
< endif
< ifneq ($(findstring sm_71, $(GPU_TARGET)),)
< MIN_ARCH ?= 710
< NV_SM += -gencode arch=compute_71,code=sm_71
< NV_COMP := -gencode arch=compute_71,code=compute_71
< endif
< ifneq ($(findstring sm_75, $(GPU_TARGET)),)
< MIN_ARCH ?= 750
< NV_SM += -gencode arch=compute_75,code=sm_75
< NV_COMP := -gencode arch=compute_75,code=compute_75
< endif
< ifneq ($(findstring sm_80, $(GPU_TARGET)),)
< MIN_ARCH ?= 800
< NV_SM += -gencode arch=compute_80,code=sm_80
< NV_COMP := -gencode arch=compute_80,code=compute_80
< endif
< ifeq ($(NV_COMP),)
< $(error GPU_TARGET, currently $(GPU_TARGET), must contain one or more of Fermi, Kepler, Maxwell, Pascal, Volta, Turing, Ampere, or valid sm_[0-9][0-9]. Please edit your make.inc file)
< endif
< NVCCFLAGS += $(NV_SM) $(NV_COMP)
< CFLAGS += -DMIN_CUDA_ARCH=$(MIN_ARCH)
< CXXFLAGS += -DMIN_CUDA_ARCH=$(MIN_ARCH)
223d317
< interface_cuda \
225,226d318
< magmablas \
< testing \
228,233c320,346
< sparse \
< sparse/blas \
< sparse/control \
< sparse/include \
< sparse/src \
< sparse/testing \
---
>
> # the directory in which the MAGMA sparse source is located
> # change to sparse_hip for hipified sources
> # right now, just use old one so the dense section still builds
>
> ifeq ($(BACKEND),cuda)
> SPARSE_DIR ?= sparse
> subdirs += interface_cuda
> subdirs += testing
> subdirs += magmablas
>
> # add all sparse folders
> # Don't do it for HIP yet
> subdirs += $(SPARSE_DIR) $(SPARSE_DIR)/blas $(SPARSE_DIR)/control $(SPARSE_DIR)/include $(SPARSE_DIR)/src $(SPARSE_DIR)/testing
>
> else ifeq ($(BACKEND),hip)
> SPARSE_DIR ?= ./sparse_hip
> subdirs += interface_hip
> subdirs += magmablas_hip
> subdirs += testing
>
> subdirs += $(SPARSE_DIR) $(SPARSE_DIR)/blas $(SPARSE_DIR)/control $(SPARSE_DIR)/include $(SPARSE_DIR)/src $(SPARSE_DIR)/testing
>
> endif
>
>
236a350,351
> #$(info $$Makefiles=$(Makefiles))
>
243a359,361
> #$(info $$libmagma_src=$(libmagma_src))
> #$(info $$libmagma_all=$(libmagma_all))
>
268c386,391
< libmagma_dlink_obj := magmablas/dynamic.link.o
---
> ifeq ($(BACKEND),cuda)
> libmagma_dlink_obj := magmablas/dynamic.link.o
> else ifeq ($(BACKEND),hip)
> libmagma_dlink_obj := magmablas_hip/dynamic.link.o
> endif
>
274c397,405
< libsparse_dlink_obj := sparse/blas/dynamic.link.o
---
>
> ifeq ($(BACKEND),cuda)
> libsparse_dlink_obj := $(SPARSE_DIR)/blas/dynamic.link.o
> else ifeq ($(BACKEND),hip)
> # No dynamic parallelism support in HIP
> #libsparse_dlink_obj := $(SPARSE_DIR)/blas/dynamic.link.o
> endif
>
>
314c445
< MAGMA_INC = -I./include
---
> MAGMA_INC = -I./include -I./testing
318a450,451
>
> ifeq ($(BACKEND),cuda)
320a454,457
> else ifeq ($(BACKEND),hip)
> $(libsparse_obj): MAGMA_INC += -I./control -I./magmablas_hip -I$(SPARSE_DIR)/include -I$(SPARSE_DIR)/control
> $(sparse_testing_obj): MAGMA_INC += -I$(SPARSE_DIR)/include -I$(SPARSE_DIR)/control -I./testing
> endif
355a493,494
> #$(info $$libmagma_obj=$(libmagma_obj))
>
407c546
< .PHONY: all lib static shared clean test dense sparse docs
---
> .PHONY: all lib static shared clean test dense sparse
422a562
> ifeq ($(BACKEND),cuda)
425,428c565,568
<
< docs:
< cd docs && ${MAKE}
<
---
> else ifeq ($(BACKEND),hip)
> sparse-test: $(SPARSE_DIR)/testing
> sparse-testing: $(SPARSE_DIR)/testing
> endif
445c585
< $(findstring -fPIC, $(NVCCFLAGS)))
---
> $(findstring -fPIC, $(DEVCCFLAGS)))
512,513d651
< interface_cuda_obj := $(filter interface_cuda/%.o, $(libmagma_obj))
< magmablas_obj := $(filter magmablas/%.o, $(libmagma_obj))
516,518c654,667
< sparse_control_obj := $(filter sparse/control/%.o, $(libsparse_obj))
< sparse_blas_obj := $(filter sparse/blas/%.o, $(libsparse_obj))
< sparse_src_obj := $(filter sparse/src/%.o, $(libsparse_obj))
---
> sparse_control_obj := $(filter $(SPARSE_DIR)/control/%.o, $(libsparse_obj))
> sparse_blas_obj := $(filter $(SPARSE_DIR)/blas/%.o, $(libsparse_obj))
> sparse_src_obj := $(filter $(SPARSE_DIR)/src/%.o, $(libsparse_obj))
>
>
> ifeq ($(BACKEND),cuda)
> interface_cuda_obj := $(filter interface_cuda/%.o, $(libmagma_obj))
> magmablas_obj := $(filter magmablas/%.o, $(libmagma_obj))
> else ifeq ($(BACKEND),hip)
> interface_hip_obj := $(filter interface_hip/%.o, $(libmagma_obj))
> magmablas_hip_obj := $(filter magmablas_hip/%.o, $(libmagma_obj))
> #$(info $$magmablas_hip_obj=$(magmablas_hip_obj))
> endif
>
529d677
< interface_cuda: $(interface_cuda_obj)
531c679,687
< magmablas: $(magmablas_obj)
---
>
> ifeq ($(BACKEND),cuda)
> interface_cuda: $(interface_cuda_obj)
> magmablas: $(magmablas_obj)
> else ifeq ($(BACKEND),hip)
> interface_hip: $(interface_hip_obj)
> magmablas_hip: $(magmablas_hip_obj)
> endif
>
546c702
< cd testing && python ./run_tests.py
---
> cd testing && ./run_tests.py
558a715
> ifeq ($(BACKEND),cuda)
564a722,731
> else ifeq ($(BACKEND),hip)
>
> interface_hip/clean:
> -rm -f $(interface_hip_obj)
>
> magmablas_hip/clean:
> -rm -f $(magmablas_hip_obj)
>
> endif
>
604a772,773
> # object file rules
>
615c784
<
---
>
619c788,818
< %.$(o_ext): %.cpp
---
>
>
> # ------------------------------------------------------------------------------
> # DEVICE kernels
>
> # set the device extension
> ifeq ($(BACKEND),cuda)
> d_ext := cu
> else ifeq ($(BACKEND),hip)
> d_ext := cpp
> CXXFLAGS += -D__HIP_PLATFORM_HCC__
> endif
>
>
> ifeq ($(BACKEND),cuda)
>
> %.i: %.$(d_ext)
> $(DEVCC) -E $(DEVCCFLAGS) $(CPPFLAGS) -c -o $@ $<
>
> %.$(o_ext): %.$(d_ext)
> $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -c -o $@ $<
>
> %.o: %.cpp
> $(CXX) $(CXXFLAGS) $(CPPFLAGS) -c -o $@ $<
>
> else ifeq ($(BACKEND),hip)
>
> %.hip.o: %.hip.cpp
> $(DEVCC) $(DEVCCFLAGS) $(CXXFLAGS) $(CPPFLAGS) -c -o $@ $<
>
> %.o: %.cpp
621a821,828
> # use `hipcc` for all .cpp's. It may be a bit slower (althought I haven't tested it)
> # but there's no good way to tell whether or not it fails for some reason. (buggy
> # hipcc is probably the culprit)
> #%.o: %.cpp
> # $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -c -o $@ $<
>
> endif
>
633,634c840,842
< # ------------------------------------------------------------------------------
< # CUDA kernels
---
> ifeq ($(BACKEND),cuda)
> $(libmagma_dynamic_obj): %.$(o_ext): %.$(d_ext)
> $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -I$(SPARSE_DIR)/include -dc -o $@ $<
636,637c844,848
< %.i: %.cu
< $(NVCC) -E $(NVCCFLAGS) $(CPPFLAGS) -c -o $@ $<
---
> $(libmagma_dlink_obj): $(libmagma_dynamic_obj)
> $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -dlink -I$(SPARSE_DIR)/include -o $@ $^
>
> $(libsparse_dynamic_obj): %.$(o_ext): %.$(d_ext)
> $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -I$(SPARSE_DIR)/include -dc -o $@ $<
639,640c850,851
< %.$(o_ext): %.cu
< $(NVCC) $(NVCCFLAGS) $(CPPFLAGS) -c -o $@ $<
---
> $(libsparse_dlink_obj): $(libsparse_dynamic_obj)
> $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -dlink -I$(SPARSE_DIR)/include -o $@ $^
642,643c853,856
< $(libmagma_dynamic_obj): %.$(o_ext): %.cu
< $(NVCC) $(NVCCFLAGS) $(CPPFLAGS) -I./sparse/include -dc -o $@ $<
---
> else ifeq ($(BACKEND),hip)
>
> $(libmagma_dynamic_obj): %.$(o_ext): %.$(d_ext)
> $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -I$(SPARSE_DIR)/include -c -o $@ $<
646c859
< $(NVCC) $(NVCCFLAGS) $(CPPFLAGS) -dlink -I./sparse/include -o $@ $^
---
> $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -dlink -I$(SPARSE_DIR)/include -o $@ $^
648,649c861,862
< $(libsparse_dynamic_obj): %.$(o_ext): %.cu
< $(NVCC) $(NVCCFLAGS) $(CPPFLAGS) -I./sparse/include -dc -o $@ $<
---
> $(libsparse_dynamic_obj): %.$(o_ext): %.$(d_ext)
> $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -I$(SPARSE_DIR)/include -c -o $@ $<
652c865
< $(NVCC) $(NVCCFLAGS) $(CPPFLAGS) -dlink -I./sparse/include -o $@ $^
---
> $(DEVCC) $(DEVCCFLAGS) $(CPPFLAGS) -I$(SPARSE_DIR)/include -c -o $@ $^
653a867
> endif
717a932
> #TODO: add hip specific ones
722c937
< -DHAVE_CUBLAS -DHAVE_clBLAS \
---
> -DHAVE_CUBLAS -DHAVE_clBLAS -DHAVE_HIP \
735c950
< # MAGMA
---
> # MAGMA
738c953
< cp sparse/include/*.h $(DESTDIR)$(prefix)/include
---
> cp $(SPARSE_DIR)/include/*.h $(DESTDIR)$(prefix)/include
743c958
< # pkgconfig
---
> # pkgconfig
diff magma/Makefile.gen magma-2/Makefile.gen
2,3c2,3
< # auto-generated by codegen.py $(libmagma_old), Thu Feb 11 16:58:17 2021
< libmagma_old := control/magma_f77.cpp control/magma_param.F90 control/magma.F90 control/abs.cpp control/affinity.cpp control/auxiliary.cpp control/connection_mgpu.cpp control/constants.cpp control/get_batched_crossover.cpp control/get_batched_gemm_decision.cpp control/get_nb.cpp control/get_ntcol.cpp control/magma_bulge.cpp control/magma_threadsetting.cpp control/magma_timer.cpp control/magma_winthread.cpp control/magma_yield.cpp control/magma_zauxiliary.cpp control/magma_zbulge.cpp control/magma_znan_inf.cpp control/pthread_barrier.cpp control/sqrt.cpp control/strlcpy.cpp control/thread_queue.cpp control/trace.cpp control/xerbla.cpp control/zpanel_to_q.cpp control/zprint.cpp control/magma_sf77.cpp control/magma_df77.cpp control/magma_cf77.cpp control/magma_zf77.cpp control/magma_sfortran.F90 control/magma_dfortran.F90 control/magma_cfortran.F90 control/magma_zfortran.F90 control/magmablas_sf77.cpp control/magmablas_df77.cpp control/magmablas_cf77.cpp control/magmablas_zf77.cpp control/magmablas_sfortran.F90 control/magmablas_dfortran.F90 control/magmablas_cfortran.F90 control/magmablas_zfortran.F90 interface_cuda/alloc.cpp interface_cuda/blas_h_v2.cpp interface_cuda/blas_z_v1.cpp interface_cuda/blas_z_v2.cpp interface_cuda/copy_v1.cpp interface_cuda/copy_v2.cpp interface_cuda/error.cpp interface_cuda/interface.cpp interface_cuda/interface_v1.cpp src/cblas_z.cpp src/zcposv_gpu.cpp src/zposv_gpu.cpp src/zpotrf_gpu.cpp src/zpotri_gpu.cpp src/zpotrs_gpu.cpp src/zlauum_gpu.cpp src/ztrtri_gpu.cpp src/zpotrf_mgpu.cpp src/zpotrf_mgpu_right.cpp src/zpotrf3_mgpu.cpp src/zposv.cpp
src/zpotrf.cpp src/zpotri.cpp src/zlauum.cpp src/ztrtri.cpp src/zpotrf_m.cpp src/zcgesv_gpu.cpp src/zcgetrs_gpu.cpp src/dgmres_plu_gpu.cpp src/dxgesv_gmres_gpu.cpp src/xshgetrf_gpu.cpp src/xhsgetrf_gpu.cpp src/zgerfs_nopiv_gpu.cpp src/zgesv_gpu.cpp src/zgesv_nopiv_gpu.cpp src/zgetrf_gpu.cpp src/zgetrf_nopiv_gpu.cpp src/zgetri_gpu.cpp src/zgetrs_gpu.cpp src/zgetrs_nopiv_gpu.cpp src/zgetrf_mgpu.cpp src/zgetrf2_mgpu.cpp src/zgerbt_gpu.cpp src/zgesv.cpp src/zgesv_rbt.cpp src/zgetrf.cpp src/zgetf2_nopiv.cpp src/zgetrf_nopiv.cpp src/zgetrf_m.cpp src/zcgeqrsv_gpu.cpp src/zgelqf_gpu.cpp src/zgels3_gpu.cpp src/zgels_gpu.cpp src/zgegqr_gpu.cpp src/zgeqrf2_gpu.cpp src/zgeqrf3_gpu.cpp src/zgeqrf_gpu.cpp src/zgeqr2x_gpu.cpp src/zgeqr2x_gpu-v2.cpp src/zgeqr2x_gpu-v3.cpp src/zgeqrs3_gpu.cpp src/zgeqrs_gpu.cpp src/zlarfb_gpu.cpp src/zlarfb_gpu_gemm.cpp src/zungqr_gpu.cpp src/zunmql2_gpu.cpp src/zunmqr2_gpu.cpp src/zunmqr_gpu.cpp src/zgeqrf_mgpu.cpp src/zgeqp3_gpu.cpp src/zlaqps_gpu.cpp src/zgelqf.cpp src/zgels.cpp src/zgeqlf.cpp src/zgeqrf.cpp src/zgeqrf_ooc.cpp src/zgglse.cpp src/zggrqf.cpp src/zunglq.cpp src/zungqr.cpp src/zungqr2.cpp src/zunmlq.cpp src/zunmql.cpp src/zunmqr.cpp src/zunmrq.cpp src/zgeqp3.cpp src/zlaqps.cpp src/zgeqrf_m.cpp src/zungqr_m.cpp src/zunmqr_m.cpp src/zhetrf_gpu.cpp src/zchesv_gpu.cpp src/zhesv.cpp src/zhetrf.cpp src/dsidi.cpp src/zhetrf_aasen.cpp src/zhetrf_nopiv.cpp src/zhetrf_nopiv_cpu.cpp src/zsytrf_nopiv_cpu.cpp src/zhetrf_nopiv_gpu.cpp src/zsytrf_nopiv_gpu.cpp src/zhetrs_nopiv_gpu.cpp src/zsytrs_nopiv_gpu.cpp src/zhesv_nopiv_gpu.cpp src/zsysv_nopiv_gpu.cpp src/zlahef_gpu.cpp src/dsyevd_gpu.cpp src/dsyevdx_gpu.cpp src/zheevd_gpu.cpp src/zheevdx_gpu.cpp src/zheevr_gpu.cpp src/zheevx_gpu.cpp src/zhetrd2_gpu.cpp src/zhetrd_gpu.cpp src/zunmtr_gpu.cpp src/dsyevd.cpp src/dsyevdx.cpp src/zheevd.cpp src/zheevdx.cpp src/zheevr.cpp src/zheevx.cpp src/dlaex0.cpp src/dlaex1.cpp src/dlaex3.cpp src/dmove_eig.cpp src/dstedx.cpp src/zhetrd.cpp src/zlatrd.cpp 
src/zlatrd2.cpp src/zstedx.cpp src/zungtr.cpp src/zunmtr.cpp src/zhetrd_mgpu.cpp src/zlatrd_mgpu.cpp src/dsyevd_m.cpp src/zheevd_m.cpp src/dsyevdx_m.cpp src/zheevdx_m.cpp src/dlaex0_m.cpp src/dlaex1_m.cpp src/dlaex3_m.cpp src/dstedx_m.cpp src/zstedx_m.cpp src/zunmtr_m.cpp src/zbulge_applyQ_v2.cpp src/zhetrd_he2hb.cpp src/zhetrd_hb2st.cpp src/zbulge_back.cpp src/zungqr_2stage_gpu.cpp src/zunmqr_2stage_gpu.cpp src/zhegvdx_2stage.cpp src/zheevdx_2stage.cpp src/zbulge_back_m.cpp src/zbulge_applyQ_v2_m.cpp src/zheevdx_2stage_m.cpp src/zhegvdx_2stage_m.cpp src/zhetrd_he2hb_mgpu.cpp src/core_zlarfy.cpp src/core_zhbtype1cb.cpp src/core_zhbtype2cb.cpp src/core_zhbtype3cb.cpp src/dsygvd.cpp src/dsygvdx.cpp src/zhegst.cpp src/zhegvd.cpp src/zhegvdx.cpp src/zhegvr.cpp src/zhegvx.cpp src/zhegst_gpu.cpp src/zhegst_m.cpp src/dsygvd_m.cpp src/zhegvd_m.cpp src/dsygvdx_m.cpp src/zhegvdx_m.cpp src/ztrsm_m.cpp src/dgeev.cpp src/zgeev.cpp src/zgehrd.cpp src/zgehrd2.cpp src/zlahr2.cpp src/zlahru.cpp src/dlaln2.cpp src/dlaqtrsd.cpp src/zlatrsd.cpp src/dtrevc3.cpp src/dtrevc3_mt.cpp src/ztrevc3.cpp src/ztrevc3_mt.cpp src/zunghr.cpp src/dgeev_m.cpp src/zgeev_m.cpp src/zgehrd_m.cpp src/zlahr2_m.cpp src/zlahru_m.cpp src/zunghr_m.cpp src/dgesdd.cpp src/zgesdd.cpp src/dgesvd.cpp src/zgesvd.cpp src/zgebrd.cpp src/zlabrd_gpu.cpp src/zungbr.cpp src/zunmbr.cpp src/zgetf2_batched.cpp src/zgetf2_nopiv_batched.cpp src/zgetrf_panel_batched.cpp src/zgetrf_panel_nopiv_batched.cpp src/zgetrf_batched.cpp src/zgetrf_nopiv_batched.cpp src/zgetrs_batched.cpp src/zgetrs_nopiv_batched.cpp src/zgesv_batched.cpp src/zgesv_nopiv_batched.cpp src/zgerbt_batched.cpp src/zgesv_rbt_batched.cpp src/zgetri_outofplace_batched.cpp src/zpotf2_batched.cpp src/zpotrf_batched.cpp src/zpotrf_panel_batched.cpp src/zpotrs_batched.cpp src/zposv_batched.cpp src/zlarft_batched.cpp src/zlarfb_gemm_batched.cpp src/zgeqrf_panel_batched.cpp src/zgeqrf_batched.cpp src/zgeqrf_expert_batched.cpp src/zpotf2_vbatched.cpp 
src/zpotrf_panel_vbatched.cpp src/zpotrf_vbatched.cpp src/zgetf2_native.cpp src/zgetrf_panel_native.cpp src/zpotrf_panel_native.cpp magmablas/zaxpycp.cu magmablas/zcaxpycp.cu magmablas/zdiinertia.cu magmablas/zgeadd.cu magmablas/zgeadd2.cu magmablas/zgeam.cu magmablas/zgemm_fermi.cu magmablas/zgemm_reduce.cu magmablas/zgemv_conj.cu magmablas/zgemv_fermi.cu magmablas/zgerbt.cu magmablas/zgerbt_kernels.cu magmablas/zgetmatrix_transpose.cpp magmablas/zhemm.cu magmablas/zhemv.cu magmablas/zhemv_upper.cu magmablas/zher2k.cpp magmablas/zherk.cpp magmablas/zherk_small_reduce.cu magmablas/zlacpy.cu magmablas/zlacpy_conj.cu magmablas/zlacpy_sym_in.cu magmablas/zlacpy_sym_out.cu magmablas/zlag2c.cu magmablas/clag2z.cu magmablas/zlange.cu magmablas/zlanhe.cu magmablas/zlaqps2_gpu.cu magmablas/zlarf.cu magmablas/zlarfbx.cu magmablas/zlarfg-v2.cu magmablas/zlarfg.cu magmablas/zlarfgx-v2.cu magmablas/zlarft_kernels.cu magmablas/zlarfx.cu magmablas/zlascl.cu magmablas/zlascl2.cu magmablas/zlascl_2x2.cu magmablas/zlascl_diag.cu magmablas/zlaset.cu magmablas/zlaset_band.cu magmablas/zlaswp.cu magmablas/zclaswp.cu magmablas/zlaswp_sym.cu magmablas/zlat2c.cu magmablas/clat2z.cu magmablas/dznrm2.cu magmablas/zsetmatrix_transpose.cpp magmablas/zswap.cu magmablas/zswapblk.cu magmablas/zswapdblk.cu magmablas/zsymm.cu magmablas/zsymmetrize.cu magmablas/zsymmetrize_tiles.cu magmablas/zsymv.cu magmablas/zsymv_upper.cu magmablas/ztranspose.cu magmablas/ztranspose_conj.cu magmablas/ztranspose_conj_inplace.cu magmablas/ztranspose_inplace.cu magmablas/ztrmm.cu magmablas/ztrmv.cu magmablas/ztrsm.cu magmablas/ztrsv.cu magmablas/ztrtri_diag.cu magmablas/ztrtri_lower.cu magmablas/ztrtri_lower_batched.cu magmablas/ztrtri_upper.cu magmablas/ztrtri_upper_batched.cu magmablas/magmablas_z_v1.cpp magmablas/magmablas_zc_v1.cpp magmablas/zbcyclic.cpp magmablas/zgetmatrix_transpose_mgpu.cpp magmablas/zsetmatrix_transpose_mgpu.cpp magmablas/zhemv_mgpu.cu magmablas/zhemv_mgpu_upper.cu magmablas/zhemm_mgpu.cpp 
magmablas/zher2k_mgpu.cpp magmablas/zherk_mgpu.cpp magmablas/zgetf2.cu magmablas/zgeqr2.cpp magmablas/zgeqr2x_gpu-v4.cu magmablas/zpotf2.cu magmablas/zgetf2_native_kernel.cu magmablas/zhetrs.cu magmablas/zgeadd_batched.cu magmablas/zgemm_batched.cpp magmablas/cgemm_batched_core.cu magmablas/dgemm_batched_core.cu magmablas/sgemm_batched_core.cu magmablas/zgemm_batched_core.cu magmablas/zgemm_batched_smallsq.cu magmablas/cgemv_batched_core.cu magmablas/dgemv_batched_core.cu magmablas/sgemv_batched_core.cu magmablas/zgemv_batched_core.cu magmablas/zhemv_batched_core.cu magmablas/zgeqr2_batched.cu magmablas/zgeqrf_batched_smallsq.cu magmablas/zgerbt_func_batched.cu magmablas/zgetf2_kernels.cu magmablas/zgetrf_batched_smallsq_noshfl.cu magmablas/zgetrf_batched_smallsq_shfl.cu magmablas/getrf_setup_pivinfo.cu magmablas/zhemm_batched_core.cu magmablas/zher2k_batched.cpp magmablas/zherk_batched.cpp magmablas/cherk_batched_core.cu magmablas/zherk_batched_core.cu magmablas/zlaswp_batched.cu magmablas/zpotf2_kernels.cu magmablas/set_pointer.cu magmablas/zset_pointer.cu magmablas/zsyr2k_batched.cpp magmablas/dsyrk_batched_core.cu magmablas/ssyrk_batched_core.cu magmablas/ztrmm_batched_core.cu magmablas/ztrsm_batched.cpp magmablas/ztrsm_batched_core.cpp magmablas/ztrsm_small_batched.cu magmablas/ztrsv_batched.cu magmablas/ztrtri_diag_batched.cu magmablas/zgetf2_nopiv_kernels.cu magmablas/zgemm_vbatched_core.cu magmablas/cgemm_vbatched_core.cu magmablas/dgemm_vbatched_core.cu magmablas/sgemm_vbatched_core.cu magmablas/zgemv_vbatched_core.cu magmablas/cgemv_vbatched_core.cu magmablas/dgemv_vbatched_core.cu magmablas/sgemv_vbatched_core.cu magmablas/zhemm_vbatched_core.cu magmablas/zhemv_vbatched_core.cu magmablas/cherk_vbatched_core.cu magmablas/zherk_vbatched_core.cu magmablas/ssyrk_vbatched_core.cu magmablas/dsyrk_vbatched_core.cu magmablas/ztrmm_vbatched_core.cu magmablas/ztrsm_vbatched_core.cu magmablas/ztrtri_diag_vbatched.cu magmablas/zgemm_vbatched.cpp 
magmablas/zgemv_vbatched.cpp magmablas/zhemm_vbatched.cpp magmablas/zhemv_vbatched.cpp magmablas/zher2k_vbatched.cpp magmablas/zherk_vbatched.cpp magmablas/zsyr2k_vbatched.cpp magmablas/zsyrk_vbatched.cpp magmablas/ztrmm_vbatched.cpp magmablas/ztrsm_vbatched.cpp magmablas/zpotf2_kernels_var.cu magmablas/prefix_sum.cu magmablas/vbatched_aux.cu magmablas/vbatched_check.cu magmablas/blas_zbatched.cpp magmablas/hgemm_batched_core.cu magmablas/slag2h.cu magmablas/hlag2s.cu magmablas/hlaconvert.cu magmablas/hlaswp.cu magmablas/hset_pointer.cu --- > # auto-generated by codegen.py $(libmagma_old), Thu Feb 11 16:18:36 2021 > libmagma_old := control/magma_f77.cpp control/magma_param.F90 control/magma.F90 control/abs.cpp control/affinity.cpp control/auxiliary.cpp control/constants.cpp control/get_batched_crossover.cpp control/get_batched_gemm_decision.cpp control/get_nb.cpp control/get_ntcol.cpp control/magma_bulge.cpp control/magma_threadsetting.cpp control/magma_timer.cpp control/magma_winthread.cpp control/magma_yield.cpp control/magma_zauxiliary.cpp control/magma_zbulge.cpp control/magma_znan_inf.cpp control/pthread_barrier.cpp control/sqrt.cpp control/strlcpy.cpp control/thread_queue.cpp control/trace.cpp control/xerbla.cpp control/zpanel_to_q.cpp control/zprint.cpp control/magma_sf77.cpp control/magma_df77.cpp control/magma_cf77.cpp control/magma_zf77.cpp control/magma_sfortran.F90 control/magma_dfortran.F90 control/magma_cfortran.F90 control/magma_zfortran.F90 control/magmablas_sf77.cpp control/magmablas_df77.cpp control/magmablas_cf77.cpp control/magmablas_zf77.cpp control/magmablas_sfortran.F90 control/magmablas_dfortran.F90 control/magmablas_cfortran.F90 control/magmablas_zfortran.F90 src/cblas_z.cpp src/zcposv_gpu.cpp src/zposv_gpu.cpp src/zpotrf_gpu.cpp src/zpotri_gpu.cpp src/zpotrs_gpu.cpp src/zlauum_gpu.cpp src/ztrtri_gpu.cpp src/zpotrf_mgpu.cpp src/zpotrf_mgpu_right.cpp src/zpotrf3_mgpu.cpp src/zposv.cpp src/zpotrf.cpp src/zpotri.cpp src/zlauum.cpp 
src/ztrtri.cpp src/zpotrf_m.cpp src/zcgesv_gpu.cpp src/zcgetrs_gpu.cpp src/dgmres_plu_gpu.cpp src/dxgesv_gmres_gpu.cpp src/xshgetrf_gpu.cpp src/xhsgetrf_gpu.cpp src/zgerfs_nopiv_gpu.cpp src/zgesv_gpu.cpp src/zgesv_nopiv_gpu.cpp src/zgetrf_gpu.cpp src/zgetrf_nopiv_gpu.cpp src/zgetri_gpu.cpp src/zgetrs_gpu.cpp src/zgetrs_nopiv_gpu.cpp src/zgetrf_mgpu.cpp src/zgetrf2_mgpu.cpp src/zgerbt_gpu.cpp src/zgesv.cpp src/zgesv_rbt.cpp src/zgetrf.cpp src/zgetf2_nopiv.cpp src/zgetrf_nopiv.cpp src/zgetrf_m.cpp src/zcgeqrsv_gpu.cpp src/zgelqf_gpu.cpp src/zgels3_gpu.cpp src/zgels_gpu.cpp src/zgegqr_gpu.cpp src/zgeqrf2_gpu.cpp src/zgeqrf3_gpu.cpp src/zgeqrf_gpu.cpp src/zgeqr2x_gpu.cpp src/zgeqr2x_gpu-v2.cpp src/zgeqr2x_gpu-v3.cpp src/zgeqrs3_gpu.cpp src/zgeqrs_gpu.cpp src/zlarfb_gpu.cpp src/zlarfb_gpu_gemm.cpp src/zungqr_gpu.cpp src/zunmql2_gpu.cpp src/zunmqr2_gpu.cpp src/zunmqr_gpu.cpp src/zgeqrf_mgpu.cpp src/zgeqp3_gpu.cpp src/zlaqps_gpu.cpp src/zgelqf.cpp src/zgels.cpp src/zgeqlf.cpp src/zgeqrf.cpp src/zgeqrf_ooc.cpp src/zgglse.cpp src/zggrqf.cpp src/zunglq.cpp src/zungqr.cpp src/zungqr2.cpp src/zunmlq.cpp src/zunmql.cpp src/zunmqr.cpp src/zunmrq.cpp src/zgeqp3.cpp src/zlaqps.cpp src/zgeqrf_m.cpp src/zungqr_m.cpp src/zunmqr_m.cpp src/zhetrf_gpu.cpp src/zchesv_gpu.cpp src/zhesv.cpp src/zhetrf.cpp src/dsidi.cpp src/zhetrf_aasen.cpp src/zhetrf_nopiv.cpp src/zhetrf_nopiv_cpu.cpp src/zsytrf_nopiv_cpu.cpp src/zhetrf_nopiv_gpu.cpp src/zsytrf_nopiv_gpu.cpp src/zhetrs_nopiv_gpu.cpp src/zsytrs_nopiv_gpu.cpp src/zhesv_nopiv_gpu.cpp src/zsysv_nopiv_gpu.cpp src/zlahef_gpu.cpp src/dsyevd_gpu.cpp src/dsyevdx_gpu.cpp src/zheevd_gpu.cpp src/zheevdx_gpu.cpp src/zheevr_gpu.cpp src/zheevx_gpu.cpp src/zhetrd2_gpu.cpp src/zhetrd_gpu.cpp src/zunmtr_gpu.cpp src/dsyevd.cpp src/dsyevdx.cpp src/zheevd.cpp src/zheevdx.cpp src/zheevr.cpp src/zheevx.cpp src/dlaex0.cpp src/dlaex1.cpp src/dlaex3.cpp src/dmove_eig.cpp src/dstedx.cpp src/zhetrd.cpp src/zlatrd.cpp src/zlatrd2.cpp src/zstedx.cpp src/zungtr.cpp 
src/zunmtr.cpp src/zhetrd_mgpu.cpp src/zlatrd_mgpu.cpp src/dsyevd_m.cpp src/zheevd_m.cpp src/dsyevdx_m.cpp src/zheevdx_m.cpp src/dlaex0_m.cpp src/dlaex1_m.cpp src/dlaex3_m.cpp src/dstedx_m.cpp src/zstedx_m.cpp src/zunmtr_m.cpp src/zbulge_applyQ_v2.cpp src/zhetrd_he2hb.cpp src/zhetrd_hb2st.cpp src/zbulge_back.cpp src/zungqr_2stage_gpu.cpp src/zunmqr_2stage_gpu.cpp src/zhegvdx_2stage.cpp src/zheevdx_2stage.cpp src/zbulge_back_m.cpp src/zbulge_applyQ_v2_m.cpp src/zheevdx_2stage_m.cpp src/zhegvdx_2stage_m.cpp src/zhetrd_he2hb_mgpu.cpp src/core_zlarfy.cpp src/core_zhbtype1cb.cpp src/core_zhbtype2cb.cpp src/core_zhbtype3cb.cpp src/dsygvd.cpp src/dsygvdx.cpp src/zhegst.cpp src/zhegvd.cpp src/zhegvdx.cpp src/zhegvr.cpp src/zhegvx.cpp src/zhegst_gpu.cpp src/zhegst_m.cpp src/dsygvd_m.cpp src/zhegvd_m.cpp src/dsygvdx_m.cpp src/zhegvdx_m.cpp src/ztrsm_m.cpp src/dgeev.cpp src/zgeev.cpp src/zgehrd.cpp src/zgehrd2.cpp src/zlahr2.cpp src/zlahru.cpp src/dlaln2.cpp src/dlaqtrsd.cpp src/zlatrsd.cpp src/dtrevc3.cpp src/dtrevc3_mt.cpp src/ztrevc3.cpp src/ztrevc3_mt.cpp src/zunghr.cpp src/dgeev_m.cpp src/zgeev_m.cpp src/zgehrd_m.cpp src/zlahr2_m.cpp src/zlahru_m.cpp src/zunghr_m.cpp src/dgesdd.cpp src/zgesdd.cpp src/dgesvd.cpp src/zgesvd.cpp src/zgebrd.cpp src/zlabrd_gpu.cpp src/zungbr.cpp src/zunmbr.cpp src/zgetf2_batched.cpp src/zgetf2_nopiv_batched.cpp src/zgetrf_panel_batched.cpp src/zgetrf_panel_nopiv_batched.cpp src/zgetrf_batched.cpp src/zgetrf_nopiv_batched.cpp src/zgetrs_batched.cpp src/zgetrs_nopiv_batched.cpp src/zgesv_batched.cpp src/zgesv_nopiv_batched.cpp src/zgerbt_batched.cpp src/zgesv_rbt_batched.cpp src/zgetri_outofplace_batched.cpp src/zpotf2_batched.cpp src/zpotrf_batched.cpp src/zpotrf_panel_batched.cpp src/zpotrs_batched.cpp src/zposv_batched.cpp src/zlarft_batched.cpp src/zlarfb_gemm_batched.cpp src/zgeqrf_panel_batched.cpp src/zgeqrf_batched.cpp src/zgeqrf_expert_batched.cpp src/zpotf2_vbatched.cpp src/zpotrf_panel_vbatched.cpp src/zpotrf_vbatched.cpp 
src/zgetf2_native.cpp src/zgetrf_panel_native.cpp src/zpotrf_panel_native.cpp interface_cuda/alloc.cpp interface_cuda/blas_h_v2.cpp interface_cuda/blas_z_v1.cpp interface_cuda/blas_z_v2.cpp interface_cuda/copy_v1.cpp interface_cuda/copy_v2.cpp interface_cuda/error.cpp interface_cuda/connection_mgpu.cpp interface_cuda/interface.cpp interface_cuda/interface_v1.cpp magmablas/zaxpycp.cu magmablas/zcaxpycp.cu magmablas/zdiinertia.cu magmablas/zgeadd.cu magmablas/zgeadd2.cu magmablas/zgeam.cu magmablas/zgemm_fermi.cu magmablas/zgemm_reduce.cu magmablas/zgemv_conj.cu magmablas/zgemv_fermi.cu magmablas/zgerbt.cu magmablas/zgerbt_kernels.cu magmablas/zgetmatrix_transpose.cpp magmablas/zhemm.cu magmablas/zhemv.cu magmablas/zhemv_upper.cu magmablas/zher2k.cpp magmablas/zherk.cpp magmablas/zherk_small_reduce.cu magmablas/zlacpy.cu magmablas/zlacpy_conj.cu magmablas/zlacpy_sym_in.cu magmablas/zlacpy_sym_out.cu magmablas/zlag2c.cu magmablas/clag2z.cu magmablas/zlange.cu magmablas/zlanhe.cu magmablas/zlaqps2_gpu.cu magmablas/zlarf.cu magmablas/zlarfbx.cu magmablas/zlarfg-v2.cu magmablas/zlarfg.cu magmablas/zlarfgx-v2.cu magmablas/zlarft_kernels.cu magmablas/zlarfx.cu magmablas/zlascl.cu magmablas/zlascl2.cu magmablas/zlascl_2x2.cu magmablas/zlascl_diag.cu magmablas/zlaset.cu magmablas/zlaset_band.cu magmablas/zlaswp.cu magmablas/zclaswp.cu magmablas/zlaswp_sym.cu magmablas/zlat2c.cu magmablas/clat2z.cu magmablas/dznrm2.cu magmablas/zsetmatrix_transpose.cpp magmablas/zswap.cu magmablas/zswapblk.cu magmablas/zswapdblk.cu magmablas/zsymm.cu magmablas/zsymmetrize.cu magmablas/zsymmetrize_tiles.cu magmablas/zsymv.cu magmablas/zsymv_upper.cu magmablas/ztranspose.cu magmablas/ztranspose_conj.cu magmablas/ztranspose_conj_inplace.cu magmablas/ztranspose_inplace.cu magmablas/ztrmm.cu magmablas/ztrmv.cu magmablas/ztrsm.cu magmablas/ztrsv.cu magmablas/ztrtri_diag.cu magmablas/ztrtri_lower.cu magmablas/ztrtri_lower_batched.cu magmablas/ztrtri_upper.cu magmablas/ztrtri_upper_batched.cu 
magmablas/magmablas_z_v1.cpp magmablas/magmablas_zc_v1.cpp magmablas/zbcyclic.cpp magmablas/zgetmatrix_transpose_mgpu.cpp magmablas/zsetmatrix_transpose_mgpu.cpp magmablas/zhemv_mgpu.cu magmablas/zhemv_mgpu_upper.cu magmablas/zhemm_mgpu.cpp magmablas/zher2k_mgpu.cpp magmablas/zherk_mgpu.cpp magmablas/zgetf2.cu magmablas/zgeqr2.cpp magmablas/zgeqr2x_gpu-v4.cu magmablas/zpotf2.cu magmablas/zgetf2_native_kernel.cu magmablas/zhetrs.cu magmablas/zgeadd_batched.cu magmablas/zgemm_batched.cpp magmablas/cgemm_batched_core.cu magmablas/dgemm_batched_core.cu magmablas/sgemm_batched_core.cu magmablas/zgemm_batched_core.cu magmablas/zgemm_batched_smallsq.cu magmablas/cgemv_batched_core.cu magmablas/dgemv_batched_core.cu magmablas/sgemv_batched_core.cu magmablas/zgemv_batched_core.cu magmablas/zhemv_batched_core.cu magmablas/zgeqr2_batched.cu magmablas/zgeqrf_batched_smallsq.cu magmablas/zgerbt_func_batched.cu magmablas/zgetf2_kernels.cu magmablas/zgetrf_batched_smallsq_noshfl.cu magmablas/zgetrf_batched_smallsq_shfl.cu magmablas/getrf_setup_pivinfo.cu magmablas/zhemm_batched_core.cu magmablas/zher2k_batched.cpp magmablas/zherk_batched.cpp magmablas/cherk_batched_core.cu magmablas/zherk_batched_core.cu magmablas/zlaswp_batched.cu magmablas/zpotf2_kernels.cu magmablas/set_pointer.cu magmablas/zset_pointer.cu magmablas/zsyr2k_batched.cpp magmablas/dsyrk_batched_core.cu magmablas/ssyrk_batched_core.cu magmablas/ztrmm_batched_core.cu magmablas/ztrsm_batched.cpp magmablas/ztrsm_batched_core.cpp magmablas/ztrsm_small_batched.cu magmablas/ztrsv_batched.cu magmablas/ztrtri_diag_batched.cu magmablas/zgetf2_nopiv_kernels.cu magmablas/zgemm_vbatched_core.cu magmablas/cgemm_vbatched_core.cu magmablas/dgemm_vbatched_core.cu magmablas/sgemm_vbatched_core.cu magmablas/zgemv_vbatched_core.cu magmablas/cgemv_vbatched_core.cu magmablas/dgemv_vbatched_core.cu magmablas/sgemv_vbatched_core.cu magmablas/zhemm_vbatched_core.cu magmablas/zhemv_vbatched_core.cu magmablas/cherk_vbatched_core.cu 
magmablas/zherk_vbatched_core.cu magmablas/ssyrk_vbatched_core.cu magmablas/dsyrk_vbatched_core.cu magmablas/ztrmm_vbatched_core.cu magmablas/ztrsm_vbatched_core.cu magmablas/ztrtri_diag_vbatched.cu magmablas/zgemm_vbatched.cpp magmablas/zgemv_vbatched.cpp magmablas/zhemm_vbatched.cpp magmablas/zhemv_vbatched.cpp magmablas/zher2k_vbatched.cpp magmablas/zherk_vbatched.cpp magmablas/zsyr2k_vbatched.cpp magmablas/zsyrk_vbatched.cpp magmablas/ztrmm_vbatched.cpp magmablas/ztrsm_vbatched.cpp magmablas/zpotf2_kernels_var.cu magmablas/prefix_sum.cu magmablas/vbatched_aux.cu magmablas/vbatched_check.cu magmablas/blas_zbatched.cpp magmablas/hgemm_batched_core.cu magmablas/slag2h.cu magmablas/hlag2s.cu magmablas/hlaconvert.cu magmablas/hlaswp.cu magmablas/hset_pointer.cu 50,67d49 < interface_cuda/blas_s_v1.cpp: interface_cuda/blas_z_v1.cpp < $(codegen) -p s $< < < interface_cuda/blas_d_v1.cpp: interface_cuda/blas_z_v1.cpp < $(codegen) -p d $< < < interface_cuda/blas_c_v1.cpp: interface_cuda/blas_z_v1.cpp < $(codegen) -p c $< < < interface_cuda/blas_s_v2.cpp: interface_cuda/blas_z_v2.cpp < $(codegen) -p s $< < < interface_cuda/blas_d_v2.cpp: interface_cuda/blas_z_v2.cpp < $(codegen) -p d $< < < interface_cuda/blas_c_v2.cpp: interface_cuda/blas_z_v2.cpp < $(codegen) -p c $< < 1663a1646,1663 > interface_cuda/blas_s_v1.cpp: interface_cuda/blas_z_v1.cpp > $(codegen) -p s $< > > interface_cuda/blas_d_v1.cpp: interface_cuda/blas_z_v1.cpp > $(codegen) -p d $< > > interface_cuda/blas_c_v1.cpp: interface_cuda/blas_z_v1.cpp > $(codegen) -p c $< > > interface_cuda/blas_s_v2.cpp: interface_cuda/blas_z_v2.cpp > $(codegen) -p s $< > > interface_cuda/blas_d_v2.cpp: interface_cuda/blas_z_v2.cpp > $(codegen) -p d $< > > interface_cuda/blas_c_v2.cpp: interface_cuda/blas_z_v2.cpp > $(codegen) -p c $< > 2715d2714 < control/connection_mgpu.cpp \ 2753,2761d2751 < interface_cuda/alloc.cpp \ < interface_cuda/blas_h_v2.cpp \ < interface_cuda/blas_z_v1.cpp \ < interface_cuda/blas_z_v2.cpp \ < 
interface_cuda/copy_v1.cpp \ < interface_cuda/copy_v2.cpp \ < interface_cuda/error.cpp \ < interface_cuda/interface.cpp \ < interface_cuda/interface_v1.cpp \ 2984a2975,2984 > interface_cuda/alloc.cpp \ > interface_cuda/blas_h_v2.cpp \ > interface_cuda/blas_z_v1.cpp \ > interface_cuda/blas_z_v2.cpp \ > interface_cuda/copy_v1.cpp \ > interface_cuda/copy_v2.cpp \ > interface_cuda/error.cpp \ > interface_cuda/connection_mgpu.cpp \ > interface_cuda/interface.cpp \ > interface_cuda/interface_v1.cpp \ 3163,3168d3162 < interface_cuda/blas_s_v1.cpp \ < interface_cuda/blas_d_v1.cpp \ < interface_cuda/blas_c_v1.cpp \ < interface_cuda/blas_s_v2.cpp \ < interface_cuda/blas_d_v2.cpp \ < interface_cuda/blas_c_v2.cpp \ 3700a3695,3700 > interface_cuda/blas_s_v1.cpp \ > interface_cuda/blas_d_v1.cpp \ > interface_cuda/blas_c_v1.cpp \ > interface_cuda/blas_s_v2.cpp \ > interface_cuda/blas_d_v2.cpp \ > interface_cuda/blas_c_v2.cpp \ 4061c4061 < # auto-generated by codegen.py $(libmagma_dynamic_old), Thu Feb 11 16:58:18 2021 --- > # auto-generated by codegen.py $(libmagma_dynamic_old), Thu Feb 11 16:18:37 2021 4081c4081 < # auto-generated by codegen.py $(libtest_old), Thu Feb 11 16:58:18 2021 --- > # auto-generated by codegen.py $(libtest_old), Thu Feb 11 16:18:37 2021 4127c4127 < # auto-generated by codegen.py $(liblapacktest_old), Thu Feb 11 16:58:18 2021 --- > # auto-generated by codegen.py $(liblapacktest_old), Thu Feb 11 16:18:37 2021 4198c4198 < # auto-generated by codegen.py $(testing_old), Thu Feb 11 16:58:18 2021 --- > # auto-generated by codegen.py $(testing_old), Thu Feb 11 16:18:37 2021 5908,5909c5908,5909 < # auto-generated by codegen.py $(libsparse_old), Thu Feb 11 16:58:19 2021 < libsparse_old := sparse/blas/magma_z_blaswrapper.cpp sparse/blas/zbajac_csr.cu sparse/blas/zbajac_csr_overlap.cu sparse/blas/zgeaxpy.cu sparse/blas/zgecsr5mv.cu sparse/blas/zgecsrmv.cu sparse/blas/zgeellmv.cu sparse/blas/zgeelltmv.cu sparse/blas/zgeellrtmv.cu sparse/blas/zgesellcmv.cu 
sparse/blas/zgesellcmmv.cu sparse/blas/zjacobisetup.cu sparse/blas/zlobpcg_shift.cu sparse/blas/zlobpcg_residuals.cu sparse/blas/zlobpcg_maxpy.cu sparse/blas/zmdotc.cu sparse/blas/zgemvmdot.cu sparse/blas/zmdot_shfl.cu sparse/blas/zmergebicgstab2.cu sparse/blas/zmergebicgstab3.cu sparse/blas/zmergeidr.cu sparse/blas/zmergecg.cu sparse/blas/zmergecgs.cu sparse/blas/zmergeqmr.cu sparse/blas/zmergebicgstab.cu sparse/blas/zmergetfqmr.cu sparse/blas/zmgecsrmv.cu sparse/blas/zmgeellmv.cu sparse/blas/zmgeelltmv.cu sparse/blas/zmgesellcmmv.cu sparse/blas/zpipelinedgmres.cu sparse/blas/zilu.cpp sparse/blas/magma_zcuspmm.cpp sparse/blas/magma_zcuspaxpy.cpp sparse/blas/zcgecsrmv_mixed_prec.cu sparse/blas/zparilu.cpp sparse/blas/zparilu_kernels.cu sparse/blas/zparic_kernels.cu sparse/blas/zparilut_kernels.cu sparse/blas/zparilut_candidates.cu sparse/blas/magma_zthrsrm.cu sparse/blas/magma_zpreselect.cu sparse/blas/magma_zsampleselect.cu sparse/blas/zcompact.cu sparse/blas/magma_zmcsrcompressor_gpu.cu sparse/blas/magma_zdiagcheck.cu sparse/blas/zgecsrreimsplit.cu sparse/blas/zgedensereimsplit.cu sparse/blas/magma_zmconjugate.cu sparse/blas/magma_zget_rowptr.cu sparse/blas/magma_zmatrixtools_gpu.cu sparse/blas/zjaccard_weights.cu sparse/blas/zgeisai_trsv.cu sparse/blas/zgeisai_maxblock.cu sparse/blas/zgeisai_batched32.cu sparse/blas/zge3pt.cu sparse/blas/zmergeblockkrylov.cu sparse/blas/zgecscsyncfreetrsm.cu sparse/control/error.cpp sparse/control/magma_zdomainoverlap.cpp sparse/control/magma_zutil_sparse.cpp sparse/control/magma_zfree.cpp sparse/control/magma_zmatrixchar.cpp sparse/control/magma_zmconvert.cpp sparse/control/magma_zmgenerator.cpp sparse/control/magma_zmio.cpp sparse/control/magma_zsolverinfo.cpp sparse/control/magma_zcsrsplit.cpp sparse/control/magma_zpariluutils.cpp sparse/control/magma_zmcsrpass.cpp sparse/control/magma_zmcsrpass_gpu.cpp sparse/control/magma_zmcsrcompressor.cpp sparse/control/magma_zmscale.cpp sparse/control/magma_zmshrink.cpp 
sparse/control/magma_zmslice.cpp sparse/control/magma_zmdiagdom.cpp sparse/control/magma_zmdiff.cpp sparse/control/magma_zmlumerge.cpp sparse/control/magma_zmtranspose.cpp sparse/control/magma_zmtranspose_cpu.cpp sparse/control/magma_zmtransfer.cpp sparse/control/magma_zmilustruct.cpp sparse/control/magma_zselect.cpp sparse/control/magma_zsort.cpp sparse/control/magma_zvinit.cpp sparse/control/magma_zvio.cpp sparse/control/magma_zvtranspose.cpp sparse/control/magma_zvpass.cpp sparse/control/magma_zvpass_gpu.cpp sparse/control/mmio.cpp sparse/control/magma_zgeisai_tools.cpp sparse/control/magma_zmsupernodal.cpp sparse/control/magma_zmfrobenius.cpp sparse/control/magma_zmatrix_tools.cpp sparse/control/magma_zparilu_kernels.cpp sparse/control/magma_zparic_kernels.cpp sparse/control/magma_zparilut_kernels.cpp sparse/control/magma_zparilut_tools.cpp sparse/control/magma_zparict_tools.cpp sparse/src/zcg.cpp sparse/src/zcg_res.cpp sparse/src/zcg_merge.cpp sparse/src/zpcg_merge.cpp sparse/src/zbicgstab.cpp sparse/src/zbicg.cpp sparse/src/zpbicg.cpp sparse/src/zbicgstab_merge.cpp sparse/src/zbicgstab_merge2.cpp sparse/src/zbicgstab_merge3.cpp sparse/src/zqmr.cpp sparse/src/zqmr_merge.cpp sparse/src/ztfqmr.cpp sparse/src/ztfqmr_unrolled.cpp sparse/src/ztfqmr_merge.cpp sparse/src/zpqmr.cpp sparse/src/zpqmr_merge.cpp sparse/src/zptfqmr.cpp sparse/src/zptfqmr_merge.cpp sparse/src/zidr.cpp sparse/src/zidr_merge.cpp sparse/src/zidr_strms.cpp sparse/src/ziterref.cpp sparse/src/zftjacobi.cpp sparse/src/zjacobi.cpp sparse/src/zbaiter.cpp sparse/src/zbaiter_overlap.cpp sparse/src/zpcg.cpp sparse/src/zcgs.cpp sparse/src/zcgs_merge.cpp sparse/src/zpcgs.cpp sparse/src/zpcgs_merge.cpp sparse/src/zbpcg.cpp sparse/src/zfgmres.cpp sparse/src/zpbicgstab.cpp sparse/src/zpidr.cpp sparse/src/zpidr_merge.cpp sparse/src/zpidr_strms.cpp sparse/src/zbombard.cpp sparse/src/zbombard_merge.cpp sparse/src/zpbicgstab_merge.cpp sparse/src/zlobpcg.cpp sparse/src/zlsqr.cpp sparse/src/zcustomic.cpp 
sparse/src/zcustomilu.cpp sparse/src/zparilu_gpu.cpp sparse/src/zparilu_cpu.cpp sparse/src/zparic_gpu.cpp sparse/src/zparic_cpu.cpp sparse/src/zparilut_gpu.cpp sparse/src/zparilut_cpu.cpp sparse/src/zparict_cpu.cpp sparse/src/zparilut.cpp sparse/src/zparict.cpp sparse/src/zgeisai_apply.cpp sparse/src/zgeisai_lower.cpp sparse/src/zgeisai_upper.cpp sparse/src/magma_zqr_wrapper.cpp sparse/src/magma_zcustomspmv.cpp sparse/src/magma_zcustomprecond.cpp sparse/src/magma_z_precond_wrapper.cpp sparse/src/magma_z_solver_wrapper.cpp sparse/src/zresidual.cpp sparse/src/zresidualvec.cpp sparse/src/zjacobidomainoverlap.cpp --- > # auto-generated by codegen.py $(libsparse_old), Thu Feb 11 16:18:38 2021 > libsparse_old := sparse/blas/magma_z_blaswrapper.cpp sparse/blas/zbajac_csr.cu sparse/blas/zbajac_csr_overlap.cu sparse/blas/zgeaxpy.cu sparse/blas/zgecsr5mv.cu sparse/blas/zgecsrmv.cu sparse/blas/zgeellmv.cu sparse/blas/zgeelltmv.cu sparse/blas/zgeellrtmv.cu sparse/blas/zgesellcmv.cu sparse/blas/zgesellcmmv.cu sparse/blas/zjacobisetup.cu sparse/blas/zlobpcg_shift.cu sparse/blas/zlobpcg_residuals.cu sparse/blas/zlobpcg_maxpy.cu sparse/blas/zmdotc.cu sparse/blas/zgemvmdot.cu sparse/blas/zmdot_shfl.cu sparse/blas/zmergebicgstab2.cu sparse/blas/zmergebicgstab3.cu sparse/blas/zmergeidr.cu sparse/blas/zmergecg.cu sparse/blas/zmergecgs.cu sparse/blas/zmergeqmr.cu sparse/blas/zmergebicgstab.cu sparse/blas/zmergetfqmr.cu sparse/blas/zmgecsrmv.cu sparse/blas/zmgeellmv.cu sparse/blas/zmgeelltmv.cu sparse/blas/zmgesellcmmv.cu sparse/blas/zpipelinedgmres.cu sparse/blas/zilu.cpp sparse/blas/magma_zcuspmm.cpp sparse/blas/magma_zcuspaxpy.cpp sparse/blas/zcgecsrmv_mixed_prec.cu sparse/blas/zparilu.cpp sparse/blas/zparilu_kernels.cu sparse/blas/zparic_kernels.cu sparse/blas/zparilut_kernels.cu sparse/blas/zparilut_candidates.cu sparse/blas/magma_zthrsrm.cu sparse/blas/magma_zpreselect.cu sparse/blas/magma_zsampleselect.cu sparse/blas/magma_zsampleselect_nodp.cu sparse/blas/zcompact.cu 
sparse/blas/magma_zmcsrcompressor_gpu.cu sparse/blas/magma_zdiagcheck.cu sparse/blas/zgecsrreimsplit.cu sparse/blas/zgedensereimsplit.cu sparse/blas/magma_zmconjugate.cu sparse/blas/magma_zget_rowptr.cu sparse/blas/magma_zmatrixtools_gpu.cu sparse/blas/zjaccard_weights.cu sparse/blas/zgeisai_trsv.cu sparse/blas/zgeisai_maxblock.cu sparse/blas/zgeisai_batched32.cu sparse/blas/zge3pt.cu sparse/blas/zmergeblockkrylov.cu sparse/blas/zgecscsyncfreetrsm.cu sparse/control/error.cpp sparse/control/magma_zdomainoverlap.cpp sparse/control/magma_zutil_sparse.cpp sparse/control/magma_zfree.cpp sparse/control/magma_zmatrixchar.cpp sparse/control/magma_zmconvert.cpp sparse/control/magma_zmgenerator.cpp sparse/control/magma_zmio.cpp sparse/control/magma_zsolverinfo.cpp sparse/control/magma_zcsrsplit.cpp sparse/control/magma_zpariluutils.cpp sparse/control/magma_zmcsrpass.cpp sparse/control/magma_zmcsrpass_gpu.cpp sparse/control/magma_zmcsrcompressor.cpp sparse/control/magma_zmscale.cpp sparse/control/magma_zmshrink.cpp sparse/control/magma_zmslice.cpp sparse/control/magma_zmdiagdom.cpp sparse/control/magma_zmdiff.cpp sparse/control/magma_zmlumerge.cpp sparse/control/magma_zmtranspose.cpp sparse/control/magma_zmtranspose_cpu.cpp sparse/control/magma_zmtransfer.cpp sparse/control/magma_zmilustruct.cpp sparse/control/magma_zselect.cpp sparse/control/magma_zsort.cpp sparse/control/magma_zvinit.cpp sparse/control/magma_zvio.cpp sparse/control/magma_zvtranspose.cpp sparse/control/magma_zvpass.cpp sparse/control/magma_zvpass_gpu.cpp sparse/control/mmio.cpp sparse/control/magma_zgeisai_tools.cpp sparse/control/magma_zmsupernodal.cpp sparse/control/magma_zmfrobenius.cpp sparse/control/magma_zmatrix_tools.cpp sparse/control/magma_zparilu_kernels.cpp sparse/control/magma_zparic_kernels.cpp sparse/control/magma_zparilut_kernels.cpp sparse/control/magma_zparilut_tools.cpp sparse/control/magma_zparict_tools.cpp sparse/src/zcg.cpp sparse/src/zcg_res.cpp sparse/src/zcg_merge.cpp 
sparse/src/zpcg_merge.cpp sparse/src/zbicgstab.cpp sparse/src/zbicg.cpp sparse/src/zpbicg.cpp sparse/src/zbicgstab_merge.cpp sparse/src/zbicgstab_merge2.cpp sparse/src/zbicgstab_merge3.cpp sparse/src/zqmr.cpp sparse/src/zqmr_merge.cpp sparse/src/ztfqmr.cpp sparse/src/ztfqmr_unrolled.cpp sparse/src/ztfqmr_merge.cpp sparse/src/zpqmr.cpp sparse/src/zpqmr_merge.cpp sparse/src/zptfqmr.cpp sparse/src/zptfqmr_merge.cpp sparse/src/zidr.cpp sparse/src/zidr_merge.cpp sparse/src/zidr_strms.cpp sparse/src/ziterref.cpp sparse/src/zftjacobi.cpp sparse/src/zjacobi.cpp sparse/src/zbaiter.cpp sparse/src/zbaiter_overlap.cpp sparse/src/zpcg.cpp sparse/src/zcgs.cpp sparse/src/zcgs_merge.cpp sparse/src/zpcgs.cpp sparse/src/zpcgs_merge.cpp sparse/src/zbpcg.cpp sparse/src/zfgmres.cpp sparse/src/zpbicgstab.cpp sparse/src/zpidr.cpp sparse/src/zpidr_merge.cpp sparse/src/zpidr_strms.cpp sparse/src/zbombard.cpp sparse/src/zbombard_merge.cpp sparse/src/zpbicgstab_merge.cpp sparse/src/zlobpcg.cpp sparse/src/zlsqr.cpp sparse/src/zcustomic.cpp sparse/src/zcustomilu.cpp sparse/src/zparilu_gpu.cpp sparse/src/zparilu_cpu.cpp sparse/src/zparic_gpu.cpp sparse/src/zparic_cpu.cpp sparse/src/zparilut_gpu_nodp.cpp sparse/src/zparilut_gpu.cpp sparse/src/zparilut_cpu.cpp sparse/src/zparict_cpu.cpp sparse/src/zparilut.cpp sparse/src/zparict.cpp sparse/src/zgeisai_apply.cpp sparse/src/zgeisai_lower.cpp sparse/src/zgeisai_upper.cpp sparse/src/magma_zqr_wrapper.cpp sparse/src/magma_zcustomspmv.cpp sparse/src/magma_zcustomprecond.cpp sparse/src/magma_z_precond_wrapper.cpp sparse/src/magma_z_solver_wrapper.cpp sparse/src/zresidual.cpp sparse/src/zresidualvec.cpp sparse/src/zjacobidomainoverlap.cpp 6291a6292,6300 > sparse/blas/magma_ssampleselect_nodp.cu: sparse/blas/magma_zsampleselect_nodp.cu > $(codegen) -p s $< > > sparse/blas/magma_dsampleselect_nodp.cu: sparse/blas/magma_zsampleselect_nodp.cu > $(codegen) -p d $< > > sparse/blas/magma_csampleselect_nodp.cu: sparse/blas/magma_zsampleselect_nodp.cu > 
$(codegen) -p c $< > 7218a7228,7236 > sparse/src/sparilut_gpu_nodp.cpp: sparse/src/zparilut_gpu_nodp.cpp > $(codegen) -p s $< > > sparse/src/dparilut_gpu_nodp.cpp: sparse/src/zparilut_gpu_nodp.cpp > $(codegen) -p d $< > > sparse/src/cparilut_gpu_nodp.cpp: sparse/src/zparilut_gpu_nodp.cpp > $(codegen) -p c $< > 7406a7425 > sparse/blas/magma_zsampleselect_nodp.cu \ 7511a7531 > sparse/src/zparilut_gpu_nodp.cpp \ 7656a7677,7679 > sparse/blas/magma_ssampleselect_nodp.cu \ > sparse/blas/magma_dsampleselect_nodp.cu \ > sparse/blas/magma_csampleselect_nodp.cu \ 7965a7989,7991 > sparse/src/sparilut_gpu_nodp.cpp \ > sparse/src/dparilut_gpu_nodp.cpp \ > sparse/src/cparilut_gpu_nodp.cpp \ 8026c8052 < # auto-generated by codegen.py $(libsparse_dynamic_old), Thu Feb 11 16:58:19 2021 --- > # auto-generated by codegen.py $(libsparse_dynamic_old), Thu Feb 11 16:18:38 2021 8050c8076 < # auto-generated by codegen.py $(sparse_testing_old), Thu Feb 11 16:58:19 2021 --- > # auto-generated by codegen.py $(sparse_testing_old), Thu Feb 11 16:18:38 2021 8346,8347c8372,8373 < # auto-generated by codegen.py $(header_old), Thu Feb 11 16:58:20 2021 < header_old := include/magma_z.h include/magma_zc.h include/magmablas_z.h include/magmablas_z_v1.h include/magmablas_z_v1_map.h include/magmablas_zc.h include/magmablas_zc_v1.h include/magmablas_zc_v1_map.h include/magma_zlapack.h include/magma_zbulge.h include/magma_zbulgeinc.h include/magma_zgehrd_m.h include/magma_zbatched.h include/magma_zvbatched.h magmablas/commonblas_z.h magmablas/ztrtri.cuh magmablas/ztrtri_lower_device.cuh magmablas/ztrtri_upper_device.cuh magmablas/zgerbt.h magmablas/zpotf2_devicesfunc.cuh magmablas/zlarfg_devicesfunc.cuh magmablas/ztrsv_template_device.cuh testing/testing_z.h sparse/include/magmasparse_z.h sparse/include/magmasparse_zc.h sparse/include/magmasparse_types.h --- > # auto-generated by codegen.py $(header_old), Thu Feb 11 16:18:38 2021 > header_old := include/magma_z.h include/magma_zc.h include/magmablas_z.h 
include/magmablas_z_v1.h include/magmablas_z_v1_map.h include/magmablas_zc.h include/magmablas_zc_v1.h include/magmablas_zc_v1_map.h include/magma_zlapack.h include/magma_zbulge.h include/magma_zbulgeinc.h include/magma_zgehrd_m.h include/magma_zbatched.h include/magma_zvbatched.h testing/testing_z.h magmablas/commonblas_z.h magmablas/ztrtri.cuh magmablas/ztrtri_lower_device.cuh magmablas/ztrtri_upper_device.cuh magmablas/zgerbt.h magmablas/zpotf2_devicesfunc.cuh magmablas/zlarfg_devicesfunc.cuh magmablas/ztrsv_template_device.cuh sparse/include/magmasparse_z.h sparse/include/magmasparse_zc.h sparse/include/magmasparse_types.h 8450a8477,8485 > testing/testing_s.h: testing/testing_z.h > $(codegen) -p s $< > > testing/testing_d.h: testing/testing_z.h > $(codegen) -p d $< > > testing/testing_c.h: testing/testing_z.h > $(codegen) -p c $< > 8523,8531d8557 < testing/testing_s.h: testing/testing_z.h < $(codegen) -p s $< < < testing/testing_d.h: testing/testing_z.h < $(codegen) -p d $< < < testing/testing_c.h: testing/testing_z.h < $(codegen) -p c $< < 8558a8585 > testing/testing_z.h \ 8567d8593 < testing/testing_z.h \ 8606a8633,8635 > testing/testing_s.h \ > testing/testing_d.h \ > testing/testing_c.h \ 8631,8633d8659 < testing/testing_s.h \ < testing/testing_d.h \ < testing/testing_c.h \ diff magma/Makefile.subdir magma-2/Makefile.subdir 28d27 < interface_cuda \ 31d29 < magmablas \ 34,38d31 < sparse \ < sparse/blas \ < sparse/control \ < sparse/src \ < sparse/testing \ 50a44,54 > > ifeq ($(BACKEND),cuda) > targets += interface_cuda magmablas > > # add CUDA-specific sparse > targets += sparse sparse/blas sparse/control sparse/src sparse/testing > else ifeq ($(BACKEND),hip) > targets += interface_hip magmablas_hip > targets += sparse_hip sparse_hip/blas sparse_hip/control sparse_hip/src sparse_hip/testing > endif > diff magma/README magma-2/README 9c9 < and where CUDA, CPU BLAS, and LAPACK are installed on your system. 
---
> and where CUDA, HIP, CPU BLAS, and LAPACK are installed on your system.
11c11
< libraries and operating systems. The examples rely on paths such as $CUDADIR
---
> libraries and operating systems. The examples rely on paths such as $CUDADIR, $HIPDIR,
diff magma/README.hipMAGMA magma-2/README.hipMAGMA
9,11d8
<
< **NOTE**: Please don't mix CUDA and HIP builds. So if you've already built with CUDA, please remove the entire folder (or clone/untar to a new one) and start with a clean, fresh copy of MAGMA source code on the hipMAGMA branch before you start the build process
<
19c16
< * `export OPENBLASDIR=/nfs/apps/spack/opt/spack/linux-centos7-x86_64/gcc-7.2.0/openblas-0.2.20-rnb5trk6z6o767ontlvlrjviswap2wxu`
---
> * `export OPENBLASDIR=/nfs/apps/spack/opt/spack/linux-centos7-x86_64/gcc-7.2.0/openblas-0.2.20-rnb5trk6z6o767on1tlvlrjviswap2wxu`
22,27c19
< * In `make.inc`, `INC += -I/opt/rocm/hipblas/include -I/opt/rocm/hipsparse/include` (add this if there are errors related to including `hipblas.h` or `hipsparse.h`
<
< 5. Generate the HIP-specific code, by running this command:
< * `make -f make.gen.hipMAGMA`
<
< To ensure the sources are up to date, re-run this command if you change the original CUDA sources.
---
> * In `make.inc`, `INC += -I/opt/rocm/hipblas/include -I/opt/rocm/hipsparse/include` (add this if there are errors related to including `hipblas.h` or `hipsparse.h`)
30c22
< * `make lib/libmagma.so -j32` (builds the shared lib using 32 cores)
---
> * `make lib -j32` (builds the shared lib using 32 cores)
31a24
> * `make sparse -j32` (builds the sparse library)
34c27
< * `make -f make.gen.hipMAGMA -j16 && make testing_hip/testing_dgemm -j16 && ./testing_hip/testing_dgemm`
---
> * `make testing_hip/testing_dgemm -j16 && ./testing_hip/testing_dgemm`
36a30,44
>
> ## OpenMP support
>
> In general, HIP frontend (clang) support for OpenMP is iffy, `-fopenmp` has not always worked well. Sometimes, in `make.inc` we had to include `FOPENMP = -fopenmp -L$(HIPDIR)/../aomp/lib -I$(HIPDIR)/../aomp/include`
>
> Relevant Link(s):
> * https://rocmdocs.amd.com/en/latest/Current_Release_Notes/Current-Release-Notes.html#auxiliary-package-supporting-openmp
> * https://github.com/CEED/libCEED/issues/654
>
>
> ## `sparse` library / hipSPARSE support
>
> In general, the sparse library on the HIP backend does not work right now. We aren't sure what the solution is yet, but for now you cannot build it.
>
> So, don't run `make sparse` or `make all`. You should use `make lib`, and `make test`.
diff magma/README_FP16_Iterative_Refinement.txt magma-2/README_FP16_Iterative_Refinement.txt
32c32
< We provided a tester (testing_dxgesv_gpu ) for the new functionality. Since the API is very similar to the existing magma_dsgesv_gpu API, the new functionality can also be called from the existing tester (testing_dsgesv_gpu) with the '--version 2’ argument (for magma_dsgesv_iteref_gpu FP32) and the '--version 3’ argument (for magma_dhegsv_iteref_gpu FP16-TC).
---
> We provided a tester (testing_dxgesv_gpu ) for the new functionality. Since the API is very similar to the existing magma_dsgesv_gpu API, the new functionality can also be called from the existing tester (testing_dsgesv_gpu) with the '--version 2' argument (for magma_dsgesv_iteref_gpu FP32) and the '--version 3' argument (for magma_dhegsv_iteref_gpu FP16-TC).
Common subdirectories: magma/blas_fix and magma-2/blas_fix
Common subdirectories: magma/control and magma-2/control
Common subdirectories: magma/docs and magma-2/docs
Common subdirectories: magma/example and magma-2/example
Common subdirectories: magma/fortran and magma-2/fortran
Common subdirectories: magma/include and magma-2/include
Common subdirectories: magma/interface_cuda and magma-2/interface_cuda
Common subdirectories: magma/lib and magma-2/lib
Common subdirectories: magma/magmablas and magma-2/magmablas
Only in magma-2: make.check-hip
Only in magma-2: make.gen.hipMAGMA
Only in magma: make.inc
Common subdirectories: magma/make.inc-examples and magma-2/make.inc-examples
Only in magma: my-make-log-file.txt
Common subdirectories: magma/results and magma-2/results
Common subdirectories: magma/scripts and magma-2/scripts
Common subdirectories: magma/sparse and magma-2/sparse
Common subdirectories: magma/src and magma-2/src
Common subdirectories: magma/testing and magma-2/testing
Common subdirectories: magma/tools and magma-2/tools
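
Note on the `Makefile.subdir` hunk (`50a44,54`): it makes the list of source subdirectories depend on `BACKEND`. The fragment below is a minimal standalone sketch of that switch, not the file MAGMA ships. The file name `sketch.mk`, the empty initial `targets` list, the `$(info ...)` line, and the placeholder `all` goal are assumptions added for illustration. The directory names and the error wording follow the hunks in this diff.

```make
# sketch.mk: hypothetical standalone file, not part of MAGMA.
# Mirrors the BACKEND switch that the diff adds to Makefile.subdir.

# Same default as the BACKEND setting added to the top-level Makefile.
BACKEND ?= cuda

# The real Makefile.subdir lists shared subdirectories first;
# this sketch starts from an empty list (assumption).
targets :=

ifeq ($(BACKEND),cuda)
targets += interface_cuda magmablas
targets += sparse sparse/blas sparse/control sparse/src sparse/testing
else ifeq ($(BACKEND),hip)
targets += interface_hip magmablas_hip
targets += sparse_hip sparse_hip/blas sparse_hip/control sparse_hip/src sparse_hip/testing
else
$(error BACKEND should be 'cuda' or 'hip' (got '$(BACKEND)'))
endif

# Report the selection while the makefile is being parsed.
$(info BACKEND=$(BACKEND) targets=$(targets))

# Placeholder goal with an empty recipe; without any goal,
# make would stop with a "No targets" error.
.PHONY: all
all: ;
```

Running `make -f sketch.mk BACKEND=hip` should report the `*_hip` directories, and any value other than `cuda` or `hip` should stop at the `$(error ...)` line. The `else ifeq` form on one line needs GNU make 3.81 or newer.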