Branches

sarich/rnet-fix-rebase

Author Commit Message Date Builds
250+ commits behind master.
RNET Technologies
removed names from comments
John Eisenlohr
use row-wise formatting for ilu-factored sggpu matrix
John Eisenlohr
cleanup
John Eisenlohr
reformat the factored matrix to store by rows
sarich
get sggpu working without cusp Hg-commit: ca670bb53d2ea81f384549154aa977cbc65c594c
sarich
fix an mpisggpu problem, it wasn't creating gpu vectors with MatGetVecs Hg-commit: 2533f0da8bbd106fa75e1ca69a7c8a27d977adb6
sarich
set default polynomial order to 20 and fixed up eigenvalue estimator Hg-commit: b4a5755da6f4d532cef3831fd40bd8c2f69ba19f
RNET Technologies
Examples other than ex14 do not run with mpisggpu and the bjacobi preconditioner because the coloring routines assume a symmetric matrix. There could be other issues as well. Hg-commit: 88102df0059db8b82c96b55580d4ac14d0dbd1b4
RNET Technologies
Bjacobi with mpisggpu works for ex14.c Hg-commit: 9633a697b5316cc5e04325a5a7de38f2eeafe48d
John Eisenlohr
use the choice version of matsolve for ilu Hg-commit: 21105daa79a2e5ff92293c64ebb9a4abd742a1fe
John Eisenlohr
adding sggpusse Hg-commit: ed6be1b9c411a6104746a813a03bc1a2eee65df8
RNET Technologies
Just a little cleanup Hg-commit: 8ad65b6fb8ce3264d09f8a707bb19c89ed3b6ae7
John Eisenlohr
specialized sggpu ilu solve code for dimensions 1 and 4 Hg-commit: 7793645eda70de0ebe12e26a92c022ea2d910931
RNET Technologies
A working implementation of mpisggpu Hg-commit: 5f7a1e40268e650335ef99c97745f5514f07a713
sarich
add mpigpu vector type Hg-commit: 1a21f3970bc86687f926420a920c62fc9eee9ca1 add mpigpu vector type files Hg-commit: 64280497cee6d851322289cd2d1984c05226fd36
sarich
fix some seqgpu vec impls. Still problem with vecsize >~ 1e7 Hg-commit: e7ba0f9cab9ad3124e217e6af428ad1d2071eec6
RNET Technologies
Initial working version of MPISGGPU type. 'mpirun -np 2 ./ex14 -dm_mat_type mpisggpu -dm_vec_type mpicusp' gives the same result as a single processor. Needs to be further tested for scalability and correctness. Also, different numbers of processes need to be handled. Hg-commit: a5f86cf9e5063817a4842481972e0cf69844b026
John Eisenlohr
fixed cusp vec test for bb in MatSolve_SeqSGGPU_gpu Hg-commit: 2052e31033d257272567a9b8a2ac3f924708cd39
John Eisenlohr
removed copy operation for both upper and lower diag block solve (mat*vec) in cpu matsolve Hg-commit: 9ac24ba14b8981b5b3c76323b5633d5765086f77
John Eisenlohr
sggpu ilu Hg-commit: 5bd2fe4c0f1f3948feea2cd6316990f008d2fb38
RNET Technologies
Adding mmsggpu.cu and cuPrintf.cu Hg-commit: fa2d4a1cb07db1a60ae06dcc15b32cc03e851c8d
RNET Technologies
Continued MPI implementation for SGGPU type. Removed the additional padding being done by each process. However, each process needs to have access to the entire vector. That needs to be implemented in MatMult_MPISGGPU. Hg-commit: 700ff84c5da653aa9baf7c0023acf113b5fff81f
RNET Technologies
Continued implementation of MPISGGPU type. The matrix is distributed by rows and each processor stores the entire matrix. Need to adjust the size so that the entire matrix is not stored on each processor. The matrices on each processor can be viewed. Hg-commit: 1fc0a9ec842ede8262b6e9495fa2212131ed3174
RNET Technologies
Slightly extended the new data type MPISGGPU to behave like SEQSGGPU with 1 process. No communication yet and ex14 is not completing the first iteration. Hg-commit: 04226624827afc7cd2687e87c498585c4ede0f41
sarich
fix blockstructgrid bug Hg-commit: e94356c62c0c71375b2730238534fa21470ff1b1
RNET Technologies
moving mpisggpu.c to mpisggpu.cu Hg-commit: 25125dbb9d192036222da5ccf7b2464fd20149d7
RNET Technologies
Added a new data type MPISGGPU that behaves exactly like SEQSGGPU. This type is the starting point for implementing the MPI version of SGGPU. Hg-commit: 796529d20776ffcde7903ed56605e15a731dec38
sarich
fix mpibaij bug that broke blockstructgrid Hg-commit: 7780375e458e18e96878f9c3b0dd0a414b861dbe
sarich
fixes to get avx no-cuda to compile Hg-commit: ac3c867f8f7957cb6a0be0b64b2cf5dafcf40590
RNET Technologies
Made the blockstructgrid code exit if there is no support for SSE3 or AVX. Cleaned up a lot of stray code in blockstructgrid. However, couldn't test any of the MatMult functions because neither SSE3 nor AVX was available. Hg-commit: 46693036dd3a91de7cdbbeb321241283fbae74c5
RNET Technologies
Replaced SSE2 with SSE3 for BAIJ vectorization. Cleaned up the code. Need to test it on both SSE3 and AVX. Hg-commit: 957b65fc449728def9b94de01b75e4901efd04e9
sarich
fix missing assignment of integer arrays Hg-commit: 7632edcbfcbb7659d4a4d5f3f983a2eb2716b472
sarich
coloring working for seqsggpu and seqgpu vec, not for seqcusp Hg-commit: a90a34d067bc4021663cb8f6c034c3f0713d2596
sarich
add zero guard for pc_polynomial Hg-commit: 3a183dbc97a70a1aa2606480e076c6d575785e9d
sarich
work on finite differences for seqsggpu Hg-commit: 433f434f41dd60d84b3514c2a3ccb83746b9c8a6
sarich
fix up trace option to look nicer in source Hg-commit: 8897a6902ebca122cbcb82fc3d3ec116bb01b428
sarich
use latest seqgpu Hg-commit: b4f07cb16c78266c9ef61efe9836c5d51224527d
sarich
add seterrq for unimplemented sggpu functions Hg-commit: 2273b5406242e3c2a774051422d1a25c1e64b954
sarich
add polynomial pc Hg-commit: becbc6995ac8046e9e4a44b02f98254798e1f0e5
sarich
fix vecgpu compile problem using opt flags Hg-commit: 3010314fccf2e31113cb57199a58b97c98605b73
sarich
fix makefile merge problem Hg-commit: 255d4a942cafe01a3caa594233945c8543d4ce09
Justin Holewinski
Try to preallocate diagonals in SGGPU Hg-commit: 06f7c76bd607b509e3651ef0b36696f7f448ee7f
Justin Holewinski
Clean up the SGGPU kernel Hg-commit: de9e3fa71b01c9e70572c5d709a14b8516bad686
Justin Holewinski
More SGGPU tweaks Hg-commit: 4fa6e7e2eaad1e84d44b22a295bf76ef19b5b8b1
Justin Holewinski
Revert some failed SGGPU optimizations Hg-commit: 00bffe6ce70c17f977b8c6ef27f19ca0d4af7884
Justin Holewinski
Remove SGGPU debug code that caused correctness issues Hg-commit: a895dc1fcded97d8373774fb5c5f3173a55ccce2
Justin Holewinski
SGGPU tweaks Hg-commit: 7c71124cfe2c054b9886887f15eb19e398fb62e3
Justin Holewinski
First implementation of sggpu matrix type Hg-commit: 6d636373d7ae0b2d1058e55bbb1b9fc5f511ceb4
Justin Holewinski
SGGPU init tweaks Hg-commit: 8b5df8f356ac1bb5e0bf1665f7a014aa2f13348c
Justin Holewinski
Revert 0f444a1abbe6 Hg-commit: c8742c0c71dcedce26665ad3858d3a7ee433d3ae
Justin Holewinski
Add misc. cleanup for SGGPU Hg-commit: af227c54a047c91f4ccde4a7e620271fb60f49fe
Justin Holewinski
Start skeleton for new SGGPU matrix type Hg-commit: ccdc3579299fb7a0d22f68b1cfdade262fc164ce
Justin Holewinski
Fix build on non-AVX machines Hg-commit: f3a2ab93f4ac370eaefcd1272f2227a90a573a27
Justin Holewinski
Build fixes Hg-commit: 4c61fa91e17af5e4fc35cdbd0459fa4c8caa54cb
Justin Holewinski
Revert 10fb8c894b16 Hg-commit: b1d755c06716b0de3c9dddcbc6cb6274006a689a
Justin Holewinski
Revert 252bc63678b3 Hg-commit: 7c3801c08297f6da8a02888b34bbc9bc8f3a61ec
Justin Holewinski
Fix MatSetValues for DOF > 1 Hg-commit: 10fb8c894b166adcf3d1af88e5c453e485dfb2fb
Justin Holewinski
Fix issue in exSG Hg-commit: 252bc63678b30e261f6cd6134b5af42d46c03966
Justin Holewinski
Misc. changes to structgridgpu Hg-commit: 0f444a1abbe68f233c0f977afe6d21246494af0e
Daniel Lowell
Updated Jeswin/Justin's Matrix type to incorporate the use of the gpu Vector type. To run an example you need to explicitly specify the vector type "seqgpu", e.g.: ./ex14 -pc_type none -ksp_type gmres -vec_type seqgpu -mat_type structgridgpu Hg-commit: 22109f376036f55fd118d680009ad9669c888408
Justin Holewinski
[SGGPU] Create linearized indices automatically instead of relying on hard-coded defaults, which were often not working Hg-commit: eb0971c9e3a84478a1277552609b0d486b5acd82
Justin Holewinski
[SGGPU] Add a missing __syncthreads() Hg-commit: 938036b99cd7d8017cd803c8c6c4d1e82b177948
Justin Holewinski
[SGGPU] Merge in Jeswin's latest kernel and fix some minor issues Hg-commit: c030a04970df1d1cbb5cc943c3516c66c7e1b51f
Justin Holewinski
[SGGPU] Add placeholder for MatDiagonalScale_SeqSG for later PFLOTRAN work Hg-commit: 6c97e51fd9858562d29468a6af979ae41005b445
dpnkarthik
Adding mpi bsg folder for fixing build issues Hg-commit: cb929df960c73e64941d48d8b4558bd2f2bc8768 Adding mpi bsg folder for fixing build issues Hg-commit: 130698eb681bcd679eb4af39c3cd0bd9517dbf20
dpnkarthik
Sequential BAIJ working with ILU preconditioning Hg-commit: 1e4d038d742eb8447b9f216f51d5c19d904f738c
dpnkarthik
MPI vectorization for BAIJ Hg-commit: 0d1005f751897ce5057208f8cc82ad52737346e4
dpnkarthik
Vectorized BAIJ functions Hg-commit: d00b0c0874ab6d1f3eef5d1b8cc955437b79115b
dpnkarthik
All vectorization bugs fixed Hg-commit: 3ac2e72ed794f9944f7ca8131b5736d87a7a9df0
dpnkarthik
Baij vectorization Hg-commit: 6a999d1d42ab9f68f6345f436b08bda5814d4350
dpnkarthik
Avx matmult with generic odd dof Hg-commit: daf039a5278827a95a725b304f14110359212004
dpnkarthik
Adding Avx matmult for even - dof Hg-commit: b99c795951f0c75a64c96f1b13f447dc2c3cb8da
dpnkarthik
Blocked Setvalues Hg-commit: 0bf952091e1b526b9d843fd4e9f866dd73d32932
dpnkarthik
single stencil, improved set values Hg-commit: 2d9b09fa588cb1fd78c95295817bfde25f12cc16
Jeswin Samuel Godwin
Stand-alone code for the SpMV kernels tested for the GPGPU paper has been added, covering the col-dia, ellpack, hybrid, and diagonal formats. I have added changes to the kernel and included the script for testing the code. There is a Notepad in the Structured Grid folder which explains how to run the file. Let me know if you have any issues running them. Hg-commit: 908c0d461f0610f74553b9ed884f0fd59a885775
dpnkarthik
MatFunctions added for BSG Hg-commit: a922540663d7f0b98ac2f17499c8c8333805b51f
dpnkarthik
Other functions1 Hg-commit: 7e9e0a43cd83b7510b6b81d45782147614f9f393
dpnkarthik
Adding Diagonal into matrix structure Hg-commit: 2f4e35661f62a8b23458875c23e21bc0201421b1
dpnkarthik
Bug Fixes Hg-commit: 2cdd23c9e21eec0bb6199ca65e758903d6557397
dpnkarthik
Fixing bug with sub matrix Hg-commit: ae398f00329bd6165f3c9ddcc694d7017ed35764
dpnkarthik
Updating getsubmatrix in MatOps Hg-commit: c771ac94bf7cc48b5472014fbc15956be1365342
dpnkarthik
MatGetSubMatrix -- tested using standalone system. Yet to test within Petsc. Required: MatMult for submatrices Hg-commit: bdba002d500d3e2bf9be2d1e649f232528c2c1bb
Daniel Lowell
Fixed ex14.c by removing explicit call to VecSetType Hg-commit: 05011989607b4fc6cd9ba095dfb9a282687fe8e5 Merging Hg-commit: 89f0dd25352b4843b2f90ea648776b6ca53d61ad
Daniel Lowell
Resetting the two examples to original DOF which cause a seg fault. Hg-commit: 5bce12a2a804e11f17b6b9048c7dd9166ee9074e
Jeswin Samuel Godwin
Updates to structgrid kernel Hg-commit: 6032c9901c21d0641c935c7cc85b1e9db0a1f141
Daniel Lowell
Changed pstack.c #if statement Hg-commit: acdf92f89b4b8c43cbac5033c979e4ef6cd6ca5f
dpnkarthik
Adding openmp to example baij Hg-commit: 96afde6a9f47b6e42c0d7c7f7856744808667118
dpnkarthik
Adding example problem to compare BlockSG with Block CSR Hg-commit: 97af623d42c5c7ad27111e3709d6527c5dd4b345
dpnkarthik
Fixing bug with offset calculation for smaller workset Hg-commit: 1967c2d47e9885ac7803bd5cb307fa5bda5d8d2d
dpnkarthik
OpenMp support added in Default version (ver1). Custom codes for dof <= 30 will be available in matmultbintrinsics file. OpenMp support with single time thread creation is added for dof = 4. Gives less improvement. Hg-commit: c8ad22adcf7da0dcd2bf8e7263ad46990c0613b4 Merged Hg-commit: 7b38b34a5e1ce53c0b92a357d1e96b612fcfa710
Daniel Lowell
removed mention of bjgpu from makefile Hg-commit: 4f4be084695ef4f957012e1c92fc0262bc28f943
Daniel Lowell
Fixed PP printf statement on vecgpu.cu Hg-commit: 056c27593f9f78f593781034febc1e22c85aac97
Daniel Lowell
Added makefile to seqgpu. Hg-commit: 14abec1c6c6692b14893a67471e492e488ec3797
Daniel Lowell
Changed to the correct CUSP enum Hg-commit: 29717ce96fc8782a4374fc59c21f0af985c6ed38
Daniel Lowell
Major mod. to matstructgridgpu.cu to make the repo update. Hg-commit: d82c6d26eb40fe4d476ac6f66808b5bb0c105af9
dpnkarthik
Loops unrolled for performance Hg-commit: 39295e0e58734d49e1df64fdc044b56581bfe41b
Daniel Lowell
Attempting to override old copies. Hg-commit: 8619823c5ae2ae1e872efbdb8291f7bd127e7bfb
Daniel Lowell
Checking if there is an issue with the repo. Hg-commit: 692ba2595ed7f8bfd8dbeb402863fcb5d6ff24e4
Daniel Lowell
Put back copy back to host at the end of MatMult kernel Hg-commit: 85ec8c3fde0aec5216f8dac084b0ec61865a3b20
Daniel Lowell
Removed end comment out of merged sggpu Hg-commit: 08d2b24b0f4ac51a6d203e3051bf44d3fb15cb43
Daniel Lowell
Removed BCGSGPU Hg-commit: d221486c335fe4a09b8e02731d24980246b996ca
dpnkarthik
Block StructGrid with workingsetsize Hg-commit: fbe41ca391bf0515dccd34c9c1b03e749bb1625f
dpnkarthik
Preprocessor to clean loops Hg-commit: 3220782a77aab1de20bf0e35674ee9851173e6f3
dpnkarthik
Loop unrolled : for testing Hg-commit: f9c37d8f8040add86c7481d1960fb2efdba425cb
dpnkarthik
Function pointer to clean the code Hg-commit: b3851ab8840f2d423fbd6e40597a764a8dd4ec76
dpnkarthik
Working Block structured grid Hg-commit: c9e28141c1b595c4bf7157b3b9c02a5a8461a0c8
Jeswin Samuel Godwin
Bug fixes and changes in Structgridgpu. Tested in ex5, ex14 and exSG. Dynamic allocation of Shared Memory is used. Hg-commit: efbff52c63c61268bdac7856e56cb03fa6c48a6a
Jeswin Samuel Godwin
Structgrid GPU bug fixes. Hg-commit: a51b6fe7d200a558824e51a4b1a3d45148bab15a
dpnkarthik
Changing access pattern to have stride-1 accesses in Coeff matrix Hg-commit: 0953983cc5a9cbec6b86f61f0cdf54600635520c
dpnkarthik
Changing access pattern with padding Hg-commit: 862d398b74b0ceb22011f510186e89409d4cce9d
dpnkarthik
removing indirection from coefficient accesses Hg-commit: 4fa4633d9e0412dde49d16ffb0814e06142aca2c
Sravya Tirukkovalur
Simplified the setting values into mat part of the tests. Hg-commit: c80cc16054ce12dc875b997c6376ce6be5777950
dpnkarthik
Changing Stencil order group Hg-commit: 7e1650785266af424087171cd815b6ffd93b6914
Daniel Lowell
updating my end Hg-commit: cbcd790ba491cbf892b47dca3f28b7bfce7dfe11
Daniel Lowell
Merged some kernels and added Jeswins kernels Hg-commit: 0d897c2948987a783ac378fd0b3f632108392d82
Daniel Lowell
Few extra changes. Hg-commit: 2393993da98aee4e5354b9b748f237dc9df28ce3
Daniel Lowell
Implemented dynamic shared memory over all kernel calls with the exception of 3 reduction kernels. Hg-commit: 80e0bfb61db17952cc6942868c10df77e84c378c
Daniel Lowell
Fixed bugs related to pipelined asynchronous kernel calls on large data sets. Moved debugging information into preprocessor #if #endif statements so they will not be compiled if not required. Hg-commit: 7615c5d2abb36eab8388ac1abd4153bd080cdea6
Daniel Lowell
Added kernels, and updated kernels to have asynchronous streaming Hg-commit: a1031bfd3845e1718d9cf1ccdd1ec2aef20d4f41
Daniel Lowell
Added gpu files Hg-commit: 7c413c23e2514c601093180db919d4786a0fd9f0
Daniel Lowell
Working GPU Vectype. Converges on most examples. However since only MatMult_SeqAIJ for cuda has been implemented, some examples will fail as will some PCs. Hg-commit: 0bd1d69854bd954685ba0034e9217fa88953eefc
Daniel Lowell
cudaksp Hg-commit: 10d06a906d1af90d4642d8af3c9e132fea12e483
Sravya Tirukkovalur
Initialize k in cleanup loop explicitly (compatibility with all platforms/compilers) Hg-commit: 1bc264b1f7c2e695aa707a4723a53ecccf26b090
dpnkarthik
Merging C and D part Hg-commit: 38f1f41dd017f14a090cb513037fb0272f1a1ac8 merge c and d and change loop iteration order for c Hg-commit: c7e2a031171bc41f7e30fd057f0a60ac48d43b88
dpnkarthik
Adding 2d support on SG_MatMult and fixing issue with created Mat->a Hg-commit: 59d15d981cb14aedcbd52aecd1dd4239e859faaa
Sravya Tirukkovalur
changes openmp Hg-commit: 6d62250cc2f206f186c628bb0350eb8d21f9c154
Sravya Tirukkovalur
Added some comments for the new matmult(without padding) Hg-commit: 656f7128c0f081a5cf17c43b1d62d1648d5f8cf7
Sravya Tirukkovalur
Changed all instances of malloc and free in structgrid to PetscMalloc and PetscFree. Hg-commit: f185e6e9d6de3f9a86f032a80a0666d09307c3b0
Sravya Tirukkovalur
Added openmp and software prefetching to the new matmult (doesn't require padding) implementation. Added the exSG2 test program, which can be used to compare performance (exSG can be used to check correctness). Note: Need to test the test programs and struct grid implementations for dof>1 Hg-commit: f5c41be09dc9d98b5bf21e5daa0a2d07d1e3c0aa
dpnkarthik
Diagram showing the y indices affected by each diagonal Hg-commit: 34cea691abf2e86cc9a7e554a207e0dfb880d0ba
RNET Technologies
In the _MatOps structure, the members MatSetStencil_seqSG and MatSetGrid_SeqSG have been moved to 130 and 131 positions as a result of conflict in previous merge. The file matstructgrid.c has been modified accordingly in this commit. Hg-commit: 112b4f685fb26107d557eae71035939c7ad53f80
dpnkarthik
New MatMult Structure with partitioned For loops Hg-commit: 1e4a7661c28b4ff8146e132e9c6c285c867d4e3f
dpnkarthik
Fix on selecting the correct diagonal offset in set values Hg-commit: 254f7e41fc5cf47715eca9bb09e786b711007c9e
dpnkarthik
Updating Preallocate with setting the flag after pre allocation Hg-commit: b59203d8ef3bad7018d4e9b97d96831b47e9a9d3
Jeswin Samuel Godwin
Updated StructgridGPU Kernel: Fixed errors associated with Indexing. Added the Z dimension to the Kernel. Hg-commit: 314122babaf9f823a3d104fc13c178deeeed9c11
Sravya Tirukkovalur
Added structgridgpu as a directory in structgrid/makefile instead of impls/makefile. Uncommented GPU testing in exSG.c. Hg-commit: 096dc722daa0a9493c616e152cbda2de818b682d
dpnkarthik
Added help file for stencil neighbors Hg-commit: 9a2d51826171c81d74cb011e4d17f08331d2fad5
Daniel Lowell
Fixed error on SGGPU MatMult V2 computing wrong Y. Issue was unnecessary if statement restricting values computed. Hg-commit: 2f8917ca012e98cabf555dbca057d7f5db574966
Sravya Tirukkovalur
scope of variables changed in openmp code Hg-commit: bff9c251bc120de5b627b1ed33a74701f0576e33
Sravya Tirukkovalur
openmp +avx fixed Hg-commit: 8c112b4f905b945eb9157277a9ea0370f224b08c
Sravya Tirukkovalur
In first loop: k<= instead of k<lda1-WIDTH In second loop: k=lda1-lda1%WIDTH instead of ; This is helpful while using openmp threads where k can have different values for each thread. Hg-commit: 29b914d45e20d2705f57c31a0d1328b05174f943
Sravya Tirukkovalur
changed k+WIDTH<lda1 to k<lda1-WIDTH Hg-commit: 46052422c09cfc41b605080ae5a633e898ab198e
Sravya Tirukkovalur
Included timing test in exSG.c, which times matmult of AIJ, SG(AVX), SG(AVX+OPENMP), AIJCUSP, SGGPU Hg-commit: d0f7547faf615050434b969258a462d91d9721f0
Daniel Lowell
Corrected err in GPU MatMult Kernel version 2 where Xindex would underflow and still do the multiplication leading to undefined behavior. Test example now has norm in line with other versions. Hg-commit: dd3176eccbec298be6913e5d742e0aa0e70dbeb6
Daniel Lowell
Added to structgrid MatView although in this variant the matrix is converted to a SeqAIJ before viewing. Hg-commit: d45be1de637eee7466ed4fe5caad0ccb6ce810a0
Sravya Tirukkovalur
openmp+avx and timing testexSG2.c Hg-commit: 3558b50d293b7f0f7ef89596e70ae257240b4afe
Sravya Tirukkovalur
compare STRUCTGRIDGPU with SEQAIJCUSP Hg-commit: d1708946e5e491773fcc13416aaaba2f5e8816fe
Sravya Tirukkovalur
Made the test generic to any dimension and any value of m, n, p. Values can be specified at run time as [-m] [-n] [-p] [-dim] Hg-commit: 09d0e999d9c65373373ede7703b1a9c766cfa4a8
RNET Technologies
Added exSG.c (Renamed from test2.c) Hg-commit: 64c7d42849fff46e4de93c7e2bbd04e95d86d898
RNET Technologies
Modified the test for testing structgrid datatypes. Hg-commit: dc3ebe08e350ecbb9651bb96b39dd8ecc671b9bc
Sravya Tirukkovalur
matmulopenmp.c: Added openmp matmul kernel test2.c: To check correctness of Matmul kernels of Structgrid, ie, AVX, Openmp and GPU matstructgrid.c: added few lines to set up mat->m,n,p. Hg-commit: 670aa156c2538acf2c43ab0820cf1f9da4388728
dpnkarthik
Efficient Tiling implemented for X vector. Hg-commit: 6603b74541e7afc834b370d7c56559d2c3e19271
Daniel Lowell
Reorganized kernel wrapper so that all device memory is allocated only once per program run as is block and grid shape. Also tightened up tile sizes so that more of shared memory is used. Hg-commit: f83aa32bc5e38c04f6b699d7aa61c59ffbaeebd8
Daniel Lowell
Added tighter bounds to shared memory allocation for more efficient use of shared memory. Hg-commit: cabed52b437762c1f2099c0a28359e32a66b78e6
Daniel Lowell
Tiling now works on kernel version 2. It should be stable beyond 512x512, just testing now. Tiling with shared memory now is aware of the amount of shared memory, so that the tile sizes are adjusted accordingly. Hg-commit: bd34204b4856f1d411d018c1da5695ade0d16bdb
Daniel Lowell
Reverted back to 3-D implementation of kernel threads. This made it easier to tile along all axes for very large arrays. Y and A use block local shared memory. Need to retest kernel for stability at very large problems sizes. Hg-commit: 3dd09a776dee3a9bed68ef607daee36ec74776ef
RNET Technologies
Using the flag of PetscCUSPFlag in Mat base class, modified the SGCUDAMatMult kernels such that the matrix is copied to the GPU only once for every iteration of linear solver Hg-commit: 353c3795378f0c853b2a22b5e829a2508bd031e8
Daniel Lowell
Fixed the timer values so that they have the same units. Hg-commit: 4dd9b648a5501af673c29e43a733b729a7d48722
Daniel Lowell
Changed the MatMult kernel so that now the threads and blocks are 1-D. Tiling is now implemented so that Y vectors which are larger than 6144 elements (49152 bytes of shared mem / sizeof(double)) can be processed by the kernel. I have tried step sizes of up to 65536 elements for y (-da_grid_x 256 -da_grid_y 256 for ex5, and -da_grid_x 128 -da_grid_y 128 for ex14); they all converge. Timing output is suppressed for now, but the statements are easy to identify and uncomment. Next I would like to implement coalesced shared memory reads from A and x. This will increase the shared memory overhead by 3x, but may have a significant speed up with large grid sizes. Daniel Lowell ANL-MCS Hg-commit: 6c9f65d76b8b8292ec72e717a11148821869d093
Daniel Lowell
removed a barrier __syncthreads() in kernel Hg-commit: 85185e568855c70fd1c4bdf1b626f3cbe371eff7
Daniel Lowell
Added cuPrintf.cu and cuPrintf.h files to structgridgpu folder. Hg-commit: f38db84cbdbb6f9f9d7aeaccd8b4262f72ebbd3a
Jeswin Samuel Godwin
Kernel Hg-commit: 556c1bdfd2c5638ff19e414ea6da2c56e0a82cb7
Daniel Lowell
Cleaned up some code for merge. Hg-commit: 397155ae99c54a31659cdb21ad9ce956b94aa89d
Daniel Lowell
Added a version 2 of matmult_kernel for structgrid gpu using shared, register, and constant memory. May need more optimizations; also there is no guarantee this kernel converges in all examples. More testing is required to verify that it is in fact a "safe kernel". Currently it does converge on the default version of ex5 in PETSc examples. Submitting it as-is. To switch over to Jeswin's kernel simply comment out my call to version2 and uncomment the original call in function PetscErrorCode MatMult_SeqSGGPU(Mat mat, Vec x, Vec y). Hg-commit: c667e464a0fffcd62bbbf8e06c3ac5a59544de3d
Daniel Lowell
Added path to makefile in src/mat/impls to structgridgpu. Otherwise the PETSc build will not recognize that folder. Hg-commit: 76987e962e99da291e29000c3c9606efadfb8529 merging
RNET Technologies
Added overlaying comments to the new code added so far by RNET, OSU, and ANL. Hg-commit: c2105a40a7d0e64f941a46275f45c0f5d045fbc8
RNET Technologies
Added a new data type for GPU support, namely matstructgridgpu. This has been done by emulating the technique used by the PETSc development team for adding seqaijcusp. The framework for using the new datatype is in place but ex5 does not converge. Also, the memory is copied for every iteration of the linear solver. This still needs to be handled. To test the datatype run ./ex5 -da_mat_type structgridgpu Hg-commit: 26e4dbf84837f3f45ed7f9fa66d3f2d23e767168
dpnkarthik
Comments Added Hg-commit: fd88fd65a745dd6a794fb55262ae0ce18af6ae63
RNET Technologies
rnet-all branch Hg-commit: b2c14743ff470e2a54afb84fe2b27666fa82c8e8