mirror of https://github.com/paboyle/Grid.git synced 2025-06-12 20:27:06 +01:00
Commit Graph

7558 Commits

Author SHA1 Message Date
d22eebe553 BLAS options 2023-12-21 15:23:03 -05:00
8bcbd82680 BLAS based layout and implementation 2023-12-21 15:21:24 -05:00
dfa617c439 Batched SGEMM/DGEMM/ZGEMM/CGEMM
Hip, Cuda version and vanilla CPU
One MKL stub in comments, to be tested as different.
2023-12-21 14:01:18 -05:00
48d1f0df89 Optimised partially, working 2023-12-21 12:33:47 -05:00
b75cb7a12c Blas batched partial implementation on Frontier only for now 2023-12-21 12:31:33 -05:00
332563e037 Debugged, reducing verbose 2023-12-21 12:30:57 -05:00
0cce97a4fe verbosity only 2023-12-20 21:30:10 -05:00
95a8e4be64 rocblas 2023-12-20 21:27:59 -05:00
abcd6b8cb6 Faster version 2023-12-19 15:17:46 -05:00
e8f21c9b6d Memory verbose control improvement 2023-12-19 15:16:58 -05:00
e054078b11 Verbose 2023-12-05 16:15:17 -05:00
6835a7f208 Better logging, test on 81 point stencil 2023-11-29 19:20:47 -05:00
f59993b979 Nbasis 2023-11-29 09:47:36 -05:00
2290b8f680 Verbose 2023-11-29 09:47:04 -05:00
2c54be651c Further updates 2023-11-29 09:43:29 -05:00
e859a199df Reduce volume to interior for coarse stencil -- worth up to 4x gain 2023-11-28 10:23:16 -05:00
0a3682ad0b MultiRHS work 2023-11-28 07:43:37 -05:00
59abaeb5cd Time stamp 2023-11-24 12:56:45 -05:00
3e448435d3 Restrict to interior 2023-11-23 18:23:29 -05:00
a294bc3c5b Relax constraints for multiRHS 2023-11-23 18:20:42 -05:00
b302ad3d49 multiRHS test in place, passes Yay! 2023-11-23 18:20:15 -05:00
82fc4b1e94 Finalise 2023-11-23 18:19:41 -05:00
b4f1740380 Finalise message 2023-11-23 18:19:16 -05:00
031f85247c multiRHS initial support -- needs optimisation for multi project/promote.
Bug fix in freeing intermediate grids to stop double free
2023-11-23 18:18:35 -05:00
639cc6f73a better support for multiRHS coarse space
Still to add restriction of domain of last loop to interior of padded cell (expect about 4.5x on test volume on Crusher)
2023-11-23 18:16:26 -05:00
09946cf1ba Improved, works on 48^3 moving to multiRHS optimisations 2023-11-15 18:03:05 -05:00
f4fa95e7cb Use 5.3.0 2023-11-15 18:01:38 -05:00
100e29e35e Allow expression as argument to norm2 2023-11-15 18:00:44 -05:00
4cbe471a83 devVector 2023-11-15 18:00:07 -05:00
8bece1f861 Faster to transpose the matrix and apply with column major order 2023-11-15 17:58:38 -05:00
a3ca71ec01 Lots more setup options, still working on them 2023-11-15 17:58:04 -05:00
e0543e8af5 Implement flexible preconditioned CG 2023-11-15 17:57:39 -05:00
c1eb80d01a Print which have converged 2023-11-15 17:57:08 -05:00
a26121d97b Better printing 2023-11-15 17:56:45 -05:00
043031a757 Report resid on failed convergence 2023-11-15 17:56:22 -05:00
807aeebe4c Resize tol in constructor 2023-11-15 17:55:57 -05:00
8aa1a37aad For Mirs preconditioner solver 2023-11-15 17:55:32 -05:00
4efa042f50 C++17 change 2023-10-24 10:57:50 -04:00
c7cb37e970 c++17 accepted 2023-10-24 10:57:24 -04:00
d34b207eab Avoid HIP warnings 2023-10-24 10:57:04 -04:00
0e6fa6f6b8 Don't need the Cshift for the period optimisation 2023-10-24 10:56:31 -04:00
38b87de53f This works around a stacksize limit on AMD GPU 2023-10-24 10:56:07 -04:00
aa5047a9e4 Faster blockProject blockPromote 2023-10-24 10:49:55 -04:00
24b6ee0df9 M4 file 2023-10-24 10:36:48 -04:00
1e79cc9cbe Avoid compiler error 2023-10-24 10:36:09 -04:00
b3925df9c3 Verbose on CPU-GPU xfer, remove performance by default 2023-10-24 10:25:01 -04:00
351795ac3a Better messaging 2023-10-20 19:33:04 -04:00
9c9c42d0df Tests on Frontier with real speed up: 3.5x on 16^3 at mq=0.01 2023-10-20 19:27:13 -04:00
b6ad1bafc7 Normal memory SendToRecvFrom asynchronous for use in general stencil code 2023-10-20 19:27:13 -04:00
a5ca40f446 Better verbose -- track CPU GPU motion under --log Memory, others go to debug output stream 2023-10-20 19:27:13 -04:00