Peter Boyle
|
e07cb2b9de
|
Accelerator memory
|
2024-01-17 16:24:31 -05:00 |
|
Peter Boyle
|
a1f8bbb078
|
accelerator memory print
|
2024-01-17 16:24:09 -05:00 |
|
Peter Boyle
|
7909683f3b
|
MultiRHS
|
2024-01-17 16:21:07 -05:00 |
|
Peter Boyle
|
25f71913b7
|
MultiRHS coarse
|
2024-01-04 12:01:17 -05:00 |
|
Peter Boyle
|
34ddd2b7b1
|
MultiRHS coarse space
|
2024-01-04 12:00:53 -05:00 |
|
Peter Boyle
|
d5fd90b2f3
|
Add 48^3 rtest
|
2024-01-04 12:00:01 -05:00 |
|
Peter Boyle
|
b7c7000d0d
|
Don't need the numerical rounding tolerance in multigrid
|
2023-12-22 18:10:23 -05:00 |
|
Peter Boyle
|
551f6c4edd
|
Synchronise changes
|
2023-12-22 18:09:11 -05:00 |
|
Peter Boyle
|
defd814750
|
Speed up the coarsened matrix matrix evaluation.
It is block project limited.
Could be sped up with calls to Batched GEMM and a data layout change.
|
2023-12-22 18:07:03 -05:00 |
|
Peter Boyle
|
3d517bbd2a
|
Synchronise decouple from the launch
Speeds up multileg stencils
|
2023-12-22 18:06:13 -05:00 |
|
Peter Boyle
|
78ab955fec
|
Better padded cell exchange
|
2023-12-22 18:05:41 -05:00 |
|
Peter Boyle
|
dd13937bb6
|
Better opt face gather scatter
|
2023-12-22 18:03:38 -05:00 |
|
Peter Boyle
|
66a1b63aa9
|
Faster grid/blas layout change.
Halo exchange is now the only slow part.
Revisit
|
2023-12-21 20:50:18 -05:00 |
|
Peter Boyle
|
22c611bd1a
|
Delete temp file
|
2023-12-21 18:32:31 -05:00 |
|
Peter Boyle
|
c9bb1bf8ea
|
Passing new BLAS based
|
2023-12-21 18:31:17 -05:00 |
|
Peter Boyle
|
9e489887cf
|
General coarse multiRHS move to BLAS implementation
|
2023-12-21 15:24:48 -05:00 |
|
Peter Boyle
|
9feb801bb9
|
Much simpler GPU implementation
|
2023-12-21 15:24:06 -05:00 |
|
Peter Boyle
|
c00b495933
|
Multigrid
|
2023-12-21 15:23:31 -05:00 |
|
Peter Boyle
|
d22eebe553
|
BLAS options
|
2023-12-21 15:23:03 -05:00 |
|
Peter Boyle
|
8bcbd82680
|
BLAS based layout and implementation
|
2023-12-21 15:21:24 -05:00 |
|
Peter Boyle
|
dfa617c439
|
Batched SGEMM/DGEMM/ZGEMM/CGEMM
Hip, Cuda version and vanilla CPU
One MKL stub in comments, to be tested as different.
|
2023-12-21 14:01:18 -05:00 |
|
Peter Boyle
|
48d1f0df89
|
Optimised partially, working
|
2023-12-21 12:33:47 -05:00 |
|
Peter Boyle
|
b75cb7a12c
|
Blas batched partial implementation on Frontier only for now
|
2023-12-21 12:31:33 -05:00 |
|
Peter Boyle
|
332563e037
|
Debugged, reducing verbosity
|
2023-12-21 12:30:57 -05:00 |
|
Peter Boyle
|
0cce97a4fe
|
verbosity only
|
2023-12-20 21:30:10 -05:00 |
|
Peter Boyle
|
95a8e4be64
|
rocblas
|
2023-12-20 21:27:59 -05:00 |
|
Peter Boyle
|
abcd6b8cb6
|
Faster version
|
2023-12-19 15:17:46 -05:00 |
|
Peter Boyle
|
e8f21c9b6d
|
Memory verbose control improvement
|
2023-12-19 15:16:58 -05:00 |
|
Peter Boyle
|
e054078b11
|
Verbose
|
2023-12-05 16:15:17 -05:00 |
|
Peter Boyle
|
6835a7f208
|
Better logging, test on 81 point stencil
|
2023-11-29 19:20:47 -05:00 |
|
Peter Boyle
|
f59993b979
|
Nbasis
|
2023-11-29 09:47:36 -05:00 |
|
Peter Boyle
|
2290b8f680
|
Verbose
|
2023-11-29 09:47:04 -05:00 |
|
Peter Boyle
|
2c54be651c
|
Further updates
|
2023-11-29 09:43:29 -05:00 |
|
Peter Boyle
|
e859a199df
|
Reduce volume to interior for coarse stencil -- worth up to 4x gain
|
2023-11-28 10:23:16 -05:00 |
|
Peter Boyle
|
0a3682ad0b
|
MultiRHS work
|
2023-11-28 07:43:37 -05:00 |
|
Peter Boyle
|
59abaeb5cd
|
Time stamp
|
2023-11-24 12:56:45 -05:00 |
|
Peter Boyle
|
3e448435d3
|
Restrict to interior
|
2023-11-23 18:23:29 -05:00 |
|
Peter Boyle
|
a294bc3c5b
|
Relax constraints for multiRHS
|
2023-11-23 18:20:42 -05:00 |
|
Peter Boyle
|
b302ad3d49
|
multiRHS test in place, passes Yay!
|
2023-11-23 18:20:15 -05:00 |
|
Peter Boyle
|
82fc4b1e94
|
Finalise
|
2023-11-23 18:19:41 -05:00 |
|
Peter Boyle
|
b4f1740380
|
Finalise message
|
2023-11-23 18:19:16 -05:00 |
|
Peter Boyle
|
031f85247c
|
multiRHS initial support -- needs optimisation for multi project/promote.
Bug fix in freeing intermediate grids to stop double free
|
2023-11-23 18:18:35 -05:00 |
|
Peter Boyle
|
639cc6f73a
|
better support for multiRHS coarse space
Still to add restriction of domain of last loop to interior of padded cell (expect about 4.5x on test volume on Crusher)
|
2023-11-23 18:16:26 -05:00 |
|
Peter Boyle
|
09946cf1ba
|
Improved, works on 48^3; moving to multiRHS optimisations
|
2023-11-15 18:03:05 -05:00 |
|
Peter Boyle
|
f4fa95e7cb
|
Use 5.3.0
|
2023-11-15 18:01:38 -05:00 |
|
Peter Boyle
|
100e29e35e
|
Allow expression as argument to norm2
|
2023-11-15 18:00:44 -05:00 |
|
Peter Boyle
|
4cbe471a83
|
devVector
|
2023-11-15 18:00:07 -05:00 |
|
Peter Boyle
|
8bece1f861
|
Faster to transpose the matrix and apply with column major order
|
2023-11-15 17:58:38 -05:00 |
|
Peter Boyle
|
a3ca71ec01
|
Lots more setup options, still working on them
|
2023-11-15 17:58:04 -05:00 |
|
Peter Boyle
|
e0543e8af5
|
Implement flexible preconditioned CG
|
2023-11-15 17:57:39 -05:00 |
|