Peter Boyle | 11dc2c5e1d | 2025-04-04 18:35:06 -04:00 | PVdagM initialise
Peter Boyle | 130e07a422 | 2025-04-04 18:35:05 -04:00 | Non hermitian support
Peter Boyle | c74d11e3d7 | 2025-02-01 11:04:13 -05:00 | PVdagM MG
            | c4fc972fec | 2025-01-31 16:32:36 +00:00 | Merge branch 'feature/deprecate-uvm' into develop
Peter Boyle | 1147b8ea40 | 2024-09-26 14:20:32 -04:00 | Cheby poly setup
            | 066544281f | 2024-09-17 13:34:27 +00:00 | Deprecate UVM
Peter Boyle | aefd255a3c | 2024-04-30 05:20:41 -04:00 | Verbose
Peter Boyle | 1c5aa939fd | 2024-04-30 05:19:09 -04:00 | Subspace setup changes
Peter Boyle | 13713b2a76 | 2024-04-05 01:04:40 -04:00 | Much faster little dirac operator calculation
Peter Boyle | 36a14e4ee3 | 2024-04-05 01:03:33 -04:00 | Best setup and introduce an HDCG refine method
Peter Boyle | da890dc293 | 2024-04-01 14:18:00 -04:00 | Verbose changes
Peter Boyle | f0a8c7d045 | 2024-04-01 14:16:11 -04:00 | Playing with chebyshevs
Peter Boyle | 070b61f08f | 2024-03-06 14:04:33 -05:00 | Simplifying the MultiRHS solver to make it do SRHS *and* MRHS
Peter Boyle | 79fc821d8d | 2024-02-27 11:39:37 -05:00 | reorg headers
Peter Boyle | 9c2565f64e | 2024-02-21 14:46:43 -05:00 | Working and faster version
Peter Boyle | e1d0a7cec3 | 2024-02-21 14:38:20 -05:00 | Batched blas
Peter Boyle | b19ae8f465 | 2024-02-21 14:36:19 -05:00 | Nbasis method for convenience
Peter Boyle | 3d13fd56c5 | 2024-01-22 17:43:35 -05:00 | Precompute phases, save memory in hermitian
Peter Boyle | 42ae36bc28 | 2024-01-17 16:39:14 -05:00 | WOrking
Peter Boyle | c69f73ff9f | 2024-01-17 16:38:46 -05:00 | Working
Peter Boyle | b754a152c6 | 2024-01-17 16:25:28 -05:00 | Flag guard correctly
Peter Boyle | 551f6c4edd | 2023-12-22 18:09:11 -05:00 | Synchronise changes
Peter Boyle | defd814750 | 2023-12-22 18:07:03 -05:00 | Speed up the coarsened matrix matrix evaluation.
    It is block project limited.
    Could be sped up with calls to Batched GEMM and a data layout change.
Peter Boyle | 3d517bbd2a | 2023-12-22 18:06:13 -05:00 | Synchronise decouple from the launch
    Speeds up multileg stencils
Peter Boyle | 66a1b63aa9 | 2023-12-21 20:50:18 -05:00 | Faster grid/blas layout change.
    Halo exchange is now the only slow part.
    Revisit
Peter Boyle | c00b495933 | 2023-12-21 15:23:31 -05:00 | Multigrid
Peter Boyle | d22eebe553 | 2023-12-21 15:23:03 -05:00 | BLas options
Peter Boyle | 8bcbd82680 | 2023-12-21 15:21:24 -05:00 | BLAS based layout and implementation
Peter Boyle | dfa617c439 | 2023-12-21 14:01:18 -05:00 | Batched SGEMM/DGEMM/ZGEMM/CGEMM
    Hip, Cuda version and vanilla CPU
    One MKL stub in comments, to be tested as different.
Peter Boyle | b75cb7a12c | 2023-12-21 12:31:33 -05:00 | Blas batched partial implementation on Frontier only for now
Peter Boyle | 0cce97a4fe | 2023-12-20 21:30:10 -05:00 | verbosity only
Peter Boyle | e054078b11 | 2023-12-05 16:15:17 -05:00 | Verbose
Peter Boyle | 6835a7f208 | 2023-11-29 19:20:47 -05:00 | Better logging, test on 81 point stencil
Peter Boyle | 2290b8f680 | 2023-11-29 09:47:04 -05:00 | Verbose
Peter Boyle | 2c54be651c | 2023-11-29 09:43:29 -05:00 | Further updates
Peter Boyle | e859a199df | 2023-11-28 10:23:16 -05:00 | Reduce volume to interior for coarse stencil -- worth up to 4x gain
Peter Boyle | 0a3682ad0b | 2023-11-28 07:43:37 -05:00 | MultiRHS work
Peter Boyle | 3e448435d3 | 2023-11-23 18:23:29 -05:00 | Restrict to interior
Peter Boyle | 639cc6f73a | 2023-11-23 18:16:26 -05:00 | better support for multiRHS coarse space
    Still to add restriction of domain of last loop to interior of padded cell (expect about 4.5x on test volume on Crusher)
Peter Boyle | 8bece1f861 | 2023-11-15 17:58:38 -05:00 | Faster to transpose the matrix and apply with column major order
Peter Boyle | a3ca71ec01 | 2023-11-15 17:58:04 -05:00 | Lots more setup options, still working on them
Peter Boyle | 38b87de53f | 2023-10-24 10:56:07 -04:00 | This works around a stacksize limit on AMD GPU
Peter Boyle | 4341d96bde | 2023-10-20 19:27:13 -04:00 | Massively sped up coarse grid mult, comms
    Save 3ms spend (60% of time !) on cudaMalloc !!
Peter Boyle | e064f17346 | 2023-10-20 19:27:13 -04:00 | Faster halo exchange
Peter Boyle | 7cc3435ba8 | 2023-10-20 19:27:13 -04:00 | Imporved General coarsened matrix
Peter Boyle | f5dcea9dbf | 2023-10-20 19:27:12 -04:00 | Updates for Frontier
Peter Boyle | d29abfdcaf | 2023-10-06 21:03:34 -04:00 | Transfer code to Frontier now
Peter Boyle | 6a3bc9865e | 2023-10-06 21:02:04 -04:00 | Verbose change
Peter Boyle | 7f6e0f57d0 | 2023-10-06 13:39:53 -04:00 | No IO in file
Peter Boyle | eacebfad74 | 2023-10-06 10:46:21 -04:00 | Reorganise multigrid into multiple headers