Peter Boyle
|
eafc150034
|
Test fft asserts
|
2024-10-23 16:46:26 -04:00 |
|
Peter Boyle
|
1e893af775
|
GPU happy
|
2024-10-23 14:52:15 -04:00 |
|
Peter Boyle
|
d9f430a575
|
Happy GPU
|
2024-10-23 14:51:16 -04:00 |
|
Peter Boyle
|
5ae77876a8
|
Meson field and Aslash field on GPU; some compiler warning removed
|
2024-10-18 19:08:06 -04:00 |
|
|
5cc4f3241d
|
Meson field test
|
2024-10-18 15:42:30 +00:00 |
|
|
03687c1d62
|
Final version of test, closer to original again
|
2024-10-15 14:35:17 +00:00 |
|
|
066544281f
|
Deprecate UVM
|
2024-09-17 13:34:27 +00:00 |
|
|
160969a758
|
UVM tester, doesn't turn up anything
|
2024-09-10 18:09:42 +00:00 |
|
Peter Boyle
|
575eb72182
|
Converges on 16^3
|
2024-08-27 19:20:38 +00:00 |
|
Peter Boyle
|
29f6b8a74a
|
Setup
|
2024-08-27 12:02:49 -04:00 |
|
Peter Boyle
|
9779aaea33
|
16^3 optimise
|
2024-08-27 11:38:35 -04:00 |
|
Peter Boyle
|
ec25604a67
|
Fastest solver for mrhs multigrid
|
2024-08-27 11:32:34 -04:00 |
|
Peter Boyle
|
b461184797
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2024-07-23 09:53:58 -04:00 |
|
Peter Boyle
|
486412635a
|
8^4 test for PETSc
|
2024-07-22 15:25:17 -04:00 |
|
Peter Boyle
|
8b23a1546a
|
Force compile temporarily
|
2024-07-22 15:24:56 -04:00 |
|
Peter Boyle
|
a901e4e369
|
Regressed performance for paper
|
2024-07-22 15:24:04 -04:00 |
|
Peter Boyle
|
804d9367d4
|
Regressed performance
|
2024-07-22 15:23:25 -04:00 |
|
Peter Boyle
|
41d8adca95
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2024-07-11 15:38:45 +00:00 |
|
Peter Boyle
|
7c246606c1
|
Schur additional case
|
2024-07-10 22:04:32 +00:00 |
|
Peter Boyle
|
12b8be7cb9
|
Best so far on 96^3 350 Evecs converged on 4^4 block
|
2024-06-18 16:31:37 -04:00 |
|
Peter Boyle
|
b5926c1d21
|
Broadcast time info
|
2024-06-11 15:16:25 -04:00 |
|
Peter Boyle
|
dc80b08969
|
96^3 test
|
2024-06-10 15:07:29 -04:00 |
|
Peter Boyle
|
0e607a55e7
|
Updated for 8^4 test
|
2024-05-26 20:53:05 +00:00 |
|
Peter Boyle
|
ad14a82742
|
Working as well as possible on 48^3 in double
|
2024-05-16 10:55:45 -04:00 |
|
Peter Boyle
|
5c3ace7c3e
|
Merge branch 'develop' into feature/scidac-wp1
|
2024-04-30 05:26:06 -04:00 |
|
Peter Boyle
|
98cf247f33
|
prepare to switch to mixed precision
|
2024-04-30 05:23:45 -04:00 |
|
Peter Boyle
|
0cf16522d1
|
Refine with HDCG choice
|
2024-04-30 05:22:14 -04:00 |
|
Peter Boyle
|
5147a42818
|
Updated hdcg
|
2024-04-05 01:05:57 -04:00 |
|
Peter Boyle
|
5b79d51c22
|
Improvements
|
2024-04-01 14:18:40 -04:00 |
|
Peter Boyle
|
59b0cc11df
|
Reduce the time in single
|
2024-03-26 00:42:40 +00:00 |
|
Peter Boyle
|
d01e5fa838
|
Improved FlightRecorder
|
2024-03-22 15:42:32 +00:00 |
|
Peter Boyle
|
fab1efb48c
|
More Britney logging improvements
|
2024-03-19 14:36:21 +00:00 |
|
|
2704b82084
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2024-03-12 15:16:24 +00:00 |
|
|
cf8632bbac
|
Britney test option
|
2024-03-12 15:15:35 +00:00 |
|
|
2b4399f8b1
|
more HOST_NAME_MAX fix
|
2024-03-07 15:26:01 +09:00 |
|
Peter Boyle
|
cc04dc42dc
|
Merge branch 'develop' into feature/scidac-wp1
|
2024-03-06 14:55:21 -05:00 |
|
Peter Boyle
|
070b61f08f
|
Simplifying the MultiRHS solver to make it do SRHS *and* MRHS
|
2024-03-06 14:04:33 -05:00 |
|
|
9b5f741e85
|
Reproducing CG can be more useful now
|
2024-03-06 00:03:16 +00:00 |
|
Peter Boyle
|
436bf1d9d3
|
Merge pull request #455 from clarkedavida/hisq_fat_links
Hisq fat links
|
2024-02-29 15:29:39 -05:00 |
|
Peter Boyle
|
cd15abe9d1
|
Mrhs prep
|
2024-02-27 11:41:13 -05:00 |
|
Dennis Bollweg
|
b507fe209c
|
Added SpinColourMatrix case to sliceSum Test
|
2024-02-27 11:28:32 -05:00 |
|
david clarke
|
94581e3c7a
|
accelerator_for is broken
|
2024-02-23 15:58:33 -07:00 |
|
Dennis Bollweg
|
15878f7613
|
sliceSumReduction_cub_large now also faster than CPU on Frontier
|
2024-02-16 13:55:21 -05:00 |
|
dbollweg
|
6f3455900e
|
Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arbitrarily large vobjs
|
2024-02-16 13:15:02 -05:00 |
|
dbollweg
|
b5659d106e
|
more test cases
|
2024-02-09 13:37:14 -05:00 |
|
dbollweg
|
9514035b87
|
refactor slicesum: slicesum uses GPU version by default now
|
2024-02-09 13:02:28 -05:00 |
|
dbollweg
|
ab2de131bd
|
work towards sliceSum for sycl backend
|
2024-02-06 13:24:45 -05:00 |
|
Dennis Bollweg
|
b8b9dc952d
|
Async memcpy's and cleanup
|
2024-02-01 17:55:35 -05:00 |
|
Dennis Bollweg
|
79a6ed32d8
|
Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies
|
2024-02-01 16:41:03 -05:00 |
|
dbollweg
|
caa5f97723
|
Add sliceSum gpu using cub/hipcub
|
2024-01-31 16:50:06 -05:00 |
|