Peter Boyle
804d9367d4
Regressed performance
2024-07-22 15:23:25 -04:00
Peter Boyle
12b8be7cb9
Best so far on 96^3: 350 Evecs converged on 4^4 block
2024-06-18 16:31:37 -04:00
Peter Boyle
dc80b08969
96^3 test
2024-06-10 15:07:29 -04:00
Peter Boyle
0e607a55e7
Updated for 8^4 test
2024-05-26 20:53:05 +00:00
Peter Boyle
ad14a82742
Working as well as possible on 48^3 in double
2024-05-16 10:55:45 -04:00
Peter Boyle
98cf247f33
prepare to switch to mixed precision
2024-04-30 05:23:45 -04:00
Peter Boyle
0cf16522d1
Refine with HDCG choice
2024-04-30 05:22:14 -04:00
Peter Boyle
5147a42818
Updated hdcg
2024-04-05 01:05:57 -04:00
Peter Boyle
5b79d51c22
Improvements
2024-04-01 14:18:40 -04:00
Peter Boyle
cc04dc42dc
Merge branch 'develop' into feature/scidac-wp1
2024-03-06 14:55:21 -05:00
Peter Boyle
070b61f08f
Simplifying the MultiRHS solver to make it do SRHS *and* MRHS
2024-03-06 14:04:33 -05:00
Peter Boyle
cd15abe9d1
Mrhs prep
2024-02-27 11:41:13 -05:00
Peter Boyle
eb702f581b
Running 12 RHS on 18 nodes of Frontier
2024-01-22 17:44:15 -05:00
Peter Boyle
d967eb53de
Working for first time
2024-01-17 16:31:12 -05:00
Peter Boyle
25f71913b7
MultiRHS coarse
2024-01-04 12:01:17 -05:00
Peter Boyle
d5fd90b2f3
Add 48^3 test
2024-01-04 12:00:01 -05:00
Peter Boyle
22c611bd1a
Delete temp file
2023-12-21 18:32:31 -05:00
Peter Boyle
c9bb1bf8ea
Passing with new BLAS-based implementation
2023-12-21 18:31:17 -05:00
Peter Boyle
9e489887cf
General coarse multiRHS move to BLAS implementation
2023-12-21 15:24:48 -05:00
Peter Boyle
abcd6b8cb6
Faster version
2023-12-19 15:17:46 -05:00
Peter Boyle
6835a7f208
Better logging, test on 81-point stencil
2023-11-29 19:20:47 -05:00
Peter Boyle
f59993b979
Nbasis
2023-11-29 09:47:36 -05:00
Peter Boyle
e859a199df
Reduce volume to interior for coarse stencil -- worth up to 4x gain
2023-11-28 10:23:16 -05:00
Peter Boyle
0a3682ad0b
MultiRHS work
2023-11-28 07:43:37 -05:00
Peter Boyle
59abaeb5cd
Time stamp
2023-11-24 12:56:45 -05:00
Peter Boyle
b302ad3d49
multiRHS test in place, passes. Yay!
2023-11-23 18:20:15 -05:00
Peter Boyle
09946cf1ba
Improved, works on 48^3 moving to multiRHS optimisations
2023-11-15 18:03:05 -05:00
Peter Boyle
9c9c42d0df
Tests on Frontier with real speed-up: 3.5x on 16^3 at mq=0.01
2023-10-20 19:27:13 -04:00
Peter Boyle
0ae4478cd9
Checkpoint the subspace and ldop
2023-10-20 19:27:13 -04:00
Peter Boyle
ae4e705e09
Use random vec as easier for debug
2023-10-20 19:27:13 -04:00
david clarke
eb89579fe7
Merge remote-tracking branch 'origin/develop' into develop
2023-10-10 22:43:51 -06:00
Peter Boyle
2111e7ab5f
Run at physical mass
2023-10-06 21:20:21 -04:00
Peter Boyle
a751c42cc5
Checkpoint restore the setup
2023-10-06 21:03:08 -04:00
Peter Boyle
b58fd80379
I/O for coarse op and reorganise multigrid headers
2023-10-06 13:43:46 -04:00
Peter Boyle
3bc2da5321
Merge branch 'feature/scidac-wp1' of https://github.com/paboyle/Grid into feature/scidac-wp1
2023-10-05 16:57:59 -04:00
Peter Boyle
2d710d6bfd
Optimised parameters for 16^3
2023-10-05 16:56:55 -04:00
Peter Boyle
6532b7f32b
Eliminate older inefficient coarsening implementation
2023-10-05 16:56:15 -04:00
Peter Boyle
fcf5023845
Running on Frontier
2023-10-05 16:50:59 -04:00
Peter Boyle
737d3ffb98
ADEF1 and 1 hop projection
2023-10-03 14:22:18 -04:00
Peter Boyle
8a70314f54
Merge branch 'develop' into feature/scidac-wp1
2023-10-02 17:24:55 -04:00
Peter Boyle
e187bcb85c
Updating
2023-09-29 17:10:17 -04:00
Peter Boyle
be18ffe3b4
Further tuning and lanczos
2023-09-27 16:21:58 -04:00
Peter Boyle
3a86cce8c1
Compile
2023-09-27 16:19:18 -04:00
Peter Boyle
37884d369f
Coarse space is expensive, but gives a speed up in fine matrix multiplies now.
Down to optimisation
2023-09-25 17:24:19 -04:00
Peter Boyle
9246e653cd
Basic non-local coarsening of operator test
2023-09-25 17:20:58 -04:00
Peter Boyle
b9dcad89e8
Test cases for coarsening with non-local stencil
2023-09-07 10:53:22 -04:00
Peter Boyle
2b43308208
First cut non-local coarsening
2023-08-25 17:38:07 -04:00
Christopher Kelly
f44dce390f
Implemented accelerator-optimized versions of localCopyRegion and insertSliceLocal to speed up padding
Fixed const correctness on PaddedCell methods
Fixed compile issues on Crusher
Added timing breakdowns for PaddedCell::Expand and the padded implementations of the staples, visible under --log Performance
Optimized kernel for StaplePadded
Test_iwasaki_action_newstaple now repeats the calculation 10 times and reports average timings
2023-06-27 14:58:10 -04:00
Christopher Kelly
6f6844ccf1
Added new StapleAll and RectStapleAll functions that return the staples for all mu as an array
Modified plaq+rectangle gauge actions to use the above
Added a test code to confirm the above changes
2023-06-26 15:48:47 -04:00
Christopher Kelly
4c6613d72c
Modified RectStapleDouble and RectStapleOptimised to use Gauge-BC respecting CshiftLink
Added test code tests/debug/Test_optimized_staple_gaugebc demonstrating equivalence of above to RectStapleUnoptimised for cconj gauge BCs
Removed the restriction of the optimized staple to periodic gauge BCs; it is now always used
2023-06-26 10:20:23 -04:00