22c611bd1a
Delete temp file
2023-12-21 18:32:31 -05:00
c9bb1bf8ea
Passing new BLAS based
2023-12-21 18:31:17 -05:00
9e489887cf
General coarse multiRHS move to BLAS implementation
2023-12-21 15:24:48 -05:00
abcd6b8cb6
Faster version
2023-12-19 15:17:46 -05:00
6835a7f208
Better logging, test on 81 point stencil
2023-11-29 19:20:47 -05:00
f59993b979
Nbasis
2023-11-29 09:47:36 -05:00
e859a199df
Reduce volume to interior for coarse stencil -- worth up to 4x gain
2023-11-28 10:23:16 -05:00
0a3682ad0b
MultiRHS work
2023-11-28 07:43:37 -05:00
59abaeb5cd
Time stamp
2023-11-24 12:56:45 -05:00
b302ad3d49
multiRHS test in place, passes Yay!
2023-11-23 18:20:15 -05:00
09946cf1ba
Improved, works on 48^3; moving to multiRHS optimisations
2023-11-15 18:03:05 -05:00
9c9c42d0df
Tests on Frontier with real speed up: 3.5x on 16^3 at mq=0.01
2023-10-20 19:27:13 -04:00
0ae4478cd9
Checkpoint the subspace and ldop
2023-10-20 19:27:13 -04:00
ae4e705e09
Use random vec as easier for debug
2023-10-20 19:27:13 -04:00
2111e7ab5f
Run at physical mass
2023-10-06 21:20:21 -04:00
a751c42cc5
Checkpoint restore the setup
2023-10-06 21:03:08 -04:00
b58fd80379
I/O for coarse op and reorganise multigrid headers
2023-10-06 13:43:46 -04:00
3bc2da5321
Merge branch 'feature/scidac-wp1' of https://github.com/paboyle/Grid into feature/scidac-wp1
2023-10-05 16:57:59 -04:00
2d710d6bfd
Optimised parameters for 16^3
2023-10-05 16:56:55 -04:00
6532b7f32b
Eliminate older inefficient coarsening implementation
2023-10-05 16:56:15 -04:00
fcf5023845
Running on Frontier
2023-10-05 16:50:59 -04:00
737d3ffb98
ADEF1 and 1 hop projection
2023-10-03 14:22:18 -04:00
8a70314f54
Merge branch 'develop' into feature/scidac-wp1
2023-10-02 17:24:55 -04:00
e187bcb85c
Updating
2023-09-29 17:10:17 -04:00
be18ffe3b4
Further tuning and lanczos
2023-09-27 16:21:58 -04:00
3a86cce8c1
Compile
2023-09-27 16:19:18 -04:00
37884d369f
Coarse space is expensive, but gives a speed up in fine matrix multiplies now.
Down to optimisation
2023-09-25 17:24:19 -04:00
9246e653cd
Basic non-local coarsening of operator test
2023-09-25 17:20:58 -04:00
b9dcad89e8
Test cases for coarsening with non-local stencil
2023-09-07 10:53:22 -04:00
2b43308208
First cut non-local coarsening
2023-08-25 17:38:07 -04:00
f44dce390f
Implemented accelerator-optimized versions of localCopyRegion and insertSliceLocal to speed up padding
Fixed const correctness on PaddedCell methods
Fixed compile issues on Crusher
Added timing breakdowns for PaddedCell::Expand and the padded implementations of the staples, visible under --log Performance
Optimized kernel for StaplePadded
Test_iwasaki_action_newstaple now repeats the calculation 10 times and reports average timings
2023-06-27 14:58:10 -04:00
6f6844ccf1
Added new StapleAll and RectStapleAll functions that return the staples for all mu as an array
Modified plaq+rectangle gauge actions to use the above
Added a test code to confirm the above changes
2023-06-26 15:48:47 -04:00
4c6613d72c
Modified RectStapleDouble and RectStapleOptimised to use Gauge-BC respecting CshiftLink
Added test code tests/debug/Test_optimized_staple_gaugebc demonstrating equivalence of above to RectStapleUnoptimised for cconj gauge BCs
Removed optimized staple only being used for periodic gauge BCs; it is now always used
2023-06-26 10:20:23 -04:00
4241c7d4a3
Imported coalescedReadGeneralPermute GPU implementation from Christoph
Fixed bug in padded staple code where extract was being called on the result before the GPU view was closed
Fixed compile issue with pointer cast in padded staple code
Added timing summaries of padded staple code and timing breakdown of staple implementation to Test_padded_cell_staple
2023-06-21 16:01:01 -04:00
7b11075102
The user can now specify the implementation of Cshift used by the PaddedCell class through a virtual base class API. Implementations are provided for the default (regular Cshift) and for gauge links (which respects the gauge BCs)
Fixed const-correctness for PaddedCell and ConjugateGimpl::setDirections
Modified test code for padded-cell implementation of staple, rect-staple to use cconj BCs
2023-06-20 17:09:56 -04:00
abc658dca5
Added coalescedReadGeneralPermute CPU implementation based on Christoph's GPT code
In a test code, implemented a padded-cell version of the staple and rectangular-staple calculation
2023-06-20 16:14:25 -04:00
9c8750f261
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2023-05-11 12:29:09 -04:00
ccd21f96ff
Plaquette agreeing and moving to final form (slowly); need to optimise
2023-02-01 22:57:44 -05:00
4b90cb8888
First cut passes combining padded cell with general stencil towards fast plaquette and staggered force
2023-02-01 22:14:10 -05:00
3dbfce5223
Tests clean build on HIP
2022-11-16 20:15:51 -05:00
8cd4263974
Tests compile
2021-04-25 22:20:37 -04:00
2983b6fdf6
Optional (superficial) changes to make comparison with Hadrons WardIdentity module easier: use Schur solver; example of Hadrons random gauge init; logging updates; only solve reverse propagator if provided
2021-01-23 12:41:48 +00:00
11a5fd09d6
Hot config
2021-01-21 21:39:41 -05:00
873519e960
Enable existing conserved current code for CUDA (compiles OK for CUDA 10.1). Add option to Test_cayley_mres to load a configuration
2020-12-14 16:06:10 +00:00
d201277652
Expose Nc as a compile time configure option.
Remove precision option
2020-10-07 13:07:00 -04:00
d982a5b6d5
Fix coarsened
2020-09-01 00:14:04 -04:00
1a4c8c3387
Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes.
2020-06-05 18:52:35 -04:00
f999408e92
View location and access mode
2020-05-21 16:14:20 -04:00
29ae5615c0
Sequential fix
2020-04-29 03:05:15 -04:00
ed70cce542
Test for 5D DWF observables
2020-04-23 04:29:45 -04:00