Peter Boyle
59b0cc11df
Reduce the time in single
2024-03-26 00:42:40 +00:00
Peter Boyle
d01e5fa838
Improved FlightRecorder
2024-03-22 15:42:32 +00:00
Peter Boyle
fab1efb48c
More britney logging improvements
2024-03-19 14:36:21 +00:00
2704b82084
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2024-03-12 15:16:24 +00:00
cf8632bbac
Britney test option
2024-03-12 15:15:35 +00:00
2b4399f8b1
more HOST_NAME_MAX fix
2024-03-07 15:26:01 +09:00
9b5f741e85
Reproducing CG can be more useful now
2024-03-06 00:03:16 +00:00
Peter Boyle
436bf1d9d3
Merge pull request #455 from clarkedavida/hisq_fat_links
Hisq fat links
2024-02-29 15:29:39 -05:00
Dennis Bollweg
b507fe209c
Added SpinColourMatrix case to sliceSum Test
2024-02-27 11:28:32 -05:00
david clarke
94581e3c7a
accelerator_for is broken
2024-02-23 15:58:33 -07:00
Dennis Bollweg
15878f7613
sliceSumReduction_cub_large now also faster than CPU on Frontier
2024-02-16 13:55:21 -05:00
dbollweg
6f3455900e
Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arbitrarily large vobjs
2024-02-16 13:15:02 -05:00
dbollweg
b5659d106e
more test cases
2024-02-09 13:37:14 -05:00
dbollweg
9514035b87
refactor slicesum: slicesum uses GPU version by default now
2024-02-09 13:02:28 -05:00
dbollweg
ab2de131bd
work towards sliceSum for sycl backend
2024-02-06 13:24:45 -05:00
Dennis Bollweg
b8b9dc952d
Async memcpy's and cleanup
2024-02-01 17:55:35 -05:00
Dennis Bollweg
79a6ed32d8
Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies
2024-02-01 16:41:03 -05:00
dbollweg
caa5f97723
Add sliceSum gpu using cub/hipcub
2024-01-31 16:50:06 -05:00
david clarke
4924b3209e
projectU3 yields a unitary matrix
2024-01-23 14:43:58 -07:00
david clarke
f5b3d582b0
first attempt at U3 projection
2024-01-22 02:49:40 -07:00
david clarke
981c93d67a
update Test_fatLinks to accept Naik
2024-01-21 21:09:19 -07:00
david clarke
9cd4128833
fix naik bug
2023-11-03 14:11:38 -06:00
david clarke
df9b958c40
naik now returns separately
2023-10-30 17:40:53 -06:00
david clarke
3d3376d1a3
LePage works, trying Naik
2023-10-27 16:26:31 -06:00
david clarke
21ed6ac0f4
added floating-point support
2023-10-20 13:54:26 -06:00
david clarke
7bb8ab7000
improve smearing templating
2023-10-20 08:41:02 -06:00
david clarke
391fd9cc6a
try lepage term
2023-10-17 14:57:15 -06:00
david clarke
36600899e2
working 7-link; Grid_log; generalShift
2023-10-12 11:11:39 -06:00
david clarke
b9c70d156b
Merge branch 'develop' into hisq_fat_links
2023-10-10 22:44:17 -06:00
david clarke
eb89579fe7
Merge remote-tracking branch 'origin/develop' into develop
2023-10-10 22:43:51 -06:00
david clarke
0cfd13d18b
7-link working
2023-10-10 22:41:52 -06:00
Peter Boyle
c5f1420dea
Merge remote-tracking branch 'LupoA/develop' into LupoA-develop
2023-10-02 16:22:35 -04:00
Peter Boyle
018e6da872
Merge pull request #440 from giltirn/feature/paddedcellgauge
Feature/paddedcellgauge
2023-10-02 10:00:42 -04:00
david clarke
63d9b8e8a3
Merge remote-tracking branch 'origin/develop' into hisq_fat_links
2023-09-16 23:20:31 -06:00
david clarke
d247031c98
try 7-link
2023-09-16 23:18:16 -06:00
Peter Boyle
b8a7004365
Partial fraction test
2023-08-14 15:17:03 -04:00
david clarke
99d879ea7f
5-link first attempt
2023-08-11 22:56:30 -06:00
Julian Lenz
f7b79cdd45
Added test for ProjectSpn
2023-07-03 18:00:32 +01:00
Alessandro Lupo
b92428f05f
better test
2023-07-02 13:34:03 +01:00
Alessandro Lupo
34b11864b6
prettiest tests
2023-07-02 13:25:57 +01:00
david clarke
9d263d9a7d
fix bug in HISQSmearing; move benchmark because I don't understand how makefiles work
2023-06-28 10:05:34 -06:00
david clarke
9015c229dc
add benchmark to see whether matrix multiplication is slower than read from object
2023-06-27 21:28:26 -06:00
Christopher Kelly
f44dce390f
Implemented accelerator-optimized versions of localCopyRegion and insertSliceLocal to speed up padding
Fixed const correctness on PaddedCell methods
Fixed compile issues on Crusher
Added timing breakdowns for PaddedCell::Expand and the padded implementations of the staples, visible under --log Performance
Optimized kernel for StaplePadded
Test_iwasaki_action_newstaple now repeats the calculation 10 times and reports average timings
2023-06-27 14:58:10 -04:00
david clarke
a7eabaad56
rudimentary appendShift convenience method, which allows the user to append an arbitrary shift in one line
2023-06-26 23:59:28 -06:00
david clarke
eeb4703b84
develop wrappers to make the stencils easier to construct
2023-06-26 17:45:35 -06:00
Christopher Kelly
6f6844ccf1
Added new StapleAll and RectStapleAll functions that return the staples for all mu as an array
Modified plaq+rectangle gauge actions to use the above
Added a test code to confirm the above changes
2023-06-26 15:48:47 -04:00
Christopher Kelly
4c6613d72c
Modified RectStapleDouble and RectStapleOptimised to use Gauge-BC respecting CshiftLink
Added test code tests/debug/Test_optimized_staple_gaugebc demonstrating equivalence of above to RectStapleUnoptimised for cconj gauge BCs
Removed optimized staple only being used for periodic gauge BCs; it is now always used
2023-06-26 10:20:23 -04:00
Alessandro Lupo
cff1f8d3b8
rm unused variables and formatting
2023-06-23 16:04:18 +01:00
Alessandro Lupo
f27d2083cd
adjustments in SUn and Sp2n impl
2023-06-23 15:34:08 +01:00
Alessandro Lupo
de30c4e22a
minor improvements
2023-06-23 10:49:41 +01:00