Peter Boyle
d01e5fa838
Improved FlightRecorder
2024-03-22 15:42:32 +00:00
Peter Boyle
fab1efb48c
More Britney logging improvements
2024-03-19 14:36:21 +00:00
2704b82084
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2024-03-12 15:16:24 +00:00
cf8632bbac
Britney test option
2024-03-12 15:15:35 +00:00
2b4399f8b1
more HOST_NAME_MAX fix
2024-03-07 15:26:01 +09:00
9b5f741e85
Reproducing CG can be more useful now
2024-03-06 00:03:16 +00:00
Peter Boyle
436bf1d9d3
Merge pull request #455 from clarkedavida/hisq_fat_links
Hisq fat links
2024-02-29 15:29:39 -05:00
Dennis Bollweg
b507fe209c
Added SpinColourMatrix case to sliceSum Test
2024-02-27 11:28:32 -05:00
david clarke
94581e3c7a
accelerator_for is broken
2024-02-23 15:58:33 -07:00
Dennis Bollweg
15878f7613
sliceSumReduction_cub_large now also faster than CPU on Frontier
2024-02-16 13:55:21 -05:00
dbollweg
6f3455900e
Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arbitrarily large vobjs
2024-02-16 13:15:02 -05:00
dbollweg
b5659d106e
more test cases
2024-02-09 13:37:14 -05:00
dbollweg
9514035b87
refactor slicesum: slicesum uses GPU version by default now
2024-02-09 13:02:28 -05:00
dbollweg
ab2de131bd
work towards sliceSum for sycl backend
2024-02-06 13:24:45 -05:00
Dennis Bollweg
b8b9dc952d
Async memcpy's and cleanup
2024-02-01 17:55:35 -05:00
Dennis Bollweg
79a6ed32d8
Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies
2024-02-01 16:41:03 -05:00
dbollweg
caa5f97723
Add sliceSum gpu using cub/hipcub
2024-01-31 16:50:06 -05:00
david clarke
4924b3209e
projectU3 yields a unitary matrix
2024-01-23 14:43:58 -07:00
david clarke
f5b3d582b0
first attempt at U3 projection
2024-01-22 02:49:40 -07:00
david clarke
981c93d67a
update Test_fatLinks to accept Naik
2024-01-21 21:09:19 -07:00
david clarke
9cd4128833
fix naik bug
2023-11-03 14:11:38 -06:00
david clarke
df9b958c40
naik now returns separately
2023-10-30 17:40:53 -06:00
david clarke
3d3376d1a3
LePage works, trying Naik
2023-10-27 16:26:31 -06:00
david clarke
21ed6ac0f4
added floating-point support
2023-10-20 13:54:26 -06:00
david clarke
7bb8ab7000
improve smearing templating
2023-10-20 08:41:02 -06:00
david clarke
391fd9cc6a
try lepage term
2023-10-17 14:57:15 -06:00
david clarke
36600899e2
working 7-link; Grid_log; generalShift
2023-10-12 11:11:39 -06:00
david clarke
b9c70d156b
Merge branch 'develop' into hisq_fat_links
2023-10-10 22:44:17 -06:00
david clarke
eb89579fe7
Merge remote-tracking branch 'origin/develop' into develop
2023-10-10 22:43:51 -06:00
david clarke
0cfd13d18b
7-link working
2023-10-10 22:41:52 -06:00
Peter Boyle
c5f1420dea
Merge remote-tracking branch 'LupoA/develop' into LupoA-develop
2023-10-02 16:22:35 -04:00
Peter Boyle
018e6da872
Merge pull request #440 from giltirn/feature/paddedcellgauge
Feature/paddedcellgauge
2023-10-02 10:00:42 -04:00
david clarke
63d9b8e8a3
Merge remote-tracking branch 'origin/develop' into hisq_fat_links
2023-09-16 23:20:31 -06:00
david clarke
d247031c98
try 7-link
2023-09-16 23:18:16 -06:00
Peter Boyle
b8a7004365
Partial fraction test
2023-08-14 15:17:03 -04:00
david clarke
99d879ea7f
5-link first attempt
2023-08-11 22:56:30 -06:00
Julian Lenz
f7b79cdd45
Added test for ProjectSpn
2023-07-03 18:00:32 +01:00
Alessandro Lupo
b92428f05f
better test
2023-07-02 13:34:03 +01:00
Alessandro Lupo
34b11864b6
prettiest tests
2023-07-02 13:25:57 +01:00
david clarke
9d263d9a7d
fix bug in HISQSmearing; move benchmark because I don't understand how makefiles work
2023-06-28 10:05:34 -06:00
david clarke
9015c229dc
add benchmark to see whether matrix multiplication is slower than read from object
2023-06-27 21:28:26 -06:00
Christopher Kelly
f44dce390f
Implemented accelerator-optimized versions of localCopyRegion and insertSliceLocal to speed up padding
Fixed const correctness on PaddedCell methods
Fixed compile issues on Crusher
Added timing breakdowns for PaddedCell::Expand and the padded implementations of the staples, visible under --log Performance
Optimized kernel for StaplePadded
Test_iwasaki_action_newstaple now repeats the calculation 10 times and reports average timings
2023-06-27 14:58:10 -04:00
david clarke
a7eabaad56
rudimentary appendShift convenience method, which allows the user to append an arbitrary shift in one line
2023-06-26 23:59:28 -06:00
david clarke
eeb4703b84
develop wrappers to make the stencils easier to construct
2023-06-26 17:45:35 -06:00
Christopher Kelly
6f6844ccf1
Added new StapleAll and RectStapleAll functions that return the staples for all mu as an array
Modified plaq+rectangle gauge actions to use the above
Added a test code to confirm the above changes
2023-06-26 15:48:47 -04:00
Christopher Kelly
4c6613d72c
Modified RectStapleDouble and RectStapleOptimised to use Gauge-BC respecting CshiftLink
Added test code tests/debug/Test_optimized_staple_gaugebc demonstrating equivalence of above to RectStapleUnoptimised for cconj gauge BCs
Removed optimized staple only being used for periodic gauge BCs; it is now always used
2023-06-26 10:20:23 -04:00
Alessandro Lupo
cff1f8d3b8
rm unused variables and formatting
2023-06-23 16:04:18 +01:00
Alessandro Lupo
f27d2083cd
adjustments in SUn and Sp2n impl
2023-06-23 15:34:08 +01:00
Alessandro Lupo
de30c4e22a
minor improvements
2023-06-23 10:49:41 +01:00
Christopher Kelly
4241c7d4a3
Imported coalescedReadGeneralPermute GPU implementation from Christoph
Fixed bug in padded staple code where extract was being called on the result before the GPU view was closed
Fixed compile issue with pointer cast in padded staple code
Added timing summaries of padded staple code and timing breakdown of staple implementation to Test_padded_cell_staple
2023-06-21 16:01:01 -04:00