Peter Boyle
fab1efb48c
More Britney logging improvements
2024-03-19 14:36:21 +00:00
Peter Boyle
95f3d69cf9
Extra hardware test hook
2024-03-12 20:09:37 +00:00
cf8632bbac
Britney test option
2024-03-12 15:15:35 +00:00
976c3e9b59
Hack for flight logging CG inner products.
...
Can be made to work, but could put in some more serious infrastructure
for repro testing and blame attribution (Britney test) if necessary
2024-03-05 23:59:57 +00:00
dbollweg
9514035b87
refactor slicesum: slicesum uses GPU version by default now
2024-02-09 13:02:28 -05:00
dbollweg
ab2de131bd
work towards sliceSum for sycl backend
2024-02-06 13:24:45 -05:00
dbollweg
caa5f97723
Add sliceSum gpu using cub/hipcub
2024-01-31 16:50:06 -05:00
Peter Boyle
2376156fbc
Merge branch 'develop' into feature/dirichlet
2023-03-27 21:33:50 -07:00
Christoph Lehner
458c943987
merged upstream
2022-12-31 11:16:21 +02:00
Christoph Lehner
88015b0858
Split sum in rankSum and GlobalSum
2022-12-26 10:01:32 +01:00
Peter Boyle
dc747c54be
Merge branch 'develop' into feature/dirichlet
...
Conflicts:
Grid/qcd/action/fermion/WilsonCompressor.h
Grid/stencil/Stencil.h
2022-12-13 08:24:58 -05:00
Peter Boyle
82e959f66c
SYCL reduction
2022-11-08 12:45:25 -08:00
Peter Boyle
551a5f8dc8
RRII gpu option
2022-10-11 14:44:55 -04:00
Christopher Kelly
19da647e3c
Added support for non-periodic gauge field implementations in the random gauge shift performed at the start of the HMC trajectory
...
(The above required exposing the gauge implementation to the HMC class through the Integrator class)
Made the random shift optional (default on) through a parameter in HMCparameters
Modified ConjugateBC::CshiftLink such that it supports any shift in -L < shift < L rather than just +-1
Added a tester for the BC-respecting Cshift
Fixed a missing system header include in SSE4 intrinsics wrapper
Fixed sumD_cpu for single-prec types performing an incorrect conversion to a single-prec data type at the end, which fails to compile on some systems
2022-09-09 12:47:09 -04:00
Peter Boyle
1333319941
Tracing
2022-08-31 17:00:25 -04:00
Peter Boyle
588c2f3cb1
Faster axpy_norm and innerProduct
2022-07-01 09:44:58 -04:00
d4ae71b880
sum_gpu_large and sum_gpu templates added.
2022-03-02 15:40:18 +00:00
Peter Boyle
3e882f555d
Large / small sumD options
2022-03-01 08:54:45 -05:00
Thomas Wurm
9e5fb52eb9
Put GlobalSum outside the slice loop
2021-03-08 13:53:34 +01:00
Christopher Kelly
55de69a569
Fixed compile issues with maxLocalNorm2 for non-scalar lattices
...
maxLocalNorm2 test now reuses the random field
2021-02-08 12:03:16 -05:00
Peter Boyle
cd99edcc5f
maxLocalNorm2()
2021-02-04 18:25:49 -05:00
Peter Boyle
936c5ecf69
Reduction GPU no compile fix
2020-06-24 17:28:31 -04:00
Peter Boyle
22cfbdbbb3
Boost precision in inner products in single
2020-06-24 12:52:31 -04:00
Peter Boyle
cdf0a04fc5
Merge branch 'develop' into sycl
2020-06-09 04:00:12 -04:00
Peter Boyle
1a4c8c3387
Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes.
2020-06-05 18:52:35 -04:00
Peter Boyle
f67830587f
Accelerator loop use
2020-06-03 22:50:09 -04:00
Peter Boyle
cb0d1b3399
hopefully fix build fail
2020-05-24 21:27:00 -04:00
Peter Boyle
d1f1ccc705
HIP changes
2020-05-24 21:18:49 -04:00
Peter Boyle
7860a50f70
Make view specify where and drive data motion - first cut.
...
This is a compile time option --enable-unified=yes/no
2020-05-21 16:13:16 -04:00
Peter Boyle
bbbee5660d
First compile on HIP
2020-05-10 05:28:09 -04:00
Peter Boyle
28a1fcaaff
First compile against SYCL
2020-05-05 11:13:27 -07:00
Christoph Lehner
04863f8f38
debug new AcceleratorView
2020-05-04 16:07:03 -04:00
Christoph Lehner
2a1387e992
rankInnerProduct
2020-05-03 17:27:11 -04:00
Christoph Lehner
ddb192bac7
re-work double precision promotion for summit
2020-04-30 16:09:57 -04:00
Christoph Lehner
091d5c605e
towards more precise blocking
2020-04-17 04:25:28 -04:00
Daniel Richtmann
5fc8a273e7
Fused innerProduct + norm2 on first argument operation
2020-04-06 11:52:29 +02:00
Fionn O hOgain
5de9547db5
Removing old debug code
2019-10-08 15:51:28 +01:00
Peter Boyle
be37dfb6f8
Remove debug code
2019-08-15 01:31:40 +01:00
Peter Boyle
3e49dc8a67
Reduction finished and hopefully fixes CI regression fail on single precision and force
2019-08-14 15:18:34 +01:00
Peter Boyle
ce97638bac
Think the reduction is now sorted and cleaned up
2019-08-11 11:09:01 +01:00
Peter Boyle
9117f61109
GPU friendly
2019-07-31 01:22:54 +01:00
Peter Boyle
9dad7a0094
Reproducible reduction and axpy_norm offload from Gianluca.
...
Hopefully get CG running entirely on GPU
2019-07-30 00:14:12 +01:00
Peter Boyle
b7e6d111d7
Thread loop changes. Need to offload this file
2019-06-15 07:59:10 +01:00
Peter Boyle
dc5024e88c
The GPU reduction was not working for me and causing errors. Need to revisit.
...
Gianluca is working on deterministic reduction.
2019-06-08 13:39:11 +01:00
gfilaci
1a82533d22
fix inner product with thrust reduction
2019-05-14 15:35:54 +01:00
Peter Boyle
204a090497
Inner product is not working on GPU. Why?
2019-04-28 07:31:56 +01:00
Peter Boyle
c5e081d69c
Re-Merge branch 'develop' into feature/gpu-port
...
Pull in Regensburg MultiGrid pull request
2019-01-03 01:50:16 +00:00
Peter Boyle
715babeac8
GPU reductions first cut; use thrust, non-reproducible. Inclusive scan can fix this if desired.
...
Local reduction to LatticeComplex and then further reduction.
2019-01-01 13:53:37 +00:00
Peter Boyle
422764757d
Updates in tests to make all of Grid compile
2018-12-14 16:55:54 +00:00
Peter Boyle
b57a4d32aa
Merge branch 'develop' into feature/gpu-port
2018-12-13 05:11:34 +00:00