dbollweg
|
0a816b5509
|
Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu
|
2024-02-22 21:43:06 -05:00 |
|
dbollweg
|
1c8b807c2e
|
free malloc'd memory
|
2024-02-22 21:42:44 -05:00 |
|
Dennis Bollweg
|
15878f7613
|
sliceSumReduction_cub_large now also faster than CPU on Frontier
|
2024-02-16 13:55:21 -05:00 |
|
dbollweg
|
e0d5e3c6c7
|
Merge branch 'paboyle:develop' into feature/sliceSum_gpu
|
2024-02-16 13:16:37 -05:00 |
|
dbollweg
|
6f3455900e
|
Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs
|
2024-02-16 13:15:02 -05:00 |
|
|
73c0b29535
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2024-02-13 20:19:32 +00:00 |
|
|
303b83cdb8
|
Scaling benchmarks, verbosity and MPICH aware in acceleratorInit()
For some reason Dirichlet benchmark fails on several nodes; need to
debug this.
|
2024-02-13 19:48:03 +00:00 |
|
|
5ef4da3f29
|
Silence verbose
|
2024-02-13 19:47:36 +00:00 |
|
|
1502860004
|
Benchmark scripts
|
2024-02-13 19:47:02 +00:00 |
|
|
585efc6f3f
|
More benchmark scripts
|
2024-02-13 19:40:49 +00:00 |
|
|
62055e04dd
|
missing semicolon generates error with some compilers
|
2024-02-13 18:18:27 +01:00 |
|
dbollweg
|
b5659d106e
|
more test cases
|
2024-02-09 13:37:14 -05:00 |
|
dbollweg
|
4b43307402
|
Undo include path changes for level zero api header
|
2024-02-09 13:07:56 -05:00 |
|
dbollweg
|
09af8c25a2
|
Merge branch 'paboyle:develop' into feature/sliceSum_gpu
|
2024-02-09 13:02:59 -05:00 |
|
dbollweg
|
9514035b87
|
refactor slicesum: slicesum uses GPU version by default now
|
2024-02-09 13:02:28 -05:00 |
|
|
7019916294
|
RNG seed change safer for large volumes; this is a long term solution
|
2024-02-07 00:56:39 +00:00 |
|
dbollweg
|
1514b4f137
|
slicesum_sycl passes test
|
2024-02-06 19:08:44 -05:00 |
|
|
91cf5ee312
|
Updated bench script
|
2024-02-06 23:45:10 +00:00 |
|
dbollweg
|
ab2de131bd
|
work towards sliceSum for sycl backend
|
2024-02-06 13:24:45 -05:00 |
|
|
5bfa88be85
|
Aurora MPI standalone benchmark and options that work well
|
2024-02-06 16:28:40 +00:00 |
|
Dennis Bollweg
|
5af8da76d7
|
Fix cuda compilation of Lattice_slicesum_gpu.h
|
2024-02-01 18:02:30 -05:00 |
|
Dennis Bollweg
|
b8b9dc952d
|
Async memcpy's and cleanup
|
2024-02-01 17:55:35 -05:00 |
|
Dennis Bollweg
|
79a6ed32d8
|
Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies
|
2024-02-01 16:41:03 -05:00 |
|
dbollweg
|
caa5f97723
|
Add sliceSum gpu using cub/hipcub
|
2024-01-31 16:50:06 -05:00 |
|
|
2a0d75bac2
|
Aurora files
|
2023-12-21 23:20:17 +00:00 |
|
Peter Boyle
|
f48298ad4e
|
Bug fix
|
2023-12-11 20:57:02 -05:00 |
|
root
|
645e47c1ba
|
Config for Ampere Altra ARM
|
2023-12-08 16:17:56 -05:00 |
|
Peter Boyle
|
d1d9827263
|
Integrator logging update
|
2023-12-08 12:14:00 -05:00 |
|
Peter Boyle
|
14643c0aab
|
SDCC benchmarking scripts for A100 nodes and IceLake nodes (AVX512)
|
2023-12-04 15:45:57 -05:00 |
|
Peter Boyle
|
b77a9b8947
|
SDCC compiles starting
|
2023-11-30 14:31:51 -05:00 |
|
Peter Boyle
|
7d077fe493
|
Frontier compile
|
2023-11-09 13:58:44 -05:00 |
|
Peter Boyle
|
51051df62c
|
3GeV run setup
|
2023-10-16 20:49:52 +03:00 |
|
Peter Boyle
|
33097681b9
|
FTHMC compiled and merged to develop
|
2023-10-14 00:42:55 +03:00 |
|
Peter Boyle
|
07e4900218
|
FTHMC commit
|
2023-10-13 18:21:57 +03:00 |
|
Peter Boyle
|
36ab567d67
|
FTHMC 3 GeV
|
2023-10-13 18:21:57 +03:00 |
|
Peter Boyle
|
e19171523b
|
FTHMC Status at lattice conference commit
|
2023-10-13 18:21:56 +03:00 |
|
Peter Boyle
|
9626a2c7c0
|
Asynch handling
|
2023-10-13 18:21:56 +03:00 |
|
Peter Boyle
|
e936f5b80b
|
IfGridTensor shorthand
|
2023-10-13 18:21:56 +03:00 |
|
Peter Boyle
|
ffc0639cb9
|
Running in HMC tests
|
2023-10-13 18:21:56 +03:00 |
|
Peter Boyle
|
c5b43b322c
|
traceProduct eliminates non-contributing intermediate terms
|
2023-10-13 18:21:56 +03:00 |
|
Peter Boyle
|
c9c4576237
|
Improved frontier cshift
|
2023-10-13 18:21:56 +03:00 |
|
Peter Boyle
|
6d0c2de399
|
Deprecate the PVC directory and make a PVC-OEM generic PVC target with
no queueing system dependency -- just interactive scripts
|
2023-10-03 17:04:20 +00:00 |
|
Peter Boyle
|
7786ea9921
|
Bug fix in script
|
2023-10-03 09:58:44 -07:00 |
|
Peter Boyle
|
d93eac7b1c
|
Performance regressed and is OK in icpx 2023.2
|
2023-10-03 15:53:14 +00:00 |
|
Peter Boyle
|
afc316f501
|
Rename headers
|
2023-10-02 16:25:11 -04:00 |
|
Peter Boyle
|
f14bfd5c1b
|
Relocate sub includes
|
2023-10-02 16:23:38 -04:00 |
|
Peter Boyle
|
c5f1420dea
|
Merge remote-tracking branch 'LupoA/develop' into LupoA-develop
|
2023-10-02 16:22:35 -04:00 |
|
Peter Boyle
|
018e6da872
|
Merge pull request #440 from giltirn/feature/paddedcellgauge
Feature/paddedcellgauge
|
2023-10-02 10:00:42 -04:00 |
|
Peter Boyle
|
b77bccfac2
|
Merge pull request #444 from mmphys/feature/docX
Update doc complete list of Macports needed to build Grid on a fresh Mac
|
2023-10-02 09:57:11 -04:00 |
|
Peter Boyle
|
80359e0d49
|
Bland SYCL compile
|
2023-09-26 13:20:27 -07:00 |
|