Peter Boyle
|
ff2ea5de18
|
Update Tensor_traits.h
|
2024-04-11 14:25:45 -04:00 |
|
Peter Boyle
|
3ef2a41518
|
ifdef guard omitted
|
2024-03-26 14:50:32 +00:00 |
|
Peter Boyle
|
aa96f420c6
|
Accelerator-aware MPI guard on the Unix domain sockets
|
2024-03-26 14:41:25 +00:00 |
|
Peter Boyle
|
49e9e4ed0e
|
Fences
|
2024-03-26 14:14:06 +00:00 |
|
Peter Boyle
|
5404fc66ab
|
Merge needs a fence on SYCL
|
2024-03-26 00:38:41 +00:00 |
|
Peter Boyle
|
1f53458af8
|
Options to bounce through a host buffer if
--disable-accelerator-aware-mpi
|
2024-03-26 00:37:19 +00:00 |
|
Peter Boyle
|
434c3e7f1d
|
We have a choice of GET or PUT across NVlink
|
2024-03-25 14:32:44 +00:00 |
|
Peter Boyle
|
d1e9fe50d2
|
Xor csum for repro testing
|
2024-03-22 15:42:57 +00:00 |
|
Peter Boyle
|
1bd20cd9e8
|
FlightRecorder
|
2024-03-22 15:40:01 +00:00 |
|
Peter Boyle
|
e49e95b037
|
Upgrade of the Britney test with flight recorder and fast xor checksum
|
2024-03-22 15:39:27 +00:00 |
|
Peter Boyle
|
6f59fed563
|
Flight recorder, resurrecting the "world famous" Britney test
|
2024-03-22 15:32:32 +00:00 |
|
Peter Boyle
|
60b7f6c99d
|
Flight recorder, resurrecting the "world famous" Britney test
|
2024-03-22 15:32:26 +00:00 |
|
Peter Boyle
|
b92dfcc8d3
|
Flight recorder, resurrecting the "world famous" Britney test
|
2024-03-22 15:30:27 +00:00 |
|
Peter Boyle
|
f6fd6dd053
|
Flight recorder, resurrecting the "world famous" Britney test
|
2024-03-22 15:30:01 +00:00 |
|
Peter Boyle
|
fab1efb48c
|
More Britney logging improvements
|
2024-03-19 14:36:21 +00:00 |
|
Peter Boyle
|
660eb76d93
|
FFTW from OneAPI
|
2024-03-19 14:28:33 +00:00 |
|
Peter Boyle
|
62e7bf024a
|
Updated flight logging for Britney test
|
2024-03-12 20:10:04 +00:00 |
|
Peter Boyle
|
95f3d69cf9
|
Extra hardware test hook
|
2024-03-12 20:09:37 +00:00 |
|
|
2704b82084
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2024-03-12 15:16:24 +00:00 |
|
|
cf8632bbac
|
Britney test option
|
2024-03-12 15:15:35 +00:00 |
|
|
f17b8de907
|
fallback to _POSIX_HOST_NAME_MAX if HOST_NAME_MAX is not defined
|
2024-03-07 15:22:08 +09:00 |
|
Peter Boyle
|
7e5bd46dd3
|
Booster update
|
2024-03-06 19:03:45 +01:00 |
|
|
10116b3be8
|
Force device copyable and tell SYCL to shut it.
|
2024-03-06 01:13:27 +00:00 |
|
|
a46a0f0882
|
force device copyable and don't take crap from SYCL
|
2024-03-06 01:12:49 +00:00 |
|
|
1b93a9be88
|
Print out the hostname
|
2024-03-06 00:01:58 +00:00 |
|
|
783a66b348
|
Deterministic reduction please
|
2024-03-06 00:01:37 +00:00 |
|
|
976c3e9b59
|
Hack for flight logging CG inner products.
Can be made to work, but could put in some more serious infrastructure
for repro testing and blame attribution (Britney test) if necessary
|
2024-03-05 23:59:57 +00:00 |
|
|
f8ca971dae
|
Use of a bare PRECISION macro is not namespace safe and collides with
SYCL
|
2024-03-05 23:59:13 +00:00 |
|
|
21bc8c24df
|
OneMKL batched blas starting
|
2024-03-05 23:58:20 +00:00 |
|
|
30228214f7
|
SYCL conflict with Eigen
|
2024-03-05 23:56:10 +00:00 |
|
Peter Boyle
|
c805f86343
|
USQCD benchmark
|
2024-03-01 00:05:04 -05:00 |
|
Peter Boyle
|
88d8fa43d7
|
Benchmark development
|
2024-02-29 20:01:44 -05:00 |
|
Peter Boyle
|
3c49762875
|
Propagate in the blas routine
|
2024-02-29 15:33:06 -05:00 |
|
Peter Boyle
|
436bf1d9d3
|
Merge pull request #455 from clarkedavida/hisq_fat_links
Hisq fat links
|
2024-02-29 15:29:39 -05:00 |
|
david clarke
|
f70df6e195
|
changed NO_SHIFT and BACKWARD_CONST from define to enum
|
2024-02-29 12:29:30 -07:00 |
|
Peter Boyle
|
ee1b8bbdbd
|
Merge pull request #454 from edbennett/adjoint-broke
fix HMC for non-fundamental representations
|
2024-02-28 14:05:27 -05:00 |
|
Peter Boyle
|
3f1636637d
|
Merge pull request #453 from dbollweg/feature/sliceSum_gpu
Feature/slice sum gpu
|
2024-02-28 14:04:43 -05:00 |
|
Christoph Lehner
|
9f89486df5
|
remove unnecessary code path
|
2024-02-28 19:56:23 +01:00 |
|
Christoph Lehner
|
22b43b86cb
|
Make GPT test suite work with SYCL
|
2024-02-28 12:57:17 +01:00 |
|
dbollweg
|
3c9012676a
|
CUDA cub refuses to reduce vSpinColourMatrix; breaking it up into smaller parts, as already done for the HIP case.
|
2024-02-27 12:41:45 -05:00 |
|
Dennis Bollweg
|
6cd2d8fcd5
|
Replace cuda/hip memcpy with Grid functions
|
2024-02-26 09:55:07 -05:00 |
|
david clarke
|
b02d022993
|
fixed race condition (thx michael)
|
2024-02-23 17:14:28 -07:00 |
|
david clarke
|
94581e3c7a
|
accelerator_for is broken
|
2024-02-23 15:58:33 -07:00 |
|
david clarke
|
88b52cc045
|
Merge branch 'develop' into hisq_fat_links
|
2024-02-23 14:47:15 -07:00 |
|
dbollweg
|
0a816b5509
|
Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu
|
2024-02-22 21:43:06 -05:00 |
|
dbollweg
|
1c8b807c2e
|
free malloc'd memory
|
2024-02-22 21:42:44 -05:00 |
|
Christoph Lehner
|
66391f84f2
|
Merge branch 'feature/gpt' of ../Grid into develop
|
2024-02-21 19:05:00 +01:00 |
|
Ed Bennett
|
97f7a9ecb3
|
fix HMC for non-fundamental representations
|
2024-02-21 08:27:55 +00:00 |
|
Dennis Bollweg
|
15878f7613
|
sliceSumReduction_cub_large now also faster than CPU on Frontier
|
2024-02-16 13:55:21 -05:00 |
|
dbollweg
|
e0d5e3c6c7
|
Merge branch 'paboyle:develop' into feature/sliceSum_gpu
|
2024-02-16 13:16:37 -05:00 |
|