|
|
4788dd8e2e
|
More states in packet progression for non-GPU-aware MPI
|
2025-02-12 14:53:57 +00:00 |
|
|
|
1cc5f221f3
|
GET rather than PUT ordering is better, as I know when I've got all MY data
|
2025-02-12 14:53:05 +00:00 |
|
|
|
93251bfba0
|
GET rather than PUT for better ordering in the downstream dependent kernels -- I
know when I'm done, so we can move the intranode barrier / handshake between
ranks to a point off the critical path
|
2025-02-12 14:50:21 +00:00 |
|
|
|
0baaddbe98
|
Pipeline mode commit on Aurora. 5+ TF/s on 16^3x32 per tile at 384
nodes.
More concurrency / finer-grained scheduling is possible.
|
2025-02-04 19:27:26 +00:00 |
|
|
|
c4fc972fec
|
Merge branch 'feature/deprecate-uvm' into develop
|
2025-01-31 16:32:36 +00:00 |
|
|
|
8cf809e231
|
Best results on Aurora so far
|
2025-01-31 16:14:45 +00:00 |
|
|
|
94019a922e
|
Significantly better performance on Aurora without using pipeline mode
|
2025-01-30 16:36:46 +00:00 |
|
|
|
d6b2727f86
|
Pipeline mode getting better -- 2 nodes @ 10 TF/s per node on Aurora
|
2025-01-29 09:22:21 +00:00 |
|
|
|
74a4f43946
|
Optional host buffer bounce when MPI is not CUDA-aware
|
2025-01-28 15:22:46 +00:00 |
|
|
|
febfe4e77f
|
Make my own reduction a configure flag
|
2024-10-15 14:32:35 +00:00 |
|
|
|
2b5fdcbbc5
|
New software version
|
2024-10-10 21:59:02 +00:00 |
|
|
|
295127d456
|
Deterministic homebrew reduction
|
2024-10-10 21:58:26 +00:00 |
|
Peter Boyle
|
ee4046fe92
|
Added a dimension-ordered, column-sum-based reduction for scalars.
Removes the dependence on MPI_Allreduce and allows a workaround on
systems where this is bollox.
|
2024-09-27 09:26:03 -04:00 |
|
Peter Boyle
|
5c3ace7c3e
|
Merge branch 'develop' into feature/scidac-wp1
|
2024-04-30 05:26:06 -04:00 |
|
Peter Boyle
|
3ef2a41518
|
ifdef guard omitted
|
2024-03-26 14:50:32 +00:00 |
|
Peter Boyle
|
aa96f420c6
|
Accelerator-aware MPI guard on the Unix domain sockets
|
2024-03-26 14:41:25 +00:00 |
|
Peter Boyle
|
1f53458af8
|
Options to bounce through a host buffer if
--disable-accelerator-aware-mpi
|
2024-03-26 00:37:19 +00:00 |
|
Peter Boyle
|
434c3e7f1d
|
We have a choice of GET or PUT across NVLink
|
2024-03-25 14:32:44 +00:00 |
|
Peter Boyle
|
b6ad1bafc7
|
Asynchronous SendToRecvFrom on normal memory, for use in general stencil
code
|
2023-10-20 19:27:13 -04:00 |
|
Peter Boyle
|
3d437c5cc4
|
Making SYCL happy
|
2023-09-26 13:19:42 -07:00 |
|
Peter Boyle
|
519f795066
|
Header not liked by gcc on mac? puzzling
|
2023-05-22 10:21:12 -04:00 |
|
Peter Boyle
|
074627a5bd
|
Pass file descriptors through AF_UNIX for level_zero
|
2023-04-17 21:50:52 +00:00 |
|
Peter Boyle
|
bd891fb3f5
|
tests to compile
|
2023-04-12 18:32:44 -04:00 |
|
|
|
983b681d46
|
unused statement cleaning
|
2023-04-07 14:12:02 +01:00 |
|
Michael Marshall
|
5764d21161
|
Fixes for --enable-comms=none
|
2023-03-30 10:15:28 +01:00 |
|
Peter Boyle
|
a7e1aceeca
|
Compile fix on Nvidia
|
2023-03-29 14:36:50 -04:00 |
|
Peter Boyle
|
2376156fbc
|
Merge branch 'develop' into feature/dirichlet
|
2023-03-27 21:33:50 -07:00 |
|
Peter Boyle
|
dd3bbb8fa2
|
Move the synchronise out to the stencil, so there is one call instead of one call per packet
|
2023-03-27 17:27:45 -07:00 |
|
Peter Boyle
|
2fbcf13c46
|
SYCL fix
|
2023-03-27 14:25:14 -07:00 |
|
Peter Boyle
|
f36b87deb5
|
syscall fix
|
2023-03-14 12:09:00 -07:00 |
|
Peter Boyle
|
a11c12e2e7
|
Modifications for partial Dirichlet BCs
|
2022-11-15 16:20:01 -05:00 |
|
Peter Boyle
|
204c283e16
|
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
|
2022-10-11 14:59:07 -04:00 |
|
Peter Boyle
|
551a5f8dc8
|
RRII gpu option
|
2022-10-11 14:44:55 -04:00 |
|
Peter Boyle
|
1177b8f661
|
Merge branch 'develop' into feature/dirichlet
|
2022-08-31 19:05:57 -04:00 |
|
Peter Boyle
|
9295ed8d20
|
Print full memory range
|
2022-08-31 16:59:51 -04:00 |
|
Peter Boyle
|
06d9ce1a02
|
Sync ranks on node here for GPU-to-GPU memcopy
|
2022-08-04 13:35:56 -04:00 |
|
Peter Boyle
|
8137cc7049
|
Always concurrent comms
|
2022-07-28 12:01:51 -04:00 |
|
Peter Boyle
|
2ab1af5754
|
Ensure no synchronize, and not option dependent
|
2022-07-19 09:51:06 -07:00 |
|
Peter Boyle
|
f7217d12d2
|
World barrier for clock synch
|
2022-07-11 13:45:31 -04:00 |
|
Peter Boyle
|
7eb29cf529
|
MPI fix
|
2022-05-28 15:51:34 -07:00 |
|
Peter Boyle
|
3f31afa4fc
|
Clean up verbose
|
2022-05-24 18:18:51 -07:00 |
|
Peter Boyle
|
aab3bcb46f
|
Dirichlet first cut - wrong answers on dagger multiply.
Struggling to get a compute node so changing systems
|
2022-02-22 19:58:33 +00:00 |
|
Peter Boyle
|
135808dcfa
|
Less verbose
|
2021-12-07 16:24:24 -05:00 |
|
Peter Boyle
|
2bf3b4d576
|
Update to reduce memory footprint in benchmark test
|
2021-12-07 09:02:02 -08:00 |
|
Peter Boyle
|
16c2a99965
|
Overlap cudaMemcpy - didn't set up stream right
|
2021-10-11 13:31:26 -07:00 |
|
Peter Boyle
|
3206f69478
|
SYCL happy
|
2021-09-21 18:01:35 -07:00 |
|
Peter Boyle
|
8eb1232683
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2021-09-21 09:25:07 -07:00 |
|
Peter Boyle
|
c6ce3ad03b
|
Some properties
|
2021-09-21 09:20:21 -07:00 |
|
Peter Boyle
|
ca9816bfbb
|
Typo
|
2021-09-21 04:12:04 +02:00 |
|
Peter Boyle
|
109507888b
|
Option to force use of MPI over NVLink
|
2021-09-21 00:53:25 +02:00 |
|