Peter Boyle
|
e652fc2825
|
Shared Memory test reenabled on every Grid object creation.
Const improvements in Accelerator.h
|
2025-04-07 11:51:40 -04:00 |
|
Peter Boyle
|
4f89f603ae
|
Changes to add back shared memory test on GPU
|
2025-04-04 18:40:15 -04:00 |
|
Peter Boyle
|
adc90d3a86
|
NVLINK GET/PUT on cuda aware mpi
|
2025-04-04 18:35:05 -04:00 |
|
Peter Boyle
|
ebbd015c5c
|
Deprecate shared memory copy as direction matters on nvidia GPU
|
2025-04-04 18:35:05 -04:00 |
|
Peter Boyle
|
4ab73b36b2
|
Deprecate shared memory copy as direction matters on GPU
|
2025-04-04 18:35:05 -04:00 |
|
Christoph Lehner
|
fe66c7ca30
|
verbosity
|
2025-03-13 12:49:36 +00:00 |
|
Christoph Lehner
|
d15a6c5933
|
Merge branch 'develop' of https://github.com/paboyle/Grid into feature-aurora
|
2025-03-13 07:29:55 +00:00 |
|
|
19f9378b98
|
Should work on Aurora now
|
2025-03-11 13:50:43 +00:00 |
|
Christoph Lehner
|
9ffd1ed4ce
|
Merged
|
2025-03-08 15:30:08 +00:00 |
|
|
1d22841811
|
Working on aurora, GPT issue turned up is fixed
|
2025-03-06 03:20:18 +00:00 |
|
|
438dfbdb83
|
Only throw if there is a pending list entry in CommsComplete
|
2025-02-25 16:57:27 +00:00 |
|
|
4788dd8e2e
|
More states in packet progression for GPU non aware MPI
|
2025-02-12 14:53:57 +00:00 |
|
|
1cc5f221f3
|
GET not put ordering is better as I know when I've got all MY data
|
2025-02-12 14:53:05 +00:00 |
|
|
93251bfba0
|
GET not put for better ordering in the downstream dependent kernels -- I
know when I'm done, so we can move a barrier / handshake between ranks
intranode to a point off critical path
|
2025-02-12 14:50:21 +00:00 |
|
|
0baaddbe98
|
Pipeline mode commit on Aurora. 5+ TF/s on 16^3x32 per tile at 384
nodes.
More concurrency/fine grained scheduling is possible.
|
2025-02-04 19:27:26 +00:00 |
|
Christoph Lehner
|
84cab5e6e7
|
no comms and log cleanup
|
2025-02-01 16:37:21 +01:00 |
|
|
c4fc972fec
|
Merge branch 'feature/deprecate-uvm' into develop
|
2025-01-31 16:32:36 +00:00 |
|
|
8cf809e231
|
Best results on Aurora so far
|
2025-01-31 16:14:45 +00:00 |
|
|
94019a922e
|
Significantly better performance on Aurora without using pipeline mode
|
2025-01-30 16:36:46 +00:00 |
|
|
d6b2727f86
|
Pipeline mode getting better -- 2 nodes @ 10TF/s per node on Aurora
|
2025-01-29 09:22:21 +00:00 |
|
|
74a4f43946
|
Optional host buffer bounce for non-CUDA-aware MPI
|
2025-01-28 15:22:46 +00:00 |
|
|
febfe4e77f
|
Make my own reduction a configure flag
|
2024-10-15 14:32:35 +00:00 |
|
|
2b5fdcbbc5
|
New software version
|
2024-10-10 21:59:02 +00:00 |
|
|
295127d456
|
Deterministic homebrew reduction
|
2024-10-10 21:58:26 +00:00 |
|
Peter Boyle
|
ee4046fe92
|
Added a dimension ordered column sum based reduction for scalar.
Removes dependence on MPI_Allreduce and allows for work around on
systems where this is bollox.
|
2024-09-27 09:26:03 -04:00 |
|
Peter Boyle
|
5c3ace7c3e
|
Merge branch 'develop' into feature/scidac-wp1
|
2024-04-30 05:26:06 -04:00 |
|
Peter Boyle
|
3ef2a41518
|
ifdef guard omitted
|
2024-03-26 14:50:32 +00:00 |
|
Peter Boyle
|
aa96f420c6
|
Accelerator-aware MPI guard on the Unix domain sockets
|
2024-03-26 14:41:25 +00:00 |
|
Peter Boyle
|
1f53458af8
|
Options to bounce through a host buffer if
--disable-accelerator-aware-mpi
|
2024-03-26 00:37:19 +00:00 |
|
Peter Boyle
|
434c3e7f1d
|
We have a choice of GET or PUT across NVlink
|
2024-03-25 14:32:44 +00:00 |
|
Peter Boyle
|
b6ad1bafc7
|
Normal memory SendToRecvFrom asynchronous for use in general stencil
code
|
2023-10-20 19:27:13 -04:00 |
|
Peter Boyle
|
3d437c5cc4
|
Making SYCL happy
|
2023-09-26 13:19:42 -07:00 |
|
Peter Boyle
|
519f795066
|
Header not liked by gcc on mac? puzzling
|
2023-05-22 10:21:12 -04:00 |
|
Peter Boyle
|
074627a5bd
|
Pass file descriptors through AF_UNIX for level_zero
|
2023-04-17 21:50:52 +00:00 |
|
Peter Boyle
|
bd891fb3f5
|
tests to compile
|
2023-04-12 18:32:44 -04:00 |
|
|
983b681d46
|
unused statement cleaning
|
2023-04-07 14:12:02 +01:00 |
|
Michael Marshall
|
5764d21161
|
Fixes for --enable-comms=none
|
2023-03-30 10:15:28 +01:00 |
|
Peter Boyle
|
a7e1aceeca
|
Compile fix on Nvidia
|
2023-03-29 14:36:50 -04:00 |
|
Peter Boyle
|
2376156fbc
|
Merge branch 'develop' into feature/dirichlet
|
2023-03-27 21:33:50 -07:00 |
|
Peter Boyle
|
dd3bbb8fa2
|
Move the synchronise out to the stencil, so there is one call instead of one call per packet
|
2023-03-27 17:27:45 -07:00 |
|
Peter Boyle
|
2fbcf13c46
|
SYCL fix
|
2023-03-27 14:25:14 -07:00 |
|
Peter Boyle
|
f36b87deb5
|
syscall fix
|
2023-03-14 12:09:00 -07:00 |
|
Peter Boyle
|
a11c12e2e7
|
Modifications for partial dirichlet BCs
|
2022-11-15 16:20:01 -05:00 |
|
Peter Boyle
|
204c283e16
|
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
|
2022-10-11 14:59:07 -04:00 |
|
Peter Boyle
|
551a5f8dc8
|
RRII gpu option
|
2022-10-11 14:44:55 -04:00 |
|
Peter Boyle
|
1177b8f661
|
Merge branch 'develop' into feature/dirichlet
|
2022-08-31 19:05:57 -04:00 |
|
Peter Boyle
|
9295ed8d20
|
Print full memory range
|
2022-08-31 16:59:51 -04:00 |
|
Peter Boyle
|
06d9ce1a02
|
Synch ranks on node here for GPU - GPU memcopy
|
2022-08-04 13:35:56 -04:00 |
|
Peter Boyle
|
8137cc7049
|
Always concurrent comms
|
2022-07-28 12:01:51 -04:00 |
|
Peter Boyle
|
2ab1af5754
|
Ensure no synchronize and not option dependent
|
2022-07-19 09:51:06 -07:00 |
|