mirror of https://github.com/paboyle/Grid.git synced 2026-05-26 12:04:17 +01:00
Commit Graph

105 Commits

Author SHA1 Message Date
paboyle 4788dd8e2e More states in packet progression for GPU non aware MPI 2025-02-12 14:53:57 +00:00
paboyle 1cc5f221f3 GET not put ordering is better as I know when I've got all MY data 2025-02-12 14:53:05 +00:00
paboyle 93251bfba0 GET not put for better ordering in the downstream dependent kernels -- I know when I'm done, so we can move a barrier / handshake between ranks intranode to a point off critical path 2025-02-12 14:50:21 +00:00
paboyle 0baaddbe98 Pipeline mode commit on Aurora. 5+ TF/s on 16^3x32 per tile at 384 nodes. More concurrency/fine grained scheduling is possible. 2025-02-04 19:27:26 +00:00
paboyle c4fc972fec Merge branch 'feature/deprecate-uvm' into develop 2025-01-31 16:32:36 +00:00
paboyle 8cf809e231 Best results on Aurora so far 2025-01-31 16:14:45 +00:00
paboyle 94019a922e Significantly better performance on Aurora without using pipeline mode 2025-01-30 16:36:46 +00:00
paboyle d6b2727f86 Pipeline mode getting better -- 2 nodes @ 10TF/s per node on Aurora 2025-01-29 09:22:21 +00:00
paboyle 74a4f43946 Optional host buffer bounce for no CUDA aware MPI 2025-01-28 15:22:46 +00:00
paboyle febfe4e77f Make my own reduction a configure flag 2024-10-15 14:32:35 +00:00
paboyle 2b5fdcbbc5 New software version 2024-10-10 21:59:02 +00:00
paboyle 295127d456 Deterministic homebrew reduction 2024-10-10 21:58:26 +00:00
Peter Boyle ee4046fe92 Added a dimension ordered column sum based reduction for scalar. Removes dependence on MPI_Allreduce and allows for work around on systems where this is bollox. 2024-09-27 09:26:03 -04:00
Peter Boyle 5c3ace7c3e Merge branch 'develop' into feature/scidac-wp1 2024-04-30 05:26:06 -04:00
Peter Boyle 3ef2a41518 ifdef guard omitted 2024-03-26 14:50:32 +00:00
Peter Boyle aa96f420c6 Accelerator aware MPI guard on the Unix domain sockets 2024-03-26 14:41:25 +00:00
Peter Boyle 1f53458af8 Options to bounce through a host buffer if --disable-accelerator-aware-mpi 2024-03-26 00:37:19 +00:00
Peter Boyle 434c3e7f1d We have a choice of GET or PUT across NVlink 2024-03-25 14:32:44 +00:00
Peter Boyle b6ad1bafc7 Normal memory SendToRecvFrom asynchronous for use in general stencil code 2023-10-20 19:27:13 -04:00
Peter Boyle 3d437c5cc4 Making SYCL happy 2023-09-26 13:19:42 -07:00
Peter Boyle 519f795066 Header not liked by gcc on mac? puzzling 2023-05-22 10:21:12 -04:00
Peter Boyle 074627a5bd Pass file descriptors through AF_UNIX for level_zero 2023-04-17 21:50:52 +00:00
Peter Boyle bd891fb3f5 tests to compile 2023-04-12 18:32:44 -04:00
portelli 983b681d46 unused statement cleaning 2023-04-07 14:12:02 +01:00
Michael Marshall 5764d21161 Fixes for --enable-comms=none 2023-03-30 10:15:28 +01:00
Peter Boyle a7e1aceeca Compile fix on Nvidia 2023-03-29 14:36:50 -04:00
Peter Boyle 2376156fbc Merge branch 'develop' into feature/dirichlet 2023-03-27 21:33:50 -07:00
Peter Boyle dd3bbb8fa2 Move the synchronise out to the stencil so one call instead of one call per packet 2023-03-27 17:27:45 -07:00
Peter Boyle 2fbcf13c46 SYCL fix 2023-03-27 14:25:14 -07:00
Peter Boyle f36b87deb5 syscall fix 2023-03-14 12:09:00 -07:00
Peter Boyle a11c12e2e7 Modifications for partial dirichlet BCs 2022-11-15 16:20:01 -05:00
Peter Boyle 204c283e16 Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet 2022-10-11 14:59:07 -04:00
Peter Boyle 551a5f8dc8 RRII gpu option 2022-10-11 14:44:55 -04:00
Peter Boyle 1177b8f661 Merge branch 'develop' into feature/dirichlet 2022-08-31 19:05:57 -04:00
Peter Boyle 9295ed8d20 Print full memory range 2022-08-31 16:59:51 -04:00
Peter Boyle 06d9ce1a02 Synch ranks on node here for GPU - GPU memcopy 2022-08-04 13:35:56 -04:00
Peter Boyle 8137cc7049 Always concurrent comms 2022-07-28 12:01:51 -04:00
Peter Boyle 2ab1af5754 Ensure no synchronize and not option dependent 2022-07-19 09:51:06 -07:00
Peter Boyle f7217d12d2 World barrier for clock synch 2022-07-11 13:45:31 -04:00
Peter Boyle 7eb29cf529 MPI fix 2022-05-28 15:51:34 -07:00
Peter Boyle 3f31afa4fc Clean up verbose 2022-05-24 18:18:51 -07:00
Peter Boyle aab3bcb46f Dirichlet first cut - wrong answers on dagger multiply. Struggling to get a compute node so changing systems 2022-02-22 19:58:33 +00:00
Peter Boyle 135808dcfa Less verbose 2021-12-07 16:24:24 -05:00
Peter Boyle 2bf3b4d576 Update to reduce memory footprint in benchmark test 2021-12-07 09:02:02 -08:00
Peter Boyle 16c2a99965 Overlap cudamemcpy - didn't set up stream right 2021-10-11 13:31:26 -07:00
Peter Boyle 3206f69478 SYCL happy 2021-09-21 18:01:35 -07:00
Peter Boyle 8eb1232683 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2021-09-21 09:25:07 -07:00
Peter Boyle c6ce3ad03b Some properties 2021-09-21 09:20:21 -07:00
Peter Boyle ca9816bfbb Typo 2021-09-21 04:12:04 +02:00
Peter Boyle 109507888b Option to force use of MPI over Nvlink 2021-09-21 00:53:25 +02:00