mirror of https://github.com/paboyle/Grid.git synced 2026-05-27 20:44:16 +01:00
Commit Graph

58 Commits

Author SHA1 Message Date
paboyle e5fa3d887f Compile on CUDA 2025-08-21 22:10:27 +01:00
Peter Boyle fe0db53842 FFT offload to GPU and MUCH faster comms. 40x speed up on Frontier 2025-08-21 16:45:38 -04:00
paboyle 9e6a4a4737 Assertion updates to macros (mostly) with backtrace. Wilson flow to include options for DBW2, Iwasaki, Symanzik. View logging for data assurance 2025-08-07 15:48:38 +00:00
paboyle 41f344bbd3 Merge with Christoph GPT checksum debug 2025-07-15 03:06:09 +00:00
paboyle bfae14d035 More flight logging 2025-06-27 06:07:34 +00:00
Peter Boyle 6ec5cee368 Preparing for compressed comms 2025-06-17 16:38:10 +02:00
paboyle d418f78352 Making running on Aurora more debuggable 2025-05-23 20:58:16 +00:00
Peter Boyle adc90d3a86 NVLINK GET/PUT on cuda aware mpi 2025-04-04 18:35:05 -04:00
paboyle 19f9378b98 Should work on Aurora now 2025-03-11 13:50:43 +00:00
paboyle 1d22841811 Working on aurora, GPT issue turned up is fixed 2025-03-06 03:20:18 +00:00
paboyle 1cc5f221f3 GET not put ordering is better as I know when I've got all MY data 2025-02-12 14:53:05 +00:00
paboyle 0baaddbe98 Pipeline mode commit on Aurora. 5+ TF/s on 16^3x32 per tile at 384 nodes. More concurrency/fine grained scheduling is possible. 2025-02-04 19:27:26 +00:00
paboyle 8cf809e231 Best results on Aurora so far 2025-01-31 16:14:45 +00:00
paboyle 94019a922e Significantly better performance on Aurora without using pipeline mode 2025-01-30 16:36:46 +00:00
paboyle d6b2727f86 Pipeline mode getting better -- 2 nodes @ 10TF/s per node on Aurora 2025-01-29 09:22:21 +00:00
paboyle febfe4e77f Make my own reduction a configure flag 2024-10-15 14:32:35 +00:00
Peter Boyle 5c3ace7c3e Merge branch 'develop' into feature/scidac-wp1 2024-04-30 05:26:06 -04:00
Peter Boyle 434c3e7f1d We have a choice of GET or PUT across NVlink 2024-03-25 14:32:44 +00:00
Peter Boyle b6ad1bafc7 Normal memory SendToRecvFrom asynchronous for use in general stencil code 2023-10-20 19:27:13 -04:00
Peter Boyle 2376156fbc Merge branch 'develop' into feature/dirichlet 2023-03-27 21:33:50 -07:00
Peter Boyle dd3bbb8fa2 Move the synchronise out to the stencil so one call instead of one call per packet 2023-03-27 17:27:45 -07:00
Peter Boyle a11c12e2e7 Modifications for partial dirichlet BCs 2022-11-15 16:20:01 -05:00
Peter Boyle 1177b8f661 Merge branch 'develop' into feature/dirichlet 2022-08-31 19:05:57 -04:00
Peter Boyle 06d9ce1a02 Synch ranks on node here for GPU - GPU memcopy 2022-08-04 13:35:56 -04:00
Peter Boyle 8137cc7049 Always concurrent comms 2022-07-28 12:01:51 -04:00
Peter Boyle 2ab1af5754 Ensure no synchronize and not option dependent 2022-07-19 09:51:06 -07:00
Peter Boyle f7217d12d2 World barrier for clock synch 2022-07-11 13:45:31 -04:00
Peter Boyle 7eb29cf529 MPI fix 2022-05-28 15:51:34 -07:00
Peter Boyle 3f31afa4fc Clean up verbose 2022-05-24 18:18:51 -07:00
Peter Boyle aab3bcb46f Dirichlet first cut - wrong answers on dagger multiply. Struggling to get a compute node so changing systems 2022-02-22 19:58:33 +00:00
Peter Boyle 135808dcfa Less verbose 2021-12-07 16:24:24 -05:00
Peter Boyle 2bf3b4d576 Update to reduce memory footprint in benchmark test 2021-12-07 09:02:02 -08:00
Peter Boyle 16c2a99965 Overlap cudamemcpy - didn't set up stream right 2021-10-11 13:31:26 -07:00
Peter Boyle c0d56a1c04 Perlmutter tune up 2021-09-22 06:02:34 -07:00
Peter Boyle ca9816bfbb Typo 2021-09-21 04:12:04 +02:00
Peter Boyle 109507888b Option to force use of MPI over Nvlink 2021-09-21 00:53:25 +02:00
Peter Boyle 8195890640 Force MPI over NVLINK 2021-09-14 05:00:17 +01:00
Peter Boyle cd99edcc5f maxLocalNorm2() 2021-02-04 18:25:49 -05:00
Peter Boyle d05ce01809 TOFU behaviour now optional THREAD_MULTIPLE or THREAD_SERIALIZED 2020-11-13 03:52:19 +01:00
Peter Boyle a8309638d4 UVM check in MPI calls 2020-09-03 20:29:26 -04:00
Peter Boyle 0c3095e173 Comms buffers to device memory 2020-09-03 15:45:35 -04:00
Christoph Lehner 197612bc7a fast cpu basisRotate and other small cleanups 2020-07-30 07:08:54 -04:00
nmeyer-ur 8726e94ea7 merge upstream develop 2020-07-07 20:26:47 +02:00
nmeyer-ur 1635c263ee disable TOFU by default 2020-06-30 19:27:08 +02:00
nmeyer-ur 465856331a switch back to serialized; wrong results on single too 2020-06-15 15:39:39 +02:00
nmeyer-ur cc958aa9ed switch back to standard MPI_init due to wrong results in Benchmark_wilson using comms-overlap 2020-06-15 14:21:38 +02:00
nmeyer-ur 4fedd8d29f switch to MPI_THREAD_SERIALIZED instead of SINGLE 2020-05-27 14:08:34 +02:00
nmeyer-ur 9a86059761 symmetrize VLA and fixed size build messages 2020-05-20 20:05:42 +02:00
nmeyer-ur b780b7b7a0 guard prevents multiple TOFU messages 2020-05-20 19:20:59 +02:00
nmeyer-ur fc2e9850d3 temporarily enable TOFU by default when using A64FX or A64FXFIXEDSIZE 2020-05-11 13:25:02 +02:00