|
a7a16df9d0
|
Use GET rather than PUT: it gives a kinder barrier sequence for NVLink-type
access, because once my GET is done I can use the data without a barrier.
Moves a barrier to a nicer place, overlapped with the DtoH DMA
|
2025-02-12 14:59:28 +00:00 |
|
|
382e0abefd
|
Was issuing a double fence -- the gather also fences
|
2025-02-12 14:57:28 +00:00 |
|
|
6fdefe5b90
|
Barrier sequencing is different when doing "GET" rather than "PUT".
This gives somewhat better timing for the barriers
|
2025-02-12 14:55:20 +00:00 |
|
|
4788dd8e2e
|
More states in packet progression for non-GPU-aware MPI
|
2025-02-12 14:53:57 +00:00 |
|
|
1cc5f221f3
|
GET rather than PUT: ordering is better, as I know when I've got all MY data
|
2025-02-12 14:53:05 +00:00 |
|
|
93251bfba0
|
GET rather than PUT for better ordering in the downstream dependent kernels -- I
know when I'm done, so we can move an intranode barrier / handshake between
ranks to a point off the critical path
|
2025-02-12 14:50:21 +00:00 |
|
|
18b79508b8
|
New line better for pretty print
|
2025-02-12 14:49:48 +00:00 |
|
|
4de5ed1613
|
Remove vector view. The std::vector does not inform the memory manager of
deletion, so a stale entry could be left. The vector view is not used and
should not be.
|
2025-02-12 14:48:46 +00:00 |
|
|
0baaddbe98
|
Pipeline mode commit on Aurora. 5+ TF/s on 16^3x32 per tile at 384
nodes.
More concurrency / finer-grained scheduling is possible.
|
2025-02-04 19:27:26 +00:00 |
|
|
8729c46169
|
add clover energy density measurement to default WilsonFlow measurements
|
2025-02-03 14:27:55 +00:00 |
|
|
09f81fe7c3
|
don't force energy density measurement to be every wilson flow iteration
|
2025-02-03 14:27:45 +00:00 |
|
|
1876e5b7c0
|
correct tests/smearing/WilsonFlow to use non-adaptive flow and use correct interface
|
2025-02-03 14:27:29 +00:00 |
|
|
355ec76257
|
Merge pull request #18 from UCL-ARC/bugfix/nvtx
Bugfix/nvtx
|
2025-02-03 11:05:42 +00:00 |
|
|
b50fb34e71
|
Perf on Aurora
|
2025-02-01 18:39:34 +00:00 |
|
|
de84d730ff
|
Fastest run config on Aurora to date
|
2025-02-01 18:08:40 +00:00 |
|
|
c74d11e3d7
|
PVdagM MG
|
2025-02-01 11:04:13 -05:00 |
|
|
84cab5e6e7
|
no comms and log cleanup
|
2025-02-01 16:37:21 +01:00 |
|
|
c4fc972fec
|
Merge branch 'feature/deprecate-uvm' into develop
|
2025-01-31 16:32:36 +00:00 |
|
|
8cf809e231
|
Best results on Aurora so far
|
2025-01-31 16:14:45 +00:00 |
|
|
94019a922e
|
Significantly better performance on Aurora without using pipeline mode
|
2025-01-30 16:36:46 +00:00 |
|
|
4f17c8d081
|
Merge branch 'paboyle:develop' into bugfix/nvtx
|
2025-01-29 13:10:12 +00:00 |
|
|
aaab753982
|
Reverting to older version of nvtx for Tursa support
|
2025-01-29 12:57:38 +00:00 |
|
|
d6b2727f86
|
Pipeline mode getting better -- 2 nodes @ 10TF/s per node on Aurora
|
2025-01-29 09:22:21 +00:00 |
|
|
74a4f43946
|
Optional host buffer bounce for non-CUDA-aware MPI
|
2025-01-28 15:22:46 +00:00 |
|
|
1caf8b0f86
|
Rename
|
2025-01-28 15:22:37 +00:00 |
|
|
570b72a47b
|
Bugfix. Sorry!
|
2025-01-21 15:37:39 -05:00 |
|
|
a5798a89ed
|
Merge branch 'develop' into specflow
|
2025-01-21 12:13:24 -05:00 |
|
|
3f3661a86f
|
Heading towards PVdagM multigrid
|
2025-01-17 14:33:35 +00:00 |
|
|
f7e2f9a401
|
Checking in spectral flow and DWF/Mobius kernel eigenvalue measurement
|
2025-01-16 20:47:33 +00:00 |
|
|
2848a9b558
|
DWF kernel Lanczos working(?)
|
2025-01-16 01:29:56 +00:00 |
|
|
d4868991af
|
Fixed wrong lib for NVTX in configure.ac and updated to nvtx3
|
2025-01-10 14:53:19 +00:00 |
|
|
e99d42404e
|
Removing the regression test files that were also in this branch for a clean PR
|
2024-12-16 16:31:22 +00:00 |
|
|
3ba019c747
|
Cleaning up and aligning variable naming between action deriv versions
|
2024-12-03 15:23:00 +00:00 |
|
|
47429218bb
|
patched version + modifications to deriv -> staple in qcd/gauge
|
2024-11-27 16:29:22 +00:00 |
|
|
8fe429346f
|
Dslash testing for reproduce
|
2024-11-11 23:11:11 +00:00 |
|
|
5a4f9bf2e3
|
Force the ROCM version
|
2024-10-29 18:12:31 -04:00 |
|
|
b91fc1b6b4
|
Merge branch 'feature/boosted' into feature/deprecate-uvm
Fixed boosted free field test
|
2024-10-28 16:53:09 -04:00 |
|
|
eafc150034
|
Test fft asserts
|
2024-10-23 16:46:26 -04:00 |
|
|
2877f1a268
|
Verbose reduce
|
2024-10-23 15:14:16 -04:00 |
|
|
1e893af775
|
GPU happy
|
2024-10-23 14:52:15 -04:00 |
|
|
d9f430a575
|
Happy GPU
|
2024-10-23 14:51:16 -04:00 |
|
|
63abe87f36
|
Memory manager verbose improvements that were useful to track an error
|
2024-10-23 14:49:13 -04:00 |
|
|
368d649c8a
|
feature/deprecate-uvm happier -- preallocate device resident neighbour table
|
2024-10-23 14:47:55 -04:00 |
|
|
5603464f39
|
Fix in partial fraction import/export physical, and
make the GPU happier on the deprecate-uvm branch -- don't use static vectors, make them class members
|
2024-10-23 14:45:58 -04:00 |
|
|
655c79f39e
|
Suppress warning on partial override
|
2024-10-23 14:44:41 -04:00 |
|
|
565b231c03
|
Nvcc happy
|
2024-10-23 14:44:17 -04:00 |
|
|
62a9f180fa
|
NVCC happy
|
2024-10-23 14:44:04 -04:00 |
|
|
5ae77876a8
|
Meson field and Aslash field on GPU; some compiler warnings removed
|
2024-10-18 19:08:06 -04:00 |
|
|
4ed2c2c74f
|
Config command
|
2024-10-18 13:58:33 -04:00 |
|
|
955da582b6
|
Working on NVCC
|
2024-10-18 13:58:03 -04:00 |
|