1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-06-13 20:57:06 +01:00
Files
Grid/systems/Jupiter/benchmarks/dwf.4node.perf
Peter Boyle 9203126aa5 Scripts
2025-06-11 15:30:16 +02:00

286 lines
16 KiB
Plaintext
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

RANK 2 using NUMA 2 GPU 2 NIC mlx5_2:1
RANK 3 using NUMA 3 GPU 3 NIC mlx5_3:1
RANK 0 using NUMA 0 GPU 0 NIC mlx5_0:1
RANK 1 using NUMA 1 GPU 1 NIC mlx5_1:1
RANK 0 using NUMA 0 GPU 0 NIC mlx5_0:1
RANK 2 using NUMA 2 GPU 2 NIC mlx5_2:1
RANK 1 using NUMA 1 GPU 1 NIC mlx5_1:1
RANK 3 using NUMA 3 GPU 3 NIC mlx5_3:1
RANK 3 using NUMA 3 GPU 3 NIC mlx5_3:1
RANK 0 using NUMA 0 GPU 0 NIC mlx5_0:1
RANK 1 using NUMA 1 GPU 1 NIC mlx5_1:1
RANK 2 using NUMA 2 GPU 2 NIC mlx5_2:1
RANK 1 using NUMA 1 GPU 1 NIC mlx5_1:1
RANK 3 using NUMA 3 GPU 3 NIC mlx5_3:1
RANK 0 using NUMA 0 GPU 0 NIC mlx5_0:1
RANK 2 using NUMA 2 GPU 2 NIC mlx5_2:1
SLURM detected
AcceleratorCudaInit[0]: ========================
AcceleratorCudaInit[0]: Device Number : 0
AcceleratorCudaInit[0]: ========================
AcceleratorCudaInit[0]: Device identifier: NVIDIA GH200 120GB
AcceleratorCudaInit[0]: totalGlobalMem: 102005473280
AcceleratorCudaInit[0]: managedMemory: 1
AcceleratorCudaInit[0]: isMultiGpuBoard: 0
AcceleratorCudaInit[0]: warpSize: 32
AcceleratorCudaInit[0]: pciBusID: 1
AcceleratorCudaInit[0]: pciDeviceID: 0
AcceleratorCudaInit[0]: maxGridSize (2147483647,65535,65535)
AcceleratorCudaInit: using default device
AcceleratorCudaInit: assume user either uses
AcceleratorCudaInit: a) IBM jsrun, or
AcceleratorCudaInit: b) invokes through a wrapping script to set CUDA_VISIBLE_DEVICES, UCX_NET_DEVICES, and numa binding
AcceleratorCudaInit: Configure options --enable-setdevice=no
local rank 0 device 0 bus id: 0009:01:00.0
AcceleratorCudaInit: ================================================
SharedMemoryMpi: World communicator of size 16
SharedMemoryMpi: Node communicator of size 4
0SharedMemoryMpi: SharedMemoryMPI.cc acceleratorAllocDevice 2147483648bytes at 0x4002a0000000 - 40031fffffff for comms buffers
Setting up IPC
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
__|_ | | | | | | | | | | | | _|__
__|_ _|__
__|_ GGGG RRRR III DDDD _|__
__|_ G R R I D D _|__
__|_ G R R I D D _|__
__|_ G GG RRRR I D D _|__
__|_ G G R R I D D _|__
__|_ GGGG R R III DDDD _|__
__|_ _|__
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
| | | | | | | | | | | | | |
Copyright (C) 2015 Peter Boyle, Azusa Yamaguchi, Guido Cossu, Antonin Portelli and other authors
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Current Grid git commit hash=3737a24096282ea179607fc879814710860a0de6: (HEAD -> develop, origin/develop, origin/HEAD) clean
Grid : Message : ================================================
Grid : Message : MPI is initialised and logging filters activated
Grid : Message : ================================================
Grid : Message : This rank is running on host jpbo-012-11.jupiter.internal
Grid : Message : Requested 2147483648 byte stencil comms buffers
Grid : Message : MemoryManager Cache 81604378624 bytes
Grid : Message : MemoryManager::Init() setting up
Grid : Message : MemoryManager::Init() cache pool for recent host allocations: SMALL 8 LARGE 2 HUGE 0
Grid : Message : MemoryManager::Init() cache pool for recent device allocations: SMALL 16 LARGE 8 Huge 0
Grid : Message : MemoryManager::Init() cache pool for recent shared allocations: SMALL 16 LARGE 8 Huge 0
Grid : Message : MemoryManager::Init() Non unified: Caching accelerator data in dedicated memory
Grid : Message : MemoryManager::Init() Using cudaMalloc
Grid : Message : 0.834000 s : ++++++++++++++++++++++++++++++++++++++++++++++++
Grid : Message : 0.838000 s : Testing with full communication
Grid : Message : 0.839000 s : ++++++++++++++++++++++++++++++++++++++++++++++++
Grid : Message : 0.840000 s : Grid Layout
Grid : Message : 0.840000 s : Global lattice size : 64 64 64 64
Grid : Message : 0.846000 s : OpenMP threads : 4
Grid : Message : 0.846000 s : MPI tasks : 2 2 2 2
Grid : Message : 0.165970 s : Initialising 4d RNG
Grid : Message : 0.787270 s : Intialising parallel RNG with unique string 'The 4D RNG'
Grid : Message : 0.787340 s : Seed SHA256: 49db4542db694e3b1a74bf2592a8c1b83bfebbe18401693c2609a4c3af1
Grid : Message : 0.960410 s : Initialising 5d RNG
Grid : Message : 1.142344 s : Intialising parallel RNG with unique string 'The 5D RNG'
Grid : Message : 1.142352 s : Seed SHA256: b6316f2fac44ce14111f93e0296389330b077bfd0a7b359f781c58589f8a
local rank 2 device 0 bus id: 0029:01:00.0
local rank 3 device 0 bus id: 0039:01:00.0
local rank 1 device 0 bus id: 0019:01:00.0
Grid : Message : 44.657270 s : Drawing gauge field
Grid : Message : 55.247733 s : Random gauge initialised
Grid : Message : 55.247745 s : Applying BCs for Dirichlet Block5 [0 0 0 0 0]
Grid : Message : 55.247747 s : Applying BCs for Dirichlet Block4 [0 0 0 0]
Grid : Message : 55.253053 s : Setting up Cshift based reference
Grid : Message : 62.191747 s : *****************************************************************
Grid : Message : 62.191767 s : * Kernel options --dslash-generic, --dslash-unroll, --dslash-asm
Grid : Message : 62.191768 s : *****************************************************************
Grid : Message : 62.191769 s : *****************************************************************
Grid : Message : 62.191769 s : * Benchmarking DomainWallFermionR::Dhop
Grid : Message : 62.191769 s : * Vectorising space-time by 8
Grid : Message : 62.191770 s : * VComplex size is 64 B
Grid : Message : 62.191771 s : * Using Overlapped Comms/Compute
Grid : Message : 62.191771 s : * Using GENERIC Nc WilsonKernels
Grid : Message : 62.191772 s : *****************************************************************
Grid : Message : 62.857568 s : Called warmup
Grid : Message : 65.581790 s : Called Dw 300 times in 2200540 us
Grid : Message : 65.582120 s : mflop/s = 48306525
Grid : Message : 65.582140 s : mflop/s per rank = 3019157.81
Grid : Message : 65.582150 s : mflop/s per node = 12076631.3
Grid : Message : 65.637550 s : norm diff 5.80156793e-14 Line 306
Grid : Message : 75.122153 s : ----------------------------------------------------------------
Grid : Message : 75.122166 s : Compare to naive wilson implementation Dag to verify correctness
Grid : Message : 75.122167 s : ----------------------------------------------------------------
Grid : Message : 75.122167 s : Called DwDag
Grid : Message : 75.122167 s : norm dag result 4.12801829
Grid : Message : 75.123295 s : norm dag ref 4.12801829
Grid : Message : 75.125890 s : norm dag diff 3.42093991e-14 Line 377
Grid : Message : 75.188462 s : Calling Deo and Doe and //assert Deo+Doe == Dunprec
Grid : Message : 75.605683 s : src_e0.500004005
Grid : Message : 75.617824 s : src_o0.499996067
Grid : Message : 75.620089 s : *********************************************************
Grid : Message : 75.620091 s : * Benchmarking DomainWallFermion::DhopEO
Grid : Message : 75.620093 s : * Vectorising space-time by 8
Grid : Message : 75.620094 s : * Using Overlapped Comms/Compute
Grid : Message : 75.620095 s : * Using GENERIC Nc WilsonKernels
Grid : Message : 75.620096 s : *********************************************************
Grid : Message : 76.732272 s : Deo mflop/s = 48068252.4
Grid : Message : 76.732283 s : Deo mflop/s per rank 3004265.77
Grid : Message : 76.732285 s : Deo mflop/s per node 12017063.1
Grid : Message : 76.749317 s : r_e2.06443136
Grid : Message : 76.749652 s : r_o2.06378451
Grid : Message : 76.749955 s : res4.12821587
Grid : Message : 77.198827 s : norm diff 0
Grid : Message : 77.981760 s : norm diff even 0
Grid : Message : 78.455900 s : norm diff odd 0
Grid : Message : 78.539333 s : ++++++++++++++++++++++++++++++++++++++++++++++++
Grid : Message : 78.539337 s : Testing without internode communication
Grid : Message : 78.539338 s : ++++++++++++++++++++++++++++++++++++++++++++++++
Grid : Message : 78.539339 s : Grid Layout
Grid : Message : 78.539339 s : Global lattice size : 64 64 64 64
Grid : Message : 78.539347 s : OpenMP threads : 4
Grid : Message : 78.539348 s : MPI tasks : 2 2 2 2
Grid : Message : 78.798501 s : Initialising 4d RNG
Grid : Message : 78.862916 s : Intialising parallel RNG with unique string 'The 4D RNG'
Grid : Message : 78.862925 s : Seed SHA256: 49db4542db694e3b1a74bf2592a8c1b83bfebbe18401693c2609a4c3af1
Grid : Message : 78.879916 s : Initialising 5d RNG
Grid : Message : 79.941271 s : Intialising parallel RNG with unique string 'The 5D RNG'
Grid : Message : 79.941280 s : Seed SHA256: b6316f2fac44ce14111f93e0296389330b077bfd0a7b359f781c58589f8a
Grid : Message : 124.586264 s : Drawing gauge field
Grid : Message : 135.338090 s : Random gauge initialised
Grid : Message : 135.338102 s : Applying BCs for Dirichlet Block5 [0 0 0 0 0]
Grid : Message : 135.338103 s : Applying BCs for Dirichlet Block4 [0 0 0 0]
Grid : Message : 135.341266 s : Setting up Cshift based reference
Grid : Message : 142.604280 s : *****************************************************************
Grid : Message : 142.604450 s : * Kernel options --dslash-generic, --dslash-unroll, --dslash-asm
Grid : Message : 142.604460 s : *****************************************************************
Grid : Message : 142.604470 s : *****************************************************************
Grid : Message : 142.604480 s : * Benchmarking DomainWallFermionR::Dhop
Grid : Message : 142.604480 s : * Vectorising space-time by 8
Grid : Message : 142.604500 s : * VComplex size is 64 B
Grid : Message : 142.604510 s : * Using Overlapped Comms/Compute
Grid : Message : 142.604510 s : * Using GENERIC Nc WilsonKernels
Grid : Message : 142.604520 s : *****************************************************************
Grid : Message : 142.686034 s : Called warmup
Grid : Message : 144.868543 s : Called Dw 300 times in 2182483 us
Grid : Message : 144.868559 s : mflop/s = 48706194.1
Grid : Message : 144.868561 s : mflop/s per rank = 3044137.13
Grid : Message : 144.868562 s : mflop/s per node = 12176548.5
Grid : Message : 144.887595 s : norm diff 5.80156793e-14 Line 306
Grid : Message : 153.622978 s : ----------------------------------------------------------------
Grid : Message : 153.622994 s : Compare to naive wilson implementation Dag to verify correctness
Grid : Message : 153.622995 s : ----------------------------------------------------------------
Grid : Message : 153.622995 s : Called DwDag
Grid : Message : 153.622996 s : norm dag result 4.12801829
Grid : Message : 153.623604 s : norm dag ref 4.12801829
Grid : Message : 153.626098 s : norm dag diff 3.42093991e-14 Line 377
Grid : Message : 153.691426 s : Calling Deo and Doe and //assert Deo+Doe == Dunprec
Grid : Message : 154.148319 s : src_e0.500004005
Grid : Message : 154.151454 s : src_o0.499996067
Grid : Message : 154.153722 s : *********************************************************
Grid : Message : 154.153724 s : * Benchmarking DomainWallFermion::DhopEO
Grid : Message : 154.153725 s : * Vectorising space-time by 8
Grid : Message : 154.153726 s : * Using Overlapped Comms/Compute
Grid : Message : 154.153727 s : * Using GENERIC Nc WilsonKernels
Grid : Message : 154.153728 s : *********************************************************
Grid : Message : 155.200671 s : Deo mflop/s = 51121022.4
Grid : Message : 155.200682 s : Deo mflop/s per rank 3195063.9
Grid : Message : 155.200684 s : Deo mflop/s per node 12780255.6
Grid : Message : 155.217204 s : r_e2.06443136
Grid : Message : 155.217550 s : r_o2.06378451
Grid : Message : 155.217869 s : res4.12821587
Grid : Message : 155.673744 s : norm diff 0
Grid : Message : 156.463329 s : norm diff even 0
Grid : Message : 156.878866 s : norm diff odd 0
Grid : Message : 157.620761 s : ++++++++++++++++++++++++++++++++++++++++++++++++
Grid : Message : 157.620764 s : Testing without intranode communication
Grid : Message : 157.620765 s : ++++++++++++++++++++++++++++++++++++++++++++++++
Grid : Message : 157.620766 s : Grid Layout
Grid : Message : 157.620766 s : Global lattice size : 64 64 64 64
Grid : Message : 157.620773 s : OpenMP threads : 4
Grid : Message : 157.620774 s : MPI tasks : 2 2 2 2
Grid : Message : 157.671479 s : Initialising 4d RNG
Grid : Message : 157.738691 s : Intialising parallel RNG with unique string 'The 4D RNG'
Grid : Message : 157.738698 s : Seed SHA256: 49db4542db694e3b1a74bf2592a8c1b83bfebbe18401693c2609a4c3af1
Grid : Message : 157.755651 s : Initialising 5d RNG
Grid : Message : 158.848676 s : Intialising parallel RNG with unique string 'The 5D RNG'
Grid : Message : 158.848685 s : Seed SHA256: b6316f2fac44ce14111f93e0296389330b077bfd0a7b359f781c58589f8a
Grid : Message : 202.465158 s : Drawing gauge field
Grid : Message : 213.214546 s : Random gauge initialised
Grid : Message : 213.214561 s : Applying BCs for Dirichlet Block5 [0 0 0 0 0]
Grid : Message : 213.214563 s : Applying BCs for Dirichlet Block4 [0 0 0 0]
Grid : Message : 213.217711 s : Setting up Cshift based reference
Grid : Message : 219.662772 s : *****************************************************************
Grid : Message : 219.662786 s : * Kernel options --dslash-generic, --dslash-unroll, --dslash-asm
Grid : Message : 219.662787 s : *****************************************************************
Grid : Message : 219.662788 s : *****************************************************************
Grid : Message : 219.662788 s : * Benchmarking DomainWallFermionR::Dhop
Grid : Message : 219.662789 s : * Vectorising space-time by 8
Grid : Message : 219.662790 s : * VComplex size is 64 B
Grid : Message : 219.662791 s : * Using Overlapped Comms/Compute
Grid : Message : 219.662791 s : * Using GENERIC Nc WilsonKernels
Grid : Message : 219.662791 s : *****************************************************************
Grid : Message : 220.425592 s : Called warmup
Grid : Message : 222.536249 s : Called Dw 300 times in 2110597 us
Grid : Message : 222.536267 s : mflop/s = 50365105.5
Grid : Message : 222.536269 s : mflop/s per rank = 3147819.09
Grid : Message : 222.536270 s : mflop/s per node = 12591276.4
Grid : Message : 222.541053 s : norm diff 5.80156793e-14 Line 306
Grid : Message : 232.135901 s : ----------------------------------------------------------------
Grid : Message : 232.135915 s : Compare to naive wilson implementation Dag to verify correctness
Grid : Message : 232.135916 s : ----------------------------------------------------------------
Grid : Message : 232.135917 s : Called DwDag
Grid : Message : 232.135918 s : norm dag result 4.12801829
Grid : Message : 232.151938 s : norm dag ref 4.12801829
Grid : Message : 232.154451 s : norm dag diff 3.42093991e-14 Line 377
Grid : Message : 232.216117 s : Calling Deo and Doe and //assert Deo+Doe == Dunprec
Grid : Message : 232.630529 s : src_e0.500004005
Grid : Message : 232.643197 s : src_o0.499996067
Grid : Message : 232.645527 s : *********************************************************
Grid : Message : 232.645529 s : * Benchmarking DomainWallFermion::DhopEO
Grid : Message : 232.645532 s : * Vectorising space-time by 8
Grid : Message : 232.645533 s : * Using Overlapped Comms/Compute
Grid : Message : 232.645534 s : * Using GENERIC Nc WilsonKernels
Grid : Message : 232.645535 s : *********************************************************
Grid : Message : 233.774184 s : Deo mflop/s = 47432091.9
Grid : Message : 233.774194 s : Deo mflop/s per rank 2964505.74
Grid : Message : 233.774196 s : Deo mflop/s per node 11858023
Grid : Message : 233.791552 s : r_e2.06443136
Grid : Message : 233.791899 s : r_o2.06378451
Grid : Message : 233.792204 s : res4.12821587
Grid : Message : 234.230783 s : norm diff 0
Grid : Message : 235.162780 s : norm diff even 0
Grid : Message : 235.291950 s : norm diff odd 0
Grid : Message : 235.765411 s : *******************************************
Grid : Message : 235.765424 s : ******* Grid Finalize ******
Grid : Message : 235.765425 s : *******************************************