
QUDA benchmarks

This folder contains benchmarks for the QUDA library.

  • Benchmark_Quda: This benchmark measures the floating-point performance of fermion matrices (Wilson and DWF), as well as memory bandwidth (using a simple axpy operation). Measurements are performed for a fixed range of problem sizes.

Building

After setting up your compilation environment (Tursa: source /home/dp207/dp207/shared/env/production/env-{base,gpu}.sh):

./build-quda.sh <env_dir>          # build Quda
./build-benchmark.sh <env_dir>     # build benchmark

where <env_dir> is an arbitrary directory in which all build products will be stored.

Running the Benchmark

The benchmark should be run as

mpirun -np <ranks> <env_dir>/prefix/qudabench/Benchmark_Quda

where <ranks> is the total number of GPUs to use. On Tursa this is 4 times the number of nodes.
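As a minimal sketch of the rank arithmetic above, the snippet below assembles the launch command for a hypothetical 2-node Tursa job. The node count and the `quda-env` directory name are illustrative assumptions, not values taken from the build scripts. The command is only printed, so it can be checked before it is run.

```shell
# Hypothetical job parameters: adjust to your own allocation.
nodes=2                        # number of nodes in the job (assumption)
gpus_per_node=4                # Tursa: 4 GPUs per node, one rank per GPU
env_dir="${PWD}/quda-env"      # hypothetical <env_dir> passed to the build scripts

# One MPI rank per GPU.
ranks=$((nodes * gpus_per_node))

# Assemble the launch command and print it for inspection.
cmd="mpirun -np ${ranks} ${env_dir}/prefix/qudabench/Benchmark_Quda"
echo "${cmd}"
```

Once the printed command looks right, it can be pasted into a job script or run directly.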

Note:

  • on Tursa, the wrapper.sh script that is typically used with Grid is not necessary.
  • due to QUDA's automatic tuning, the benchmark might take significantly longer to run than Benchmark_Grid (even though it does fewer things).
    • setting QUDA_ENABLE_TUNING=0 disables all tuning, which degrades performance severely. Tuning is enabled by default.
    • setting QUDA_RESOURCE_PATH=<some folder> enables QUDA to save and reuse optimal tuning parameters, making repeated runs much faster.
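The tuning settings above could be applied in a job script along the following lines. This is a sketch: the `quda-tunecache` directory name is a hypothetical choice, and creating the directory beforehand is a precaution so that the path exists when the benchmark starts.

```shell
# Hypothetical location for the tuning cache; any writable directory works.
tune_dir="${PWD}/quda-tunecache"
mkdir -p "${tune_dir}"

# Let QUDA save tuning parameters here and reuse them on later runs.
export QUDA_RESOURCE_PATH="${tune_dir}"

# Tuning is on by default; it is set explicitly here only for clarity.
export QUDA_ENABLE_TUNING=1

echo "QUDA tuning cache: ${QUDA_RESOURCE_PATH}"
```

The first run with an empty cache still pays the full tuning cost. Later runs that use the same directory should start much faster.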