lattice-benchmarks/Quda/Readme.md

# QUDA benchmarks

This folder contains benchmarks for the [QUDA](https://github.com/lattice/quda) library.

- `Benchmark_Quda`: This benchmark measure floating point performances of fermion
matrices (Wilson and DWF), as well as memory bandwidth (using a simple `axpy` operation). Measurements are
performed for a fixed range of problem sizes.

## Building
After setting up your compilation environment (Tursa: `source /home/dp207/dp207/shared/env/production/env-{base,gpu}.sh`):
```bash
./build-quda.sh <env_dir>          # build Quda
./build-benchmark.sh <env_dir>     # build benchmark
```
where `<env_dir>` is an arbitrary directory where every product will be stored.

## Running the Benchmark

The benchmark should be run as
```bash
mpirun -np <ranks> <env_dir>/prefix/qudabench/Benchmark_Quda
```
where `<ranks>` is the total number of GPU's to use. On Tursa this is 4 times the number of nodes.

Note:
- on Tursa, the `wrapper.sh` script that is typically used with Grid is not necessary.
- due to Qudas automatic tuning, the benchmark might take significantly longer to run than `Benchmark_Grid` (even though it does fewer things).
  - setting `QUDA_ENABLE_TUNING=0` disables all tuning (degrades performance severely). By default, it is turned on.
  - setting `QUDA_RESOURCE_PATH=<some folder>` enables Quda to save and reuse optimal tuning parameters, making repeated runs much faster