
Grid energy-efficiency benchmarks on A100 GPUs

License: CC BY-NC 4.0

Supplemental data for the report "Optimisation of lattice simulations energy efficiency".

Data

At the root of the repository you can find the data associated with the report.

  • jobs.db: an SQLite database containing the following data for all benchmark jobs, split into two tables, one per problem size.

| Field | Content |
|---|---|
| job_id | Slurm job id |
| start | Start UNIX epoch |
| end | End UNIX epoch |
| nodes | Number of nodes |
| clock_limit | GPU clock limit in MHz |
| slot | Job slot (A or B), cf. report |
| smi_db | Associated NVIDIA SMI database |
| job_dir | Job output directory |

  • smi-dmon-*.db: NVIDIA SMI power monitoring SQLite databases. The monitoring data for each benchmark is stored in the table clock_limit_<c>, where <c> is the GPU clock limit. Each table has the following structure.

| Field | Content |
|---|---|
| sample | Sample index |
| timestamp | Sample time (UTC) |
| gpu | GPU index |
| power | Power draw (W) |
| temp_gpu | GPU temperature |
| temp_mem | GPU memory temperature |
| activity | GPU activity |
| memory | GPU memory activity |
| clock_mem | GPU memory frequency (MHz) |
| clock_gpu | GPU frequency (MHz) |

  • rack-power.db: ATOS BullSequana XH2000 power monitoring SQLite database. The data is separated into two tables, run_220820 and run_220822, which correspond to the C0 and loc32 problem sizes, respectively. Each table has the following structure.

| Field | Content |
|---|---|
| sample | Sample index |
| timestamp | Sample time (UTC) |
| rack_1 | Rack 1 power draw (W) |
| rack_2 | Rack 2 power draw (W) |
| rack_3 | Rack 3 power draw (W) |
| rack_4 | Rack 4 power draw (W) |

  • c0.dat & loc32.dat: tables in text form with processed results. The columns of each table are described in comments at the beginning of the file.
  • c0-eps-perf.dat, c0-eps-power.dat, loc32-eps-perf.dat & loc32-eps-power.dat: tables in text form with epsilon-constraint energy-optimal GPU frequencies (cf. report).
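
The databases described above can also be inspected outside the provided shell scripts. The following is a minimal sketch in Python, using only the standard `sqlite3` module and the column names from the tables above. The helper names `job_durations` and `mean_gpu_power` are hypothetical, the jobs table name `size_C0` is assumed from the arguments passed to make-job-db.sh, and the `start` and `end` columns are assumed to hold numeric epoch values.

```python
import sqlite3


def job_durations(conn, table="size_C0"):
    """Return {job_id: (clock_limit, duration in seconds)} for one jobs.db table.

    The default table name is an assumption taken from full-analysis.sh.
    """
    # Table names cannot be bound as SQL parameters, so validate before use.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    # "start" and "end" are quoted because END is an SQL keyword.
    rows = conn.execute(
        f'SELECT job_id, clock_limit, "start", "end" FROM "{table}"'
    )
    return {
        job_id: (clock, int(end) - int(start))
        for job_id, clock, start, end in rows
    }


def mean_gpu_power(conn, clock_limit):
    """Return {gpu index: mean power draw in W} for one clock_limit_<c> table."""
    table = f"clock_limit_{int(clock_limit)}"
    rows = conn.execute(
        f'SELECT gpu, AVG(power) FROM "{table}" GROUP BY gpu ORDER BY gpu'
    )
    return dict(rows)


# Example usage (the clock limit value 1410 is only an illustration):
#   jobs = sqlite3.connect("jobs.db")
#   print(job_durations(jobs, "size_C0"))
#   smi = sqlite3.connect("smi-dmon-8A-C0.db")
#   print(mean_gpu_power(smi, 1410))
```

The mean is taken over all samples in the table. Restricting it to the time window of a given job would also need the timestamp column, whose exact format is not specified here.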

Analysis scripts

The results described above can be reproduced with the scripts at the root of the repository. These are shell scripts using standard UNIX tools, with the addition of GNU datamash and SQLite. Each script prints a usage message when executed without arguments. The complete set of results can be reproduced by executing full-analysis.sh, which contains just the commands below.

echo '-- make job DBs...'
./make-job-db.sh jobs.db size_C0 2-racks/size-C0
./make-job-db.sh jobs.db size_loc32 2-racks/size-loc32
echo '-- make result tables...'
./make-result-table.sh jobs.db size_C0 2-racks/rack-power.db run_220820 > c0.dat
./make-result-table.sh jobs.db size_loc32 2-racks/rack-power.db run_220822 > loc32.dat
echo '-- make eps-constraint tables...'
./make-perf-epsilon-table.sh c0.dat > c0-eps-perf.dat     
./make-perf-epsilon-table.sh loc32.dat > loc32-eps-perf.dat 
./make-power-epsilon-table.sh c0.dat > c0-eps-power.dat     
./make-power-epsilon-table.sh loc32.dat > loc32-eps-power.dat 
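
The eps-constraint tables written by the last four commands list energy-optimal GPU frequencies under a constraint. The general idea of an epsilon-constraint selection can be sketched as follows. This is a generic Python illustration, not a transcription of make-perf-epsilon-table.sh: the function name, the row layout (clock, performance, energy) and the exact criterion (minimise energy subject to performance staying within a fraction eps of the best observed performance) are all assumptions, and the numbers are invented for the example. The report defines the criterion actually used.

```python
def eps_constraint_optimum(rows, eps):
    """Pick the lowest-energy row whose performance is within eps of the best.

    rows: iterable of (clock_mhz, performance, energy) tuples (assumed layout).
    eps:  tolerated relative performance loss, e.g. 0.1 for 10%.
    """
    rows = list(rows)
    best_perf = max(perf for _, perf, _ in rows)
    # Constraint: performance must not fall below (1 - eps) of the best value.
    feasible = [r for r in rows if r[1] >= (1.0 - eps) * best_perf]
    # Objective: minimise energy among the feasible rows.
    return min(feasible, key=lambda r: r[2])


# Invented illustrative data, not taken from c0.dat or loc32.dat.
table = [
    (1410, 100.0, 50.0),
    (1200, 95.0, 42.0),
    (1000, 85.0, 38.0),
    (800, 70.0, 37.0),
]
for eps in (0.0, 0.1, 0.2, 0.5):
    print(eps, eps_constraint_optimum(table, eps))
```

Under the same assumptions, a constraint on power draw would only change the feasibility test: rows would be kept when their power stays below a chosen bound.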

Run data

The 2-racks subdirectory is a complete copy of the run directory from the Tursa supercomputer. It is provided as-is and undocumented, although a number of scripts are commented. Many elements are specific to this cluster and require root access to several parts of the system. It is shared here for transparency, and as an example of how power monitoring can be automated for such studies.