Modified the Wilson flow adaptive smearing step-size update to implement the original Ramos definition of the distance. Previously it used the norm of a difference, which scales with the volume, so the chosen steps were too coarse or too fine depending on the volume. This is based on Chulwoo's code.
Added a test comparing adaptive (with tuneable tolerance) to iterative Wilson flow smearing on a random gauge configuration.
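The volume dependence mentioned above can be illustrated with a minimal sketch. It assumes that the volume-independent distance is a maximum over links of a per-link distance, and that the step size is rescaled by a safety factor times the cube root of tolerance/distance. The names LinkDistances, global_norm, max_distance and next_epsilon, and the safety factor 0.95, are hypothetical and not taken from the WilsonFlow class.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in: one non-negative number per link, the local distance
// between the two integrator estimates of that link.
using LinkDistances = std::vector<double>;

// Volume-dependent choice: global 2-norm over all links.
// For a fixed per-link error this grows like sqrt(number of links).
double global_norm(const LinkDistances& d) {
  double sum = 0.0;
  for (double x : d) sum += x * x;
  return std::sqrt(sum);
}

// Volume-independent choice (assumed here): maximum over links.
double max_distance(const LinkDistances& d) {
  assert(!d.empty());
  return *std::max_element(d.begin(), d.end());
}

// Assumed step-size update: eps_new = safety * eps * (tolerance / distance)^(1/3).
double next_epsilon(double eps, double tolerance, double distance,
                    double safety = 0.95) {
  assert(distance > 0.0);
  return safety * eps * std::cbrt(tolerance / distance);
}
```

With the same per-link error on a lattice 16 times larger, global_norm grows by a factor of 4 while max_distance is unchanged, so only the latter gives a step size that does not depend on the volume.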
Rework of WilsonFlow class:
Fixed a logic error in the smear method: the step index was initialized to 1 rather than 0, so the logged value of tau was too large by epsilon
Previously, smear_adaptive maintained the current value of tau as a class member variable, whereas smear computed it separately. Now both methods maintain the current value internally, and it is updated by the evolve_step routines. Both evolve methods are now const.
smear_adaptive now also maintains the current value of epsilon internally, allowing it to be a const method and also allowing the same class instance to be reused without needing to be reset
Replaced the fixed evaluation of the plaquette energy density and plaquette topological charge during the smearing with a general strategy in which the user can add arbitrary measurements as function objects, each evaluated at a user-chosen frequency
By default the same plaquette-based measurements are performed, but additional example functions are provided where the smearing is performed with different choices of measurement that are returned as an array for further processing
Added a method to compute the energy density using the Cloverleaf approach which has smaller discretization errors
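The measurement strategy and the const smear loop described above can be sketched as follows. This is an illustration only: Field, FlowSketch, addMeasurement and the placeholder dynamics in evolve_step are hypothetical stand-ins, not the WilsonFlow interface, and the rule used to decide when a measurement fires is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a gauge field.
struct Field { double value; };

class FlowSketch {
public:
  // A measurement receives the step count, the flow time and the current field.
  using Measurement = std::function<void(int step, double tau, const Field&)>;

  FlowSketch(double epsilon, int nsteps) : epsilon_(epsilon), nsteps_(nsteps) {}

  // Register a measurement to be evaluated every `frequency` steps.
  void addMeasurement(int frequency, Measurement m) {
    assert(frequency > 0);
    measurements_.push_back({frequency, std::move(m)});
  }

  // Const method: tau lives in a local variable, so the instance holds no
  // per-run state and can be reused without a reset.
  void smear(Field& U) const {
    double tau = 0.0;
    for (int step = 0; step < nsteps_; ++step) {  // step index starts at 0
      evolve_step(U, tau);                        // advances U and tau together
      for (const auto& entry : measurements_)
        if ((step + 1) % entry.first == 0) entry.second(step + 1, tau, U);
    }
  }

private:
  void evolve_step(Field& U, double& tau) const {
    U.value *= (1.0 - epsilon_);  // placeholder dynamics, not the Wilson flow
    tau += epsilon_;
  }

  double epsilon_;
  int nsteps_;
  std::vector<std::pair<int, Measurement>> measurements_;
};
```

A caller would register, for example, a lambda that appends tau and an energy density to a vector, then call smear; because tau is updated only inside evolve_step, the value passed to each measurement equals the number of completed steps times epsilon.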
Added a new tensor utility operation, copyLane, which allows for the copying of a single SIMD lane between two instances of the same tensor type but potentially different precisions
To LocalCoherenceLanczos, added the option to compute the high/low eigenvalues of the fine operator on every restart to aid in tuning the Chebyshev polynomial
Added Test_field_array_io which demonstrates and tests a single-file write of an arbitrary array of fields
Added Test_evec_compression which generates evecs using Lanczos and attempts to compress them using the local coherence technique
Added Test_compressed_lanczos_gparity which demonstrates the local coherence Lanczos for G-parity BCs
Added HMC main programs for the 40ID and 48ID G-parity lattices
This compiles and looks right ... but may need some testing
* develop: (762 commits)
Tensor ambiguous fix
Fix for GCC preprocessor/pragma handling bug
Trips up NVCC for reasons I don't understand on Summit
Fix GCC complaint
Zero() change
Force a couple of things to compile on NVCC
Remove debug code
nvcc error suppress
Merge develop
Reduction finished and hopefully fixes CI regression fail on single precision and force
Double precision variants for summation accuracy
Update todo list
Freeze the seed
Fix compiling of MSource::Gauss for single precision
Think the reduction is now sorted and cleaned up
Fix force term
Printing improvement
GPU reduction fix and also exit backtrace option
GPU friendly
Simplify the comms benchmark
...
# Conflicts:
# Grid/communicator/SharedMemoryMPI.cc
# Grid/qcd/action/fermion/WilsonKernelsAsm.cc
# Grid/qcd/action/fermion/implementation/StaggeredKernelsAsm.h
# Grid/qcd/smearing/StoutSmearing.h
# Hadrons/Modules.hpp
# Hadrons/Utilities/Contractor.cc
# Hadrons/modules.inc
# tests/forces/Test_dwf_force_eofa.cc
# tests/forces/Test_dwf_gpforce_eofa.cc