Added a new, much faster implementation of precision change that uses (optionally) a precomputed workspace containing pointer offsets that is device resident, such that all lattice copying occurs only on the device and no host<->device transfer is required, other than the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When this new precisionChange is called without a workspace, one will be computed on-the-fly; however it is still considerably faster than the original implementation.
In the special case of using double2 and when the Grids are the same, calls to the new precisionChange will automatically use precisionChangeFast, such that there is a single API call for all precision changes.
Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces
Renamed the original precisionChange as precisionChangeOrig
Fixed incorrect pointer offset bug in copyLane
Added a test and a benchmark for precisionChange
Added a test for reliable update CG
To the gauge implementations, added CshiftLink functions calling into the appropriate operation for the BC in a given direction.
GaugeTransform, FourierAcceleratedGaugeFixer and WilsonLoops::FieldStrength no longer implicitly assume periodic boundary conditions; instead the shifted link is obtained using CshiftLink and is aware of the gauge implementation.
Added an assert-check to ensure that the gauge fixing converges within the specified number of steps.
Added functionality to compute the timeslice averaged plaquette
Added functionality to compute the 5LI topological charge and timeslice topological charge
Added a check of the properties of the charge conjugation matrix C=-gamma_2 gamma_4 to Test_gamma
Fixed const correctness for Replicate
Modified Test_fft_gfix to support either conjugate or periodic BCs, optionally disabling Fourier-accelerated gauge fixing, and tuning of alpha using cmdline options
Added storage of final true residual in mixed-prec CG and enhanced log output
Fixed const correctness of multi-shift constructor
Added a mixed precision variant of the multi-shift algorithm that uses a single precision operator and applies periodic reliable update to the residual
Added tests/solver/Test_dwf_multishift_mixedprec to test the above
Fixed local coherence lanczos using the (large!) max approx to the chebyshev eval as the scale from which to judge the quality of convergence, resulting a test that always passes
Added a method to local coherence lanczos class that returns the fine eval/evec pair
Added iterative log output to power method
Added optional disabling of the plaquette check in Nerscio to support loading old G-parity configs which have a factor of 2 error in the plaquette
G-parity Dirac op no longer allows GPBC in the time direction; instead we toggle between periodic and antiperiodic
Replaced thread_for G-parity 5D force insertion implementation with accelerator_for version capable of running on GPUs
Generalized tests/lanczos/Test_dwf_lanczos to support regular DWF as well as Gparity, with the action chosen by a command line option
Modified tests/forces/Test_dwf_gpforce,Test_gpdwf_force,Test_gpwilson_force to use GPBC a spatial direction rather than the t-direction, and antiperiodic BCs for time direction
tests/core/Test_gparity now supports using APBC in time direction using command line toggle