Peter Boyle
e8e20c01b2
Coarsened vector test
2021-01-14 20:46:21 -05:00
Peter Boyle
a4afc3ea2a
Red black coarse space
2021-01-14 20:44:16 -05:00
fa12b9a329
bugfix
2021-01-13 10:04:17 +00:00
45fc7ded3a
test for sum
2021-01-12 09:10:37 +00:00
74de2d9742
whitespace changes
2021-01-08 18:28:36 +00:00
e759367d42
tested and working
2021-01-08 18:04:50 +00:00
Christopher Kelly
1b84f59273
Added a mixed precision multishift algorithm for which the matrix multiplies are performed in single precision but the search directions are accumulated in double precision.
A reliable update step is performed at a tunable frequency to correct the residual. A final mixed-prec single-shift solve is performed on each pole to perform cleanup if necessary.
A test is provided to demonstrate the algorithm.
2021-01-06 12:24:44 -05:00
Christopher Kelly
1fb41a4300
Added copyLane function to Tensor_extract_merge.h which copies one lane of data from an input tensor object to a different lane of an output tensor object of potentially different precision
precisionChange lattice function now uses copyLane to remove need for temporary scalar objects, reducing register footprint and significantly improving performance
2021-01-06 11:50:56 -05:00
Christopher Kelly
287bac946f
ConjugateGradientMixedPrec now stores final true residual and uses the precisionChange workspaces for improved efficiency
2021-01-06 09:50:41 -05:00
Christopher Kelly
80c14be65e
Added core test to check precision change
2021-01-06 09:34:44 -05:00
Christopher Kelly
d7a2a4852d
Reimplemented precisionChange to run on GPUs. A workspace containing the mapping table can be optionally precomputed and reused for improved performance.
2021-01-06 09:30:49 -05:00
Christopher Kelly
d185f2eaa7
OneFlavourEvenOddRatioRationalPseudoFermionAction now derives from GeneralEvenOddRatioRationalPseudoFermionAction, simply performs transcription of parameters
2020-12-23 16:26:10 -05:00
Christopher Kelly
813d4cd900
Added test program that ensures the generic checkerboarded RHMC (with parameters set appropriately) gives the same answer as the existing 1f code
2020-12-23 16:01:42 -05:00
Christopher Kelly
75c6c6b173
General RHMC pseudofermion action now allows for different rational approximations to be used in the MD and action evaluation
2020-12-23 11:19:26 -05:00
Christoph Lehner
299d0de066
Merge pull request #21 from paboyle/develop
Sync
2020-12-22 20:59:15 +01:00
Christopher Kelly
220ad5e3ee
Added more verbose log output to GeneralEvenOddRatioRationalPseudoFermionAction
In GeneralEvenOddRatioRationalPseudoFermionAction, setting the bounds check frequency to 0 now disables the check
2020-12-22 11:08:22 -05:00
Christopher Kelly
ba5dc670a5
Reimplemented GparityWilsonImpl::InsertForce5D to run efficiently on GPUs
Swapped order of templated tensor code and c-number specializations in Tensor_outer.h to fix compile issue with type deduction on Summit
2020-12-22 10:10:07 -05:00
Peter Boyle
3fe75bc7cb
Merge pull request #329 from nmeyer-ur/feature/a64fx-3
Revised dslash/dwf kernels for A64FX
2020-12-20 08:17:15 -05:00
Nils Meyer
45d49d8648
clean up
2020-12-19 03:35:18 +01:00
Nils Meyer
6013183361
removed Asm impls
2020-12-19 03:25:01 +01:00
Nils Meyer
4b882e8056
fixed lost bracket
2020-12-19 03:09:20 +01:00
Nils Meyer
3f9ae6e7e7
Merge branch 'develop' into feature/a64fx-3
2020-12-19 02:37:11 +01:00
Nils Meyer
909acd55cd
vnum variant for prefetches
2020-12-19 02:00:22 +01:00
Nils Meyer
4dd9e39e0d
up to +36% performance gain for dslash/dwf on QPACE 4 using GCC 10.1.1
2020-12-19 00:54:31 +01:00
Christoph Lehner
b4c1317ab4
Merge pull request #22 from DanielRichtmann/feature/clover-access-specifier
Clover access specifier
2020-12-18 16:20:19 +01:00
Christopher Kelly
a0ca362690
Added an RHMC pseudofermion action, GeneralEvenOddRatioRationalPseudoFermionAction, that works for an arbitrary fractional power, not just a square root
Added a test evolution for the above, Test_rhmc_EOWilsonRatioPowQuarter, demonstrating conservation of Hamiltonian
Fixed HMC ignoring the MetropolisTest parameter of HMCparameters
2020-12-17 16:21:58 -05:00
Christopher Kelly
249b6e61ec
For G-parity BCs the Nd-1 direction is now assumed to be the time direction and setting a twist in this direction will apply antiperiodic BCs
Added option to run Test_gparity with antiperiodic time BCs
2020-12-17 14:09:00 -05:00
f36d6f3923
compiles on GPU. 3pt still wrong!!!!
2020-12-17 17:04:08 +00:00
Peter Boyle
7adb253e25
Merge pull request #328 from mmphys/feature/mrespatch
Enable existing conserved current code for CUDA
2020-12-17 11:10:29 -05:00
808f1e0e8c
merge develop
2020-12-15 16:33:29 +00:00
Michael Marshall
873519e960
Enable existing conserved current code for CUDA (compiles OK for CUDA 10.1). Add option to Test_cayley_mres to load a configuration
2020-12-14 16:06:10 +00:00
Peter Boyle
9aec4a3c26
SYCL
2020-12-10 02:11:17 -08:00
Daniel Richtmann
c438118fd7
Change access specifier of clover fields in order to allow deriving classes to access these
2020-12-08 14:42:11 +01:00
Peter Boyle
70510d151b
Merge pull request #327 from paboyle/feature/gparity_twist_GPU
Feature/gparity twist gpu
2020-12-07 12:02:20 -05:00
Christopher Kelly
9e7bacb5a4
Merge branch 'develop' into feature/gparity_twist_GPU
2020-12-07 11:55:39 -05:00
Christopher Kelly
2ef1fa66a8
Improved performance of G-parity kernel for GPUs by simplifying multLink implementation
2020-12-07 11:53:35 -05:00
Peter Boyle
cf76741ec6
Intel DPCPP Gold happy now (compiles all, runs Benchmark_dwf_fp32)
2020-12-03 03:47:11 -08:00
Peter Boyle
497e7c1c40
Duplicate code
2020-12-02 17:55:30 -08:00
Peter Boyle
888eacd3b8
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-11-24 21:46:33 -05:00
Peter Boyle
321f0f51b5
Project to SU(N)
2020-11-24 21:46:10 -05:00
Christoph Lehner
17ec9c5545
Merge pull request #20 from paboyle/develop
Sync
2020-11-24 12:20:43 +01:00
Peter Boyle
30ad9578a2
Merge branch 'lehner-feature/gpt' into develop
2020-11-24 06:10:24 -05:00
Peter Boyle
9dce101586
Merge branch 'feature/gpt' of https://github.com/lehner/Grid into lehner-feature/gpt
2020-11-24 06:10:16 -05:00
Peter Boyle
97e264d0ff
Christoph's changes
2020-11-23 15:46:11 +00:00
Peter Boyle
683a5e5bf5
Stencil use host vector for integer table on enable-shared=no and mirror it on device
2020-11-23 15:39:51 +00:00
Peter Boyle
d4861a362c
Stencil use non-UVM memory for look up table on enable-shared=no
2020-11-23 15:38:49 +00:00
Peter Boyle
5ff3eae027
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-11-20 13:14:44 -05:00
Peter Boyle
147dc15d26
Update
2020-11-20 13:13:59 -05:00
Christoph Lehner
c61ea72949
Merge pull request #19 from paboyle/develop
Sync
2020-11-20 17:31:13 +01:00
Peter Boyle
86e8b9fe38
ALLOC_ALIGN removed
2020-11-20 17:07:16 +01:00