3e947527cb
Move looping over "s" and "site" into kernels for GPU optimisation
2018-06-27 21:29:43 +01:00
b710fec6ea
GPU code: first version of specialised kernel
2018-06-13 20:34:39 +01:00
eb7d34a4cc
GPU version
2018-05-14 19:41:47 -04:00
b15db11c60
Kernels -> pure static object to enable device execution
2018-03-24 19:35:20 -04:00
4e1272fabf
Kernels need to be static to work on GPU. No reference to host resident data
2018-03-22 18:44:53 -04:00
8a1d303ab9
GPU friendly stencil improvements
2018-03-19 07:11:03 -04:00
3277bda130
View introduction to prepare for accelerator offload.
...
Probably same problem exists for stencil object
2018-03-04 16:38:08 +00:00
dcf6517a93
Accelerator offload and copy Opt into the kernel for GPU host var safety
2018-02-02 11:35:35 +00:00
e4df025d01
Accelerator related
2018-02-01 23:20:05 +00:00
87ee592176
Pragma changes and layout and warning elimination for nvcc
2018-01-24 13:14:09 +00:00
a97ad1a51d
Namespace
2018-01-14 23:01:01 +00:00
1bd311ba9c
Faster sequential conserved current implementation, now compatible with 5D vectorisation & G-parity.
2017-06-16 16:43:15 +01:00
41af8c12d7
Code cleaning for conserved current contractions. Will now be easier to implement Mobius conserved current.
2017-06-16 16:38:59 +01:00
5633a2db20
Faster implementation of conserved current site contraction. Added 5D vectorised support, but not G-parity.
2017-06-12 10:41:02 +01:00
ca1077c560
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/rare_kaon
...
# Conflicts:
# lib/qcd/action/fermion/WilsonFermion5D.cc
# tests/hadrons/Test_hadrons_rarekaon.cc
2017-05-04 16:22:33 +01:00
2ce898efa3
Pretty code
2017-04-26 02:34:25 -04:00
44260643f6
First conserved current implementation for Wilson fermions only. Not implemented for Gparity or 5D-vectorised Wilson fermions.
2017-04-25 18:00:24 +01:00
abba44a837
Hand unrolled for overlapped comms
2017-04-22 17:45:17 +01:00
1d1b225497
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
2017-04-22 09:05:28 -04:00
736bf3c866
Major rework of stencil. Half precision and MPI3 now working.
2017-04-22 11:33:50 +01:00
2c246551d0
Overlap comms and compute options in wilson kernels
2017-02-07 01:37:10 -05:00
caba0d42a5
L1p controls
2016-12-22 17:52:55 +00:00
b7d55f7dfb
Fix a typo in reorg of the --dslash-asm
2016-11-04 11:35:08 +00:00
bb94ddd0eb
Tidy up of mpi3; also some cleaning of the dslash controls.
2016-11-02 08:07:09 +00:00
c190221fd3
Internal SHM comms in non-simd directions working
...
Need to fix simd directions
2016-10-22 18:14:27 +01:00
b58adc6a4b
commVector
2016-10-20 17:00:15 +01:00
c78bbd0f8c
Fix ASM compilation
2016-10-04 15:37:32 +01:00
b9c80318a2
Merge branch 'develop' into feature/hirep
2016-09-13 10:01:51 +01:00
f76f281e58
Cleaning files after fix
2016-09-09 11:34:25 +01:00
aa20cc8b52
Fixing compilation error with AVX512 flag
2016-09-09 02:58:52 -07:00
0fd179fb33
Merge branch 'develop' into feature/hirep
2016-09-01 12:59:53 +01:00
90e70790f3
Feature for z-Mobius prep
2016-08-15 22:31:29 +01:00
089f0ab582
Debugged HMC for Creutz relation
2016-07-28 16:44:41 +01:00
b93e18ed50
Modified the Dirac Kernel class to compile with different number of colours
...
Added the general push_back functionality to accommodate all defined representations
Compiles, not tested
2016-07-18 16:36:28 +01:00
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendliness.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
...
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
165bffc2e7
Avx512 changes for assembler kernels
2016-03-26 22:25:45 -06:00
fc6ad65751
Pushed the overlap comms tweaks
2016-01-11 06:34:22 -08:00
331768dcff
Added overlap comms compute mode
2016-01-03 01:38:11 +00:00
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
34a0fde2ad
Fixes to fermion force terms after sign of gamma_mu (0...3) change.
...
Thought I had already committed these.
Believe I have got the Gparity fermion force working.
* tests/Test_gpdwf_force.cc -- correctly predicts dS for two flavour pseudofermion
based on a small dt update of U field.
* tests/Test_hmc_EODWFRatio_Gparity.cc -- ran 1 trajectory on 8^4 with dH=0.21.
Need to accumulate a full plaquette log to believe fully which will take some hours of run time.
2015-12-15 23:14:12 +00:00
3ce10aa975
Fix a regression failure on Mobius; chroma regression added
2015-12-10 22:55:00 +00:00
05a7029600
Stencil change
2015-11-07 00:06:31 -08:00
899ca41cb8
Merge branch 'master' of github.com:paboyle/Grid
...
Conflicts:
lib/qcd/action/fermion/WilsonFermion5D.cc
2015-11-06 03:50:04 -08:00
17af18dcab
Changes for AVX512 assembler
2015-11-06 03:45:51 -08:00
28022755ae
Stencil class name global change to StencilImpl typedef
2015-11-06 05:30:17 -06:00
2f38ebc446
Reintroducing the hand unrolled loops
2015-09-08 17:45:30 +01:00
55cfc89459
* Finished the template/policy style introduction of gparity, except the gparity force terms.
...
So valence sector looks ok.
FermionOperatorImpl.h provides the policy classes.
Expect HMC will introduce a smearing policy and a fermion representation change policy template
param. Will also probably need multi-precision work.
* HMC is running even-odd and non-checkerboarded (checked 4^4 wilson fermion/wilson gauge).
There appears to be a bug in the multi-level integrator -- <e-dH> passes with single level but
not with multi-level.
In any case there looks to be quite a bit to clean up.
This is the "const det" style implementation that is not appropriate yet for clover since
it assumes that Mee is independent of the gauge fields. Easily fixed in future.
2015-08-15 23:25:49 +01:00