James Harrison
|
7f0fc0eff5
|
Remove explicit use of double-precision types in photon.h
|
2016-11-01 16:02:35 +00:00 |
|
paboyle
|
791cb050c8
|
Comms improvements
|
2016-11-01 11:35:43 +00:00 |
|
|
d5e95bc350
|
Merge branch 'release/v0.6.0' into feature/feynman-rules
|
2016-10-31 18:36:21 +00:00 |
|
|
7a84906b5f
|
Merge branch 'release/v0.6.0' into feature/fft-opt
|
2016-10-31 18:31:49 +00:00 |
|
|
66d832c733
|
FFTW header fix
|
2016-10-31 16:39:29 +00:00 |
|
|
e74417ca12
|
big build system polish
|
2016-10-31 16:31:27 +00:00 |
|
Guido Cossu
|
e8c3174ae2
|
Small change in the defines
|
2016-10-30 12:23:11 +00:00 |
|
Guido Cossu
|
9b066e94d0
|
Compilation with both single and double precision
|
2016-10-30 12:04:06 +00:00 |
|
James Harrison
|
618abdf302
|
Add missing volume factor in stochastic QED field
|
2016-10-29 11:04:02 +01:00 |
|
Guido Cossu
|
e1042aef77
|
First version of the double prec for testing purposes
It does not compile single and double version at the same time
|
2016-10-28 17:20:04 +01:00 |
|
paboyle
|
aa6a839c60
|
avx512 build fix; detect clang/gcc intrinsics vs. ICPC
|
2016-10-28 09:13:09 +01:00 |
|
|
b4d2af8c89
|
threaded FFT
|
2016-10-26 19:46:36 +01:00 |
|
|
434af6aeaa
|
Merge branch 'develop' into feature/fft-opt
|
2016-10-26 18:50:38 +01:00 |
|
|
e90f8ac841
|
Merge branch 'develop' into feature/feynman-rules
|
2016-10-26 18:50:21 +01:00 |
|
|
a1705a8d53
|
debug message removed
|
2016-10-26 18:50:07 +01:00 |
|
|
ca21003f01
|
Merge branch 'feature/fft-opt' into feature/feynman-rules
# Conflicts:
# lib/FFT.h
# lib/qcd/action/fermion/WilsonFermion5D.h
# tests/core/Test_fft.cc
|
2016-10-26 18:44:47 +01:00 |
|
|
14ddf2c234
|
more FFT optimisations
|
2016-10-26 17:36:26 +01:00 |
|
Azusa Yamaguchi
|
bca861e112
|
Note: FFT should be GridFFT (not changed yet).
Gauge fix with FFT is added (tests/core)
|
2016-10-25 14:21:48 +01:00 |
|
|
33d199a0ad
|
temporary thread safety in FFT
|
2016-10-25 12:56:40 +01:00 |
|
paboyle
|
b820076b91
|
Merge branch 'develop' into feature/mpi3
|
2016-10-25 06:02:33 +01:00 |
|
paboyle
|
09f66100d3
|
MPI 3 compile on non-linux
|
2016-10-25 06:01:12 +01:00 |
|
azusayamaguchi
|
d7d92af09d
|
Travis fail fix attempt
|
2016-10-25 01:45:53 +01:00 |
|
azusayamaguchi
|
460d0753a1
|
Merge branch 'develop' into feature/mpi3
Conflicts:
lib/simd/Grid_avx512.h
|
2016-10-25 01:08:51 +01:00 |
|
azusayamaguchi
|
8f8058f8a5
|
More random bits on parallel seeding
|
2016-10-25 01:05:52 +01:00 |
|
azusayamaguchi
|
d97a27f483
|
Verbose
|
2016-10-25 01:05:31 +01:00 |
|
azusayamaguchi
|
7c3363b91e
|
Compiles all comms targets
|
2016-10-25 00:04:17 +01:00 |
|
azusayamaguchi
|
b94478fa51
|
mpi, mpi3, shmem all compile.
mpi, mpi3 pass single node multi-rank
|
2016-10-24 23:45:31 +01:00 |
|
|
13bf0482e3
|
FFT optimisation
|
2016-10-24 19:25:40 +01:00 |
|
|
a795b5705e
|
memory optimisation
|
2016-10-24 19:25:15 +01:00 |
|
|
392e064513
|
fast local peek-poke
|
2016-10-24 19:24:21 +01:00 |
|
azusayamaguchi
|
b6a65059a2
|
Update to use shared memory to contain the stencil comms buffers
Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions
|
2016-10-24 17:30:43 +01:00 |
|
azusayamaguchi
|
ea25a4d9ac
|
Works
|
2016-10-23 06:10:05 +01:00 |
|
azusayamaguchi
|
c190221fd3
|
Internal SHM comms in non-simd directions working
Need to fix simd directions
|
2016-10-22 18:14:27 +01:00 |
|
azusayamaguchi
|
0fcd2e7188
|
Simplify the comms structure prior to implementing Shared memory direct bounce
|
2016-10-21 22:44:10 +01:00 |
|
azusayamaguchi
|
910b8dd6a1
|
use simd type
|
2016-10-21 22:35:29 +01:00 |
|
azusayamaguchi
|
75ebd3a0d1
|
Typo fixes and rotate for CLANG
|
2016-10-21 22:34:29 +01:00 |
|
|
7c8f79b147
|
more stochastic QED fixes
|
2016-10-21 15:20:12 +01:00 |
|
azusayamaguchi
|
09fd5c43a7
|
Reasonably fast version
|
2016-10-21 15:17:39 +01:00 |
|
|
462921e549
|
QED: fix stochastic field
|
2016-10-21 14:41:08 +01:00 |
|
azusayamaguchi
|
f22317748f
|
Merge branch 'feature/mpi3' of https://github.com/paboyle/Grid into feature/mpi3
|
2016-10-21 13:36:35 +01:00 |
|
azusayamaguchi
|
6a9eae6b6b
|
Reporting improvements
|
2016-10-21 13:36:18 +01:00 |
|
azusayamaguchi
|
fad96cf250
|
StencilBufs
|
2016-10-21 13:36:00 +01:00 |
|
azusayamaguchi
|
f331809c27
|
Use variable type for loop
|
2016-10-21 13:35:37 +01:00 |
|
|
bd6a228af6
|
Merge commit '20a091c3eddfdb67a82ece6413740a93650a2f98' into feature/feynman-rules
|
2016-10-21 13:10:30 +01:00 |
|
|
63d219498b
|
first (dirty) implementation of Feynman stochastic EM field
|
2016-10-21 13:10:13 +01:00 |
|
paboyle
|
2c54a53d0a
|
Compile verbose reduce
|
2016-10-21 12:12:14 +01:00 |
|
paboyle
|
306160ad9a
|
bcopy threaded
|
2016-10-21 12:07:28 +01:00 |
|
azusayamaguchi
|
20a091c3ed
|
Intel vs. Clang intrinsics differences absorbed
|
2016-10-21 09:08:36 +01:00 |
|
azusayamaguchi
|
202078eb1b
|
Cray / OpenSHMEM ordering differs
|
2016-10-21 09:07:20 +01:00 |
|
paboyle
|
a762b1fb71
|
MPI3 working with a bounce through shared memory on my laptop.
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
|
2016-10-21 09:03:26 +01:00 |
|
paboyle
|
5b5925b8e5
|
Forgot to add
|
2016-10-20 17:09:40 +01:00 |
|
paboyle
|
b58adc6a4b
|
commVector
|
2016-10-20 17:00:15 +01:00 |
|
paboyle
|
f9d5e95d72
|
allocator template typedefs moved to AlignedAllocator
|
2016-10-20 16:59:39 +01:00 |
|
paboyle
|
4f8e636a43
|
commVector
|
2016-10-20 16:59:16 +01:00 |
|
paboyle
|
9b39f35ae6
|
commVector different for SHMEM compat
|
2016-10-20 16:58:53 +01:00 |
|
paboyle
|
5fe2b85cbd
|
MPI3 and shared memory support
|
2016-10-20 16:58:01 +01:00 |
|
paboyle
|
c7cccaaa69
|
Comm vector for shmem
|
2016-10-20 16:57:31 +01:00 |
|
paboyle
|
cbcfea466f
|
MPI3
|
2016-10-20 16:57:14 +01:00 |
|
paboyle
|
4955672fc3
|
MPI3
|
2016-10-20 16:57:00 +01:00 |
|
paboyle
|
8c043da5b7
|
SHMEM and comms allocator made different
|
2016-10-20 16:56:05 +01:00 |
|
paboyle
|
3cbe974eb4
|
Layout
|
2016-10-20 16:55:21 +01:00 |
|
|
997fd882ff
|
Merge branch 'develop' into feature/feynman-rules
# Conflicts:
# lib/Threads.h
# lib/qcd/action/fermion/WilsonFermion.cc
# lib/qcd/action/fermion/WilsonFermion.h
# lib/qcd/utils/SUn.h
# lib/simd/Grid_avx.h
# lib/simd/Intel512common.h
|
2016-10-19 18:35:18 +01:00 |
|
paboyle
|
7af9b87318
|
Cache face tables to improve performance.
Extract merge now looking poor.
|
2016-10-18 09:51:37 +01:00 |
|
paboyle
|
811ca45473
|
GNU clang hack for AVX512 since there are missing reduce intrinsics in Clang 3.9 and GCC-6 AVX512 support
|
2016-10-17 16:23:21 +01:00 |
|
paboyle
|
bc1a4d40ba
|
Faster integer handling avoid push_back
|
2016-10-17 16:16:44 +01:00 |
|
paboyle
|
c8079e6621
|
Time the face gather in x-dir more carefully
|
2016-10-13 22:28:50 +01:00 |
|
azusayamaguchi
|
8b0d171c9a
|
32bit issue on the KNL code variant where byte offsets were stored
|
2016-10-12 17:49:32 +01:00 |
|
azusayamaguchi
|
8bbd9ebc27
|
Reversing changes to Stencil class
|
2016-10-12 13:47:20 +01:00 |
|
azusayamaguchi
|
6472b431f0
|
__rdpmc needed for gcc, clang++
|
2016-10-12 12:29:08 +01:00 |
|
azusayamaguchi
|
bd205a3293
|
Fixing for non x86 and non KNL
|
2016-10-12 12:09:15 +01:00 |
|
azusayamaguchi
|
496beffa88
|
Fix non-KNL build
|
2016-10-12 12:06:08 +01:00 |
|
azusayamaguchi
|
9b63e97108
|
align not absolutely required and confuses clang++
|
2016-10-12 11:51:21 +01:00 |
|
azusayamaguchi
|
81f2aeaece
|
KNL streaming stores, and KNL performance counters
|
2016-10-12 11:45:22 +01:00 |
|
paboyle
|
2d4a45c758
|
Typecast pointer
|
2016-10-12 09:14:15 +01:00 |
|
paboyle
|
a123dcd7e9
|
Static required for shmem. Reading same object twice requires csum reset
|
2016-10-12 00:29:57 +01:00 |
|
paboyle
|
6b27c42dfe
|
Cosmetic
|
2016-10-12 00:29:39 +01:00 |
|
paboyle
|
f7c2aa3ba5
|
runtime by default
|
2016-10-12 00:29:13 +01:00 |
|
paboyle
|
7240d73184
|
Parallelise the x faces; fix the segv on KNL with comms
|
2016-10-11 22:21:07 +01:00 |
|
paboyle
|
42cd148f5e
|
Base pointer for comms buffer under AVX512 assembly
|
2016-10-11 16:06:06 +01:00 |
|
paboyle
|
6e01264bb7
|
don't use static by default
|
2016-10-11 10:03:39 +01:00 |
|
paboyle
|
6f408256bc
|
FMA4 option moved on the align
|
2016-10-11 10:03:01 +01:00 |
|
paboyle
|
8d11681aac
|
verbose remove
|
2016-10-10 23:50:42 +01:00 |
|
paboyle
|
3d5c9a1ee9
|
No compile fix on clang++ 3.9
|
2016-10-10 23:50:13 +01:00 |
|
paboyle
|
dc389e467c
|
axpy_ssp for any coeff type via template
|
2016-10-10 23:48:05 +01:00 |
|
paboyle
|
3619167d62
|
Mass parameter
|
2016-10-10 23:47:33 +01:00 |
|
paboyle
|
96f1d1b828
|
Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass).
|
2016-10-10 23:46:45 +01:00 |
|
paboyle
|
657e0a8f4d
|
Mass parameter
|
2016-10-10 23:46:10 +01:00 |
|
paboyle
|
616e7cd83e
|
Mass parameter
|
2016-10-10 23:45:48 +01:00 |
|
paboyle
|
6f26d2e8d4
|
Overlap tree level feynman rule
|
2016-10-10 23:45:18 +01:00 |
|
paboyle
|
c014574504
|
A "please implement me" feynman rule. If this were abstract virtual it would
require/force implementation
|
2016-10-10 23:44:00 +01:00 |
|
paboyle
|
d7ce164e6e
|
Feynman rule for DWF
|
2016-10-10 23:43:36 +01:00 |
|
paboyle
|
c0d5b99016
|
Dminus
|
2016-10-10 23:43:19 +01:00 |
|
paboyle
|
09ca32d678
|
Dminus added for Cayley
|
2016-10-10 23:42:55 +01:00 |
|
paboyle
|
082ae350c6
|
static schedule by default
|
2016-10-10 23:42:30 +01:00 |
|
Guido Cossu
|
611b5d74ba
|
Fix for AVX+FMA3 compilation
|
2016-10-10 15:26:17 +01:00 |
|
Guido Cossu
|
b56c9ffa52
|
Fix for AVXFMA
|
2016-10-10 14:43:37 +01:00 |
|
|
cb02b7088f
|
Merge branch 'develop' into feature/doxygen
# Conflicts:
# configure.ac
|
2016-10-09 13:35:44 +01:00 |
|
Guido Cossu
|
2e453dfbf5
|
Added some instrumentation to benchmark the force computation
|
2016-10-06 17:52:45 +01:00 |
|
paboyle
|
4089984431
|
Timing hooks
|
2016-10-06 09:25:12 +01:00 |
|
Guido Cossu
|
c78bbd0f8c
|
Fix ASM compilation
|
2016-10-04 15:37:32 +01:00 |
|
|
536e2ff073
|
*.inc removed: please don't commit these files either!
|
2016-09-27 11:54:03 +01:00 |
|
paboyle
|
87acd06990
|
Use streaming stores
|
2016-09-26 10:11:34 +01:00 |
|
paboyle
|
9353b6edfe
|
Fenv out of grid namespace
|
2016-09-26 10:09:13 +01:00 |
|
paboyle
|
167cc2650e
|
GNU SOURCE problem on travis
|
2016-09-26 09:58:09 +01:00 |
|
paboyle
|
7089b6d5a5
|
Set up, but not yet implemented, some QED rules
|
2016-09-26 09:43:40 +01:00 |
|
paboyle
|
2ba7d43ddd
|
Divide handling
|
2016-09-26 09:43:14 +01:00 |
|
paboyle
|
836e929565
|
Divide handling improved
|
2016-09-26 09:42:22 +01:00 |
|
paboyle
|
b6713ecb60
|
Momentum space rules for Overlap, DWF untested to date
|
2016-09-26 09:39:09 +01:00 |
|
paboyle
|
52a39f0fcd
|
Divide in ET
|
2016-09-26 09:38:38 +01:00 |
|
paboyle
|
81a7a03076
|
Integer <<
|
2016-09-26 09:38:17 +01:00 |
|
paboyle
|
16b37b956c
|
divide goes to ET
|
2016-09-26 09:37:59 +01:00 |
|
paboyle
|
567b6cf23f
|
demangle moves to logging
|
2016-09-26 09:36:51 +01:00 |
|
paboyle
|
296396646d
|
FPE's on macos set up
|
2016-09-26 09:36:14 +01:00 |
|
Guido Cossu
|
5c190a1b8c
|
Merge branch 'develop' into feature/hirep
|
2016-09-23 11:06:06 +01:00 |
|
Guido Cossu
|
c4ac6e7e8f
|
Consolidating HMC interface
Uniform interface for standard action in fundamental rep and Hirep
|
2016-09-23 10:47:42 +01:00 |
|
Guido Cossu
|
510e340e16
|
Debugged last commit for the Two index representation
|
2016-09-22 22:16:21 +01:00 |
|
Guido Cossu
|
6ffadca153
|
Restored number of colours to 3
|
2016-09-22 14:22:54 +01:00 |
|
Guido Cossu
|
b6597b74e7
|
Added support for the Two index Symmetric and Antisymmetric representations
Tested for HMC convergence: OK
Added also a test file showing an example for mixed representations
|
2016-09-22 14:17:37 +01:00 |
|
|
a034e9901b
|
Merge branch 'develop' into feature/hadrons
|
2016-09-20 13:49:33 +01:00 |
|
Antonin Portelli
|
0724f7af75
|
QPX single precision implementation
|
2016-09-19 18:09:12 +01:00 |
|
|
2e74520821
|
removed libtool use (BG/Q compatibility)
|
2016-09-16 15:25:49 +01:00 |
|
Antonin Portelli
|
6dd75ad9e5
|
Merge branch 'develop' of github.com:paboyle/Grid into feature/bgq
|
2016-09-16 15:07:54 +01:00 |
|
Guido Cossu
|
fda408ee6f
|
Added first lines for supporting Two Index representations
|
2016-09-13 10:43:30 +01:00 |
|
Guido Cossu
|
b9c80318a2
|
Merge branch 'develop' into feature/hirep
|
2016-09-13 10:01:51 +01:00 |
|
Guido Cossu
|
5df5d52d41
|
Fix for the Intel compiler
|
2016-09-12 17:17:20 +01:00 |
|
Guido Cossu
|
f76f281e58
|
Cleaning files after fix
|
2016-09-09 11:34:25 +01:00 |
|
Guido Cossu
|
aa20cc8b52
|
Fixing compilation error with AVX512 flag
|
2016-09-09 02:58:52 -07:00 |
|
Guido Cossu
|
0fd179fb33
|
Merge branch 'develop' into feature/hirep
|
2016-09-01 12:59:53 +01:00 |
|
Guido Cossu
|
f45ef8d114
|
Minor modification in ActionBase.h
|
2016-09-01 11:46:46 +01:00 |
|
paboyle
|
8535d433a7
|
Cold or hot must support any precision
|
2016-08-31 00:27:53 +01:00 |
|
paboyle
|
b573d1f35a
|
Wilson tree level added
|
2016-08-31 00:27:04 +01:00 |
|
paboyle
|
0c1d7e4daf
|
Mom space prop for Wilson action
|
2016-08-31 00:26:36 +01:00 |
|
paboyle
|
02e983a0cd
|
Momentum space prop and free prop convolution
|
2016-08-31 00:26:02 +01:00 |
|
paboyle
|
d15ab66aae
|
FFT moves higher in include order
|
2016-08-31 00:25:22 +01:00 |
|
paboyle
|
9005b82c6d
|
Multi dim FFT, and normalisation fix
|
2016-08-31 00:24:52 +01:00 |
|
paboyle
|
3475f45ce7
|
Demangle support for typeid stuff
|
2016-08-31 00:23:48 +01:00 |
|
paboyle
|
0744f38866
|
Demangle support is useful
|
2016-08-31 00:23:28 +01:00 |
|
Guido Cossu
|
fd5614738d
|
Merge branch 'develop' into feature/hirep
|
2016-08-30 18:21:36 +01:00 |
|
Guido Cossu
|
b0d3e4bb2c
|
Separating travis builds
|
2016-08-30 13:44:07 +01:00 |
|
Guido Cossu
|
b512ccbee6
|
HMC for Adjoint fermions works
Accepts and reproduces known results
Check initial instability of inverters
when starting from hot configurations
|
2016-08-30 11:31:25 +01:00 |
|
paboyle
|
8c89391c02
|
FFTW unresolved fixed when no fftw3.h
|
2016-08-24 16:41:47 +01:00 |
|
paboyle
|
bfac5195b8
|
tidy up
|
2016-08-24 16:38:36 +01:00 |
|
paboyle
|
744691097f
|
Printing
|
2016-08-24 15:05:56 +01:00 |
|
paboyle
|
ff6da364e8
|
FFT double and single precision gives good performance now in multithreaded code.
|
2016-08-24 15:05:00 +01:00 |
|
|
4d11a6f5f2
|
first commit for QPX intrinsics
|
2016-08-23 14:41:44 +01:00 |
|
paboyle
|
88be3b39bb
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2016-08-22 18:29:36 +01:00 |
|
paboyle
|
356e7940fd
|
fftw can be switched off
|
2016-08-22 16:24:49 +01:00 |
|
paboyle
|
73ce476890
|
Include fftw headers
|
2016-08-22 16:24:21 +01:00 |
|
paboyle
|
e423a09974
|
FFT improved and test_FFT passing under MPI 8 processes, 8^4 for LatticeComplexD and LatticeSpinMatrixD
|
2016-08-18 02:23:21 +01:00 |
|
paboyle
|
17097a93ec
|
FFTW test ran over 4 mpi processes.
|
2016-08-17 01:33:55 +01:00 |
|