18ce23aa75
Fix NEON SIMD
2023-04-06 11:30:48 +01:00
Peter Boyle
e74666a09c
Double length vector type for fast precision change
2022-11-15 16:34:21 -05:00
Peter Boyle
5fe480d81c
Generic patch
2022-11-15 16:21:45 -05:00
Peter Boyle
8a07b52009
Dirichlet
2022-10-13 18:44:47 -04:00
Peter Boyle
204c283e16
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
2022-10-11 14:59:07 -04:00
Peter Boyle
551a5f8dc8
RRII gpu option
2022-10-11 14:44:55 -04:00
Peter Boyle
c82b164f6b
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
2022-10-04 17:41:48 -04:00
Peter Boyle
97448a93dc
Double2 compiles and dslash runs
2022-09-27 10:55:25 -04:00
Christopher Kelly
19da647e3c
Added support for non-periodic gauge field implementations in the random gauge shift performed at the start of the HMC trajectory
...
(The above required exposing the gauge implementation to the HMC class through the Integrator class)
Made the random shift optional (default on) through a parameter in HMCparameters
Modified ConjugateBC::CshiftLink such that it supports any shift in -L < shift < L rather than just +-1
Added a tester for the BC-respecting Cshift
Fixed a missing system header include in SSE4 intrinsics wrapper
Fixed sumD_cpu for single-prec types, which performed an incorrect conversion to a single-prec data type at the end that fails to compile on some systems
2022-09-09 12:47:09 -04:00
Michael Marshall
f252d69eef
Merge branch 'develop' into bugfix/LatTransfer
...
* develop:
Pass serial RNG around
Sycl happier
2021-03-04 20:41:30 +00:00
u61464
679d1d22f7
Sycl happier
2021-03-03 11:21:43 -08:00
Michael Marshall
1059a81a3c
Merge branch 'develop' into bugfix/LatTransfer
...
* develop:
Better SIMD usage/coalescence
2021-02-27 00:21:36 +00:00
Peter Boyle
f9b1f240f6
Better SIMD usage/coalescence
2021-02-26 17:51:41 +01:00
Michael Marshall
3215d88a91
Simplify syntax with Grid::EnableIf post code review. Updated EnableIf so that ReturnType defaults to void in the same way as std::enable_if; see https://en.cppreference.com/w/cpp/types/enable_if
2021-02-03 15:17:03 +00:00
Nils Meyer
6013183361
removed Asm impls
2020-12-19 03:25:01 +01:00
Nils Meyer
4b882e8056
fixed lost bracket
2020-12-19 03:09:20 +01:00
Nils Meyer
3f9ae6e7e7
Merge branch 'develop' into feature/a64fx-3
2020-12-19 02:37:11 +01:00
Nils Meyer
909acd55cd
vnum variant for prefetches
2020-12-19 02:00:22 +01:00
Nils Meyer
4dd9e39e0d
up to +36% performance gain for dslash/dwf on QPACE 4 using GCC 10.1.1
2020-12-19 00:54:31 +01:00
Peter Boyle
9aec4a3c26
SYCL
2020-12-10 02:11:17 -08:00
Peter Boyle
cc9c993f74
Project on group fix on GPU tracked to reciprocal sqrt collision between CUDA and Grid rsqrt
2020-10-31 18:12:47 -04:00
Peter Boyle
ecd3f890f5
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-09-16 02:30:14 +01:00
Peter Boyle
1c881ce23c
HIP does not like half2 visible members x and y so must define own Half2
2020-09-16 02:28:33 +01:00
Peter Boyle
e14a84317d
GPU math unary calls
2020-08-31 23:50:49 -04:00
nmeyer-ur
d9474c6cb6
compiler-independent build using --enable-simd=A64FX
2020-07-09 10:07:02 +02:00
nmeyer-ur
bbd145382b
enable --enable-simd=A64FX in configure
2020-07-08 12:43:51 +02:00
nmeyer-ur
8726e94ea7
merge upstream develop
2020-07-07 20:26:47 +02:00
nmeyer-ur
a25e4b3d0c
pred 32/64 for float/double instead of 8 in VLA patch
2020-06-13 14:44:37 +02:00
nmeyer-ur
d1210ca12a
switch to double/float instead of float64_t/float32_t in VLA patch
2020-06-13 13:59:32 +02:00
nmeyer-ur
36ea0e222a
type traits for ComplexF/D in VLA patch; cosmetics in VLS intrinsics
2020-06-13 13:42:35 +02:00
nmeyer-ur
92281ec22d
add 3 op Mult for VLA
2020-06-12 18:49:05 +02:00
nmeyer-ur
87266ce099
comment out fcmla in vector types: need also MultAddReal
2020-06-12 18:37:19 +02:00
nmeyer-ur
2a23f133e8
reenable fcmla for VLA
2020-06-12 17:30:38 +02:00
nmeyer-ur
8dbf790f62
correct tbl2 for sp
2020-06-12 17:12:34 +02:00
nmeyer-ur
2402b4940e
vec_imm in float
2020-06-12 15:17:38 +02:00
nmeyer-ur
2111052fbe
apply VLA patch for memcpy reduction suggested by Arm, CAS-162542-D6W7Z7
2020-06-12 14:49:19 +02:00
nmeyer-ur
433766ac62
revert Add/SubTimesI and prefetching in stencil
...
This reverts commit 9b2699226c7a3ca8d45f843f4f8e4658fa082163.
2020-06-08 12:02:53 +02:00
nmeyer-ur
9872c76825
introduce AddTimesI and SubTimesI; slight benefit in operators, but < 1%; breaks all other impls
2020-06-03 15:20:13 +02:00
nmeyer-ur
5ee3ea2144
round-up after testing of prefetches in stencil close
2020-06-03 11:58:20 +02:00
nmeyer-ur
5050833b42
revert changes due to performance penalty in Wilson using MPI
2020-06-02 13:08:57 +02:00
nmeyer-ur
7bee4ebb54
correct predication for svcadd
2020-06-02 10:51:39 +02:00
nmeyer-ur
71cf9851e7
correct type for vecd in TimesI and TimesMinusI
2020-06-02 10:44:15 +02:00
nmeyer-ur
b4735c9904
correct zero in svcadd
2020-06-02 10:38:05 +02:00
nmeyer-ur
9b2699226c
use fcadd in TimesI and TimesMinusI instead of tbl and neg
2020-06-02 10:32:44 +02:00
Peter Boyle
556da86ac3
HIP fp16
2020-05-24 13:41:58 -04:00
nmeyer-ur
6ddcef1bca
fix build error enabling fcmla/mac in vector types for VLA
2020-05-21 21:21:03 +02:00
nmeyer-ur
8c5a5fdfce
disable fcmla in vector type building for VLA
2020-05-21 19:41:42 +02:00
nmeyer-ur
046b1cbbc0
enable fcmla in tensor arithmetics; fixed-size works, VLA does not compile
2020-05-21 19:39:07 +02:00
nmeyer-ur
a65ce237c1
clean up; Exch1 VLA sp+dp integrate, tested, working
2020-05-21 09:48:06 +02:00
nmeyer-ur
cd27f1005d
clean up; Exch1 sp integrate, tested, working
2020-05-21 08:45:43 +02:00