nmeyer-ur
0009b5cee8
updated SVE_README
2020-05-12 19:02:33 +02:00
nmeyer-ur
20d1941a45
enabled asm kernels for fixed-size A64FXFIXEDSIZE
2020-05-12 19:01:12 +02:00
Peter Boyle
d24d8e8398
Use X-direction as more bits meaningful on CUDA.
...
2^31-1 should always be enough for SIMD and thread reduced local volume
e.g. 32*2^31 = 2^36 = (2^9)^4 or 512^4 is big enough.
Where 32 is gpu_threads * Nsimd = 8*4
2020-05-12 10:35:49 -04:00
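The arithmetic in the commit message above can be checked at compile time. The following is a minimal sketch, not code from the repository: the names `kGpuThreads`, `kNsimd`, `kIndexRange`, `coveredSites` and `pow4` are hypothetical, and it assumes the index runs over 0..2^31-1, i.e. 2^31 distinct values.

```cpp
#include <cstdint>

// Hypothetical constants for illustration; these are not Grid identifiers.
constexpr std::uint64_t kGpuThreads = 8;  // gpu_threads, as quoted in the commit message
constexpr std::uint64_t kNsimd      = 4;  // Nsimd, as quoted in the commit message

// Assumption: a signed 32-bit index covers 0 .. 2^31-1, i.e. 2^31 values.
constexpr std::uint64_t kIndexRange = std::uint64_t{1} << 31;

// Number of sites reachable when each index value stands for
// gpu_threads * Nsimd sites.
constexpr std::uint64_t coveredSites() {
  return kGpuThreads * kNsimd * kIndexRange;
}

// Volume of an L^4 lattice.
constexpr std::uint64_t pow4(std::uint64_t l) {
  return l * l * l * l;
}

// 32 * 2^31 = 2^36
static_assert(coveredSites() == (std::uint64_t{1} << 36), "32 * 2^31 == 2^36");
// 2^36 = (2^9)^4 = 512^4
static_assert(coveredSites() == pow4(512), "2^36 == 512^4");
```

If both assertions compile, the stated bound holds under these assumptions: a 512^4 local volume fits exactly.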
Christoph Lehner
162e4bb567
no automatic prefetching for now
2020-05-12 07:01:23 -04:00
Peter Boyle
07c0c02f8c
Speed up Cshift
2020-05-11 17:02:01 -04:00
Peter Boyle
8c31c065b5
Keep the Vector fixed to protect it from realloc
2020-05-11 17:00:30 -04:00
nmeyer-ur
b7c76ede29
Removed some assertions in Test_simd and removed exit() in Reduce
2020-05-11 22:43:00 +02:00
nmeyer-ur
05edf803bd
corrected typo
2020-05-12 03:59:59 +09:00
Christoph Lehner
b1c86900b2
Merge pull request #4 from paboyle/develop
...
merge
2020-05-11 20:59:29 +02:00
nmeyer-ur
78b8e40f83
switched to gcc's internal data types
2020-05-11 18:11:23 +02:00
nmeyer-ur
fc2e9850d3
temporarily enable TOFU by default when using A64FX or A64FXFIXEDSIZE
2020-05-11 13:25:02 +02:00
nmeyer-ur
ffaaed679e
MPI_THREAD_SINGLE hack for Fugaku, enabled by -DTOFU
2020-05-11 13:21:39 +02:00
Peter Boyle
bbbee5660d
First compile on HiP
2020-05-10 05:28:09 -04:00
Peter Boyle
ea08f193e7
Allocator cache split into large/small pools
2020-05-10 05:24:26 -04:00
Peter Boyle
2bb2c68e15
Separate pools for small and large allocations cache
2020-05-09 22:57:21 -04:00
Peter Boyle
efe5bc6a3c
Split allocator cache into two pools of different sizes
2020-05-09 22:27:56 -04:00
nmeyer-ur
b2fd8b993a
fixed-size clean up
2020-05-09 22:53:42 +02:00
nmeyer-ur
291ee8c3d0
updated fixed-size implementation; only Exch1 and prefetches missing
2020-05-09 22:18:02 +02:00
nmeyer-ur
e1a5b3ea49
unions for tables eliminate explicit loads, gcc does not complain
2020-05-09 21:21:57 +02:00
nmeyer-ur
55a55660cb
reverted changes
2020-05-09 12:48:42 +02:00
Peter Boyle
384da487bd
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-05-08 18:55:11 -04:00
Peter Boyle
ee1de82a53
Working ITT benchmark again
2020-05-08 18:54:50 -04:00
Peter Boyle
2b576fc185
Remove commented-out dead code
2020-05-08 18:54:29 -04:00
Peter Boyle
52081acfa5
NVCC compile fixes
2020-05-08 13:14:12 -04:00
Peter Boyle
b01b7f761a
Merge pull request #283 from DanielRichtmann/feature/minor-fixes
...
Some small fixes
2020-05-08 10:52:03 -04:00
Daniel Richtmann
c83471bfd0
Fix missing checkerboards for adj and conjugate
2020-05-08 16:44:03 +02:00
Daniel Richtmann
ab0c5d77fb
Correct NonHermitianSchurOperatorBase
2020-05-08 16:44:02 +02:00
Daniel Richtmann
779e3c7442
Const-correctness for retrieval routines of GridStopWatch
2020-05-08 16:43:52 +02:00
Daniel Richtmann
0c570824f2
Add missing declaration of GridCmdOptionInt
2020-05-08 16:43:51 +02:00
Peter Boyle
f8b8e00090
Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
...
Aim to reduce the amount of cuda and other code variations floating around all over the place.
Will move GpuInit into Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
Peter Boyle
0dd1bdfa94
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-05-08 09:21:43 -04:00
Peter Boyle
1d65e2f62c
Slightly faster Chebyshev; ifdef'ed out the fastest until tested numerics
...
Lifted from HDCR setup
2020-05-08 09:20:54 -04:00
Peter Boyle
93920c4811
Remove verbose
2020-05-08 09:19:54 -04:00
Peter Boyle
6859a3e1d4
Schur operator
2020-05-08 09:19:12 -04:00
Peter Boyle
21ca182c36
Comments remove
2020-05-08 09:18:24 -04:00
nmeyer-ur
ceb8b374da
API change v3
2020-05-08 15:04:44 +02:00
nmeyer-ur
4bc2ad2894
API change v2
2020-05-08 15:00:25 +02:00
nmeyer-ur
798af3e68f
retry changing StoD API
2020-05-08 14:34:59 +02:00
nmeyer-ur
b0ef2367f3
testing alternate call to PrecisionChange
2020-05-08 14:22:44 +02:00
nmeyer-ur
71a7350a85
changed 2nd argument in Reduce to native vector type
2020-05-08 12:26:51 +02:00
nmeyer-ur
6f79369955
trying to get rid of macro definition error
2020-05-08 12:19:24 +02:00
nmeyer-ur
f9cb6b979f
corrected more typos
2020-05-08 12:11:01 +02:00
nmeyer-ur
ed4d9d17f8
corrected type
2020-05-08 12:09:22 +02:00
nmeyer-ur
fbed02690d
some changes in breaking out A64FX: use -DA64FXFIXEDSIZE for fixed size, but also define GEN
2020-05-08 12:05:31 +02:00
nmeyer-ur
39f3ae5b1d
corrected more types
2020-05-08 11:07:14 +02:00
nmeyer-ur
e64bec8c8e
pulled SVE typedefs out of Optimization
2020-05-08 11:04:21 +02:00
nmeyer-ur
0893b4e552
fixed typos in PrecisionChange
2020-05-08 10:59:07 +02:00
nmeyer-ur
92f0f29670
fixed double overloading vecf in Div, corrected typos
2020-05-08 10:57:23 +02:00
nmeyer-ur
48a340a9d1
GEN seems to be defined by default -> some fixes applied
2020-05-08 10:47:49 +02:00
nmeyer-ur
f45621109b
placed typedefs in Optimization
2020-05-08 10:41:52 +02:00