1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-06-12 20:27:06 +01:00
Commit Graph

6235 Commits

Author SHA1 Message Date
82f71643a4 Remove the norm in MdagM 2020-05-12 17:55:53 -04:00
d15ccad8a7 switched to vec* in Reduce 2020-05-12 20:41:14 +02:00
0009b5cee8 updated SVE_README 2020-05-12 19:02:33 +02:00
20d1941a45 enabled asm kernels for fixed-size A64FXFIXEDSIZE 2020-05-12 19:01:12 +02:00
d24d8e8398 Use X-direction as more bits meaningful on CUDA.
2^31-1 shoulddd always bee enough for SIMD and thread reduced local volume

e.g. 32*2^31 = 2^36 = (2^9)^4 or 512^4 ias big enough.

Where 32 is gpu_threads * Nsimd = 8*4
2020-05-12 10:35:49 -04:00
162e4bb567 no automatic prefetching for now 2020-05-12 07:01:23 -04:00
07c0c02f8c Speed up Cshift 2020-05-11 17:02:01 -04:00
8c31c065b5 Keep the Vector fixed to protect it from realloc 2020-05-11 17:00:30 -04:00
b7c76ede29 Removed some assertions in Test_simd and removed exit() in Reduce 2020-05-11 22:43:00 +02:00
05edf803bd corrected typo 2020-05-12 03:59:59 +09:00
b1c86900b2 Merge pull request #4 from paboyle/develop
merge
2020-05-11 20:59:29 +02:00
78b8e40f83 switched to gcc's internal data types 2020-05-11 18:11:23 +02:00
fc2e9850d3 temporarily enable TOFU by default when using A64FX or A64FXFIXEDSIZE 2020-05-11 13:25:02 +02:00
ffaaed679e MPI_THREAD_SINGLE hack for Fugaku, enabled by -DTOFU 2020-05-11 13:21:39 +02:00
bbbee5660d First compiile on HiP 2020-05-10 05:28:09 -04:00
ea08f193e7 Allocator cache spliit into large/small pools 2020-05-10 05:24:26 -04:00
2bb2c68e15 Separate pools for small and large allocations cache 2020-05-09 22:57:21 -04:00
efe5bc6a3c Split allocator cache into two pools of different sizes 2020-05-09 22:27:56 -04:00
b2fd8b993a fixed-size clean up 2020-05-09 22:53:42 +02:00
291ee8c3d0 updated fixed-size implementation; only Exch1 and prefetches missing 2020-05-09 22:18:02 +02:00
e1a5b3ea49 unions for tables eliminate explicit loads, gcc does not complain 2020-05-09 21:21:57 +02:00
55a55660cb reverted changes 2020-05-09 12:48:42 +02:00
384da487bd Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2020-05-08 18:55:11 -04:00
ee1de82a53 Working ITT benchmark again 2020-05-08 18:54:50 -04:00
2b576fc185 Comment deadd codde remove 2020-05-08 18:54:29 -04:00
52081acfa5 NVCC compile fixes 2020-05-08 13:14:12 -04:00
b01b7f761a Merge pull request #283 from DanielRichtmann/feature/minor-fixes
Some small fixes
2020-05-08 10:52:03 -04:00
c83471bfd0 Fix missing checkerboards for adj und conjugate 2020-05-08 16:44:03 +02:00
ab0c5d77fb Correct NonHermitianSchurOperatorBase 2020-05-08 16:44:02 +02:00
779e3c7442 Const-correctness for retrieval routines of GridStopWatch 2020-05-08 16:43:52 +02:00
0c570824f2 Add missing declaration of GridCmdOptionInt 2020-05-08 16:43:51 +02:00
f8b8e00090 Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
Aim to reduce the amount of cuda and other code variations floating around all over the place.

Will move GpuInit iinto Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
0dd1bdfa94 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2020-05-08 09:21:43 -04:00
1d65e2f62c Slightly faster Chebyshev; ifdef'ed out the fastest until tested numerics
Lifteed from HDCR setup
2020-05-08 09:20:54 -04:00
93920c4811 Remove verbose 2020-05-08 09:19:54 -04:00
6859a3e1d4 Schur operator 2020-05-08 09:19:12 -04:00
21ca182c36 Comments remove 2020-05-08 09:18:24 -04:00
ceb8b374da API change v3 2020-05-08 15:04:44 +02:00
4bc2ad2894 API change v2 2020-05-08 15:00:25 +02:00
798af3e68f retry changing StoD API 2020-05-08 14:34:59 +02:00
b0ef2367f3 testing alternate call to PrecisionChange 2020-05-08 14:22:44 +02:00
71a7350a85 changed 2nd argument in Reduce to native vector type 2020-05-08 12:26:51 +02:00
6f79369955 trying to get rid of macro definition error 2020-05-08 12:19:24 +02:00
f9cb6b979f corrected more typos 2020-05-08 12:11:01 +02:00
ed4d9d17f8 corrected type 2020-05-08 12:09:22 +02:00
fbed02690d some changes in breaking out A64FX: use -DA64FXFIXEDSIZE for fixed size, but also define GEN 2020-05-08 12:05:31 +02:00
39f3ae5b1d corrected more types 2020-05-08 11:07:14 +02:00
e64bec8c8e pulled SVE typedefs out of Optimization 2020-05-08 11:04:21 +02:00
0893b4e552 fixed typos in PrecisionChange 2020-05-08 10:59:07 +02:00
92f0f29670 fixed double overloading vecf in Div, corrected typos 2020-05-08 10:57:23 +02:00