nils meyer
|
6fdce60492
|
revised BodyA64FX; 990 GiB/s Wilson, 687 GiB/s DW using intrinsics (armclang 20.0)
|
2020-04-16 22:43:32 +02:00 |
|
nils meyer
|
852db4626a
|
re-introduced HOTFIX because Grid binaries give wrong results otherwise; checked in good gridverter.py
|
2020-04-15 18:22:19 +02:00 |
|
nils meyer
|
6504a098cc
|
999 GiB/s Wilson; 694 GiB/s DW (DP)
|
2020-04-15 15:06:52 +02:00 |
|
nils meyer
|
79a385faca
|
disabled armclang hotfix because armclang 20.0 performance gets a little
|
2020-04-15 11:46:55 +02:00 |
|
nils meyer
|
c12a67030a
|
980 GiB/s Wilson; 680 GiB/s DW (DP)
|
2020-04-15 10:55:06 +02:00 |
|
nils meyer
|
581392f2f2
|
now with pf, best results so far using intrinsics+pf
|
2020-04-12 22:06:14 +02:00 |
|
nils meyer
|
113f277b6a
|
enable dslash asm using -DA64FXASM, additionally -DDSLASHINTRIN for intrinsics impl
|
2020-04-11 04:55:01 +02:00 |
|
nils meyer
|
974586bedc
|
Dslash finally works; cleaned up; uses MOVPRFX in assembly
|
2020-04-10 22:26:40 +02:00 |
|
nils meyer
|
5cdbb7e71e
|
fixed A64FX Dslash; compiles, but does not specialize -> assertion
|
2020-04-09 21:23:39 +02:00 |
|
nmeyer-ur
|
8123590a1b
|
changes
|
2020-04-09 16:45:47 +02:00 |
|
nmeyer-ur
|
cd1efee866
|
changes
|
2020-04-09 16:35:13 +02:00 |
|
nmeyer-ur
|
bd310932f7
|
changes
|
2020-04-09 16:32:31 +02:00 |
|
nmeyer-ur
|
e252c1aca3
|
addressing
|
2020-04-09 15:03:12 +02:00 |
|
nmeyer-ur
|
b140c6a4f9
|
addressing
|
2020-04-09 15:01:15 +02:00 |
|
nmeyer-ur
|
326de36467
|
revised sU addressing scheme
|
2020-04-09 14:44:25 +02:00 |
|
nmeyer-ur
|
9f224a1647
|
fixed typo in single
|
2020-04-09 14:30:21 +02:00 |
|
nmeyer-ur
|
bb46ba9b5f
|
fixed array size in single
|
2020-04-09 14:28:45 +02:00 |
|
nmeyer-ur
|
dd5a22b36b
|
revised declarations
|
2020-04-09 14:21:27 +02:00 |
|
nmeyer-ur
|
1ea85b9972
|
Disabled build message
|
2020-04-09 13:47:21 +02:00 |
|
nmeyer-ur
|
8fb63f1c25
|
added A64FX Wilson kernels single precision
|
2020-04-09 13:41:04 +02:00 |
|
nmeyer-ur
|
77fa586f6c
|
introduced A64FX Wilson kernels
|
2020-04-09 13:30:06 +02:00 |
|
nmeyer-ur
|
15238e8d5e
|
reduce acle works, clean up
|
2020-04-03 20:40:44 +02:00 |
|
nmeyer-ur
|
b27e31957a
|
reduce acle revised
|
2020-04-03 19:46:15 +02:00 |
|
nmeyer-ur
|
46927771e3
|
reduce acle still needs overhaul
|
2020-04-03 19:30:48 +02:00 |
|
nmeyer-ur
|
d8cea77707
|
define simd width in header
|
2020-04-03 19:22:25 +02:00 |
|
nmeyer-ur
|
5f8a76d490
|
clean up, reduction in acle
|
2020-04-03 19:18:24 +02:00 |
|
nmeyer-ur
|
28d49a3b60
|
build problem resolved
|
2020-04-03 16:52:48 +02:00 |
|
nmeyer-ur
|
b4c624ece6
|
added A64FX support
|
2020-04-03 15:43:23 +02:00 |
|
Peter Boyle
|
55cdb17691
|
Integer divide for blocking
|
2020-01-27 12:27:45 -05:00 |
|
Peter Boyle
|
34108296cd
|
Merge branch 'develop' into feature/gpu-port
Conflicts:
Grid/simd/Grid_avx512.h
|
2019-07-20 17:05:35 +01:00 |
|
Peter Boyle
|
76c704b84b
|
Intrinsics for CLANG are now fixed in v6
|
2019-07-20 16:52:24 +01:00 |
|
Peter Boyle
|
a23dc295ac
|
Remove compiler errors and warnings
|
2019-07-18 14:47:02 +01:00 |
|
Peter Boyle
|
fa9cd50c5b
|
Merge branch 'develop' into feature/gpu-port
|
2019-07-16 11:55:17 +01:00 |
|
Peter Boyle
|
703dc20377
|
Compile tests fix
|
2019-06-16 13:59:29 +01:00 |
|
Peter Boyle
|
c7dbf4c87e
|
Scalar support for GPU threads
|
2019-06-15 08:25:43 +01:00 |
|
gfilaci
|
8b6541fb60
|
Fix gpu MultRealPart and MaddRealPart bug
|
2019-05-02 10:58:17 +01:00 |
|
|
91cffef883
|
Updates after review with Peter.
|
2019-03-07 14:30:35 +00:00 |
|
|
b7db99967a
|
Recommendations for Traits classes
|
2019-02-28 20:06:59 +00:00 |
|
Peter Boyle
|
e73b909a48
|
Make tests run past nvcc. Different NVCC versions are proving tricky to keep happy. This is 9.2
|
2019-01-02 12:05:30 +00:00 |
|
Peter Boyle
|
9d866d062a
|
GPU support improvements
|
2019-01-01 15:05:03 +00:00 |
|
Peter Boyle
|
b57a4d32aa
|
Merge branch 'develop' into feature/gpu-port
|
2018-12-13 05:11:34 +00:00 |
|
|
fb7d021b9d
|
Hadrons: moving Hadrons to root directory, build system improvements
|
2018-08-28 15:00:40 +01:00 |
|