nmeyer-ur
|
71a7350a85
|
changed 2nd argument in Reduce to native vector type
|
2020-05-08 12:26:51 +02:00 |
|
nmeyer-ur
|
6f79369955
|
trying to get rid of macro definition error
|
2020-05-08 12:19:24 +02:00 |
|
nmeyer-ur
|
f9cb6b979f
|
corrected more typos
|
2020-05-08 12:11:01 +02:00 |
|
nmeyer-ur
|
ed4d9d17f8
|
corrected type
|
2020-05-08 12:09:22 +02:00 |
|
nmeyer-ur
|
fbed02690d
|
some changes in breaking out A64FX: use -DA64FXFIXEDSIZE for fixed size, but also define GEN
|
2020-05-08 12:05:31 +02:00 |
|
nmeyer-ur
|
39f3ae5b1d
|
corrected more types
|
2020-05-08 11:07:14 +02:00 |
|
nmeyer-ur
|
e64bec8c8e
|
pulled SVE typedefs out of Optimization
|
2020-05-08 11:04:21 +02:00 |
|
nmeyer-ur
|
0893b4e552
|
fixed typos in PrecisionChange
|
2020-05-08 10:59:07 +02:00 |
|
nmeyer-ur
|
92f0f29670
|
fixed double overloading vecf in Div, corrected typos
|
2020-05-08 10:57:23 +02:00 |
|
nmeyer-ur
|
48a340a9d1
|
GEN seems to be defined by default -> some fixes applied
|
2020-05-08 10:47:49 +02:00 |
|
nmeyer-ur
|
f45621109b
|
placed typedefs in Optimization
|
2020-05-08 10:41:52 +02:00 |
|
nmeyer-ur
|
32d1a0bbea
|
added even more debug output
|
2020-05-08 10:39:26 +02:00 |
|
nmeyer-ur
|
267cce66a1
|
added more debug output
|
2020-05-08 10:29:28 +02:00 |
|
nmeyer-ur
|
3417147b11
|
added real fma, corrected typos in tbls; integrated, must supply A64FXGCC with GEN in configure
|
2020-05-08 10:20:19 +02:00 |
|
nmeyer-ur
|
b338719bc8
|
first transition to fixed-size done, excl. Exch; next step: integration
|
2020-05-07 22:33:28 +02:00 |
|
nmeyer-ur
|
2b81cbe2c2
|
first attempt to introduce tables using fixed-size; still incomplete
|
2020-05-07 22:01:19 +02:00 |
|
nmeyer-ur
|
acff9d6ed2
|
transition to fixed size data types almost done; still incomplete
|
2020-05-07 21:24:07 +02:00 |
|
nmeyer-ur
|
a306a49788
|
first mods for fixed size; still incomplete
|
2020-05-07 19:07:49 +02:00 |
|
nmeyer-ur
|
5abec5b8a9
|
SVE_readme update, update Grid_vector_types.h
|
2020-04-25 13:48:26 +02:00 |
|
nils meyer
|
6db68d6ecb
|
added SVE configure for armclang and gcc
|
2020-04-24 10:10:47 +02:00 |
|
nmeyer-ur
|
39b448affb
|
Merge remote-tracking branch 'origin/develop' into feature/a64fx-2
|
2020-04-22 17:34:12 +02:00 |
|
nils meyer
|
e54a8f05a9
|
Exchange1 with generic version for now, should use svtbl2 in final version
|
2020-04-20 22:45:27 +02:00 |
|
nils meyer
|
64b72fc17f
|
testing gcc 10.0.1: build errors in Exchange1 using -DA64FX and in Lattice_base.h building Dslash only
|
2020-04-19 01:25:40 +02:00 |
|
nils meyer
|
6fdce60492
|
revised BodyA64FX; 990 GiB/s Wilson, 687 GiB/s DW using intrinsics (armclang 20.0)
|
2020-04-16 22:43:32 +02:00 |
|
nils meyer
|
852db4626a
|
re-introduced HOTFIX because Grid binaries give wrong results otherwise; checked in good gridverter.py
|
2020-04-15 18:22:19 +02:00 |
|
nils meyer
|
6504a098cc
|
999 GiB/s Wilson; 694 GiB/s DW (DP)
|
2020-04-15 15:06:52 +02:00 |
|
nils meyer
|
79a385faca
|
disabled armclang hotfix because armclang 20.0 performance gets a little
|
2020-04-15 11:46:55 +02:00 |
|
nils meyer
|
c12a67030a
|
980 GiB/s Wilson; 680 GiB/s DW (DP)
|
2020-04-15 10:55:06 +02:00 |
|
nils meyer
|
581392f2f2
|
now with pf, best results so far using intrinsics+pf
|
2020-04-12 22:06:14 +02:00 |
|
nils meyer
|
113f277b6a
|
enable dslash asm using -DA64FXASM, additionally -DDSLASHINTRIN for intrinsics impl
|
2020-04-11 04:55:01 +02:00 |
|
nils meyer
|
974586bedc
|
Dslash finally works; cleaned up; uses MOVPRFX in assembly
|
2020-04-10 22:26:40 +02:00 |
|
nils meyer
|
5cdbb7e71e
|
fixed A64FX Dslash; compiles, but does not specialize -> assertion
|
2020-04-09 21:23:39 +02:00 |
|
nmeyer-ur
|
8123590a1b
|
changes
|
2020-04-09 16:45:47 +02:00 |
|
nmeyer-ur
|
cd1efee866
|
changes
|
2020-04-09 16:35:13 +02:00 |
|
nmeyer-ur
|
bd310932f7
|
changes
|
2020-04-09 16:32:31 +02:00 |
|
nmeyer-ur
|
e252c1aca3
|
addressing
|
2020-04-09 15:03:12 +02:00 |
|
nmeyer-ur
|
b140c6a4f9
|
addressing
|
2020-04-09 15:01:15 +02:00 |
|
nmeyer-ur
|
326de36467
|
revised sU addressing scheme
|
2020-04-09 14:44:25 +02:00 |
|
nmeyer-ur
|
9f224a1647
|
fixed typo in single
|
2020-04-09 14:30:21 +02:00 |
|
nmeyer-ur
|
bb46ba9b5f
|
fixed array size in single
|
2020-04-09 14:28:45 +02:00 |
|
nmeyer-ur
|
dd5a22b36b
|
revised declarations
|
2020-04-09 14:21:27 +02:00 |
|
nmeyer-ur
|
1ea85b9972
|
Disabled build message
|
2020-04-09 13:47:21 +02:00 |
|
nmeyer-ur
|
8fb63f1c25
|
added A64FX Wilson kernels single precision
|
2020-04-09 13:41:04 +02:00 |
|
nmeyer-ur
|
77fa586f6c
|
introduced A64FX Wilson kernels
|
2020-04-09 13:30:06 +02:00 |
|
nmeyer-ur
|
15238e8d5e
|
reduce acle works, clean up
|
2020-04-03 20:40:44 +02:00 |
|
nmeyer-ur
|
b27e31957a
|
reduce acle revised
|
2020-04-03 19:46:15 +02:00 |
|
nmeyer-ur
|
46927771e3
|
reduce acle still needs overhaul
|
2020-04-03 19:30:48 +02:00 |
|
nmeyer-ur
|
d8cea77707
|
define simd width in header
|
2020-04-03 19:22:25 +02:00 |
|
nmeyer-ur
|
5f8a76d490
|
clean up, reduction in acle
|
2020-04-03 19:18:24 +02:00 |
|
nmeyer-ur
|
28d49a3b60
|
build problem resolved
|
2020-04-03 16:52:48 +02:00 |
|