nmeyer-ur
|
fd3c8b0e85
|
correct build instructions qp4
|
2020-07-01 09:00:38 +02:00 |
|
nmeyer-ur
|
1635c263ee
|
disable TOFU by default
|
2020-06-30 19:27:08 +02:00 |
|
nmeyer-ur
|
a87e45ba25
|
SVE readme update
|
2020-06-18 11:23:08 +02:00 |
|
nmeyer-ur
|
465856331a
|
switch back to serialized; wrong results on single too
|
2020-06-15 15:39:39 +02:00 |
|
nmeyer-ur
|
cc958aa9ed
|
switch back to standard MPI_init due to wrong results in Benchmark_wilson using comms-overlap
|
2020-06-15 14:21:38 +02:00 |
|
nmeyer-ur
|
a25e4b3d0c
|
pred 32/64 for float/double instead of 8 in VLA patch
|
2020-06-13 14:44:37 +02:00 |
|
nmeyer-ur
|
d1210ca12a
|
switch to double/float instead of float64_t/float32_t in VLA patch
|
2020-06-13 13:59:32 +02:00 |
|
nmeyer-ur
|
36ea0e222a
|
type traits for ComplexF/D in VLA patch; cosmetics in VLS intrinsics
|
2020-06-13 13:42:35 +02:00 |
|
nmeyer-ur
|
92281ec22d
|
add 3 op Mult for VLA
|
2020-06-12 18:49:05 +02:00 |
|
nmeyer-ur
|
87266ce099
|
comment out fcmla in vector types: need also MultAddReal
|
2020-06-12 18:37:19 +02:00 |
|
nmeyer-ur
|
2a23f133e8
|
reenable fcmla for VLA
|
2020-06-12 17:30:38 +02:00 |
|
nmeyer-ur
|
8dbf790f62
|
correct tbl2 for sp
|
2020-06-12 17:12:34 +02:00 |
|
nmeyer-ur
|
2402b4940e
|
vec_imm in float
|
2020-06-12 15:17:38 +02:00 |
|
nmeyer-ur
|
2111052fbe
|
apply VLA patch for memcpy reduction suggested by Arm, CAS-162542-D6W7Z7
|
2020-06-12 14:49:19 +02:00 |
|
nmeyer-ur
|
433766ac62
|
revert Add/SubTimesI and prefetching in stencil
This reverts commit 9b2699226c.
|
2020-06-08 12:02:53 +02:00 |
|
nmeyer-ur
|
93a37c8f68
|
test prefetch to L2 in stencil
|
2020-06-08 09:39:50 +02:00 |
|
nmeyer-ur
|
9872c76825
|
introduce AddTimesI and SubTimesI; slight benefit in operators, but < 1%; breaks all other impls
|
2020-06-03 15:20:13 +02:00 |
|
nmeyer-ur
|
5ee3ea2144
|
round-up after testing of prefetches in stencil close
|
2020-06-03 11:58:20 +02:00 |
|
nmeyer-ur
|
5050833b42
|
revert changes due to performance penalty in Wilson using MPI
|
2020-06-02 13:08:57 +02:00 |
|
nmeyer-ur
|
7bee4ebb54
|
correct predication for svcadd
|
2020-06-02 10:51:39 +02:00 |
|
nmeyer-ur
|
71cf9851e7
|
correct type for vecd in TimesI and TimesMinusI
|
2020-06-02 10:44:15 +02:00 |
|
nmeyer-ur
|
b4735c9904
|
correct zero in svcadd
|
2020-06-02 10:38:05 +02:00 |
|
nmeyer-ur
|
9b2699226c
|
use fcadd in TimesI and TimesMinusI instead of tbl and neg
|
2020-06-02 10:32:44 +02:00 |
|
nmeyer-ur
|
5f52804907
|
update calculation of data
|
2020-05-30 10:55:17 +02:00 |
|
nmeyer-ur
|
936071773e
|
correct throughput in wilson and dwf
|
2020-05-29 22:15:59 +02:00 |
|
nmeyer-ur
|
1732f9319e
|
more mods; counters seem to work correctly
|
2020-05-29 18:44:00 +02:00 |
|
nmeyer-ur
|
91c81cab30
|
some corrections; compiles on my laptop; untested
|
2020-05-29 18:19:22 +02:00 |
|
nmeyer-ur
|
38164f8480
|
include counters in WilsonFermionImplementation.h
|
2020-05-29 17:59:26 +02:00 |
|
nmeyer-ur
|
f013979791
|
add counter support in WilsonFermion.h
|
2020-05-29 17:13:59 +02:00 |
|
nmeyer-ur
|
e947b563ea
|
add space in stencil output
|
2020-05-29 17:11:17 +02:00 |
|
nmeyer-ur
|
5cb3530c34
|
enable counters in Benchmark_wilson
|
2020-05-29 15:44:52 +02:00 |
|
nmeyer-ur
|
250008372f
|
update SVE readme
|
2020-05-29 15:44:25 +02:00 |
|
nmeyer-ur
|
4fedd8d29f
|
switch to MPI_THREAD_SERIALIZED instead of SINGLE
|
2020-05-27 14:08:34 +02:00 |
|
nmeyer-ur
|
6ddcef1bca
|
fix build error enabling fcmla/mac in vector types for VLA
|
2020-05-21 21:21:03 +02:00 |
|
nmeyer-ur
|
8c5a5fdfce
|
disable fcmla in vector type building for VLA
|
2020-05-21 19:41:42 +02:00 |
|
nmeyer-ur
|
046b1cbbc0
|
enable fcmla in tensor arithmetics; fixed-size works, VLA does not compile
|
2020-05-21 19:39:07 +02:00 |
|
nmeyer-ur
|
a65ce237c1
|
clean up; Exch1 VLA sp+dp integrate, tested, working
|
2020-05-21 09:48:06 +02:00 |
|
nmeyer-ur
|
cd27f1005d
|
clean up; Exch1 sp integrate, tested, working
|
2020-05-21 08:45:43 +02:00 |
|
nmeyer-ur
|
f8c0a59221
|
clean up; Exch1 dp integrate, tested, working
|
2020-05-21 02:48:14 +02:00 |
|
nmeyer-ur
|
832485699f
|
save some cycles in HtoD and DtoH by direct instead of multi-pass conversion
|
2020-05-20 23:04:35 +02:00 |
|
nmeyer-ur
|
81484a4760
|
symmetrize Mult and MultAddComplex
|
2020-05-20 22:36:45 +02:00 |
|
nmeyer-ur
|
9a86059761
|
symmetrize VLA and fixed size build messages
|
2020-05-20 20:05:42 +02:00 |
|
nmeyer-ur
|
b780b7b7a0
|
guard prevents multiple TOFU messages
|
2020-05-20 19:20:59 +02:00 |
|
nmeyer-ur
|
9e085bd04e
|
guard prevents multiple A64FX build messages
|
2020-05-20 19:16:30 +02:00 |
|
nmeyer-ur
|
6b6bf537d3
|
comment out mac in vector types
|
2020-05-18 20:36:16 +02:00 |
|
nmeyer-ur
|
323a651c71
|
correct typo
|
2020-05-18 19:58:27 +02:00 |
|
nmeyer-ur
|
9f212679f1
|
support fcmla in vector_types, untested
|
2020-05-18 19:55:18 +02:00 |
|
nmeyer-ur
|
032f7dde1a
|
update SVE readme, asm generator
|
2020-05-18 19:10:36 +02:00 |
|
nmeyer-ur
|
50b1db1e8b
|
implemented correct _m form (using 3 operands instead of 2)
|
2020-05-15 10:01:05 +02:00 |
|
nmeyer-ur
|
015d8bb38a
|
introduced assertions in Benchmark_wilson, removed data output from Benchmark_dwf
|
2020-05-15 09:15:50 +02:00 |
|