nmeyer-ur
|
39b448affb
|
Merge remote-tracking branch 'origin/develop' into feature/a64fx-2
|
2020-04-22 17:34:12 +02:00 |
|
nils meyer
|
e54a8f05a9
|
Exchange1 with generic version for now, should use svtbl2 in final version
|
2020-04-20 22:45:27 +02:00 |
|
Peter Boyle
|
0782b76ed4
|
Merge pull request #274 from paboyle/feature/zmobius_paramcompute
ZMobius parameter computation
|
2020-04-20 14:39:29 -04:00 |
|
Christopher Kelly
|
0896f2cead
|
Added missing include guards in bigfloat_double.h
|
2020-04-20 10:30:38 -04:00 |
|
Christopher Kelly
|
181709bba4
|
Merge branch 'develop' into feature/zmobius_paramcompute
|
2020-04-20 09:12:34 -04:00 |
|
nils meyer
|
64b72fc17f
|
testing gcc 10.0.1: build errors in Exchange1 using -DA64FX and in Lattice_base.h building Dslash only
|
2020-04-19 01:25:40 +02:00 |
|
nils meyer
|
6fdce60492
|
revised BodyA64FX; 990 GiB/s Wilson, 687 GiB/s DW using intrinsics (armclang 20.0)
|
2020-04-16 22:43:32 +02:00 |
|
Peter Boyle
|
90229cfb0f
|
Merge pull request #270 from milc-qcd/feature/CGinfo
feature/CGinfo
|
2020-04-16 11:46:08 -04:00 |
|
Peter Boyle
|
0475c46ecb
|
Merge pull request #256 from djm2131/feature/BiCGSTAB
Import BiCGSTAB solvers and tests
|
2020-04-16 11:45:15 -04:00 |
|
Peter Boyle
|
3cca10e617
|
Merge pull request #276 from nils-asmussen/fix/regression_nt
fix regression in tests/core/Test_qed.cc
|
2020-04-16 11:42:39 -04:00 |
|
nils meyer
|
852db4626a
|
re-introduced HOTFIX because Grid binaries give wrong results otherwise; checked in good gridverter.py
|
2020-04-15 18:22:19 +02:00 |
|
|
43dc2814dd
|
fix regression in core/Test_qed.cc
|
2020-04-15 16:10:15 +01:00 |
|
nils meyer
|
6504a098cc
|
999 GiB/s Wilson; 694 GiB/s DW (DP)
|
2020-04-15 15:06:52 +02:00 |
|
nils meyer
|
79a385faca
|
disabled armclang hotfix because armclang 20.0 performance gets a little
|
2020-04-15 11:46:55 +02:00 |
|
nils meyer
|
c12a67030a
|
980 GiB/s Wilson; 680 GiB/s DW (DP)
|
2020-04-15 10:55:06 +02:00 |
|
nils meyer
|
581392f2f2
|
now with pf, best results so far using intrinsics+pf
|
2020-04-12 22:06:14 +02:00 |
|
nils meyer
|
113f277b6a
|
enable dslash asm using -DA64FXASM, additionally -DDSLASHINTRIN for intrinsics impl
|
2020-04-11 04:55:01 +02:00 |
|
Peter Boyle
|
f3a8d039a2
|
Merge branch 'feature/hdcr' into develop
|
2020-04-10 22:01:52 -04:00 |
|
nils meyer
|
974586bedc
|
Dslash finally works; cleaned up; uses MOVPRFX in assembly
|
2020-04-10 22:26:40 +02:00 |
|
|
4e864e56c9
|
develop pull
|
2020-04-10 17:19:18 +01:00 |
|
Peter Boyle
|
014dbfa464
|
Compile fix with OpDirAll
|
2020-04-10 11:57:09 -04:00 |
|
Peter Boyle
|
3b0e07882f
|
Adding another form of polynomial
|
2020-04-10 11:28:33 -04:00 |
|
Peter Boyle
|
8e81a811d0
|
Merge branch 'feature/hdcr' into develop
|
2020-04-10 11:14:49 -04:00 |
|
Peter Boyle
|
aa13118127
|
Missing conjugate already fixed in develop
|
2020-04-10 11:11:24 -04:00 |
|
Peter Boyle
|
6cdb09c884
|
Faster copy region
|
2020-04-10 11:10:52 -04:00 |
|
Peter Boyle
|
a65bc64f10
|
Accelerator peek poke
|
2020-04-10 11:09:59 -04:00 |
|
Peter Boyle
|
11dec4883c
|
Don't throw assert
|
2020-04-10 11:09:11 -04:00 |
|
Peter Boyle
|
afa458c812
|
Extra solvers
|
2020-04-10 11:08:19 -04:00 |
|
Peter Boyle
|
dc50190b8f
|
Faster GPU basis rotation
May need to later include Regensburg optimised CPU variant
|
2020-04-10 11:06:04 -04:00 |
|
nmeyer-ur
|
160f78c1e4
|
changed debug output to variable direct 3
|
2020-04-10 12:23:07 +02:00 |
|
nmeyer-ur
|
7e4e1bbbc2
|
changed debug output to variable direct 2
|
2020-04-10 12:22:04 +02:00 |
|
nmeyer-ur
|
e699b7e9f9
|
changed debug output to variable direct
|
2020-04-10 12:18:30 +02:00 |
|
nmeyer-ur
|
a28bc0de90
|
debug register address test in WilsonHand
|
2020-04-10 12:07:45 +02:00 |
|
nmeyer-ur
|
14d0fe4d6c
|
added predication in WilsonHand
|
2020-04-10 12:04:00 +02:00 |
|
nmeyer-ur
|
0ad2e0815c
|
debug output in WilsonHand
|
2020-04-10 11:56:29 +02:00 |
|
nils meyer
|
1c8ca05e16
|
Merge branch 'feature/a64fx-2' of https://github.com/nmeyer-ur/Grid into feature/a64fx-2
|
2020-04-09 23:32:19 +02:00 |
|
nils meyer
|
dc9c8340bb
|
switched to DSLASHINTRIN for A64FX Dslash intrinsics
|
2020-04-09 23:30:23 +02:00 |
|
nils meyer
|
19eef97503
|
specialized A64FX Dslash kernels
|
2020-04-09 23:25:25 +02:00 |
|
nmeyer-ur
|
635246ce50
|
corrected typo
|
2020-04-09 21:42:50 +02:00 |
|
nils meyer
|
5cdbb7e71e
|
fixed A64FX Dslash; compiles, but does not specialize -> assertion
|
2020-04-09 21:23:39 +02:00 |
|
nmeyer-ur
|
8123590a1b
|
changes
|
2020-04-09 16:45:47 +02:00 |
|
nmeyer-ur
|
86c9c4da8b
|
changes
|
2020-04-09 16:40:06 +02:00 |
|
nmeyer-ur
|
cd1efee866
|
changes
|
2020-04-09 16:35:13 +02:00 |
|
nmeyer-ur
|
bd310932f7
|
changes
|
2020-04-09 16:32:31 +02:00 |
|
nmeyer-ur
|
304762e7ac
|
changes
|
2020-04-09 16:26:01 +02:00 |
|
nmeyer-ur
|
d79ab03a6c
|
changes
|
2020-04-09 16:19:25 +02:00 |
|
nmeyer-ur
|
d5708e0eb2
|
more changes
|
2020-04-09 15:43:34 +02:00 |
|
nmeyer-ur
|
123f6b7a61
|
more changes
|
2020-04-09 15:17:19 +02:00 |
|
nmeyer-ur
|
2b6457dd9a
|
added xp/xm recon accum
|
2020-04-09 15:13:19 +02:00 |
|
nmeyer-ur
|
b367cbd422
|
defined ADD_RESULT
|
2020-04-09 15:08:45 +02:00 |
|