mirror of https://github.com/paboyle/Grid.git synced 2026-04-28 22:46:00 +01:00

Commit Graph

  • a65bc64f10 Accelerator peek poke Peter Boyle 2020-04-10 11:09:59 -04:00
  • 11dec4883c Don't throw assert Peter Boyle 2020-04-10 11:09:11 -04:00
  • afa458c812 Extra solvers Peter Boyle 2020-04-10 11:08:19 -04:00
  • dc50190b8f Faster GPU basis rotation. May need to later include Regensburg optimised CPU variant Peter Boyle 2020-04-10 11:06:04 -04:00
  • 160f78c1e4 changed debug output to variable direct 3 nmeyer-ur 2020-04-10 12:23:07 +02:00
  • 7e4e1bbbc2 changed debug output to variable direct 2 nmeyer-ur 2020-04-10 12:22:04 +02:00
  • e699b7e9f9 changed debug output to variable direct nmeyer-ur 2020-04-10 12:18:30 +02:00
  • a28bc0de90 debug register address test in WilsonHand nmeyer-ur 2020-04-10 12:07:45 +02:00
  • 14d0fe4d6c added predication in WilsonHand nmeyer-ur 2020-04-10 12:04:00 +02:00
  • 0ad2e0815c debug output in WilsonHand nmeyer-ur 2020-04-10 11:56:29 +02:00
  • 1c8ca05e16 Merge branch 'feature/a64fx-2' of https://github.com/nmeyer-ur/Grid into feature/a64fx-2 nils meyer 2020-04-09 23:32:19 +02:00
  • dc9c8340bb switched to DSLASHINTRIN for A64FX Dslash intrinsics nils meyer 2020-04-09 23:30:23 +02:00
  • 19eef97503 specialized A64FX Dslash kernels nils meyer 2020-04-09 23:25:25 +02:00
  • 635246ce50 corrected typo nmeyer-ur 2020-04-09 21:42:50 +02:00
  • 5cdbb7e71e fixed A64FX Dslash; compiles, but does not specialize -> assertion nils meyer 2020-04-09 21:23:39 +02:00
  • 8123590a1b changes nmeyer-ur 2020-04-09 16:45:47 +02:00
  • 86c9c4da8b changes nmeyer-ur 2020-04-09 16:40:06 +02:00
  • cd1efee866 changes nmeyer-ur 2020-04-09 16:35:13 +02:00
  • bd310932f7 changes nmeyer-ur 2020-04-09 16:32:31 +02:00
  • 304762e7ac changes nmeyer-ur 2020-04-09 16:26:01 +02:00
  • d79ab03a6c changes nmeyer-ur 2020-04-09 16:19:25 +02:00
  • d5708e0eb2 more changes nmeyer-ur 2020-04-09 15:43:34 +02:00
  • 123f6b7a61 more changes nmeyer-ur 2020-04-09 15:17:19 +02:00
  • 2b6457dd9a added xp/xm recon accum nmeyer-ur 2020-04-09 15:13:19 +02:00
  • b367cbd422 defined ADD_RESULT nmeyer-ur 2020-04-09 15:08:45 +02:00
  • e252c1aca3 addressing nmeyer-ur 2020-04-09 15:03:12 +02:00
  • b140c6a4f9 addressing nmeyer-ur 2020-04-09 15:01:15 +02:00
  • 326de36467 revised sU addressing scheme nmeyer-ur 2020-04-09 14:44:25 +02:00
  • 9f224a1647 fixed typo in single nmeyer-ur 2020-04-09 14:30:21 +02:00
  • bb46ba9b5f fixed array size in single nmeyer-ur 2020-04-09 14:28:45 +02:00
  • dd5a22b36b revised declarations nmeyer-ur 2020-04-09 14:21:27 +02:00
  • 1ea85b9972 Disabled build message nmeyer-ur 2020-04-09 13:47:21 +02:00
  • 8fb63f1c25 added A64FX Wilson kernels single precision nmeyer-ur 2020-04-09 13:41:04 +02:00
  • 77fa586f6c introduced A64FX Wilson kernels nmeyer-ur 2020-04-09 13:30:06 +02:00
  • 96e8e44fd4 Merge pull request #2 from DanielRichtmann/feature/fused-innerproduct-norm2 Christoph Lehner 2020-04-06 13:16:58 +02:00
  • 5fc8a273e7 Fused innerProduct + norm2 on first argument operation Daniel Richtmann 2020-04-06 11:30:50 +02:00
  • d671a63e78 Update README.md portelli 2020-04-03 19:52:15 +01:00
  • 15238e8d5e reduce acle works, clean up nmeyer-ur 2020-04-03 20:40:44 +02:00
  • b27e31957a reduce acle revised nmeyer-ur 2020-04-03 19:46:15 +02:00
  • 46927771e3 reduce acle still needs overhaul nmeyer-ur 2020-04-03 19:30:48 +02:00
  • d8cea77707 define simd width in header nmeyer-ur 2020-04-03 19:22:25 +02:00
  • 5f8a76d490 clean up, reduction in acle nmeyer-ur 2020-04-03 19:18:24 +02:00
  • 28d49a3b60 build problem resolved nmeyer-ur 2020-04-03 16:52:48 +02:00
  • b4c624ece6 added A64FX support nmeyer-ur 2020-04-03 15:43:23 +02:00
  • 2c22db841a Added momentum scaling to scalar HMC theories in order to follow UKQCD/CPS conventions h.b.rocha 2020-04-02 17:38:47 +01:00
  • b89b1280d5 use gemm twice to complete the Gram Schmidt Yong-Chull Jang 2020-03-31 05:39:31 -04:00
  • ac7090e6d3 block Lanczos cublas buffer is set at the initial step; buffer width is fixed to the block size then cublas Zgemm is called multiple times Yong-Chull Jang 2020-03-30 22:25:50 -04:00
  • 02edbe624f first working version of Gram Schmidt using cublas gemm; explicit data type and site vector size has to be removed Yong-Chull Jang 2020-03-30 18:36:21 -04:00
  • 856d168e41 global sum over vectors of uint64_t Christoph Lehner 2020-03-29 07:56:05 -04:00
  • 6235c7ba98 IPP path fix in configure portelli 2020-03-27 17:23:29 +00:00
  • 7e13724882 removing Hadrons portelli 2020-03-27 12:03:32 +00:00
  • b6cbdd2aa3 Merge pull request #1 from DanielRichtmann/feature/read-openqcd Christoph Lehner 2020-03-26 17:39:04 +01:00
  • a2188ea875 remove debugging printf from WilsonKernelsImplementation Christoph Lehner 2020-03-26 09:12:36 -04:00
  • 9266b89ad8 fix rngs issue; block Lanczos is working Yong-Chull Jang 2020-03-25 15:45:50 -04:00
  • 989af65807 Check in parallel reader for openqcd configs Daniel Richtmann 2020-03-23 17:33:18 +01:00
  • 2db7e6f8ab merge manually Block Lanczos files from Chulwoo's update (last state = commit 731a05 + untracked files) to develop branch; namespace QCD is removed; FIXME: multiple starting vectors result in nan after initial orthogonalization Yong-Chull Jang 2020-03-24 01:03:24 -04:00
  • 60db3133d3 make trace,adj,transpose unary operators Christoph Lehner 2020-03-16 17:59:56 -04:00
  • c9b737a4e7 make trace,adj,transpose unary operators Christoph Lehner 2020-03-16 17:58:30 -04:00
  • 037bb6ea73 Check in reader for openqcd configs Daniel Richtmann 2020-03-16 14:07:52 +01:00
  • 05ebc458e2 Merge pull request #260 from mmphys/feature/distil portelli 2020-03-13 14:00:21 +00:00
  • 3753508957 Making change 1) as simple as possible 2) as much like MSink/Point.hpp as possible Michael Marshall 2020-03-12 13:47:51 +00:00
  • c1677fccf6 Merge branch 'develop' into feature/distil Michael Marshall 2020-03-12 12:45:18 +00:00
  • 35e8e31749 Merge pull request #272 from mmphys/feature/ZPeramb portelli 2020-03-12 12:28:04 +00:00
  • 34813e9b04 Merge branch 'develop' into feature/ZPeramb portelli 2020-03-12 12:27:56 +00:00
  • 373cf61abb bugfix ZPerambulator Felix Erben 2020-03-12 11:44:43 +00:00
  • 4e8fbc4b49 Merge pull request #271 from mmphys/feature/ZDistil portelli 2020-03-12 10:54:07 +00:00
  • 516ac1d4d5 registered module supporting ZMobius action ferben 2020-03-12 10:52:27 +00:00
  • 318f63eb34 Merge pull request #268 from mmphys/a2a-error-log portelli 2020-03-11 11:09:00 +00:00
  • 16503d7532 Merge pull request #267 from mmphys/feature/distil-bugfix portelli 2020-03-11 11:08:23 +00:00
  • 0fa93383b7 changed to push_back according to request ferben 2020-03-11 09:05:01 +00:00
  • 0a827aa7bf Added Hadrons_Error in case blockSize is set too large ferben 2020-03-11 08:52:52 +00:00
  • 165c68e28e Change TrueResiduals to TrueResidualShift and IterationsToComplete to IterationsToCompleteShift Carleton DeTar 2020-02-29 17:51:51 -06:00
  • b32b1ca642 bugfix in perambulator module ferben 2020-02-26 12:06:45 +00:00
  • 9479bc8486 Make IterationsToComplete and TrueResidual externally accessible Carleton DeTar 2020-02-19 17:43:57 -06:00
  • 8a5c13d5fb Still fast moving in changes Peter Boyle 2020-02-06 17:57:26 -05:00
  • bdccb0c91f Working 2 types of decomposition Peter Boyle 2020-02-06 17:26:55 -05:00
  • 68b45f6444 Lower left/upper right region cut paste Peter Boyle 2020-02-06 15:50:26 -05:00
  • ef9b3e658a extra typedef Peter Boyle 2020-02-06 15:47:14 -05:00
  • b9ca40cc44 More precise power method at start Peter Boyle 2020-02-06 10:09:14 -05:00
  • 2f421a5db1 Comment fix Peter Boyle 2020-02-06 10:08:27 -05:00
  • 10192dfc71 Wall source momenta must be specified for spatial components only. So we don't break existing scripts, allow momentum in time direction as well, but only if zero. Fail early, so do the check in setup() Michael Marshall 2020-01-31 15:02:03 +00:00
  • c69a3b6ef6 When saving eigenvectors, LapEvec now saves eigenvalues for every timeslice as well. I.e. nT x nVec eigenvalues are saved in FileName.evals.conf.h5. A new named tensor, "TimesliceEvals" can be used to simplify restoring these from disk. NB: The changes in BaseIO add support so that Eigen tensors can be easily used in MPI operations, e.g. GlobalSum. See LapEvec.hpp for an example of how this is done. Michael Marshall 2020-01-29 21:20:20 +00:00
  • 852fc1b001 True Hierarchical multigrid for DWF Peter Boyle 2020-01-27 13:45:10 -05:00
  • 2b5de5bba5 MdagM operator without norm option Peter Boyle 2020-01-27 13:44:30 -05:00
  • 2e85cae74e Add Jacobi polynomials Peter Boyle 2020-01-27 13:43:49 -05:00
  • 76c823781e Much faster coarsening Peter Boyle 2020-01-27 13:43:19 -05:00
  • 114db3b99d Optional MdagM without norms Peter Boyle 2020-01-27 13:42:51 -05:00
  • 49e123dbda Use explicit linalg calls to get coalesce optimisations on GPU Peter Boyle 2020-01-27 12:44:51 -05:00
  • 8cec294ec9 Make CG a bit less verbose as getting annoying in nested algorithms. Can use Iterative logging if you want to see more Peter Boyle 2020-01-27 12:44:04 -05:00
  • eb5b720e94 Normal Equations can be used in HDCR now Peter Boyle 2020-01-27 12:43:29 -05:00
  • b2736ec80b Make PrecGCR recursive - it can precondition itself Peter Boyle 2020-01-27 12:42:48 -05:00
  • 086256a032 Less sloppy convergence test on PowerMethod Peter Boyle 2020-01-27 12:41:59 -05:00
  • afc7426f39 Much bigger pointer cache in case of Nvidia due to cost of setting up UVM allocations Peter Boyle 2020-01-27 12:41:16 -05:00
  • 7c061e20c9 All directions of dirac operator for fast coarsening Peter Boyle 2020-01-27 12:40:13 -05:00
  • e5d1c09665 Faster DhopDirAll for little dirac operator coarsening Peter Boyle 2020-01-27 12:38:54 -05:00
  • 8016a465ae Remove extraneous variable Peter Boyle 2020-01-27 12:35:37 -05:00
  • d8b9742092 DhopDirAll for faster matrix elements of little Dirac operator Peter Boyle 2020-01-27 12:34:54 -05:00
  • 1bd87c35d7 Read coalescing on Nvidia Peter Boyle 2020-01-27 12:29:56 -05:00
  • fa856c9669 Disable information message Peter Boyle 2020-01-27 12:28:46 -05:00
  • 48008e4d8b Thread coordinate creation loop Peter Boyle 2020-01-27 12:28:16 -05:00