mirror of https://github.com/paboyle/Grid.git synced 2025-04-03 18:55:56 +01:00

Commit Graph

  • 9feb801bb9 Much simpler GPU implementation Peter Boyle 2023-12-21 15:24:06 -05:00
  • c00b495933 Multigrid Peter Boyle 2023-12-21 15:23:31 -05:00
  • d22eebe553 BLAS options Peter Boyle 2023-12-21 15:23:03 -05:00
  • 8bcbd82680 BLAS based layout and implementation Peter Boyle 2023-12-21 15:21:24 -05:00
  • dfa617c439 Batched SGEMM/DGEMM/ZGEMM/CGEMM: HIP, CUDA version and vanilla CPU. One MKL stub in comments, to be tested as different. Peter Boyle 2023-12-21 14:01:18 -05:00
  • 48d1f0df89 Optimised partially, working Peter Boyle 2023-12-21 12:33:47 -05:00
  • b75cb7a12c Blas batched partial implementation on Frontier only for now Peter Boyle 2023-12-21 12:31:33 -05:00
  • 332563e037 Debugged, reducing verbose Peter Boyle 2023-12-21 12:30:57 -05:00
  • 0cce97a4fe verbosity only Peter Boyle 2023-12-20 21:30:10 -05:00
  • 95a8e4be64 rocblas Peter Boyle 2023-12-20 21:27:59 -05:00
  • abcd6b8cb6 Faster version Peter Boyle 2023-12-19 15:17:46 -05:00
  • e8f21c9b6d Memory verbose control improvement Peter Boyle 2023-12-19 15:16:58 -05:00
  • 026eb8a695 Wilson RMHMC main program Chulwoo Jung 2023-12-12 15:34:03 -05:00
  • 076580c232 Recovering mixed precision CG for Laplace. Checking in to move to aurora Chulwoo Jung 2023-12-12 15:32:00 -05:00
  • f48298ad4e Bug fix Peter Boyle 2023-12-11 20:56:03 -05:00
  • 645e47c1ba Config for Ampere Altra ARM root 2023-12-08 16:17:56 -05:00
  • d1d9827263 Integrator logging update Peter Boyle 2023-12-08 12:11:03 -05:00
  • e054078b11 Verbose Peter Boyle 2023-12-05 16:15:17 -05:00
  • 7af6022a2a Added midMD checkpointing (for lattice only for now) Chulwoo Jung 2023-12-04 20:05:41 -05:00
  • 14643c0aab SDCC benchmarking scripts for A100 nodes and IceLake nodes (AVX512) Peter Boyle 2023-12-04 15:45:57 -05:00
  • b77a9b8947 SDCC compiles starting Peter Boyle 2023-11-30 14:31:51 -05:00
  • 6835a7f208 Better logging, test on 81 point stencil Peter Boyle 2023-11-29 19:20:47 -05:00
  • f59993b979 Nbasis Peter Boyle 2023-11-29 09:45:58 -05:00
  • 2290b8f680 Verbose Peter Boyle 2023-11-29 09:47:04 -05:00
  • 2c54be651c Further updates Peter Boyle 2023-11-29 09:43:29 -05:00
  • e859a199df Reduce volume to interior for coarse stencil -- worth up to 4x gain Peter Boyle 2023-11-28 10:23:16 -05:00
  • 0a3682ad0b MultiRHS work Peter Boyle 2023-11-28 07:43:37 -05:00
  • 59abaeb5cd Time stamp Peter Boyle 2023-11-24 12:56:45 -05:00
  • 3e448435d3 Restrict to interior Peter Boyle 2023-11-23 18:23:29 -05:00
  • a294bc3c5b Relax constraints for multiRHS Peter Boyle 2023-11-23 18:20:42 -05:00
  • b302ad3d49 multiRHS test in place, passes Yay! Peter Boyle 2023-11-23 18:20:15 -05:00
  • 82fc4b1e94 Finalise Peter Boyle 2023-11-23 18:19:41 -05:00
  • b4f1740380 Finalise message Peter Boyle 2023-11-23 18:19:16 -05:00
  • 031f85247c multiRHS initial support -- needs optimisation for multi project/promote. Bug fix in freeing intermediate grids to stop double free Peter Boyle 2023-11-23 18:18:35 -05:00
  • 639cc6f73a Better support for multiRHS coarse space. Still to add restriction of domain of last loop to interior of padded cell (expect about 4.5x on test volume on Crusher) Peter Boyle 2023-11-23 18:16:26 -05:00
  • 982a60536c Checking in before forking Chulwoo Jung 2023-11-22 16:33:15 -05:00
  • dc36d272ce Gauge RMHMC conserving dH Chulwoo Jung 2023-11-21 13:48:51 -05:00
  • 09946cf1ba Improved, works on 48^3 moving to multiRHS optimisations Peter Boyle 2023-11-15 18:03:05 -05:00
  • f4fa95e7cb Use 5.3.0 Peter Boyle 2023-11-15 18:01:38 -05:00
  • 100e29e35e Allow expression as argument to norm2 Peter Boyle 2023-11-15 18:00:44 -05:00
  • 4cbe471a83 devVector Peter Boyle 2023-11-15 18:00:07 -05:00
  • 8bece1f861 Faster to transpose the matrix and apply with column major order Peter Boyle 2023-11-15 17:58:38 -05:00
  • a3ca71ec01 Lots more setup options, still working on them Peter Boyle 2023-11-15 17:58:04 -05:00
  • e0543e8af5 Implement flexible preconditioned CG Peter Boyle 2023-11-15 17:57:39 -05:00
  • c1eb80d01a Print which have converged Peter Boyle 2023-11-15 17:57:08 -05:00
  • a26121d97b Better printing Peter Boyle 2023-11-15 17:56:45 -05:00
  • 043031a757 Report resid on failed convergence Peter Boyle 2023-11-15 17:56:22 -05:00
  • 807aeebe4c Resize tol in constructor Peter Boyle 2023-11-15 17:55:57 -05:00
  • 8aa1a37aad For Mirs preconditioner solver Peter Boyle 2023-11-15 17:55:32 -05:00
  • 515ff6bf62 Added Laplacian metric, Gauge OpenBC Chulwoo Jung 2023-11-09 21:42:46 -05:00
  • 7d077fe493 Frontier compile Peter Boyle 2023-11-09 13:58:44 -05:00
  • 9cd4128833 fix naik bug david clarke 2023-11-03 14:11:38 -06:00
  • c8b17c9526 Naik to CShift david clarke 2023-11-02 12:43:22 -06:00
  • 2ae2a81e85 attempt to fix Naik david clarke 2023-10-31 13:54:55 -06:00
  • 69c869d345 fixed stupid typo david clarke 2023-10-30 17:41:52 -06:00
  • df9b958c40 naik now returns separately david clarke 2023-10-30 17:40:53 -06:00
  • 3d3376d1a3 LePage works, trying Naik david clarke 2023-10-27 16:26:31 -06:00
  • 4efa042f50 C++17 change Peter Boyle 2023-10-24 10:57:50 -04:00
  • c7cb37e970 c++17 accepted Peter Boyle 2023-10-24 10:57:24 -04:00
  • d34b207eab Avoid HIP warnings Peter Boyle 2023-10-24 10:57:04 -04:00
  • 0e6fa6f6b8 Don't need the Cshift for the period optimisation Peter Boyle 2023-10-24 10:56:31 -04:00
  • 38b87de53f This works around a stacksize limit on AMD GPU Peter Boyle 2023-10-24 10:56:07 -04:00
  • aa5047a9e4 Faster blockProject blockPromote Peter Boyle 2023-10-24 10:49:55 -04:00
  • 24b6ee0df9 M4 file Peter Boyle 2023-10-24 10:36:48 -04:00
  • 1e79cc9cbe Avoid compiler error Peter Boyle 2023-10-24 10:36:09 -04:00
  • b3925df9c3 Verbose on CPU-GPU xfer, remove performance by default Peter Boyle 2023-10-24 10:25:01 -04:00
  • f2648e94b9 getHostPointer added to Lattice Christoph Lehner 2023-10-23 13:47:41 +02:00
  • 351795ac3a Better messaging Peter Boyle 2023-10-20 19:33:04 -04:00
  • 9c9c42d0df Tests on frontier with real speed up. 3.5x on 16^3 at mq=0.01 Peter Boyle 2023-10-20 19:25:39 -04:00
  • b6ad1bafc7 Normal memory SendToRecvFrom asynchronous for use in general stencil code Peter Boyle 2023-10-20 19:24:38 -04:00
  • a5ca40f446 Better verbose -- track CPU GPU motion under --log Memory, others go to debug output stream Peter Boyle 2023-10-20 19:23:50 -04:00
  • 9ab54c5565 Overlap comms & data copy/buffer assembly in Ghost zone exchange Peter Boyle 2023-10-20 19:23:00 -04:00
  • 4341d96bde Massively sped up coarse grid mult, comms Save 3ms spend (60% of time !) on cudaMalloc !! Peter Boyle 2023-10-20 19:21:48 -04:00
  • 5fac47a26d Faster halo exchange Peter Boyle 2023-10-19 18:16:47 -04:00
  • e064f17346 Faster halo exchange Peter Boyle 2023-10-19 18:16:23 -04:00
  • afe10ba2a2 More digits Peter Boyle 2023-10-18 22:42:58 -04:00
  • 7cc3435ba8 Improved General coarsened matrix Peter Boyle 2023-10-18 22:41:53 -04:00
  • 541772313c Verbosity Peter Boyle 2023-10-18 22:40:53 -04:00
  • 3747494a09 Notify delete public Peter Boyle 2023-10-18 22:40:22 -04:00
  • f2b98d0dcc Const safety Peter Boyle 2023-10-18 22:38:12 -04:00
  • 80471bf762 Alternate implementation involving face operations Peter Boyle 2023-10-18 22:37:14 -04:00
  • a06f63c110 Improved I/O and non-lexico option exposed to SciDAC format Peter Boyle 2023-10-18 22:36:39 -04:00
  • 0ae4478cd9 Checkpoint the subspace and ldop Peter Boyle 2023-10-18 22:35:50 -04:00
  • ae4e705e09 Use random vec as easier for debug Peter Boyle 2023-10-18 22:34:21 -04:00
  • f5dcea9dbf Updates for Frontier Peter Boyle 2023-10-10 01:33:36 -04:00
  • 21ed6ac0f4 added floating-point support david clarke 2023-10-20 13:54:26 -06:00
  • 7bb8ab7000 improve smearing templating david clarke 2023-10-20 08:41:02 -06:00
  • 2c824c2641 Merge branch 'develop' into hisq_fat_links david clarke 2023-10-17 16:03:59 -06:00
  • 391fd9cc6a try lepage term david clarke 2023-10-17 14:57:15 -06:00
  • 2207309f8a Spack rules Peter Boyle 2023-10-16 18:38:24 -04:00
  • 51051df62c 3GeV run setup Peter Boyle 2023-10-16 20:49:52 +03:00
  • 33097681b9 FTHMC compiled and merged to develop Peter Boyle 2023-10-14 00:42:55 +03:00
  • 07e4900218 FTHMC commit Peter Boyle 2023-10-13 18:20:43 +03:00
  • 36ab567d67 FTHMC 3 GeV Peter Boyle 2023-10-13 17:58:48 +03:00
  • e19171523b FTHMC Status at lattice conference commit Peter Boyle 2023-10-13 17:57:56 +03:00
  • 9626a2c7c0 Asynch handling Peter Boyle 2023-10-13 17:57:20 +03:00
  • e936f5b80b IfGridTensor shorthand Peter Boyle 2023-10-13 17:56:47 +03:00
  • ffc0639cb9 Running in HMC tests Peter Boyle 2023-10-13 17:55:27 +03:00
  • c5b43b322c traceProduct eliminates non-contributing intermediate terms Peter Boyle 2023-10-13 17:53:58 +03:00
  • c9c4576237 Improved frontier cshift Peter Boyle 2023-10-13 17:46:07 +03:00