mirror of https://github.com/paboyle/Grid.git synced 2025-09-20 02:01:05 +01:00

Commit Graph

  • f5b3d582b0 first attempt at U3 projection david clarke 2024-01-22 02:49:40 -07:00
  • 981c93d67a update Test_fatLinks to accept Naik david clarke 2024-01-21 21:09:19 -07:00
  • c020b78e02 Merge branch 'develop' into hisq_fat_links david clarke 2024-01-21 20:21:08 -07:00
  • 42ae36bc28 Working Peter Boyle 2024-01-17 16:39:14 -05:00
  • c69f73ff9f Working Peter Boyle 2024-01-17 16:38:46 -05:00
  • ca5ae8a2e6 Revert to working. Peter Boyle 2024-01-17 16:32:05 -05:00
  • d967eb53de Working for first time Peter Boyle 2024-01-17 16:31:12 -05:00
  • 839f9f1bbe Don't log memory by default Peter Boyle 2024-01-17 16:25:50 -05:00
  • b754a152c6 Flag guard correctly Peter Boyle 2024-01-17 16:25:28 -05:00
  • e07cb2b9de Accelerator memory Peter Boyle 2024-01-17 16:24:31 -05:00
  • a1f8bbb078 accelerator memory print Peter Boyle 2024-01-17 16:24:09 -05:00
  • 7909683f3b MultiRHS Peter Boyle 2024-01-17 16:21:07 -05:00
  • 25f71913b7 MultiRHS coarse Peter Boyle 2024-01-04 12:01:17 -05:00
  • 34ddd2b7b1 MultiRHS coarse space Peter Boyle 2024-01-04 12:00:53 -05:00
  • d5fd90b2f3 Add 48^3 rtest Peter Boyle 2024-01-04 12:00:01 -05:00
  • b7c7000d0d Don't need the numerical rounding tolerance in multigrid Peter Boyle 2023-12-22 18:10:23 -05:00
  • 551f6c4edd Synchronise changes Peter Boyle 2023-12-22 18:09:11 -05:00
  • defd814750 Speed up the coarsened matrix matrix evaluation. It is block project limited. Could be sped up with calls to Batched GEMM and a data layout change. Peter Boyle 2023-12-22 18:07:03 -05:00
  • 3d517bbd2a Synchronise decouple from the launch. Speeds up multileg stencils Peter Boyle 2023-12-22 18:06:13 -05:00
  • 78ab955fec Better padded cell exchange Peter Boyle 2023-12-22 18:05:41 -05:00
  • dd13937bb6 Better opt face gather scatter Peter Boyle 2023-12-22 18:03:38 -05:00
  • 66a1b63aa9 Faster grid/blas layout change. Halo exchange is now the only slow part. Revisit Peter Boyle 2023-12-21 20:50:18 -05:00
  • 22c611bd1a Delete temp file Peter Boyle 2023-12-21 18:32:31 -05:00
  • c9bb1bf8ea Passing new BLAS based Peter Boyle 2023-12-21 18:31:17 -05:00
  • 2a0d75bac2 Aurora files Peter Boyle 2023-12-21 23:19:11 +00:00
  • 9e489887cf General coarse multiRHS move to BLAS implementation Peter Boyle 2023-12-21 15:24:48 -05:00
  • 9feb801bb9 Much simpler GPU implementation Peter Boyle 2023-12-21 15:24:06 -05:00
  • c00b495933 Multigrid Peter Boyle 2023-12-21 15:23:31 -05:00
  • d22eebe553 BLAS options Peter Boyle 2023-12-21 15:23:03 -05:00
  • 8bcbd82680 BLAS based layout and implementation Peter Boyle 2023-12-21 15:21:24 -05:00
  • dfa617c439 Batched SGEMM/DGEMM/ZGEMM/CGEMM. Hip, Cuda version and vanilla CPU. One MKL stub in comments, to be tested as different. Peter Boyle 2023-12-21 14:01:18 -05:00
  • 48d1f0df89 Optimised partially, working Peter Boyle 2023-12-21 12:33:47 -05:00
  • b75cb7a12c Blas batched partial implementation on Frontier only for now Peter Boyle 2023-12-21 12:31:33 -05:00
  • 332563e037 Debugged, reducing verbose Peter Boyle 2023-12-21 12:30:57 -05:00
  • 0cce97a4fe verbosity only Peter Boyle 2023-12-20 21:30:10 -05:00
  • 95a8e4be64 rocblas Peter Boyle 2023-12-20 21:27:59 -05:00
  • abcd6b8cb6 Faster version Peter Boyle 2023-12-19 15:17:46 -05:00
  • e8f21c9b6d Memory verbose control improvement Peter Boyle 2023-12-19 15:16:58 -05:00
  • 026eb8a695 Wilson RMHMC main program Chulwoo Jung 2023-12-12 15:34:03 -05:00
  • 076580c232 Recovering mixed precision CG for Laplace. Checking in to move to aurora Chulwoo Jung 2023-12-12 15:32:00 -05:00
  • f48298ad4e Bug fix Peter Boyle 2023-12-11 20:56:03 -05:00
  • 645e47c1ba Config for Ampere Altra ARM root 2023-12-08 16:17:56 -05:00
  • d1d9827263 Integrator logging update Peter Boyle 2023-12-08 12:11:03 -05:00
  • e054078b11 Verbose Peter Boyle 2023-12-05 16:15:17 -05:00
  • 7af6022a2a Added midMD checkpointing (for lattice only for now) Chulwoo Jung 2023-12-04 20:05:41 -05:00
  • 14643c0aab SDCC benchmarking scripts for A100 nodes and IceLake nodes (AVX512) Peter Boyle 2023-12-04 15:45:57 -05:00
  • b77a9b8947 SDDC compiles starting Peter Boyle 2023-11-30 14:31:51 -05:00
  • 6835a7f208 Better logging, test on 81 point stencil Peter Boyle 2023-11-29 19:20:47 -05:00
  • f59993b979 Nbasis Peter Boyle 2023-11-29 09:45:58 -05:00
  • 2290b8f680 Verbose Peter Boyle 2023-11-29 09:47:04 -05:00
  • 2c54be651c Further updates Peter Boyle 2023-11-29 09:43:29 -05:00
  • e859a199df Reduce volume to interior for coarse stencil -- worth up to 4x gain Peter Boyle 2023-11-28 10:23:16 -05:00
  • 0a3682ad0b MultiRHS work Peter Boyle 2023-11-28 07:43:37 -05:00
  • 59abaeb5cd Time stamp Peter Boyle 2023-11-24 12:56:45 -05:00
  • 3e448435d3 Restrict to interior Peter Boyle 2023-11-23 18:23:29 -05:00
  • a294bc3c5b Relax constraints for multiRHS Peter Boyle 2023-11-23 18:20:42 -05:00
  • b302ad3d49 multiRHS test in place, passes Yay! Peter Boyle 2023-11-23 18:20:15 -05:00
  • 82fc4b1e94 Finalise Peter Boyle 2023-11-23 18:19:41 -05:00
  • b4f1740380 Finalise message Peter Boyle 2023-11-23 18:19:16 -05:00
  • 031f85247c multiRHS initial support -- needs optimisation for multi project/promote. Bug fix in freeing intermediate grids to stop double free Peter Boyle 2023-11-23 18:18:35 -05:00
  • 639cc6f73a better support for multiRHS coarse space. Still to add restriction of domain of last loop to interior of padded cell (expect about 4.5x on test volume on Crusher) Peter Boyle 2023-11-23 18:16:26 -05:00
  • 982a60536c Checking in before forking Chulwoo Jung 2023-11-22 16:33:15 -05:00
  • dc36d272ce Gauge RMHMC conserving dH Chulwoo Jung 2023-11-21 13:48:51 -05:00
  • 09946cf1ba Improved, works on 48^3. Moving to multiRHS optimisations Peter Boyle 2023-11-15 18:03:05 -05:00
  • f4fa95e7cb Use 5.3.0 Peter Boyle 2023-11-15 18:01:38 -05:00
  • 100e29e35e Allow expression as argument to norm2 Peter Boyle 2023-11-15 18:00:44 -05:00
  • 4cbe471a83 devVector Peter Boyle 2023-11-15 18:00:07 -05:00
  • 8bece1f861 Faster to transpose the matrix and apply with column major order Peter Boyle 2023-11-15 17:58:38 -05:00
  • a3ca71ec01 Lots more setup options, still working on them Peter Boyle 2023-11-15 17:58:04 -05:00
  • e0543e8af5 Implement flexible preconditioned CG Peter Boyle 2023-11-15 17:57:39 -05:00
  • c1eb80d01a Print which have converged Peter Boyle 2023-11-15 17:57:08 -05:00
  • a26121d97b Better printing Peter Boyle 2023-11-15 17:56:45 -05:00
  • 043031a757 Report resid on failed convergence Peter Boyle 2023-11-15 17:56:22 -05:00
  • 807aeebe4c Resize tol in constructor Peter Boyle 2023-11-15 17:55:57 -05:00
  • 8aa1a37aad For Mirs preconditioner solver Peter Boyle 2023-11-15 17:55:32 -05:00
  • 515ff6bf62 Added Laplacian metric, Gauge OpenBC Chulwoo Jung 2023-11-09 21:42:46 -05:00
  • 7d077fe493 Frontier compile Peter Boyle 2023-11-09 13:58:44 -05:00
  • 9cd4128833 fix naik bug david clarke 2023-11-03 14:11:38 -06:00
  • c8b17c9526 Naik to CShift david clarke 2023-11-02 12:43:22 -06:00
  • 2ae2a81e85 attempt to fix Naik david clarke 2023-10-31 13:54:55 -06:00
  • 69c869d345 fixed stupid typo david clarke 2023-10-30 17:41:52 -06:00
  • df9b958c40 naik now returns separately david clarke 2023-10-30 17:40:53 -06:00
  • 3d3376d1a3 LePage works, trying Naik david clarke 2023-10-27 16:26:31 -06:00
  • 4efa042f50 C++17 change Peter Boyle 2023-10-24 10:57:50 -04:00
  • c7cb37e970 c++17 accepted Peter Boyle 2023-10-24 10:57:24 -04:00
  • d34b207eab Avoid HIP warnings Peter Boyle 2023-10-24 10:57:04 -04:00
  • 0e6fa6f6b8 Don't need the Cshift for the period optimisation Peter Boyle 2023-10-24 10:56:31 -04:00
  • 38b87de53f This works around a stacksize limit on AMD GPU Peter Boyle 2023-10-24 10:56:07 -04:00
  • aa5047a9e4 Faster blockProject blockPromote Peter Boyle 2023-10-24 10:49:55 -04:00
  • 24b6ee0df9 M4 file Peter Boyle 2023-10-24 10:36:48 -04:00
  • 1e79cc9cbe Avoid compiler error Peter Boyle 2023-10-24 10:36:09 -04:00
  • b3925df9c3 Verbose on CPU-GPU xfer, remove performance by default Peter Boyle 2023-10-24 10:25:01 -04:00
  • f2648e94b9 getHostPointer added to Lattice Christoph Lehner 2023-10-23 13:47:41 +02:00
  • 351795ac3a Better messaging Peter Boyle 2023-10-20 19:33:04 -04:00
  • 9c9c42d0df Tests on frontier with real speed up. 3.5x on 16^3 at mq=0.01 Peter Boyle 2023-10-20 19:25:39 -04:00
  • b6ad1bafc7 Normal memory SendToRecvFrom asynchronous for use in general stencil code Peter Boyle 2023-10-20 19:24:38 -04:00
  • a5ca40f446 Better verbose -- track CPU GPU motion under --log Memory, others go to debug output stream Peter Boyle 2023-10-20 19:23:50 -04:00
  • 9ab54c5565 Overlap comms & data copy/buffer assembly in Ghost zone exchange Peter Boyle 2023-10-20 19:23:00 -04:00
  • 4341d96bde Massively sped up coarse grid mult, comms. Save 3ms spent (60% of time!) on cudaMalloc!! Peter Boyle 2023-10-20 19:21:48 -04:00
  • 5fac47a26d Faster halo exchange Peter Boyle 2023-10-19 18:16:47 -04:00