mirror of https://github.com/paboyle/Grid.git synced 2024-11-09 23:45:36 +00:00

Commit Graph

  • 1502860004 Benchmark scripts Peter Boyle 2024-02-13 19:47:02 +0000
  • 585efc6f3f More benchmark scripts Peter Boyle 2024-02-13 19:40:49 +0000
  • 62055e04dd missing semicolon generates error with some compilers Antonin Portelli 2024-02-13 18:18:27 +0100
  • fe98e9f555 Fixing Laplace flopcount Minor cleanup Chulwoo Jung 2024-02-13 12:06:08 -0500
  • e4a641b64e removing old Eigen tensor patch Antonin Portelli 2024-02-13 10:37:14 +0100
  • 8849f187f1 updating Eigen to 3.4.0 Antonin Portelli 2024-02-13 10:30:22 +0100
  • 948d16fb06 Laplace benchmark added Chulwoo Jung 2024-02-12 21:23:36 -0500
  • 58fbcaa399 Checking in before cleaning up Chulwoo Jung 2024-02-12 21:10:21 -0500
  • db420525b3 fix Simd::Nsimd typo david clarke 2024-02-12 15:03:53 -0700
  • b5659d106e more test cases dbollweg 2024-02-09 13:37:14 -0500
  • 4b43307402 Undo include path changes for level zero api header dbollweg 2024-02-09 13:07:56 -0500
  • 09af8c25a2 Merge branch 'paboyle:develop' into feature/sliceSum_gpu dbollweg 2024-02-09 13:02:59 -0500
  • 9514035b87 refactor slicesum: slicesum uses GPU version by default now dbollweg 2024-02-09 13:02:28 -0500
  • 9ad6836b0f Mixed precision for Laplace. Main program with Metric Chulwoo Jung 2024-02-08 17:13:10 -0500
  • 5c337e77db Merged implementation of X-conjugate Dirac operator, its surrounding infrastructure, tests and benchmarks Christopher Kelly 2024-02-08 14:58:53 -0500
  • 2da09ae99b acceleration compiles and doesn't break scalar mode david clarke 2024-02-06 18:40:13 -0700
  • a38fb0e04a first effort toward accelerators david clarke 2024-02-06 18:24:55 -0700
  • 7019916294 RNG seed change safer for large volumes; this is a long term solution Peter Boyle 2024-02-07 00:56:39 +0000
  • 1514b4f137 slicesum_sycl passes test dbollweg 2024-02-06 19:08:44 -0500
  • 91cf5ee312 Updated bench script Peter Boyle 2024-02-06 23:45:10 +0000
  • 0a6e2f42c5 small amount of cleanup david clarke 2024-02-06 16:32:07 -0700
  • ab2de131bd work towards sliceSum for sycl backend dbollweg 2024-02-06 13:24:45 -0500
  • 5bfa88be85 Aurora MPI standalone benchmake and options that work well Peter Boyle 2024-02-06 16:28:40 +0000
  • 5af8da76d7 Fix cuda compilation of Lattice_slicesum_gpu.h Dennis Bollweg 2024-02-01 18:02:30 -0500
  • b8b9dc952d Async memcpy's and cleanup Dennis Bollweg 2024-02-01 17:55:35 -0500
  • 79a6ed32d8 Use accelerator_for2d and DeviceSegmentedRecude to avoid kernel launch latencies Dennis Bollweg 2024-02-01 16:41:03 -0500
  • caa5f97723 Add sliceSum gpu using cub/hipcub dbollweg 2024-01-31 16:50:06 -0500
  • 4924b3209e projectU3 yields a unitary matrix david clarke 2024-01-23 14:43:58 -0700
  • eb702f581b Running on 12 rhs on 18 nodes of frontier Peter Boyle 2024-01-22 17:44:15 -0500
  • 3d13fd56c5 Precompute phases, save memory in hermitian Peter Boyle 2024-01-22 17:43:35 -0500
  • 6f51b49ef8 Use stderr Peter Boyle 2024-01-22 17:41:09 -0500
  • addc638856 Fast localCopyRegion, blockProjectFast Peter Boyle 2024-01-22 17:40:38 -0500
  • 00f24f8765 already found some bugs in projection, still needs testing david clarke 2024-01-22 05:50:16 -0700
  • f5b3d582b0 first attempt at U3 projection david clarke 2024-01-22 02:49:40 -0700
  • 981c93d67a update Test_fatLinks to accept Naik david clarke 2024-01-21 21:09:19 -0700
  • c020b78e02 Merge branch 'develop' into hisq_fat_links david clarke 2024-01-21 20:21:08 -0700
  • 42ae36bc28 WOrking Peter Boyle 2024-01-17 16:39:14 -0500
  • c69f73ff9f Working Peter Boyle 2024-01-17 16:38:46 -0500
  • ca5ae8a2e6 Revert to working. Peter Boyle 2024-01-17 16:32:05 -0500
  • d967eb53de Working for first time Peter Boyle 2024-01-17 16:31:12 -0500
  • 839f9f1bbe Don't log memory by default Peter Boyle 2024-01-17 16:25:50 -0500
  • b754a152c6 Flag guard correctly Peter Boyle 2024-01-17 16:25:28 -0500
  • e07cb2b9de Accelerator memory Peter Boyle 2024-01-17 16:24:31 -0500
  • a1f8bbb078 accelerator memory print Peter Boyle 2024-01-17 16:24:09 -0500
  • 7909683f3b MultiRHS Peter Boyle 2024-01-17 16:21:07 -0500
  • 25f71913b7 MultiRHS coarse Peter Boyle 2024-01-04 12:01:17 -0500
  • 34ddd2b7b1 MultiRHS coarse space Peter Boyle 2024-01-04 12:00:53 -0500
  • d5fd90b2f3 Add 48^3 rtest Peter Boyle 2024-01-04 12:00:01 -0500
  • b7c7000d0d Don't need the numerical rounding tolerance in multigrid Peter Boyle 2023-12-22 18:10:23 -0500
  • 551f6c4edd Synchronise changes Peter Boyle 2023-12-22 18:09:11 -0500
  • defd814750 Speed up the coarsened matrix matrix evaluation. It is block project limited. Could be sped up with calls to Batched GEMM and a data layout change. Peter Boyle 2023-12-22 18:07:03 -0500
  • 3d517bbd2a Synchronise decouple from the launch Speeds up multileg stencils Peter Boyle 2023-12-22 18:06:13 -0500
  • 78ab955fec Better padded cell exchange Peter Boyle 2023-12-22 18:05:41 -0500
  • dd13937bb6 Better opt face gather scatter Peter Boyle 2023-12-22 18:03:38 -0500
  • 66a1b63aa9 Faster grid/blas layout change. Halo exchange is now the only slow part. Revisit Peter Boyle 2023-12-21 20:50:18 -0500
  • 22c611bd1a Delete temp file Peter Boyle 2023-12-21 18:32:31 -0500
  • c9bb1bf8ea Passing new BLAs based Peter Boyle 2023-12-21 18:31:17 -0500
  • 2a0d75bac2 Aurora files Peter Boyle 2023-12-21 23:19:11 +0000
  • 9e489887cf General coarse multiRHS move to BLAS implementation Peter Boyle 2023-12-21 15:24:48 -0500
  • 9feb801bb9 Much simpler GPU implementation Peter Boyle 2023-12-21 15:24:06 -0500
  • c00b495933 Multigrid Peter Boyle 2023-12-21 15:23:31 -0500
  • d22eebe553 BLas options Peter Boyle 2023-12-21 15:23:03 -0500
  • 8bcbd82680 BLAS based layout and implementation Peter Boyle 2023-12-21 15:21:24 -0500
  • dfa617c439 Batched SGEMM/DGEMM/ZGEMM/CGEMM Hip, Cuda version and vanilla CPU One MKL stub in comments, to be tested as different. Peter Boyle 2023-12-21 14:01:18 -0500
  • 48d1f0df89 Optimised partially, working Peter Boyle 2023-12-21 12:33:47 -0500
  • b75cb7a12c Blas batched partial implementation on Frontier only for now Peter Boyle 2023-12-21 12:31:33 -0500
  • 332563e037 Debugged, reducing verbose Peter Boyle 2023-12-21 12:30:57 -0500
  • 0cce97a4fe verbosity only Peter Boyle 2023-12-20 21:30:10 -0500
  • 95a8e4be64 rocblas Peter Boyle 2023-12-20 21:27:59 -0500
  • abcd6b8cb6 Faster version Peter Boyle 2023-12-19 15:17:46 -0500
  • e8f21c9b6d Memmory verbose control improvement Peter Boyle 2023-12-19 15:16:58 -0500
  • 37d1d87c3c bug fix for Intel GPUs Meifeng Lin 2023-12-19 08:03:28 -0600
  • 1381dbc8ef Revert back to Grid develop version since new LLVM compilers now do not require static loop count variables. Meifeng Lin 2023-12-15 08:59:18 -0500
  • cc5ab624a2 Merge branch 'feature/omp-offload' of github.com:BNL-HPC/Grid into feature/omp-offload Meifeng Lin 2023-12-14 15:33:52 -0500
  • 72641211cd Merge branch 'paboyle:develop' into feature/omp-offload meifeng 2023-12-14 15:31:39 -0500
  • 505cc6927b Merge pull request #6 from atif4461/omp-offload-develop meifeng 2023-12-14 15:30:12 -0500
  • 026eb8a695 Wilson RMHMC main program Chulwoo Jung 2023-12-12 15:34:03 -0500
  • 076580c232 Recovering mixed precision CG for Laplace Checking in to move to aurora Chulwoo Jung 2023-12-12 15:32:00 -0500
  • f48298ad4e Bug fix Peter Boyle 2023-12-11 20:56:03 -0500
  • 645e47c1ba Config for Ampere Altra ARM root 2023-12-08 16:17:56 -0500
  • d1d9827263 Integrator logging update Peter Boyle 2023-12-08 12:11:03 -0500
  • e054078b11 Verbose Peter Boyle 2023-12-05 16:15:17 -0500
  • 7af6022a2a Added midMD checkpointing (for lattice only for now) Chulwoo Jung 2023-12-04 20:05:41 -0500
  • f516acda5f fixed conflicts; su3 working Mohammad Atif 2023-12-04 17:20:17 -0500
  • 7a7aa61d52 cleaned up Mohammad Atif 2023-12-04 16:37:28 -0500
  • 14643c0aab SDCC benchmarking scripts for A100 nodes and IceLake nodes (AVX512) Peter Boyle 2023-12-04 15:45:57 -0500
  • 867abeaf8e removed print flags Mohammad Atif 2023-12-04 15:12:03 -0500
  • b77a9b8947 SDDC compiles starting Peter Boyle 2023-11-30 14:31:51 -0500
  • 6835a7f208 Better logging, test on 81 point stencil Peter Boyle 2023-11-29 19:20:47 -0500
  • f59993b979 Nbasis§ Peter Boyle 2023-11-29 09:45:58 -0500
  • 2290b8f680 Verbose Peter Boyle 2023-11-29 09:47:04 -0500
  • 2c54be651c Further updates Peter Boyle 2023-11-29 09:43:29 -0500
  • e859a199df Reduce volume to interior for coarse stencil -- worth up to 4x gain Peter Boyle 2023-11-28 10:23:16 -0500
  • 0a3682ad0b MultiRHS work Peter Boyle 2023-11-28 07:43:37 -0500
  • 59abaeb5cd Time stamp Peter Boyle 2023-11-24 12:56:45 -0500
  • 3e448435d3 Restrict to interior Peter Boyle 2023-11-23 18:23:29 -0500
  • a294bc3c5b Relax constraints for multiRHS Peter Boyle 2023-11-23 18:20:42 -0500
  • b302ad3d49 multiRHS test in place, passes Yay! Peter Boyle 2023-11-23 18:20:15 -0500
  • 82fc4b1e94 Finalise Peter Boyle 2023-11-23 18:19:41 -0500
  • b4f1740380 Finalise message Peter Boyle 2023-11-23 18:19:16 -0500