mirror of https://github.com/paboyle/Grid.git synced 2025-04-03 18:55:56 +01:00

Commit Graph

  • 3c49762875 Propagate in the blas routine Peter Boyle 2024-02-29 15:33:06 -05:00
  • 436bf1d9d3 Merge pull request #455 from clarkedavida/hisq_fat_links Peter Boyle 2024-02-29 15:29:39 -05:00
  • f70df6e195 changed NO_SHIFT and BACKWARD_CONST from define to enum david clarke 2024-02-29 12:29:30 -07:00
  • fce3852dff Merge pull request #451 from paboyle/feature/eigen-3.4.0-update Peter Boyle 2024-02-28 18:03:37 -05:00
  • ee1b8bbdbd Merge pull request #454 from edbennett/adjoint-broke Peter Boyle 2024-02-28 14:05:27 -05:00
  • 3f1636637d Merge pull request #453 from dbollweg/feature/sliceSum_gpu Peter Boyle 2024-02-28 14:04:43 -05:00
  • 2e570f5300 Merge pull request #457 from lehner/feature/gpt Peter Boyle 2024-02-28 13:59:04 -05:00
  • 9f89486df5 remove unnecessary code path Christoph Lehner 2024-02-28 19:56:23 +01:00
  • 22b43b86cb Make GPT test suite work with SYCL Christoph Lehner 2024-02-28 12:57:17 +01:00
  • 3c9012676a CUDA cub refuses to reduce vSpinColourMatrix, breaking up into smaller parts like already done for HIP case. dbollweg 2024-02-27 12:41:45 -05:00
  • ee3b3c4c56 relocate deflation support Peter Boyle 2024-02-27 11:52:23 -05:00
  • 462d706a63 Move to a blas directory Peter Boyle 2024-02-27 11:51:04 -05:00
  • ee0d460c8e Blas based block project & deflate for multiRHS Peter Boyle 2024-02-27 11:41:44 -05:00
  • cd15abe9d1 Mrhs prep Peter Boyle 2024-02-27 11:41:13 -05:00
  • 9f40467e24 Warning squash Peter Boyle 2024-02-27 11:40:36 -05:00
  • d0b6593823 More verbose on checksum Peter Boyle 2024-02-27 11:40:14 -05:00
  • 79fc821d8d reorg headers Peter Boyle 2024-02-27 11:39:37 -05:00
  • d7fdb9a7e6 Reorg headers Peter Boyle 2024-02-27 11:39:06 -05:00
  • b74de51c18 Reorder headers Peter Boyle 2024-02-27 11:38:52 -05:00
  • b507fe209c Added SpinColourMatrix case to sliceSum Test Dennis Bollweg 2024-02-27 11:28:32 -05:00
  • 6cd2d8fcd5 Replace cuda/hip memcpy with Grid functions Dennis Bollweg 2024-02-26 09:55:07 -05:00
  • cfa0576ffd Getting rid of one more non-auto View, comms overlap in Laplace operator rmhmc_merge2 Chulwoo Jung 2024-02-25 22:37:48 -05:00
  • b02d022993 fixed race condition (thx michael) david clarke 2024-02-23 17:14:28 -07:00
  • 94581e3c7a accelerator_for is broken david clarke 2024-02-23 15:58:33 -07:00
  • 88b52cc045 Merge branch 'develop' into hisq_fat_links david clarke 2024-02-23 14:47:15 -07:00
  • 0a816b5509 Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu dbollweg 2024-02-22 21:43:06 -05:00
  • 1c8b807c2e free malloc'd memory dbollweg 2024-02-22 21:42:44 -05:00
  • 44b466e072 Make InsertSliceFast the default at some point in future. Should I do this now? Peter Boyle 2024-02-21 14:51:24 -05:00
  • 5e5b471bb2 Put/Get and DEviceToDevice Peter Boyle 2024-02-21 14:47:06 -05:00
  • 9c2565f64e Working and faster version Peter Boyle 2024-02-21 14:46:43 -05:00
  • e1d0a7cec3 Batched blas Peter Boyle 2024-02-21 14:38:20 -05:00
  • b19ae8f465 Nbasis method for convenience Peter Boyle 2024-02-21 14:36:19 -05:00
  • cdff2c8e18 Updated mrhs adef Peter Boyle 2024-02-21 14:27:19 -05:00
  • 66391f84f2 Merge branch 'feature/gpt' of ../Grid into develop Christoph Lehner 2024-02-21 19:05:00 +01:00
  • 97f7a9ecb3 fix HMC for non-fundamental representations Ed Bennett 2024-02-21 08:27:55 +00:00
  • 15878f7613 sliceSumReduction_cub_large now also faster than CPU on Frontier Dennis Bollweg 2024-02-16 13:55:21 -05:00
  • e0d5e3c6c7 Merge branch 'paboyle:develop' into feature/sliceSum_gpu dbollweg 2024-02-16 13:16:37 -05:00
  • 6f3455900e Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs dbollweg 2024-02-16 13:15:02 -05:00
  • 56827d6ad6 accelerator_inline bug david clarke 2024-02-14 13:56:57 -07:00
  • 73c0b29535 Merge branch 'develop' of https://github.com/paboyle/Grid into develop Peter Boyle 2024-02-13 20:19:32 +00:00
  • 303b83cdb8 Scaling benchmarks, verbosity and MPICH aware in acceleratorInit() For some reason Dirichlet benchmark fails on several nodes; need to debug this. Peter Boyle 2024-02-13 19:48:03 +00:00
  • 5ef4da3f29 Silence verbose Peter Boyle 2024-02-13 19:47:36 +00:00
  • 1502860004 Benchmark scripts Peter Boyle 2024-02-13 19:47:02 +00:00
  • 585efc6f3f More benchmark scripts Peter Boyle 2024-02-13 19:40:49 +00:00
  • 62055e04dd missing semicolon generates error with some compilers Antonin Portelli 2024-02-13 18:18:27 +01:00
  • fe98e9f555 Fixing Laplace flopcount Minor cleanup Chulwoo Jung 2024-02-13 12:06:08 -05:00
  • e4a641b64e removing old Eigen tensor patch Antonin Portelli 2024-02-13 10:37:14 +01:00
  • 8849f187f1 updating Eigen to 3.4.0 Antonin Portelli 2024-02-13 10:30:22 +01:00
  • 948d16fb06 Laplace benchmark added Chulwoo Jung 2024-02-12 21:23:36 -05:00
  • 58fbcaa399 Checking in before cleaning up Chulwoo Jung 2024-02-12 21:10:21 -05:00
  • db420525b3 fix Simd::Nsimd typo david clarke 2024-02-12 15:03:53 -07:00
  • b5659d106e more test cases dbollweg 2024-02-09 13:37:14 -05:00
  • 4b43307402 Undo include path changes for level zero api header dbollweg 2024-02-09 13:07:56 -05:00
  • 09af8c25a2 Merge branch 'paboyle:develop' into feature/sliceSum_gpu dbollweg 2024-02-09 13:02:59 -05:00
  • 9514035b87 refactor slicesum: slicesum uses GPU version by default now dbollweg 2024-02-09 13:02:28 -05:00
  • 9ad6836b0f Mixed precision for Laplace. Main program with Metric Chulwoo Jung 2024-02-08 17:13:10 -05:00
  • 2da09ae99b acceleration compiles and doesn't break scalar mode david clarke 2024-02-06 18:40:13 -07:00
  • a38fb0e04a first effort toward accelerators david clarke 2024-02-06 18:24:55 -07:00
  • 7019916294 RNG seed change safer for large volumes; this is a long term solution Peter Boyle 2024-02-07 00:56:39 +00:00
  • 1514b4f137 slicesum_sycl passes test dbollweg 2024-02-06 19:08:44 -05:00
  • 91cf5ee312 Updated bench script Peter Boyle 2024-02-06 23:45:10 +00:00
  • 0a6e2f42c5 small amount of cleanup david clarke 2024-02-06 16:32:07 -07:00
  • ab2de131bd work towards sliceSum for sycl backend dbollweg 2024-02-06 13:24:45 -05:00
  • 5bfa88be85 Aurora MPI standalone benchmake and options that work well Peter Boyle 2024-02-06 16:28:40 +00:00
  • 5af8da76d7 Fix cuda compilation of Lattice_slicesum_gpu.h Dennis Bollweg 2024-02-01 18:02:30 -05:00
  • b8b9dc952d Async memcpy's and cleanup Dennis Bollweg 2024-02-01 17:55:35 -05:00
  • 79a6ed32d8 Use accelerator_for2d and DeviceSegmentedRecude to avoid kernel launch latencies Dennis Bollweg 2024-02-01 16:41:03 -05:00
  • caa5f97723 Add sliceSum gpu using cub/hipcub dbollweg 2024-01-31 16:50:06 -05:00
  • 4924b3209e projectU3 yields a unitary matrix david clarke 2024-01-23 14:43:58 -07:00
  • eb702f581b Running on 12 rhs on 18 nodes of frontier Peter Boyle 2024-01-22 17:44:15 -05:00
  • 3d13fd56c5 Precompute phases, save memory in hermitian Peter Boyle 2024-01-22 17:43:35 -05:00
  • 6f51b49ef8 Use stderr Peter Boyle 2024-01-22 17:41:09 -05:00
  • addc638856 Fast localCopyRegion, blockProjectFast Peter Boyle 2024-01-22 17:40:38 -05:00
  • 00f24f8765 already found some bugs in projection, still needs testing david clarke 2024-01-22 05:50:16 -07:00
  • f5b3d582b0 first attempt at U3 projection david clarke 2024-01-22 02:49:40 -07:00
  • 981c93d67a update Test_fatLinks to accept Naik david clarke 2024-01-21 21:09:19 -07:00
  • c020b78e02 Merge branch 'develop' into hisq_fat_links david clarke 2024-01-21 20:21:08 -07:00
  • 42ae36bc28 WOrking Peter Boyle 2024-01-17 16:39:14 -05:00
  • c69f73ff9f Working Peter Boyle 2024-01-17 16:38:46 -05:00
  • ca5ae8a2e6 Revert to working. Peter Boyle 2024-01-17 16:32:05 -05:00
  • d967eb53de Working for first time Peter Boyle 2024-01-17 16:31:12 -05:00
  • 839f9f1bbe Don't log memory by default Peter Boyle 2024-01-17 16:25:50 -05:00
  • b754a152c6 Flag guard correctly Peter Boyle 2024-01-17 16:25:28 -05:00
  • e07cb2b9de Accelerator memory Peter Boyle 2024-01-17 16:24:31 -05:00
  • a1f8bbb078 accelerator memory print Peter Boyle 2024-01-17 16:24:09 -05:00
  • 7909683f3b MultiRHS Peter Boyle 2024-01-17 16:21:07 -05:00
  • 25f71913b7 MultiRHS coarse Peter Boyle 2024-01-04 12:01:17 -05:00
  • 34ddd2b7b1 MultiRHS coarse space Peter Boyle 2024-01-04 12:00:53 -05:00
  • d5fd90b2f3 Add 48^3 rtest Peter Boyle 2024-01-04 12:00:01 -05:00
  • b7c7000d0d Don't need the numerical rounding tolerance in multigrid Peter Boyle 2023-12-22 18:10:23 -05:00
  • 551f6c4edd Synchronise changes Peter Boyle 2023-12-22 18:09:11 -05:00
  • defd814750 Speed up the coarsened matrix matrix evaluation. It is block project limited. Could be sped up with calls to Batched GEMM and a data layout change. Peter Boyle 2023-12-22 18:07:03 -05:00
  • 3d517bbd2a Synchronise decouple from the launch Speeds up multileg stencils Peter Boyle 2023-12-22 18:06:13 -05:00
  • 78ab955fec Better padded cell exchange Peter Boyle 2023-12-22 18:05:41 -05:00
  • dd13937bb6 Better opt face gather scatter Peter Boyle 2023-12-22 18:03:38 -05:00
  • 66a1b63aa9 Faster grid/blas layout change. Halo exchange is now the only slow part. Revisit Peter Boyle 2023-12-21 20:50:18 -05:00
  • 22c611bd1a Delete temp file Peter Boyle 2023-12-21 18:32:31 -05:00
  • c9bb1bf8ea Passing new BLAs based Peter Boyle 2023-12-21 18:31:17 -05:00
  • 2a0d75bac2 Aurora files Peter Boyle 2023-12-21 23:19:11 +00:00
  • 9e489887cf General coarse multiRHS move to BLAS implementation Peter Boyle 2023-12-21 15:24:48 -05:00