Mirror of https://github.com/paboyle/Grid.git, synced 2025-09-20 02:01:05 +01:00

Commit Graph

  • f17b8de907 fallback to _POSIX_HOST_NAME_MAX if HOST_NAME_MAX is not defined Antonin Portelli 2024-03-07 15:22:08 +09:00
  • d87296f3e8 Merge branch 'develop' of https://github.com/dbollweg/Grid into develop dbollweg 2024-03-06 16:54:22 -05:00
  • be94cf1c6f Fewer wait-calls in sycl slicesum dbollweg 2024-03-06 16:53:13 -05:00
  • cc04dc42dc Merge branch 'develop' into feature/scidac-wp1 Peter Boyle 2024-03-06 14:55:21 -05:00
  • 070b61f08f Simplifying the MultiRHS solver to make it do SRHS *and* MRHS Peter Boyle 2024-03-06 14:04:33 -05:00
  • 7e5bd46dd3 Booster update Peter Boyle 2024-03-06 19:03:45 +01:00
  • 228bbb9d81 Benchmark results Peter Boyle 2024-03-06 19:03:35 +01:00
  • b812a7b4c6 Staggered launch script Peter Boyle 2024-03-06 01:32:40 +00:00
  • 891a366f73 Repro CG script Peter Boyle 2024-03-06 01:22:55 +00:00
  • 10116b3be8 Force device copyable and tell SYCL to shut it. Peter Boyle 2024-03-06 01:13:27 +00:00
  • a46a0f0882 force device copyable and don't take crap from SYCL Peter Boyle 2024-03-06 01:12:49 +00:00
  • a26a8a38f4 Merge branch 'develop' of https://github.com/paboyle/Grid into develop Peter Boyle 2024-03-06 00:05:00 +00:00
  • 7435315d50 More blasted shell variables Peter Boyle 2024-03-06 00:03:59 +00:00
  • 9b5f741e85 Reproducing CG can be more useful now Peter Boyle 2024-03-06 00:03:16 +00:00
  • 517822fdd2 SPR HBM benchmarking right and also PVC batched GEMM Peter Boyle 2024-03-06 00:02:27 +00:00
  • 1b93a9be88 Print out the hostname Peter Boyle 2024-03-06 00:01:58 +00:00
  • 783a66b348 Deterministic reduction please Peter Boyle 2024-03-06 00:01:37 +00:00
  • 976c3e9b59 Hack for flight logging CG inner products. Can be made to work, but could put in some more serious infrastructure for repro testing and blame attribution (Britney test) if necessary Peter Boyle 2024-03-05 23:59:57 +00:00
  • f8ca971dae Use of a bare PRECISION macro is not namespace safe and collides with SYCL Peter Boyle 2024-03-05 23:59:13 +00:00
  • 21bc8c24df OneMKL batched blas starting Peter Boyle 2024-03-05 23:58:20 +00:00
  • 30228214f7 SYCL conflict with Eigen Peter Boyle 2024-03-05 23:56:10 +00:00
  • 2ae980ae43 Update sourceme.sh Peter Boyle 2024-03-05 13:39:18 -05:00
  • 6153dec2e4 Update setup.sh Peter Boyle 2024-03-05 13:38:32 -05:00
  • c805f86343 USQCD benchmark Peter Boyle 2024-03-01 00:05:04 -05:00
  • 04ca065281 Only one rank opens Peter Boyle 2024-02-29 20:09:11 -05:00
  • 88d8fa43d7 Benchmark development Peter Boyle 2024-02-29 20:01:44 -05:00
  • 3c49762875 Propagate in the blas routine Peter Boyle 2024-02-29 15:33:06 -05:00
  • 436bf1d9d3 Merge pull request #455 from clarkedavida/hisq_fat_links Peter Boyle 2024-02-29 15:29:39 -05:00
  • f70df6e195 changed NO_SHIFT and BACKWARD_CONST from define to enum david clarke 2024-02-29 12:29:30 -07:00
  • fce3852dff Merge pull request #451 from paboyle/feature/eigen-3.4.0-update Peter Boyle 2024-02-28 18:03:37 -05:00
  • ee1b8bbdbd Merge pull request #454 from edbennett/adjoint-broke Peter Boyle 2024-02-28 14:05:27 -05:00
  • 3f1636637d Merge pull request #453 from dbollweg/feature/sliceSum_gpu Peter Boyle 2024-02-28 14:04:43 -05:00
  • 2e570f5300 Merge pull request #457 from lehner/feature/gpt Peter Boyle 2024-02-28 13:59:04 -05:00
  • 9f89486df5 remove unnecessary code path Christoph Lehner 2024-02-28 19:56:23 +01:00
  • 22b43b86cb Make GPT test suite work with SYCL Christoph Lehner 2024-02-28 12:57:17 +01:00
  • 3c9012676a CUDA cub refuses to reduce vSpinColourMatrix, breaking up into smaller parts like already done for HIP case. dbollweg 2024-02-27 12:41:45 -05:00
  • ee3b3c4c56 relocate deflation support Peter Boyle 2024-02-27 11:52:23 -05:00
  • 462d706a63 Move to a blas directory Peter Boyle 2024-02-27 11:51:04 -05:00
  • ee0d460c8e Blas based block project & deflate for multiRHS Peter Boyle 2024-02-27 11:41:44 -05:00
  • cd15abe9d1 Mrhs prep Peter Boyle 2024-02-27 11:41:13 -05:00
  • 9f40467e24 Warning squash Peter Boyle 2024-02-27 11:40:36 -05:00
  • d0b6593823 More verbose on checksum Peter Boyle 2024-02-27 11:40:14 -05:00
  • 79fc821d8d reorg headers Peter Boyle 2024-02-27 11:39:37 -05:00
  • d7fdb9a7e6 Reorg headers Peter Boyle 2024-02-27 11:39:06 -05:00
  • b74de51c18 Reorder headers Peter Boyle 2024-02-27 11:38:52 -05:00
  • b507fe209c Added SpinColourMatrix case to sliceSum Test Dennis Bollweg 2024-02-27 11:28:32 -05:00
  • 6cd2d8fcd5 Replace cuda/hip memcpy with Grid functions Dennis Bollweg 2024-02-26 09:55:07 -05:00
  • cfa0576ffd Getting rid of one more non-auto View, comms overlap in Laplace operator [ref: rmhmc_merge2] Chulwoo Jung 2024-02-25 22:37:48 -05:00
  • b02d022993 fixed race condition (thx michael) david clarke 2024-02-23 17:14:28 -07:00
  • 94581e3c7a accelerator_for is broken david clarke 2024-02-23 15:58:33 -07:00
  • 88b52cc045 Merge branch 'develop' into hisq_fat_links david clarke 2024-02-23 14:47:15 -07:00
  • 0a816b5509 Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu dbollweg 2024-02-22 21:43:06 -05:00
  • 1c8b807c2e free malloc'd memory dbollweg 2024-02-22 21:42:44 -05:00
  • 44b466e072 Make InsertSliceFast the default at some point in future. Should I do this now? Peter Boyle 2024-02-21 14:51:24 -05:00
  • 5e5b471bb2 Put/Get and DEviceToDevice Peter Boyle 2024-02-21 14:47:06 -05:00
  • 9c2565f64e Working and faster version Peter Boyle 2024-02-21 14:46:43 -05:00
  • e1d0a7cec3 Batched blas Peter Boyle 2024-02-21 14:38:20 -05:00
  • b19ae8f465 Nbasis method for convenience Peter Boyle 2024-02-21 14:36:19 -05:00
  • cdff2c8e18 Updated mrhs adef Peter Boyle 2024-02-21 14:27:19 -05:00
  • 66391f84f2 Merge branch 'feature/gpt' of ../Grid into develop Christoph Lehner 2024-02-21 19:05:00 +01:00
  • 97f7a9ecb3 fix HMC for non-fundamental representations Ed Bennett 2024-02-21 08:27:55 +00:00
  • 15878f7613 sliceSumReduction_cub_large now also faster than CPU on Frontier Dennis Bollweg 2024-02-16 13:55:21 -05:00
  • e0d5e3c6c7 Merge branch 'paboyle:develop' into feature/sliceSum_gpu dbollweg 2024-02-16 13:16:37 -05:00
  • 6f3455900e Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs dbollweg 2024-02-16 13:15:02 -05:00
  • 56827d6ad6 accelerator_inline bug david clarke 2024-02-14 13:56:57 -07:00
  • 73c0b29535 Merge branch 'develop' of https://github.com/paboyle/Grid into develop Peter Boyle 2024-02-13 20:19:32 +00:00
  • 303b83cdb8 Scaling benchmarks, verbosity and MPICH aware in acceleratorInit(). For some reason Dirichlet benchmark fails on several nodes; need to debug this. Peter Boyle 2024-02-13 19:48:03 +00:00
  • 5ef4da3f29 Silence verbose Peter Boyle 2024-02-13 19:47:36 +00:00
  • 1502860004 Benchmark scripts Peter Boyle 2024-02-13 19:47:02 +00:00
  • 585efc6f3f More benchmark scripts Peter Boyle 2024-02-13 19:40:49 +00:00
  • 62055e04dd missing semicolon generates error with some compilers Antonin Portelli 2024-02-13 18:18:27 +01:00
  • fe98e9f555 Fixing Laplace flopcount. Minor cleanup Chulwoo Jung 2024-02-13 12:06:08 -05:00
  • e4a641b64e removing old Eigen tensor patch Antonin Portelli 2024-02-13 10:37:14 +01:00
  • 8849f187f1 updating Eigen to 3.4.0 Antonin Portelli 2024-02-13 10:30:22 +01:00
  • 948d16fb06 Laplace benchmark added Chulwoo Jung 2024-02-12 21:23:36 -05:00
  • 58fbcaa399 Checking in before cleaning up Chulwoo Jung 2024-02-12 21:10:21 -05:00
  • db420525b3 fix Simd::Nsimd typo david clarke 2024-02-12 15:03:53 -07:00
  • b5659d106e more test cases dbollweg 2024-02-09 13:37:14 -05:00
  • 4b43307402 Undo include path changes for level zero api header dbollweg 2024-02-09 13:07:56 -05:00
  • 09af8c25a2 Merge branch 'paboyle:develop' into feature/sliceSum_gpu dbollweg 2024-02-09 13:02:59 -05:00
  • 9514035b87 refactor slicesum: slicesum uses GPU version by default now dbollweg 2024-02-09 13:02:28 -05:00
  • 9ad6836b0f Mixed precision for Laplace. Main program with Metric Chulwoo Jung 2024-02-08 17:13:10 -05:00
  • 2da09ae99b acceleration compiles and doesn't break scalar mode david clarke 2024-02-06 18:40:13 -07:00
  • a38fb0e04a first effort toward accelerators david clarke 2024-02-06 18:24:55 -07:00
  • 7019916294 RNG seed change safer for large volumes; this is a long term solution Peter Boyle 2024-02-07 00:56:39 +00:00
  • 1514b4f137 slicesum_sycl passes test dbollweg 2024-02-06 19:08:44 -05:00
  • 91cf5ee312 Updated bench script Peter Boyle 2024-02-06 23:45:10 +00:00
  • 0a6e2f42c5 small amount of cleanup david clarke 2024-02-06 16:32:07 -07:00
  • ab2de131bd work towards sliceSum for sycl backend dbollweg 2024-02-06 13:24:45 -05:00
  • 5bfa88be85 Aurora MPI standalone benchmake and options that work well Peter Boyle 2024-02-06 16:28:40 +00:00
  • 5af8da76d7 Fix cuda compilation of Lattice_slicesum_gpu.h Dennis Bollweg 2024-02-01 18:02:30 -05:00
  • b8b9dc952d Async memcpy's and cleanup Dennis Bollweg 2024-02-01 17:55:35 -05:00
  • 79a6ed32d8 Use accelerator_for2d and DeviceSegmentedRecude to avoid kernel launch latencies Dennis Bollweg 2024-02-01 16:41:03 -05:00
  • caa5f97723 Add sliceSum gpu using cub/hipcub dbollweg 2024-01-31 16:50:06 -05:00
  • 4924b3209e projectU3 yields a unitary matrix david clarke 2024-01-23 14:43:58 -07:00
  • eb702f581b Running on 12 rhs on 18 nodes of frontier Peter Boyle 2024-01-22 17:44:15 -05:00
  • 3d13fd56c5 Precompute phases, save memory in hermitian Peter Boyle 2024-01-22 17:43:35 -05:00
  • 6f51b49ef8 Use stderr Peter Boyle 2024-01-22 17:41:09 -05:00
  • addc638856 Fast localCopyRegion, blockProjectFast Peter Boyle 2024-01-22 17:40:38 -05:00
  • 00f24f8765 already found some bugs in projection, still needs testing david clarke 2024-01-22 05:50:16 -07:00