mirror of https://github.com/paboyle/Grid.git synced 2025-06-14 05:07:05 +01:00

Commit Graph

  • 97448a93dc Double2 compiles and dslash runs Peter Boyle 2022-09-27 10:55:25 -04:00
  • 70c83ec3be More instantiations Peter Boyle 2022-09-27 10:54:23 -04:00
  • 8f4e2ee545 Double2 Peter Boyle 2022-09-27 10:53:46 -04:00
  • e8bfbf2f7c D2 operators Peter Boyle 2022-09-27 10:37:45 -04:00
  • 9e81b42981 D2 fields Peter Boyle 2022-09-27 10:37:19 -04:00
  • 6c9eef9726 D2 fields Peter Boyle 2022-09-27 10:36:54 -04:00
  • 7ffbc3e98e Double2 improved. Really don't like 'convertType' - localise to a GPT header Peter Boyle 2022-09-27 10:35:31 -04:00
  • 68e4d833dd Run through wrapper script Peter Boyle 2022-09-23 16:49:29 -04:00
  • a2cefaa53a Faster Peter Boyle 2022-09-23 16:49:14 -04:00
  • a0d682687e Better logging of Fdt for force gradient Peter Boyle 2022-09-23 16:22:53 -04:00
  • eb552c3ecd dt info Peter Boyle 2022-09-23 16:22:28 -04:00
  • 97cce103d7 Tolerances control Peter Boyle 2022-09-23 16:21:49 -04:00
  • 87ac7104f8 Prettier Peter Boyle 2022-09-23 16:20:46 -04:00
  • e4c117aabf Compile fix, multishift mixed prec support Peter Boyle 2022-09-23 16:19:27 -04:00
  • 5b128a6f9f MixedPrec Multishift with better precision scheme for GPU Peter Boyle 2022-09-23 16:18:47 -04:00
  • 19da647e3c Added support for non-periodic gauge field implementations in the random gauge shift performed at the start of the HMC trajectory (the above required exposing the gauge implementation to the HMC class through the Integrator class). Made the random shift optional (default on) through a parameter in HMCparameters. Modified ConjugateBC::CshiftLink such that it supports any shift in -L < shift < L rather than just +-1. Added a tester for the BC-respecting Cshift. Fixed a missing system header include in SSE4 intrinsics wrapper. Fixed sumD_cpu for single-prec types performing an incorrect conversion to a single-prec data type at the end, that fails to compile on some systems. Christopher Kelly 2022-09-09 12:47:09 -04:00
  • 1713de35c0 Improved config flags Peter Boyle 2022-09-05 21:50:02 -04:00
  • 1177b8f661 Merge branch 'develop' into feature/dirichlet Peter Boyle 2022-08-31 19:05:57 -04:00
  • 442bfb3d42 Merge branch 'develop' into feature/dirichlet Peter Boyle 2022-08-31 19:04:19 -04:00
  • e7d9b75fdd Warning fixes Peter Boyle 2022-08-31 19:01:14 -04:00
  • 3d0e3ec363 Tracing Peter Boyle 2022-08-31 18:31:46 -04:00
  • 3c1c51f9aa Merge branch 'feature/dirichlet-gparity' into feature/dirichlet Peter Boyle 2022-08-31 18:25:34 -04:00
  • 8cc3c522c3 Merge pull request #409 from giltirn/feature/dirichlet-gparity-stage feature/dirichlet-gparity Peter Boyle 2022-08-31 18:22:50 -04:00
  • 913fbca74a Merge pull request #410 from gkanwar/photon_and_sha_patches Peter Boyle 2022-08-31 18:01:45 -04:00
  • 5c87342108 Used in g-2 sign off Peter Boyle 2022-08-31 17:35:32 -04:00
  • 66177bfbe2 Used in g-2 sign off Peter Boyle 2022-08-31 17:35:07 -04:00
  • 5205e68963 RocTX, NVTX, text based self profiling Peter Boyle 2022-08-31 17:34:09 -04:00
  • cd5cf6d614 Tracing replaces self timing hooks Peter Boyle 2022-08-31 17:33:41 -04:00
  • 5abb19eab0 Remove self timing Peter Boyle 2022-08-31 17:32:49 -04:00
  • 06d7b88c78 Force reporting improved Peter Boyle 2022-08-31 17:32:21 -04:00
  • cf72799735 Better action naming Peter Boyle 2022-08-31 17:24:11 -04:00
  • cdb8fcc269 Width=4 support. This is too broad; hit it on physical point run. Need to change strategy, I think. Peter Boyle 2022-08-31 17:21:33 -04:00
  • b4f4130901 Defer SMP node links until after interior. Allows for DMA overlapping compute Peter Boyle 2022-08-31 17:20:21 -04:00
  • bb049847d5 Tracing replaces self timing Peter Boyle 2022-08-31 17:19:02 -04:00
  • fd33c835dd Feynman rule fix and tracing replaces self timing Peter Boyle 2022-08-31 17:18:17 -04:00
  • 21371a7e5b Tracing replaces self timing Peter Boyle 2022-08-31 17:16:05 -04:00
  • abfaa00d3e Tracing replaces self timing Peter Boyle 2022-08-31 17:15:24 -04:00
  • efee33c55d Tracing replaces self timing Peter Boyle 2022-08-31 17:14:57 -04:00
  • db0fe6ddbb Tracing replaces self timing Peter Boyle 2022-08-31 17:14:14 -04:00
  • 8a9e647120 Tracing replaces self timing Peter Boyle 2022-08-31 17:13:44 -04:00
  • e6dcb821ad Tracing replaces self timing Peter Boyle 2022-08-31 17:12:31 -04:00
  • 9bff188f02 Tracing replaces self timing Peter Boyle 2022-08-31 17:12:05 -04:00
  • 111b30ca1d Tracing replaces self timing Peter Boyle 2022-08-31 17:11:48 -04:00
  • 24182ca8bf HIP allows conserved currents. Tracing replaces self timing Peter Boyle 2022-08-31 17:11:18 -04:00
  • ee2d7369b3 Tracing replaces self timing Peter Boyle 2022-08-31 17:10:45 -04:00
  • 7c686d29c9 Tracing replaces self timing Peter Boyle 2022-08-31 17:10:17 -04:00
  • e8a0a1e75d Tracing replaces self timing hooks Peter Boyle 2022-08-31 17:09:47 -04:00
  • 730be89abf Remove timing hooks as tracing replaces Peter Boyle 2022-08-31 17:08:44 -04:00
  • f991ad7d5c Remove timing hooks as tracing replaces Peter Boyle 2022-08-31 17:08:18 -04:00
  • b3f33f82f7 Decrease self timing hooks, use nvtx / roctx type tracing hooks instead Peter Boyle 2022-08-31 17:06:47 -04:00
  • a34a6e059f Logging improvement. Sinitial will be used to improve RHMC terms Peter Boyle 2022-08-31 17:06:08 -04:00
  • 1333319941 Tracing Peter Boyle 2022-08-31 17:00:25 -04:00
  • 9295ed8d20 Print full memory range Peter Boyle 2022-08-31 16:59:51 -04:00
  • 19cc7653fb Tracing Peter Boyle 2022-08-31 16:57:51 -04:00
  • 5752538661 Tracing Peter Boyle 2022-08-31 16:57:32 -04:00
  • ca40a1b00b Tracing Peter Boyle 2022-08-31 16:54:55 -04:00
  • 659fac9dfb Tracing hook Peter Boyle 2022-08-31 16:54:25 -04:00
  • 4dc3d6fce0 Buy into Nvidia/Rocm etc... tracing. Peter Boyle 2022-08-31 16:53:19 -04:00
  • 60dfb49afa Remove FP16 tests when FP16 is disabled Gurtej Kanwar 2022-08-21 17:29:55 +02:00
  • 554c238359 Update OpenSSL digest to use high-level methods Gurtej Kanwar 2022-08-21 17:28:57 +02:00
  • f922adf05e Fix Photon ComplexField type Gurtej Kanwar 2022-08-21 16:16:18 +02:00
  • 95b640cb6b 10TF/s on 32^3 x 64 on single node Peter Boyle 2022-08-04 15:43:52 -04:00
  • 2cb5bedc15 Copy stream HIP improvements Peter Boyle 2022-08-04 15:24:03 -04:00
  • 806b02bddf Simplify dead code Peter Boyle 2022-08-04 15:23:13 -04:00
  • de40395773 More timing. Think I should start to use nvtx and rocmtx ?? Peter Boyle 2022-08-04 13:37:16 -04:00
  • 7ba4788715 Fix Peter Boyle 2022-08-04 13:36:44 -04:00
  • 06d9ce1a02 Synch ranks on node here for GPU - GPU memcopy Peter Boyle 2022-08-04 13:35:56 -04:00
  • 75bb6b2b40 Move barrier into the StencilSend begin routine Peter Boyle 2022-08-04 13:35:26 -04:00
  • 74f10c2dc0 Move barrier into Stencil Send Peter Boyle 2022-08-04 13:34:11 -04:00
  • 188d2c7a4d PVC default, ignore ATS Peter Boyle 2022-08-02 08:38:53 -07:00
  • 17d7177105 Files for SYCL Peter Boyle 2022-08-02 08:33:39 -07:00
  • bb0a0da47a Non blocking caution due to SYCL Peter Boyle 2022-08-02 08:09:43 -07:00
  • 84110166e4 Fix the fence Peter Boyle 2022-08-02 08:00:43 -07:00
  • d32b923b6c Fencing on a stream in SYCL is needed. Didn't know that ... gulp Peter Boyle 2022-08-02 07:58:04 -07:00
  • a93d5459d4 Better mpi request completion Peter Boyle 2022-07-28 12:18:35 -04:00
  • 9c21add0c6 High res timer replaces gettimeofday Peter Boyle 2022-07-28 12:14:03 -04:00
  • 639aab6563 High res timer instead of gettimeofday Peter Boyle 2022-07-28 12:13:35 -04:00
  • 8137cc7049 Always concurrent comms Peter Boyle 2022-07-28 12:01:51 -04:00
  • 60e63dca1d Add memory logging channel Peter Boyle 2022-07-28 11:39:15 -04:00
  • 486409574e Expanded cache to avoid any allocs in HMC Peter Boyle 2022-07-28 11:38:34 -04:00
  • a913b8be12 Dslash self timing. Might want to not have this Peter Boyle 2022-07-28 11:37:55 -04:00
  • 2239751850 Better logging Peter Boyle 2022-07-28 11:37:36 -04:00
  • 9b20f1449c Better timing Peter Boyle 2022-07-28 11:37:12 -04:00
  • b99453083d Updated timing Peter Boyle 2022-07-28 11:37:02 -04:00
  • 2ab1af5754 Ensure no synchronize and not option dependent Peter Boyle 2022-07-19 09:51:06 -07:00
  • 5f8892bf03 Mistake pointed out by Camilo Peter Boyle 2022-07-19 09:31:51 -07:00
  • f14e7e51e7 Grid accelerator Peter Boyle 2022-07-12 10:56:22 -07:00
  • 943fbb914d Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet Peter Boyle 2022-07-11 13:48:42 -04:00
  • ca4603580d Verbose Peter Boyle 2022-07-11 13:48:35 -04:00
  • f73db8f1f3 Synch clocks Peter Boyle 2022-07-11 13:47:39 -04:00
  • f7217d12d2 World barrier for clock synch Peter Boyle 2022-07-11 13:45:31 -04:00
  • fab50c57d9 More logging Peter Boyle 2022-07-11 18:42:27 +01:00
  • 3440534fbf MixedPrec support Peter Boyle 2022-07-10 21:35:18 +01:00
  • 177b1a7ec6 Mixed prec Peter Boyle 2022-07-10 21:34:10 +01:00
  • 58182fe345 Different approach to default dirichlet params Peter Boyle 2022-07-10 21:32:58 +01:00
  • 1f907d330d Different default params for dirichlet Peter Boyle 2022-07-10 21:31:48 +01:00
  • b0fe664e9d Better force log info Peter Boyle 2022-07-10 21:31:25 +01:00
  • c0f8482402 Remove SSC marks Peter Boyle 2022-07-07 17:49:36 +01:00
  • 3544965f54 Stream doesn't work Peter Boyle 2022-07-07 17:49:20 +01:00
  • 33e4a0caee Imported changes from feature/gparity_HMC branch: Rework of WilsonFlow class. Fixed logic error in smear method where the step index was initialized to 1 rather than 0, resulting in the logged output value of tau being too large by epsilon. Previously smear_adaptive would maintain the current value of tau as a class member variable whereas smear would compute it separately; now both methods maintain the current value internally and it is updated by the evolve_step routines. Both evolve methods are now const. smear_adaptive now also maintains the current value of epsilon internally, allowing it to be a const method and also allowing the same class instance to be reused without needing to be reset. Replaced the fixed evaluation of the plaquette energy density and plaquette topological charge during the smearing with a highly flexible general strategy where the user can add arbitrary measurements as functional objects that are evaluated at an arbitrary frequency. By default the same plaquette-based measurements are performed, but additional example functions are provided where the smearing is performed with different choices of measurement that are returned as an array for further processing. Added a method to compute the energy density using the Cloverleaf approach which has smaller discretization errors. Added a new tensor utility operation, copyLane, which allows for the copying of a single SIMD lane between two instances of the same tensor type but potentially different precisions. To LocalCoherenceLanczos, added the option to compute the high/low eval of the fine operator on every restart to aid in tuning the Chebyshev. Added Test_field_array_io which demonstrates and tests a single-file write of an arbitrary array of fields. Added Test_evec_compression which generates evecs using Lanczos and attempts to compress them using the local coherence technique. Added Test_compressed_lanczos_gparity which demonstrates the local coherence Lanczos for G-parity BCs. Added HMC main programs for the 40ID and 48ID G-parity lattices. Christopher Kelly 2022-07-01 14:10:59 -04:00