Peter Boyle
5ae77876a8
Meson field and Aslash field on GPU; some compiler warnings removed
2024-10-18 19:08:06 -04:00
066544281f
Deprecate UVM
2024-09-17 13:34:27 +00:00
Peter Boyle
53573d7d94
Better benchmark
2024-08-20 14:31:57 +00:00
Peter Boyle
f8f408e7a9
BLAS everywhere
2024-07-25 18:09:02 +00:00
517822fdd2
SPR HBM benchmarking right and also PVC batched GEMM
2024-03-06 00:02:27 +00:00
Peter Boyle
c805f86343
USQCD benchmark
2024-03-01 00:05:04 -05:00
Peter Boyle
04ca065281
Only one rank opens
2024-02-29 20:09:11 -05:00
Peter Boyle
88d8fa43d7
Benchmark development
2024-02-29 20:01:44 -05:00
303b83cdb8
Scaling benchmarks, verbosity and MPICH aware in acceleratorInit()
...
For some reason Dirichlet benchmark fails on several nodes; need to
debug this.
2024-02-13 19:48:03 +00:00
Peter Boyle
14643c0aab
SDCC benchmarking scripts for A100 nodes and IceLake nodes (AVX512)
2023-12-04 15:45:57 -05:00
Peter Boyle
86dac5ff4f
Better printing
2023-04-04 07:42:19 -07:00
Peter Boyle
900e01f49b
Temporary
2023-03-27 21:35:06 -07:00
Peter Boyle
23298acb81
Merge pull request #424 from giltirn/feature/dirichlet-precchange
...
Precision change implementation
2023-03-22 23:04:52 -04:00
Peter Boyle
b5b759df73
Merge branch 'develop' into feature/dirichlet
2023-03-21 16:05:46 -04:00
Christopher Kelly
1db58a8acc
Precision change improvements
...
Added a new, much faster implementation of precision change that uses (optionally) a precomputed workspace containing pointer offsets that is device resident, such that all lattice copying occurs only on the device and no host<->device transfer is required, other than the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When this new precisionChange is called without a workspace, one will be computed on-the-fly; however it is still considerably faster than the original implementation.
In the special case of using double2 and when the Grids are the same, calls to the new precisionChange will automatically use precisionChangeFast, such that there is a single API call for all precision changes.
Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces
Renamed the original precisionChange as precisionChangeOrig
Fixed incorrect pointer offset bug in copyLane
Added a test and a benchmark for precisionChange
Added a test for reliable update CG
2023-02-21 10:52:42 -05:00
Peter Boyle
67f569354e
Partial dirichlet changes
2022-11-30 15:51:13 -05:00
Peter Boyle
fe6e8f5ac6
Benchmark_comms fix
2022-11-15 17:00:49 -05:00
Peter Boyle
0ae0e5f436
Partial Dirichlet test
2022-11-15 16:40:38 -05:00
Peter Boyle
653039695b
Partial dirichlet changes
2022-11-15 16:37:15 -05:00
Peter Boyle
c82b164f6b
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
2022-10-04 17:41:48 -04:00
Peter Boyle
413312f9a9
Benchmark the halo construction.
...
The byte counts are out and should be doubled for SIMD directions
2022-10-04 11:12:59 -07:00
Peter Boyle
25df2d2c3b
Various precision options
2022-09-27 10:57:12 -04:00
Peter Boyle
cd5cf6d614
Tracing replaces self timing hooks
2022-08-31 17:33:41 -04:00
Peter Boyle
c0f8482402
Remove SSC marks
2022-07-07 17:49:36 +01:00
Peter Boyle
583f7c52f3
SSC mark
2022-06-01 19:27:29 -04:00
Peter Boyle
58a86c9164
SSC mark removal
2022-06-01 19:27:06 -04:00
Peter Boyle
18028f4309
Merge branch 'develop' into feature/dirichlet
2022-05-24 18:26:18 -07:00
Peter Boyle
aa008cbe99
Updated for new Dirichlet interface
2022-05-19 16:44:39 -07:00
4b1997e2f3
wilson sweep test
2022-05-16 15:58:33 +01:00
8939d5dc73
bugfix: eo operator called in correct location
2022-05-16 00:28:28 +01:00
Christoph Lehner
e2fc3a0f04
Merge pull request #28 from paboyle/develop
...
Sync with Upstream
2022-03-08 09:58:51 +01:00
Peter Boyle
5340e50427
HMC running with new formulation
2022-03-01 17:10:25 -05:00
Christoph Lehner
9616811c3d
Merge branch 'feature/gpt' of https://github.com/lehner/Grid into feature/gpt
2022-02-24 22:03:05 +01:00
Christoph Lehner
8a3002c03b
separate left and right masses for CayleyFermion5D
2022-02-24 22:02:56 +01:00
Peter Boyle
0f1c5b08a1
Dirichlet filters running on AMD and now integrated in Fermion op
2022-02-23 19:29:28 -05:00
Peter Boyle
70988e43d2
Passes multinode dirichlet test with boundaries at
...
node boundary or at the single rank boundary
2022-02-23 01:42:14 -05:00
Peter Boyle
aab3bcb46f
Dirichlet first cut - wrong answers on dagger multiply.
...
Struggling to get a compute node so changing systems
2022-02-22 19:58:33 +00:00
Peter Boyle
135808dcfa
Less verbose
2021-12-07 16:24:24 -05:00
Peter Boyle
2bf3b4d576
Update to reduce memory footprint in benchmark test
2021-12-07 09:02:02 -08:00
Peter Boyle
ba7e371b90
Warning free compile on Tursa.
...
Hopefully got all reqd virtual dtors
2021-10-21 19:56:52 +01:00
Peter Boyle
8bd70ad8b5
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2021-09-16 10:22:38 -07:00
Peter Boyle
b4690e6091
Adding build basics for different systems
2021-09-16 00:00:38 +01:00
Peter Boyle
c7baeb5bae
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2021-09-14 08:31:11 -07:00
Peter Boyle
361bb8a101
Remove half prec comms
2021-09-14 15:06:29 +01:00
Peter Boyle
7efdb3cd2b
Remove half prec comms
2021-09-14 15:06:06 +01:00
Peter Boyle
bcfa9cf068
Improvement of output
2021-08-28 08:08:15 -07:00
Peter Boyle
75030637cc
Improved comms benchmark, same as benchmark_comms_host_device
2021-08-10 05:16:30 -07:00
Peter Boyle
fe5aaf7677
Make comms benchmark same as Benchmark_comms_host_device
2021-08-09 04:06:30 -07:00
Peter Boyle
1eea9d73b9
Pass serial RNG around
2021-03-03 23:50:01 +01:00
Peter Boyle
cf76741ec6
Intel DPCPP Gold happy now (compiles all, runs Benchmark_dwf_fp32)
2020-12-03 03:47:11 -08:00