Peter Boyle
23298acb81
Merge pull request #424 from giltirn/feature/dirichlet-precchange
...
Precision change implementation
2023-03-22 23:04:52 -04:00
Peter Boyle
b5b759df73
Merge branch 'develop' into feature/dirichlet
2023-03-21 16:05:46 -04:00
Christopher Kelly
1db58a8acc
Precision change improvements
...
Added a new, much faster implementation of precision change. It can optionally use a precomputed, device-resident workspace of pointer offsets, so that all lattice copying occurs on the device and the only host<->device transfer required is the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When the new precisionChange is called without a workspace, one is computed on the fly; this is still considerably faster than the original implementation.
In the special case of using double2 and when the Grids are the same, calls to the new precisionChange will automatically use precisionChangeFast, such that there is a single API call for all precision changes.
Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces
Renamed the original precisionChange as precisionChangeOrig
Fixed incorrect pointer offset bug in copyLane
Added a test and a benchmark for precisionChange
Added a test for reliable update CG
2023-02-21 10:52:42 -05:00
Peter Boyle
67f569354e
Partial dirichlet changes
2022-11-30 15:51:13 -05:00
Peter Boyle
fe6e8f5ac6
Benchmark_comms fix
2022-11-15 17:00:49 -05:00
Peter Boyle
0ae0e5f436
Partial Dirichlet test
2022-11-15 16:40:38 -05:00
Peter Boyle
653039695b
Partial dirichlet changes
2022-11-15 16:37:15 -05:00
Peter Boyle
c82b164f6b
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
2022-10-04 17:41:48 -04:00
Peter Boyle
413312f9a9
Benchmark the halo construction.
...
The byte counts are out and should be doubled for SIMD directions
2022-10-04 11:12:59 -07:00
Peter Boyle
25df2d2c3b
Various precision options
2022-09-27 10:57:12 -04:00
Peter Boyle
cd5cf6d614
Tracing replaces self timing hooks
2022-08-31 17:33:41 -04:00
Peter Boyle
c0f8482402
Remove SSC marks
2022-07-07 17:49:36 +01:00
Peter Boyle
583f7c52f3
SSC mark
2022-06-01 19:27:29 -04:00
Peter Boyle
58a86c9164
SSC mark removal
2022-06-01 19:27:06 -04:00
Peter Boyle
18028f4309
Merge branch 'develop' into feature/dirichlet
2022-05-24 18:26:18 -07:00
Peter Boyle
aa008cbe99
Updated for new Dirichlet interface
2022-05-19 16:44:39 -07:00
4b1997e2f3
wilson sweep test
2022-05-16 15:58:33 +01:00
8939d5dc73
bugfix: eo operator called in correct location
2022-05-16 00:28:28 +01:00
Christoph Lehner
e2fc3a0f04
Merge pull request #28 from paboyle/develop
...
Sync with Upstream
2022-03-08 09:58:51 +01:00
Peter Boyle
5340e50427
HMC running with new formulation
2022-03-01 17:10:25 -05:00
Christoph Lehner
9616811c3d
Merge branch 'feature/gpt' of https://github.com/lehner/Grid into feature/gpt
2022-02-24 22:03:05 +01:00
Christoph Lehner
8a3002c03b
separate left and right masses for CayleyFermion5D
2022-02-24 22:02:56 +01:00
Peter Boyle
0f1c5b08a1
Dirichlet filters running on AMD and now integrated in Fermion op
2022-02-23 19:29:28 -05:00
Peter Boyle
70988e43d2
Passes multinode dirichlet test with boundaries at node boundary or at the single rank boundary
2022-02-23 01:42:14 -05:00
Peter Boyle
aab3bcb46f
Dirichlet first cut - wrong answers on dagger multiply.
...
Struggling to get a compute node so changing systems
2022-02-22 19:58:33 +00:00
Peter Boyle
135808dcfa
Less verbose
2021-12-07 16:24:24 -05:00
Peter Boyle
2bf3b4d576
Update to reduce memory footprint in benchmark test
2021-12-07 09:02:02 -08:00
Peter Boyle
ba7e371b90
Warning free compile on Tursa.
...
Hopefully got all reqd virtual dtors
2021-10-21 19:56:52 +01:00
Peter Boyle
8bd70ad8b5
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2021-09-16 10:22:38 -07:00
Peter Boyle
b4690e6091
Adding build basics for different systems
2021-09-16 00:00:38 +01:00
Peter Boyle
c7baeb5bae
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2021-09-14 08:31:11 -07:00
Peter Boyle
361bb8a101
Remove half prec comms
2021-09-14 15:06:29 +01:00
Peter Boyle
7efdb3cd2b
Remove half prec comms
2021-09-14 15:06:06 +01:00
Peter Boyle
bcfa9cf068
Improvement of output
2021-08-28 08:08:15 -07:00
Peter Boyle
75030637cc
Improved comms benchmark, same as benchmark_comms_host_device
2021-08-10 05:16:30 -07:00
Peter Boyle
fe5aaf7677
Make comms benchmark same as Benchmark_comms_host_device
2021-08-09 04:06:30 -07:00
Peter Boyle
1eea9d73b9
Pass serial RNG around
2021-03-03 23:50:01 +01:00
Peter Boyle
cf76741ec6
Intel DPCPP Gold happy now (compiles all, runs Benchmark_dwf_fp32)
2020-12-03 03:47:11 -08:00
Peter Boyle
147dc15d26
Update
2020-11-20 13:13:59 -05:00
Peter Boyle
8fcb392e24
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-11-17 04:51:31 -08:00
Peter Boyle
dd8d70eeff
Build without LIME
2020-11-17 04:41:15 -08:00
Peter Boyle
3aab983760
Flop count set as in DiRAC-ITT-2020 (mistakenly 20% low, but must maintain consistency)
2020-11-16 17:13:58 +01:00
Peter Boyle
9c4dcc5ea3
Merge branch 'master' into develop
2020-11-16 16:34:57 +01:00
Peter Boyle
e9bc748828
Useful GPU machine benchmark for GDR used to shakeout Booster at Juelich - see slack earlyaccess channel
2020-11-13 03:58:34 +01:00
Peter Boyle
f48156529b
Work on 2,2,2,8 ranks
2020-11-13 03:57:58 +01:00
Peter Boyle
f16c2665f5
Host memory explicit
2020-11-12 20:29:58 +01:00
Peter Boyle
41e28015ae
Volume divisible guarantee
2020-11-07 13:32:16 +01:00
Peter Boyle
3f06209720
Pretty print
2020-10-13 22:18:51 -04:00
c2b688abc9
Benchmark_IO: reducing max local volume to 32^4
2020-10-10 16:52:56 +01:00
b0d61b9687
Benchmark_IO cleaner output
2020-10-09 21:46:45 +01:00