mirror of https://github.com/paboyle/Grid.git synced 2025-06-16 06:47:06 +01:00
Commit Graph

55 Commits

Author SHA1 Message Date
3668e81c5e Extract slice working on checkerboard field for Block Lanczos 2024-08-27 11:31:30 -04:00
44b466e072 Make InsertSliceFast the default at some point in future.
Should I do this now?
2024-02-21 14:51:24 -05:00
addc638856 Fast localCopyRegion, blockProjectFast 2024-01-22 17:40:38 -05:00
ca5ae8a2e6 Revert to working. 2024-01-17 16:32:05 -05:00
b7c7000d0d Don't need the numerical rounding tolerance in multigrid 2023-12-22 18:10:23 -05:00
9feb801bb9 Much simpler GPU implementation 2023-12-21 15:24:06 -05:00
48d1f0df89 Optimised partially, working 2023-12-21 12:33:47 -05:00
2c54be651c Further updates 2023-11-29 09:43:29 -05:00
0a3682ad0b MultiRHS work 2023-11-28 07:43:37 -05:00
031f85247c MultiRHS initial support -- needs optimisation for multi project/promote.
Bug fix in freeing intermediate grids to stop double free
2023-11-23 18:18:35 -05:00
aa5047a9e4 Faster blockProject blockPromote 2023-10-24 10:49:55 -04:00
3bc2da5321 Merge branch 'feature/scidac-wp1' of https://github.com/paboyle/Grid into feature/scidac-wp1 2023-10-05 16:57:59 -04:00
59b9d0e030 coalesceRead the blockSum 2023-10-05 16:54:48 -04:00
6a87487544 Running on Frontier, fix RNG big volume y2k, affecting 5D RNG 2023-10-05 16:50:59 -04:00
f44dce390f Implemented accelerator-optimized versions of localCopyRegion and insertSliceLocal to speed up padding
Fixed const correctness on PaddedCell methods
Fixed compile issues on Crusher
Added timing breakdowns for PaddedCell::Expand and the padded implementations of the staples, visible under --log Performance
Optimized kernel for StaplePadded
Test_iwasaki_action_newstaple now repeats the calculation 10 times and reports average timings
2023-06-27 14:58:10 -04:00
9c8750f261 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2023-05-11 12:29:09 -04:00
f534523ede Debug 2023-05-11 12:23:11 -04:00
2376156fbc Merge branch 'develop' into feature/dirichlet 2023-03-27 21:33:50 -07:00
a3e935c902 Batched block project/promote size checks 2023-02-27 11:38:16 +00:00
1db58a8acc Precision change improvements
Added a new, much faster implementation of precision change that uses (optionally) a precomputed workspace containing pointer offsets that is device resident, such that all lattice copying occurs only on the device and no host<->device transfer is required, other than the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When this new precisionChange is called without a workspace, one will be computed on-the-fly; however it is still considerably faster than the original implementation.

In the special case of using double2 and when the Grids are the same, calls to the new precisionChange will automatically use precisionChangeFast, such that there is a single API call for all precision changes.

Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces

Renamed the original precisionChange as precisionChangeOrig

Fixed incorrect pointer offset bug in copyLane

Added a test and a benchmark for precisionChange

Added a test for reliable update CG
2023-02-21 10:52:42 -05:00
be528b6d27 Add batched block project/promote functions 2023-02-14 14:37:10 +00:00
204c283e16 Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet 2022-10-11 14:59:07 -04:00
551a5f8dc8 RRII gpu option 2022-10-11 14:44:55 -04:00
c82b164f6b Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet 2022-10-04 17:41:48 -04:00
7ffbc3e98e Double2 improved. Really don't like 'convertType' - localise to a GPT header
2022-09-27 10:35:31 -04:00
e4c117aabf Compile fix, multishift mixed prec support 2022-09-23 16:19:27 -04:00
1ad54d049d To PeriodicBC and ConjugateBC, added a new function "CshiftLink" which performs a boundary-aware C-shift of links or products of links. For the latter, the links crossing the global boundary are complex-conjugated.
To the gauge implementations, added CshiftLink functions calling into the appropriate operation for the BC in a given direction.
GaugeTransform, FourierAcceleratedGaugeFixer and WilsonLoops::FieldStrength no longer implicitly assume periodic boundary conditions; instead the shifted link is obtained using CshiftLink and is aware of the gauge implementation.
Added an assert-check to ensure that the gauge fixing converges within the specified number of steps.
Added functionality to compute the timeslice averaged plaquette
Added functionality to compute the 5LI topological charge and timeslice topological charge
Added a check of the properties of the charge conjugation matrix C=-gamma_2 gamma_4 to Test_gamma
Fixed const correctness for Replicate
Modified Test_fft_gfix to support either conjugate or periodic BCs, optionally disabling Fourier-accelerated gauge fixing, and tuning of alpha using cmdline options
2022-06-02 15:30:41 -04:00
7e130076d6 Fixed line left behind 2021-09-24 17:26:31 +01:00
a822c48565 Added accelerated pick-set checkerboard functions 2021-09-24 17:13:25 +01:00
e2abbf9520 Merge pull request #25 from paboyle/develop
Sync
2021-09-15 10:02:43 +02:00
2bb374daea hip-friendly 2021-03-19 11:33:23 +01:00
3215d88a91 Simplify syntax with Grid::EnableIf post code review. Updated EnableIf so that ReturnType defaults to void in same way as std::enable_if see https://en.cppreference.com/w/cpp/types/enable_if 2021-02-03 15:17:03 +00:00
77063418da Fix issue for GPU by ensuring accelerator_inline version of convertType is available for Grid::complex<T>. This removes many warnings in Hadrons
Simplify the SFINAE syntax and correct convertType for iScalar
2021-01-25 15:09:36 +00:00
f0dc0f3621 fix compile issue on Qpace3 2020-08-22 13:57:33 +02:00
dbaa24ebf6 further GPU memory access fixes (with this GPT passes all single-rank tests on non-summit GPUs) 2020-08-13 16:14:15 +02:00
b949cf6b12 PeekLocal needs a view to keep thread safe.
ALLOCATION_CACHEE reenable
2020-06-19 17:13:27 -04:00
b5e87e8d97 summit compile fixes 2020-06-12 18:16:12 -04:00
a7ffc61e82 acceleratorSIMTlane() 2020-06-10 19:58:33 -04:00
cdf0a04fc5 Merge branch 'develop' into sycl 2020-06-09 04:00:12 -04:00
1a4c8c3387 Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes. 2020-06-05 18:52:35 -04:00
7860a50f70 Make view specify where and drive data motion - first cut.
This is a compile-time option --enable-unified=yes/no
2020-05-21 16:13:16 -04:00
e9b295f967 Synchronize blocking infrastructure with GPT 2020-05-06 08:42:28 -04:00
6cdb09c884 Faster copy region 2020-04-10 11:10:52 -04:00
68b45f6444 Lower left/upper right region cut paste 2020-02-06 15:50:26 -05:00
1bd87c35d7 Read coalescing on Nvidia 2020-01-27 12:29:56 -05:00
9aafd20468 Simple block project promote runs faster on GPU 2019-12-17 05:01:39 -05:00
9e15474999 Accelerator loop attempt at speed up 2019-12-14 05:28:16 -05:00
152b525a4d Typo fix 2019-12-13 22:44:42 -05:00
d18994eddc offload more of mgrid to GPU 2019-12-13 22:08:11 -05:00
6b692aa726 Thread loops 2019-06-15 08:02:26 +01:00