portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-05-15 14:44:30 +01:00

Author	SHA1	Message	Date
Peter Boyle	87697eb07e	Shared compile	2023-03-14 09:07:36 -07:00
rhodgson	a3e935c902	Batched block project/promote size checks	2023-02-27 11:38:16 +00:00
rhodgson	7731c7db8e	Add huge cache type and allow Ncache==0	2023-02-26 14:15:28 +00:00
rhodgson	ff97340324	Expose cached bytes	2023-02-26 12:22:45 +00:00
Christopher Kelly	83d86943db	Fixed compile bug in MemoryManagerShared caused by Audit function not being passed a string	2023-02-23 13:09:45 -05:00
Christopher Kelly	e82cf1d311	Further prec-change improvements. Mixed prec CG algorithm has been modified to precompute precision change workspaces. As the original Test_dwf_mixedcg_prec has been coopted to do a performance stability and reproducibility test, requiring the single-prec CG to be run 200 times, I have created a new version of Test_dwf_mixedcg_prec in the solver subdirectory that just does the mixed vs double CG test.	2023-02-23 09:45:29 -05:00
Christopher Kelly	1db58a8acc	Precision change improvements. Added a new, much faster implementation of precision change that uses (optionally) a precomputed workspace containing pointer offsets that is device resident, such that all lattice copying occurs only on the device and no host<->device transfer is required, other than the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When this new precisionChange is called without a workspace, one will be computed on-the-fly; however it is still considerably faster than the original implementation. In the special case of using double2 and when the Grids are the same, calls to the new precisionChange will automatically use precisionChangeFast, such that there is a single API call for all precision changes. Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces. Renamed the original precisionChange as precisionChangeOrig. Fixed incorrect pointer offset bug in copyLane. Added a test and a benchmark for precisionChange. Added a test for reliable update CG.	2023-02-21 10:52:42 -05:00
rhodgson	920a51438d	Added batched Mixed precision CG	2023-02-14 17:04:13 +00:00
rhodgson	be528b6d27	Add batched block project/promote functions	2023-02-14 14:37:10 +00:00
Peter Boyle	ccd21f96ff	Plaquette agreeing and moving to final form (slowly) need to optimise	2023-02-01 22:57:44 -05:00
Peter Boyle	4b90cb8888	First cut passes combining padded cell with general stencil towards fast plaquette and staggered force	2023-02-01 22:14:10 -05:00
Peter Boyle	796abfad80	Merge pull request #422 from fjosw/fix/NVCC_DIAG_PRAGMA_SUPPORT: Disable diagnostic pragma warnings for CUDA 12+	2023-01-17 09:34:49 -05:00
fjosw	ad0270ac8c	fix: diagnostic pragma warnings fixed for CUDA 12+	2023-01-12 12:36:30 +00:00
Makis Kappas	7d62f1d6d2	Populate the Cshift_table in the GPU. Cshift is allocated in Unified memory and used in the LambdaApply kernels but also populated from the host. This creates a lot of Unified HtoD and DtoH mem operations and has a negative effect on performance. With this commit we populate the Cshift table on the device with the populate_Cshift_table() kernel.	2023-01-11 21:26:25 +00:00
Christoph Lehner	458c943987	merged upstream	2022-12-31 11:16:21 +02:00
Christoph Lehner	88015b0858	Split sum in rankSum and GlobalSum	2022-12-26 10:01:32 +01:00
Peter Boyle	4ca1bf7cca	Added gauge invariance test	2022-12-21 07:23:16 -05:00
Peter Boyle	2ff868f7a5	CPU open doesn't need to free space	2022-12-20 05:10:23 -05:00
Peter Boyle	ede02b6883	Memory manager debug Felix case	2022-12-20 05:10:23 -05:00
Peter Boyle	1822ced302	Bug fix	2022-12-20 05:10:23 -05:00
Peter Boyle	37ba32776f	More logging	2022-12-20 05:10:23 -05:00
Peter Boyle	99b3697b03	More logging	2022-12-20 05:10:23 -05:00
Peter Boyle	43a45ec97b	SSC_START	2022-12-20 05:10:23 -05:00
Peter Boyle	b00a4142e5	A=A fix	2022-12-20 05:10:23 -05:00
Peter Boyle	3791bc527b	Logging pulled in from dirichlet branch	2022-12-20 05:10:23 -05:00
Peter Boyle	d8c29f5fcf	Updated FFT test for PETSc	2022-12-18 12:05:00 -05:00
Peter Boyle	281f8101fe	Matt FFT test	2022-12-17 20:35:33 -05:00
Peter Boyle	472ed2dd5c	Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet	2022-12-17 20:17:09 -05:00
Peter Boyle	4f85672674	Simpler test for PETSc	2022-12-17 20:16:11 -05:00
Peter Boyle	dc747c54be	Merge branch 'develop' into feature/dirichlet. Conflicts: Grid/qcd/action/fermion/WilsonCompressor.h, Grid/stencil/Stencil.h	2022-12-13 08:24:58 -05:00
Peter Boyle	140684d706	Head to head vs HMC	2022-12-13 08:15:38 -05:00
Peter Boyle	5bb7ba92fa	Test for DDHMC force term	2022-12-13 08:15:11 -05:00
Peter Boyle	b54d0f3c73	Smaller deltaH down to 7000s on t=0.5 trajectory	2022-12-13 08:14:27 -05:00
Peter Boyle	ff6777a98d	Variable depth experiments	2022-12-13 08:13:51 -05:00
Peter Boyle	07acfe89f2	Merge pull request #417 from rrhodgson/feature/fermtoprop: Feature/fermtoprop	2022-12-06 12:45:03 -05:00
rhodgson	40234f531f	FermToProp accelerator_for -> thread_for	2022-12-06 17:34:51 +00:00
rhodgson	d49694f38f	PropToFerm fix	2022-12-06 15:48:54 +00:00
Chulwoo Jung	dc6a38f177	Minor cleanup	2022-11-30 17:13:12 -05:00
Chulwoo Jung	82c1ecf60f	Block lanczos added	2022-11-30 16:08:40 -05:00
Peter Boyle	67f569354e	Partial dirichlet changes	2022-11-30 15:51:13 -05:00
Peter Boyle	97a098636d	FermToProp	2022-11-30 15:36:35 -05:00
Peter Boyle	e13930c8b2	Faster fermtoprop case	2022-11-30 15:11:29 -05:00
Peter Boyle	5fa573dfd3	partial send fix	2022-11-25 00:51:04 -05:00
Peter Boyle	f6402cb6c4	AUDIT removal	2022-11-25 00:50:33 -05:00
Peter Boyle	bae6c263dc	Audit	2022-11-25 00:47:01 -05:00
Peter Boyle	d71672dca9	Bug fix	2022-11-25 00:46:35 -05:00
Peter Boyle	121c9e2ceb	Tracing	2022-11-25 00:45:21 -05:00
Peter Boyle	63a30ae34f	Tracing	2022-11-25 00:45:05 -05:00
Peter Boyle	7d8231ba32	Tracing	2022-11-25 00:44:57 -05:00
Peter Boyle	b690b1cbe9	Audit	2022-11-25 00:43:57 -05:00

7273 Commits