portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2025-08-12 17:27:06 +01:00

Author	SHA1	Message	Date
Peter Boyle	0f2b786436	Vector -> vector	2023-03-21 15:36:11 -04:00
Peter Boyle	e1c326558a	COmms improvements	2023-03-21 08:53:56 -07:00
Peter Boyle	bae0f8ea99	Merge pull request #425 from rrhodgson/feature/CacheLogging Huge Cache	2023-03-21 08:59:08 -04:00
Peter Boyle	bbbcd36ae5	Merge pull request #426 from rrhodgson/feature/LCDeflation Batched Local Coherence Tools	2023-03-21 08:58:40 -04:00
Peter Boyle	39c0815d9e	WriteDiscard	2023-03-21 08:57:29 -04:00
Peter Boyle	a997d24743	Remove nofma	2023-03-14 12:10:31 -07:00
Peter Boyle	861e5d7f4c	SYCL version update. Why do they keep making incompatible changes	2023-03-14 12:10:02 -07:00
Peter Boyle	14cc142a14	Warning remove	2023-03-14 12:09:26 -07:00
Peter Boyle	f36b87deb5	syscall fix	2023-03-14 12:09:00 -07:00
Peter Boyle	eeb6e0a6e3	Renable cache blocking and efficient UPI type SHM comms	2023-03-14 09:10:27 -07:00
Peter Boyle	cad5b187dd	Cleanup	2023-03-14 09:08:16 -07:00
Peter Boyle	87697eb07e	SHared compile	2023-03-14 09:07:36 -07:00
Raoul Hodgson	a3e935c902	Batched block project/promote size checks	2023-02-27 11:38:16 +00:00
Raoul Hodgson	7731c7db8e	Add huge cache type and allow Ncache==0	2023-02-26 14:15:28 +00:00
Raoul Hodgson	ff97340324	Expose cached bytes	2023-02-26 12:22:45 +00:00
Christopher Kelly	83d86943db	Fixed compile bug in MemoryManagerShared caused by Audit function not being passed a string	2023-02-23 13:09:45 -05:00
Christopher Kelly	e82cf1d311	Further prec-change improvements. Mixed prec CG algorithm has been modified to precompute precision change workspaces. As the original Test_dwf_mixedcg_prec has been coopted to do a performance stability and reproducibility test, requiring the single-prec CG to be run 200 times, I have created a new version of Test_dwf_mixedcg_prec in the solver subdirectory that just does the mixed vs double CG test.	2023-02-23 09:45:29 -05:00
Christopher Kelly	1db58a8acc	Precision change improvements. Added a new, much faster implementation of precision change that uses (optionally) a precomputed workspace containing pointer offsets that is device resident, such that all lattice copying occurs only on the device and no host<->device transfer is required, other than the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When this new precisionChange is called without a workspace, one will be computed on-the-fly; however it is still considerably faster than the original implementation. In the special case of using double2 and when the Grids are the same, calls to the new precisionChange will automatically use precisionChangeFast, such that there is a single API call for all precision changes. Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces. Renamed the original precisionChange as precisionChangeOrig. Fixed incorrect pointer offset bug in copyLane. Added a test and a benchmark for precisionChange. Added a test for reliable update CG.	2023-02-21 10:52:42 -05:00
Raoul Hodgson	920a51438d	Added batched Mixed precision CG	2023-02-14 17:04:13 +00:00
Raoul Hodgson	be528b6d27	Add batched block project/promote functions	2023-02-14 14:37:10 +00:00
Peter Boyle	796abfad80	Merge pull request #422 from fjosw/fix/NVCC_DIAG_PRAGMA_SUPPORT Disable diagnostic pragma warnings for CUDA 12+	2023-01-17 09:34:49 -05:00
Fabian Joswig	ad0270ac8c	fix: diagnostic pragma warnings fixed for CUDA 12+	2023-01-12 12:36:30 +00:00
Makis Kappas	7d62f1d6d2	Populate the Cshift_table in the GPU. Cshift is allocated in Unified memory and used in the LambdaApply kernels but also populated from the host. This creates a lot of Unified HtoD and DtoH mem operations and has a negative effect in performance. With this commit we populate the Cshift table in the device with the populate_Cshift_table() kernel.	2023-01-11 21:26:25 +00:00
Christoph Lehner	458c943987	merged upstream	2022-12-31 11:16:21 +02:00
Christoph Lehner	88015b0858	Split sum in rankSum and GlobalSum	2022-12-26 10:01:32 +01:00
Peter Boyle	4ca1bf7cca	Added gauge invariance test	2022-12-21 07:23:16 -05:00
Peter Boyle	2ff868f7a5	CPU open doesn't need to free space	2022-12-20 05:10:23 -05:00
Peter Boyle	ede02b6883	Memory manager debug Felix case	2022-12-20 05:10:23 -05:00
Peter Boyle	1822ced302	Bug fix	2022-12-20 05:10:23 -05:00
Peter Boyle	37ba32776f	More logging	2022-12-20 05:10:23 -05:00
Peter Boyle	99b3697b03	More loggin	2022-12-20 05:10:23 -05:00
Peter Boyle	43a45ec97b	SSC_START	2022-12-20 05:10:23 -05:00
Peter Boyle	b00a4142e5	A=A fix	2022-12-20 05:10:23 -05:00
Peter Boyle	3791bc527b	Logging pulled in from dirichlet branch	2022-12-20 05:10:23 -05:00
Peter Boyle	d8c29f5fcf	Updated FFT test for PETSc	2022-12-18 12:05:00 -05:00
Peter Boyle	281f8101fe	Matt FFT test	2022-12-17 20:35:33 -05:00
Peter Boyle	472ed2dd5c	Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet	2022-12-17 20:17:09 -05:00
Peter Boyle	4f85672674	Simpler test for PETSc	2022-12-17 20:16:11 -05:00
Peter Boyle	dc747c54be	Merge branch 'develop' into feature/dirichlet. Conflicts: Grid/qcd/action/fermion/WilsonCompressor.h, Grid/stencil/Stencil.h	2022-12-13 08:24:58 -05:00
Peter Boyle	140684d706	Head to head vs HMC	2022-12-13 08:15:38 -05:00
Peter Boyle	5bb7ba92fa	Test for DDHMC force term	2022-12-13 08:15:11 -05:00
Peter Boyle	b54d0f3c73	Smaller deltaH down to 7000s on t=0.5 trajectory	2022-12-13 08:14:27 -05:00
Peter Boyle	ff6777a98d	Variable depth experiments	2022-12-13 08:13:51 -05:00
Peter Boyle	07acfe89f2	Merge pull request #417 from rrhodgson/feature/fermtoprop Feature/fermtoprop	2022-12-06 12:45:03 -05:00
Raoul Hodgson	40234f531f	FermToProp accelerator_for -> thread_for	2022-12-06 17:34:51 +00:00
Raoul Hodgson	d49694f38f	PropToFerm fix	2022-12-06 15:48:54 +00:00
Chulwoo Jung	dc6a38f177	Minor cleanup	2022-11-30 17:13:12 -05:00
Chulwoo Jung	82c1ecf60f	Block lanczos added	2022-11-30 16:08:40 -05:00
Peter Boyle	67f569354e	Partial dirichlet changes	2022-11-30 15:51:13 -05:00
Peter Boyle	97a098636d	FermToProp	2022-11-30 15:36:35 -05:00

7182 Commits