portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2024-11-10 07:55:35 +00:00

Author	SHA1	Message	Date
azusayamaguchi	b6a65059a2	Update to use shared memory to contain the stencil comms buffers Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions	2016-10-24 17:30:43 +01:00
paboyle	a762b1fb71	MPI3 working with a bounce through shared memory on my laptop. Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the send between ranks on same node.	2016-10-21 09:03:26 +01:00
azusayamaguchi	8b0d171c9a	32bit issue on the KNL code variant where byte offsets were stored	2016-10-12 17:49:32 +01:00
paboyle	ff6da364e8	FFT double and single precision gives good performance now in multithreaded code.	2016-08-24 15:05:00 +01:00
Guido Cossu	fdfbf11c6d	Merge branch 'develop' into temporary-smearing	2016-07-04 18:45:10 +01:00
Guido Cossu	9cb90f714e	Merge remote-tracking branch 'origin/develop' into temporary-smearing	2016-07-04 17:28:40 +01:00
paboyle	fc4a043663	Colors and banner clean up	2016-07-02 16:15:38 +01:00
Guido Cossu	565e9329ba	Changed the colouring classes	2016-06-30 16:51:03 +01:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
Antonin Portelli	4bc21ec7cb	thread CL argument fix	2016-05-11 15:21:29 +01:00
Antonin Portelli	e99ce0875f	directly exit when using '--help' option	2016-05-01 16:05:16 -07:00
paboyle	60d965f79e	AVX512 improvements; sigfpe trapping too	2016-03-30 08:42:34 +01:00
Antonin Portelli	1eb169ac0b	compatibility fix	2016-02-23 16:36:50 +00:00
Antonin Portelli	497e7e4c53	BG/Q compatibility fix	2016-02-23 15:57:38 +00:00
Peter Boyle	2cfa20cc4e	Improving the logging, got fed up with color so optionally disable. Backtrace macro used everywhere	2016-02-21 07:58:53 -06:00
Peter Boyle	41c2b09184	Shmem comms [NO MPI] target added. The dwf test runs and passes. Not really shaken out to my satisfaction though as I want more tests done, so don't declare as working. But committing my current while I try a few experimentals.	2016-02-14 14:24:38 -06:00
paboyle	02452afd36	Optional overlap of comms with compute	2016-01-04 14:18:40 +00:00
paboyle	331768dcff	Added overlap comms compute mode	2016-01-03 01:38:11 +00:00
paboyle	4aac345bea	Updated logging to colour code according to message type	2016-01-02 17:21:14 +00:00
paboyle	15c0022042	GPLv2 clarified, and copyright message and banner in Init function. Color is just showing off....	2016-01-02 15:22:30 +00:00
paboyle	aae8bf31a7	Global edit adding copyright and license info to every source file.	2016-01-02 14:51:32 +00:00
Azusa Yamaguchi	78c4e862ef	Plaq, Rectangle, Iwasaki, Symanzik and DBW2 working and HMC regresses to http://arxiv.org/pdf/hep-lat/0610075.pdf	2015-12-28 16:38:31 +00:00
Peter Boyle	825875fd48	compile fixes	2015-11-29 00:24:25 +00:00
paboyle	899ca41cb8	Merge branch 'master' of github.com:paboyle/Grid Conflicts: lib/qcd/action/fermion/WilsonFermion5D.cc	2015-11-06 03:50:04 -08:00
paboyle	64770d9052	Threading changes for many core and asm calls	2015-11-06 03:46:21 -08:00
Azusa Yamaguchi	3281745fde	Exec info and linux check to stop non-portable code breaking	2015-11-06 10:31:24 +00:00
paboyle	63a2993827	Exec info and cache blocking	2015-11-04 03:16:56 -08:00
Peter Boyle	64d64d1ab6	Updating to modify non-inlining permute routines and hopefully get better reg use and enhance performance.	2015-09-25 08:55:04 -07:00
Peter Boyle	dc814f30da	Binary IO file for generic Grid array parallel I/O. Number of IO MPI tasks can be varied by selecting which dimensions use parallel IO and which dimensions use Serial send to boss I/O. Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes doing the I/O. Interpolates nicely between ALL nodes write their data, a single boss per time-plane in processor space [old UKQCD fortran code did this], and a single node doing all I/O. Not sure I have the transfer sizes big enough and am not overly convinced fstream is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero. Practically it has worked on 8 tasks, 2x1x2x2 writing/cloning NERSC configurations on my MacOS + OpenMPI and Clang environment. It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from each node in order to gather bigger chunks at the syscall level. That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non parallel we get to 16MB contiguous chunks written in multi 4KB transactions per IOnode in 64^3 lattices for configuration I/O. I suspect this is fine for system performance.	2015-08-26 13:40:29 +01:00
Peter Boyle	84a66476ab	Rework/global edit to enforce type templating of fermion operators. Allows multi-precision work and paves the way for alternate BC's and such like allowing for example G-parity which is important for K pipi programme. In particular, can drive an extra flavour index into the fermion fields using template types.	2015-08-10 20:47:44 +01:00
Peter Boyle	4cc2ef84d3	Committing incomplete work for parameter file I/O. MacroMagic.h is central. Guido and I plan to move over to generating virtual (XML, JSON, YAML, text, binary) encoding from macro based system.	2015-07-27 18:32:28 +09:00

31 Commits