portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2024-11-10 15:55:37 +00:00

Author	SHA1	Message	Date
Antonin Portelli	5803933aea	First implementation of HDF5 serial IO writer, reader is still empty	2017-01-17 16:21:18 -08:00
paboyle	d15ab66aae	FFT moves higher in include order	2016-08-31 00:25:22 +01:00
paboyle	17097a93ec	FFTW test ran over 4 mpi processes.	2016-08-17 01:33:55 +01:00
paboyle	a0676beeb1	Open up dependency on Eigen and FFTW	2016-07-07 22:31:07 +01:00
paboyle	165bffc2e7	Avx512 changes for assembler kernels	2016-03-26 22:25:45 -06:00
Peter Boyle	7f927a541c	Shmem related fixes for shmem compile	2016-02-11 07:37:39 -06:00
paboyle	aae8bf31a7	Global edit adding copyright and license info to every source file.	2016-01-02 14:51:32 +00:00
paboyle	0afcf1cf13	Moved all the HMC tests over to using a single HmcRunner class that manages checkpoint strategies and such like	2015-12-22 11:19:25 +00:00
paboyle	31ca609d12	HMC checkpointing . Need a general HMC framework to work in restart.	2015-12-20 02:29:51 +00:00
Peter Boyle	dc814f30da	Binary IO file for generic Grid array parallel I/O. Number of IO MPI tasks can be varied by selecting which dimensions use parallel IO and which dimensions use Serial send to boss I/O. Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes doing the I/O. Interpolates nicely between ALL nodes write their data, a single boss per time-plane in processor space [old UKQCD fortran code did this], and a single node doing all I/O. Not sure I have the transfer sizes big enough and am not overly convinced fstream is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero. Practically it has worked on 8 tasks, 2x1x2x2 writing /cloning NERSC configurations on my MacOS + OpenMPI and Clang environment. It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from each node in order to gather bigger chunks at the syscall level. That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non parallel we get to 16MB contiguous chunks written in multi 4KB transactions per IOnode in 64^3 lattices for configuration I/O. I suspect this is fine for system performance.	2015-08-26 13:40:29 +01:00
Peter Boyle	35818fdf6c	Text and Binary readers	2015-08-20 23:04:38 +01:00
Peter Boyle	ab81a25073	XMLReader implementation and a virtual Reader/Writer template framework. Test_serialisation has an example of code free object serialisation to both ostream and to XML using macro magic. Implementing TextReader/TextWriter, YAML, JSON etc.. should be trivial and we can use configure time options to select the default "Reader" typedef. Present done with "using XMLPolicy::Reader" to pick up the default serialisation strategy.	2015-08-20 16:21:26 +01:00
Peter Boyle	4cc2ef84d3	Committing incomplete work for parameter file I/O. MacroMagic.h is central. Guido and I plan to move over to generating virtual (XML, JSON, YAML, text, binary) encoding from macro based system.	2015-07-27 18:32:28 +09:00
Peter Boyle	d1afebf71e	Sizable improvement in multigrid for unsquared. 6000 matmuls CG unprec 2000 matmuls CG prec (4000 eo muls) 1050 matmuls PGCR on 16^3 x 32 x 8 m=.01 Substantial effort on timing and logging infrastructure	2015-07-24 01:31:13 +09:00
Peter Boyle	1e5b015ee3	Some unary ops and coarse grid support	2015-06-09 10:26:19 +01:00
Peter Boyle	1d0df449e8	Reorganise of file naming	2015-06-03 12:47:05 +01:00
Peter Boyle	5644ab1e19	Large scale change to support 5d fermion formulations. Have 5d replicated wilson with 4d gauge working and matrix regressing to Ls copies of wilson.	2015-05-31 15:09:02 +01:00
Peter Boyle	840754dd42	Hand unrolled version of dslash in a separate class. Useful to compare; raises Intel compiler from 9GFlop/s to 17.5 Gflops. on ivybridge core. Raises Clang form 14.5 to 17.5	2015-05-26 19:54:03 +01:00
azusayamaguchi	91f29d4a68	Add messages to get the number of threads for openmp	2015-05-19 14:54:42 +01:00
Peter Boyle	11cb3e9a01	Getting closer to having a wilson solver... introducing a first and untested cut at Conjugate gradient. Also copied in Remez, Zolotarev, Chebyshev from Mike Clark, Tony Kennedy and my BFM package respectively since we know we will need these. I wanted the structure of algorithms/approx algorithms/iterative etc.. to start taking shape.	2015-05-18 07:47:05 +01:00
Peter Boyle	c0977dcfaa	strong inline required to force icpc	2015-05-15 11:31:41 +01:00
Peter Boyle	48f425d31c	I have made the Cshift work successfully with open mp threading in every routine. Collapse(2) is now working under clang-omp++.	2015-05-13 00:31:00 +01:00
Peter Boyle	6cec662ac5	Enhanced SIMD interfacing	2015-05-12 20:41:44 +01:00
Peter Boyle	6103c29ee3	Threading support rework. Placed parallel pragmas as macros; implemented deterministic thread reduction in style of BFM.	2015-05-12 07:51:41 +01:00
Peter Boyle	22d384b07d	Adding a better controlled threading class, preparing to force in deterministic reduction.	2015-05-11 18:59:03 +01:00
Peter Boyle	f5dcca7b1b	Got command line args working	2015-05-11 14:36:48 +01:00
paboyle	379943abf5	Command line args and a general clean up	2015-05-11 12:43:10 +01:00
Peter Boyle	193860dbc8	Comms and memory benchmarks added	2015-05-03 09:44:47 +01:00
Peter Boyle	94f728bee4	Big updates with progress towards wilson matrix	2015-04-26 15:51:09 +01:00
Peter Boyle	3083d2e908	Rename Grid_QCD	2015-04-23 20:42:09 +01:00
Peter Boyle	b32c14b433	Got the NERSC IO working and fixed a bug in cshift.	2015-04-22 22:46:48 +01:00
Peter Boyle	8ddfa7e6b0	Reorganisation	2015-04-18 21:23:32 +01:00
Peter Boyle	6eae2c1083	Shrinking and organising the files	2015-04-18 20:44:19 +01:00
Peter Boyle	cffad66894	Reorganise to keep files smaller	2015-04-18 18:36:48 +01:00
Peter Boyle	2ee9322a8f	Clean up caps.	2015-04-18 17:09:48 +01:00
Peter Boyle	c656164015	Reorg of build structure	2015-04-18 14:55:00 +01:00

36 Commits
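
Note on commit dc814f30da: the message says the number of I/O tasks is set by choosing which processor-grid dimensions do parallel I/O and which funnel their data to a boss node. The sketch below is only an illustration of that arithmetic, not code from Grid. It assumes each dimension is either fully parallel or fully serial, and the function name io_task_counts is hypothetical.

```python
from itertools import product
from math import prod


def io_task_counts(processor_grid):
    """Return the sorted set of possible I/O task counts.

    Assumption (not from the Grid source): every processor-grid dimension
    is either fully parallel for I/O or fully serial (sent to a boss), so
    the task count is the product of the extents of the parallel dimensions.
    """
    counts = set()
    for mask in product([False, True], repeat=len(processor_grid)):
        # The product over an empty selection is 1: a single node does all I/O.
        counts.add(prod(p for p, parallel in zip(processor_grid, mask) if parallel))
    return sorted(counts)


grid = (4, 4, 8, 8)          # the 1024-node example from the commit message
print(prod(grid))            # total number of nodes
print(io_task_counts(grid))  # every I/O task count reachable under the assumption
```

Under this assumption the counts quoted in the commit message, {1,8,32,64,128,256,1024}, all appear in the result, together with 4 and 16, which the message (an example list introduced with "say") does not mention.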