mirror of https://github.com/paboyle/Grid.git synced 2025-08-14 10:11:53 +01:00

Peter Boyle dc814f30da Binary IO file for generic Grid array parallel I/O.

Number of IO MPI tasks can be varied by selecting which
dimensions use parallel IO and which dimensions use Serial send to boss
I/O.

Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes
doing the I/O.
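As a rough sketch of the counting described above: the number of I/O tasks is the product of the processor-grid extents over the dimensions chosen for parallel I/O, since each serial dimension collapses onto one boss. The helper name `ioTaskCount` and its arguments are hypothetical illustrations, not taken from the Grid source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Grid): counts the MPI tasks that perform
// I/O when only some dimensions of the processor grid use parallel I/O.
// In a serial dimension, all tasks send their data to a single boss.
int ioTaskCount(const std::vector<int>& processors,
                const std::vector<bool>& parallelIo) {
  assert(processors.size() == parallelIo.size());
  int tasks = 1;
  for (std::size_t d = 0; d < processors.size(); ++d) {
    if (parallelIo[d]) tasks *= processors[d];  // parallel dimension: every slice writes
  }
  return tasks;
}
```

For a 4x4x8x8 processor grid, marking only the last dimension as parallel gives 8 I/O tasks, the last two give 64, and all four give 1024.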

Interpolates nicely between ALL nodes writing their data, a single boss per time-plane
in processor space [old UKQCD fortran code did this], and a single node doing all I/O.

Not sure I have the transfer sizes big enough and am not overly convinced fstream
is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero.

Practically it has worked on 8 tasks, 2x1x2x2 writing/cloning NERSC configurations
on my MacOS + OpenMPI and Clang environment.

It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from
each node in order to gather bigger chunks at the syscall level.

That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non
parallel we get to 16MB contiguous chunks written in multi 4KB transactions
per IOnode in 64^3 lattices for configuration I/O.
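The chunk-size arithmetic above can be checked with a small sketch. It assumes one reading of the numbers that the message does not spell out: 18 reals per gauge link (a 3x3 complex matrix), 4 links per site, 8 bytes per double-precision real, and an x-strip of 8 sites. All names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed interpretation of "18*4*8" (not confirmed by the commit message):
constexpr std::size_t kRealsPerLink = 18;  // 3x3 complex matrix = 18 reals
constexpr std::size_t kLinksPerSite = 4;   // one link per direction
constexpr std::size_t kBytesPerReal = 8;   // double precision

// Bytes stored for one lattice site under the assumptions above.
constexpr std::size_t bytesPerSite() {
  return kRealsPerLink * kLinksPerSite * kBytesPerReal;
}

// Bytes in one contiguous strip of sites along x.
std::size_t stripBytes(std::size_t sitesInStrip) {
  assert(sitesInStrip > 0);
  return sitesInStrip * bytesPerSite();
}
```

Under these assumptions one site is 576 bytes and an 8-site strip is 4608 bytes, which is in line with the "circa 4KB" write chunk. The 16MB figure depends on the local volume per I/O node, which is not stated, so it is not reproduced here.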

I suspect this is fine for system performance.

2015-08-26 13:40:29 +01:00

benchmarks

Gparity test added; partial implementation -- this is Chris K's doubled lattice only

2015-08-12 09:49:33 +01:00

docs

Small modification to the configure files

2015-06-04 14:17:58 +09:00

gcc-bug-report

Sizable improvement in multigrid for unsquared.

2015-07-24 01:31:13 +09:00

lib

Binary IO file for generic Grid array parallel I/O.

2015-08-26 13:40:29 +01:00

fix of AX_GCC_X86_AVX_XGETBV macro

2015-07-17 11:15:57 +09:00

scripts

Binary IO file for generic Grid array parallel I/O.

2015-08-26 13:40:29 +01:00

tests

Binary IO file for generic Grid array parallel I/O.

2015-08-26 13:40:29 +01:00

.gitignore

gitignore update

2015-07-17 11:15:17 +09:00

AUTHORS

Update AUTHORS

2015-03-07 07:00:39 +00:00

ChangeLog

Updating build system

2015-03-04 04:53:40 +00:00

configure

Merge problem fixed

2015-08-01 22:30:00 +09:00

configure.ac

Small change in the HMC interface.

2015-07-30 17:16:57 +09:00

COPYING

Extra files

2015-03-04 12:03:07 +00:00

LICENSE

Initial commit

2015-03-04 02:30:11 +00:00

Makefile.am

Reorganise of file naming

2015-06-03 12:47:05 +01:00

NEWS

Updating build system

2015-03-04 04:53:40 +00:00

README

Added check of mpfr and gmp at configure time

2015-05-19 13:54:55 +09:00

README.md

Check at configure time if CPU supports the requested SIMD optimization

2015-05-27 18:30:11 +09:00

TODO

Binary IO file for generic Grid array parallel I/O.

2015-08-26 13:40:29 +01:00

README.md

Grid

Data parallel C++ mathematical object library

This library provides data parallel C++ container classes with internal memory layout that is transformed to map efficiently to SIMD architectures. CSHIFT facilities are provided, similar to HPF and cmfortran, and user control is given over the mapping of array indices to both MPI tasks and SIMD processing elements.

Identically shaped arrays can then be processed with perfect data parallelisation.
Such identically shaped arrays are called conformable arrays.
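To make the CSHIFT idea concrete, here is a minimal one-dimensional sketch using the HPF/Fortran convention, where element `i` of the result is element `i + shift` of the input, wrapping around at the ends. The function `cshift` below is a standalone illustration on a `std::vector`, not Grid's own interface, which operates on distributed lattice objects.

```cpp
#include <cassert>
#include <vector>

// Illustrative circular shift (HPF-style convention, assumed here):
// out[i] = in[(i + shift) mod n], with negative shifts wrapping correctly.
std::vector<double> cshift(const std::vector<double>& in, int shift) {
  const int n = static_cast<int>(in.size());
  assert(n > 0);
  std::vector<double> out(in.size());
  for (int i = 0; i < n; ++i) {
    const int src = ((i + shift) % n + n) % n;  // keep the index in [0, n)
    out[i] = in[src];
  }
  return out;
}
```

The input and output have the same shape, so the shifted array remains conformable with the original and can be combined with it element by element.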

The transformation is based on the observation that Cartesian array processing involves performing identical operations on different regions of the Cartesian array.

The library geometrically decomposes arrays both across MPI tasks and across SIMD lanes. Local vector loops are parallelised with OpenMP pragmas.
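The decomposition across SIMD lanes can be pictured with a one-dimensional toy model: the array is cut into as many contiguous regions as there are lanes, and each lane holds one region, so a single loop over the region length processes every region at once. This is only a sketch of the general idea; the names `Vec`, `kLanes`, `toSimdLayout`, `fromSimdLayout` and `addConformable` are hypothetical, and the real library works in several dimensions with actual SIMD types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;  // assumed SIMD width, in elements

struct Vec { double lane[kLanes]; };  // toy stand-in for a SIMD vector type

// Site x of a length-L array goes to lane x / (L / kLanes),
// at outer index x % (L / kLanes): each lane owns one contiguous region.
std::vector<Vec> toSimdLayout(const std::vector<double>& sites) {
  assert(sites.size() % kLanes == 0);
  const std::size_t block = sites.size() / kLanes;
  std::vector<Vec> packed(block);
  for (std::size_t x = 0; x < sites.size(); ++x)
    packed[x % block].lane[x / block] = sites[x];
  return packed;
}

// Inverse mapping back to the natural site order.
std::vector<double> fromSimdLayout(const std::vector<Vec>& packed) {
  const std::size_t block = packed.size();
  std::vector<double> sites(block * kLanes);
  for (std::size_t x = 0; x < sites.size(); ++x)
    sites[x] = packed[x % block].lane[x / block];
  return sites;
}

// Conformable arrays share a layout, so every lane does identical work.
std::vector<Vec> addConformable(const std::vector<Vec>& a,
                                const std::vector<Vec>& b) {
  assert(a.size() == b.size());
  std::vector<Vec> r(a.size());
  for (std::size_t o = 0; o < a.size(); ++o)
    for (std::size_t l = 0; l < kLanes; ++l)
      r[o].lane[l] = a[o].lane[l] + b[o].lane[l];
  return r;
}
```

In this toy version the inner loop over lanes is what a single SIMD instruction would perform, and the outer loop is the kind of local vector loop that an OpenMP pragma could parallelise.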

Data parallel array operations can then be specified with a SINGLE data parallel paradigm, but optimally use MPI, OpenMP and SIMD parallelism under the hood. This is a significant simplification for most programmers.

The layout transformations are parametrised by the SIMD vector length, which adapts to the architecture. Presently SSE4 (128 bit), AVX and AVX2 (256 bit), and IMCI and AVX512 (512 bit) targets are supported.

These are presented as

vRealF, vRealD, vComplexF, vComplexD

internal vector data types. These may be useful in themselves for other programmers. The corresponding scalar types are named

RealF, RealD, ComplexF, ComplexD

MPI, OpenMP, and SIMD parallelism are present in the library.

You can give `configure` initial values for configuration parameters by setting variables in the command line or in the environment. Here are examples:

 ./configure CXX=clang++ CXXFLAGS="-std=c++11 -O3 -msse4" --enable-simd=SSE4

 ./configure CXX=clang++ CXXFLAGS="-std=c++11 -O3 -mavx" --enable-simd=AVX1

 ./configure CXX=clang++ CXXFLAGS="-std=c++11 -O3 -mavx2" --enable-simd=AVX2

 ./configure CXX=icpc CXXFLAGS="-std=c++11 -O3 -mmic" --enable-simd=AVX512 --host=none

For developers: use reconfigure_script in the scripts/ directory to create the autotools environment.

Languages

C++ 92.6%

C 3.5%

M4 1.7%

Shell 1%

Mathematica 0.8%

Other 0.3%