From beed527ea37c90fd5e19b82d326eb8adc8eba5ff Mon Sep 17 00:00:00 2001
From: Peter Boyle
Date: Mon, 24 Sep 2018 15:09:51 +0100
Subject: [PATCH] Carleton's chapter

---
 documentation/interfacing.rst | 197 ++++++++++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100644 documentation/interfacing.rst

diff --git a/documentation/interfacing.rst b/documentation/interfacing.rst
new file mode 100644
index 00000000..3fd0c8a3
--- /dev/null
+++ b/documentation/interfacing.rst
@@ -0,0 +1,197 @@

Interfacing with external software
========================================

Grid provides a number of important modules, such as solvers and
eigensolvers, that are highly optimized for complex vector/SIMD
architectures, such as the Intel Xeon Phi KNL and Skylake processors.
This growing library, with appropriate interfacing, can be accessed
from existing code. Here we describe interfacing issues and provide
examples.


MPI initialization
------------------

Grid supports threaded MPI sends and receives and, if running with
more than one thread, requires the MPI_THREAD_MULTIPLE mode of message
passing. If the user initializes MPI before starting Grid, the
appropriate initialization call is::

   int provided;
   MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
   assert(MPI_THREAD_MULTIPLE == provided);

Grid Initialization
-------------------

Grid itself is initialized with a call::

   Grid_init(&argc, &argv);

where `argc` and `argv` are constructed to simulate the command-line
options described above. At a minimum one must provide the `--grid`
and `--mpi` parameters; the former specifies the global lattice size
and the latter the grid of processors (MPI ranks).

.. todo:: CD: Where are the command-line arguments explained above?

The following Grid procedures are useful for verifying that Grid is
properly initialized.

===========================================================  =================
Grid procedure                                               returns
===========================================================  =================
std::vector<int> GridDefaultLatt();                          lattice size
std::vector<int> GridDefaultSimd(int Nd,vComplex::Nsimd());  SIMD layout
std::vector<int> GridDefaultMpi();                           MPI layout
int Grid::GridThread::GetThreads();                          number of threads
===========================================================  =================
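
For instance, a calling program that wants a 16.16.16.32 lattice split
over a 1.1.2.4 processor grid could hand Grid a simulated command line
and then check the resulting defaults. The following is a minimal
sketch; the lattice and processor dimensions and the printing are
purely illustrative::

   #include <Grid/Grid.h>

   #include <iostream>
   #include <vector>

   int main(int argc, char **argv)
   {
     // Simulated command line: at minimum --grid and --mpi must be given.
     std::vector<char *> args = { argv[0],
                                  (char *)"--grid", (char *)"16.16.16.32",
                                  (char *)"--mpi",  (char *)"1.1.2.4" };
     int    grid_argc = (int)args.size();
     char **grid_argv = args.data();

     Grid::Grid_init(&grid_argc, &grid_argv);

     // Verify that Grid picked up the intended geometry.
     std::vector<int> latt = Grid::GridDefaultLatt();  // expect 16 16 16 32
     std::vector<int> mpi  = Grid::GridDefaultMpi();   // expect 1 1 2 4

     std::cout << "lattice:";
     for (int d : latt) std::cout << " " << d;
     std::cout << "  mpi:";
     for (int d : mpi)  std::cout << " " << d;
     std::cout << "  threads: " << Grid::GridThread::GetThreads() << std::endl;

     Grid::Grid_finalize();
     return 0;
   }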
MPI coordination
----------------

Grid wants to use its own numbering of MPI ranks and its own
assignment of lattice coordinates to each rank. Obviously, the
calling program and Grid must agree on these conventions. It is
convenient to use Grid's Cartesian communicator class to discover the
processor assignments. For a four-dimensional processor grid one can
define::

   static Grid::CartesianCommunicator *grid_cart = NULL;
   grid_cart = new Grid::CartesianCommunicator(processors);

where `processors` is of type `std::vector<int>`, with values matching
the MPI processor-layout dimensions specified with the `--mpi`
argument in the `Grid_init` call. Then each MPI rank can obtain its
processor coordinate from the Cartesian communicator instantiated
above. For example, in four dimensions::

   std::vector<int> pePos(4);
   for(int i=0; i<4; i++)
     pePos[i] = grid_cart->_processor_coor[i];

and each MPI process can get its world rank from its processor
coordinates using::

   int peRank = grid_cart->RankFromProcessorCoor(pePos);

Conversely, each MPI process can get its processor coordinates from
its world rank using::

   grid_cart->ProcessorCoorFromRank(peRank, pePos);

If the calling program initialized MPI before initializing Grid, it is
then important for each MPI process in the calling program to reset
its rank number so that it agrees with Grid::

   MPI_Comm comm;
   MPI_Comm_split(MPI_COMM_THISJOB, jobid, peRank, &comm);
   MPI_COMM_THISJOB = comm;

where `MPI_COMM_THISJOB` is initially a copy of `MPI_COMM_WORLD` (with
`jobid = 0`), or it is a split communicator with `jobid` equal to the
index number of the subcommunicator. Once this is done, the call::

   MPI_Comm_rank(MPI_COMM_THISJOB, &myrank);

returns a rank that agrees with Grid's `peRank`.


Mapping fields between Grid and user layouts
---------------------------------------------

In order to map data between layouts, it is important to know
how the lattice sites are distributed across the processor grid. A
lattice site with coordinates `r[mu]` is assigned to the processor with
processor coordinates `pePos[mu]` according to the rule::

   pePos[mu] = r[mu]/dim[mu]

where `dim[mu]` is the local sublattice dimension in the `mu`
direction, that is, the full lattice dimension divided by the number
of MPI ranks in that direction. For performance reasons, it is
important that the external data layout follow the same rule; data
mapping can then be done without costly communication between ranks.
We assume this is the case here, and illustrate the rule with the
sketch below.
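
The following sketch is not part of Grid; the function name and its
arguments are our own. It computes, for a global site coordinate `r`,
the coordinate of the rank that owns the site and the site's
coordinate within that rank, given the lattice size and processor
layout passed to `--grid` and `--mpi`::

   #include <cassert>
   #include <vector>

   // Block distribution: each rank owns a contiguous sublattice of
   // extent latt[mu]/mpi[mu] in direction mu.
   void siteToProcessor(const std::vector<int> &r,     // global site coordinate
                        const std::vector<int> &latt,  // full lattice size (--grid)
                        const std::vector<int> &mpi,   // processor layout  (--mpi)
                        std::vector<int>       &pePos, // owning rank coordinate
                        std::vector<int>       &loc)   // coordinate within that rank
   {
     for (unsigned int mu = 0; mu < r.size(); mu++) {
       assert(latt[mu] % mpi[mu] == 0);
       int dim   = latt[mu] / mpi[mu];  // local sublattice dimension
       pePos[mu] = r[mu] / dim;
       loc[mu]   = r[mu] % dim;
     }
   }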

When mapping data to and from Grid, one must choose a lattice object
defined on the appropriate grid, whether it be a full lattice (4D
`GridCartesian`), one of the checkerboards (4D
`GridRedBlackCartesian`), a five-dimensional full grid (5D
`GridCartesian`), or a five-dimensional checkerboard (5D
`GridRedBlackCartesian`). For example, an improved staggered fermion
color-vector field `cv` on a single checkerboard would be constructed
using

**Example**::

   std::vector<int> latt_size   = GridDefaultLatt();
   std::vector<int> simd_layout = GridDefaultSimd(Nd,vComplex::Nsimd());
   std::vector<int> mpi_layout  = GridDefaultMpi();

   GridCartesian Grid(latt_size,simd_layout,mpi_layout);
   GridRedBlackCartesian RBGrid(&Grid);

   ImprovedStaggeredFermion<StaggeredImplR>::FermionField cv(&RBGrid);

To map data within an MPI rank, the external code must iterate over
the sites belonging to that rank (full or checkerboard as
appropriate). To import data into Grid, the external data on a single
site with coordinates `r` is first copied into the appropriate Grid
scalar object `s`. Then it is copied into the Grid lattice field `l`
with `pokeLocalSite`::

   pokeLocalSite(const sobj &s, Lattice<vobj> &l, Coordinate &r);

To export data from Grid, the reverse operation starts with::

   peekLocalSite(sobj &s, const Lattice<vobj> &l, Coordinate &r);

and then copies the single-site data from `s` into the corresponding
external type.

Here is an example that maps a single site's worth of data in a MILC
color-vector field to a Grid scalar ColourVector object `cVec` and from
there to the lattice colour-vector field `cv`, as defined above. The
MILC-side names `src`, `idx`, `Nc`, and `indexToCoords()` are
illustrative: `indexToCoords()` stands for the external code's mapping
from its site index `idx` to the four-dimensional coordinate `r`.

**Example**::

   std::vector<int> r(4);
   indexToCoords(idx,r);

   ColourVector cVec;
   for(int col = 0; col < Nc; col++)
     cVec()()(col) = Complex(src[idx].c[col].real,
                             src[idx].c[col].imag);

   pokeLocalSite(cVec, cv, r);

For a five-dimensional field, such as the fermion fields of the
domain-wall formulation, the four-dimensional coordinate is prepended
with the fifth-dimension coordinate, which Grid stores first, and the
copy is repeated for each of the `Ls` slices of the external field
(written here as `src[j]`, with the destination field reached through
an illustrative pointer `out`).

**Example**::

   std::vector<int> r(4);
   indexToCoords(idx,r);
   std::vector<int> r5(1,0);
   for( int d = 0; d < 4; d++ ) r5.push_back(r[d]);

   for( int j = 0; j < Ls; j++ ){
     r5[0] = j;
     ColourVector cVec;
     for(int col = 0; col < Nc; col++)
       cVec()()(col) = Complex(src[j][idx].c[col].real,
                               src[j][idx].c[col].imag);

     pokeLocalSite(cVec, *(out->cv), r5);
   }
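
The export direction reverses these steps. The sketch below, again
using the illustrative MILC-side names `dest`, `idx`, `Nc`, and
`indexToCoords()`, copies one site of the checkerboard field `cv` back
into the external field::

   std::vector<int> r(4);
   indexToCoords(idx,r);

   ColourVector cVec;
   peekLocalSite(cVec, cv, r);   // copy the Grid site into the scalar object

   for(int col = 0; col < Nc; col++){
     // "dest" is the external (MILC-style) field receiving the data.
     dest[idx].c[col].real = real(cVec()()(col));
     dest[idx].c[col].imag = imag(cVec()()(col));
   }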