Interfacing with external software
========================================

Grid provides a number of important modules, such as solvers and
eigensolvers, that are highly optimized for complex vector/SIMD
architectures, such as the Intel Xeon Phi KNL and Skylake processors.
This growing library, with appropriate interfacing, can be accessed
from existing code.  Here we describe interfacing issues and provide
examples.

MPI initialization
------------------

Grid supports threaded MPI sends and receives and, if running with
more than one thread, requires the MPI_THREAD_MULTIPLE mode of
message passing.  If the user initializes MPI before starting Grid,
the appropriate initialization call is::

  MPI_Init_thread(argc, argv, MPI_THREAD_MULTIPLE, &provided);
  assert(MPI_THREAD_MULTIPLE == provided);

Grid Initialization
-------------------

Grid itself is initialized with a call::

  Grid_init(&argc, &argv);

.. todo:: CD: Where are the command-line arguments explained above?

where `argc` and `argv` are constructed to simulate the command-line
options described above.  At a minimum one must provide the `--grid`
and `--mpi` parameters.  The latter specifies the grid of processors
(MPI ranks).

The following Grid procedures are useful for verifying that Grid is
properly initialized.

==============================================================  ===================
 Grid procedure                                                    returns
==============================================================  ===================
 std::vector<int> GridDefaultLatt();                               lattice size
 std::vector<int> GridDefaultSimd(int Nd,vComplex::Nsimd());       SIMD layout
 std::vector<int> GridDefaultMpi();                                MPI layout
 int Grid::GridThread::GetThreads();                               number of threads
==============================================================  ===================

MPI coordination
----------------

Grid wants to use its own numbering of MPI ranks and its own
assignment of the lattice coordinates to each rank.  Obviously, the
calling program and Grid must agree on these conventions.  It is
convenient to use Grid's Cartesian communicator class to discover the
processor assignments.  For a four-dimensional processor grid one can
define::

  static Grid::CartesianCommunicator *grid_cart = NULL;
  grid_cart = new Grid::CartesianCommunicator(processors);

where `processors` is of type `std::vector<int>`, with values matching
the MPI processor-layout dimensions specified with the `--mpi`
argument in the `Grid_init` call.  Then each MPI rank can obtain its
processor coordinate using the Cartesian communicator instantiated
above.  For example, in four dimensions::

  std::vector<int> pePos(4);
  for(int i=0; i<4; i++)
     pePos[i] = grid_cart->_processor_coor[i];

and each MPI process can get its world rank from its processor
coordinates using::

  int peRank = grid_cart->RankFromProcessorCoor(pePos);

Conversely, each MPI process can get its processor coordinates from
its world rank using::

  grid_cart->ProcessorCoorFromRank(peRank, pePos);

If the calling program initialized MPI before initializing Grid, it is
then important for each MPI process in the calling program to reset
its rank number so it agrees with Grid::

  MPI_Comm comm;
  MPI_Comm_split(MPI_COMM_THISJOB, jobid, peRank, &comm);
  MPI_COMM_THISJOB = comm;

where `MPI_COMM_THISJOB` is initially a copy of `MPI_COMM_WORLD` (with
`jobid = 0`), or it is a split communicator with `jobid` equal to the
index number of the subcommunicator.  Once this is done,::

  MPI_Comm_rank(MPI_COMM_THISJOB, &myrank);

returns a rank that agrees with Grid's `peRank`.

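The steps above can be assembled into a short driver.  The following
is a minimal sketch, not a complete application: it assumes a
four-dimensional `--mpi` layout, uses `MPI_COMM_WORLD` directly in
place of `MPI_COMM_THISJOB` (i.e. `jobid = 0`), and takes the
processor layout from `GridDefaultMpi()` rather than parsing `--mpi`
itself::

  #include <mpi.h>
  #include <cassert>
  #include <Grid/Grid.h>

  int main(int argc, char **argv) {

    // The calling program initializes MPI first, with threaded messaging.
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    assert(provided == MPI_THREAD_MULTIPLE);

    // Grid is initialized next; the command line must supply at least
    // the --grid and --mpi options.
    Grid::Grid_init(&argc, &argv);

    // Recover Grid's processor layout and this rank's position in it.
    auto processors = Grid::GridDefaultMpi();
    Grid::CartesianCommunicator *grid_cart =
      new Grid::CartesianCommunicator(processors);

    auto pePos = processors;   // four-dimensional processor coordinate
    for (int i = 0; i < 4; i++) pePos[i] = grid_cart->_processor_coor[i];
    int peRank = grid_cart->RankFromProcessorCoor(pePos);

    // Renumber the calling program's ranks so they agree with Grid
    // (here jobid = 0 and MPI_COMM_WORLD stands in for MPI_COMM_THISJOB).
    MPI_Comm comm;
    MPI_Comm_split(MPI_COMM_WORLD, 0, peRank, &comm);

    int myrank;
    MPI_Comm_rank(comm, &myrank);
    assert(myrank == peRank);

    // Shut down Grid when done.
    Grid::Grid_finalize();
    return 0;
  }
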
Mapping fields between Grid and user layouts
----------------------------------------------

In order to map data between layouts, it is important to know how the
lattice sites are distributed across the processor grid.  A lattice
site with coordinates `r[mu]` is assigned to the processor with
processor coordinates `pePos[mu]` according to the rule::

  pePos[mu] = r[mu]/dim[mu]

where `dim[mu]` is the local (per-rank) sublattice dimension in the
`mu` direction, i.e. the full lattice dimension divided by the number
of MPI ranks in that direction, and the division is integer division.
For performance reasons, it is important that the external data
layout follow the same rule.  Then data mapping can be done without
requiring costly communication between ranks.  We assume this is the
case here.

When mapping data to and from Grid, one must choose a lattice object
defined on the appropriate grid, whether it be a full lattice (4D
`GridCartesian`), one of the checkerboards (4D
`GridRedBlackCartesian`), a five-dimensional full grid (5D
`GridCartesian`), or a five-dimensional checkerboard (5D
`GridRedBlackCartesian`).  For example, an improved staggered fermion
color-vector field `cv` on a single checkerboard would be constructed
using

**Example**::

  std::vector<int> latt_size   = GridDefaultLatt();
  std::vector<int> simd_layout = GridDefaultSimd(Nd,vComplex::Nsimd());
  std::vector<int> mpi_layout  = GridDefaultMpi();

  GridCartesian Grid(latt_size,simd_layout,mpi_layout);
  GridRedBlackCartesian RBGrid(&Grid);

  typename ImprovedStaggeredFermion<Impl>::FermionField cv(&RBGrid);

(Here `Impl` denotes the staggered-fermion implementation type on
which the enclosing code is templated.)

To map data within an MPI rank, the external code must iterate over
the sites belonging to that rank (full or checkerboard as
appropriate).  To import data into Grid, the external data on a
single site with coordinates `r` is first copied into the appropriate
Grid scalar object `s`.  Then it is copied into the Grid lattice
field `l` with `pokeLocalSite`::

  pokeLocalSite(const sobj &s, Lattice<vobj> &l, Coordinate &r);

To export data from Grid, the reverse operation starts with::

  peekLocalSite(sobj &s, const Lattice<vobj> &l, Coordinate &r);

and then copies the single-site data from `s` into the corresponding
external type.

Here is an example that maps a single site's worth of data in a MILC
color-vector field to a Grid scalar ColourVector object `cVec` and
from there to the lattice colour-vector field `cv`, as defined above.
In the following, `src` stands for the external (MILC) color-vector
array, `idx` for the MILC site index, and `indexToCoords()` for the
user's function mapping that index to the lattice coordinates `r`.

**Example**::

  std::vector<int> r(4);
  indexToCoords(idx,r);

  ColourVector cVec;
  for(int col=0; col<Nc; col++)
     cVec()()(col) = Complex(src[idx].c[col].real,
                             src[idx].c[col].imag);

  pokeLocalSite(cVec, cv, r);

The five-dimensional case works the same way, except that the
coordinate vector acquires a fifth component, which Grid places first
(leftmost).  The example below extends the four-dimensional
coordinate `r` to a five-dimensional coordinate `r5` and repeats the
single-site copy for each of the `Ls` fifth-dimension slices.  Here
`src5` stands for the external five-dimensional color-vector data
(the indexing shown is schematic), and `out->cv` is assumed to point
to a fermion field constructed on the corresponding five-dimensional
grid.

**Example**::

  std::vector<int> r(4);
  indexToCoords(idx,r);
  std::vector<int> r5(1,0);
  for( int d = 0; d < 4; d++ ) r5.push_back(r[d]);

  for( int j = 0; j < Ls; j++ ){
     r5[0] = j;
     ColourVector cVec;
     for(int col=0; col<Nc; col++){
        // Copy the external data for fifth-dimension slice j into cVec,
        // following the same pattern as the four-dimensional example.
        cVec()()(col) = Complex(src5[j][idx].c[col].real,
                                src5[j][idx].c[col].imag);
     }
     pokeLocalSite(cVec, *(out->cv), r5);
  }
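
The reverse mapping, exporting data from Grid back to the external
layout, follows the same pattern.  The four-dimensional sketch below
is schematic in the same way as the import examples: `dest` stands
for a hypothetical external (MILC) color-vector array being filled,
while `idx`, `indexToCoords()`, `cv`, and `Nc` are as above (recent
Grid versions use `Coordinate` where `std::vector<int>` appears
here).  It peeks the single-site data out of the lattice field with
`peekLocalSite` and copies the color components into the external
type::

  std::vector<int> r(4);
  indexToCoords(idx,r);

  // Pull the single-site data out of the Grid lattice field ...
  ColourVector cVec;
  peekLocalSite(cVec, cv, r);

  // ... and copy each color component into the external (MILC) type.
  // `dest` is a hypothetical external array declared by the caller.
  for(int col=0; col<Nc; col++){
     dest[idx].c[col].real = cVec()()(col).real();
     dest[idx].c[col].imag = cVec()()(col).imag();
  }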