diff --git a/documentation/interfacing.rst b/documentation/interfacing.rst
new file mode 100644
index 00000000..3fd0c8a3
--- /dev/null
+++ b/documentation/interfacing.rst
@@ -0,0 +1,197 @@
+Interfacing with external software
+========================================
+
+Grid provides a number of important modules, such as solvers and
+eigensolvers, that are highly optimized for complex vector/SIMD
+architectures, such as the Intel Xeon Phi KNL and Skylake processors.
+This growing library, with appropriate interfacing, can be accessed
+from existing code. Here we describe interfacing issues and provide
+examples.
+
+
+MPI initialization
+------------------
+
+Grid supports threaded MPI sends and receives and, if running with
+more than one thread, requires the MPI_THREAD_MULTIPLE mode of message
+passing. If the user initializes MPI before starting Grid, the
+appropriate initialization call is::
+
+  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
+  assert(MPI_THREAD_MULTIPLE == provided);
+
+Grid Initialization
+-------------------
+
+Grid itself is initialized with a call::
+
+  Grid_init(&argc, &argv);
+
+.. todo:: CD: Where are the command-line arguments explained above?
+
+where `argc` and `argv` are constructed to simulate the command-line
+options described above. At a minimum one must provide the `--grid`
+and `--mpi` parameters. The former specifies the global lattice
+dimensions, and the latter specifies the grid of processors (MPI
+ranks).
+
+The following Grid procedures are useful for verifying that Grid is
+properly initialized.
+
+=============================================================  =================
+Grid procedure                                                   returns
+=============================================================  =================
+std::vector<int> GridDefaultLatt();                              lattice size
+std::vector<int> GridDefaultSimd(int Nd, vComplex::Nsimd());     SIMD layout
+std::vector<int> GridDefaultMpi();                               MPI layout
+int Grid::GridThread::GetThreads();                              number of threads
+=============================================================  =================
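+
+For orientation, here is a minimal sketch (ours, not taken from Grid
+itself) of a calling program that initializes MPI on its own and then
+starts Grid with a simulated command line. The program name, lattice
+size, and processor grid are placeholders, and the dotted values use
+the format Grid's `--grid` and `--mpi` options typically take; adjust
+them to your run::
+
+  #include <cassert>
+  #include <iostream>
+  #include <vector>
+  #include <mpi.h>
+  #include <Grid/Grid.h>
+
+  int main(int argc, char **argv)
+  {
+    // The calling program owns MPI: request threaded message passing.
+    int provided;
+    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
+    assert(provided == MPI_THREAD_MULTIPLE);
+
+    // Simulate the Grid command line: at minimum --grid and --mpi.
+    // Placeholder values: a 16^3 x 32 lattice on a 1x1x2x2 rank grid.
+    std::vector<char *> args = {
+      (char *)"myapp",
+      (char *)"--grid", (char *)"16.16.16.32",
+      (char *)"--mpi",  (char *)"1.1.2.2"
+    };
+    int    grid_argc = (int)args.size();
+    char **grid_argv = args.data();
+    Grid::Grid_init(&grid_argc, &grid_argv);
+
+    // Verify that Grid picked up the intended layout.
+    auto latt = Grid::GridDefaultLatt();
+    auto mpi  = Grid::GridDefaultMpi();
+    for (int mu = 0; mu < (int)latt.size(); mu++)
+      std::cout << "dim " << mu << ": " << latt[mu]
+                << " sites over " << mpi[mu] << " ranks" << std::endl;
+    std::cout << "threads: " << Grid::GridThread::GetThreads() << std::endl;
+
+    // Depending on the Grid version, Grid_finalize() may finalize MPI
+    // itself, so check before finalizing MPI again.
+    Grid::Grid_finalize();
+    int finalized = 0;
+    MPI_Finalized(&finalized);
+    if (!finalized) MPI_Finalize();
+    return 0;
+  }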
+
+MPI coordination
+----------------
+
+Grid wants to use its own numbering of MPI ranks and its own
+assignment of lattice coordinates to each rank. Obviously, the
+calling program and Grid must agree on these conventions. It is
+convenient to use Grid's Cartesian communicator class to discover the
+processor assignments. For a four-dimensional processor grid one can
+define::
+
+  static Grid::CartesianCommunicator *grid_cart = NULL;
+  grid_cart = new Grid::CartesianCommunicator(processors);
+
+where `processors` is of type `std::vector<int>`, with values matching
+the MPI processor-layout dimensions specified with the `--mpi`
+argument in the `Grid_init` call. Then each MPI rank can obtain its
+processor coordinates using the Cartesian communicator instantiated
+above. For example, in four dimensions::
+
+  std::vector<int> pePos(4);
+  for(int i=0; i<4; i++)
+    pePos[i] = grid_cart->_processor_coor[i];
+
+and each MPI process can get its world rank from its processor
+coordinates using::
+
+  int peRank = grid_cart->RankFromProcessorCoor(pePos);
+
+Conversely, each MPI process can get its processor coordinates from
+its world rank using::
+
+  grid_cart->ProcessorCoorFromRank(peRank, pePos);
+
+If the calling program initialized MPI before initializing Grid, it is
+then important for each MPI process in the calling program to reset
+its rank number so it agrees with Grid::
+
+  MPI_Comm comm;
+  MPI_Comm_split(MPI_COMM_THISJOB, jobid, peRank, &comm);
+  MPI_COMM_THISJOB = comm;
+
+where `MPI_COMM_THISJOB` is initially a copy of `MPI_COMM_WORLD` (with
+`jobid = 0`), or it is a split communicator with `jobid` equal to the
+index number of the subcommunicator. Once this is done::
+
+  MPI_Comm_rank(MPI_COMM_THISJOB, &myrank);
+
+returns a rank that agrees with Grid's `peRank`.
+
+
+Mapping fields between Grid and user layouts
+--------------------------------------------
+
+In order to map data between layouts, it is important to know how the
+lattice sites are distributed across the processor grid. A lattice
+site with global coordinates `r[mu]` is assigned to the processor with
+processor coordinates `pePos[mu]` according to the rule::
+
+  pePos[mu] = r[mu]/dim[mu]
+
+where `dim[mu]` is the local sublattice dimension in the `mu`
+direction, that is, the full lattice dimension divided by the number
+of MPI ranks in that direction. For performance reasons, it is
+important that the external data layout follow the same rule. Then
+data mapping can be done without requiring costly communication
+between ranks. We assume this is the case here.
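+
+As an illustration of this rule, the following sketch (ours, not part
+of Grid; the names `splitCoord` and `locdim` are arbitrary) splits a
+global site coordinate into the owning processor coordinate and the
+local coordinate on that rank::
+
+  #include <vector>
+
+  // locdim[mu] = full lattice dimension / number of MPI ranks in direction mu
+  void splitCoord(const std::vector<int> &gr,      // global site coordinate
+                  const std::vector<int> &locdim,  // local sublattice dimensions
+                  std::vector<int> &pePos,         // processor coordinate (output)
+                  std::vector<int> &r)             // local coordinate (output)
+  {
+    for (int mu = 0; mu < 4; mu++) {
+      pePos[mu] = gr[mu] / locdim[mu];  // which rank owns the site
+      r[mu]     = gr[mu] % locdim[mu];  // coordinate within that rank's block
+    }
+  }
+
+The local coordinate `r` produced this way is the kind of site
+coordinate that `pokeLocalSite` and `peekLocalSite`, described below,
+operate on.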
+
+When mapping data to and from Grid, one must choose a lattice object
+defined on the appropriate grid, whether it be a full lattice (4D
+`GridCartesian`), one of the checkerboards (4D
+`GridRedBlackCartesian`), a five-dimensional full grid (5D
+`GridCartesian`), or a five-dimensional checkerboard (5D
+`GridRedBlackCartesian`). For example, an improved staggered fermion
+color-vector field `cv` on a single checkerboard would be constructed
+as follows.
+
+**Example**::
+
+  std::vector<int> latt_size   = GridDefaultLatt();
+  std::vector<int> simd_layout = GridDefaultSimd(Nd, vComplex::Nsimd());
+  std::vector<int> mpi_layout  = GridDefaultMpi();
+
+  GridCartesian Grid(latt_size, simd_layout, mpi_layout);
+  GridRedBlackCartesian RBGrid(&Grid);
+
+  ImprovedStaggeredFermionR::FermionField cv(&RBGrid);
+
+To map data within an MPI rank, the external code must iterate over
+the sites belonging to that rank (full or checkerboard as
+appropriate). To import data into Grid, the external data on a single
+site with coordinates `r` is first copied into the appropriate Grid
+scalar object `s`. Then it is copied into the Grid lattice field `l`
+with `pokeLocalSite`::
+
+  pokeLocalSite(const sobj &s, Lattice<vobj> &l, Coordinate &r);
+
+To export data from Grid, the reverse operation starts with::
+
+  peekLocalSite(sobj &s, const Lattice<vobj> &l, Coordinate &r);
+
+and then copies the single-site data from `s` into the corresponding
+external type.
+
+Here is an example that maps a single site's worth of data in a MILC
+color-vector field to a Grid scalar ColourVector object `cVec` and from
+there to the lattice colour-vector field `cv`, as defined above.
+
+**Example**::
+
+  // r: local lattice coordinates of the site with MILC index idx;
+  // src: the MILC color-vector array on this rank (name illustrative)
+  std::vector<int> r(4);
+  indexToCoords(idx,r);
+
+  ColourVector cVec;
+  for(int col=0; col<Nc; col++)
+    cVec()()(col) =
+       Complex(src[idx].c[col].real, src[idx].c[col].imag);
+
+  pokeLocalSite(cVec, cv, r);
+
+The same pattern extends to a field on a five-dimensional grid with
+fifth-dimension extent `Ls` (used, for example, to handle several
+right-hand sides at once). The fifth coordinate comes first in the
+five-component coordinate vector.
+
+**Example**::
+
+  std::vector<int> r(4);
+  indexToCoords(idx,r);
+  std::vector<int> r5(1,0);
+  for( int d = 0; d < 4; d++ ) r5.push_back(r[d]);
+
+  for( int j = 0; j < Ls; j++ ){
+    r5[0] = j;
+    ColourVector cVec;
+    for(int col=0; col<Nc; col++)
+      cVec()()(col) =
+         Complex(src5d[j][idx].c[col].real, src5d[j][idx].c[col].imag);
+
+    // src5d[j]: external field for right-hand side j; out->cv: the
+    // five-dimensional Grid fermion field held by the calling code
+    // (names illustrative)
+    pokeLocalSite(cVec, *(out->cv), r5);
+  }
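+
+Exporting works the same way in reverse. As a sketch (the external
+array `dest` and its MILC-style `real`/`imag` members are illustrative,
+not part of Grid), the single-checkerboard import example above would
+be inverted as follows::
+
+  std::vector<int> r(4);
+  indexToCoords(idx,r);
+
+  // Copy one site of the Grid field cv into the scalar object cVec ...
+  ColourVector cVec;
+  peekLocalSite(cVec, cv, r);
+
+  // ... and then into the external (MILC-style) storage.
+  for(int col=0; col<Nc; col++){
+    dest[idx].c[col].real = real(cVec()()(col));
+    dest[idx].c[col].imag = imag(cVec()()(col));
+  }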