mirror of
https://github.com/paboyle/Grid.git
synced 2025-06-16 23:07:05 +01:00
Adding pdf generation capabilities
This commit is contained in:
34
_pages/docs/API/coordinates.md
Normal file
34
_pages/docs/API/coordinates.md
Normal file
@ -0,0 +1,34 @@
|
||||
---
|
||||
title : "API Documentation"
|
||||
author_profile: false
|
||||
excerpt: "Coordinates"
|
||||
header:
|
||||
overlay_color: "#5DADE2"
|
||||
permalink: /docs/API/coordinates.html
|
||||
sidebar:
|
||||
nav : docs
|
||||
---
|
||||
|
||||
The Grid is define on a N-dimensional set of integer coordinates.
|
||||
|
||||
The maximum dimension is eight, and indexes in this space make use of the `Coordinate` class.
|
||||
The coordinate class shares a similar interface to `std::vector<int>`, but contains all data within the
|
||||
object, and has a fixed maximum length (template parameter).
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
const int Nd=4;
|
||||
Coordinate point(Nd);
|
||||
|
||||
for(int i=0;i<Nd;i++)
|
||||
point[i] = 1;
|
||||
|
||||
std::cout<< point <<std::endl;
|
||||
point.resize(3);
|
||||
std::cout<< point <<std::endl;
|
||||
```
|
||||
|
||||
This enables the coordinates to be manipulated without heap allocation or thread contention,
|
||||
and avoids introducing STL functions into GPU code, but does so at the expense of introducing
|
||||
a maximum dimensionality. This limit is easy to change (`lib/util/Coordinate.h`).
|
90
_pages/docs/API/grids.md
Normal file
90
_pages/docs/API/grids.md
Normal file
@ -0,0 +1,90 @@
|
||||
---
|
||||
title : "API Documentation"
|
||||
author_profile: false
|
||||
excerpt: "Grids"
|
||||
header:
|
||||
overlay_color: "#5DADE2"
|
||||
permalink: /docs/API/grids.html
|
||||
sidebar:
|
||||
nav : docs
|
||||
---
|
||||
|
||||
A `Grid` object defines the geometry of a global cartesian array, and through inheritance
|
||||
provides access to message passing decomposition, the local lattice, and the message passing primitives.
|
||||
|
||||
The constructor requires parameters to indicate how the spatial (and temporal) indices
|
||||
are decomposed across MPI tasks and SIMD lanes of the vector length.
|
||||
We use a partial vectorisation transformation, must select
|
||||
which space-time dimensions participate in SIMD vectorisation.
|
||||
The Lattice containers are defined to have opaque internal layout, hiding this layout transformation.
|
||||
|
||||
We define GridCartesian and GridRedBlackCartesian which both inherit from `GridBase`:
|
||||
|
||||
```c++
|
||||
class GridCartesian : public GridBase
|
||||
class GridRedBlackCartesian: public GridBase
|
||||
```
|
||||
|
||||
The simplest Cartesian Grid constructor distributes across `MPI_COMM_WORLD`:
|
||||
|
||||
```c++
|
||||
/////////////////////////////////////////////////////////////////////////
|
||||
// Construct from comm world
|
||||
/////////////////////////////////////////////////////////////////////////
|
||||
GridCartesian(const Coordinate &dimensions,
|
||||
const Coordinate &simd_layout,
|
||||
const Coordinate &processor_grid);
|
||||
```
|
||||
|
||||
A second constructor will create a child communicator from a previously declared Grid.
|
||||
This allows to subdivide the processor grid, and also to define lattices of differing dimensionalities and sizes,
|
||||
useful for both Chiral fermions, lower dimensional operations, and multigrid:
|
||||
|
||||
```c++
|
||||
/////////////////////////////////////////////////////////////////////////
|
||||
// Constructor takes a parent grid and possibly subdivides communicator.
|
||||
/////////////////////////////////////////////////////////////////////////
|
||||
GridCartesian(const Coordinate &dimensions,
|
||||
const Coordinate &simd_layout,
|
||||
const Coordinate &processor_grid,
|
||||
const GridCartesian &parent,int &split_rank);
|
||||
```
|
||||
|
||||
The Grid object provides much `internal` functionality to map a lattice site to
|
||||
a node and lexicographic index. These are not needed by code interfacing
|
||||
to the data parallel layer.
|
||||
|
||||
**Example** (`tests/solver/Test_split_grid.cc`):
|
||||
|
||||
```c++
|
||||
const int Ls=8;
|
||||
|
||||
////////////////////////////////////////////
|
||||
// Grids distributed across full machine
|
||||
// pick up default command line args
|
||||
////////////////////////////////////////////
|
||||
Grid_init(&argc,&argv);
|
||||
|
||||
Coordinate latt_size = GridDefaultLatt();
|
||||
Coordinate simd_layout = GridDefaultSimd(Nd,vComplex::Nsimd());
|
||||
Coordinate mpi_layout = GridDefaultMpi();
|
||||
|
||||
GridCartesian * UGrid = SpaceTimeGrid::makeFourDimGrid(GridDefaultLatt(),
|
||||
GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
|
||||
GridCartesian * FGrid = SpaceTimeGrid::makeFiveDimGrid(Ls,UGrid);
|
||||
GridRedBlackCartesian * rbGrid = SpaceTimeGrid::makeFourDimRedBlackGrid(UGrid);
|
||||
GridRedBlackCartesian * FrbGrid = SpaceTimeGrid::makeFiveDimRedBlackGrid(Ls,UGrid);
|
||||
|
||||
/////////////////////////////////////////////
|
||||
// Split into N copies of 1^4 mpi communicators
|
||||
/////////////////////////////////////////////
|
||||
Coordinate mpi_split (mpi_layout.size(),1);
|
||||
GridCartesian * SGrid = new GridCartesian(GridDefaultLatt(),
|
||||
GridDefaultSimd(Nd,vComplex::Nsimd()),
|
||||
mpi_split,
|
||||
*UGrid);
|
||||
|
||||
GridCartesian * SFGrid = SpaceTimeGrid::makeFiveDimGrid(Ls,SGrid);
|
||||
GridRedBlackCartesian * SrbGrid = SpaceTimeGrid::makeFourDimRedBlackGrid(SGrid);
|
||||
GridRedBlackCartesian * SFrbGrid = SpaceTimeGrid::makeFiveDimRedBlackGrid(Ls,SGrid);
|
||||
```
|
32
_pages/docs/API/introduction.md
Normal file
32
_pages/docs/API/introduction.md
Normal file
@ -0,0 +1,32 @@
|
||||
---
|
||||
title : "API Documentation"
|
||||
author_profile: false
|
||||
excerpt: "Data parallel API"
|
||||
header:
|
||||
overlay_color: "#5DADE2"
|
||||
permalink: /docs/API/introduction.html
|
||||
sidebar:
|
||||
nav : docs
|
||||
---
|
||||
|
||||
|
||||
Data parallel array indices are divided into two types.
|
||||
|
||||
* Internal indices, such as complex, colour, spin degrees of freedom
|
||||
* spatial (space-time) indices.
|
||||
|
||||
The ranges of all internal degrees are determined by template parameters,
|
||||
and known at compile time. The ranges of spatial indices are dynamic, run time
|
||||
values and the Cartesian structure information is contained and accessed via `Grid` objects.
|
||||
|
||||
Grid objects are the controlling entity for the decomposition of a distributed `Lattice`
|
||||
array across MPI tasks, nodes, SIMD lanes, accelerators. Threaded loops are used
|
||||
as appropriate on host code.
|
||||
|
||||
Data parallel operations can only be performed between Lattice objects constructed
|
||||
from the same Grid pointer. These are called `conformable` operations.
|
||||
|
||||
We will focus initially on the internal indices as these are the building blocks assembled
|
||||
in Lattice container classes. Every Lattice container class constructor requires a Grid object
|
||||
pointer.
|
||||
|
560
_pages/docs/API/lattice_containers.md
Normal file
560
_pages/docs/API/lattice_containers.md
Normal file
@ -0,0 +1,560 @@
|
||||
---
|
||||
title : "API Documentation"
|
||||
author_profile: false
|
||||
excerpt: "Lattice containers"
|
||||
header:
|
||||
overlay_color: "#5DADE2"
|
||||
permalink: /docs/API/lattice_containers.html
|
||||
sidebar:
|
||||
nav : docs
|
||||
---
|
||||
|
||||
{% include toc icon="gears" title="Contents" %}
|
||||
|
||||
Lattice objects may be constructed to contain the local portion of a distribued array of any tensor type.
|
||||
For performance reasons the tensor type uses a vector `Real` or `Complex` as the fundamental datum.
|
||||
|
||||
Every lattice requires a `GridBase` object pointer to be provided in its constructor. Memory is allocated
|
||||
at construction time. If a Lattice is passed a RedBlack grid, it allocates
|
||||
half the storage of the full grid, and may either store the red or black checkerboard. The Lattice object
|
||||
will automatically track through assignments which checkerboard it refers to.
|
||||
For example, shifting a Even checkerboard by an odd distance produces an Odd result field.
|
||||
|
||||
Struct of array objects are defined, and used in the template parameters to the lattice class.
|
||||
|
||||
**Example** (`lib/qcd/QCD.h`):
|
||||
|
||||
```c++
|
||||
template<typename vtype> using iSpinMatrix = iScalar<iMatrix<iScalar<vtype>, Ns> >;
|
||||
typedef iSpinMatrix<ComplexF> SpinMatrixF; //scalar
|
||||
typedef iSpinMatrix<vComplexF> vSpinMatrixF;//vectorised
|
||||
typedef Lattice<vSpinMatrixF> LatticeSpinMatrixF;
|
||||
```
|
||||
|
||||
The full range of QCD relevant lattice objects is given below.
|
||||
|
||||
|-----------|------------|----------|-----------|---------------|---------------------------------|--------------------------|
|
||||
| Lattice | Lorentz | Spin | Colour | scalar_type | Field | Synonym |
|
||||
|-----------|------------|----------|-----------|---------------|---------------------------------|--------------------------|
|
||||
|`Vector` | `Scalar` | `Scalar` | `Scalar` | `RealD` | `LatticeRealD` | N/A |
|
||||
|`Vector` | `Scalar` | `Scalar` | `Scalar` | `ComplexD` | `LatticeComplexD` | N/A |
|
||||
|`Vector` | `Scalar` | `Scalar` | `Matrix` | `ComplexD` | `LatticeColourMatrixD` | `LatticeGaugeLink` |
|
||||
|`Vector` | `Vector` | `Scalar` | `Matrix` | `ComplexD` | `LatticeLorentzColourMatrixD` | `LatticeGaugeFieldD` |
|
||||
|`Vector` | `Scalar` | `Vector` | `Vector` | `ComplexD` | `LatticeSpinColourVectorD` | `LatticeFermionD` |
|
||||
|`Vector` | `Scalar` | `Vector` | `Vector` | `ComplexD` | `LatticeHalfSpinColourVectorD` | `LatticeHalfFermionD` |
|
||||
|`Vector` | `Scalar` | `Matrix` | `Matrix` | `ComplexD` | `LatticeSpinColourMatrixD` | `LatticePropagatorD` |
|
||||
|-----------|------------|----------|-----------|---------------|---------------------------------|--------------------------|
|
||||
|
||||
Additional single precison variants are defined with the suffix `F`.
|
||||
Other lattice objects can be defined using the sort of typedef's shown above if needed.
|
||||
|
||||
### Opaque containers
|
||||
|
||||
The layout within the container is complicated to enable maximum opportunity for vectorisation, and
|
||||
is opaque from the point of view of the API definition. The key implementation observation is that
|
||||
so long as data parallel operations are performed and adjacent SIMD lanes correspond to well separated
|
||||
lattice sites, then identical operations are performed on all SIMD lanes and enable good vectorisation.
|
||||
|
||||
Because the layout is opaque, import and export routines from naturally ordered x,y,z,t arrays
|
||||
are provided (`lib/lattice/Lattice_transfer.h`):
|
||||
|
||||
```c++
|
||||
unvectorizeToLexOrdArray(std::vector<sobj> &out, const Lattice<vobj> &in);
|
||||
vectorizeFromLexOrdArray(std::vector<sobj> &in , Lattice<vobj> &out);
|
||||
```
|
||||
|
||||
The Lexicographic order of data in the external vector fields is defined by (`lib/util/Lexicographic.h`):
|
||||
|
||||
```c++
|
||||
Lexicographic::IndexFromCoor(const Coordinate &lcoor, int &lex,Coordinate *local_dims);
|
||||
```
|
||||
|
||||
This ordering is $$x + L_x * y + L_x*L_y*z + L_x*L_y*L_z *t$$
|
||||
|
||||
Peek and poke routines are provided to perform single site operations. These operations are
|
||||
extremely low performance and are not intended for algorithm development or performance critical code.
|
||||
|
||||
The following are "collective" operations and involve communication between nodes. All nodes receive the same
|
||||
result by broadcast from the owning node:
|
||||
|
||||
```c++
|
||||
void peekSite(sobj &s,const Lattice<vobj> &l,const Coordinate &site);
|
||||
void pokeSite(const sobj &s,Lattice<vobj> &l,const Coordinate &site);
|
||||
```
|
||||
|
||||
The following are executed independently by each node:
|
||||
|
||||
```c++
|
||||
void peekLocalSite(sobj &s,const Lattice<vobj> &l,Coordinate &site);
|
||||
void pokeLocalSite(const sobj &s,Lattice<vobj> &l,Coordinate &site);
|
||||
```
|
||||
|
||||
Lattices of one tensor type may be transformed into lattices of another tensor type by
|
||||
peeking and poking specific indices in a data parallel manner:
|
||||
|
||||
```c++
|
||||
template<int Index,class vobj> // Vector data parallel index peek
|
||||
auto PeekIndex(const Lattice<vobj> &lhs,int i);
|
||||
|
||||
template<int Index,class vobj> // Matrix data parallel index peek
|
||||
auto PeekIndex(const Lattice<vobj> &lhs,int i,int j);
|
||||
|
||||
template<int Index,class vobj> // Vector poke
|
||||
void PokeIndex(Lattice<vobj> &lhs,const Lattice<> & rhs,int i)
|
||||
|
||||
template<int Index,class vobj> // Matrix poke
|
||||
void PokeIndex(Lattice<vobj> &lhs,const Lattice<> & rhs,int i,int j)
|
||||
```
|
||||
|
||||
The inconsistent capitalisation on the letter P is due to an obscure bug in g++ that has not to
|
||||
our knowledge been fixed in any version. The bug was reported in 2016.
|
||||
|
||||
### Global Reduction operations
|
||||
|
||||
Reduction operations for any lattice field are provided. The result is identical on each computing node
|
||||
that is part of the relevant Grid communicator:
|
||||
|
||||
```c++
|
||||
template<class vobj>
|
||||
RealD norm2(const Lattice<vobj> &arg);
|
||||
|
||||
template<class vobj>
|
||||
ComplexD innerProduct(const Lattice<vobj> &left,const Lattice<vobj> &right);
|
||||
|
||||
template<class vobj>
|
||||
vobj sum(const Lattice<vobj> &arg)
|
||||
```
|
||||
|
||||
### Site local reduction operations
|
||||
|
||||
Internal indices may be reduced, site by site, using the following routines:
|
||||
|
||||
```c++
|
||||
template<class vobj>
|
||||
auto localNorm2 (const Lattice<vobj> &rhs)
|
||||
|
||||
template<class vobj>
|
||||
auto localInnerProduct (const Lattice<vobj> &lhs,const Lattice<vobj> &rhs)
|
||||
```
|
||||
|
||||
### Outer product
|
||||
|
||||
A site local outer product is defined:
|
||||
|
||||
```c++
|
||||
template<class ll,class rr>
|
||||
auto outerProduct (const Lattice<ll> &lhs,const Lattice<rr> &rhs)
|
||||
```
|
||||
|
||||
### Slice operations
|
||||
|
||||
Slice operations are defined to operate on one lower dimension than the full lattice. The omitted dimension
|
||||
is the parameter orthogdim:
|
||||
|
||||
```c++
|
||||
template<class vobj>
|
||||
void sliceSum(const Lattice<vobj> &Data,
|
||||
std::vector<typename vobj::scalar_object> &result,
|
||||
int orthogdim);
|
||||
|
||||
template<class vobj>
|
||||
void sliceInnerProductVector( std::vector<ComplexD> & result,
|
||||
const Lattice<vobj> &lhs,
|
||||
const Lattice<vobj> &rhs,
|
||||
int orthogdim);
|
||||
|
||||
template<class vobj>
|
||||
void sliceNorm (std::vector<RealD> &sn,
|
||||
const Lattice<vobj> &rhs,
|
||||
int orthogdim);
|
||||
```
|
||||
|
||||
### Data parallel expression template engine
|
||||
|
||||
The standard arithmetic operators and some data parallel library functions are implemented site by site
|
||||
on lattice types.
|
||||
|
||||
Operations may only ever combine lattice objects that have been constructed from the **same** grid pointer.
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
LatticeFermionD A(&grid);
|
||||
LatticeFermionD B(&grid);
|
||||
LatticeFermionD C(&grid);
|
||||
|
||||
A = B - C;
|
||||
```
|
||||
|
||||
Such operations are said to be **conformable** and are the lattice are guaranteed to have the same dimensions
|
||||
and both MPI and SIMD decomposition because they are based on the same grid object. The conformability check
|
||||
is lightweight and simply requires the same grid pointers be passed to the lattice objects. The data members
|
||||
of the grid objects are not compared.
|
||||
|
||||
Conformable lattice fields may be combined with appropriate scalar types in expressions. The implemented
|
||||
rules follow those already documented for the tensor types.
|
||||
|
||||
|
||||
|
||||
|
||||
### Unary operators and functions
|
||||
|
||||
The following sitewise unary operations are defined:
|
||||
|
||||
|-----------------------|---------------------------------------------|
|
||||
| Operation | Description |
|
||||
|-----------------------|---------------------------------------------|
|
||||
|`operator-` | negate |
|
||||
|`adj` | Hermitian conjugate |
|
||||
|`conjugate` | complex conjugate |
|
||||
|`trace` | sitewise trace |
|
||||
|`transpose` | sitewise transpose |
|
||||
|`Ta` | take traceles anti Hermitian part |
|
||||
|`ProjectOnGroup` | reunitarise or orthogonalise |
|
||||
|`real` | take the real part |
|
||||
|`imag` | take the imaginary part |
|
||||
|`toReal` | demote complex to real |
|
||||
|`toComplex` | promote real to complex |
|
||||
|`timesI` | elementwise +i mult (0 is not multiplied) |
|
||||
|`timesMinusI` | elementwise -i mult (0 is not multiplied) |
|
||||
|`abs` | elementwise absolute value |
|
||||
|`sqrt` | elementwise square root |
|
||||
|`rsqrt` | elementwise reciprocal square root |
|
||||
|`sin` | elementwise sine |
|
||||
|`cos` | elementwise cosine |
|
||||
|`asin` | elementwise inverse sine |
|
||||
|`acos` | elementwise inverse cosine |
|
||||
|`log` | elementwise logarithm |
|
||||
|`exp` | elementwise exponentiation |
|
||||
|`operator!` | Logical negation of integer field |
|
||||
|`Not` | Logical negation of integer field |
|
||||
|-----------------------|---------------------------------------------|
|
||||
|
||||
The following sitewise applied functions with additional parameters are:
|
||||
|
||||
```c++
|
||||
template<class obj> Lattice<obj> pow(const Lattice<obj> &rhs_i,RealD y);
|
||||
|
||||
template<class obj> Lattice<obj> mod(const Lattice<obj> &rhs_i,Integer y);
|
||||
|
||||
template<class obj> Lattice<obj> div(const Lattice<obj> &rhs_i,Integer y);
|
||||
|
||||
template<class obj> Lattice<obj>
|
||||
expMat(const Lattice<obj> &rhs_i, RealD alpha, Integer Nexp = DEFAULT_MAT_EXP);
|
||||
```
|
||||
|
||||
### Binary operators
|
||||
|
||||
The following binary operators are defined:
|
||||
|
||||
```
|
||||
operator+
|
||||
operator-
|
||||
operator*
|
||||
operator/
|
||||
```
|
||||
|
||||
Logical are defined on LatticeInteger types:
|
||||
|
||||
```
|
||||
operator&
|
||||
operator|
|
||||
operator&&
|
||||
operator||
|
||||
```
|
||||
|
||||
### Ternary operator, logical operatons and where
|
||||
|
||||
Within the data parallel level of the API the only way to perform operations
|
||||
that are differentiated between sites is use predicated execution.
|
||||
|
||||
The predicate takes the form of a `LatticeInteger` which is confromable with both
|
||||
the `iftrue` and `iffalse` argument:
|
||||
|
||||
```c++
|
||||
template<class vobj,class iobj> void where(const Lattice<iobj> &pred,
|
||||
Lattice<vobj> &iftrue,
|
||||
Lattice<vobj> &iffalse);
|
||||
```
|
||||
This plays the data parallel analogue of the C++ ternary operator:
|
||||
|
||||
```c++
|
||||
a = b ? c : d;
|
||||
```
|
||||
|
||||
In order to create the predicate in a coordinate dependent fashion it is often useful
|
||||
to use the lattice coordinates.
|
||||
|
||||
The `LatticeCoordinate` function:
|
||||
|
||||
```c++
|
||||
template<class iobj> LatticeCoordinate(Lattice<iobj> &coor,int dir);
|
||||
```
|
||||
|
||||
fills an `Integer` field with the coordinate in the N-th dimension.
|
||||
A usage example is given
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
int dir =3;
|
||||
int block=4;
|
||||
LatticeInteger coor(FineGrid);
|
||||
|
||||
LatticeCoordinate(coor,dir);
|
||||
|
||||
result = where(mod(coor,block)==(block-1),x,z);
|
||||
```
|
||||
|
||||
(Other usage cases of LatticeCoordinate include the generation of plane wave momentum phases.)
|
||||
|
||||
### Site local fused operations
|
||||
|
||||
The biggest limitation of expression template engines is that the optimisation
|
||||
visibility is a single assignment statement in the original source code.
|
||||
|
||||
There is no scope for loop fusion between multiple statements.
|
||||
Multi-loop fusion gives scope for greater cache locality.
|
||||
|
||||
Two primitives for hardware aware parallel loops are provided.
|
||||
These will operate directly on the site objects which are expanded by a factor
|
||||
of the vector length (in our struct of array datatypes).
|
||||
|
||||
Since the mapping of sites
|
||||
to data lanes is opaque, these vectorised loops
|
||||
are *only* appropriate for optimisation of site local operations.
|
||||
|
||||
### View objects
|
||||
|
||||
Due to an obscure aspect of the way that Nvidia handle device C++11 lambda functions,
|
||||
it is necessary to disable the indexing of a Lattice object.
|
||||
|
||||
Rather, a reference to a lattice object must be first obtained.
|
||||
|
||||
The reference is copyable to a GPU, and is able to be indexed on either accelerator code,
|
||||
or by host code.
|
||||
|
||||
In order to prevent people developing code that dereferences Lattice objects in a way that
|
||||
works on CPU compilation, but fails on GPU compilation, we have decided to remove the ability
|
||||
to index a lattice object on CPU code.
|
||||
|
||||
As a result of Nvidia's constraints, all accesses to lattice objects are required to be made
|
||||
through a View object.
|
||||
|
||||
In the following, the type is `LatticeView<vobj>`, however it is wise to use the C++11 auto keyword
|
||||
to avoid naming the type. See code examples below.
|
||||
|
||||
|
||||
### thread_loops
|
||||
|
||||
The first parallel primitive is the thread_loop
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
LatticeField r(grid);
|
||||
LatticeField x(grid);
|
||||
LatticeField p(grid);
|
||||
LatticeField mmp(grid);
|
||||
auto r_v = r.View();
|
||||
auto x_v = x.View();
|
||||
auto p_v = p.View();
|
||||
auto mmp_v = mmp.View();
|
||||
thread_loop(s , r_v, {
|
||||
r_v[s] = r_v[s] - a * mmp_v[s];
|
||||
x_v[s] = x_v[s] + a*p_v[s];
|
||||
p_v[s] = p_v[s]*b + r_v[s];
|
||||
});
|
||||
```
|
||||
|
||||
### accelerator_loops
|
||||
|
||||
The second parallel primitive is an accelerated_loop
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
LatticeField r(grid);
|
||||
LatticeField x(grid);
|
||||
LatticeField p(grid);
|
||||
LatticeField mmp(grid);
|
||||
auto r_v = r.View();
|
||||
auto x_v = x.View();
|
||||
auto p_v = p.View();
|
||||
auto mmp_v = mmp.View();
|
||||
accelerator_loop(s , r_v, {
|
||||
r_v[s] = r_v[s] - a * mmp_v[s];
|
||||
x_v[s] = x_v[s] + a*p_v[s];
|
||||
p_v[s] = p_v[s]*b + r_v[s];
|
||||
});
|
||||
```
|
||||
|
||||
|
||||
### Cshift
|
||||
|
||||
Site shifting operations are provided using the Cshift function:
|
||||
|
||||
```c++
|
||||
template<class vobj>
|
||||
Lattice<vobj> Cshift(const Lattice<vobj> &rhs,int dimension,int shift)
|
||||
```
|
||||
|
||||
This shifts the whole vector by any distance shift in the appropriate dimension.
|
||||
|
||||
For the avoidance of doubt on direction conventions,a positive shift moves the
|
||||
lattice site $$x_mu = 1$$ in the rhs to $$x_mu = 0$$ in the result.
|
||||
|
||||
**Example** (`benchmarks/Benchmark_wilson.cc`):
|
||||
|
||||
```c++
|
||||
{ // Naive wilson implementation
|
||||
ref = Zero();
|
||||
for(int mu=0;mu<Nd;mu++){
|
||||
|
||||
tmp = U[mu]*Cshift(src,mu,1);
|
||||
{
|
||||
auto ref_v = ref.View();
|
||||
auto tmp_v = tmp.View();
|
||||
for(int i=0;i<ref_v.size();i++){
|
||||
ref_v[i]+= tmp_v[i] - Gamma(Gmu[mu])*tmp_v[i]; ;
|
||||
}
|
||||
}
|
||||
|
||||
tmp =adj(U[mu])*src;
|
||||
tmp =Cshift(tmp,mu,-1);
|
||||
{
|
||||
auto ref_v = ref.View();
|
||||
auto tmp_v = tmp.View();
|
||||
for(int i=0;i<ref_v.size();i++){
|
||||
ref_v[i]+= tmp_v[i] + Gamma(Gmu[mu])*tmp_v[i]; ;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### CovariantCshift
|
||||
|
||||
Covariant Cshift operations are provided for common cases of boundary condition. These may be further optimised
|
||||
in future:
|
||||
|
||||
```c++
|
||||
template<class covariant,class gauge>
|
||||
Lattice<covariant> CovShiftForward(const Lattice<gauge> &Link, int mu,
|
||||
const Lattice<covariant> &field);
|
||||
|
||||
template<class covariant,class gauge>
|
||||
Lattice<covariant> CovShiftBackward(const Lattice<gauge> &Link, int mu,
|
||||
const Lattice<covariant> &field);
|
||||
```
|
||||
|
||||
### Boundary conditions
|
||||
|
||||
The covariant shift routines occur in namespaces PeriodicBC and ConjugateBC. The correct covariant shift
|
||||
for the boundary condition is passed into the gauge actions and wilson loops via an
|
||||
"Impl" template policy class.
|
||||
|
||||
The relevant staples, plaquettes, and loops are formed by using the provided method:
|
||||
|
||||
```c++
|
||||
Impl::CovShiftForward
|
||||
Impl::CovShiftBackward
|
||||
```
|
||||
|
||||
etc... This makes physics code transform appropriately with externally supplied rules about
|
||||
treating the boundary.
|
||||
|
||||
**Example** (`lib/qcd/util/WilsonLoops.h`):
|
||||
|
||||
```c++
|
||||
static void dirPlaquette(GaugeMat &plaq, const std::vector<GaugeMat> &U,
|
||||
const int mu, const int nu) {
|
||||
// ___
|
||||
//| |
|
||||
//|<__|
|
||||
plaq = Gimpl::CovShiftForward(U[mu],mu,
|
||||
Gimpl::CovShiftForward(U[nu],nu,
|
||||
Gimpl::CovShiftBackward(U[mu],mu,
|
||||
Gimpl::CovShiftIdentityBackward(U[nu], nu))));
|
||||
}
|
||||
```
|
||||
|
||||
### Inter-grid transfer operations
|
||||
|
||||
Transferring between different checkerboards of the same global lattice:
|
||||
|
||||
```c++
|
||||
template<class vobj> void pickCheckerboard(int cb,Lattice<vobj> &half,const Lattice<vobj> &full);
|
||||
template<class vobj> void setCheckerboard(Lattice<vobj> &full,const Lattice<vobj> &half);
|
||||
```
|
||||
|
||||
These are used to set up Schur red-black decomposed solvers, for example.
|
||||
|
||||
Multi-grid projection between a fine and coarse grid:
|
||||
|
||||
```c++
|
||||
template<class vobj,class CComplex,int nbasis>
|
||||
void blockProject(Lattice<iVector<CComplex,nbasis > > &coarseData,
|
||||
const Lattice<vobj> &fineData,
|
||||
const std::vector<Lattice<vobj> > &Basis);
|
||||
```
|
||||
|
||||
Multi-grid promotion to a finer grid:
|
||||
|
||||
```c++
|
||||
template<class vobj,class CComplex,int nbasis>
|
||||
void blockPromote(const Lattice<iVector<CComplex,nbasis > > &coarseData,
|
||||
Lattice<vobj> &fineData,
|
||||
const std::vector<Lattice<vobj> > &Basis)
|
||||
```
|
||||
|
||||
Support for sub-block Linear algebra:
|
||||
|
||||
```c++
|
||||
template<class vobj,class CComplex>
|
||||
void blockZAXPY(Lattice<vobj> &fineZ,
|
||||
const Lattice<CComplex> &coarseA,
|
||||
const Lattice<vobj> &fineX,
|
||||
const Lattice<vobj> &fineY)
|
||||
|
||||
template<class vobj,class CComplex>
|
||||
void blockInnerProduct(Lattice<CComplex> &CoarseInner,
|
||||
const Lattice<vobj> &fineX,
|
||||
const Lattice<vobj> &fineY)
|
||||
|
||||
template<class vobj,class CComplex>
|
||||
void blockNormalise(Lattice<CComplex> &ip,Lattice<vobj> &fineX)
|
||||
|
||||
template<class vobj>
|
||||
void blockSum(Lattice<vobj> &coarseData,const Lattice<vobj> &fineData)
|
||||
|
||||
template<class vobj,class CComplex>
|
||||
void blockOrthogonalise(Lattice<CComplex> &ip,std::vector<Lattice<vobj> > &Basis)
|
||||
```
|
||||
|
||||
Conversion between different SIMD layouts:
|
||||
|
||||
```c++
|
||||
template<class vobj,class vvobj>
|
||||
void localConvert(const Lattice<vobj> &in,Lattice<vvobj> &out)
|
||||
```
|
||||
|
||||
Slices between grid of dimension N and grid of dimentions N+1:
|
||||
|
||||
```c++
|
||||
template<class vobj>
|
||||
void InsertSlice(const Lattice<vobj> &lowDim,Lattice<vobj> & higherDim,int slice, int orthog)
|
||||
|
||||
template<class vobj>
|
||||
void ExtractSlice(Lattice<vobj> &lowDim,const Lattice<vobj> & higherDim,int slice, int orthog)
|
||||
```
|
||||
|
||||
Growing a lattice by a multiple factor, with periodic replication:
|
||||
|
||||
```c++
|
||||
template<class vobj>
|
||||
void Replicate(Lattice<vobj> &coarse,Lattice<vobj> & fine)
|
||||
```
|
||||
|
||||
That latter is useful to, for example, pre-thermalise a smaller volume and then grow the volume in HMC.
|
||||
It was written while debugging G-parity boundary conditions.
|
107
_pages/docs/API/random_number_generators.md
Normal file
107
_pages/docs/API/random_number_generators.md
Normal file
@ -0,0 +1,107 @@
|
||||
---
|
||||
title : "API Documentation"
|
||||
author_profile: false
|
||||
excerpt: "Random number generators"
|
||||
header:
|
||||
overlay_color: "#5DADE2"
|
||||
permalink: /docs/API/random_number_generators.html
|
||||
sidebar:
|
||||
nav : docs
|
||||
---
|
||||
|
||||
Grid provides three configure time options for random the number generator engine.
|
||||
|
||||
* `sitmo`
|
||||
* `ranlux48`
|
||||
* `mt19937`
|
||||
|
||||
The selection is controlled by the `--enable-rng=<option>` flag.
|
||||
|
||||
Sitmo is the default Grid RNG and is recommended. It is a hash based RNG that is cryptographically secure and has
|
||||
|
||||
#. passed the BigCrush tests
|
||||
|
||||
#. can Skip forward an arbitrary distance (up to 2^256) in O(1) time
|
||||
|
||||
We use Skip functionality to place each site in an independent well separated stream.
|
||||
The Skip was trivially parallelised, important in a many core node,
|
||||
and gives very low overhead parallel RNG initialisation.
|
||||
|
||||
Our implementation of parallel RNG
|
||||
|
||||
* Has passed the BigCrush tests **drawing once from each site RNG** in a round robin fashion.
|
||||
|
||||
This test is applied in `tests/testu01/Test_smallcrush.cc`
|
||||
|
||||
The interface is as follows::
|
||||
|
||||
```c++
|
||||
class GridSerialRNG {
|
||||
GridSerialRNG();
|
||||
void SeedFixedIntegers(const std::vector<int> &seeds);
|
||||
}
|
||||
|
||||
class GridParallelRNG {
|
||||
GridParallelRNG(GridBase *grid);
|
||||
void SeedFixedIntegers(const std::vector<int> &seeds);
|
||||
}
|
||||
|
||||
template <class vobj> void random(GridParallelRNG &rng,Lattice<vobj> &l) { rng.fill(l,rng._uniform); }
|
||||
template <class vobj> void gaussian(GridParallelRNG &rng,Lattice<vobj> &l) { rng.fill(l,rng._gaussian); }
|
||||
|
||||
template <class sobj> void random(GridSerialRNG &rng,sobj &l) { rng.fill(l,rng._uniform ); }
|
||||
template <class sobj> void gaussian(GridSerialRNG &rng,sobj &l) { rng.fill(l,rng._gaussian ); }
|
||||
```
|
||||
|
||||
* Serial RNG's are used to assign scalar fields.
|
||||
|
||||
* Parallel RNG's are used to assign lattice fields and must subdivide the field grid (need not be conformable).
|
||||
|
||||
It is the API users responsibility to initialise, manage, save and restore these RNG state for their algorithm.
|
||||
In particular there is no single globally managed RNG state.
|
||||
|
||||
Input/Output routines are provided for saving and restoring RNG states.
|
||||
|
||||
`lib/parallelIO/BinaryIO.h`:
|
||||
|
||||
```c++
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////
|
||||
// Read a RNG; use IOobject and lexico map to an array of state
|
||||
////////////////////////////////////////////////////////////////////////////
|
||||
static void readRNG(GridSerialRNG &serial,
|
||||
GridParallelRNG ¶llel,
|
||||
std::string file,
|
||||
Integer offset,
|
||||
uint32_t &nersc_csum,
|
||||
uint32_t &scidac_csuma,
|
||||
uint32_t &scidac_csumb)
|
||||
////////////////////////////////////////////////////////////////////////////
|
||||
// Write a RNG; lexico map to an array of state and use IOobject
|
||||
////////////////////////////////////////////////////////////////////////////
|
||||
static void writeRNG(GridSerialRNG &serial,
|
||||
GridParallelRNG ¶llel,
|
||||
std::string file,
|
||||
Integer offset,
|
||||
uint32_t &nersc_csum,
|
||||
uint32_t &scidac_csuma,
|
||||
uint32_t &scidac_csumb)
|
||||
|
||||
lib/parallelIO/NerscIO.h::
|
||||
|
||||
void writeRNGState(GridSerialRNG &serial,GridParallelRNG ¶llel,std::string file);
|
||||
|
||||
void readRNG(GridSerialRNG &serial,
|
||||
GridParallelRNG ¶llel,
|
||||
std::string file,
|
||||
Integer offset,
|
||||
uint32_t &nersc_csum,
|
||||
uint32_t &scidac_csuma,
|
||||
uint32_t &scidac_csumb);
|
||||
```
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
NerscIO::writeRNGState(sRNG,pRNG,rfile);
|
||||
```
|
443
_pages/docs/API/tensor_classes.md
Normal file
443
_pages/docs/API/tensor_classes.md
Normal file
@ -0,0 +1,443 @@
|
||||
---
|
||||
title : "API Documentation"
|
||||
author_profile: false
|
||||
excerpt: "Tensor classes"
|
||||
header:
|
||||
overlay_color: "#5DADE2"
|
||||
permalink: /docs/API/tensor_classes.html
|
||||
sidebar:
|
||||
nav : docs
|
||||
---
|
||||
|
||||
The Tensor data structures are built up from fundamental
|
||||
scalar matrix and vector classes:
|
||||
|
||||
```c++
|
||||
template<class vobj > class iScalar { private: vobj _internal ; }
|
||||
template<class vobj,int N> class iVector { private: vobj _internal[N] ; }
|
||||
template<class vobj,int N> class iMatrix { private: vobj _internal[N] ; }
|
||||
```
|
||||
|
||||
These are template classes and can be passed a fundamental scalar or vector type, or
|
||||
nested to form arbitrarily complicated tensor products of indices. All mathematical expressions
|
||||
are defined to operate recursively, index by index.
|
||||
|
||||
Presently the constants
|
||||
|
||||
* `Nc`
|
||||
* `Nd`
|
||||
|
||||
are globally predefined. However, this is planned for changed in future and policy classes
|
||||
for different theories (e.g. QCD, QED, SU2 etc...) will contain these constants and enable multiple
|
||||
theories to coexist more naturally.
|
||||
|
||||
Arbitrary tensor products of fundamental scalar, vector
|
||||
and matrix objects may be formed in principle by the basic Grid code.
|
||||
|
||||
For Lattice field theory, we define types according to the following tensor
|
||||
product structure ordering. The suffix "D" indicates either double types, and
|
||||
replacing with "F" gives the corresponding single precision type.
|
||||
|
||||
|Lattice | Lorentz | Spin | Colour | scalar_type | Field |
|
||||
|---------|----------|-----------|----------|-------------|--------------------------|
|
||||
|Scalar | Scalar | Scalar | Scalar | RealD | RealD |
|
||||
|Scalar | Scalar | Scalar | Scalar | ComplexD | ComplexD |
|
||||
|Scalar | Scalar | Scalar | Matrix | ComplexD | ColourMatrixD |
|
||||
|Scalar | Vector | Scalar | Matrix | ComplexD | LorentzColourMatrixD |
|
||||
|Scalar | Scalar | Vector | Vector | ComplexD | SpinColourVectorD |
|
||||
|Scalar | Scalar | Vector | Vector | ComplexD | HalfSpinColourVectorD |
|
||||
|Scalar | Scalar | Matrix | Matrix | ComplexD | SpinColourMatrixD |
|
||||
|---------|----------|-----------|----------|-------------|--------------------------|
|
||||
|
||||
The types are implemented via a recursive tensor nesting system.
|
||||
|
||||
**Example** we declare:
|
||||
|
||||
```c++
|
||||
template<typename vtype>
|
||||
using iLorentzColourMatrix = iVector<iScalar<iMatrix<vtype, Nc> >, Nd > ;
|
||||
|
||||
typedef iLorentzColourMatrix<ComplexD > LorentzColourMatrixD;
|
||||
```
|
||||
|
||||
**Example** we declare:
|
||||
|
||||
```c++
|
||||
template<typename vtype>
|
||||
using iLorentzColourMatrix = iVector<iScalar<iMatrix<vtype, Nc> >, Nd > ;
|
||||
|
||||
typedef iLorentzColourMatrix<ComplexD > LorentzColourMatrixD;
|
||||
```
|
||||
|
||||
Arbitrarily deep tensor nests may be formed. Grid uses a positional and numerical rule to associate indices for contraction
|
||||
in the Einstein summation sense.
|
||||
|
||||
|----------------|---------|------------|
|
||||
| Symbolic name | Number | Position |
|
||||
|----------------|---------|------------|
|
||||
|`LorentzIndex` | 0 | left |
|
||||
|`SpinIndex` | 1 | middle |
|
||||
|`ColourIndex` | 2 | right |
|
||||
|----------------|---------|------------|
|
||||
|
||||
The conventions are that the index ordering left to right are: Lorentz, Spin, Colour. A scalar type (either real
|
||||
or complex, single or double precision) is be provided to the innermost structure.
|
||||
|
||||
|
||||
### Tensor arithmetic rules (`lib/tensors/Tensor_arith.h`)
|
||||
|
||||
Arithmetic rules are defined on these types
|
||||
|
||||
The multiplication operator follows the natural multiplication
|
||||
table for each index, index level by index level.
|
||||
|
||||
`Operator *`
|
||||
|
||||
|----|----|----|----|
|
||||
| x | S | V | M |
|
||||
|----|----|----|----|
|
||||
| S | S | V | M |
|
||||
| V | S | S | V |
|
||||
| M | M | V | M |
|
||||
|----|----|----|----|
|
||||
|
||||
The addition and subtraction rules disallow a scalar to be added to a vector,
|
||||
and vector to be added to matrix. A scalar adds to a matrix on the diagonal.
|
||||
|
||||
`Operator +` and `Operator -`
|
||||
|
||||
|----|----|----|----|
|
||||
| +/-| S | V | M |
|
||||
|----|----|----|----|
|
||||
| S | S | - | M |
|
||||
| V | - | V | - |
|
||||
| M | M | - | M |
|
||||
|----|----|----|----|
|
||||
|
||||
|
||||
The rules for a nested objects are recursively inferred level by level from basic rules of multiplication
|
||||
addition and subtraction for scalar/vector/matrix. Legal expressions can only be formed between objects
|
||||
with the same number of nested internal indices. All the Grid QCD datatypes have precisely three internal
|
||||
indices, some of which may be trivial scalar to enable expressions to be formed.
|
||||
|
||||
Arithmetic operations are possible where the left or right operand is a scalar type.
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
LatticeColourMatrixD U(grid);
|
||||
LatticeColourMatrixD Udag(grid);
|
||||
|
||||
Udag = adj(U);
|
||||
|
||||
RealD unitary_err = norm2(U*adj(U) - 1.0);
|
||||
```
|
||||
|
||||
Will provide a measure of how discrepant from unitarity the matrix U is.
|
||||
|
||||
### Internal index manipulation (`lib/tensors/Tensor_index.h`)
|
||||
|
||||
General code can access any specific index by number with a peek/poke semantic:
|
||||
|
||||
```c++
|
||||
// peek index number "Level" of a vector index
|
||||
template<int Level,class vtype> auto peekIndex (const vtype &arg,int i);
|
||||
|
||||
// peek index number "Level" of a vector index
|
||||
template<int Level,class vtype> auto peekIndex (const vtype &arg,int i,int j);
|
||||
|
||||
// poke index number "Level" of a vector index
|
||||
template<int Level,class vtype>
|
||||
void pokeIndex (vtype &pokeme,arg,int i)
|
||||
|
||||
// poke index number "Level" of a matrix index
|
||||
template<int Level,class vtype>
|
||||
void pokeIndex (vtype &pokeme,arg,int i,int j)
|
||||
```
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
for (int mu = 0; mu < Nd; mu++) {
|
||||
U[mu] = PeekIndex<LorentzIndex>(Umu, mu);
|
||||
}
|
||||
```
|
||||
|
||||
Similar to the QDP++ package convenience routines are provided to access specific elements of
|
||||
vector and matrix internal index types by physics name or meaning aliases for the above routines
|
||||
with the appropriate index constant.
|
||||
|
||||
* `peekColour`
|
||||
* `peekSpin`
|
||||
* `peekLorentz`
|
||||
|
||||
and
|
||||
|
||||
* `pokeColour`
|
||||
* `pokeSpin`
|
||||
* `pokeLorentz`
|
||||
|
||||
For example, we often store Gauge Fields with a Lorentz index, but can split them into
|
||||
polarisations in relevant pieces of code.
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
for (int mu = 0; mu < Nd; mu++) {
|
||||
U[mu] = peekLorentz(Umu, mu);
|
||||
}
|
||||
```
|
||||
|
||||
For convenience, direct access as both an l-value and an r-value is provided by the parenthesis operator () on each of the Scalar, Vector and Matrix classes.
|
||||
For example one may write
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
ColourMatrix A, B;
|
||||
A()()(i,j) = B()()(j,i);
|
||||
```
|
||||
|
||||
bearing in mind that empty parentheses are need to address a scalar entry in the tensor index nest.
|
||||
|
||||
The first (left) empty parentheses move past the (scalar) Lorentz level in the tensor nest, and the second
|
||||
(middle) empty parantheses move past the (scalar) spin level. The (i,j) index the colour matrix.
|
||||
|
||||
Other examples are easy to form for the many cases, and should be obvious to the reader.
|
||||
This form of addressing is convenient and saves peek, modifying, poke
|
||||
multiple temporary objects when both spin and colour indices are being accessed.
|
||||
There are many cases where multiple lines of code are required with a peek/poke semantic which are
|
||||
easier with direct l-value and r-value addressing.
|
||||
|
||||
|
||||
### Matrix operations
|
||||
|
||||
Transposition and tracing specific internal indices are possible using:
|
||||
|
||||
```c++
|
||||
template<int Level,class vtype>
|
||||
auto traceIndex (const vtype &arg)
|
||||
|
||||
template<int Level,class vtype>
|
||||
auto transposeIndex (const vtype &arg)
|
||||
```
|
||||
|
||||
These may be used as
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
LatticeColourMatrixD Link(grid);
|
||||
ComplexD link_trace = traceIndex<ColourIndex> (Link);
|
||||
```
|
||||
|
||||
Again, convenience aliases for QCD naming schemes are provided via
|
||||
|
||||
* `traceColour`
|
||||
* `traceSpin`
|
||||
|
||||
* `transposeColour`
|
||||
* `transposeSpin`
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
ComplexD link_trace = traceColour (Link);
|
||||
```
|
||||
|
||||
The operations only makes sense for matrix and scalar internal indices.
|
||||
|
||||
The trace and transpose over all indices is also defined for matrix and scalar types:
|
||||
|
||||
```c++
|
||||
template<class vtype,int N>
|
||||
auto trace(const iMatrix<vtype,N> &arg) -> iScalar
|
||||
|
||||
template<class vtype,int N>
|
||||
auto transpose(const iMatrix<vtype,N> &arg ) -> iMatrix
|
||||
```
|
||||
|
||||
Similar functions are:
|
||||
|
||||
* `conjugate`
|
||||
* `adjoint`
|
||||
|
||||
The traceless anti-Hermitian part is taken with:
|
||||
|
||||
```c++
|
||||
template<class vtype,int N> iMatrix<vtype,N>
|
||||
Ta(const iMatrix<vtype,N> &arg)
|
||||
```
|
||||
|
||||
Reunitarisation (or reorthogonalisation) is enabled by:
|
||||
|
||||
```c++
|
||||
template<class vtype,int N> iMatrix<vtype,N>
|
||||
ProjectOnGroup(const iMatrix<vtype,N> &arg)
|
||||
```
|
||||
|
||||
**Example**:
|
||||
|
||||
```c++
|
||||
LatticeColourMatrixD Mom(grid);
|
||||
LatticeColourMatrixD TaMom(grid);
|
||||
TaMom = Ta(Mom);
|
||||
```
|
||||
|
||||
### Querying internal index structure
|
||||
|
||||
Templated code may find it useful to use query functions on the Grid datatypes they are provided.
|
||||
For example general Serialisation and I/O code can inspect the nature of a type a routine has been
|
||||
asked to read from disk, or even generate descriptive type strings:
|
||||
|
||||
```c++
|
||||
////////////////////////////////////////////////////
|
||||
// Support type queries on template params:
|
||||
////////////////////////////////////////////////////
|
||||
// int _ColourScalar = isScalar<ColourIndex,vobj>();
|
||||
// int _ColourVector = isVector<ColourIndex,vobj>();
|
||||
// int _ColourMatrix = isMatrix<ColourIndex,vobj>();
|
||||
template<int Level,class vtype> int isScalar(void)
|
||||
template<int Level,class vtype> int isVector(void)
|
||||
template<int Level,class vtype> int isMatrix(void)
|
||||
```
|
||||
|
||||
**Example** (`lib/parallelIO/IldgIO.h`):
|
||||
|
||||
```c++
|
||||
template<class vobj> std::string ScidacRecordTypeString(int &colors, int &spins, int & typesize,int &datacount) {
|
||||
|
||||
/////////////////////////////////////////
|
||||
// Encode a generic tensor as a string
|
||||
/////////////////////////////////////////
|
||||
|
||||
typedef typename getPrecision<vobj>::real_scalar_type stype;
|
||||
|
||||
int _ColourN = indexRank<ColourIndex,vobj>();
|
||||
int _ColourScalar = isScalar<ColourIndex,vobj>();
|
||||
int _ColourVector = isVector<ColourIndex,vobj>();
|
||||
int _ColourMatrix = isMatrix<ColourIndex,vobj>();
|
||||
|
||||
int _SpinN = indexRank<SpinIndex,vobj>();
|
||||
int _SpinScalar = isScalar<SpinIndex,vobj>();
|
||||
int _SpinVector = isVector<SpinIndex,vobj>();
|
||||
int _SpinMatrix = isMatrix<SpinIndex,vobj>();
|
||||
|
||||
int _LorentzN = indexRank<LorentzIndex,vobj>();
|
||||
int _LorentzScalar = isScalar<LorentzIndex,vobj>();
|
||||
int _LorentzVector = isVector<LorentzIndex,vobj>();
|
||||
int _LorentzMatrix = isMatrix<LorentzIndex,vobj>();
|
||||
|
||||
std::stringstream stream;
|
||||
|
||||
stream << "GRID_";
|
||||
stream << ScidacWordMnemonic<stype>();
|
||||
|
||||
if ( _LorentzVector ) stream << "_LorentzVector"<<_LorentzN;
|
||||
if ( _LorentzMatrix ) stream << "_LorentzMatrix"<<_LorentzN;
|
||||
|
||||
if ( _SpinVector ) stream << "_SpinVector"<<_SpinN;
|
||||
if ( _SpinMatrix ) stream << "_SpinMatrix"<<_SpinN;
|
||||
|
||||
if ( _ColourVector ) stream << "_ColourVector"<<_ColourN;
|
||||
if ( _ColourMatrix ) stream << "_ColourMatrix"<<_ColourN;
|
||||
|
||||
if ( _ColourScalar && _LorentzScalar && _SpinScalar ) stream << "_Complex";
|
||||
|
||||
typesize = sizeof(typename vobj::scalar_type);
|
||||
|
||||
if ( _ColourMatrix ) typesize*= _ColourN*_ColourN;
|
||||
else typesize*= _ColourN;
|
||||
|
||||
if ( _SpinMatrix ) typesize*= _SpinN*_SpinN;
|
||||
else typesize*= _SpinN;
|
||||
|
||||
};
|
||||
```
|
||||
|
||||
### Inner and outer products
|
||||
|
||||
We recursively define (`tensors/Tensor_inner.h`), ultimately returning scalar in all indices:
|
||||
|
||||
```c++
|
||||
/////////////////////////////////////////////////////////////////////////
|
||||
// innerProduct Scalar x Scalar -> Scalar
|
||||
// innerProduct Vector x Vector -> Scalar
|
||||
// innerProduct Matrix x Matrix -> Scalar
|
||||
/////////////////////////////////////////////////////////////////////////
|
||||
template<class l,class r>
|
||||
auto innerProductD (const iScalar<l>& lhs,const iScalar<r>& rhs)
|
||||
|
||||
template<class l,class r,int N>
|
||||
auto innerProductD (const iVector<l,N>& lhs,const iVector<r,N>& rhs)
|
||||
|
||||
template<class l,class r,int N>
|
||||
auto innerProductD (const iMatrix<l,N>& lhs,const iMatrix<r,N>& rhs)
|
||||
|
||||
template<class l,class r>
|
||||
auto innerProduct (const iScalar<l>& lhs,const iScalar<r>& rhs)
|
||||
|
||||
template<class l,class r,int N>
|
||||
auto innerProduct (const iVector<l,N>& lhs,const iVector<r,N>& rhs)
|
||||
|
||||
template<class l,class r,int N>
|
||||
auto innerProduct (const iMatrix<l,N>& lhs,const iMatrix<r,N>& rhs)
|
||||
```
|
||||
|
||||
The sum is always performed in double precision for the `innerProductD` variant.
|
||||
|
||||
We recursively define (`tensors/Tensor_outer.h`):
|
||||
|
||||
```c++
|
||||
/////////////////////////////////////////////////////////////////////////
|
||||
// outerProduct Scalar x Scalar -> Scalar
|
||||
// Vector x Vector -> Matrix
|
||||
/////////////////////////////////////////////////////////////////////////
|
||||
template<class l,class r>
|
||||
auto outerProduct (const iScalar<l>& lhs,const iScalar<r>& rhs)
|
||||
|
||||
template<class l,class r,int N>
|
||||
auto outerProduct (const iVector<l,N>& lhs,const iVector<r,N>& rhs)
|
||||
```
|
||||
|
||||
### Functions of Tensor
|
||||
|
||||
The following unary functions are defined, which operate element by element on a tensor
|
||||
data structure:
|
||||
|
||||
```c++
|
||||
sqrt();
|
||||
rsqrt();
|
||||
sin();
|
||||
cos();
|
||||
asin();
|
||||
acos();
|
||||
log();
|
||||
exp();
|
||||
abs();
|
||||
Not();
|
||||
toReal();
|
||||
toComplex();
|
||||
```
|
||||
|
||||
Element wise functions are defined for::
|
||||
|
||||
```c++
|
||||
div(tensor,Integer);
|
||||
mod(tensor,Integer);
|
||||
pow(tensor,RealD);
|
||||
```
|
||||
|
||||
Matrix exponentiation (as opposed to element wise exponentiation is implemented via power series in::
|
||||
|
||||
```c++
|
||||
Exponentiate(const Tensor &r ,RealD alpha, Integer Nexp = DEFAULT_MAT_EXP)
|
||||
```
|
||||
|
||||
the exponentiation is distributive across vector indices (i.e. proceeds component by component for a `LorentzColourMatrix`).
|
||||
|
||||
Determinant is similar::
|
||||
|
||||
```c++
|
||||
iScalar Determinant(const Tensor &r )
|
||||
```
|
25
_pages/docs/API/vectorisation.md
Normal file
25
_pages/docs/API/vectorisation.md
Normal file
@ -0,0 +1,25 @@
|
||||
---
|
||||
title : "API Documentation"
|
||||
author_profile: false
|
||||
excerpt: "Vectorisation"
|
||||
header:
|
||||
overlay_color: "#5DADE2"
|
||||
permalink: /docs/API/vectorisation.html
|
||||
sidebar:
|
||||
nav : docs
|
||||
---
|
||||
|
||||
|
||||
Internally, Grid defines a portable abstraction SIMD vectorisation, via the following types:
|
||||
|
||||
* `vRealF`
|
||||
|
||||
* `vRealD`
|
||||
|
||||
* `vComplexF`
|
||||
|
||||
* `vComplexD`
|
||||
|
||||
These have the usual range of arithmetic operators and functions acting upon them. They do not form
|
||||
part of the API, but are mentioned to (partially) explain the need for controlling the
|
||||
layout transformation in lattice objects.
|
Reference in New Issue
Block a user