Grid/TODO

FUNCTIONALITY:
* Conditional execution, where etc...                -----DONE, simple test
* Integer relational support                         -----DONE
* Coordinate information, integers etc...            -----DONE
* Integer type padding/union to vector.              -----DONE 
* LatticeCoordinate[mu]                              -----DONE
* expose traceIndex, peekIndex, transposeIndex etc at the Lattice Level -- DONE
* TraceColor, TraceSpin.                             ----- DONE (traceIndex<1>,traceIndex<2>, transposeIndex<1>,transposeIndex<2>)
                                                     ----- Implement mapping between traceColour and traceSpin and traceIndex<1/2>.
* How to do U[mu] ... Lorentz part of type structure or not? More like Chroma if not. -- DONE

* subdirs lib, tests ??                              ----- DONE
  - lib/math
  - lib/cartesian
  - lib/cshift
  - lib/stencil
  - lib/communicator
  - lib/algorithms
  - lib/qcd
 future
  - lib/io/   -- GridLog, GridIn, GridErr, GridDebug, GridMessage
  - lib/qcd/actions
  - lib/qcd/measurements


Not done, or just incomplete
* random number generation

* Consider switching from std::vector to boost arrays.
  boost::multi_array<type, 3> A()...    to replace multi1d, multi2d etc..

* How to define simple matrix operations, such as flavour matrices?

* Dirac, Pauli, SU subgroup, etc..

* Gamma/Dirac structures

* Fourspin, two spin project

* su3 exponentiation, log etc.. [Jamie's code?]

* Stencil operator support                           -----Initial thoughts, trial implementation DONE.
                                                     -----Some simple tests that Stencil matches Cshift.
                                                     -----Do all permutes in the comms phase, so that the copy-permute
                                                          cases move into a buffer.
                                                     -----Allow transforms on the in/out buffers (e.g. spin projection).

* CovariantShift support                             -----Use a class to store gauge field? (parallel transport?)

* Subset support, slice sums etc...                  -----Only need slice sum?
                                                     -----Generic cartesian subslicing?
                                                     -----Array ranges / boost extents?
                                                     -----Multigrid grid transferral?
                                                     -----Suggests generalised cartesian subblocking
                                                          sums, returning modified grid?
                                                     -----What should the interface be?

* Grid transferral
  * pickCheckerboard, pickSubPlane, pickSubBlock,
  *                    sumSubPlane, sumSubBlocks
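
A minimal sketch of what a pickCheckerboard operation could look like on a plain
array. It assumes a lexicographic site layout with dimension 0 running fastest and
defines parity as the sum of the coordinates modulo 2. The names siteParity and
pickCheckerboardSketch are hypothetical and are not part of Grid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parity (0 or 1) of the site with lexicographic index idx.
// Assumption: dimension 0 runs fastest.
int siteParity(std::size_t idx, const std::vector<int> &dims) {
  int sum = 0;
  for (int extent : dims) {
    assert(extent > 0);
    sum += static_cast<int>(idx % extent);  // coordinate in this dimension
    idx /= extent;
  }
  return sum % 2;
}

// Copy the sites of one parity out of a full lattice, keeping lexicographic order.
std::vector<double> pickCheckerboardSketch(int parity,
                                           const std::vector<double> &full,
                                           const std::vector<int> &dims) {
  std::vector<double> half;
  half.reserve(full.size() / 2);
  for (std::size_t idx = 0; idx < full.size(); ++idx) {
    if (siteParity(idx, dims) == parity) half.push_back(full[idx]);
  }
  return half;
}
```

On a 4x2 lattice holding the values 0..7, parity 0 selects {0, 2, 5, 7} and
parity 1 selects {1, 3, 4, 6}.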

* rb4d support.

* Check for missing functionality                    - partially audited against QDP++ layout

* Optimise the extract/merge SIMD routines; Azusa??

 - I have collated into single location at least.
 - Need to use _mm_*insert/extract routines.

* Conformable test in Cshift routines.


* Broadcast, reduction tests. innerProduct, localInnerProduct

* QDP++ regression suite and comparative benchmark

* I/O support

* NERSC Lattice loading, plaquette test

  - MPI IO?
  - BinaryWriter, TextWriter etc...
  - protocol buffers?

AUDITS:
// Lattice support audit                 Tested in Grid_main.cc
//
//     -=,+=,*=                           Y
//     add,+,sub,-,mult,mac,*             Y
//     innerProduct,norm2                 Y
//     localInnerProduct,outerProduct,    Y
//     adj,conj                           Y
//     transpose,                         Y
//     trace                              Y
//
//     transposeIndex                     Y
//     traceIndex                         Y
//     peekIndex                          Y
//
//     real,imag                          missing, semantic thought needed on real/im support.
//                                        perhaps I just keep everything complex?
// 

* FIXME audit
* const audit
* Replace vset with a call to merge.
* Take care in Gmerge, Gextract over vset.
* extract / merge extra implementation removal      
* Test infrastructure

[ More on subsets and grid transfers ]
i)  Three classes of subset;   red black parity subsetting (pick checkerboard).
                             cartesian sub-block subsetting
                             rbNd 

ii) Need to be able to project one Grid to another Grid.

Lattice<vobj> coarse_data SubBlockSum (GridBase *CoarseGrid, Lattice<vobj> &fine_data)

Operation must ensure that:
 coarse rd[dim] divides fine_data rd[dim]

This will give a distributed array over MPI ranks in a given dim IF coarse gd != 1 and _processors[d]>1.
A dimension can be *replicated* on all ranks in that dimension. Need a "replicated" option on GridCartesian etc..

This will give "slice" summation and fourier projection assistance.

    Generic concept is to subdivide (based on RD, so it applies to red/black or full).
    Return a type on the SUB-grid from a CellSum over the TOP-grid.
    The SUB-grid need not be distributed but may be replicated in some dims, if that is how the
    cartesian communicator works.
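
The block-sum idea can be sketched on a single node as follows. This is an
illustration under stated assumptions, not the Grid interface: data is a flat
array of doubles in lexicographic order (dimension 0 fastest), every coarse
extent divides the matching fine extent, and the names lexIndex, lexCoor and
subBlockSum are hypothetical. A coarse extent of 1 in a dimension sums over that
whole dimension, which is the "slice" summation case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Coor = std::vector<int>;

// Lexicographic index of coordinate x, dimension 0 fastest (an assumption).
std::size_t lexIndex(const Coor &x, const Coor &dims) {
  std::size_t idx = 0;
  for (int d = static_cast<int>(dims.size()) - 1; d >= 0; --d) {
    idx = idx * dims[d] + x[d];
  }
  return idx;
}

// Inverse of lexIndex.
Coor lexCoor(std::size_t idx, const Coor &dims) {
  Coor x(dims.size());
  for (std::size_t d = 0; d < dims.size(); ++d) {
    x[d] = static_cast<int>(idx % dims[d]);
    idx /= dims[d];
  }
  return x;
}

// Sum every block of fine sites into the coarse site that contains it.
std::vector<double> subBlockSum(const std::vector<double> &fine,
                                const Coor &fineDims, const Coor &coarseDims) {
  assert(fineDims.size() == coarseDims.size());
  Coor block(fineDims.size());
  std::size_t fineVol = 1, coarseVol = 1;
  for (std::size_t d = 0; d < fineDims.size(); ++d) {
    // The coarse extent must divide the fine extent in every dimension.
    assert(coarseDims[d] > 0 && fineDims[d] % coarseDims[d] == 0);
    block[d] = fineDims[d] / coarseDims[d];
    fineVol *= fineDims[d];
    coarseVol *= coarseDims[d];
  }
  assert(fine.size() == fineVol);
  std::vector<double> coarse(coarseVol, 0.0);
  for (std::size_t f = 0; f < fineVol; ++f) {
    Coor x = lexCoor(f, fineDims);
    for (std::size_t d = 0; d < x.size(); ++d) x[d] /= block[d];
    coarse[lexIndex(x, coarseDims)] += fine[f];
  }
  return coarse;
}
```

For a 4x2 fine lattice holding 0..7, coarse dimensions {2,1} give {10, 18}, and
coarse dimensions {1,2} give {6, 22}, the sum over dimension 0 for each value of
dimension 1.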

Instead of subsetting 

iii) No general permutation map.


 ? Cell definition <-> sliceSum.
 ? Replicated arrays.


// Cartesian grid inheritance
//            Grid::GridBase
//                     |
//           __________|___________
//          |                      |
// Grid::GridCartesian   Grid::GridCartesianRedBlack
//
// TODO: document the following as an API guaranteed public interface

    /* 
     *       Rough map of functionality against QDP++ Layout
     *
     *       Param     |     Grid                     |     QDP++             
     *       -----------------------------------------
     *                 |                              |
     *        void     |     oSites, iSites, lSites   |  sitesOnNode 
     *        void     |     gSites                   |  vol
     *                 |                              |
     *        gcoor    |     oIndex, iIndex           |  linearSiteIndex // no virtual node in QDP
     *        lcoor    |                              |
     * 
     *        void     |     CheckerBoarded           |  -        // No checkerboarded in QDP
     *        void     |     FullDimensions           |  lattSize
     *        void     |     GlobalDimensions         |  lattSize // No checkerboarded in QDP
     *        void     |     LocalDimensions          |  subgridLattSize
     *        void     |     VirtualLocalDimensions   |  subgridLattSize // no virtual node in QDP
     *                 |                              |
     *       int x 3   |     oiSiteRankToGlobal       |  siteCoords
     *                 |     ProcessorCoorLocalCoorToGlobalCoor | 
     *                 |                              |
     *     vector<int> |     GlobalCoorToRankIndex    |  nodeNumber(coord)
     *     vector<int> |     GlobalCoorToProcessorCoorLocalCoor|  nodeCoord(coord)
     *                 |                              |
     *     void        |     Processors               |  logicalSize    // returns cart array shape
     *     void        |     ThisRank                 |  nodeNumber();  // returns this node rank
     *     void        |     ThisProcessorCoor        |    // returns this node coor
     *     void        |     isBoss(void)             |  primaryNode();
     *                 |                              |
     *                 |     RankFromProcessorCoor    |  getNodeNumberFrom(logical_coord)
     *                 |     ProcessorCoorFromRank    |  getLogicalCoorFrom(node)
     */
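
The coordinate and rank conversions in the table can be illustrated with a small
sketch. It assumes a block decomposition in which every rank owns localDims[d]
consecutive sites per dimension, and a rank numbering with dimension 0 fastest. A
real cartesian communicator may number ranks differently. The function names
below are lower-case stand-ins and are not the Grid methods.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Coor = std::vector<int>;

// Rank of a processor coordinate. Assumed ordering: dimension 0 fastest.
int rankFromProcessorCoor(const Coor &pcoor, const Coor &processors) {
  assert(pcoor.size() == processors.size());
  int rank = 0;
  for (int d = static_cast<int>(processors.size()) - 1; d >= 0; --d) {
    assert(pcoor[d] >= 0 && pcoor[d] < processors[d]);
    rank = rank * processors[d] + pcoor[d];
  }
  return rank;
}

// Inverse mapping: processor coordinate of a rank.
Coor processorCoorFromRank(int rank, const Coor &processors) {
  Coor pcoor(processors.size());
  for (std::size_t d = 0; d < processors.size(); ++d) {
    pcoor[d] = rank % processors[d];
    rank /= processors[d];
  }
  return pcoor;
}

// Split a global coordinate into the owning processor coordinate and the
// coordinate local to that processor.
void globalCoorToProcessorCoorLocalCoor(const Coor &gcoor, const Coor &localDims,
                                        Coor &pcoor, Coor &lcoor) {
  assert(gcoor.size() == localDims.size());
  pcoor.resize(gcoor.size());
  lcoor.resize(gcoor.size());
  for (std::size_t d = 0; d < gcoor.size(); ++d) {
    pcoor[d] = gcoor[d] / localDims[d];
    lcoor[d] = gcoor[d] % localDims[d];
  }
}
```

With a 2x2 processor grid and 4x4 local volumes, global site (5,2) lives on
processor coordinate (1,0), rank 1 under the assumed ordering, at local
coordinate (1,2).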
  // Work out whether to permute 
  // ABCDEFGH ->   AE BF CG DH       permute              wrap num
  //
  // Shift 0       AE BF CG DH       0 0 0 0    ABCDEFGH   0   0
  // Shift 1       BF CG DH AE       0 0 0 1    BCDEFGHA   0   1
  // Shift 2       CG DH AE BF       0 0 1 1    CDEFGHAB   0   2
  // Shift 3       DH AE BF CG       0 1 1 1    DEFGHABC   0   3
  // Shift 4       AE BF CG DH       1 1 1 1    EFGHABCD   1   0 
  // Shift 5       BF CG DH AE       1 1 1 0    FGHABCDE   1   1 
  // Shift 6       CG DH AE BF       1 1 0 0    GHABCDEF   1   2
  // Shift 7       DH AE BF CG       1 0 0 0    HABCDEFG   1   3

  // Suppose 4way simd in one dim.
  // ABCDEFGH ->   AECG BFDH      permute              wrap num

  // Shift 0       AECG BFDH      0,00 0,00 ABCDEFGH         0     0
  // Shift 1       BFDH CGEA      0,00 1,01 BCDEFGHA         0     1
  // Shift 2       CGEA DHFB      1,01 1,01 CDEFGHAB         1     0
  // Shift 3       DHFB EAGC      1,01 1,11 DEFGHABC         1     1
  // Shift 4       EAGC FBHD      1,11 1,11 EFGHABCD         2     0 
  // Shift 5       FBHD GCAE      1,11 1,10 FGHABCDE         2     1
  // Shift 6       GCAE HDBF      1,10 1,10 GHABCDEF         3     0
  // Shift 7       HDBF AECG      1,10 0,00 HABCDEFG         3     1

  // Generalisation to 8 way simd, 16 way simd required.
  //
  // Need log2 Nway masks, consisting of
  //        1 bit  256 bit granule
  //        2 bits 128 bit granules
  //        4 bits 64  bit granules
  //        8 bits 32  bit granules
  //
  //        15 bits....
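
The 2-way table above can be reproduced with a short sketch. It assumes N sites
in one dimension are split over two SIMD lanes so that site x sits in vector
x % L, lane x / L, with L = N/2 (ABCDEFGH becomes AE BF CG DH). For a cyclic
shift s, output vector o is read from source vector (o+s) % L, and its lanes are
swapped when (o+s) / L is odd. The names ShiftPlan, planShift and
shiftInterleaved are hypothetical. The 4-way case uses a different lane ordering
(AECG) and is not covered here.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Where output vector o comes from, and whether its two lanes must be swapped.
struct ShiftPlan {
  int source;
  bool permute;
};

ShiftPlan planShift(int o, int shift, int L) {
  int t = o + shift;
  return {t % L, ((t / L) % 2) != 0};
}

// Apply a cyclic shift in the interleaved layout and unpack the result.
std::string shiftInterleaved(const std::string &sites, int shift) {
  const int N = static_cast<int>(sites.size());
  assert(N % 2 == 0 && shift >= 0 && shift < N);
  const int L = N / 2;

  // Pack: site x goes to vector x % L, lane x / L.
  std::vector<std::array<char, 2>> vec(L);
  for (int x = 0; x < N; ++x) vec[x % L][x / L] = sites[x];

  // Shift: copy the source vector, permuting lanes where the plan says so.
  std::vector<std::array<char, 2>> out(L);
  for (int o = 0; o < L; ++o) {
    ShiftPlan plan = planShift(o, shift, L);
    out[o] = vec[plan.source];
    if (plan.permute) std::swap(out[o][0], out[o][1]);
  }

  // Unpack back to site order for comparison with a plain cyclic shift.
  std::string result(N, ' ');
  for (int x = 0; x < N; ++x) result[x] = out[x % L][x / L];
  return result;
}
```

For example, shiftInterleaved("ABCDEFGH", 5) returns "FGHABCDE", and the
per-vector permute flags for shift 5 are 1 1 1 0, matching the table row.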
    // TODO
    //
    // Base class to share common code between vRealF, vComplexF etc...
    //
    // Lattice broadcast assignment
    //
    // where() support
    // implement with masks, and/or? Type of the mask & boolean support?
    //
    // Unary functions
    // cos,sin, tan, acos, asin, cosh, acosh, tanh, sinh, // Scalar<vReal> only arg
    // exp, log, sqrt, fabs
    //
    // transposeColor, transposeSpin,
    // adjColor, adjSpin,
    // traceColor, traceSpin.
    // peekColor, peekSpin + pokeColor, pokeSpin
    //
    // copyMask.
    //
    // localMaxAbs
    //
    // norm2,
    // sumMulti equivalent.
    // Fourier transform equivalent.
    //
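
One way the where() item in the list above could work with masks is sketched
below. It assumes the mask is an integer array produced by an elementwise
comparison, in the spirit of the integer relational support listed as DONE. The
names Mask, lessThan and whereSketch are hypothetical and plain std::vector
stands in for a Lattice type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lattice of integers used as a predicate.
using Mask = std::vector<int>;

// Elementwise comparison against a bound, producing a 0/1 mask.
template <typename T>
Mask lessThan(const std::vector<T> &a, T bound) {
  Mask mask(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) mask[i] = (a[i] < bound) ? 1 : 0;
  return mask;
}

// Select iftrue[i] where the mask is non-zero, iffalse[i] elsewhere.
template <typename T>
std::vector<T> whereSketch(const Mask &mask, const std::vector<T> &iftrue,
                           const std::vector<T> &iffalse) {
  assert(mask.size() == iftrue.size() && mask.size() == iffalse.size());
  std::vector<T> result(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i) {
    result[i] = mask[i] ? iftrue[i] : iffalse[i];
  }
  return result;
}
```

With coordinates {0,1,2,3} and the predicate x < 2, selecting between
{10,11,12,13} and {20,21,22,23} gives {10,11,22,23}.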