diff --git a/TODO b/TODO index 46c12838..98dfc3b8 100644 --- a/TODO +++ b/TODO @@ -2,6 +2,10 @@ - use protocol buffers? replace xmlReader/Writer ec.. - Binary use htonll, htonl +* Reduce implementation is poor +* Bug in SeedFixedIntegers gives same output on each site. +* Bug in RNG with complex numbers; only filling real values; need helper function -- DONE + * Stencil operator support -----Initial thoughts, trial implementation DONE. -----some simple tests that Stencil matches Cshift. -----do all permute in comms phase, so that copy permute @@ -11,6 +15,7 @@ * CovariantShift support -----Use a class to store gauge field? (parallel transport?) +* Strong test for norm2, conj and all primitive types. * Consider switch std::vector to boost arrays or something lighter weight boost::multi_array A()... to replace multi1d, multi2d etc.. @@ -33,10 +38,21 @@ * Make the Tensor types and Complex etc... play more nicely. -* TensorRemove is a hack, come up with a long term rationalised approach to Complex vs. Scalar > > + + - TensorRemove is a hack, come up with a long term rationalised approach to Complex vs. Scalar > > QDP forces use of "toDouble" to get back to non tensor scalar. This role is presently taken by TensorRemove, but I want to introduce a syntax that does not require this. + - Reductions that contract indices on a site should always demote the tensor structure. + norm2(), innerProduct. + + - Result of Sum(), SliceSum // spatial sums + trace, traceIndex etc.. do not. + + - problem arises because "trace" returns Lattice moving everything down to Scalar, + and then Sum and SliceSum do not remove the Scalars. This would be fixed if we + template specialize the scalar scalar scalar sum and SliceSum, on the basis of being + pure scalar. * Optimise the extract/merge SIMD routines; Azusa?? - I have collated into single location at least. diff --git a/lib/Grid.h b/lib/Grid.h index 2a666c3b..d402b545 100644 --- a/lib/Grid.h +++ b/lib/Grid.h @@ -11,6 +11,7 @@ #define GRID_H #include + #include #include #include diff --git a/lib/Grid_config.h b/lib/Grid_config.h index 4152540e..f3d7016d 100644 --- a/lib/Grid_config.h +++ b/lib/Grid_config.h @@ -2,7 +2,7 @@ /* lib/Grid_config.h.in. Generated from configure.ac by autoheader. */ /* AVX */ -#define AVX1 1 +/* #undef AVX1 */ /* AVX2 */ /* #undef AVX2 */ @@ -77,7 +77,7 @@ #define PACKAGE_VERSION "1.0" /* SSE4 */ -/* #undef SSE4 */ +#define SSE4 1 /* Define to 1 if you have the ANSI C header files. 
*/ #define STDC_HEADERS 1 diff --git a/lib/Grid_simd.h b/lib/Grid_simd.h index d8788e33..a059cc23 100644 --- a/lib/Grid_simd.h +++ b/lib/Grid_simd.h @@ -26,10 +26,15 @@ namespace Grid { typedef float RealF; typedef double RealD; - +#ifdef GRID_DEFAULT_PRECISION_DOUBLE + typedef RealD Real; +#else + typedef RealF Real; +#endif + typedef std::complex ComplexF; typedef std::complex ComplexD; - + typedef std::complex Complex; inline RealF adj(const RealF & r){ return r; } inline RealF conj(const RealF & r){ return r; } @@ -63,8 +68,8 @@ namespace Grid { //conj already supported for complex inline ComplexF timesI(const ComplexF r) { return(r*ComplexF(0.0,1.0));} - inline ComplexF timesMinusI(const ComplexF r){ return(r*ComplexF(0.0,-1.0));} inline ComplexD timesI(const ComplexD r) { return(r*ComplexD(0.0,1.0));} + inline ComplexF timesMinusI(const ComplexF r){ return(r*ComplexF(0.0,-1.0));} inline ComplexD timesMinusI(const ComplexD r){ return(r*ComplexD(0.0,-1.0));} inline void mac (RealD * __restrict__ y,const RealD * __restrict__ a,const RealD *__restrict__ x){ *y = (*a) * (*x)+(*y);} @@ -280,15 +285,11 @@ namespace Grid { // Default precision #ifdef GRID_DEFAULT_PRECISION_DOUBLE - typedef RealD Real; typedef vRealD vReal; typedef vComplexD vComplex; - typedef std::complex Complex; #else - typedef RealF Real; typedef vRealF vReal; typedef vComplexF vComplex; - typedef std::complex Complex; #endif } #endif diff --git a/lib/Grid_stencil.h b/lib/Grid_stencil.h index 7bafb879..83db0a80 100644 --- a/lib/Grid_stencil.h +++ b/lib/Grid_stencil.h @@ -47,6 +47,101 @@ namespace Grid { int from_rank; } ; + +/////////////////////////////////////////////////////////////////// +// Gather for when there is no need to SIMD split with compression +/////////////////////////////////////////////////////////////////// +template void +Gather_plane_simple (Lattice &rhs,std::vector > &buffer,int dimension,int plane,int cbmask,compressor &compress) +{ + int rd = rhs._grid->_rdimensions[dimension]; + + if ( !rhs._grid->CheckerBoarded(dimension) ) { + + int so = plane*rhs._grid->_ostride[dimension]; // base offset for start of plane + int o = 0; // relative offset to base within plane + int bo = 0; // offset in buffer + + // Simple block stride gather of SIMD objects +#pragma omp parallel for collapse(2) + for(int n=0;n_slice_nblock[dimension];n++){ + for(int b=0;b_slice_block[dimension];b++){ + buffer[bo++]=compress(rhs._odata[so+o+b]); + } + o +=rhs._grid->_slice_stride[dimension]; + } + + } else { + + int so = plane*rhs._grid->_ostride[dimension]; // base offset for start of plane + int o = 0; // relative offset to base within plane + int bo = 0; // offset in buffer + +#pragma omp parallel for collapse(2) + for(int n=0;n_slice_nblock[dimension];n++){ + for(int b=0;b_slice_block[dimension];b++){ + + int ocb=1<CheckerBoardFromOindex(o+b);// Could easily be a table lookup + if ( ocb &cbmask ) { + buffer[bo]=compress(rhs._odata[so+o+b]); + bo++; + } + + } + o +=rhs._grid->_slice_stride[dimension]; + } + } +} + +/////////////////////////////////////////////////////////////////// +// Gather for when there *is* need to SIMD split with compression +/////////////////////////////////////////////////////////////////// +template void + Gather_plane_extract(Lattice &rhs,std::vector pointers,int dimension,int plane,int cbmask,compressor &compress) +{ + int rd = rhs._grid->_rdimensions[dimension]; + + if ( !rhs._grid->CheckerBoarded(dimension) ) { + + int so = plane*rhs._grid->_ostride[dimension]; // base offset for start of plane 
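// Editorial sketch (not part of the patch): extract() fans a single
// SIMD-packed object out into Nsimd scalar-sized destinations, one per
// lane, which is what lets the split/checkerboarded case below ship each
// lane to a different neighbour rank.  Assuming the extract(cobj&,pointers)
// overload used in this routine, with hypothetical lane buffers:
#if 0
  std::vector<scalar_object *> lanes(Nsimd);        // one target per SIMD lane
  for(int l=0;l<Nsimd;l++) lanes[l] = &lane_buf[l]; // lane_buf: hypothetical
  cobj temp = compress(rhs._odata[site]);           // compress first, then split
  extract(temp,lanes);                              // lane l lands in *lanes[l]
#endif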
+ int o = 0; // relative offset to base within plane + int bo = 0; // offset in buffer + + // Simple block stride gather of SIMD objects +#pragma omp parallel for collapse(2) + for(int n=0;n_slice_nblock[dimension];n++){ + for(int b=0;b_slice_block[dimension];b++){ + cobj temp; + temp=compress(rhs._odata[so+o+b]); + extract(temp,pointers); + } + o +=rhs._grid->_slice_stride[dimension]; + } + + } else { + + int so = plane*rhs._grid->_ostride[dimension]; // base offset for start of plane + int o = 0; // relative offset to base within plane + int bo = 0; // offset in buffer + +#pragma omp parallel for collapse(2) + for(int n=0;n_slice_nblock[dimension];n++){ + for(int b=0;b_slice_block[dimension];b++){ + + int ocb=1<CheckerBoardFromOindex(o+b); + if ( ocb & cbmask ) { + cobj temp; + temp =compress(rhs._odata[so+o+b]); + extract(temp,pointers); + } + + } + o +=rhs._grid->_slice_stride[dimension]; + } + } +} + + class CartesianStencil { // Stencil runs along coordinate axes only; NO diagonal fill in. public: @@ -86,8 +181,8 @@ namespace Grid { // Could allow a functional munging of the halo to another type during the comms. // this could implement the 16bit/32bit/64bit compression. - template void HaloExchange(Lattice &source, - std::vector > &u_comm_buf) + template void + HaloExchange(Lattice &source,std::vector > &u_comm_buf,compressor &compress) { // conformable(source._grid,_grid); assert(source._grid==_grid); @@ -95,12 +190,10 @@ namespace Grid { int u_comm_offset=0; // Gather all comms buffers - typedef typename vobj::vector_type vector_type; - typedef typename vobj::scalar_type scalar_type; - for(int point = 0 ; point < _npoints; point++) { - printf("Point %d \n",point);fflush(stdout); + compress.Point(point); + int dimension = _directions[point]; int displacement = _distances[point]; @@ -126,33 +219,30 @@ namespace Grid { sshift[1] = _grid->CheckerBoardShift(_checkerboard,dimension,shift,1); if ( sshift[0] == sshift[1] ) { if (splice_dim) { - printf("splice 0x3 \n");fflush(stdout); - GatherStartCommsSimd(source,dimension,shift,0x3,u_comm_buf,u_comm_offset); + GatherStartCommsSimd(source,dimension,shift,0x3,u_comm_buf,u_comm_offset,compress); } else { - printf("NO splice 0x3 \n");fflush(stdout); - GatherStartComms(source,dimension,shift,0x3,u_comm_buf,u_comm_offset); + GatherStartComms(source,dimension,shift,0x3,u_comm_buf,u_comm_offset,compress); } } else { if(splice_dim){ - printf("splice 0x1,2 \n");fflush(stdout); - GatherStartCommsSimd(source,dimension,shift,0x1,u_comm_buf,u_comm_offset);// if checkerboard is unfavourable take two passes - GatherStartCommsSimd(source,dimension,shift,0x2,u_comm_buf,u_comm_offset);// both with block stride loop iteration + GatherStartCommsSimd(source,dimension,shift,0x1,u_comm_buf,u_comm_offset,compress);// if checkerboard is unfavourable take two passes + GatherStartCommsSimd(source,dimension,shift,0x2,u_comm_buf,u_comm_offset,compress);// both with block stride loop iteration } else { - printf("NO splice 0x1,2 \n");fflush(stdout); - GatherStartComms(source,dimension,shift,0x1,u_comm_buf,u_comm_offset); - GatherStartComms(source,dimension,shift,0x2,u_comm_buf,u_comm_offset); + GatherStartComms(source,dimension,shift,0x1,u_comm_buf,u_comm_offset,compress); + GatherStartComms(source,dimension,shift,0x2,u_comm_buf,u_comm_offset,compress); } } } } } - template void GatherStartComms(Lattice &rhs,int dimension,int shift,int cbmask, - std::vector > &u_comm_buf, - int &u_comm_offset) + template + void GatherStartComms(Lattice &rhs,int dimension,int shift,int 
cbmask, + std::vector > &u_comm_buf, + int &u_comm_offset,compressor & compress) { - typedef typename vobj::vector_type vector_type; - typedef typename vobj::scalar_type scalar_type; + typedef typename cobj::vector_type vector_type; + typedef typename cobj::scalar_type scalar_type; GridBase *grid=_grid; assert(rhs._grid==_grid); @@ -169,31 +259,26 @@ namespace Grid { int buffer_size = _grid->_slice_nblock[dimension]*_grid->_slice_block[dimension]; - std::vector > send_buf(buffer_size); // hmm... - std::vector > recv_buf(buffer_size); + std::vector > send_buf(buffer_size); // hmm... + std::vector > recv_buf(buffer_size); int cb= (cbmask==0x2)? 1 : 0; int sshift= _grid->CheckerBoardShift(rhs.checkerboard,dimension,shift,cb); for(int x=0;x= rd ); int sx = (x+sshift)%rd; int comm_proc = (x+sshift)/rd; if (offnode) { - printf("GatherStartComms offnode x %d\n",x);fflush(stdout); int words = send_buf.size(); if (cbmask != 0x3) words=words>>1; - int bytes = words * sizeof(vobj); + int bytes = words * sizeof(cobj); - printf("Gather_plane_simple dimension %d sx %d cbmask %d\n",dimension,sx,cbmask);fflush(stdout); - Gather_plane_simple (rhs,send_buf,dimension,sx,cbmask); - - printf("GatherStartComms gathered offnode x %d\n",x);fflush(stdout); + Gather_plane_simple (rhs,send_buf,dimension,sx,cbmask,compress); int rank = _grid->_processor; int recv_from_rank; @@ -219,14 +304,14 @@ namespace Grid { } - template + template void GatherStartCommsSimd(Lattice &rhs,int dimension,int shift,int cbmask, - std::vector > &u_comm_buf, - int &u_comm_offset) + std::vector > &u_comm_buf, + int &u_comm_offset,compressor &compress) { const int Nsimd = _grid->Nsimd(); - typedef typename vobj::vector_type vector_type; - typedef typename vobj::scalar_type scalar_type; + typedef typename cobj::vector_type vector_type; + typedef typename cobj::scalar_type scalar_type; int fd = _grid->_fdimensions[dimension]; int rd = _grid->_rdimensions[dimension]; @@ -245,7 +330,7 @@ namespace Grid { // Simd direction uses an extract/merge pair /////////////////////////////////////////////// int buffer_size = _grid->_slice_nblock[dimension]*_grid->_slice_block[dimension]; - int words = sizeof(vobj)/sizeof(vector_type); + int words = sizeof(cobj)/sizeof(vector_type); /* FIXME ALTERNATE BUFFER DETERMINATION */ std::vector > send_buf_extract(Nsimd,std::vector(buffer_size*words) ); @@ -285,7 +370,7 @@ namespace Grid { for(int i=0;i(rhs,pointers,dimension,sx,cbmask,compress); for(int i=0;i - inline auto operator + (const Lattice &lhs,const Lattice &rhs)-> Lattice + inline auto operator + (const Lattice &lhs,const Lattice &rhs)-> Lattice { //NB mult performs conformable check. Do not reapply here for performance. - Lattice ret(rhs._grid); + Lattice ret(rhs._grid); add(ret,lhs,rhs); return ret; } template - inline auto operator - (const Lattice &lhs,const Lattice &rhs)-> Lattice + inline auto operator - (const Lattice &lhs,const Lattice &rhs)-> Lattice { //NB mult performs conformable check. Do not reapply here for performance. 
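// Editorial aside: the point of the decltype return types in this hunk is
// that mixed-type operands deduce their site-level result type, instead of
// both sides being forced to one Lattice<vobj>.  A sketch (operand types
// chosen for illustration only):
#if 0
  Lattice<vComplex>   a(grid);          // complex-singlet sites
  LatticeColourMatrix m(grid);          // colour-matrix sites
  auto r = a + m;   // Lattice<decltype(a_site + m_site)>: colour-matrix sites
#endif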
- Lattice ret(rhs._grid); + Lattice ret(rhs._grid); sub(ret,lhs,rhs); return ret; } @@ -107,9 +107,9 @@ namespace Grid { return ret; } template - inline auto operator + (const left &lhs,const Lattice &rhs) -> Lattice + inline auto operator + (const left &lhs,const Lattice &rhs) -> Lattice { - Lattice ret(rhs._grid); + Lattice ret(rhs._grid); #pragma omp parallel for for(int ss=0;ssoSites(); ss++){ ret._odata[ss]=lhs+rhs._odata[ss]; @@ -117,9 +117,9 @@ namespace Grid { return ret; } template - inline auto operator - (const left &lhs,const Lattice &rhs) -> Lattice + inline auto operator - (const left &lhs,const Lattice &rhs) -> Lattice { - Lattice ret(rhs._grid); + Lattice ret(rhs._grid); #pragma omp parallel for for(int ss=0;ssoSites(); ss++){ ret._odata[ss]=lhs-rhs._odata[ss]; @@ -137,9 +137,9 @@ namespace Grid { return ret; } template - inline auto operator + (const Lattice &lhs,const right &rhs) -> Lattice + inline auto operator + (const Lattice &lhs,const right &rhs) -> Lattice { - Lattice ret(lhs._grid); + Lattice ret(lhs._grid); #pragma omp parallel for for(int ss=0;ssoSites(); ss++){ ret._odata[ss]=lhs._odata[ss]+rhs; @@ -147,9 +147,9 @@ namespace Grid { return ret; } template - inline auto operator - (const Lattice &lhs,const right &rhs) -> Lattice + inline auto operator - (const Lattice &lhs,const right &rhs) -> Lattice { - Lattice ret(lhs._grid); + Lattice ret(lhs._grid); #pragma omp parallel for for(int ss=0;ssoSites(); ss++){ ret._odata[ss]=lhs._odata[ss]-rhs; diff --git a/lib/lattice/Grid_lattice_reduction.h b/lib/lattice/Grid_lattice_reduction.h index bec436f6..4491e8ec 100644 --- a/lib/lattice/Grid_lattice_reduction.h +++ b/lib/lattice/Grid_lattice_reduction.h @@ -14,7 +14,7 @@ namespace Grid { typedef typename vobj::scalar_type scalar; typedef typename vobj::vector_type vector; - decltype(innerProduct(arg._odata[0],arg._odata[0])) vnrm=zero; + decltype(innerProduct(arg._odata[0],arg._odata[0])) vnrm; scalar nrm; //FIXME make this loop parallelisable vnrm=zero; @@ -33,10 +33,11 @@ namespace Grid { //->decltype(innerProduct(left._odata[0],right._odata[0])) { typedef typename vobj::scalar_type scalar; - decltype(innerProduct(left._odata[0],right._odata[0])) vnrm=zero; + decltype(innerProduct(left._odata[0],right._odata[0])) vnrm; scalar nrm; //FIXME make this loop parallelisable + vnrm=zero; for(int ss=0;ssoSites(); ss++){ vnrm = vnrm + innerProduct(left._odata[ss],right._odata[ss]); } @@ -94,8 +95,10 @@ template inline void sliceSum(const Lattice &Data,std::vector< int ld=grid->_ldimensions[orthogdim]; int rd=grid->_rdimensions[orthogdim]; + sobj szero; szero=zero; + std::vector > lvSum(rd); // will locally sum vectors first - std::vector lsSum(ld,sobj(zero)); // sum across these down to scalars + std::vector lsSum(ld,szero); // sum across these down to scalars std::vector extracted(Nsimd); // splitting the SIMD result.resize(fd); // And then global sum to return the same vector to every node for IO to file diff --git a/lib/lattice/Grid_lattice_rng.h b/lib/lattice/Grid_lattice_rng.h index 85d8c471..0842c80c 100644 --- a/lib/lattice/Grid_lattice_rng.h +++ b/lib/lattice/Grid_lattice_rng.h @@ -26,6 +26,8 @@ namespace Grid { } }; + + class GridRNGbase { public: @@ -62,6 +64,21 @@ namespace Grid { } + // real scalars are one component + template void fillScalar(scalar &s,distribution &dist) + { + s=dist(_generators[0]); + } + template void fillScalar(ComplexF &s,distribution &dist) + { + s=ComplexF(dist(_generators[0]),dist(_generators[0])); + } + template void fillScalar(ComplexD 
&s,distribution &dist) + { + s=ComplexD(dist(_generators[0]),dist(_generators[0])); + } + + template inline void fill(sobj &l,distribution &dist){ typedef typename sobj::scalar_type scalar_type; @@ -71,13 +88,60 @@ namespace Grid { scalar_type *buf = (scalar_type *) & l; for(int idx=0;idx inline void fill(ComplexF &l,distribution &dist){ + fillScalar(l,dist); + CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l)); + } + template inline void fill(ComplexD &l,distribution &dist){ + fillScalar(l,dist); + CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l)); + } + template inline void fill(RealF &l,distribution &dist){ + fillScalar(l,dist); + CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l)); + } + template inline void fill(RealD &l,distribution &dist){ + fillScalar(l,dist); + CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l)); + } + // vector fill + template inline void fill(vComplexF &l,distribution &dist){ + RealF *pointer=(RealF *)&l; + for(int i=0;i<2*vComplexF::Nsimd();i++){ + fillScalar(pointer[i],dist); + } + CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l)); + } + template inline void fill(vComplexD &l,distribution &dist){ + RealD *pointer=(RealD *)&l; + for(int i=0;i<2*vComplexD::Nsimd();i++){ + fillScalar(pointer[i],dist); + } + CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l)); + } + template inline void fill(vRealF &l,distribution &dist){ + RealF *pointer=(RealF *)&l; + for(int i=0;i inline void fill(vRealD &l,distribution &dist){ + RealD *pointer=(RealD *)&l; + for(int i=0;i inline void random(GridParallelRNG &rng,Lattice &l){ rng.fill(l,rng._uniform); } diff --git a/lib/math/Grid_math_arith_scalar.h b/lib/math/Grid_math_arith_scalar.h index 2adb2836..69c9a169 100644 --- a/lib/math/Grid_math_arith_scalar.h +++ b/lib/math/Grid_math_arith_scalar.h @@ -11,21 +11,21 @@ namespace Grid { // multiplication by fundamental scalar type template inline iScalar operator * (const iScalar& lhs,const typename iScalar::scalar_type rhs) { - typename iScalar::tensor_reduced srhs(rhs); + typename iScalar::tensor_reduced srhs; srhs=rhs; return lhs*srhs; } template inline iScalar operator * (const typename iScalar::scalar_type lhs,const iScalar& rhs) { return rhs*lhs; } template inline iVector operator * (const iVector& lhs,const typename iScalar::scalar_type rhs) { - typename iVector::tensor_reduced srhs(rhs); + typename iVector::tensor_reduced srhs; srhs=rhs; return lhs*srhs; } template inline iVector operator * (const typename iScalar::scalar_type lhs,const iVector& rhs) { return rhs*lhs; } template inline iMatrix operator * (const iMatrix& lhs,const typename iScalar::scalar_type &rhs) { - typename iMatrix::tensor_reduced srhs(rhs); + typename iMatrix::tensor_reduced srhs; srhs=rhs; return lhs*srhs; } template inline iMatrix operator * (const typename iScalar::scalar_type & lhs,const iMatrix& rhs) { return rhs*lhs; } @@ -35,24 +35,24 @@ template inline iMatrix operator * (const typename iScalar inline iScalar operator * (const iScalar& lhs,double rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t; t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; return lhs*srhs; } template inline iScalar operator * (double lhs,const iScalar& rhs) { return rhs*lhs; } template inline iVector operator * (const iVector& lhs,double rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename 
iScalar::tensor_reduced srhs;srhs=t; return lhs*srhs; } template inline iVector operator * (double lhs,const iVector& rhs) { return rhs*lhs; } template inline iMatrix operator * (const iMatrix& lhs,double rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; return lhs*srhs; } template inline iMatrix operator * (double lhs,const iMatrix& rhs) { return rhs*lhs; } @@ -62,24 +62,26 @@ template inline iMatrix operator * (double lhs,const iMatrix //////////////////////////////////////////////////////////////////// template inline iScalar operator * (const iScalar& lhs,ComplexD rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; + + return lhs*srhs; } template inline iScalar operator * (ComplexD lhs,const iScalar& rhs) { return rhs*lhs; } template inline iVector operator * (const iVector& lhs,ComplexD rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; return lhs*srhs; } template inline iVector operator * (ComplexD lhs,const iVector& rhs) { return rhs*lhs; } template inline iMatrix operator * (const iMatrix& lhs,ComplexD rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; return lhs*srhs; } template inline iMatrix operator * (ComplexD lhs,const iMatrix& rhs) { return rhs*lhs; } @@ -89,24 +91,24 @@ template inline iMatrix operator * (ComplexD lhs,const iMatr //////////////////////////////////////////////////////////////////// template inline iScalar operator * (const iScalar& lhs,Integer rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t; t=rhs; + typename iScalar::tensor_reduced srhs; srhs=t; return lhs*srhs; } template inline iScalar operator * (Integer lhs,const iScalar& rhs) { return rhs*lhs; } template inline iVector operator * (const iVector& lhs,Integer rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; return lhs*srhs; } template inline iVector operator * (Integer lhs,const iVector& rhs) { return rhs*lhs; } template inline iMatrix operator * (const iMatrix& lhs,Integer rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; return lhs*srhs; } template inline iMatrix operator * (Integer lhs,const iMatrix& rhs) { return rhs*lhs; } @@ -118,14 +120,14 @@ template inline iMatrix operator * (Integer lhs,const iMatri /////////////////////////////////////////////////////////////////////////////////////////////// template inline iScalar operator + (const iScalar& lhs,const typename iScalar::scalar_type rhs) { - typename iScalar::tensor_reduced srhs(rhs); + typename iScalar::tensor_reduced srhs; srhs=rhs; return lhs+srhs; } template inline iScalar operator + (const typename iScalar::scalar_type lhs,const iScalar& rhs) { return rhs+lhs; } template inline iMatrix operator + (const iMatrix& lhs,const typename iScalar::scalar_type rhs) { - typename iMatrix::tensor_reduced 
srhs(rhs); + typename iMatrix::tensor_reduced srhs; srhs=rhs; return lhs+srhs; } template inline iMatrix operator + (const typename iScalar::scalar_type lhs,const iMatrix& rhs) { return rhs+lhs; } @@ -135,16 +137,16 @@ template inline iMatrix operator + (const typename iScalar inline iScalar operator + (const iScalar& lhs,double rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t; t=rhs; + typename iScalar::tensor_reduced srhs; srhs=t; return lhs+srhs; } template inline iScalar operator + (double lhs,const iScalar& rhs) { return rhs+lhs; } template inline iMatrix operator + (const iMatrix& lhs,double rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; return lhs+srhs; } template inline iMatrix operator + (double lhs,const iMatrix& rhs) { return rhs+lhs; } @@ -155,8 +157,8 @@ template inline iMatrix operator + (double lhs,const iMatrix template inline iScalar operator + (const iScalar& lhs,Integer rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t; t=rhs; + typename iScalar::tensor_reduced srhs; srhs=t; return lhs+srhs; } @@ -164,8 +166,8 @@ template inline iScalar operator + (Integer lhs,const iScalar& rh template inline iMatrix operator + (const iMatrix& lhs,Integer rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; return lhs+srhs; } template inline iMatrix operator + (Integer lhs,const iMatrix& rhs) { return rhs+lhs; } @@ -176,23 +178,23 @@ template inline iMatrix operator + (Integer lhs,const iMatri /////////////////////////////////////////////////////////////////////////////////////////////// template inline iScalar operator - (const iScalar& lhs,const typename iScalar::scalar_type rhs) { - typename iScalar::tensor_reduced srhs(rhs); + typename iScalar::tensor_reduced srhs; srhs=rhs; return lhs-srhs; } template inline iScalar operator - (const typename iScalar::scalar_type lhs,const iScalar& rhs) { - typename iScalar::tensor_reduced slhs(lhs); + typename iScalar::tensor_reduced slhs;slhs=lhs; return slhs-rhs; } template inline iMatrix operator - (const iMatrix& lhs,const typename iScalar::scalar_type rhs) { - typename iScalar::tensor_reduced srhs(rhs); + typename iScalar::tensor_reduced srhs; srhs=rhs; return lhs-srhs; } template inline iMatrix operator - (const typename iScalar::scalar_type lhs,const iMatrix& rhs) { - typename iScalar::tensor_reduced slhs(lhs); + typename iScalar::tensor_reduced slhs;slhs=lhs; return slhs-rhs; } @@ -201,27 +203,27 @@ template inline iMatrix operator - (const typename iScalar inline iScalar operator - (const iScalar& lhs,double rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t; t=rhs; + typename iScalar::tensor_reduced srhs; srhs=t; return lhs-srhs; } template inline iScalar operator - (double lhs,const iScalar& rhs) { typename iScalar::scalar_type t(lhs); - typename iScalar::tensor_reduced slhs(t); + typename iScalar::tensor_reduced slhs;slhs=t; return slhs-rhs; } template inline iMatrix operator - (const iMatrix& lhs,double rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename 
iScalar::tensor_reduced srhs;srhs=t; return lhs-srhs; } template inline iMatrix operator - (double lhs,const iMatrix& rhs) { typename iScalar::scalar_type t(lhs); - typename iScalar::tensor_reduced slhs(t); + typename iScalar::tensor_reduced slhs;slhs=t; return slhs-rhs; } @@ -230,26 +232,26 @@ template inline iMatrix operator - (double lhs,const iMatrix //////////////////////////////////////////////////////////////////// template inline iScalar operator - (const iScalar& lhs,Integer rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t; t=rhs; + typename iScalar::tensor_reduced srhs; srhs=t; return lhs-srhs; } template inline iScalar operator - (Integer lhs,const iScalar& rhs) { - typename iScalar::scalar_type t(lhs); - typename iScalar::tensor_reduced slhs(t); + typename iScalar::scalar_type t;t=lhs; + typename iScalar::tensor_reduced slhs;slhs=t; return slhs-rhs; } template inline iMatrix operator - (const iMatrix& lhs,Integer rhs) { - typename iScalar::scalar_type t(rhs); - typename iScalar::tensor_reduced srhs(t); + typename iScalar::scalar_type t;t=rhs; + typename iScalar::tensor_reduced srhs;srhs=t; return lhs-srhs; } template inline iMatrix operator - (Integer lhs,const iMatrix& rhs) { - typename iScalar::scalar_type t(lhs); - typename iScalar::tensor_reduced slhs(t); + typename iScalar::scalar_type t;t=lhs; + typename iScalar::tensor_reduced slhs;slhs=t; return slhs-rhs; } diff --git a/lib/math/Grid_math_inner.h b/lib/math/Grid_math_inner.h index 71b5e3b8..1e44032c 100644 --- a/lib/math/Grid_math_inner.h +++ b/lib/math/Grid_math_inner.h @@ -17,7 +17,8 @@ namespace Grid { auto innerProduct (const iVector& lhs,const iVector& rhs) -> iScalar { typedef decltype(innerProduct(lhs._internal[0],rhs._internal[0])) ret_t; - iScalar ret=zero; + iScalar ret; + ret=zero; for(int c1=0;c1& lhs,const iMatrix& rhs) -> iScalar { typedef decltype(innerProduct(lhs._internal[0][0],rhs._internal[0][0])) ret_t; - iScalar ret=zero; + iScalar ret; iScalar tmp; + ret=zero; for(int c1=0;c1 == true" +// because then the standard C++ valarray container eliminates fill overhead on new allocation and +// non-move copying. 
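// (Editorial aside: this is also why the scalar-promotion operators above
//  switch from direct construction to two-phase initialisation,
//      typename iScalar<l>::tensor_reduced srhs; srhs=rhs;
//  -- with TENSOR_IS_POD the user-declared constructors are compiled out.
//  POD-ness can be probed the way tests/Grid_gamma.cc does below, e.g.:)
#if 0
  std::cout << std::is_pod<iScalar<vComplexF> >::value << std::endl;
#endif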
+// +// However note that doing this eliminates some syntactical sugar such as +// calling the constructor explicitly or implicitly +// +#define TENSOR_IS_POD + template class iScalar { public: @@ -25,16 +35,22 @@ public: // Scalar no action // template using tensor_reduce_level = typename iScalar::tensor_reduce_level >; - iScalar(){}; - - iScalar(scalar_type s) : _internal(s) {};// recurse down and hit the constructor for vector_type - - iScalar(const Zero &z){ *this = zero; }; +#ifndef TENSOR_IS_POD + iScalar(){;}; + iScalar(scalar_type s) : _internal(s) {};// recurse down and hit the constructor for vector_type + iScalar(const Zero &z){ *this = zero; }; +#endif iScalar & operator= (const Zero &hero){ - zeroit(*this); - return *this; + zeroit(*this); + return *this; } + iScalar & operator= (const scalar_type s){ + _internal=s; + return *this; + } + + friend void zeroit(iScalar &that){ zeroit(that._internal); } @@ -114,8 +130,10 @@ public: enum { TensorLevel = GridTypeMapper::TensorLevel + 1}; - iVector(const Zero &z){ *this = zero; }; - iVector() {};// Empty constructure +#ifndef TENSOR_IS_POD + iVector(const Zero &z){ *this = zero; }; + iVector() {};// Empty constructure +#endif iVector & operator= (const Zero &hero){ zeroit(*this); @@ -185,8 +203,11 @@ public: enum { TensorLevel = GridTypeMapper::TensorLevel + 1}; +#ifndef TENSOR_IS_POD iMatrix(const Zero &z){ *this = zero; }; iMatrix() {}; +#endif + iMatrix & operator= (const Zero &hero){ zeroit(*this); return *this; diff --git a/lib/qcd/Grid_qcd.h b/lib/qcd/Grid_qcd.h index f331bd88..0bf68632 100644 --- a/lib/qcd/Grid_qcd.h +++ b/lib/qcd/Grid_qcd.h @@ -8,6 +8,7 @@ namespace QCD { static const int Ns=4; static const int Nd=4; static const int Nhs=2; // half spinor + static const int Nds=8; // double stored gauge field static const int CbRed =0; static const int CbBlack=1; @@ -28,79 +29,216 @@ namespace QCD { // // That probably makes for GridRedBlack4dCartesian grid. 
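// Editorial sketch of the nesting convention behind the alias family below
// (outermost index = Lorentz, then spin, then colour), assuming Grid's Nc
// colour constant alongside the Ns/Nd/Nhs/Nds constants above:
#if 0
  template<typename vtype> using iSinglet_sk             = iScalar<iScalar<iScalar<vtype> > >;
  template<typename vtype> using iSpinColourVector_sk    = iScalar<iVector<iVector<vtype,Nc>,Ns> >;
  template<typename vtype> using iLorentzColourMatrix_sk = iVector<iScalar<iMatrix<vtype,Nc> >,Nd >;
#endif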
- template using iSinglet = iScalar > >; - template using iSpinMatrix = iScalar, Ns> >; - template using iSpinColourMatrix = iScalar, Ns> >; - template using iColourMatrix = iScalar > > ; - template using iLorentzColourMatrix = iVector >, Nd > ; + // s,sp,c,spc,lc + template using iSinglet = iScalar > >; + template using iSpinMatrix = iScalar, Ns> >; + template using iColourMatrix = iScalar > > ; + template using iSpinColourMatrix = iScalar, Ns> >; + template using iLorentzColourMatrix = iVector >, Nd > ; + template using iDoubleStoredColourMatrix = iVector >, Nds > ; + template using iSpinVector = iScalar, Ns> >; + template using iColourVector = iScalar > >; + template using iSpinColourVector = iScalar, Ns> >; + template using iHalfSpinVector = iScalar, Nhs> >; + template using iHalfSpinColourVector = iScalar, Nhs> >; + // Spin matrix + typedef iSpinMatrix SpinMatrix; + typedef iSpinMatrix SpinMatrixF; + typedef iSpinMatrix SpinMatrixD; - template using iSpinVector = iScalar, Ns> >; - template using iColourVector = iScalar > >; - template using iSpinColourVector = iScalar, Ns> >; + typedef iSpinMatrix vSpinMatrix; + typedef iSpinMatrix vSpinMatrixF; + typedef iSpinMatrix vSpinMatrixD; - template using iHalfSpinVector = iScalar, Nhs> >; - template using iHalfSpinColourVector = iScalar, Nhs> >; + // Colour Matrix + typedef iColourMatrix ColourMatrix; + typedef iColourMatrix ColourMatrixF; + typedef iColourMatrix ColourMatrixD; - typedef iSpinMatrix SpinMatrix; - typedef iColourMatrix ColourMatrix; - typedef iSpinColourMatrix SpinColourMatrix; - typedef iLorentzColourMatrix LorentzColourMatrix; + typedef iColourMatrix vColourMatrix; + typedef iColourMatrix vColourMatrixF; + typedef iColourMatrix vColourMatrixD; + + // SpinColour matrix + typedef iSpinColourMatrix SpinColourMatrix; + typedef iSpinColourMatrix SpinColourMatrixF; + typedef iSpinColourMatrix SpinColourMatrixD; + + typedef iSpinColourMatrix vSpinColourMatrix; + typedef iSpinColourMatrix vSpinColourMatrixF; + typedef iSpinColourMatrix vSpinColourMatrixD; + + // LorentzColour + typedef iLorentzColourMatrix LorentzColourMatrix; typedef iLorentzColourMatrix LorentzColourMatrixF; typedef iLorentzColourMatrix LorentzColourMatrixD; - typedef iSpinVector SpinVector; - typedef iColourVector ColourVector; - typedef iSpinColourVector SpinColourVector; - typedef iHalfSpinVector HalfSpinVector; - typedef iHalfSpinColourVector HalfSpinColourVector; - - - typedef iSpinMatrix vSpinMatrix; - typedef iColourMatrix vColourMatrix; - typedef iSpinColourMatrix vSpinColourMatrix; typedef iLorentzColourMatrix vLorentzColourMatrix; - + typedef iLorentzColourMatrix vLorentzColourMatrixF; + typedef iLorentzColourMatrix vLorentzColourMatrixD; + + // DoubleStored gauge field + typedef iDoubleStoredColourMatrix DoubleStoredColourMatrix; + typedef iDoubleStoredColourMatrix DoubleStoredColourMatrixF; + typedef iDoubleStoredColourMatrix DoubleStoredColourMatrixD; + + typedef iDoubleStoredColourMatrix vDoubleStoredColourMatrix; + typedef iDoubleStoredColourMatrix vDoubleStoredColourMatrixF; + typedef iDoubleStoredColourMatrix vDoubleStoredColourMatrixD; + + // Spin vector + typedef iSpinVector SpinVector; + typedef iSpinVector SpinVectorF; + typedef iSpinVector SpinVectorD; + typedef iSpinVector vSpinVector; + typedef iSpinVector vSpinVectorF; + typedef iSpinVector vSpinVectorD; + + // Colour vector + typedef iColourVector ColourVector; + typedef iColourVector ColourVectorF; + typedef iColourVector ColourVectorD; + typedef iColourVector vColourVector; + typedef 
iColourVector vColourVectorF; + typedef iColourVector vColourVectorD; + + // SpinColourVector + typedef iSpinColourVector SpinColourVector; + typedef iSpinColourVector SpinColourVectorF; + typedef iSpinColourVector SpinColourVectorD; + typedef iSpinColourVector vSpinColourVector; + typedef iSpinColourVector vSpinColourVectorF; + typedef iSpinColourVector vSpinColourVectorD; + + // HalfSpin vector + typedef iHalfSpinVector HalfSpinVector; + typedef iHalfSpinVector HalfSpinVectorF; + typedef iHalfSpinVector HalfSpinVectorD; + typedef iHalfSpinVector vHalfSpinVector; - typedef iHalfSpinColourVector vHalfSpinColourVector; + typedef iHalfSpinVector vHalfSpinVectorF; + typedef iHalfSpinVector vHalfSpinVectorD; + + // HalfSpinColour vector + typedef iHalfSpinColourVector HalfSpinColourVector; + typedef iHalfSpinColourVector HalfSpinColourVectorF; + typedef iHalfSpinColourVector HalfSpinColourVectorD; + typedef iHalfSpinColourVector vHalfSpinColourVector; + typedef iHalfSpinColourVector vHalfSpinColourVectorF; + typedef iHalfSpinColourVector vHalfSpinColourVectorD; + + // singlets typedef iSinglet TComplex; // FIXME This is painful. Tensor singlet complex type. - typedef iSinglet vTComplex; // what if we don't know the tensor structure + typedef iSinglet TComplexF; // FIXME This is painful. Tensor singlet complex type. + typedef iSinglet TComplexD; // FIXME This is painful. Tensor singlet complex type. + + typedef iSinglet vTComplex ; // what if we don't know the tensor structure + typedef iSinglet vTComplexF; // what if we don't know the tensor structure + typedef iSinglet vTComplexD; // what if we don't know the tensor structure + typedef iSinglet TReal; // Shouldn't need these; can I make it work without? + typedef iSinglet TRealF; // Shouldn't need these; can I make it work without? + typedef iSinglet TRealD; // Shouldn't need these; can I make it work without? 
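// Editorial note: the typedef families here follow one uniform scheme -- no
// suffix binds the default precision, F/D pin single/double, and a leading
// "v" marks the SIMD-vectorised site type.  Sketch:
#if 0
  ColourMatrix  m;   // iColourMatrix<Complex> : scalar site, default precision
  ColourMatrixD md;  // iColourMatrix<ComplexD>: scalar site, double
  vColourMatrix vm;  // iColourMatrix<vComplex>: one SIMD-packed site
#endif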
+ typedef iSinglet vTReal; - typedef iSinglet vTInteger; + typedef iSinglet vTRealF; + typedef iSinglet vTRealD; + + typedef iSinglet vTInteger; typedef iSinglet TInteger; - typedef Lattice LatticeReal; - typedef Lattice LatticeComplex; - typedef Lattice LatticeInteger; // Predicates for "where" - - typedef Lattice LatticeColourMatrix; - typedef Lattice LatticeSpinMatrix; - typedef Lattice LatticeSpinColourMatrix; - typedef Lattice LatticeSpinVector; - typedef Lattice LatticeColourVector; - typedef Lattice LatticeSpinColourVector; - typedef Lattice LatticeHalfSpinVector; - typedef Lattice LatticeHalfSpinColourVector; + // Lattices of these + typedef Lattice LatticeColourMatrix; + typedef Lattice LatticeColourMatrixF; + typedef Lattice LatticeColourMatrixD; + + typedef Lattice LatticeSpinMatrix; + typedef Lattice LatticeSpinMatrixF; + typedef Lattice LatticeSpinMatrixD; + + typedef Lattice LatticeSpinColourMatrix; + typedef Lattice LatticeSpinColourMatrixF; + typedef Lattice LatticeSpinColourMatrixD; + + + typedef Lattice LatticeLorentzColourMatrix; + typedef Lattice LatticeLorentzColourMatrixF; + typedef Lattice LatticeLorentzColourMatrixD; + + // DoubleStored gauge field + typedef Lattice LatticeDoubleStoredColourMatrix; + typedef Lattice LatticeDoubleStoredColourMatrixF; + typedef Lattice LatticeDoubleStoredColourMatrixD; + + typedef Lattice LatticeSpinVector; + typedef Lattice LatticeSpinVectorF; + typedef Lattice LatticeSpinVectorD; + + typedef Lattice LatticeColourVector; + typedef Lattice LatticeColourVectorF; + typedef Lattice LatticeColourVectorD; + + typedef Lattice LatticeSpinColourVector; + typedef Lattice LatticeSpinColourVectorF; + typedef Lattice LatticeSpinColourVectorD; + + typedef Lattice LatticeHalfSpinVector; + typedef Lattice LatticeHalfSpinVectorF; + typedef Lattice LatticeHalfSpinVectorD; + + typedef Lattice LatticeHalfSpinColourVector; + typedef Lattice LatticeHalfSpinColourVectorF; + typedef Lattice LatticeHalfSpinColourVectorD; + + typedef Lattice LatticeReal; + typedef Lattice LatticeRealF; + typedef Lattice LatticeRealD; + + typedef Lattice LatticeComplex; + typedef Lattice LatticeComplexF; + typedef Lattice LatticeComplexD; + + typedef Lattice LatticeInteger; // Predicates for "where" + /////////////////////////////////////////// // Physical names for things /////////////////////////////////////////// - typedef Lattice LatticeHalfFermion; - typedef Lattice LatticeFermion; + typedef LatticeHalfSpinColourVector LatticeHalfFermion; + typedef LatticeHalfSpinColourVectorF LatticeHalfFermionF; + typedef LatticeHalfSpinColourVectorF LatticeHalfFermionD; - typedef Lattice LatticePropagator; - typedef Lattice LatticeGaugeField; + typedef LatticeSpinColourVector LatticeFermion; + typedef LatticeSpinColourVectorF LatticeFermionF; + typedef LatticeSpinColourVectorD LatticeFermionD; + + typedef LatticeSpinColourMatrix LatticePropagator; + typedef LatticeSpinColourMatrixF LatticePropagatorF; + typedef LatticeSpinColourMatrixD LatticePropagatorD; + + typedef LatticeLorentzColourMatrix LatticeGaugeField; + typedef LatticeLorentzColourMatrixF LatticeGaugeFieldF; + typedef LatticeLorentzColourMatrixD LatticeGaugeFieldD; + + typedef LatticeDoubleStoredColourMatrix LatticeDoubledGaugeField; + typedef LatticeDoubleStoredColourMatrixF LatticeDoubledGaugeFieldF; + typedef LatticeDoubleStoredColourMatrixD LatticeDoubledGaugeFieldD; // Uhgg... 
typing this hurt ;) // (my keyboard got burning hot when I typed this, must be the anti-Fermion) - typedef Lattice LatticeStaggeredFermion; - typedef Lattice LatticeStaggeredPropagator; + typedef Lattice LatticeStaggeredFermion; + typedef Lattice LatticeStaggeredFermionF; + typedef Lattice LatticeStaggeredFermionD; + + typedef Lattice LatticeStaggeredPropagator; + typedef Lattice LatticeStaggeredPropagatorF; + typedef Lattice LatticeStaggeredPropagatorD; ////////////////////////////////////////////////////////////////////////////// // Peek and Poke named after physics attributes @@ -157,11 +295,14 @@ namespace QCD { return peekIndex(rhs,i,j); } + // FIXME transpose Colour, transpose Spin, traceColour traceSpin + } //namespace QCD } // Grid #include #include //#include +#include #endif diff --git a/lib/qcd/Grid_qcd_dirac.h b/lib/qcd/Grid_qcd_dirac.h index 03dc9cc5..2623e683 100644 --- a/lib/qcd/Grid_qcd_dirac.h +++ b/lib/qcd/Grid_qcd_dirac.h @@ -17,8 +17,14 @@ namespace QCD { GammaZ, GammaT, Gamma5, - // GammaXGamma5, - // GammaYGamma5, + MinusIdentity, + MinusGammaX, + MinusGammaY, + MinusGammaZ, + MinusGammaT, + MinusGamma5 + // GammaXGamma5, // Rest are composite (willing to take hit for two calls sequentially) + // GammaYGamma5, // as they are less commonly used. // GammaZGamma5, // GammaTGamma5, // SigmaXY, @@ -27,12 +33,6 @@ namespace QCD { // SigmaXT, // SigmaYT, // SigmaZT, - MinusIdentity, - MinusGammaX, - MinusGammaY, - MinusGammaZ, - MinusGammaT, - MinusGamma5 // MinusGammaXGamma5, easiest to form by composition // MinusGammaYGamma5, as performance is not critical for these // MinusGammaZGamma5, @@ -54,7 +54,6 @@ namespace QCD { }; - /* Gx * 0 0 0 i * 0 0 i 0 diff --git a/lib/qcd/Grid_qcd_wilson_dop.cc b/lib/qcd/Grid_qcd_wilson_dop.cc index 73b19c12..aba21417 100644 --- a/lib/qcd/Grid_qcd_wilson_dop.cc +++ b/lib/qcd/Grid_qcd_wilson_dop.cc @@ -1,157 +1,220 @@ -#ifnfdef GRID_QCD_WILSON_DOP_H -#define GRID_QCD_WILSON_DOP_H #include namespace Grid { namespace QCD { - const std::vector WilsonMatrix::directions ({0,1,2,3, 0, 1, 2, 3,0}); const std::vector WilsonMatrix::displacements({1,1,1,1,-1,-1,-1,-1,0}); // Should be in header? 
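// Editorial note: HaloExchange and the Gather_plane_* routines are now
// parameterised on a "compressor" policy: anything exposing Point(int dir)
// plus a call operator mapping the source site object to the (possibly
// smaller) comms object.  The identity version used by tests/Grid_stencil.cc
// looks like this; WilsonCompressor below instead spin-projects a full
// spinor down to a half spinor, halving the comms volume:
#if 0
template<class vobj> class SimpleCompressor {
public:
  void Point(int) {};                          // direction hint unused
  vobj operator() (vobj &arg) { return arg; }  // pass-through: no compression
};
#endif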
-static const int WilsonMatrix::Xp = 0; -static const int WilsonMatrix::Yp = 1; -static const int WilsonMatrix::Zp = 2; -static const int WilsonMatrix::Tp = 3; -static const int WilsonMatrix::Xm = 4; -static const int WilsonMatrix::Ym = 5; -static const int WilsonMatrix::Zm = 6; -static const int WilsonMatrix::Tm = 7; -static const int WilsonMatrix::X0 = 8; -static const int WilsonMatrix::npoint=9; +const int WilsonMatrix::Xp = 0; +const int WilsonMatrix::Yp = 1; +const int WilsonMatrix::Zp = 2; +const int WilsonMatrix::Tp = 3; +const int WilsonMatrix::Xm = 4; +const int WilsonMatrix::Ym = 5; +const int WilsonMatrix::Zm = 6; +const int WilsonMatrix::Tm = 7; + //const int WilsonMatrix::X0 = 8; + class WilsonCompressor { + public: + int mu; + + void Point(int p) { mu=p;}; + vHalfSpinColourVector operator () (vSpinColourVector &in) + { + vHalfSpinColourVector ret; + switch(mu) { + case WilsonMatrix::Xp: + spProjXp(ret,in); + break; + case WilsonMatrix::Yp: + spProjYp(ret,in); + break; + case WilsonMatrix::Zp: + spProjZp(ret,in); + break; + case WilsonMatrix::Tp: + spProjTp(ret,in); + break; + case WilsonMatrix::Xm: + spProjXm(ret,in); + break; + case WilsonMatrix::Ym: + spProjYm(ret,in); + break; + case WilsonMatrix::Zm: + spProjZm(ret,in); + break; + case WilsonMatrix::Tm: + spProjTm(ret,in); + break; + default: + assert(0); + break; + } + return ret; + } + }; -WilsonMatrix::WilsonMatrix(LatticeGaugeField &_Umu,int _mass) - : Stencil((&Umu._grid,npoint,0,directions,displacements), + WilsonMatrix::WilsonMatrix(LatticeGaugeField &_Umu,double _mass) + : Stencil(Umu._grid,npoint,0,directions,displacements), mass(_mass), - Umu(_Umu) + Umu(_Umu._grid) { // Allocate the required comms buffer grid = _Umu._grid; comm_buf.resize(Stencil._unified_buffer_size); + DoubleStore(Umu,_Umu); } + +void WilsonMatrix::DoubleStore(LatticeDoubledGaugeField &Uds,const LatticeGaugeField &Umu) +{ + LatticeColourMatrix U(grid); + + for(int mu=0;mu<Nd;mu++){ + U = peekIndex<LorentzIndex>(Umu,mu); + pokeIndex<LorentzIndex>(Uds,U,mu); + U = adj(Cshift(U,mu,-1)); + pokeIndex<LorentzIndex>(Uds,U,mu+4); + } +} + void WilsonMatrix::multiply(const LatticeFermion &in, LatticeFermion &out) { - + Dhop(in,out); + return; } + void WilsonMatrix::Dhop(const LatticeFermion &in, LatticeFermion &out) { - Stencil.HaloExchange(in,comm_buf); + // Stencil.HaloExchange(in,comm_buf); - for(int ss=0;ss<_grid->oSites();ss++){ + for(int ss=0;ss<grid->oSites();ss++){ int offset,local; vSpinColourVector result; - vHalfSpinColourVector UChi; - + vHalfSpinColourVector chi; + vHalfSpinColourVector Uchi; + vHalfSpinColourVector *chi_p; // Xp offset = Stencil._offsets [Xp][ss]; local = Stencil._is_local[Xp][ss]; - if ( local ) { - Uchi = U[]*spProjXp(in._odata[offset]); - } else { - Uchi = U[]*comm_buf._odata[offset] - } - result = ReconXp(Uchi); + chi_p = &comm_buf[offset]; + if ( local ) { + spProjXp(chi,in._odata[offset]); + chi_p = &chi; + } + mult(&(Uchi()),&(Umu._odata[ss](Xp)),&(*chi_p)()); + spReconXp(result,Uchi); + +#if 0 // Yp offset = Stencil._offsets [Yp][ss]; local = Stencil._is_local[Yp][ss]; + chi_p = &comm_buf[offset]; if ( local ) { - Uchi = U[]*spProjYp(in._odata[offset]); - } else { - Uchi = U[]*comm_buf._odata[offset] - } - result+= ReconYp(Uchi); + spProjYp(chi,in._odata[offset]); + chi_p = &chi; + } + mult(&(Uchi()),&(Umu._odata[ss](Yp)),&(*chi_p)()); + accumReconYp(result,Uchi); // Zp offset = Stencil._offsets [Zp][ss]; local = Stencil._is_local[Zp][ss]; + chi_p = &comm_buf[offset]; if ( local ) { - Uchi = U[]*spProjZp(in._odata[offset]); - } else { - Uchi = U[]*comm_buf._odata[offset] - } - result+= ReconZp(Uchi); + spProjZp(chi,in._odata[offset]); + chi_p = &chi; + } + mult(&(Uchi()),&(Umu._odata[ss](Zp)),&(*chi_p)() ); + accumReconZp(result,Uchi); // Tp offset = Stencil._offsets [Tp][ss]; local = Stencil._is_local[Tp][ss]; + chi_p = &comm_buf[offset]; if ( local ) { - Uchi = U[]*spProjTp(in._odata[offset]); - } else { - Uchi = U[]*comm_buf._odata[offset] - } - result+= ReconTp(Uchi); + spProjTp(chi,in._odata[offset]); + chi_p = &chi; + } + mult(&(Uchi()),&(Umu._odata[ss](Tp)),&(*chi_p)()); + accumReconTp(result,Uchi); // Xm offset = Stencil._offsets [Xm][ss]; local = Stencil._is_local[Xm][ss]; + chi_p = &comm_buf[offset]; if ( local ) { - Uchi = U[]*spProjXm(in._odata[offset]); - } else { - Uchi = U[]*comm_buf._odata[offset] - } - result+= ReconXm(Uchi); + spProjXm(chi,in._odata[offset]); + chi_p = &chi; + } + mult(&(Uchi()),&(Umu._odata[ss](Xm)),&(*chi_p)()); + accumReconXm(result,Uchi); // Ym offset = Stencil._offsets [Ym][ss]; local = Stencil._is_local[Ym][ss]; + chi_p = &comm_buf[offset]; if ( local ) { - Uchi = U[]*spProjYm(in._odata[offset]); - } else { - Uchi = U[]*comm_buf._odata[offset] - } - result+= ReconYm(Uchi); + spProjYm(chi,in._odata[offset]); + chi_p = &chi; + } + mult(&(Uchi()),&(Umu._odata[ss](Ym)),&(*chi_p)()); + accumReconYm(result,Uchi); // Zm offset = Stencil._offsets [Zm][ss]; local = Stencil._is_local[Zm][ss]; + chi_p = &comm_buf[offset]; if ( local ) { - Uchi = U[]*spProjZm(in._odata[offset]); - } else { - Uchi = U[]*comm_buf._odata[offset] - } - result+= ReconZm(Uchi); + spProjZm(chi,in._odata[offset]); + chi_p = &chi; + } + mult(&(Uchi()),&(Umu._odata[ss](Zm)),&(*chi_p)()); + accumReconZm(result,Uchi); // Tm offset = Stencil._offsets [Tm][ss]; local = Stencil._is_local[Tm][ss]; + chi_p = &comm_buf[offset]; if ( local ) { - Uchi = U[]*spProjTm(in._odata[offset]); - } else { - Uchi = U[]*comm_buf._odata[offset] - } - result+= ReconTm(Uchi); - + spProjTm(chi,in._odata[offset]); + chi_p = &chi; + } + mult(&(Uchi()),&(Umu._odata[ss](Tm)),&(*chi_p)()); + accumReconTm(result,Uchi); +#endif out._odata[ss] = result; } } void WilsonMatrix::Dw(const LatticeFermion &in, LatticeFermion &out) { - + return; } void WilsonMatrix::MpcDag (const LatticeFermion &in, LatticeFermion &out) { - + return; } void WilsonMatrix::Mpc (const LatticeFermion &in, LatticeFermion &out) { - + return; } void WilsonMatrix::MpcDagMpc(const LatticeFermion &in, LatticeFermion &out) { - + return; } void WilsonMatrix::MDagM (const LatticeFermion &in, LatticeFermion &out) { - + return; } }} -#endif + diff --git a/lib/qcd/Grid_qcd_wilson_dop.h b/lib/qcd/Grid_qcd_wilson_dop.h index f6775842..dc3c80b9 100644 --- a/lib/qcd/Grid_qcd_wilson_dop.h +++ b/lib/qcd/Grid_qcd_wilson_dop.h @@ -1,4 +1,4 @@ -#ifnfdef GRID_QCD_WILSON_DOP_H +#ifndef GRID_QCD_WILSON_DOP_H #define GRID_QCD_WILSON_DOP_H #include <Grid.h> @@ -21,21 +21,23 @@ namespace Grid { GridBase *grid; // Copy of the gauge field - LatticeGaugeField Umu; + LatticeDoubledGaugeField Umu; //Defines the stencil CartesianStencil Stencil; static const int npoint=9; static const std::vector<int> directions ; static const std::vector<int> displacements; - static const int Xp,Xm,Yp,Ym,Zp,Zm,Tp,Tm; // Comms buffer - std::vector > comm_buf; + std::vector<vHalfSpinColourVector,alignedAllocator<vHalfSpinColourVector> > comm_buf; // Constructor - WilsonMatrix(LatticeGaugeField &Umu,int mass); + WilsonMatrix(LatticeGaugeField &Umu,double mass); + + // DoubleStore + void DoubleStore(LatticeDoubledGaugeField &Uds,const LatticeGaugeField &Umu); // override multiply void multiply(const LatticeFermion &in, LatticeFermion &out); diff --git a/lib/simd/Grid_vComplexD.h 
b/lib/simd/Grid_vComplexD.h index 402d303c..76295fdf 100644 --- a/lib/simd/Grid_vComplexD.h +++ b/lib/simd/Grid_vComplexD.h @@ -251,13 +251,13 @@ friend inline void vstore(const vComplexD &ret, ComplexD *a){ friend inline vComplexD conj(const vComplexD &in){ vComplexD ret ; vzero(ret); #if defined (AVX1)|| defined (AVX2) - // addsubps 0, inv=>0+in.v[3] 0-in.v[2], 0+in.v[1], 0-in.v[0], ... - // __m256d tmp = _mm256_addsub_pd(ret.v,_mm256_shuffle_pd(in.v,in.v,0x5)); - // ret.v=_mm256_shuffle_pd(tmp,tmp,0x5); - ret.v = _mm256_addsub_pd(ret.v,in.v); + // addsubps 0, inv=>0+in.v[3] 0-in.v[2], 0+in.v[1], 0-in.v[0], ... + zvec tmp = _mm256_addsub_pd(ret.v,_mm256_shuffle_pd(in.v,in.v,0x5)); + ret.v =_mm256_shuffle_pd(tmp,tmp,0x5); #endif #ifdef SSE4 - ret.v = _mm_addsub_pd(ret.v,in.v); + zvec tmp = _mm_addsub_pd(ret.v,_mm_shuffle_pd(in.v,in.v,0x1)); + ret.v = _mm_shuffle_pd(tmp,tmp,0x1); #endif #ifdef AVX512 ret.v = _mm512_mask_sub_pd(in.v, 0xaaaa,ret.v, in.v); @@ -268,48 +268,41 @@ friend inline void vstore(const vComplexD &ret, ComplexD *a){ return ret; } - friend inline vComplexD timesI(const vComplexD &in){ + friend inline vComplexD timesMinusI(const vComplexD &in){ vComplexD ret; vzero(ret); + vComplexD tmp; #if defined (AVX1)|| defined (AVX2) - cvec tmp =_mm256_addsub_ps(ret.v,in.v); // r,-i - /* - IF IMM0[0] = 0 - THEN DEST[63:0]=SRC1[63:0] ELSE DEST[63:0]=SRC1[127:64] FI; - IF IMM0[1] = 0 - THEN DEST[127:64]=SRC2[63:0] ELSE DEST[127:64]=SRC2[127:64] FI; - IF IMM0[2] = 0 - THEN DEST[191:128]=SRC1[191:128] ELSE DEST[191:128]=SRC1[255:192] FI; - IF IMM0[3] = 0 - THEN DEST[255:192]=SRC2[191:128] ELSE DEST[255:192]=SRC2[255:192] FI; - */ - ret.v =_mm256_shuffle_ps(tmp,tmp,0x5); + tmp.v =_mm256_addsub_pd(ret.v,in.v); // r,-i + ret.v =_mm256_shuffle_pd(tmp.v,tmp.v,0x5); #endif #ifdef SSE4 - cvec tmp =_mm_addsub_ps(ret.v,in.v); // r,-i - ret.v =_mm_shuffle_ps(tmp,tmp,0x5); + tmp.v =_mm_addsub_pd(ret.v,in.v); // r,-i + ret.v =_mm_shuffle_pd(tmp.v,tmp.v,0x1); #endif #ifdef AVX512 - ret.v = _mm512_mask_sub_ps(in.v,0xaaaa,ret.v,in.v); // real -imag - ret.v = _mm512_swizzle_ps(ret.v, _MM_SWIZ_REG_CDAB);// OK + ret.v = _mm512_mask_sub_pd(in.v,0xaaaa,ret.v,in.v); // real -imag + ret.v = _mm512_swizzle_pd(ret.v, _MM_SWIZ_REG_CDAB);// OK #endif #ifdef QPX assert(0); #endif return ret; } - friend inline vComplexD timesMinusI(const vComplexD &in){ + + friend inline vComplexD timesI(const vComplexD &in){ vComplexD ret; vzero(ret); + vComplexD tmp; #if defined (AVX1)|| defined (AVX2) - cvec tmp =_mm256_shuffle_ps(in.v,in.v,0x5); - ret.v =_mm256_addsub_ps(ret.v,tmp); // i,-r + tmp.v =_mm256_shuffle_pd(in.v,in.v,0x5); + ret.v =_mm256_addsub_pd(ret.v,tmp.v); // i,-r #endif #ifdef SSE4 - cvec tmp =_mm_shuffle_ps(in.v,in.v,0x5); - ret.v =_mm_addsub_ps(ret.v,tmp); // r,-i + tmp.v =_mm_shuffle_pd(in.v,in.v,0x1); + ret.v =_mm_addsub_pd(ret.v,tmp.v); // r,-i #endif #ifdef AVX512 - cvec tmp = _mm512_swizzle_ps(in.v, _MM_SWIZ_REG_CDAB);// OK - ret.v = _mm512_mask_sub_ps(tmp,0xaaaa,ret.v,tmp); // real -imag + tmp.v = _mm512_swizzle_pd(in.v, _MM_SWIZ_REG_CDAB);// OK + ret.v = _mm512_mask_sub_pd(tmp.v,0xaaaa,ret.v,tmp.v); // real -imag #endif #ifdef QPX assert(0); diff --git a/lib/simd/Grid_vComplexF.h b/lib/simd/Grid_vComplexF.h index 4c31ad04..fb9c5a67 100644 --- a/lib/simd/Grid_vComplexF.h +++ b/lib/simd/Grid_vComplexF.h @@ -214,10 +214,10 @@ friend inline void vstore(const vComplexF &ret, ComplexF *a){ { #ifdef SSE4 union { - __m128 v1; // SSE 4 x float vector + cvec v1; // SSE 4 x float vector float f[4]; // scalar 
array of 4 floats } u128; - u128.v1= _mm_add_ps(v, _mm_shuffle_ps(v, v, 0b01001110)); // FIXME Prefer to use _MM_SHUFFLE macros + u128.v1= _mm_add_ps(in.v, _mm_shuffle_ps(in.v,in.v, 0b01001110)); // FIXME Prefer to use _MM_SHUFFLE macros return ComplexF(u128.f[0], u128.f[1]); #endif #ifdef AVX1 @@ -329,13 +329,15 @@ friend inline void vstore(const vComplexF &ret, ComplexF *a){ friend inline vComplexF conj(const vComplexF &in){ vComplexF ret ; vzero(ret); #if defined (AVX1)|| defined (AVX2) - // cvec tmp; - // tmp = _mm256_addsub_ps(ret.v,_mm256_shuffle_ps(in.v,in.v,_MM_SHUFFLE(2,3,0,1))); // ymm1 <- br,bi - // ret.v=_mm256_shuffle_ps(tmp,tmp,_MM_SHUFFLE(2,3,0,1)); - ret.v = _mm256_addsub_ps(ret.v,in.v); + cvec tmp; + tmp = _mm256_addsub_ps(ret.v,_mm256_shuffle_ps(in.v,in.v,_MM_SHUFFLE(2,3,0,1))); // ymm1 <- br,bi + ret.v=_mm256_shuffle_ps(tmp,tmp,_MM_SHUFFLE(2,3,0,1)); + #endif #ifdef SSE4 - ret.v = _mm_addsub_ps(ret.v,in.v); + cvec tmp; + tmp = _mm_addsub_ps(ret.v,_mm_shuffle_ps(in.v,in.v,_MM_SHUFFLE(2,3,0,1))); // ymm1 <- br,bi + ret.v=_mm_shuffle_ps(tmp,tmp,_MM_SHUFFLE(2,3,0,1)); #endif #ifdef AVX512 ret.v = _mm512_mask_sub_ps(in.v,0xaaaa,ret.v,in.v); // Zero out 0+real 0-imag @@ -345,15 +347,16 @@ friend inline void vstore(const vComplexF &ret, ComplexF *a){ #endif return ret; } - friend inline vComplexF timesI(const vComplexF &in){ - vComplexF ret; vzero(ret); + friend inline vComplexF timesMinusI(const vComplexF &in){ + vComplexF ret; + vzero(ret); #if defined (AVX1)|| defined (AVX2) cvec tmp =_mm256_addsub_ps(ret.v,in.v); // r,-i - ret.v = _mm256_shuffle_ps(tmp,tmp,0x5); + ret.v = _mm256_shuffle_ps(tmp,tmp,_MM_SHUFFLE(2,3,0,1)); //-i,r #endif #ifdef SSE4 cvec tmp =_mm_addsub_ps(ret.v,in.v); // r,-i - ret.v = _mm_shuffle_ps(tmp,tmp,0x5); + ret.v = _mm_shuffle_ps(tmp,tmp,_MM_SHUFFLE(2,3,0,1)); #endif #ifdef AVX512 ret.v = _mm512_mask_sub_ps(in.v,0xaaaa,ret.v,in.v); // real -imag @@ -364,14 +367,14 @@ friend inline void vstore(const vComplexF &ret, ComplexF *a){ #endif return ret; } - friend inline vComplexF timesMinusI(const vComplexF &in){ + friend inline vComplexF timesI(const vComplexF &in){ vComplexF ret; vzero(ret); #if defined (AVX1)|| defined (AVX2) - cvec tmp =_mm256_shuffle_ps(in.v,in.v,0x5); - ret.v = _mm256_addsub_ps(ret.v,tmp); // i,-r + cvec tmp =_mm256_shuffle_ps(in.v,in.v,_MM_SHUFFLE(2,3,0,1));//i,r + ret.v =_mm256_addsub_ps(ret.v,tmp); //i,-r #endif #ifdef SSE4 - cvec tmp =_mm_shuffle_ps(in.v,in.v,0x5); + cvec tmp =_mm_shuffle_ps(in.v,in.v,_MM_SHUFFLE(2,3,0,1)); ret.v = _mm_addsub_ps(ret.v,tmp); // r,-i #endif #ifdef AVX512 @@ -443,5 +446,8 @@ friend inline void vstore(const vComplexF &ret, ComplexF *a){ inline vComplexF trace(const vComplexF &arg){ return arg; } + + } + #endif diff --git a/lib/simd/Grid_vRealD.h b/lib/simd/Grid_vRealD.h index ab848916..f6d68963 100644 --- a/lib/simd/Grid_vRealD.h +++ b/lib/simd/Grid_vRealD.h @@ -146,7 +146,7 @@ namespace Grid { ret.v = _mm256_set_pd(a[3],a[2],a[1],a[0]); #endif #ifdef SSE4 - ret.v = _mm_set_pd(a[0],a[1]); + ret.v = _mm_set_pd(a[1],a[0]); #endif #ifdef AVX512 ret.v = _mm512_set_pd(a[7],a[6],a[5],a[4],a[3],a[2],a[1],a[0]); @@ -186,6 +186,15 @@ namespace Grid { friend inline RealD Reduce(const vRealD & in) { +#if defined (SSE4) + // FIXME Hack + const RealD * ptr =(const RealD *) ∈ + RealD ret = 0; + for(int i=0;i_simd_layout[dimension]; int comm_dim = _grid->_processors[dimension] >1 ; - assert(simd_layout==1); + // assert(simd_layout==1); // Why? 
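// Editorial sketch of the SSE4 Reduce() fallback introduced above: with no
// horizontal-add instruction to hand, reinterpret the register as scalars
// and sum the lanes (vRealD::Nsimd()==2 doubles per SSE register):
#if 0
  friend inline RealD Reduce(const vRealD &in){
    const RealD *ptr = (const RealD *)&in;   // view lanes as scalars
    RealD ret = 0;
    for(int i=0;i<vRealD::Nsimd();i++) ret += ptr[i];
    return ret;
  }
#endif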
assert(comm_dim==1); + shift = (shift + fd) %fd; assert(shift>=0); assert(shift +struct scal { + d internal; +}; + int main (int argc, char ** argv) { @@ -22,22 +27,33 @@ int main (int argc, char ** argv) GridSerialRNG sRNG; sRNG.SeedRandomDevice(); - SpinMatrix ident=zero; + SpinMatrix ident; ident=zero; SpinMatrix rnd ; random(sRNG,rnd); - SpinMatrix ll=zero; - SpinMatrix rr=zero; + SpinMatrix ll; ll=zero; + SpinMatrix rr; rr=zero; SpinMatrix result; SpinVector lv; random(sRNG,lv); SpinVector rv; random(sRNG,rv); + std::cout << " Is pod " << std::is_pod::value << std::endl; + std::cout << " Is pod double " << std::is_pod::value << std::endl; + std::cout << " Is pod ComplexF " << std::is_pod::value << std::endl; + std::cout << " Is pod scal " << std::is_pod >::value << std::endl; + std::cout << " Is pod Scalar " << std::is_pod >::value << std::endl; + std::cout << " Is pod Scalar " << std::is_pod >::value << std::endl; + std::cout << " Is pod Scalar " << std::is_pod >::value << std::endl; + std::cout << " Is pod Scalar " << std::is_pod >::value << std::endl; + std::cout << " Is pod Scalar " << std::is_pod >::value << std::endl; + std::cout << " Is pod Scalar " << std::is_pod >::value << std::endl; + for(int a=0;a +#include + +using namespace std; +using namespace Grid; +using namespace Grid::QCD; + +class funcPlus { +public: + funcPlus() {}; + template void operator()(vec &rr,vec &i1,vec &i2) const { rr = i1+i2;} + std::string name(void) const { return std::string("Plus"); } +}; +class funcMinus { +public: + funcMinus() {}; + template void operator()(vec &rr,vec &i1,vec &i2) const { rr = i1-i2;} + std::string name(void) const { return std::string("Minus"); } +}; +class funcTimes { +public: + funcTimes() {}; + template void operator()(vec &rr,vec &i1,vec &i2) const { rr = i1*i2;} + std::string name(void) const { return std::string("Times"); } +}; +class funcConj { +public: + funcConj() {}; + template void operator()(vec &rr,vec &i1,vec &i2) const { rr = conj(i1);} + std::string name(void) const { return std::string("Conj"); } +}; +class funcAdj { +public: + funcAdj() {}; + template void operator()(vec &rr,vec &i1,vec &i2) const { rr = adj(i1);} + std::string name(void) const { return std::string("Adj"); } +}; + +class funcTimesI { +public: + funcTimesI() {}; + template void operator()(vec &rr,vec &i1,vec &i2) const { rr = timesI(i1);} + std::string name(void) const { return std::string("timesI"); } +}; + +class funcTimesMinusI { +public: + funcTimesMinusI() {}; + template void operator()(vec &rr,vec &i1,vec &i2) const { rr = timesMinusI(i1);} + std::string name(void) const { return std::string("timesMinusI"); } +}; + +template +void Tester(const functor &func) +{ + GridSerialRNG sRNG; + sRNG.SeedRandomDevice(); + + int Nsimd = vec::Nsimd(); + + std::vector input1(Nsimd); + std::vector input2(Nsimd); + std::vector result(Nsimd); + std::vector reference(Nsimd); + + std::vector > buf(3); + vec & v_input1 = buf[0]; + vec & v_input2 = buf[1]; + vec & v_result = buf[2]; + + + for(int i=0;i0){ + std::cout<< "*****" << std::endl; + std::cout<< "["< simd_layout({1,1,2,2}); + std::vector mpi_layout ({1,1,1,1}); + std::vector latt_size ({8,8,8,8}); + + GridCartesian Grid(latt_size,simd_layout,mpi_layout); + std::vector seeds({1,2,3,4}); + + // Insist that operations on random scalars gives + // identical results to on vectors. 
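// Editorial note: each Tester call below instantiates the harness for a
// matching (scalar, SIMD vector) pair, e.g. Tester<ComplexF,vComplexF>(...),
// extracting lanes and checking the vector op lane-by-lane against the same
// scalar op.  A new operation slots in as one more functor (sketch;
// funcSquare is hypothetical):
#if 0
class funcSquare {
public:
  funcSquare() {};
  template<class vec> void operator()(vec &rr,vec &i1,vec &i2) const { rr = i1*i1;}
  std::string name(void) const { return std::string("Square"); }
};
// Tester<ComplexF,vComplexF>(funcSquare());
#endif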
+ + Tester(funcPlus()); + + std::cout << "==================================="<< std::endl; + std::cout << "Testing vComplexF "<(funcTimesI()); + Tester(funcTimesMinusI()); + Tester(funcPlus()); + Tester(funcMinus()); + Tester(funcTimes()); + Tester(funcConj()); + Tester(funcAdj()); + + std::cout << "==================================="<< std::endl; + std::cout << "Testing vComplexD "<(funcTimesI()); + Tester(funcTimesMinusI()); + Tester(funcPlus()); + Tester(funcMinus()); + Tester(funcTimes()); + Tester(funcConj()); + Tester(funcAdj()); + + std::cout << "==================================="<< std::endl; + std::cout << "Testing vRealF "<(funcPlus()); + Tester(funcMinus()); + Tester(funcTimes()); + Tester(funcAdj()); + + std::cout << "==================================="<< std::endl; + std::cout << "Testing vRealD "<(funcPlus()); + Tester(funcMinus()); + Tester(funcTimes()); + Tester(funcAdj()); + + Grid_finalize(); +} diff --git a/tests/Grid_stencil.cc b/tests/Grid_stencil.cc index f35581f6..a7529045 100644 --- a/tests/Grid_stencil.cc +++ b/tests/Grid_stencil.cc @@ -4,60 +4,37 @@ using namespace std; using namespace Grid; using namespace Grid::QCD; +template +class SimpleCompressor { +public: + void Point(int) {}; + vobj operator() (vobj &arg) { + return arg; + } +}; int main (int argc, char ** argv) { Grid_init(&argc,&argv); - std::vector latt_size (4); - std::vector simd_layout(4); - std::vector mpi_layout (4); + std::vector simd_layout({1,1,2,2}); + std::vector mpi_layout ({2,2,2,2}); + std::vector latt_size ({8,8,8,8}); - int omp=1; - int lat=8; - - mpi_layout[0]=1; - mpi_layout[1]=1; - mpi_layout[2]=1; - mpi_layout[3]=1; - - latt_size[0] = lat; - latt_size[1] = lat; - latt_size[2] = lat; - latt_size[3] = lat; - double volume = latt_size[0]*latt_size[1]*latt_size[2]*latt_size[3]; + double volume = latt_size[0]*latt_size[1]*latt_size[2]*latt_size[3]; -#ifdef AVX512 - simd_layout[0] = 1; - simd_layout[1] = 2; - simd_layout[2] = 2; - simd_layout[3] = 2; -#endif -#if defined (AVX1)|| defined (AVX2) - simd_layout[0] = 1; - simd_layout[1] = 1; - simd_layout[2] = 2; - simd_layout[3] = 2; -#endif -#if defined (SSE2) - simd_layout[0] = 1; - simd_layout[1] = 1; - simd_layout[2] = 1; - simd_layout[3] = 2; -#endif - - GridCartesian Fine(latt_size,simd_layout,mpi_layout); - GridRedBlackCartesian rbFine(latt_size,simd_layout,mpi_layout); - GridParallelRNG fRNG(&Fine); - fRNG.SeedRandomDevice(); - - LatticeColourMatrix Foo(&Fine); - LatticeColourMatrix Bar(&Fine); - LatticeColourMatrix Check(&Fine); - LatticeColourMatrix Diff(&Fine); - - random(fRNG,Foo); - gaussian(fRNG,Bar); + GridCartesian Fine(latt_size,simd_layout,mpi_layout); + GridRedBlackCartesian rbFine(latt_size,simd_layout,mpi_layout); + GridParallelRNG fRNG(&Fine); + fRNG.SeedRandomDevice(); + + LatticeColourMatrix Foo(&Fine); + LatticeColourMatrix Bar(&Fine); + LatticeColourMatrix Check(&Fine); + LatticeColourMatrix Diff(&Fine); + + random(fRNG,Foo); + gaussian(fRNG,Bar); for(int dir=0;dir<4;dir++){ @@ -86,7 +63,8 @@ int main (int argc, char ** argv) fflush(stdout); std::vector > comm_buf(myStencil._unified_buffer_size); printf("calling halo exchange\n");fflush(stdout); - myStencil.HaloExchange(Foo,comm_buf); + SimpleCompressor compress; + myStencil.HaloExchange(Foo,comm_buf,compress); Bar = Cshift(Foo,dir,disp); diff --git a/tests/Grid_wilson.cc b/tests/Grid_wilson.cc new file mode 100644 index 00000000..ade5263d --- /dev/null +++ b/tests/Grid_wilson.cc @@ -0,0 +1,69 @@ +#include +#include + +using namespace std; +using namespace Grid; 
+using namespace Grid::QCD; + +template +struct scal { + d internal; +}; + + +int main (int argc, char ** argv) +{ + Grid_init(&argc,&argv); + + std::vector simd_layout({1,1,2,2}); + std::vector mpi_layout ({1,1,1,1}); + std::vector latt_size ({8,8,8,8}); + + GridCartesian Grid(latt_size,simd_layout,mpi_layout); + std::vector seeds({1,2,3,4}); + + GridParallelRNG pRNG(&Grid); + // pRNG.SeedFixedIntegers(seeds); + pRNG.SeedRandomDevice(); + + LatticeFermion src(&Grid); random(pRNG,src); + LatticeFermion result(&Grid); result=zero; + LatticeFermion ref(&Grid); ref=zero; + LatticeGaugeField Umu(&Grid); random(pRNG,Umu); + std::vector U(4,&Grid); + + for(int mu=0;mu(Umu,U[mu],mu); + } + + { + int mu=0; + // ref = src + Gamma(Gamma::GammaX)* src ; // 1-gamma_x + ref = src; + ref = U[0]*Cshift(ref,0,1); + } + + RealD mass=0.1; + WilsonMatrix Dw(Umu,mass); + std::cout << "Calling Dw"<
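// Editorial sketch: the check above validates only the mu=0 forward hop of
// Dw against U[0]*Cshift(src,0,1).  A fuller gauge-covariant reference
// (hypothetical extension; spin projectors still omitted) would accumulate
// all eight neighbour terms, mirroring the adjoint back-links DoubleStore
// builds:
#if 0
  ref = zero;
  for(int mu=0;mu<4;mu++){
    ref = ref + U[mu]*Cshift(src,mu, 1);                      // forward hop
    ref = ref + adj(Cshift(U[mu],mu,-1))*Cshift(src,mu,-1);   // backward hop
  }
#endif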