mirror of https://github.com/paboyle/Grid.git synced 2026-05-19 00:24:32 +01:00

Compare commits

53 Commits

Author SHA1 Message Date
Peter Boyle 747c167658 sumD_gpu_direct: one thread per SIMD lane using extractLane
Replaces one thread per outer site calling Reduce() (sequential Nsimd-wide
loop) with one thread per lane calling extractLane() — O(1) per thread.
CUB now reduces over osites*Nsimd elements. Avoids serial lane reduction
but leaves the per-lane sobjD store stride as a known remaining concern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 16:21:50 -04:00
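The index decomposition described in commit 747c167658 can be sketched on the host as follows. This is a minimal illustration, not Grid code: `VecSite`, `extract_lane`, `per_site`, `per_lane` and `total` are hypothetical stand-ins, with `extract_lane` playing the role of Grid's `extractLane()` and plain loops standing in for `accelerator_for`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SIMD site object: NSIMD lanes packed per outer site.
constexpr std::size_t NSIMD = 4;
struct VecSite { double lane[NSIMD]; };

// Stand-in for extractLane(): a constant-time read of a single lane.
inline double extract_lane(std::size_t lane, const VecSite &v) { return v.lane[lane]; }

// Old scheme: one work item per outer site, each looping over all lanes.
inline std::vector<double> per_site(const std::vector<VecSite> &lat) {
  std::vector<double> out(lat.size(), 0.0);
  for (std::size_t ss = 0; ss < lat.size(); ++ss)
    for (std::size_t l = 0; l < NSIMD; ++l) out[ss] += lat[ss].lane[l];
  return out;
}

// New scheme: one work item per (site, lane) pair.
// idx splits into an outer site and a lane, as in sumD_gpu_direct.
inline std::vector<double> per_lane(const std::vector<VecSite> &lat) {
  std::vector<double> out(lat.size() * NSIMD);
  for (std::size_t idx = 0; idx < out.size(); ++idx) {
    std::size_t ss   = idx / NSIMD;
    std::size_t lane = idx % NSIMD;
    out[idx] = extract_lane(lane, lat[ss]);
  }
  return out;
}

// Final reduction over whatever array the first pass produced.
inline double total(const std::vector<double> &v) {
  double s = 0.0;
  for (double x : v) s += x;
  return s;
}
```

Both schemes give the same total. The per-lane form hands the final reduction `osites*Nsimd` independent elements instead of `osites` partially reduced ones.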
Peter Boyle fca2c5dba0 Lattice_reduction_gpu_cub: define GRID_REDUCTION_TIMING in header
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 14:54:08 -04:00
Peter Boyle e12bc7f07c Lattice_reduction_gpu_cub: add GRID_REDUCTION_TIMING instrumentation
Guards accelerator_for and CUB DeviceReduce calls in sumD_gpu_direct
and sumD_gpu_large with #ifdef GRID_REDUCTION_TIMING to isolate where
time is spent in each path. Large path accumulates across all groups
and prints totals with words/nfull/rem context.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 14:23:44 -04:00
Peter Boyle dc6ae51cab Lattice_reduction_gpu_cub: replace WordBundle4 with iVector<iScalar<scalarD>,4>
WordBundle4 was redundant with Grid's existing tensor infrastructure.
iVector<iScalar<scalarD>,4> already provides accelerator_inline operator+,
zeroit(), and sycl::is_device_copyable — no new type needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 13:55:28 -04:00
Peter Boyle baa70d8ec9 Test_reduction: add timing benchmark for new vs old reduction paths
Reports us/call and GB/s for sum_gpu (CUB/sycl::reduction) and
sum_gpu_old (hand-rolled shared-memory) for each field type, with
5-call warmup and 100-call timed loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 12:31:13 -04:00
Peter Boyle c93b338bdd skills: HPC battle-hardening skill files for GPU+MPI correctness
Six skill files encoding expertise for making codebases robust on
problematic HPC systems, covering: correctness verification
(double-run, fingerprinting, flight recorder), hang diagnosis,
GPU runtime correctness (premature barrier, infinite poll),
MPI correctness on heterogeneous systems (device buffer aliasing,
AARCH64 PLT corruption, deterministic reductions),
compiler validation, and communication/computation overlap pipeline
design.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 12:10:44 -04:00
Peter Boyle c0472aa0ec Test_reduction: use separate float and double grids
Float fields require a grid constructed with vComplexF::Nsimd(); using
a double grid causes grid->_gsites to undercount the sites in float
vobjF, making the constant-field expected value wrong.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 12:09:35 -04:00
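The mismatch fixed in commit c0472aa0ec comes down to lane counting. The sketch below is a rough model, not Grid's API: it assumes that a grid's outer-site count is its local volume divided by the SIMD lane count it was built with, and that a field stores one scalar site per lane of its own vector type. The names `outer_sites` and `stored_sites` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Outer sites a grid allocates when built for nsimd_grid lanes (assumed model).
inline std::size_t outer_sites(std::size_t local_sites, std::size_t nsimd_grid) {
  return local_sites / nsimd_grid;
}

// Scalar sites actually held by a field whose vector type has nsimd_field lanes.
inline std::size_t stored_sites(std::size_t osites, std::size_t nsimd_field) {
  return osites * nsimd_field;
}
```

With, say, 4 double-precision lanes and 8 single-precision lanes, a float field placed on the double grid holds twice as many scalar sites as the grid's site count reports, so an expected value computed from that count is wrong. Building the float grid with the float lane count restores agreement.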
Peter Boyle 09552cfd73 Rename scalarNorm2 to squaredSum in Test_reduction.cc
The function computes |sum|^2 — the squared magnitude of an aggregate sum —
not a norm. squaredSum makes clear that squaring is applied to the sum, not
to individual site values before summing, distinguishing it from sumOfSquares
(the squared L2 norm).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:15:11 -04:00
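The distinction drawn in commit 09552cfd73 can be illustrated with plain complex numbers. The names `squaredSum` and `sumOfSquares` come from the commit message; the bodies below are an assumption written against `std::complex`, whereas the real test operates on Grid tensor objects.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;

// |sum_i x_i|^2 : add first, then take the squared magnitude.
inline double squaredSum(const std::vector<Cplx> &x) {
  Cplx s(0.0, 0.0);
  for (const Cplx &v : x) s += v;
  return std::norm(s);  // std::norm returns the squared magnitude
}

// sum_i |x_i|^2 : squared L2 norm, squaring each entry before adding.
inline double sumOfSquares(const std::vector<Cplx> &x) {
  double s = 0.0;
  for (const Cplx &v : x) s += std::norm(v);
  return s;
}
```

For a constant field of ones over V sites the two differ by a factor of V (V^2 against V), and for entries that cancel, such as {1, -1}, `squaredSum` is zero while `sumOfSquares` is not.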
Peter Boyle 003fec509c Fix Zero() used on thrust::complex in WordBundle4 initialisation
Grid's Zero() sentinel is not assignable to thrust::complex<double>;
use scalarD(0) instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 18:10:17 -04:00
Peter Boyle 773a82d87f Reinstate large/small dispatch in CUB reduction path; radix-4 word-bundle for large types
rocPRIM's DeviceReduce requires warpSize(64) threads each holding one element in shared
memory, so sizeof(T)*64 must fit in sharedMemPerBlock.  LatticePropagator::scalar_objectD
is 2304 bytes (64*2304 = 147 KB), exceeding the budget and triggering a compile-time
static_assert in limit_block_size.

Introduce sumD_gpu_direct (the original direct-CUB path, safe for small types) and a new
sumD_gpu_large that groups the vobj's vector_type words in bundles of 4, reducing each
bundle as WordBundle4<scalarD> (64 bytes, 64*64 = 4 KB — always within budget).  If
words % 4 != 0, the final partial bundle is zero-padded.  sumD_gpu dispatches at compile
time via if constexpr on sizeof(sobjD) > 512.

For LatticePropagator (144 words) this gives 36 CUB launches instead of 144.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 16:55:58 -04:00
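The sizing argument in commit 773a82d87f can be checked with a few lines of arithmetic. The sketch below is illustrative only: `plan_bundles`, `shared_bytes` and `use_large_path` are hypothetical helpers, the 64-thread figure and the 512-byte threshold are taken from the commit message, and counting one extra launch for a zero-padded partial bundle is an assumption about how the remainder is handled.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBundle = 4;   // words per bundle (radix-4)
constexpr std::size_t kWarp   = 64;  // threads per block quoted in the commit message

struct BundlePlan {
  std::size_t nfull;     // complete bundles of 4 words
  std::size_t rem;       // words left over for a zero-padded bundle
  std::size_t launches;  // reductions needed in total (assumed: one per bundle)
};

inline BundlePlan plan_bundles(std::size_t words) {
  BundlePlan p;
  p.nfull    = words / kBundle;
  p.rem      = words % kBundle;
  p.launches = p.nfull + (p.rem != 0 ? 1 : 0);
  return p;
}

// Shared memory needed if each of kWarp threads holds one element.
inline std::size_t shared_bytes(std::size_t elem_bytes) { return elem_bytes * kWarp; }

// Dispatch rule from the commit message: large path when sizeof(sobjD) > 512.
inline bool use_large_path(std::size_t sobjD_bytes) { return sobjD_bytes > 512; }
```

For a 144-word propagator this reproduces the figures quoted above: 36 bundles, 147456 bytes of shared memory for the direct path, and 4096 bytes per 64-byte bundle.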
Peter Boyle 286c29d6fb Add Test_reduction to tests/debug
Tests the new CUB/hipCUB/SYCL lattice reduction (sum_gpu) against the
preserved hand-rolled implementation (sum_gpu_old) for LatticeComplexF/D,
LatticeColourMatrixF/D and LatticePropagatorF/D.

Part a) gaussian random field: checks that old and new agree to within
float/double roundoff tolerance.
Part b) constant field (= 1.0, identity-matrix init): verifies
innerProduct(sum, sum) = Ncomp * V^2 where Ncomp counts the nonzero
diagonal scalar components per site (1 / Nc / Ns*Nc respectively).

Make.inc is auto-generated by scripts/filelist on bootstrap and is not
tracked; the new .cc file is all that is needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 14:31:33 -04:00
Peter Boyle 969b0a3922 Rewrite lattice GPU reduction to use CUB, hipCUB, and SYCL reduction
Replace hand-rolled shared-memory reduction kernels (reduceBlock/reduceBlocks/
reduceKernel) and the global device variable retirementCount with a unified
CUB/hipCUB DeviceReduce::Reduce path for CUDA/HIP and sycl::reduction for SYCL.
No small/large split is needed: both CUB and sycl::reduction handle arbitrary
object sizes internally.

Old implementations preserved as sum_gpu_old / sumD_gpu_old etc. in the
original files for regression testing on GPU hardware.

Also add CLAUDE.md with build, test, and architecture guidance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 13:41:56 -04:00
Peter Boyle c6c2834e03 Hip Happy 2026-05-15 11:30:29 -04:00
Peter Boyle 856545a1db Support ROCM 7.0.2 2026-05-15 11:30:29 -04:00
Peter Boyle e2d607f6c7 Merge pull request #490 from jdmaia/hip-guard-acceleratorfor2dNB
[HIP] Including kernel launch parameter guard on accelerator_for2dNB
2026-05-06 14:51:30 -04:00
Julio Maia 66da4e0657 Including guard on accelerator_for2dNB against invalid kernel configurations if GRID_HIP 2026-05-06 13:26:33 -05:00
Peter Boyle b37390bb5a 4 node usqcd run 2026-04-27 14:40:11 -07:00
Peter Boyle 829dc8cceb 32 node 2026-04-27 14:38:02 -07:00
Peter Boyle 13cc2c39f5 FOM run 2026-04-27 14:20:49 -07:00
Peter Boyle 66ea3b271c Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2026-04-27 13:55:52 -07:00
Peter Boyle d293b58a20 384 node baseline run 2026-04-27 13:54:40 -07:00
Peter Boyle ce093b2bf3 rdtsc 2026-04-27 13:54:06 -07:00
Peter Boyle e4404efe5a Perlmutter compile update 2026-04-27 13:53:28 -07:00
Peter Boyle 5ce270f1de Adding Claude related files 2026-04-21 10:41:18 -04:00
Peter Boyle af43b067a0 New CLAUDE controllable visualiser 2026-04-10 11:23:25 -04:00
Quadro 34b44d1fee New file for animation in MD time direction 2026-04-02 13:55:38 -04:00
Peter Boyle 595ceaac37 Include grid header and make the ENABLE correct 2026-03-11 17:24:44 -04:00
Peter Boyle daf5834e8e Fixing incorrect PR about disable fermion instantiations 2026-03-11 17:05:46 -04:00
Peter Boyle 0d8658a039 Optimised 2026-03-05 06:06:32 -05:00
Peter Boyle 095e004d01 Setup change GCR 2026-03-05 06:06:32 -05:00
Peter Boyle 0acabee7f6 Modest change 2026-03-05 06:06:32 -05:00
Peter Boyle 76fbcffb60 Improvement to 16^3 hdcg 2026-03-05 06:06:32 -05:00
Peter Boyle a0a62d7ead Merge pull request #478 from vataspro/PolyakovUpstream
Spatial Polyakov Loop implementation
2026-02-24 20:45:42 -05:00
Peter Boyle c5038ea6a5 Merge pull request #483 from cmcknigh/bugfix/rocm7-rocblas-type-refactor
Adding a version check to handle rocBlas type refactor
2026-02-24 20:45:03 -05:00
Peter Boyle a5120903eb Merge pull request #486 from RChrHill/fix/sp4-fp32
Define Sp4 ProjectOnGeneralGroup for generic vtype
2026-02-24 20:44:08 -05:00
Peter Boyle 00b286a08a Merge pull request #488 from RChrHill/feature/additional-ET-traces
Add ET support for Lattice spin- and colour-traces
2026-02-24 20:43:45 -05:00
Peter Boyle 24a9759353 Merge pull request #485 from edbennett/skip-fermion-instantiations
Be able to skip compiling fermion instantiations altogether
2026-02-24 20:43:20 -05:00
edbennett 1b56f6f46d be able to skip compiling fermion instantiations altogether 2026-02-24 23:52:18 +00:00
Peter Boyle 2a8084d569 Subspace setup 2026-02-13 17:26:11 -05:00
Peter Boyle 6ff29f9d4f Alternate multigrids 2026-02-13 17:25:45 -05:00
RChHill c4d3e79193 Add ET support for Lattice spin- and colour-traces 2026-01-29 14:46:52 +00:00
Peter Boyle 7cd3f21e6b preserving a bunch of experiments on setup and g5 subspace doubling 2026-01-06 05:57:39 -05:00
paboyle 4a0aaf0786 Fix issue with Aurora compilers 2025-11-21 21:41:13 +00:00
paboyle 9c3835524c Fix compile warn 2025-11-21 21:41:12 +00:00
RChHill b650b89682 Define Sp4 ProjectOnGeneralGroup for generic vtype 2025-11-19 13:26:52 +00:00
Allen McKnight 4304245c1b Merge branch 'develop' into bugfix/rocm7-rocblas-type-refactor 2025-11-04 08:50:11 -06:00
Your Name 1d1fd3bcaf adding a version check to handle rocblas type change 2025-10-02 15:24:24 -05:00
Alexis Provatas c646d91527 Fix names, protect against bad index values, clean docstrings 2025-05-01 10:52:00 +01:00
Alexis Provatas a2b98d82e1 remove obsolete spatial polyakov observable file 2025-05-01 10:52:00 +01:00
Alexis Provatas 7b9415c088 Move observable logger to Polyakov Loop file and fix docstring 2025-05-01 10:52:00 +01:00
Alexis Provatas cb7110f492 Add Spatial Polyakov Loop observable 2025-05-01 10:52:00 +01:00
Alexis Provatas 0c7af66490 Create Spatial Polyakov Observable Module 2025-05-01 10:52:00 +01:00
Alexis Provatas 496d1b914a Generalise Polyakov loop and overload for temporal direction 2025-05-01 10:52:00 +01:00
94 changed files with 8218 additions and 180 deletions
+98
@@ -0,0 +1,98 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What This Is
Grid is a data-parallel C++ library for lattice QCD. It provides SIMD-vectorised lattice containers, MPI-based domain decomposition, GPU acceleration (CUDA/HIP/SYCL), and a full suite of QCD algorithms including HMC.
## Build
Uses GNU Autotools. The bootstrap step only needs to run once (or after `configure.ac` changes).
```bash
./bootstrap.sh # downloads Eigen 3.4.0, generates configure
mkdir build && cd build
../configure [options]
make -j$(nproc)
make check # run root-level tests
make install
```
Key configure options:
| Option | Common values |
|--------|---------------|
| `--enable-simd=` | `AVX2`, `AVX512`, `KNL`, `A64FX`, `NEONv8`, `GPU` |
| `--enable-comms=` | `mpi-auto`, `mpi3-auto`, `none` |
| `--enable-accelerator=` | `cuda`, `hip`, `sycl` |
| `--enable-shm=` | `shmopen`, `hugetlbfs`, `nvlink` |
| `--enable-Nc=` | `3` (default), `2`, `4`, `5` |
| `--with-gmp=`, `--with-mpfr=`, `--with-fftw=`, `--with-lime=` | paths to libs |
| `--enable-hdf5`, `--enable-mkl`, `--enable-lapack` | optional features |
Platform recipes from `README.md`:
- **KNL**: `--enable-simd=KNL --enable-comms=mpi3-auto --enable-mkl`
- **Skylake/Haswell**: `--enable-simd=AVX512` or `AVX2` + `--enable-comms=mpi3-auto`
- **AMD EPYC**: `--enable-simd=AVX2 --enable-comms=mpi3`
- **A64FX (Fugaku)**: `--enable-simd=A64FX --enable-comms=mpi3 --enable-shm=shmget` (see `SVE_README.txt`)
Required external libs: GMP, MPFR, OpenSSL, zlib.
## Running Tests
```bash
# From build directory
make check # root-level tests (Test_simd, Test_cshift, etc.)
make -C tests/<subdir> tests # build tests in a subdirectory
./tests/core/Test_simd # run a single test binary directly
```
Test subdirectories and their focus: `core` (SIMD, stencil, comms), `solver` (CG, GMRES, eigensolvers), `hmc` (MD integrators), `forces` (fermion forces), `lanczos`, `IO`, `smearing`, `sp2n`, `debug`.
## Architecture
### Layer stack (bottom to top)
1. **SIMD layer** (`Grid/simd/`) — platform-specific intrinsics wrapped into `vRealF`, `vComplexD`, etc. The SIMD width and layout are compile-time constants controlled by `--enable-simd`.
2. **Tensor layer** (`Grid/tensors/`) — Lorentz/colour/spin tensor algebra built on top of SIMD types. `iMatrix`, `iVector`, `iScalar` templates compose into QCD types like `ColourMatrix`, `SpinColourVector`.
3. **Lattice layer** (`Grid/lattice/`) — `Lattice<T>` container: a site-local tensor replicated across a distributed Cartesian grid. All arithmetic is site-parallel and expression-template-fused.
4. **Cartesian/comms layer** (`Grid/cartesian/`, `Grid/communicator/`) — `GridCartesian` holds the MPI topology and local/global geometry. `Grid/cshift/` implements nearest-neighbour halo exchange; `Grid/stencil/` is the optimised multi-hop stencil used by Dirac operators.
5. **Algorithm layer** (`Grid/algorithms/`) — iterative solvers (CG, GMRES, BiCGSTAB, mixed-precision), eigensolvers (Lanczos, LAPACK), FFT, smearing.
6. **QCD layer** (`Grid/qcd/`) — gauge and fermion actions, HMC integrators, observables.
### QCD subsystem (`Grid/qcd/`)
- `action/fermion/` — Wilson, Clover, DWF (Mobius), Staggered, twisted-mass, G-parity variants
- `action/gauge/` — Wilson gauge, Symanzik, Iwasaki, DBW2, plaquette+rect
- `representations/` — Fundamental, Adjoint, Two-index, Sp(2n)
- `hmc/` — Leapfrog, OMF2/OMF4 integrators; pseudofermion refreshment; Metropolis accept/reject
- `smearing/` — APE, Stout, HEX, gradient flow
- `observables/` — Polyakov loop, plaquette, topological charge
### GPU acceleration
GPU support is injected via macros (`accelerator_for`, `accelerator_for2dNB`). The `Grid/simd/` SIMD types map to scalar on GPU device code; host code paths remain vectorised. Unified virtual memory is on by default (`--enable-unified=yes`); device-aware MPI (`--enable-accelerator-aware-mpi`) avoids device→host copies on transfers.
### Memory and I/O
- `Grid/allocator/` — aligned/NUMA-aware allocators; caching allocator via `--enable-alloc-cache`
- `Grid/parallelIO/` — distributed parallel reader/writer for ILDG (via LIME), SciDAC, and native binary formats
- `Grid/serialisation/` — text, binary, HDF5, XML/JSON serialisation of arbitrary Grid objects
### HMC applications
`HMC/` contains production-ready HMC driver programmes (e.g. `Mobius2p1f.cc`, `DWF_plus_DSDR_nf2plus1_Shamir_Gparity.cc`). These are built separately from the library tests.
## Key Conventions
- **C++17** is required throughout.
- Template structure: most classes are templated on `<_FImpl>` (fermion impl) or `<Gimpl>` (gauge impl), which encode the representation and precision. Instantiation is controlled by `--enable-fermion-instantiations`.
- The `RealD`/`RealF`/`ComplexD`/`ComplexF` typedefs are used everywhere; avoid raw `double`/`float`.
- Logging uses `Grid_log`, `Grid_error` macros (from `Grid/log/`); performance-critical paths use the `GRID_TRACE` / timer macros from `Grid/perfmon/`.
- Reductions across MPI ranks go through `GridBase::GlobalSum` / `GlobalMax`; never reduce with bare MPI calls inside library code.
+9 -7
@@ -54,22 +54,24 @@ Version.h: version-cache
 include Make.inc
 include Eigen.inc
-extra_sources+=$(WILS_FERMION_FILES)
-extra_sources+=$(STAG_FERMION_FILES)
+if BUILD_FERMION_INSTANTIATIONS
+extra_sources+=$(WILS_FERMION_FILES)
+extra_sources+=$(STAG_FERMION_FILES)
 if BUILD_ZMOBIUS
-extra_sources+=$(ZWILS_FERMION_FILES)
+extra_sources+=$(ZWILS_FERMION_FILES)
 endif
 if BUILD_GPARITY
-extra_sources+=$(GP_FERMION_FILES)
+extra_sources+=$(GP_FERMION_FILES)
 endif
 if BUILD_FERMION_REPS
-extra_sources+=$(ADJ_FERMION_FILES)
-extra_sources+=$(TWOIND_FERMION_FILES)
+extra_sources+=$(ADJ_FERMION_FILES)
+extra_sources+=$(TWOIND_FERMION_FILES)
 endif
 if BUILD_SP
 extra_sources+=$(SP_FERMION_FILES)
 if BUILD_FERMION_REPS
-extra_sources+=$(SP_TWOIND_FERMION_FILES)
+extra_sources+=$(SP_TWOIND_FERMION_FILES)
 endif
 endif
+endif
+88 -17
@@ -28,6 +28,7 @@ Author: Peter Boyle <pboyle@bnl.gov>
 #pragma once
 #ifdef GRID_HIP
+#include <hip/hip_version.h>
 #include <hipblas/hipblas.h>
 #endif
 #ifdef GRID_CUDA
@@ -255,17 +256,30 @@ public:
 if ( OpB == GridBLAS_OP_N ) hOpB = HIPBLAS_OP_N;
 if ( OpB == GridBLAS_OP_T ) hOpB = HIPBLAS_OP_T;
 if ( OpB == GridBLAS_OP_C ) hOpB = HIPBLAS_OP_C;
+#if defined(HIP_VERSION_MAJOR) && (HIP_VERSION_MAJOR >=7)
 auto err = hipblasZgemmBatched(gridblasHandle,
 hOpA,
 hOpB,
 m,n,k,
-(hipblasDoubleComplex *) &alpha_p[0],
-(hipblasDoubleComplex **)&Amk[0], lda,
-(hipblasDoubleComplex **)&Bkn[0], ldb,
-(hipblasDoubleComplex *) &beta_p[0],
-(hipblasDoubleComplex **)&Cmn[0], ldc,
+(hipDoubleComplex *) &alpha_p[0],
+(hipDoubleComplex **)&Amk[0], lda,
+(hipDoubleComplex **)&Bkn[0], ldb,
+(hipDoubleComplex *) &beta_p[0],
+(hipDoubleComplex **)&Cmn[0], ldc,
 batchCount);
-// std::cout << " hipblas return code " <<(int)err<<std::endl;
+#else
+auto err = hipblasZgemmBatched(gridblasHandle,
+hOpA,
+hOpB,
+m,n,k,
+(hipblasDoubleComplex *) &alpha_p[0],
+(hipblasDoubleComplex **)&Amk[0], lda,
+(hipblasDoubleComplex **)&Bkn[0], ldb,
+(hipblasDoubleComplex *) &beta_p[0],
+(hipblasDoubleComplex **)&Cmn[0], ldc,
+batchCount);
+#endif
+// std::cout << " hipblas return code " <<(int)err<<" "<<__LINE__<<std::endl;
 GRID_ASSERT(err==HIPBLAS_STATUS_SUCCESS);
 #endif
 #ifdef GRID_CUDA
@@ -503,17 +517,31 @@ public:
 if ( OpB == GridBLAS_OP_N ) hOpB = HIPBLAS_OP_N;
 if ( OpB == GridBLAS_OP_T ) hOpB = HIPBLAS_OP_T;
 if ( OpB == GridBLAS_OP_C ) hOpB = HIPBLAS_OP_C;
+#if defined(HIP_VERSION_MAJOR) && (HIP_VERSION_MAJOR >=7)
 auto err = hipblasCgemmBatched(gridblasHandle,
 hOpA,
 hOpB,
 m,n,k,
-(hipblasComplex *) &alpha_p[0],
-(hipblasComplex **)&Amk[0], lda,
-(hipblasComplex **)&Bkn[0], ldb,
-(hipblasComplex *) &beta_p[0],
-(hipblasComplex **)&Cmn[0], ldc,
+(hipComplex *) &alpha_p[0],
+(hipComplex **)&Amk[0], lda,
+(hipComplex **)&Bkn[0], ldb,
+(hipComplex *) &beta_p[0],
+(hipComplex **)&Cmn[0], ldc,
 batchCount);
+#else
+auto err = hipblasCgemmBatched(gridblasHandle,
+hOpA,
+hOpB,
+m,n,k,
+(hipblasComplex *) &alpha_p[0],
+(hipblasComplex **)&Amk[0], lda,
+(hipblasComplex **)&Bkn[0], ldb,
+(hipblasComplex *) &beta_p[0],
+(hipblasComplex **)&Cmn[0], ldc,
+batchCount);
+#endif
+// std::cout << " hipblas return code " <<(int)err<<" "<<__LINE__<<std::endl;
 GRID_ASSERT(err==HIPBLAS_STATUS_SUCCESS);
 #endif
 #ifdef GRID_CUDA
@@ -550,6 +578,7 @@ public:
 (void **)&Cmn[0], CUDA_C_32F, ldc,
 batchCount, compute_precision, CUBLAS_GEMM_DEFAULT);
 }
+// std::cout << " hipblas return code " <<(int)err<<" "<<__LINE__<<std::endl;
 GRID_ASSERT(err==CUBLAS_STATUS_SUCCESS);
 #endif
 #ifdef GRID_SYCL
@@ -720,6 +749,7 @@ public:
 (float *) &beta_p[0],
 (float **)&Cmn[0], ldc,
 batchCount);
+// std::cout << " hipblas return code " <<(int)err<<" "<<__LINE__<<std::endl;
 GRID_ASSERT(err==HIPBLAS_STATUS_SUCCESS);
 #endif
 #ifdef GRID_CUDA
@@ -880,6 +910,7 @@ public:
 (double *) &beta_p[0],
 (double **)&Cmn[0], ldc,
 batchCount);
+// std::cout << " hipblas return code " <<(int)err<<" "<<__LINE__<<std::endl;
 GRID_ASSERT(err==HIPBLAS_STATUS_SUCCESS);
 #endif
 #ifdef GRID_CUDA
@@ -1094,11 +1125,20 @@ public:
 GRID_ASSERT(info.size()==batchCount);
 #ifdef GRID_HIP
+#if defined(HIP_VERSION_MAJOR) && (HIP_VERSION_MAJOR >=7)
 auto err = hipblasZgetrfBatched(gridblasHandle,(int)n,
-(hipblasDoubleComplex **)&Ann[0], (int)n,
+(hipDoubleComplex **)&Ann[0], (int)n,
 (int*) &ipiv[0],
 (int*) &info[0],
 (int)batchCount);
+#else
+auto err = hipblasZgetrfBatched(gridblasHandle,(int)n,
+(hipblasDoubleComplex **)&Ann[0], (int)n,
+(int*) &ipiv[0],
+(int*) &info[0],
+(int)batchCount);
+#endif
+// std::cout << " hipblas return code " <<(int)err<<" "<<__LINE__<<std::endl;
 GRID_ASSERT(err==HIPBLAS_STATUS_SUCCESS);
 #endif
 #ifdef GRID_CUDA
@@ -1124,11 +1164,21 @@ public:
 GRID_ASSERT(info.size()==batchCount);
 #ifdef GRID_HIP
+#if defined(HIP_VERSION_MAJOR) && (HIP_VERSION_MAJOR >=7)
 auto err = hipblasCgetrfBatched(gridblasHandle,(int)n,
-(hipblasComplex **)&Ann[0], (int)n,
+(hipComplex **)&Ann[0], (int)n,
 (int*) &ipiv[0],
 (int*) &info[0],
 (int)batchCount);
+#else
+auto err = hipblasCgetrfBatched(gridblasHandle,(int)n,
+(hipblasComplex **)&Ann[0], (int)n,
+(int*) &ipiv[0],
+(int*) &info[0],
+(int)batchCount);
+#endif
+// std::cout << " hipblas return code " <<(int)err<<" "<<__LINE__<<std::endl;
 GRID_ASSERT(err==HIPBLAS_STATUS_SUCCESS);
 #endif
 #ifdef GRID_CUDA
@@ -1201,12 +1251,23 @@ public:
 GRID_ASSERT(Cnn.size()==batchCount);
 #ifdef GRID_HIP
+#if defined(HIP_VERSION_MAJOR) && (HIP_VERSION_MAJOR >=7)
 auto err = hipblasZgetriBatched(gridblasHandle,(int)n,
-(hipblasDoubleComplex **)&Ann[0], (int)n,
+(hipDoubleComplex **)&Ann[0], (int)n,
 (int*) &ipiv[0],
-(hipblasDoubleComplex **)&Cnn[0], (int)n,
+(hipDoubleComplex **)&Cnn[0], (int)n,
 (int*) &info[0],
 (int)batchCount);
+#else
+auto err = hipblasZgetriBatched(gridblasHandle,(int)n,
+(hipblasDoubleComplex **)&Ann[0], (int)n,
+(int*) &ipiv[0],
+(hipblasDoubleComplex **)&Cnn[0], (int)n,
+(int*) &info[0],
+(int)batchCount);
+#endif
+// std::cout << " hipblas return code " <<(int)err<<" "<<__LINE__<<std::endl;
 GRID_ASSERT(err==HIPBLAS_STATUS_SUCCESS);
 #endif
 #ifdef GRID_CUDA
@@ -1235,12 +1296,22 @@ public:
 GRID_ASSERT(Cnn.size()==batchCount);
 #ifdef GRID_HIP
+#if defined(HIP_VERSION_MAJOR) && (HIP_VERSION_MAJOR >=7)
 auto err = hipblasCgetriBatched(gridblasHandle,(int)n,
-(hipblasComplex **)&Ann[0], (int)n,
+(hipComplex **)&Ann[0], (int)n,
 (int*) &ipiv[0],
-(hipblasComplex **)&Cnn[0], (int)n,
+(hipComplex **)&Cnn[0], (int)n,
 (int*) &info[0],
 (int)batchCount);
+#else
+auto err = hipblasCgetriBatched(gridblasHandle,(int)n,
+(hipblasComplex **)&Ann[0], (int)n,
+(int*) &ipiv[0],
+(hipblasComplex **)&Cnn[0], (int)n,
+(int*) &info[0],
+(int)batchCount);
+#endif
+// std::cout << " hipblas return code " <<(int)err<<" "<<__LINE__<<std::endl;
 GRID_ASSERT(err==HIPBLAS_STATUS_SUCCESS);
 #endif
 #ifdef GRID_CUDA
+2 -2
@@ -92,8 +92,8 @@ class TwoLevelCGmrhs
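 // Vector case
 virtual void operator() (std::vector<Field> &src, std::vector<Field> &x)
 {
-// SolveSingleSystem(src,x);
-SolvePrecBlockCG(src,x);
+SolveSingleSystem(src,x);
+// SolvePrecBlockCG(src,x);
 }
 ////////////////////////////////////////////////////////////////////////////////////////////////////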
////////////////////////////////////////////////////////////////////////////////////////////////////
+9 -6
@@ -97,7 +97,7 @@ public:
 RealD scale;
-ConjugateGradient<FineField> CG(1.0e-3,400,false);
+ConjugateGradient<FineField> CG(1.0e-4,2000,false);
 FineField noise(FineGrid);
 FineField Mn(FineGrid);
@@ -131,7 +131,10 @@ public:
 RealD scale;
 TrivialPrecon<FineField> simple_fine;
-PrecGeneralisedConjugateResidualNonHermitian<FineField> GCR(0.001,30,DiracOp,simple_fine,12,12);
+// PrecGeneralisedConjugateResidualNonHermitian<FineField> GCR(0.001,10,DiracOp,simple_fine,30,30);
+// PrecGeneralisedConjugateResidualNonHermitian<FineField> GCR(0.001,10,DiracOp,simple_fine,12,12);
+// PrecGeneralisedConjugateResidualNonHermitian<FineField> GCR(0.001,30,DiracOp,simple_fine,12,12);
+PrecGeneralisedConjugateResidualNonHermitian<FineField> GCR(0.001,30,DiracOp,simple_fine,10,10);
 FineField noise(FineGrid);
 FineField src(FineGrid);
 FineField guess(FineGrid);
@@ -146,16 +149,16 @@ public:
 DiracOp.Op(noise,Mn); std::cout<<GridLogMessage << "noise ["<<b<<"] <n|Op|n> "<<innerProduct(noise,Mn)<<std::endl;
-for(int i=0;i<2;i++){
+for(int i=0;i<3;i++){
 // void operator() (const Field &src, Field &psi){
 #if 1
-std::cout << GridLogMessage << " inverting on noise "<<std::endl;
+if (i==0)std::cout << GridLogMessage << " inverting on noise "<<std::endl;
 src = noise;
 guess=Zero();
 GCR(src,guess);
 subspace[b] = guess;
 #else
-std::cout << GridLogMessage << " inverting on zero "<<std::endl;
+if (i==0)std::cout << GridLogMessage << " inverting on zero "<<std::endl;
 src=Zero();
 guess = noise;
 GCR(src,guess);
@@ -167,7 +170,7 @@ public:
 }
-DiracOp.Op(noise,Mn); std::cout<<GridLogMessage << "filtered["<<b<<"] <f|Op|f> "<<innerProduct(noise,Mn)<<std::endl;
+DiracOp.Op(noise,Mn); std::cout<<GridLogMessage << "filtered["<<b<<"] <f|Op|f> "<<innerProduct(noise,Mn)<<" <f|OpDagOp|f>"<<norm2(Mn)<<std::endl;
 subspace[b] = noise;
 }
+3
@@ -31,6 +31,9 @@ Author: Christoph Lehner <christoph@lhnr.de>
 #if defined(GRID_SYCL)
 #include <Grid/lattice/Lattice_reduction_sycl.h>
 #endif
+#if defined(GRID_CUDA)||defined(GRID_HIP)||defined(GRID_SYCL)
+#include <Grid/lattice/Lattice_reduction_gpu_cub.h>
+#endif
 #include <Grid/lattice/Lattice_slicesum_core.h>
 NAMESPACE_BEGIN(Grid);
+10 -10
@@ -198,7 +198,7 @@ __global__ void reduceKernel(const vobj *lat, sobj *buffer, Iterator n) {
 // Possibly promote to double and sum
 /////////////////////////////////////////////////////////////////////////////////////////////////////////
 template <class vobj>
-inline typename vobj::scalar_objectD sumD_gpu_small(const vobj *lat, Integer osites)
+inline typename vobj::scalar_objectD sumD_gpu_small_old(const vobj *lat, Integer osites)
 {
 typedef typename vobj::scalar_objectD sobj;
 typedef decltype(lat) Iterator;
@@ -224,7 +224,7 @@ inline typename vobj::scalar_objectD sumD_gpu_small(const vobj *lat, Integer osi
 }
 template <class vobj>
-inline typename vobj::scalar_objectD sumD_gpu_large(const vobj *lat, Integer osites)
+inline typename vobj::scalar_objectD sumD_gpu_large_old(const vobj *lat, Integer osites)
 {
 typedef typename vobj::vector_type vector;
 typedef typename vobj::scalar_typeD scalarD;
@@ -244,13 +244,13 @@ inline typename vobj::scalar_objectD sumD_gpu_large(const vobj *lat, Integer osi
 buf[ss] = dat[ss*words+w];
 });
-ret_p[w] = sumD_gpu_small(tbuf,osites);
+ret_p[w] = sumD_gpu_small_old(tbuf,osites);
 }
 return ret;
 }
 template <class vobj>
-inline typename vobj::scalar_objectD sumD_gpu(const vobj *lat, Integer osites)
+inline typename vobj::scalar_objectD sumD_gpu_old(const vobj *lat, Integer osites)
 {
 typedef typename vobj::scalar_objectD sobj;
 sobj ret;
@@ -261,9 +261,9 @@ inline typename vobj::scalar_objectD sumD_gpu(const vobj *lat, Integer osites)
 int ok = getNumBlocksAndThreads(size, sizeof(sobj), numThreads, numBlocks);
 if ( ok ) {
-ret = sumD_gpu_small(lat,osites);
+ret = sumD_gpu_small_old(lat,osites);
 } else {
-ret = sumD_gpu_large(lat,osites);
+ret = sumD_gpu_large_old(lat,osites);
 }
 return ret;
 }
@@ -272,20 +272,20 @@ inline typename vobj::scalar_objectD sumD_gpu(const vobj *lat, Integer osites)
 // Return as same precision as input performing reduction in double precision though
 /////////////////////////////////////////////////////////////////////////////////////////////////////////
 template <class vobj>
-inline typename vobj::scalar_object sum_gpu(const vobj *lat, Integer osites)
+inline typename vobj::scalar_object sum_gpu_old(const vobj *lat, Integer osites)
 {
 typedef typename vobj::scalar_object sobj;
 sobj result;
-result = sumD_gpu(lat,osites);
+result = sumD_gpu_old(lat,osites);
 return result;
 }
 template <class vobj>
-inline typename vobj::scalar_object sum_gpu_large(const vobj *lat, Integer osites)
+inline typename vobj::scalar_object sum_gpu_large_old(const vobj *lat, Integer osites)
 {
 typedef typename vobj::scalar_object sobj;
 sobj result;
-result = sumD_gpu_large(lat,osites);
+result = sumD_gpu_large_old(lat,osites);
 return result;
 }
+361
@@ -0,0 +1,361 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./Grid/lattice/Lattice_reduction_gpu_cub.h
Copyright (C) 2015-2024
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#pragma once
#if defined(GRID_CUDA)
#include <cub/cub.cuh>
#define gpucub cub
#define gpuError_t cudaError_t
#define gpuSuccess cudaSuccess
#elif defined(GRID_HIP)
#include <hipcub/hipcub.hpp>
#define gpucub hipcub
#define gpuError_t hipError_t
#define gpuSuccess hipSuccess
#endif
NAMESPACE_BEGIN(Grid);
/////////////////////////////////////////////////////////////////////////////////////////////////////////
// Unified lattice reduction using CUB (CUDA/HIP) and sycl::reduction (SYCL).
//
// CUDA/HIP: one accelerator_for pass per site to extract SIMD lanes and promote to sobjD,
// then CUB/hipCUB DeviceReduce::Reduce over the resulting array.
//
// rocPRIM's DeviceReduce requires warpSize(64) threads per block, each holding one element
// in shared memory: sizeof(T)*64 must fit in sharedMemPerBlock. Large QCD objects such as
// LatticePropagator (sobjD = 2304 bytes, 64*2304 = 147 KB) exceed this budget.
//
// For those types sumD_gpu_large groups the vobj's vector_type words in bundles of 4,
// reducing each bundle as an iVector<iScalar<scalarD>,4> (64 bytes, 64*64 = 4 KB — always safe).
// Words that do not fill a complete bundle are zero-padded.
//
// SYCL: sycl::reduction handles any type size through the runtime, so one path suffices.
/////////////////////////////////////////////////////////////////////////////////////////////////////////
#if defined(GRID_CUDA) || defined(GRID_HIP)
#define GRID_REDUCTION_TIMING
// Direct CUB reduction on the full scalar_objectD.
// Only safe when sizeof(sobjD)*64 <= device sharedMemPerBlock.
// Do not call directly for large composite types (e.g. LatticePropagator).
template<class vobj>
inline typename vobj::scalar_objectD sumD_gpu_direct(const vobj *lat, Integer osites)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::scalar_objectD sobjD;
const Integer nsimd = vobj::Nsimd();
const Integer nlanes = osites * nsimd;
deviceVector<sobjD> per_lane(nlanes);
sobjD *per_lane_p = &per_lane[0];
#ifdef GRID_REDUCTION_TIMING
RealD t_for = -usecond();
#endif
accelerator_for(idx, nlanes, 1, {
Integer ss = idx / nsimd;
Integer lane = idx % nsimd;
sobj tmp = extractLane(lane, lat[ss]);
sobjD tmpD; tmpD = tmp;
per_lane_p[idx] = tmpD;
});
#ifdef GRID_REDUCTION_TIMING
accelerator_barrier();
t_for += usecond();
#endif
sobjD zero; zeroit(zero);
sobjD *d_out = static_cast<sobjD *>(acceleratorAllocDevice(sizeof(sobjD)));
void *d_temp = nullptr;
size_t temp_bytes = 0;
gpuError_t gpuErr;
gpuErr = gpucub::DeviceReduce::Reduce(d_temp, temp_bytes, per_lane_p, d_out,
(int)nlanes, gpucub::Sum(), zero, computeStream);
if (gpuErr != gpuSuccess) {
std::cout << GridLogError << "sumD_gpu_direct: DeviceReduce size query failed: "
<< gpuErr << std::endl;
exit(EXIT_FAILURE);
}
d_temp = acceleratorAllocDevice(temp_bytes);
#ifdef GRID_REDUCTION_TIMING
RealD t_cub = -usecond();
#endif
gpuErr = gpucub::DeviceReduce::Reduce(d_temp, temp_bytes, per_lane_p, d_out,
(int)nlanes, gpucub::Sum(), zero, computeStream);
if (gpuErr != gpuSuccess) {
std::cout << GridLogError << "sumD_gpu_direct: DeviceReduce failed: "
<< gpuErr << std::endl;
exit(EXIT_FAILURE);
}
accelerator_barrier();
#ifdef GRID_REDUCTION_TIMING
t_cub += usecond();
std::cout << GridLogMessage << "sumD_gpu_direct"
<< " sizeof(sobjD)=" << sizeof(sobjD)
<< " accelerator_for=" << t_for << " us"
<< " CUB_reduce=" << t_cub << " us" << std::endl;
#endif
sobjD result;
acceleratorCopyFromDevice(d_out, &result, sizeof(sobjD));
acceleratorFreeDevice(d_temp);
acceleratorFreeDevice(d_out);
return result;
}
// Radix-4 word-bundle path for types too large for the direct CUB path.
// Treats vobj as words of vector_type; groups them in bundles of 4 and reduces
// each bundle as an iVector<iScalar<scalarD>,4> — reusing Grid's existing tensor
// type which already has accelerator_inline operator+ and zeroit().
// sizeof = 4 * sizeof(scalarD) <= 64 bytes; 64 * 64 = 4096 bytes, safely within
// rocPRIM's shared-memory budget on all supported devices.
// If words % 4 != 0, the final partial bundle is zero-padded so all unused
// slots contribute zero to the sum.
template<class vobj>
inline typename vobj::scalar_objectD sumD_gpu_large(const vobj *lat, Integer osites)
{
typedef typename vobj::vector_type vector;
typedef typename vobj::scalar_typeD scalarD;
typedef typename vobj::scalar_objectD sobjD;
using R4 = iVector<iScalar<scalarD>, 4>;
const int words = sizeof(vobj) / sizeof(vector);
const int nfull = words / 4;
const int rem = words % 4;
sobjD ret; zeroit(ret);
scalarD *ret_p = (scalarD *)&ret;
iScalar<vector> *idat = (iScalar<vector> *)lat;
deviceVector<R4> buf(osites);
R4 *buf_p = &buf[0];
R4 zero4; zeroit(zero4);
R4 *d_out = static_cast<R4 *>(acceleratorAllocDevice(sizeof(R4)));
void *d_temp = nullptr;
size_t temp_bytes = 0;
// Probe workspace size once — type R4 and count osites are fixed across all groups.
gpuError_t gpuErr;
gpuErr = gpucub::DeviceReduce::Reduce(d_temp, temp_bytes, buf_p, d_out,
(int)osites, gpucub::Sum(), zero4, computeStream);
if (gpuErr != gpuSuccess) {
std::cout << GridLogError << "sumD_gpu_large: DeviceReduce size query failed: "
<< gpuErr << std::endl;
exit(EXIT_FAILURE);
}
d_temp = acceleratorAllocDevice(temp_bytes);
#ifdef GRID_REDUCTION_TIMING
RealD t_for_large = 0.0, t_cub_large = 0.0;
#endif
// Full groups of 4 words.
for (int g = 0; g < nfull; g++) {
int base = 4 * g;
#ifdef GRID_REDUCTION_TIMING
t_for_large -= usecond();
#endif
accelerator_for(ss, osites, 1, {
R4 r4;
r4._internal[0] = TensorRemove(Reduce(idat[ss * words + base ]));
r4._internal[1] = TensorRemove(Reduce(idat[ss * words + base + 1]));
r4._internal[2] = TensorRemove(Reduce(idat[ss * words + base + 2]));
r4._internal[3] = TensorRemove(Reduce(idat[ss * words + base + 3]));
buf_p[ss] = r4;
});
#ifdef GRID_REDUCTION_TIMING
accelerator_barrier();
t_for_large += usecond();
t_cub_large -= usecond();
#endif
gpuErr = gpucub::DeviceReduce::Reduce(d_temp, temp_bytes, buf_p, d_out,
(int)osites, gpucub::Sum(), zero4, computeStream);
if (gpuErr != gpuSuccess) {
std::cout << GridLogError << "sumD_gpu_large: DeviceReduce failed (group "
<< g << "): " << gpuErr << std::endl;
exit(EXIT_FAILURE);
}
accelerator_barrier();
#ifdef GRID_REDUCTION_TIMING
t_cub_large += usecond();
#endif
R4 group_result;
acceleratorCopyFromDevice(d_out, &group_result, sizeof(R4));
ret_p[base ] = TensorRemove(group_result._internal[0]);
ret_p[base + 1] = TensorRemove(group_result._internal[1]);
ret_p[base + 2] = TensorRemove(group_result._internal[2]);
ret_p[base + 3] = TensorRemove(group_result._internal[3]);
}
// Partial last group: zero-pad unused slots so they contribute nothing to the sum.
if (rem > 0) {
int base = 4 * nfull;
#ifdef GRID_REDUCTION_TIMING
t_for_large -= usecond();
#endif
accelerator_for(ss, osites, 1, {
R4 r4; zeroit(r4);
for (int k = 0; k < rem; k++)
r4._internal[k] = TensorRemove(Reduce(idat[ss * words + base + k]));
buf_p[ss] = r4;
});
#ifdef GRID_REDUCTION_TIMING
accelerator_barrier();
t_for_large += usecond();
t_cub_large -= usecond();
#endif
gpuErr = gpucub::DeviceReduce::Reduce(d_temp, temp_bytes, buf_p, d_out,
(int)osites, gpucub::Sum(), zero4, computeStream);
if (gpuErr != gpuSuccess) {
std::cout << GridLogError << "sumD_gpu_large: DeviceReduce failed (partial group): "
<< gpuErr << std::endl;
exit(EXIT_FAILURE);
}
accelerator_barrier();
#ifdef GRID_REDUCTION_TIMING
t_cub_large += usecond();
#endif
R4 partial_result;
acceleratorCopyFromDevice(d_out, &partial_result, sizeof(R4));
for (int k = 0; k < rem; k++)
ret_p[4 * nfull + k] = TensorRemove(partial_result._internal[k]);
}
#ifdef GRID_REDUCTION_TIMING
std::cout << GridLogMessage << "sumD_gpu_large"
<< " sizeof(sobjD)=" << sizeof(sobjD)
<< " words=" << words << " nfull=" << nfull << " rem=" << rem
<< " accelerator_for=" << t_for_large << " us"
<< " CUB_reduce=" << t_cub_large << " us" << std::endl;
#endif
acceleratorFreeDevice(d_temp);
acceleratorFreeDevice(d_out);
return ret;
}
// Dispatch: direct CUB path for types that fit in the shared-memory budget,
// radix-4 word-bundle path for larger types.
// Threshold 512 bytes: 64 * 512 = 32768 bytes, within rocPRIM's
// ROCPRIM_SHARED_MEMORY_MAX on all supported devices.
template<class vobj>
inline typename vobj::scalar_objectD sumD_gpu(const vobj *lat, Integer osites)
{
typedef typename vobj::scalar_objectD sobjD;
if constexpr (sizeof(sobjD) > 512) {
return sumD_gpu_large(lat, osites);
} else {
return sumD_gpu_direct(lat, osites);
}
}
template<class vobj>
inline typename vobj::scalar_objectD sumD_gpu_small(const vobj *lat, Integer osites)
{
return sumD_gpu(lat, osites);
}
template<class vobj>
inline typename vobj::scalar_object sum_gpu(const vobj *lat, Integer osites)
{
typedef typename vobj::scalar_object sobj;
sobj result;
result = sumD_gpu(lat, osites);
return result;
}
template<class vobj>
inline typename vobj::scalar_object sum_gpu_large(const vobj *lat, Integer osites)
{
typedef typename vobj::scalar_object sobj;
sobj result;
result = sumD_gpu_large(lat, osites);
return result;
}
#endif // GRID_CUDA || GRID_HIP
#if defined(GRID_SYCL)
// Accumulates in sobjD throughout, fixing the precision bug in the original
// Lattice_reduction_sycl.h which accumulated in sobj then converted at the end.
template<class vobj>
inline typename vobj::scalar_objectD sumD_gpu(const vobj *lat, Integer osites)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::scalar_objectD sobjD;
sobjD identity; zeroit(identity);
sobjD ret; zeroit(ret);
{
sycl::buffer<sobjD, 1> abuff(&ret, {1});
theGridAccelerator->submit([&](sycl::handler &cgh) {
auto Reduction = sycl::reduction(abuff, cgh, identity, std::plus<>());
cgh.parallel_for(sycl::range<1>{(size_t)osites},
Reduction,
[=](sycl::id<1> item, auto &sum) {
sobj s = Reduce(lat[item[0]]);
sobjD sd; sd = s;
sum += sd;
});
});
}
return ret;
}
template<class vobj>
inline typename vobj::scalar_objectD sumD_gpu_small(const vobj *lat, Integer osites)
{
return sumD_gpu(lat, osites);
}
template<class vobj>
inline typename vobj::scalar_objectD sumD_gpu_large(const vobj *lat, Integer osites)
{
return sumD_gpu(lat, osites);
}
template<class vobj>
inline typename vobj::scalar_object sum_gpu(const vobj *lat, Integer osites)
{
typedef typename vobj::scalar_object sobj;
sobj result;
result = sumD_gpu(lat, osites);
return result;
}
template<class vobj>
inline typename vobj::scalar_object sum_gpu_large(const vobj *lat, Integer osites)
{
return sum_gpu(lat, osites);
}
#endif // GRID_SYCL
NAMESPACE_END(Grid);
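The radix-4 word-bundle path in `sumD_gpu_large` can be illustrated on the host with plain arrays. This is a minimal sketch, not the Grid implementation: it assumes each site is a flat run of `words` doubles (standing in for words that have already been reduced over SIMD lanes), `Bundle4` and `sum_in_bundles` are hypothetical names, and `std::accumulate` stands in for the device-wide `DeviceReduce` call.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for iVector<iScalar<scalarD>,4>: four doubles added element-wise.
struct Bundle4 {
  std::array<double, 4> w{};  // value-initialised to zero
};

inline Bundle4 operator+(const Bundle4 &a, const Bundle4 &b) {
  Bundle4 r;
  for (int k = 0; k < 4; k++) r.w[k] = a.w[k] + b.w[k];
  return r;
}

// Sum `words` doubles per site over all sites, four words at a time.
// Layout is site-major: sites[ss * words + w].
inline std::vector<double> sum_in_bundles(const std::vector<double> &sites,
                                          std::size_t nsites, int words) {
  std::vector<double> result(words, 0.0);
  std::vector<Bundle4> buf(nsites);
  for (int base = 0; base < words; base += 4) {
    // 4 for a full group, words % 4 for the final partial group.
    const int count = (words - base < 4) ? (words - base) : 4;
    for (std::size_t ss = 0; ss < nsites; ss++) {
      Bundle4 b;  // unused slots stay zero, so they add nothing to the sum
      for (int k = 0; k < count; k++) b.w[k] = sites[ss * words + base + k];
      buf[ss] = b;
    }
    // Stand-in for the device-wide reduction over the bundle buffer.
    const Bundle4 total = std::accumulate(buf.begin(), buf.end(), Bundle4{});
    for (int k = 0; k < count; k++) result[base + k] = total.w[k];
  }
  return result;
}
```

The element type handed to the reduction stays at four doubles regardless of how large the site object is, which is the property the dispatch threshold relies on.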
+11 -11
@@ -6,7 +6,7 @@ NAMESPACE_BEGIN(Grid);
template <class vobj>
inline typename vobj::scalar_objectD sumD_gpu_tensor(const vobj *lat, Integer osites)
inline typename vobj::scalar_objectD sumD_gpu_tensor_old(const vobj *lat, Integer osites)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::scalar_objectD sobjD;
@@ -31,40 +31,40 @@ inline typename vobj::scalar_objectD sumD_gpu_tensor(const vobj *lat, Integer os
}
template <class vobj>
inline typename vobj::scalar_objectD sumD_gpu_large(const vobj *lat, Integer osites)
inline typename vobj::scalar_objectD sumD_gpu_large_old(const vobj *lat, Integer osites)
{
return sumD_gpu_tensor(lat,osites);
return sumD_gpu_tensor_old(lat,osites);
}
template <class vobj>
inline typename vobj::scalar_objectD sumD_gpu_small(const vobj *lat, Integer osites)
inline typename vobj::scalar_objectD sumD_gpu_small_old(const vobj *lat, Integer osites)
{
return sumD_gpu_large(lat,osites);
return sumD_gpu_large_old(lat,osites);
}
template <class vobj>
inline typename vobj::scalar_objectD sumD_gpu(const vobj *lat, Integer osites)
inline typename vobj::scalar_objectD sumD_gpu_old(const vobj *lat, Integer osites)
{
return sumD_gpu_large(lat,osites);
return sumD_gpu_large_old(lat,osites);
}
/////////////////////////////////////////////////////////////////////////////////////////////////////////
// Return the result in the same precision as the input, while performing the reduction in double precision
/////////////////////////////////////////////////////////////////////////////////////////////////////////
template <class vobj>
inline typename vobj::scalar_object sum_gpu(const vobj *lat, Integer osites)
inline typename vobj::scalar_object sum_gpu_old(const vobj *lat, Integer osites)
{
typedef typename vobj::scalar_object sobj;
sobj result;
result = sumD_gpu(lat,osites);
result = sumD_gpu_old(lat,osites);
return result;
}
template <class vobj>
inline typename vobj::scalar_object sum_gpu_large(const vobj *lat, Integer osites)
inline typename vobj::scalar_object sum_gpu_large_old(const vobj *lat, Integer osites)
{
typedef typename vobj::scalar_object sobj;
sobj result;
result = sumD_gpu_large(lat,osites);
result = sumD_gpu_large_old(lat,osites);
return result;
}
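The `sum_gpu` wrappers above return the input precision but reduce in double. A minimal host-side sketch of why the accumulator type matters, assuming single-precision input; `sum_in_float` and `sum_via_double` are hypothetical names used only for illustration.

```cpp
#include <vector>

// Accumulate in the input precision (float) throughout.
inline float sum_in_float(const std::vector<float> &v) {
  float acc = 0.0f;
  for (float x : v) acc += x;
  return acc;
}

// Accumulate in double and convert to the input precision once, at the end.
inline float sum_via_double(const std::vector<float> &v) {
  double acc = 0.0;
  for (float x : v) acc += static_cast<double>(x);
  return static_cast<float>(acc);
}
```

For the input `{1e8f, 1.0f, -1e8f}` the float accumulator absorbs the `1.0f` (the spacing between adjacent floats near 1e8 is 8), so `sum_in_float` returns 0 while `sum_via_double` returns 1.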
+2 -2
@@ -51,8 +51,8 @@ Author: paboyle <paboyle@ph.ed.ac.uk>
#endif
#ifdef __x86_64__
#ifdef GRID_CUDA
accelerator_inline uint64_t __rdtsc(void) { return 0; }
accelerator_inline uint64_t __rdpmc(int ) { return 0; }
//accelerator_inline uint64_t __rdtsc(void) { return 0; }
//accelerator_inline uint64_t __rdpmc(int ) { return 0; }
#else
#include <x86intrin.h>
#endif
+18
@@ -596,16 +596,32 @@ template<int Index,class vobj> inline vobj transposeColour(const vobj &lhs){
//////////////////////////////////////////
// Trace lattice and non-lattice
//////////////////////////////////////////
#define GRID_UNOP(name) name
#define GRID_DEF_UNOP(op, name) \
template <typename T1, typename std::enable_if<is_lattice<T1>::value||is_lattice_expr<T1>::value,T1>::type * = nullptr> \
inline auto op(const T1 &arg) ->decltype(LatticeUnaryExpression<GRID_UNOP(name),T1>(GRID_UNOP(name)(), arg)) \
{ \
return LatticeUnaryExpression<GRID_UNOP(name),T1>(GRID_UNOP(name)(), arg); \
}
template<int Index,class vobj>
inline auto traceSpin(const Lattice<vobj> &lhs) -> Lattice<decltype(traceIndex<SpinIndex>(vobj()))>
{
return traceIndex<SpinIndex>(lhs);
}
GridUnopClass(UnaryTraceSpin, traceIndex<SpinIndex>(a));
GRID_DEF_UNOP(traceSpin, UnaryTraceSpin);
template<int Index,class vobj>
inline auto traceColour(const Lattice<vobj> &lhs) -> Lattice<decltype(traceIndex<ColourIndex>(vobj()))>
{
return traceIndex<ColourIndex>(lhs);
}
GridUnopClass(UnaryTraceColour, traceIndex<ColourIndex>(a));
GRID_DEF_UNOP(traceColour, UnaryTraceColour);
template<int Index,class vobj>
inline auto traceSpin(const vobj &lhs) -> Lattice<decltype(traceIndex<SpinIndex>(lhs))>
{
@@ -617,6 +633,8 @@ inline auto traceColour(const vobj &lhs) -> Lattice<decltype(traceIndex<ColourIn
return traceIndex<ColourIndex>(lhs);
}
#undef GRID_UNOP
#undef GRID_DEF_UNOP
//////////////////////////////////////////
// Current types
//////////////////////////////////////////
+6 -3
@@ -138,10 +138,13 @@ public:
//auto start = std::chrono::high_resolution_clock::now();
autoView(U_v,U,AcceleratorWrite);
autoView(P_v,P,AcceleratorRead);
accelerator_for(ss, P.Grid()->oSites(),1,{
typedef typename Field::vector_object vobj;
const int Nsimd = vobj::Nsimd();
accelerator_for(ss, P.Grid()->oSites(),Nsimd,{
for (int mu = 0; mu < Nd; mu++) {
U_v[ss](mu) = Exponentiate(P_v[ss](mu), ep, Nexp) * U_v[ss](mu);
U_v[ss](mu) = Group::ProjectOnGeneralGroup(U_v[ss](mu));
auto tmp = Exponentiate(P_v(ss)(mu), ep, Nexp) * U_v(ss)(mu);
tmp = Group::ProjectOnGeneralGroup(tmp);
coalescedWrite(U_v[ss](mu),tmp);
}
});
//auto end = std::chrono::high_resolution_clock::now();
+12
@@ -103,6 +103,18 @@ class PolyakovMod: public ObservableModule<PolyakovLogger<Impl>, NoParameters>{
PolyakovMod(): ObsBase(NoParameters()){}
};
template < class Impl >
class SpatialPolyakovMod: public ObservableModule<SpatialPolyakovLogger<Impl>, NoParameters>{
typedef ObservableModule<SpatialPolyakovLogger<Impl>, NoParameters> ObsBase;
using ObsBase::ObsBase; // for constructors
// acquire resource
virtual void initialize(){
this->ObservablePtr.reset(new SpatialPolyakovLogger<Impl>());
}
public:
SpatialPolyakovMod(): ObsBase(NoParameters()){}
};
template < class Impl >
class TopologicalChargeMod: public ObservableModule<TopologicalCharge<Impl>, TopologyObsParameters>{
+42 -2
@@ -2,11 +2,12 @@
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/qcd/modules/polyakov_line.h
Source file: ./Grid/qcd/observables/polyakov_loop.h
Copyright (C) 2017
Copyright (C) 2025
Author: David Preti <david.preti@csic.es>
Author: Alexis Verney-Provatas <2414441@swansea.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -60,4 +61,43 @@ class PolyakovLogger : public HmcObservable<typename Impl::Field> {
}
};
template <class Impl>
class SpatialPolyakovLogger : public HmcObservable<typename Impl::Field> {
public:
// Forces Impl to be a gauge-field implementation;
// otherwise compilation fails
INHERIT_GIMPL_TYPES(Impl);
// necessary for HmcObservable compatibility
typedef typename Impl::Field Field;
void TrajectoryComplete(int traj,
Field &U,
GridSerialRNG &sRNG,
GridParallelRNG &pRNG) {
// Save current numerical output precision
int def_prec = std::cout.precision();
// Assume that the dimensions are D=3+1
int Ndim = 3;
ComplexD polyakov;
// Iterate over the spatial directions and print the average Polyakov loop
// for each of them
for (int idx=0; idx<Ndim; idx++) {
polyakov = WilsonLoops<Impl>::avgPolyakovLoop(U, idx);
std::cout << GridLogMessage
<< std::setprecision(std::numeric_limits<Real>::digits10 + 1)
<< "Polyakov Loop in the " << idx << " spatial direction : [ " << traj << " ] "<< polyakov << std::endl;
}
// Return to original output precision
std::cout.precision(def_prec);
}
};
NAMESPACE_END(Grid);
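`SpatialPolyakovLogger` saves the stream precision, prints at full precision, and restores the old value afterwards. A minimal standalone sketch of that pattern follows; `print_full_precision` is a hypothetical helper, and it stores the saved value as `std::streamsize`, which is the type `precision()` returns.

```cpp
#include <iomanip>
#include <ios>
#include <limits>
#include <ostream>

// Print `value` with digits10 + 1 significant digits, then restore the
// precision the stream had on entry.
inline void print_full_precision(std::ostream &os, double value) {
  const std::streamsize saved = os.precision();
  os << std::setprecision(std::numeric_limits<double>::digits10 + 1) << value;
  os.precision(saved);
}
```

Restoring the precision keeps later log lines from other observables in the format their authors expected.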
+1 -1
@@ -54,7 +54,7 @@ public:
// Usual cases are not used
//////////////////////////////////
virtual void refresh(const GaugeField &U, GridSerialRNG &sRNG, GridParallelRNG &pRNG){ GRID_ASSERT(0);};
virtual RealD S(const GaugeField &U) { GRID_ASSERT(0); }
virtual RealD S(const GaugeField &U) { GRID_ASSERT(0); return 0; }
virtual void deriv(const GaugeField &U, GaugeField &dSdU) { GRID_ASSERT(0); }
//////////////////////////////////
+3 -3
@@ -254,9 +254,9 @@ static void testGenerators(GroupName::Sp) {
}
}
template <int N>
static Lattice<iScalar<iScalar<iMatrix<vComplexD, N> > > >
ProjectOnGeneralGroup(const Lattice<iScalar<iScalar<iMatrix<vComplexD, N> > > > &Umu, GroupName::Sp) {
template <class vtype, int N>
static Lattice<iScalar<iScalar<iMatrix<vtype, N> > > >
ProjectOnGeneralGroup(const Lattice<iScalar<iScalar<iMatrix<vtype, N> > > > &Umu, GroupName::Sp) {
return ProjectOnSpGroup(Umu);
}
+26 -8
@@ -177,25 +177,43 @@ public:
}
//////////////////////////////////////////////////
// average over all x,y,z the temporal loop
// average Polyakov loop in mu direction over all directions != mu
//////////////////////////////////////////////////
static ComplexD avgPolyakovLoop(const GaugeField &Umu) { //assume Nd=4
GaugeMat Ut(Umu.Grid()), P(Umu.Grid());
static ComplexD avgPolyakovLoop(const GaugeField &Umu, const int mu) { //assume Nd=4
// Protect against bad value of mu [0, 3]
if ((mu < 0 ) || (mu > 3)) {
std::cout << GridLogError << "avgPolyakovLoop: mu must be between 0 and 3 inclusive." << std::endl;
exit(1);
}
// U_loop is U_{mu}
GaugeMat U_loop(Umu.Grid()), P(Umu.Grid());
ComplexD out;
int T = Umu.Grid()->GlobalDimensions()[3];
int X = Umu.Grid()->GlobalDimensions()[0];
int Y = Umu.Grid()->GlobalDimensions()[1];
int Z = Umu.Grid()->GlobalDimensions()[2];
Ut = peekLorentz(Umu,3); //Select temporal direction
P = Ut;
for (int t=1;t<T;t++){
P = Gimpl::CovShiftForward(Ut,3,P);
// Number of sites in mu direction
int N_mu = Umu.Grid()->GlobalDimensions()[mu];
U_loop = peekLorentz(Umu, mu); //Select direction
P = U_loop;
for (int t=1;t<N_mu;t++){
P = Gimpl::CovShiftForward(U_loop,mu,P);
}
RealD norm = 1.0/(Nc*X*Y*Z*T);
out = sum(trace(P))*norm;
return out;
}
}
/////////////////////////////////////////////////
// overload for temporal Polyakov loop
/////////////////////////////////////////////////
static ComplexD avgPolyakovLoop(const GaugeField &Umu) {
return avgPolyakovLoop(Umu, 3);
}
//////////////////////////////////////////////////
// average over traced single links
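The generalised `avgPolyakovLoop(Umu, mu)` multiplies the links along direction `mu` once around the lattice and averages the trace over all sites. A toy sketch of the same idea, assuming a 2D lattice with U(1) links (so Nc = 1 and the trace is the phase itself); `ToyLattice` and `avg_polyakov_loop` are hypothetical names and do not use Grid's covariant shifts.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Complex = std::complex<double>;

// Toy 2D U(1) lattice: one complex phase per site and direction.
struct ToyLattice {
  std::array<int, 2> dims;                   // extent in directions 0 and 1
  std::array<std::vector<Complex>, 2> link;  // link[mu][x0 + dims[0] * x1]
  std::size_t index(int x0, int x1) const {
    return static_cast<std::size_t>(x0) + static_cast<std::size_t>(dims[0]) * x1;
  }
};

// Average over all sites of the product of links winding once around direction mu.
inline Complex avg_polyakov_loop(const ToyLattice &lat, int mu) {
  if (mu < 0 || mu > 1) throw std::out_of_range("mu must be 0 or 1");
  const int n_mu = lat.dims[mu];
  Complex total(0.0, 0.0);
  for (int x1 = 0; x1 < lat.dims[1]; x1++) {
    for (int x0 = 0; x0 < lat.dims[0]; x0++) {
      Complex loop(1.0, 0.0);
      int x[2] = {x0, x1};
      for (int step = 0; step < n_mu; step++) {
        loop *= lat.link[mu][lat.index(x[0], x[1])];
        x[mu] = (x[mu] + 1) % n_mu;  // periodic boundary
      }
      total += loop;
    }
  }
  return total / static_cast<double>(lat.dims[0] * lat.dims[1]);
}
```

With every link set to the phase i on a 2 x 3 lattice, the loop in direction 0 is i^2 = -1 and the loop in direction 1 is i^3 = -i, so the two directions give different averages on an anisotropic lattice.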
+14 -16
@@ -432,22 +432,20 @@ accelerator_inline int acceleratorSIMTlane(int Nsimd) {
#define accelerator_for2dNB( iter1, num1, iter2, num2, nsimd, ... ) \
{ \
typedef uint64_t Iterator; \
auto lambda = [=] accelerator \
(Iterator iter1,Iterator iter2,Iterator lane ) mutable { \
{ __VA_ARGS__;} \
}; \
int nt=acceleratorThreads(); \
dim3 hip_threads(nsimd, nt, 1); \
dim3 hip_blocks ((num1+nt-1)/nt,num2,1); \
if(hip_threads.x * hip_threads.y * hip_threads.z <= 64){ \
hipLaunchKernelGGL(LambdaApply64,hip_blocks,hip_threads, \
0,computeStream, \
num1,num2,nsimd, lambda); \
} else { \
hipLaunchKernelGGL(LambdaApply,hip_blocks,hip_threads, \
0,computeStream, \
num1,num2,nsimd, lambda); \
if (num1*num2) { \
typedef uint64_t Iterator; \
auto lambda = [=] accelerator \
(Iterator iter1,Iterator iter2,Iterator lane ) mutable { \
{ __VA_ARGS__;} \
}; \
int nt=acceleratorThreads(); \
dim3 hip_threads(nsimd, nt, 1); \
dim3 hip_blocks ((num1+nt-1)/nt,num2,1); \
if(hip_threads.x * hip_threads.y * hip_threads.z <= 64){ \
LambdaApply64<<<hip_blocks,hip_threads,0,computeStream>>>(num1,num2,nsimd,lambda); \
} else { \
LambdaApply<<<hip_blocks,hip_threads,0,computeStream>>>(num1,num2,nsimd,lambda); \
} \
} \
}
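The `if (num1*num2)` guard added to `accelerator_for2dNB` skips the kernel launch when the iteration space is empty. A host-only sketch of the same block and thread arithmetic, assuming a hypothetical `LaunchDims` struct in place of `dim3`; without the guard, `num1 == 0` would produce a grid with zero blocks, which GPU runtimes typically reject as an invalid launch configuration.

```cpp
#include <cstdint>

// Hypothetical stand-in for a pair of dim3 launch parameters.
struct LaunchDims {
  std::uint64_t blocks_x, blocks_y;
  std::uint64_t threads_x, threads_y;
};

// Mirrors the arithmetic of the macro. Returns false and leaves `out`
// untouched when there is nothing to iterate over.
inline bool plan_launch(std::uint64_t num1, std::uint64_t num2,
                        std::uint64_t nsimd, std::uint64_t nt, LaunchDims &out) {
  if (num1 * num2 == 0) return false;   // empty iteration space: skip the launch
  out.threads_x = nsimd;
  out.threads_y = nt;
  out.blocks_x = (num1 + nt - 1) / nt;  // ceiling division over the outer index
  out.blocks_y = num2;
  return true;
}
```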
+6 -1
@@ -24,7 +24,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
#if Nc == 3
#include <Grid/qcd/smearing/GaugeConfigurationMasked.h>
@@ -230,3 +234,4 @@ int main(int argc, char **argv)
#endif
} // main
#endif
+6 -3
@@ -25,7 +25,11 @@ directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
#if Nc == 3
#include <Grid/qcd/smearing/GaugeConfigurationMasked.h>
@@ -231,5 +235,4 @@ int main(int argc, char **argv)
#endif
} // main
#endif
+6 -3
@@ -24,7 +24,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
#if Nc == 3
#include <Grid/qcd/smearing/GaugeConfigurationMasked.h>
@@ -230,5 +234,4 @@ int main(int argc, char **argv)
#endif
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
int main(int argc, char **argv) {
using namespace Grid;
@@ -195,5 +199,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -28,7 +28,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
#ifdef GRID_DEFAULT_PRECISION_DOUBLE
#define MIXED_PRECISION
@@ -449,5 +453,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -28,7 +28,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
#ifdef GRID_DEFAULT_PRECISION_DOUBLE
#define MIXED_PRECISION
@@ -442,5 +446,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+7 -1
@@ -28,7 +28,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
using namespace Grid;
@@ -918,3 +922,5 @@ int main(int argc, char **argv) {
return 0;
#endif
} // main
#endif
+7 -1
@@ -28,7 +28,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
using namespace Grid;
@@ -873,3 +877,5 @@ int main(int argc, char **argv) {
return 0;
#endif
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
int main(int argc, char **argv) {
using namespace Grid;
@@ -193,5 +197,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
NAMESPACE_BEGIN(Grid);
@@ -512,5 +516,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
int main(int argc, char **argv) {
using namespace Grid;
@@ -345,5 +349,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
NAMESPACE_BEGIN(Grid);
@@ -516,5 +520,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
NAMESPACE_BEGIN(Grid);
@@ -567,5 +571,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
int main(int argc, char **argv) {
using namespace Grid;
@@ -263,5 +267,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
int main(int argc, char **argv) {
using namespace Grid;
@@ -417,5 +421,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
NAMESPACE_BEGIN(Grid);
@@ -452,5 +456,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
NAMESPACE_BEGIN(Grid);
@@ -462,5 +466,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
+6 -3
@@ -27,7 +27,11 @@ See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include<Grid/Grid.h>
@@ -264,5 +268,4 @@ int main(int argc, char **argv) {
Grid_finalize();
} // main
#endif
@@ -0,0 +1,16 @@
#include <Grid/Grid.h>
#pragma once
#ifndef ENABLE_FERMION_INSTANTIATIONS
#include <iostream>
int main(void) {
std::cout << "This build of Grid was configured to exclude fermion instantiations, "
<< "which this example relies on. "
<< "Please reconfigure and rebuild Grid with --enable-fermion-instantiations "
<< "to run this example."
<< std::endl;
return 1;
}
#endif
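The header above supplies a fallback `main` when fermion instantiations are disabled, while each example guards its real body with `#ifdef ENABLE_FERMION_INSTANTIATIONS`. A minimal sketch of that either/or pattern, written as a function so that it can be called and checked; `DEMO_FEATURE`, `--enable-demo-feature`, and `example_entry` are hypothetical stand-ins, not Grid names.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// DEMO_FEATURE is a hypothetical stand-in for ENABLE_FERMION_INSTANTIATIONS.
#ifdef DEMO_FEATURE
// Feature present: run the real example body.
inline int example_entry(std::ostream &os) {
  os << "running the real example" << std::endl;
  return 0;
}
#else
// Feature absent: explain how to enable it and report failure.
inline int example_entry(std::ostream &os) {
  // Consecutive stream insertions are joined verbatim, so each piece
  // carries its own trailing space.
  os << "This build was configured without the feature, "
     << "which this example relies on. "
     << "Please reconfigure with --enable-demo-feature "
     << "to run this example." << std::endl;
  return 1;
}
#endif
```

Exactly one of the two definitions is compiled, so the translation unit always links, and a non-zero return value lets scripts detect that the example did not actually run.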
+5
@@ -26,6 +26,9 @@ Author: paboyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace Grid;
@@ -731,3 +734,5 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+4
@@ -20,6 +20,9 @@
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
#ifdef GRID_CUDA
#define CUDA_PROFILE
@@ -439,3 +442,4 @@ void Benchmark(int Ls, Coordinate Dirichlet,bool sloppy)
GRID_ASSERT(norm2(src_e)<1.0e-4);
GRID_ASSERT(norm2(src_o)<1.0e-4);
}
#endif
+6
@@ -20,6 +20,10 @@
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
#ifdef GRID_CUDA
#define CUDA_PROFILE
@@ -439,3 +443,5 @@ void Benchmark(int Ls, Coordinate Dirichlet,bool sloppy)
GRID_ASSERT(norm2(src_e)<1.0e-4);
GRID_ASSERT(norm2(src_o)<1.0e-4);
}
#endif
@@ -20,6 +20,9 @@
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
#ifdef GRID_CUDA
#define CUDA_PROFILE
@@ -385,3 +388,5 @@ int main (int argc, char ** argv)
Grid_finalize();
exit(0);
}
#endif
+4 -2
@@ -26,6 +26,9 @@ Author: paboyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -238,5 +241,4 @@ void benchDw(std::vector<int> & latt4, int Ls, int threads,int report )
}
}
#endif
+5
@@ -1,3 +1,7 @@
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
#include <sstream>
using namespace std;
@@ -155,3 +159,4 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+5
@@ -20,6 +20,9 @@
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
#ifdef GRID_CUDA
#define CUDA_PROFILE
@@ -129,3 +132,5 @@ int main (int argc, char ** argv)
Grid_finalize();
exit(0);
}
#endif
+5
@@ -26,6 +26,9 @@ Author: paboyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -149,3 +152,5 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+4 -2
@@ -26,6 +26,9 @@ Author: paboyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -172,5 +175,4 @@ void benchDw(std::vector<int> & latt4, int Ls)
// Dw.Report();
}
#endif
+5
@@ -26,6 +26,9 @@ Author: paboyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -110,3 +113,5 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+5
@@ -26,6 +26,9 @@ Author: paboyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -112,3 +115,5 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+6
@@ -26,6 +26,10 @@ Author: paboyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
#include <Grid/algorithms/blas/BatchedBlas.h>
@@ -978,3 +982,5 @@ int main (int argc, char ** argv)
Grid_finalize();
fclose(FP);
}
#endif
+5
@@ -26,6 +26,9 @@ Author: paboyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -258,3 +261,5 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+5
@@ -19,6 +19,9 @@ Author: Richard Rollins <rprollins@users.noreply.github.com>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_benchmarks_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -161,3 +164,5 @@ void bench_wilson_eo (
double flops = (single_site_flops * volume * ncall)/2.0;
std::cout << flops/(t1-t0) << "\t\t";
}
#endif
@@ -0,0 +1,16 @@
#include <Grid/Grid.h>
#pragma once
#ifndef ENABLE_FERMION_INSTANTIATIONS
#include <iostream>
int main(void) {
std::cout << "This build of Grid was configured to exclude fermion instantiations, "
<< "which this benchmark relies on. "
<< "Please reconfigure and rebuild Grid with --enable-fermion-instantiations "
<< "to run this benchmark."
<< std::endl;
return 1;
}
#endif
+9
@@ -172,6 +172,12 @@ case ${ac_TRACING} in
esac
############### fermions
AC_ARG_ENABLE([fermion-instantiations],
[AS_HELP_STRING([--enable-fermion-instantiations=yes|no],[enable fermion instantiations])],
[ac_FERMION_INSTANTIATIONS=${enable_fermion_instantiations}], [ac_FERMION_INSTANTIATIONS=yes])
AM_CONDITIONAL(BUILD_FERMION_INSTANTIATIONS, [ test "${ac_FERMION_INSTANTIATIONS}X" == "yesX" ])
AC_ARG_ENABLE([fermion-reps],
[AS_HELP_STRING([--enable-fermion-reps=yes|no],[enable extra fermion representation support])],
[ac_FERMION_REPS=${enable_fermion_reps}], [ac_FERMION_REPS=yes])
@@ -194,6 +200,9 @@ AM_CONDITIONAL(BUILD_ZMOBIUS, [ test "${ac_ZMOBIUS}X" == "yesX" ])
case ${ac_FERMION_REPS} in
yes) AC_DEFINE([ENABLE_FERMION_REPS],[1],[non QCD fermion reps]);;
esac
case ${ac_FERMION_INSTANTIATIONS} in
yes) AC_DEFINE([ENABLE_FERMION_INSTANTIATIONS],[1],[enable fermions]);;
esac
case ${ac_GPARITY} in
yes) AC_DEFINE([ENABLE_GPARITY],[1],[fermion actions with GPARITY BCs]);;
esac
+4 -2
@@ -3,6 +3,9 @@
* without regression / tests being applied
*/
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -310,5 +313,4 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+4 -2
@@ -3,6 +3,9 @@
* without regression / tests being applied
*/
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -432,5 +435,4 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+4 -2
@@ -3,6 +3,9 @@
* without regression / tests being applied
*/
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -535,5 +538,4 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+4 -2
@@ -3,6 +3,9 @@
* without regression / tests being applied
*/
#include "disable_examples_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -429,5 +432,4 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
@@ -0,0 +1,15 @@
#include <Grid/Grid.h>
#pragma once
#ifndef ENABLE_FERMION_INSTANTIATIONS
#include <iostream>
int main(void) {
std::cout << "This build of Grid was configured to exclude fermion instantiations, "
<< "which this example relies on. "
<< "Please reconfigure and rebuild Grid with --enable-fermion-instantiations "
<< "to run this example."
<< std::endl;
return 1;
}
#endif
+196
@@ -0,0 +1,196 @@
---
name: communication-overlap
description: Design and implement communication/computation overlap pipelines for GPU+MPI codes — per-packet event tracking, host-staging through pinned memory, internode/intranode bandwidth separation, and the 7-phase pipeline pattern that replaces broken accelerator-aware MPI paths.
user-invocable: true
allowed-tools:
- Read
- Bash(grep -r)
---
# Communication/Computation Overlap Pipeline Design
## Why GPU-Direct MPI Is Often Not the Right Default
GPU-direct RDMA (passing GPU buffer pointers directly to MPI) is appealing because it eliminates explicit D2H/H2D copies. In practice on several leadership systems:
- **Bandwidth**: RDMA at 30% of wirespeed has been observed on Ponte Vecchio/Aurora. Staging through pinned host memory can give *lower* total latency than slow RDMA.
- **Correctness**: Device buffer aliasing in `MPI_Sendrecv` (see `mpi-heterogeneous.md`) makes direct GPU-to-GPU transfer unreliable.
- **Overlap**: Host-staging enables fine-grained overlap — each packet's D2H can be issued as a separate asynchronous event, and the corresponding MPI send can fire as soon as *that packet* arrives in host memory, not after all packets are ready.
The pipeline pattern below was developed to replace broken MPICH accelerator-aware paths. It achieves genuine computation/communication overlap by tracking per-packet GPU events.
## The 7-Phase Pipeline
Given a set of halo exchange operations (each identified by a `packet_index`):
### Phase 0: Prepare data on device
Pack halo data into contiguous GPU buffers. One buffer per direction/neighbour.
### Phase 1: Post receives + start D2H
Post all `MPI_Irecv` calls immediately (into pinned host buffers). Simultaneously, start asynchronous D2H copies for all send buffers:
```cpp
for (auto &pkt : send_packets) {
MPI_Irecv(pkt.host_recv_buf, pkt.bytes, MPI_BYTE,
pkt.src_rank, pkt.tag, comm, &pkt.recv_req);
acceleratorCopyFromDeviceAsync(pkt.device_send_buf,
pkt.host_send_buf,
pkt.bytes, &pkt.d2h_event);
}
```
The key: `pkt.d2h_event` is a per-packet GPU event (e.g. `cudaEvent_t`, `hipEvent_t`, or SYCL event). We can poll individual packet completion rather than waiting for all.
### Phase 2: Fire sends as D2H completes (packet by packet)
Poll packet D2H events. As each packet becomes ready in host memory, immediately fire the corresponding `MPI_Isend`. Also start intranode D2D copies at this point — these are deferred until now to avoid competing with the internode D2H on PCIe bandwidth:
```cpp
bool all_sent = false;
while (!all_sent) {
all_sent = true;
for (auto &pkt : send_packets) {
if (!pkt.sent && acceleratorEventIsComplete(pkt.d2h_event)) {
MPI_Isend(pkt.host_send_buf, pkt.bytes, MPI_BYTE,
pkt.dst_rank, pkt.tag, comm, &pkt.send_req);
pkt.sent = true;
start_intranode_copy(pkt); // now safe, D2H is done
}
if (!pkt.sent) all_sent = false;
}
}
```
### Phase 3: Poll receives + start H2D as each arrives
`MPI_Test` individual receive requests. As each completes, immediately start the H2D copy into device-resident halo buffer:
```cpp
bool all_recvd = false;
while (!all_recvd) {
all_recvd = true;
for (auto &pkt : recv_packets) {
if (!pkt.h2d_started) {
int flag = 0;
MPI_Test(&pkt.recv_req, &flag, MPI_STATUS_IGNORE);
if (flag) {
acceleratorCopyToDeviceAsync(pkt.host_recv_buf,
pkt.device_recv_buf,
pkt.bytes, &pkt.h2d_event);
pkt.h2d_started = true;
}
}
if (!pkt.h2d_started) all_recvd = false;
}
}
```
### Phase 4: Wait for all sends
```cpp
std::vector<MPI_Request> send_reqs;
for (auto &pkt : send_packets) send_reqs.push_back(pkt.send_req);
MPI_Waitall(send_reqs.size(), send_reqs.data(), MPI_STATUSES_IGNORE);
```
### Phase 5: Wait for all H2D copies
```cpp
for (auto &pkt : recv_packets) acceleratorEventWait(pkt.h2d_event);
```
### Phase 6: Run interior computation
The interior (non-halo) computation can run from Phase 1 onwards, overlapped with all of the above:
```cpp
// Launched in Phase 1, runs in parallel with the pipeline
accelerator_for(ss, interior_sites, ...) { compute_interior(ss); }
```
Synchronise with interior before using the full field:
```cpp
accelerator_barrier(); // interior kernel done
// Halo H2D is also complete (Phase 5 above)
// Now safe to use full field
```
## Per-Packet Event Tracking Data Structure
```cpp
struct Packet {
// Buffers
void *device_send_buf;
void *host_send_buf; // pinned
void *device_recv_buf;
void *host_recv_buf; // pinned
size_t bytes;
// MPI
int src_rank, dst_rank, tag;
MPI_Request send_req, recv_req;
// GPU events (one per packet, not one global barrier)
AcceleratorEvent d2h_event;
AcceleratorEvent h2d_event;
// State flags
bool sent = false;
bool h2d_started = false;
};
```
The critical design point: `d2h_event` and `h2d_event` are **per-packet**, not global. This allows the MPI send for packet 0 to fire while packet 1's D2H is still in progress.
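The effect of per-packet tracking can be seen in a host-only simulation of the Phase 2 polling loop. This is a minimal sketch, not Grid code: `FakePacket`, `ready_at_poll`, and `fire_sends_as_ready` are hypothetical names, and a poll counter stands in for the GPU event query.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Packet: the D2H "event" completes at a given poll count.
struct FakePacket {
  int id;
  int ready_at_poll;   // poll iteration at which the simulated D2H copy finishes
  bool sent;
};

// Same loop shape as Phase 2: sweep all packets, fire each send as soon as
// that packet's own event is complete, and repeat until everything is sent.
// Returns the packet ids in the order their sends were fired.
std::vector<int> fire_sends_as_ready(std::vector<FakePacket> &pkts) {
  std::vector<int> order;
  int poll = 0;
  bool all_sent = false;
  while (!all_sent) {
    all_sent = true;
    for (auto &pkt : pkts) {
      assert(pkt.ready_at_poll >= 0);
      if (!pkt.sent && poll >= pkt.ready_at_poll) {
        order.push_back(pkt.id);   // stands in for MPI_Isend
        pkt.sent = true;
      }
      if (!pkt.sent) all_sent = false;
    }
    poll++;
  }
  return order;
}
```

With packets whose copies finish out of order, the sends fire in completion order rather than in packet order, which is what a single global barrier would prevent.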
## Internode vs Intranode Separation
PCIe (GPU-to-CPU) and NVLink/xGMI (GPU-to-GPU within a node) are separate bandwidth resources. They do not share link bandwidth, but transfers on both *do* compete for transactions if both are active simultaneously.
Strategy: complete all internode D2H copies first (to maximise NIC injection bandwidth), then start intranode D2D copies (which use NVLink/xGMI and do not contend with PCIe for internode traffic):
```cpp
// In Phase 2: start intranode D2D only after D2H is confirmed complete
if (pkt.is_intranode && pkt.d2h_done) {
// Use peer access (cudaMemcpyPeerAsync / hipMemcpyPeerAsync)
// rather than staging through host for intranode
cudaMemcpyPeerAsync(pkt.peer_recv_buf, pkt.dst_device,
pkt.device_send_buf, pkt.src_device,
pkt.bytes, computeStream);
}
```
Grid reference: `Grid/communicator/Communicator_mpi3.cc` — search for `NVLINK_GET` and `ACCELERATOR_AWARE_MPI` conditional blocks.
## Pinned Memory Allocation
All host staging buffers must be pinned (page-locked) for async D2H/H2D:
```cpp
// CUDA
cudaMallocHost(&host_buf, bytes);
cudaFreeHost(host_buf);
// HIP
hipHostMalloc(&host_buf, bytes, hipHostMallocDefault);
hipHostFree(host_buf);
// SYCL
host_buf = sycl::malloc_host(bytes, *queue);
sycl::free(host_buf, *queue);
```
Pre-allocate at startup. Repeated `cudaMallocHost` in the hot path adds latency from the OS memory manager.
## Checksumming in the Pipeline
Insert checksum computation before D2H (on the GPU-resident data) and verification after H2D (on the received GPU-resident data). See `correctness-verification.md` for the checksum pattern. The salting (`packet_index + 1000 * tag`) detects packet transposition — critical for diagnosing MPI buffer aliasing bugs where two packets' contents are swapped.
## Smoke Test for a New System
Before running physics, validate the pipeline on a synthetic benchmark:
```cpp
// Send a buffer of known values, receive and check
// Run at multiple message sizes: 4KB, 64KB, 1MB, 16MB
// Run at multiple process counts: 2, 8, 64, 512
// Verify checksums on every packet
// Measure bandwidth: should be ≥ 80% of FDR/HDR/NDR peak for host-staged
```
Any bandwidth below 50% of theoretical, or any checksum failure, indicates a problem in the communication stack that must be resolved before production runs.
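The two thresholds above can be turned into a simple pass/fail rule for the benchmark output. This is a sketch under assumptions: `BwVerdict` and `judge_bandwidth` are hypothetical names, and the middle "Investigate" band between 50% and 80% is an interpretation of the text, not something it states.

```cpp
#include <cassert>

// Hypothetical verdict for one measured message size.
enum class BwVerdict { Healthy, Investigate, Broken };

// bytes / seconds is the measured bandwidth; peak_bytes_per_s is the
// theoretical link rate for the interconnect in use.
BwVerdict judge_bandwidth(double bytes, double seconds, double peak_bytes_per_s) {
  assert(seconds > 0.0 && peak_bytes_per_s > 0.0);
  const double fraction = (bytes / seconds) / peak_bytes_per_s;
  if (fraction >= 0.8) return BwVerdict::Healthy;      // meets the host-staged target
  if (fraction >= 0.5) return BwVerdict::Investigate;  // assumed middle band
  return BwVerdict::Broken;                            // below 50% of theoretical
}
```

A checksum failure should fail the smoke test regardless of the bandwidth verdict.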
+154
@@ -0,0 +1,154 @@
---
name: compiler-validation
description: Identify GPU compiler code generation bugs, distinguish them from hardware and runtime bugs, construct minimal reproducers, and validate correctness of generated assembly for performance-critical HPC kernels.
user-invocable: true
allowed-tools:
- Read
- Bash(grep -r)
- Bash(objdump)
---
# Compiler Validation for GPU HPC Codes
## Why Compiler Bugs Are Distinct
Compiler bugs have a unique diagnostic signature: they produce *deterministically wrong* results. The same input always produces the same wrong output. This distinguishes them from:
- Hardware bugs: usually stochastic (wrong answer sometimes, correct answer other times)
- Runtime bugs (premature barrier, buffer aliasing): often stochastic or history-dependent
- Race conditions: non-deterministic
**The determinism test**: run the same kernel 100 times with the same input. If the wrong answer is always the same wrong answer, suspect the compiler.
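A host-side sketch of the determinism test, assuming the kernel under test is wrapped in a callable that returns its result as a `double`. The names `Verdict` and `classify_repeats` are hypothetical and not part of Grid.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <set>

// Hypothetical classification of repeated runs of one kernel on one input.
enum class Verdict { Correct, DeterministicWrong, Stochastic };

// Compare doubles by bit pattern so that results which differ only in the
// last bit still count as different.
static uint64_t bits_of(double v) {
  uint64_t b;
  std::memcpy(&b, &v, sizeof(b));
  return b;
}

// Run `kernel` `trials` times and compare every result with `reference`.
Verdict classify_repeats(const std::function<double()> &kernel,
                         double reference, int trials = 100) {
  assert(trials > 0);
  std::set<uint64_t> distinct;   // distinct result bit patterns seen
  for (int t = 0; t < trials; t++) distinct.insert(bits_of(kernel()));
  if (distinct.size() > 1) return Verdict::Stochastic;   // answers vary run to run
  if (*distinct.begin() == bits_of(reference)) return Verdict::Correct;
  return Verdict::DeterministicWrong;   // same wrong answer every time
}
```

`DeterministicWrong` is the case that points at the compiler; `Stochastic` points towards hardware, runtime, or race-condition causes.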
## The Minimal Reproducer Protocol
When a kernel produces wrong results, isolate the compiler as quickly as possible:
**Step 1: Eliminate the physics**. Reduce the failing kernel to the smallest possible computation that still exhibits the bug. Replace QCD fields with `double` arrays. Replace lattice operations with scalar arithmetic. The goal is a 20-line CUDA/HIP/SYCL file that any compiler engineer can compile and run.
**Step 2: Binary search over optimisation levels**. Compile at `-O0` (or equivalent). If the answer becomes correct, the bug is in an optimisation pass. Then test `-O1`, `-O2`, `-O3` individually to find which optimisation level introduces the bug.
```bash
# HIP example
hipcc -O0 minimal_repro.cc -o test_O0 && ./test_O0 # should be correct
hipcc -O1 minimal_repro.cc -o test_O1 && ./test_O1 # compare
hipcc -O2 minimal_repro.cc -o test_O2 && ./test_O2 # compare
```
**Step 3: Identify the optimisation pass**. For clang-based compiler drivers (clang, hipcc, dpcpp); nvcc's device path goes through NVVM and the closed-source `ptxas`, and does not accept these clang flags:
```bash
# Disable individual optimisation passes:
hipcc -O2 -fno-unroll-loops minimal_repro.cc -o test
hipcc -O2 -fno-vectorize minimal_repro.cc -o test
hipcc -O2 -fno-slp-vectorize minimal_repro.cc -o test
```
**Step 4: Inspect the generated code**. For CUDA/HIP, use `--generate-line-info` and `cuobjdump` or `roc-obj-extract` to get annotated assembly:
```bash
# CUDA
nvcc -O2 --generate-line-info --keep minimal_repro.cu
cuobjdump --dump-ptx minimal_repro.o
# HIP/ROCm
hipcc -O2 --save-temps minimal_repro.cc
llvm-objdump -d minimal_repro.o
# SYCL/DPC++
icpx -O2 -fsycl -Xclang -ast-dump minimal_repro.cc 2>&1 | grep -A5 "suspicious_expr"
```
Look for: incorrect register spill/fill sequences, loop trip count miscalculation, vectorisation across iteration boundaries, incorrect address arithmetic.
## Known Compiler Bug Patterns in GPU Code
### Register Pressure / Spill Bugs
High register usage forces spills to local memory. Some compiler versions generate incorrect spill/fill code — the value is written to local memory but a stale register value is read back instead of the spilled value.
**Signature**: Wrong answer with high-register-count kernels; becomes correct when `--maxrregcount=N` forces a lower register count (more spilling) or a higher one (`--maxrregcount=255`, fewer spills).
**Diagnostic**: Check register usage:
```bash
nvcc -O2 --ptxas-options=-v minimal_repro.cu 2>&1 | grep "registers"
hipcc -O2 --offload-arch=gfx90a --save-temps minimal_repro.cc
llvm-mc --arch=amdgcn minimal_repro.s 2>&1 | grep "VGPRs"
```
### Vectorisation Across Loop Boundaries
The compiler executes two successive loop iterations as one SIMD unit even though they carry a data dependency, because its dependence analysis has wrongly concluded that none exists.
**Signature**: Wrong answer that becomes correct when the loop body is extracted to a non-inlined function (disabling auto-vectorisation across iterations).
### Incorrect Constant Propagation
The compiler evaluates a compile-time expression incorrectly, substituting a wrong constant. Common in template-heavy code where `sizeof(T)` or `alignof(T)` is used in arithmetic that the compiler folds at compile time.
**Signature**: Wrong array index or wrong stride. Inspecting the generated assembly shows a literal constant where you expect a computed value.
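One way to probe this on the host is to compare the folded constant against the same arithmetic forced to run time. This is a sketch only: `Site`, `folded_stride`, `runtime_stride`, and `strides_agree` are hypothetical names, and the same comparison would have to be repeated inside a kernel to test device code generation.

```cpp
#include <cassert>
#include <cstddef>

struct Site { double x[3]; };   // hypothetical stand-in for a templated field type

// Pin the layout assumption at compile time so a padding change is caught early.
static_assert(sizeof(Site) == 3 * sizeof(double), "unexpected padding in Site");

// Stride as the compiler is free to fold it at compile time.
constexpr std::size_t folded_stride(std::size_t nsites) {
  return sizeof(Site) * nsites;
}

// Same arithmetic forced to run time: the volatile reads stop constant folding.
std::size_t runtime_stride(std::size_t nsites) {
  assert(nsites > 0);
  volatile std::size_t size = sizeof(Site);
  volatile std::size_t n = nsites;
  return size * n;
}

// True when the folded and the run-time results agree.
bool strides_agree(std::size_t nsites) {
  return folded_stride(nsites) == runtime_stride(nsites);
}
```

A disagreement between the two functions would be direct evidence for the reproducer.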
## Stress Patterns for Compiler Validation
These patterns exercise the compiler in ways that commonly expose bugs:
```cpp
// 1. Aliased pointer write followed by immediate read
// (tests correct handling of read-after-write through a possibly aliased pointer)
__global__ void alias_stress(double *a, double *b, int n) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < n) {
a[i] = a[i] * 2.0;
b[i] = a[i] + 1.0; // must read the updated value, not the original
}
}
// 2. Mixed-precision accumulation
// (tests correct type promotion in FMA sequences)
__global__ void precision_stress(float *in, double *out, int n) {
double acc = 0.0;
for (int i = 0; i < n; i++) acc += (double)in[i];
*out = acc;
}
// 3. Large struct in shared memory
// (tests alignment and offset calculation for non-power-of-2-sized objects)
struct S { double x[3]; }; // sizeof = 24 bytes, not a power of 2
__global__ void struct_stress(S *in, S *out, int n) {
extern __shared__ S smem[];
int tid = threadIdx.x;
smem[tid] = in[tid];
__syncthreads();
out[tid] = smem[(tid + 1) % blockDim.x];
}
```
## Separating Compiler from Runtime/Hardware
When results are deterministically wrong:
| Test | Compiler bug | Runtime/hardware bug |
|---|---|---|
| Recompile at -O0 | Fixes it | No effect |
| Run on CPU (host code equivalent) | Fixes it | No effect |
| Reorder loop iterations | Changes wrong answer | No effect or different pattern |
| Different compiler version | Fixes or changes wrong answer | No effect |
| Different GPU of same model | Same wrong answer | Different or no error |
| Different GPU model | Fixes it (ISA-specific codegen bug) | May or may not fix |
## Reporting to Compiler Teams
A compiler bug report needs:
1. Minimal reproducer (< 50 lines)
2. Compiler version (`hipcc --version`, `nvcc --version`, `icpx --version`)
3. GPU model and driver version
4. Exact wrong and correct answers (hexfloat for reproducibility)
5. Which compile flags change the behaviour
6. Generated assembly for the correct and incorrect variants
File with: the LLVM GitHub issue tracker (for hipcc/clang/dpcpp backends; it replaced the LLVM Bugzilla), NVIDIA bug portal (nvcc/ptxas), or vendor-specific developer forum. The minimal reproducer is the single most important element: without it, compiler teams cannot prioritise.
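For item 4 of the list above, the raw IEEE-754 bit pattern is an unambiguous companion to hexfloat output. A minimal sketch, where `bits_hex` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");

// Format a double as its exact bit pattern, suitable for pasting into a bug report.
std::string bits_hex(double v) {
  uint64_t b;
  std::memcpy(&b, &v, sizeof(b));
  char buf[19];   // "0x" + 16 hex digits + terminating NUL
  int n = std::snprintf(buf, sizeof(buf), "0x%016llx",
                        static_cast<unsigned long long>(b));
  assert(n == 18);
  return std::string(buf);
}
```

Reporting the wrong and the correct answer in this form lets a compiler engineer see exactly which bits differ, independent of decimal rounding in the output.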
## Pragmatic In-Production Workaround
When a compiler bug is confirmed but the fix is not yet available, the lowest-risk workaround is to mark the affected function with reduced optimisation:
```cpp
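#pragma clang optimize off            // clang/hipcc/dpcpp: applies until "optimize on"
void affected_kernel_host_wrapper() { ... }
#pragma clang optimize on
// GCC spelling of the same idea: void __attribute__((optimize("O0"))) f() { ... }
// For device code, use per-file compilation flags via CMake/Makefile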
```
Document the workaround with a comment referencing the compiler bug report number so it can be removed when the compiler is updated.
+169
@@ -0,0 +1,169 @@
---
name: correctness-verification
description: Implement application-level correctness verification for HPC codes on unreliable hardware — double-run pattern, deterministic reductions, per-packet checksums, and flight recorder step logging.
user-invocable: true
allowed-tools:
- Read
- Bash(grep -r)
---
# Correctness Verification Infrastructure for HPC Codes
## The Problem
Leadership computing facilities sometimes have hardware or firmware bugs below the level visible to application code. The accelerator runtime can return from `q.wait()` or `cudaDeviceSynchronize()` before work is actually complete, or silently produce wrong answers in DMA transfers. Standard testing does not catch these because they are non-deterministic and often topology-dependent (fail only at specific process counts or on specific node configurations).
The symptoms look like numerical instabilities, random MPI hangs, or wrong physics results — not like crashes. Without deliberate infrastructure, diagnosing root cause takes months.
## The Double-Run Pattern
The most reliable correctness check for non-deterministic hardware bugs is to run every computation twice and check that the two fingerprints are bit-identical.
**Key constraint**: the second run must use a *deterministic* code path. Non-deterministic floating-point ordering (e.g. from MPI_Allreduce with different reduction trees on retry) produces false mismatches. See `mpi-heterogeneous.md` for how to make reductions deterministic.
```cpp
// Pseudocode: double-run a step and compare CRC fingerprints
void run_step_verified(State &state) {
state.save_checkpoint();
uint64_t crc_a = run_step_and_fingerprint(state);
state.restore_checkpoint();
uint64_t crc_b = run_step_and_fingerprint(state);
if (crc_a != crc_b) {
report_mismatch("step", crc_a, crc_b);
// Policy: abort, retry from checkpoint, or continue with alarm
}
}
```
**Fingerprinting**: XOR-fold a CRC32 over all floating-point data after each step. XOR is order-independent, so it works across distributed nodes without communication. For field data:
```cpp
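#include <zlib.h>    // crc32(); assumes zlib is the CRC32 provider
#include <cstddef>
#include <cstdint>

uint64_t fingerprint(const double *data, size_t n) {
  uint64_t acc = 0;
  for (size_t i = 0; i < n; i++) {
    // CRC the raw bytes of each word, then XOR-fold so traversal order does not matter
    acc ^= crc32(0L, reinterpret_cast<const Bytef *>(&data[i]), sizeof(double));
  }
  return acc;
}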
```
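The order-independence claim can be checked with a dependency-free variant. This sketch swaps CRC32 for a stand-in 64-bit mixer (the splitmix64 finaliser) purely so that it needs no external library; `mix64` and `xor_fingerprint` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in per-word hash (splitmix64 finaliser); the text above uses CRC32 instead.
static uint64_t mix64(uint64_t x) {
  x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27; x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// XOR-fold the per-word hashes; the result does not depend on traversal order.
uint64_t xor_fingerprint(const std::vector<double> &data) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");
  uint64_t acc = 0;
  for (double v : data) {
    uint64_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    acc ^= mix64(bits);
  }
  return acc;
}
```

Because XOR is associative and commutative, partial fingerprints from separate ranks can be combined with a further XOR. The same property means a pure XOR fold cannot see two words changing places, which is why the packet checksums are salted.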
On GPU, compute the XOR reduction on-device (avoids D2H transfer of the full field):
```cpp
// SYCL
uint64_t svm_xor(uint64_t *vec, uint64_t L) {
uint64_t ret = 0;
{ sycl::buffer<uint64_t,1> abuff(&ret, {1});
theGridAccelerator->submit([&](sycl::handler &cgh) {
auto R = sycl::reduction(abuff, cgh, uint64_t(0), std::bit_xor<>());
cgh.parallel_for(sycl::range<1>{L}, R,
[=](sycl::id<1> i, auto &sum) { sum ^= vec[i]; });
}); }
theGridAccelerator->wait();
return ret;
}
```
## Per-Packet Communication Checksums
Silent data corruption in MPI buffers (documented in MPICH with device-resident buffers; see `mpi-heterogeneous.md`) requires per-packet verification, not just end-to-end. The pattern:
1. Before packing a send buffer, compute a GPU-side checksum of the payload.
2. Append the checksum to the host staging buffer alongside the data.
3. After receiving and copying to device, recompute the checksum on-device and compare.
Salt each checksum with `packet_index + 1000 * mpi_tag` to detect transposition (packet A landing in packet B's slot):
```cpp
uint64_t salt = (uint64_t)packet_index + 1000ULL * mpi_tag;
checksum_send = checksum_gpu(payload_gpu, payload_words) ^ salt;
// ... transmit payload + checksum_send ...
checksum_recv = checksum_gpu(payload_gpu_recv, payload_words) ^ salt;
assert(checksum_recv == checksum_send);
```
Grid reference: `Grid/communicator/Communicator_mpi3.cc`, `#ifdef GRID_CHECKSUM_COMMS`.
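A host-only sketch of why the salt matters, assuming the checksum travels with its payload as in step 2 above. `WirePacket`, `xor_checksum`, `salt_for`, `make_packet`, and `verify` are hypothetical names, and a plain XOR stands in for the GPU checksum.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical wire format: the payload plus the checksum appended by the sender.
struct WirePacket {
  std::vector<uint64_t> payload;
  uint64_t checksum;
};

// Stand-in for checksum_gpu(): XOR of all payload words.
static uint64_t xor_checksum(const std::vector<uint64_t> &words) {
  uint64_t acc = 0;
  for (uint64_t w : words) acc ^= w;
  return acc;
}

static uint64_t salt_for(uint64_t packet_index, uint64_t mpi_tag) {
  return packet_index + 1000ULL * mpi_tag;
}

// Sender side: checksum the payload and mix in the salt for its intended slot.
WirePacket make_packet(std::vector<uint64_t> payload, uint64_t index,
                       uint64_t tag, bool salted) {
  uint64_t cs = xor_checksum(payload) ^ (salted ? salt_for(index, tag) : 0);
  return {std::move(payload), cs};
}

// Receiver side: recompute with the salt of the slot the packet arrived in.
bool verify(const WirePacket &p, uint64_t index, uint64_t tag, bool salted) {
  uint64_t cs = xor_checksum(p.payload) ^ (salted ? salt_for(index, tag) : 0);
  return cs == p.checksum;
}
```

An intact packet that lands in the wrong slot passes an unsalted check, because payload and checksum still agree with each other; with the salt, the slot mismatch shows up as a checksum failure.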
## Flight Recorder: Step-Level Logging
Maintain a monotonic counter that names the current operation. On a hang, this is the only way to know *which* operation the process is stuck in without a debugger.
```cpp
struct FlightRecorder {
std::atomic<uint64_t> step_counter{0};
const char *step_name = "init";
void step_log(const char *name) {
step_name = name;
step_counter.fetch_add(1, std::memory_order_relaxed);
}
};
extern FlightRecorder gRecorder;
```
In Record mode, also store floating-point norms and communication checksums to vectors. In Verify mode, compare against stored values:
```cpp
void norm_log(double val) {
if (mode == Record) norm_log_vec.push_back(val);
if (mode == Verify) {
double expected = norm_log_vec[norm_counter];
if (val != expected) { // bit-exact for deterministic paths
std::cerr << "MISMATCH at step " << step_counter
<< " (" << step_name << "): "
<< std::hexfloat << val << " vs " << expected << "\n";
print_backtrace();
}
norm_counter++;
}
}
```
Grid reference: `Grid/util/FlightRecorder.h`, `Grid/util/FlightRecorder.cc`.
## Signal Handler for Hang Detection
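Install a SIGHUP handler that dumps the current flight recorder state. Inside a signal handler only async-signal-safe calls are allowed: format into a pre-allocated buffer and emit it with `write()` (not `printf`). `snprintf` is not on the POSIX async-signal-safe list either, so treat the formatting below as a best-effort diagnostic: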
```cpp
static char hang_buf[4096];
static void sighup_handler(int) {
int n = snprintf(hang_buf, sizeof(hang_buf),
"rank=%d step=%llu name=%s\n",
mpi_rank,
(unsigned long long)gRecorder.step_counter.load(),
gRecorder.step_name);
write(STDERR_FILENO, hang_buf, n);
// Optional: call backtrace_symbols_fd (async-safe on Linux)
void *frames[64];
int depth = backtrace(frames, 64);
backtrace_symbols_fd(frames, depth, STDERR_FILENO);
}
// In main():
signal(SIGHUP, sighup_handler);
```
To diagnose a hang across all ranks: `kill -HUP $(pgrep my_app)` or via job scheduler.
## What to Verify at Each Step
| Data type | Fingerprint method | Frequency |
|---|---|---|
| Lattice fields | XOR of CRC32 over float64 words | Every algorithmic step |
| Communication buffers | GPU XOR reduction, salted | Every MPI operation |
| Scalar reductions | Bit-exact match of double | Every GlobalSum |
| Iteration counters | Exact integer match | Every solver iteration |
## When to Abort vs Continue
- **Abort immediately**: communication checksum mismatch (data is corrupt, continuing will silently propagate errors).
- **Log and continue**: norm mismatch in Verify mode if you need to map out which operations are unreliable.
- **Retry from checkpoint**: double-run mismatch when the underlying bug is non-deterministic (second retry will usually pass).
Track the mismatch rate over a production run. A rate above ~1/1000 steps indicates a systemic hardware issue that should be escalated to the facility.
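Tracking the rate needs only two counters. A minimal sketch, where `MismatchTracker` is a hypothetical name and the default threshold is the ~1/1000 figure quoted above:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-run tracker for double-run or Verify-mode mismatches.
struct MismatchTracker {
  uint64_t steps = 0;
  uint64_t mismatches = 0;

  // Call once per verified step.
  void record(bool mismatch) {
    steps++;
    if (mismatch) mismatches++;
  }

  // Fraction of verified steps that mismatched; 0 before any step is recorded.
  double rate() const {
    return steps ? static_cast<double>(mismatches) / static_cast<double>(steps) : 0.0;
  }

  // True when the observed rate exceeds the escalation threshold.
  bool should_escalate(double threshold = 1.0e-3) const {
    assert(threshold > 0.0);
    return rate() > threshold;
  }
};
```

Logging `rate()` alongside the node list makes it easier to tell the facility whether the failures cluster on particular hardware.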
+101
@@ -0,0 +1,101 @@
---
name: gpu-runtime-correctness
description: Detect and work around GPU runtime correctness failures — premature completion signalling, infinite poll hangs, stale completion flags, and the double-wait diagnostic pattern. Covers CUDA, HIP/ROCm, and SYCL/Level Zero runtimes.
user-invocable: true
allowed-tools:
- Read
- Bash(grep -r)
---
# GPU Runtime Correctness
## The Completion Signalling Problem
GPU runtimes expose a synchronisation primitive — `cudaDeviceSynchronize()`, `hipDeviceSynchronize()`, `q.wait()` — that is supposed to block until all previously submitted GPU work is complete. On several production systems, this guarantee has been violated in two distinct ways:
### Failure Mode A: Premature Return
The wait returns before the GPU work is done. The subsequent CPU code reads stale data from the output buffer. This is the most dangerous failure because it looks like a numerical instability, not a crash. Results are wrong but the program exits normally.
**Identifying Premature Return**: Insert a second, independent wait immediately after the first. If a second `q.wait()` "fixes" incorrect results that appeared with a single `q.wait()`, the first wait was returning prematurely.
```cpp
// Diagnostic version — if this stabilises results, you have premature return
accelerator_barrier(); // first wait
accelerator_barrier(); // second wait (diagnostic)
```
Production fix: submit a trivially cheap no-op kernel after the real work and wait for it. The no-op kernel cannot complete until all previous commands in the queue are done (command queue ordering guarantee), so waiting for the no-op is a stronger barrier than waiting for the queue itself:
```cpp
// Lightweight fence kernel
template<class T>
__global__ void noop_kernel(T *p) { if (threadIdx.x == 0) (void)(*p); }
template<class T>
void strong_barrier(T *device_ptr) {
noop_kernel<<<1, 1, 0, computeStream>>>(device_ptr);
cudaStreamSynchronize(computeStream); // wait for the no-op
}
```
### Failure Mode B: Infinite Poll
The wait enters a polling loop that never terminates. The process consumes 100% CPU in a runtime library. The GPU has either stopped signalling progress entirely, or the completion flag is in a memory region that has become incoherent.
This is distinct from Failure Mode A: with premature return the CPU proceeds; with infinite poll the CPU is stuck.
**Identifying Infinite Poll**: `top` shows the MPI rank at 100% CPU. `perf top -p PID` or `strace -p PID` shows the process burning cycles inside the GPU runtime library (e.g. `libze_intel_gpu.so`, `libamdhip64.so`).
**Documented instances**:
- Intel Level Zero on Ponte Vecchio (Aurora): both premature return *and* infinite poll have been observed as independent bugs on the same system.
- The two failure modes can co-exist and have overlapping symptoms at the application level.
## Completion Signalling Architecture
Understanding why these bugs happen requires knowing how completion signalling works:
```
GPU command processor
→ signals completion by writing to a host-visible memory address
→ CPU runtime polls that address (or uses OS event notification via ioctl)
```
A premature return means the memory write happened before the actual work completed (e.g. the signal is on a different command stream that has not been serialised with the work stream). An infinite poll means the memory write never happens (hardware or driver bug preventing the signal from being written).
**Implication**: `accelerator_barrier()` is not an unconditional correctness guarantee on all production systems. Application-level verification (double-run, checksums) is necessary as a second line of defence.
## The Double-Wait Pattern in Practice
The double-wait is a pragmatic workaround when premature return is suspected but not yet confirmed. It adds latency but does not change correctness if the barrier is working properly, so it is safe to enable in production:
```cpp
#ifdef WORKAROUND_PREMATURE_BARRIER
#define accelerator_barrier() do { \
real_accelerator_barrier(); \
real_accelerator_barrier(); \
} while(0)
#endif
```
Monitor whether this changes observed behaviour. If double-wait eliminates wrong answers, you have confirmed premature return. If it does not help but inserting a no-op kernel does, the issue is with the wait primitive specifically, not with the underlying completion signal.
## SYCL/Level Zero Specifics
Level Zero (the backend for Intel GPU runtimes) separates command submission from synchronisation. A `q.wait()` should wait for all previously submitted command lists to retire. Documented bugs include:
- `q.wait()` returning before the associated fence in Level Zero has been signalled.
- `q.wait()` entering an `ioctl(i915, I915_GEM_WAIT)` call that never returns (kernel driver bug, not runtime bug).
The latter requires a node reboot and cannot be worked around in application code. Detect it by checking process state (`D` in `ps aux`) and the kernel function via `/proc/PID/wchan`.
## Stream Ordering and Compute Streams
All GPU work must be submitted to the *same* stream/queue if you rely on in-order execution guarantees. Mixing default stream and non-default streams invalidates ordering assumptions on some backends.
Grid uses `computeStream` (CUDA/HIP) or `theGridAccelerator` (SYCL) consistently throughout. If mixing Grid with third-party GPU code, ensure the third-party code is directed to the same stream, or insert explicit inter-stream barriers.
## Checklist for New GPU Code
1. Every kernel launch is followed by an `accelerator_barrier()` before reading device-side output on the host.
2. All device-to-host copies use an explicit stream synchronisation after the copy, not before.
3. If results are non-deterministic across runs, insert a second barrier and observe whether reproducibility improves.
4. For correctness-critical operations (reductions that will be compared against reference values), add the double-run checksum test from `correctness-verification.md`.
5. If the process hangs at 100% CPU in a runtime library function, this is a driver/runtime bug — there is no application-level fix beyond scheduling a node reboot.
+102
@@ -0,0 +1,102 @@
---
name: hang-diagnosis
description: Diagnose and isolate process hangs on HPC systems — distinguishing kernel-level ioctl hangs, infinite poll loops, collective deadlocks, and GPU completion signalling failures using async-safe signal handlers and flight recorder step counters.
user-invocable: true
allowed-tools:
- Read
- Bash(grep -r)
- Bash(strace)
- Bash(gdb)
---
# Hang Diagnosis on HPC Systems
## Taxonomy of Hangs
Not all hangs are the same. Misidentifying the type leads to wrong mitigation. The four distinct classes encountered on production leadership systems:
### 1. Kernel-level ioctl hang (never returns)
The process is in `D` (uninterruptible sleep) state. `strace` shows it blocked in an `ioctl` syscall. The GPU device driver has entered an unrecoverable state.
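**Diagnosis**: `ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'` lists the processes in `D` state (a plain `grep D` also matches any line that merely contains the letter). `cat /proc/PID/wchan` shows `i915_gem_wait_for_error` or similar.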
**Resolution**: Only a driver reload or node reboot recovers it. Log the node identifier and request replacement from the facility scheduler.
### 2. Infinite poll loop (`q.wait()` or `cudaDeviceSynchronize()` never returns)
The process is in `R` (running) state, consuming 100% CPU. A polling loop inside the runtime is checking a completion flag that never becomes true, either because the hardware never sets it or because the flag is in a memory region not visible to the polling thread.
**Diagnosis**: `top` shows the rank at 100% CPU. `strace -p PID` shows repeated `futex` or `read` syscalls with zero-length results, or no syscalls at all (pure spinloop). `perf top -p PID` shows the process burning cycles in a single tight loop in a runtime library (e.g., `libze_intel_gpu.so`).
**Resolution**: The double-wait workaround — submit a trivially cheap kernel after the operation under test to act as a fence, then wait for the trivial kernel. See `gpu-runtime-correctness.md`.
### 3. Collective deadlock
One or more ranks are blocked in an MPI call, usually `MPI_Allreduce` or `MPI_Barrier`, while others are not. Root cause: a topology-dependent bug in the MPI library's collective algorithm where some ranks' contributions never arrive.
**Diagnosis**: Flight recorder step logs show some ranks at step N (inside the collective) while others are at step N+1 or stuck at step N with different `step_name` strings. The hung ranks will show `D` or `S` state in `ps`.
**Resolution**: Replace `MPI_Allreduce` with a deterministic point-to-point tree reduction. See `mpi-heterogeneous.md`.
### 4. Premature return from wait (silent wrong answer, not a hang)
The runtime returns from `q.wait()` before the GPU work is complete. The next operation reads stale data. This is not a hang — it manifests as a wrong answer or non-deterministic floating-point results. It is listed here because it is the most confusing failure mode: the code appears to run correctly and completes normally.
**Diagnosis**: Double-run with checksum (see `correctness-verification.md`). Insert a second `q.wait()` after the first and observe if results become reproducible. If inserting the second wait "fixes" wrong answers, the first wait was returning prematurely.
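The double-run test can be sketched as below. This is a minimal illustration, assuming the operation under test can be wrapped in a callable that returns its host-side result; `reproducible` and `compute` are hypothetical names and not part of Grid.

```cpp
#include <cstring>
#include <functional>
#include <vector>

// Run the same computation twice and compare the results bit for bit.
// `compute` is a hypothetical stand-in for "launch kernel, wait, copy back".
inline bool reproducible(const std::function<std::vector<double>()> &compute) {
  const std::vector<double> first  = compute();
  const std::vector<double> second = compute();
  if (first.size() != second.size()) return false;
  if (first.empty()) return true;
  // memcmp rather than ==, so NaNs and signed zeros are compared by bit pattern.
  return std::memcmp(first.data(), second.data(),
                     first.size() * sizeof(double)) == 0;
}
```

If this returns `false` with a single wait inside `compute` and `true` once a second wait is added, that points at a premature return rather than at the arithmetic.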
## Flight Recorder for Hang Localization
The most important diagnostic tool is knowing *which operation* a process is in when it hangs. Maintain a named step counter:
```cpp
// Call at the start of every major operation
FlightRecorder::StepLog("MPI_Allreduce::norm");
// ... do the operation ...
FlightRecorder::StepLog("MPI_Allreduce::done");
```
On SIGHUP, dump rank, step counter value, and step name to stderr in an async-safe manner:
```cpp
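#include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc)
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
// comm_rank, step_counter and step_name are globals maintained by the flight recorder.
static void sighup_handler(int) {
  char buf[256];
  // snprintf is not formally async-signal-safe; tolerated for a last-resort dump.
  int n = snprintf(buf, sizeof(buf), "rank %d: step %llu '%s'\n",
                   comm_rank, (unsigned long long)step_counter, step_name);
  if (n > (int)sizeof(buf) - 1) n = (int)sizeof(buf) - 1;   // output was truncated
  if (n > 0) write(2, buf, n);
  // backtrace_symbols_fd does not call malloc itself, but the first backtrace()
  // call may (it loads libgcc), hence the warm-up call below.
  void *frames[32];
  backtrace_symbols_fd(frames, backtrace(frames, 32), 2);
}
// Call once at start-up: a bare signal() statement at file scope does not compile.
static void install_sighup_handler() {
  void *warm[1];
  backtrace(warm, 1);   // load the unwinder outside signal context
  signal(SIGHUP, sighup_handler);
}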
```
Broadcast SIGHUP to all ranks from outside the job:
```bash
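# In a separate shell while the job is hung.
# squeue -h drops the header line; scontrol expands a compressed
# node list such as nid[001-004] into one hostname per line.
scontrol show hostnames "$(squeue -h -j $JOBID -o %N)" | \
xargs -I{} ssh {} "pkill -SIGHUP -f my_application"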
```
The step names from all ranks will reveal which collective operation has diverged.
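With hundreds of ranks the dumps are easier to read once grouped by step name. The sketch below assumes the one-line format written by the handler above (`rank R: step N 'name'`); `ranks_by_step` is a hypothetical helper, not an existing Grid function.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Groups rank numbers by the step name each rank reported.
// Lines that do not match the expected format are ignored.
inline std::map<std::string, std::vector<int>>
ranks_by_step(const std::vector<std::string> &lines) {
  std::map<std::string, std::vector<int>> groups;
  for (const std::string &line : lines) {
    int rank = 0;
    unsigned long long step = 0;
    char name[128] = {0};
    // %127[^'] reads the step name up to, but not including, the closing quote.
    if (std::sscanf(line.c_str(), "rank %d: step %llu '%127[^']'",
                    &rank, &step, name) == 3) {
      groups[name].push_back(rank);
    }
  }
  return groups;
}
```

The smallest group is usually the place to start looking, although as noted elsewhere the ranks that are stuck are not necessarily the ranks at fault.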
## Distinguishing Driver Hang from MPI Hang
| Symptom | Driver hang | MPI hang |
|---|---|---|
| Process state | `D` (ioctl) or `R` (spinloop) | `S` (blocked in syscall) |
| `strace` | blocked `ioctl` or tight loop | blocked `recvmsg` / `read` |
| Scope | single rank / single node | subset of ranks, pattern-dependent |
| Recovery | reboot node | cancel job |
| Flight recorder | step name is a GPU operation | step name is a collective |
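The process-state row of the table can be checked programmatically from `/proc/PID/stat`. The sketch below is illustrative only: `proc_state` and `likely_hang_class` are hypothetical helpers, and the state letter is a first hint that should always be combined with the flight recorder step name.

```cpp
#include <cstddef>
#include <string>

// Extracts the one-letter state field from the contents of /proc/<pid>/stat.
// The command name (field 2) is wrapped in parentheses and may itself contain
// spaces or ')', so search for the LAST ')' instead of splitting on spaces.
inline char proc_state(const std::string &stat) {
  const std::size_t close = stat.rfind(')');
  if (close == std::string::npos || close + 2 >= stat.size()) return '?';
  return stat[close + 2];   // layout is "pid (comm) S ..."
}

// Maps a state letter onto the rows of the table above.
inline const char *likely_hang_class(char state) {
  switch (state) {
    case 'D': return "driver hang (blocked ioctl)";
    case 'R': return "driver hang (runtime spinloop)";
    case 'S': return "MPI hang (blocked in syscall)";
    default:  return "unclassified";
  }
}
```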
## Reducing Diagnostic Time
1. **Name every collective operation** in the flight recorder before calling it.
2. **Separate GPU work from MPI work** in the code so the step name unambiguously identifies which subsystem is hung.
3. **Log node identifiers** alongside step names so flaky nodes can be identified and blacklisted.
4. **Request flight recorder dumps from all ranks simultaneously** (SIGHUP broadcast) rather than attaching a debugger — attaching `gdb` to one rank of a hung MPI job usually deadlocks the debugger too.
## What Not to Do
- Do not `kill -9` a hung rank immediately — get the flight recorder dump first, otherwise diagnostic information is lost.
- Do not assume the first rank that prints an error is the faulty one — collective hangs are frequently caused by the *last* rank to arrive at the barrier.
- Do not use `MPI_Abort` in the hang handler — it may itself hang on some implementations. Use `_exit(1)` to force termination.
+137
@@ -0,0 +1,137 @@
---
name: mpi-heterogeneous
description: Diagnose and work around MPI correctness bugs on heterogeneous (CPU+GPU) systems — device buffer aliasing in MPI_Sendrecv, AARCH64 PLT corruption from libfabric, topology-dependent allreduce hangs, and deterministic point-to-point reduction trees as a replacement for MPI_Allreduce.
user-invocable: true
allowed-tools:
- Read
- Bash(grep -r)
---
# MPI Correctness on Heterogeneous HPC Systems
## The Core Problem
MPI libraries were designed for CPU-resident buffers. When GPU-resident buffers are passed directly (GPU-aware MPI / GPU direct RDMA), several correctness assumptions break:
- **Buffer aliasing**: The MPI library may internally alias send/receive buffer addresses for `MPI_Sendrecv` in ways that are safe for CPU memory but wrong for GPU memory with different cache coherency rules.
- **RDMA bandwidth**: GPU direct RDMA on some fabrics operates at a fraction of peak wirespeed (documented at ~30% on Ponte Vecchio/Aurora), making host-staging mandatory for performance even when correctness is not an issue.
- **Collective tree topology**: `MPI_Allreduce` implementations may select reduction trees based on process count or communicator topology that expose rank-ordering bugs, causing hangs on some configurations but not others.
## Bug Class 1: Device Buffer Aliasing in MPI_Sendrecv
**Symptom**: `MPI_Sendrecv` with GPU-resident send and receive buffers produces wrong results. The received data matches neither the expected payload nor a host-staged copy. The failure is *deterministic* for a given problem size and process count, but *history-dependent* — earlier sends affect which alias is selected.
**Root cause**: The MPI library internally reuses GPU buffer addresses for temporary staging without proper device memory ordering. When the same physical GPU memory pages appear in both the send and receive paths, writes from one path corrupt the other.
**Diagnosis**:
1. Enable per-packet checksumming (see `correctness-verification.md`). If the checksum on the received packet does not match the sent checksum, the data was corrupted in transit.
2. Replace `MPI_Sendrecv` with separate `MPI_Isend` + `MPI_Irecv` + `MPI_Waitall`. If this fixes the problem, the bug is in the `MPI_Sendrecv` implementation's internal buffer handling.
3. Stage through host memory (`cudaMemcpy`/`hipMemcpy` to a host buffer, then `MPI_Sendrecv` on host buffers, then copy back). If this fixes the problem, it confirms GPU-specific aliasing.
**Reported as**: MPICH issue #7302. Affects MPICH on Intel Ponte Vecchio (Aurora) with device-resident buffers.
**Workaround**: Do not use `MPI_Sendrecv` with GPU buffers. Use asynchronous send/receive pairs or host-staging. See `communication-overlap.md` for the full pipeline pattern.
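The per-packet checksumming used in diagnosis step 1 can be sketched as follows. This is not Grid's `GRID_CHECKSUM_COMMS` implementation: the FNV-1a hash and the helper names `fnv1a`, `pack` and `verify` are assumptions made for illustration, and any cheap hash would serve.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a over the raw bytes of a buffer.
inline std::uint64_t fnv1a(const void *data, std::size_t bytes) {
  const unsigned char *p = static_cast<const unsigned char *>(data);
  std::uint64_t h = 14695981039346656037ULL;   // FNV-1a 64-bit offset basis
  for (std::size_t i = 0; i < bytes; i++) {
    h ^= p[i];
    h *= 1099511628211ULL;                      // FNV-1a 64-bit prime
  }
  return h;
}

// Sender side: append the checksum as one trailing 64-bit word.
inline std::vector<std::uint64_t> pack(const std::vector<std::uint64_t> &payload) {
  std::vector<std::uint64_t> packet(payload);
  packet.push_back(fnv1a(payload.data(), payload.size() * sizeof(std::uint64_t)));
  return packet;
}

// Receiver side: recompute over the payload and compare with the trailing word.
inline bool verify(const std::vector<std::uint64_t> &packet) {
  if (packet.empty()) return false;
  const std::size_t n = packet.size() - 1;
  return fnv1a(packet.data(), n * sizeof(std::uint64_t)) == packet[n];
}
```

Computing the checksum on the sender before the buffer is handed to MPI, and on the receiver after the wait completes, brackets exactly the interval in which the library owns the data.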
## Bug Class 2: PLT Corruption on AARCH64 (libfabric)
**Symptom**: Application crashes or hangs on first `MPI_Comm_dup` call on AARCH64 systems (e.g. NVIDIA Grace/H200). Backtrace shows a bad instruction in the PLT (Procedure Linkage Table) for `MPI_Comm_dup` — specifically a `br x15` instruction that should instead be a proper trampoline.
**Root cause**: `libfabric`'s memory registration cache monitor patches PLT entries at runtime to intercept memory allocation calls. Its AARCH64 trampoline generation writes an incorrect instruction sequence, leaving `br x15` (branch to whatever happens to be in x15) in the PLT entry. The next call through that PLT entry executes garbage.
**Diagnosis**:
```bash
# Disassemble the on-disk PLT stub as an unpatched reference.
# /proc/PID/exe resolves to the binary file, so runtime patching is not visible here.
objdump -d /proc/PID/exe | grep -A5 "MPI_Comm_dup@plt"
```
Then disassemble the live process, where the runtime patch is visible. A `br x15` where a proper stub should be indicates the corruption:
```bash
gdb -p PID -batch -ex "disassemble 'MPI_Comm_dup@plt'"
```
**Workaround**:
```bash
export FI_MR_CACHE_MONITOR=disabled
```
This prevents libfabric from patching PLT entries. It may reduce MR cache performance but restores correctness.
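Because the workaround is an environment variable, it is easy to lose when a job script is copied. A start-up check along the following lines can make its absence visible; `mr_cache_monitor_disabled` is a hypothetical helper, and only the variable name and the value `disabled` come from the text above.

```cpp
#include <cstdlib>
#include <cstring>

// Returns true when FI_MR_CACHE_MONITOR=disabled is set in this process's environment.
inline bool mr_cache_monitor_disabled() {
  const char *v = std::getenv("FI_MR_CACHE_MONITOR");
  return v != nullptr && std::strcmp(v, "disabled") == 0;
}
```

An application could call this early in start-up on affected AARCH64 systems and print a warning on one rank if it returns `false`.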
**Reported as**: libfabric issue #11451. Affects systems using AARCH64 + libfabric OFI provider (Cray Slingshot, AWS EFA) with memory registration cache enabled.
## Bug Class 3: Topology-Dependent Allreduce Hangs
**Symptom**: `MPI_Allreduce` hangs indefinitely on some node configurations but completes correctly on others. The failure correlates with process count (e.g. fails at 512 ranks, works at 256) or network topology (fails when crossing specific router boundaries).
**Root cause**: The MPI library's collective selection algorithm picks a reduction tree implementation that assumes symmetric participation from all ranks. A bug in one rank's contribution path (e.g. a GPU-side buffer not yet flushed when MPI reads it, due to premature barrier — see `gpu-runtime-correctness.md`) causes that rank to send wrong or incomplete data, and the tree-reduction protocol deadlocks waiting for data that never arrives correctly.
**Diagnosis**: Flight recorder step logging (see `hang-diagnosis.md`). SIGHUP broadcast to all ranks. Ranks that are hung will show step name `MPI_Allreduce::...`; ranks that completed will show the next step. The hung ranks are the *recipients* of the stale data, not necessarily the *cause*.
**Workaround — deterministic P2P reduction tree**:
Replace `MPI_Allreduce` with an explicit point-to-point binary tree reduction. This is slower for large communicators but:
1. Is immune to topology-dependent collective bugs.
2. Is deterministic in floating-point ordering (the tree is fixed, not chosen at runtime).
3. Makes the hang location explicit — each P2P operation is a named step in the flight recorder.
```cpp
#include <mpi.h>
#include <vector>

// Binary tree reduction: rank 0 collects, then broadcasts
void GlobalSumP2P(double *data, int count, MPI_Comm comm) {
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);
  std::vector<double> tmp(count);   // receive buffer, reused at every level
  // Reduce phase: at each level, multiples of 2*stride receive from rank+stride
  for (int stride = 1; stride < size; stride *= 2) {
    if (rank % (2*stride) == 0) {
      int partner = rank + stride;
      if (partner < size) {
        MPI_Recv(tmp.data(), count, MPI_DOUBLE, partner, 0, comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < count; i++) data[i] += tmp[i];
      }
    } else if (rank % stride == 0) {
      int partner = rank - stride;
      MPI_Send(data, count, MPI_DOUBLE, partner, 0, comm);
      break;   // this rank has handed off its partial sum
    }
  }
  // Broadcast phase: replay the same tree from the top level down
  int top = 1;
  while (2*top < size) top *= 2;   // largest power of two strictly below size (1 if size == 1)
  for (int stride = top; stride >= 1; stride /= 2) {
    if (rank % (2*stride) == 0) {
      int partner = rank + stride;
      if (partner < size)
        MPI_Send(data, count, MPI_DOUBLE, partner, 0, comm);
    } else if (rank % stride == 0) {
      int partner = rank - stride;
      MPI_Recv(data, count, MPI_DOUBLE, partner, 0, comm, MPI_STATUS_IGNORE);
    }
  }
}
```
Grid reference: `USE_GRID_REDUCTION` macro in `Grid/communicator/Communicator_mpi3.cc`.
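The pairing logic of the reduce phase can be checked without MPI by enumerating the (receiver, sender) pairs serially. The sketch below is an illustration of the schedule only; `reduce_schedule` and `tree_sum` are hypothetical names and are not taken from Grid.

```cpp
#include <utility>
#include <vector>

// (receiver, sender) pairs of the reduce phase, in the order the levels execute.
inline std::vector<std::pair<int, int>> reduce_schedule(int size) {
  std::vector<std::pair<int, int>> pairs;
  for (int stride = 1; stride < size; stride *= 2)
    for (int rank = 0; rank + stride < size; rank += 2 * stride)
      pairs.emplace_back(rank, rank + stride);
  return pairs;
}

// Applies the schedule serially to one value per rank.
// Element 0 ends up holding the total, as rank 0 does in the MPI version.
inline double tree_sum(std::vector<double> per_rank) {
  for (const auto &p : reduce_schedule(static_cast<int>(per_rank.size())))
    per_rank[p.first] += per_rank[p.second];
  return per_rank.empty() ? 0.0 : per_rank[0];
}
```

Because the schedule depends only on the communicator size, the order of floating-point additions is identical from run to run, which is the determinism property listed above.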
## Compile-Time Guard Structure
Recommended macro structure to switch between the workaround paths:
```cpp
// In configure / CMake, expose as options:
// ACCELERATOR_AWARE_MPI — use GPU direct (fast, potentially broken)
// GRID_CHECKSUM_COMMS — per-packet checksums (overhead: ~5%)
// USE_GRID_REDUCTION — P2P tree instead of MPI_Allreduce (slower, deterministic)
// FI_MR_CACHE_MONITOR — libfabric PLT workaround (env var, not compile-time)
```
On a known-good system, enable `ACCELERATOR_AWARE_MPI` and disable the others. On a system with known bugs, disable `ACCELERATOR_AWARE_MPI` and enable `GRID_CHECKSUM_COMMS` + `USE_GRID_REDUCTION` as needed.
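It helps to log which combination a binary was actually built with, so that a result can be tied to its comms path. A minimal sketch, assuming the three macros are plain on/off defines; `comms_config_summary` is a hypothetical function, and the mapping of a disabled `ACCELERATOR_AWARE_MPI` to host staging is an assumption based on the text above.

```cpp
#include <string>

// Summarises which comms paths this translation unit was compiled with.
inline std::string comms_config_summary() {
  std::string s;
#if defined(ACCELERATOR_AWARE_MPI)
  s += "gpu-direct";
#else
  s += "host-staged";
#endif
#if defined(GRID_CHECKSUM_COMMS)
  s += "+checksums";
#endif
#if defined(USE_GRID_REDUCTION)
  s += "+p2p-reduction";
#else
  s += "+mpi-allreduce";
#endif
  return s;
}
```

Printing this string once from rank 0 at start-up puts the build configuration into every job log.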
## Escalation Checklist
Before concluding a bug is in your code:
1. [ ] Can you reproduce with a minimal reproducer (two MPI ranks, no physics code)?
2. [ ] Does the failure rate correlate with buffer size, process count, or network route?
3. [ ] Does staging through host memory eliminate the failure?
4. [ ] Is the failure deterministic for a given input (same answer, always wrong) or stochastic?
5. [ ] Does the failure appear on a different MPI implementation (e.g. OpenMPI vs MPICH)?
Deterministic wrong answers that reproduce with minimal reproducers and disappear with host-staging are strong evidence of an MPI library bug. File it on the MPI library's issue tracker and attach the minimal reproducer.
+5 -2
@@ -3,7 +3,10 @@ echo spack
. /autofs/nccs-svm1_home1/paboyle/Crusher/Grid/spack/share/spack/setup-env.sh
module load cce/15.0.1
module load rocm/5.3.0
module load amd/7.0.2
#module load amd/7.1.1
#module load rocm/7.2.0
#module load rocm/6.4.2
module load cray-fftw
module load craype-accel-amd-gfx90a
@@ -11,7 +14,7 @@ module load craype-accel-amd-gfx90a
export LD_LIBRARY_PATH=/opt/cray/libfabric/1.20.1/lib64/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/gcc/mpfr/3.1.4/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=`pwd`/:$LD_LIBRARY_PATH
ln -s /opt/rocm-6.0.0/lib/libamdhip64.so.6 .
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/LD_PATH/
#echo spack load c-lime
#spack load c-lime
+4 -5
@@ -1,17 +1,16 @@
DIR=`pwd`
PREFIX=$DIR/../Prequisites/install/
PREFIX=$HOME/DDHMC/Grid/systems/Prerequisites/install/
../../configure \
--enable-comms=mpi \
--enable-simd=GPU \
--enable-shm=nvlink \
--enable-gen-simd-width=64 \
--enable-accelerator=cuda \
--enable-setdevice \
--disable-accelerator-cshift \
--with-gmp=$PREFIX \
--with-mpfr=$PREFIX \
--enable-accelerator=cuda \
--disable-fermion-reps \
--disable-unified \
--disable-gparity \
CXX=nvcc \
LDFLAGS="-cudart shared " \
CXXFLAGS="-ccbin CC -gencode arch=compute_80,code=sm_80 -std=c++14 -cudart shared"
CXXFLAGS="-ccbin CC -gencode arch=compute_80,code=sm_80 -std=c++17 -cudart shared"
+39
@@ -0,0 +1,39 @@
#!/bin/bash
##SBATCH -A m5294_g
#SBATCH -A mp13_g
#m3886_g
#SBATCH -C gpu
#SBATCH -q premium
#SBATCH -t 00:20:00
#SBATCH -c 32
#SBATCH -N 384
#SBATCH -n 1536
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --exclusive
#SBATCH --gpu-bind=none
export SLURM_CPU_BIND="cores"
export MPICH_GPU_SUPPORT_ENABLED=1
export MPICH_RDMA_ENABLED_CUDA=1
export MPICH_GPU_IPC_ENABLED=1
export MPICH_GPU_EAGER_REGISTER_HOST_MEM=0
export MPICH_GPU_NO_ASYNC_MEMCPY=0
#export MPICH_SMP_SINGLE_COPY_MODE=CMA
cat << EOF > select_gpu
#!/bin/bash
export GPU_MAP=(0 1 2 3)
export NUMA_MAP=( 0 1 2 3 )
export GPU=\$SLURM_LOCALID
export NUMA=\$SLURM_LOCALID
export CUDA_VISIBLE_DEVICES=\$GPU
exec numactl -m \$NUMA -N \$NUMA \$*
EOF
chmod +x ./select_gpu
export VOL=128.128.128.288
OPT="--comms-overlap --shm-mpi 0"
srun ./select_gpu ./benchmarks/Benchmark_dwf_fp32 --mpi 4.8.4.12 --grid $VOL --device-mem 16000 --accelerator-threads 8 --shm 2048 $OPT
+19 -6
@@ -1,5 +1,6 @@
#!/bin/bash
#SBATCH -A m3886_g
#SBATCH -A m5294_g
#m3886_g
#SBATCH -C gpu
#SBATCH -q debug
#SBATCH -t 0:20:00
@@ -19,9 +20,21 @@ export MPICH_GPU_EAGER_REGISTER_HOST_MEM=0
export MPICH_GPU_NO_ASYNC_MEMCPY=0
#export MPICH_SMP_SINGLE_COPY_MODE=CMA
OPT="--comms-sequential --shm-mpi 1"
VOL=64.64.64.64
srun ./benchmarks/Benchmark_dwf_fp32 --mpi 2.2.1.1 --grid $VOL --accelerator-threads 8 --shm 2048 $OPT
#srun ./benchmarks/Benchmark_dwf_fp32 --mpi 2.1.1.4 --grid $VOL --accelerator-threads 8 --shm 2048 $OPT
#srun ./benchmarks/Benchmark_dwf_fp32 --mpi 1.1.1.8 --grid $VOL --accelerator-threads 8 --shm 2048 $OPT
cat << EOF > select_gpu
#!/bin/bash
export GPU_MAP=(0 1 2 3)
export NUMA_MAP=( 0 1 2 3 )
export GPU=\$SLURM_LOCALID
export NUMA=\$SLURM_LOCALID
export CUDA_VISIBLE_DEVICES=\$GPU
exec numactl -m \$NUMA -N \$NUMA \$*
EOF
chmod +x ./select_gpu
OPT="--comms-sequential --shm-mpi 0"
VOL=64.64.32.32
srun ./select_gpu ./benchmarks/Benchmark_dwf_fp32 --mpi 2.2.1.1 --grid $VOL --device-mem 16000 --accelerator-threads 8 --shm 2048 $OPT
OPT="--comms-overlap --shm-mpi 0"
srun ./select_gpu ./benchmarks/Benchmark_dwf_fp32 --mpi 2.2.1.1 --grid $VOL --device-mem 16000 --accelerator-threads 8 --shm 2048 $OPT
+44
@@ -0,0 +1,44 @@
#!/bin/bash
##SBATCH -A m5294_g
#SBATCH -A mp13_g
#m3886_g
#SBATCH -C gpu
#SBATCH -q premium
#SBATCH -t 00:10:00
#SBATCH -c 32
#SBATCH -N 32
#SBATCH -n 128
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --exclusive
#SBATCH --gpu-bind=none
export SLURM_CPU_BIND="cores"
export MPICH_GPU_SUPPORT_ENABLED=1
export MPICH_RDMA_ENABLED_CUDA=1
export MPICH_GPU_IPC_ENABLED=1
export MPICH_GPU_EAGER_REGISTER_HOST_MEM=0
export MPICH_GPU_NO_ASYNC_MEMCPY=0
#export MPICH_SMP_SINGLE_COPY_MODE=CMA
cat << EOF > select_gpu
#!/bin/bash
export GPU_MAP=(0 1 2 3)
export NUMA_MAP=( 0 1 2 3 )
export GPU=\$SLURM_LOCALID
export NUMA=\$SLURM_LOCALID
export CUDA_VISIBLE_DEVICES=\$GPU
exec numactl -m \$NUMA -N \$NUMA \$*
EOF
chmod +x ./select_gpu
OPT="--comms-overlap --shm-mpi 0"
#
# Local volume WAS 32.16.32.24
#
# 384 nodes
#srun ./select_gpu ./Test_dwf_mixedcg_prec --seconds 300 --grid 128.128.128.288 --mpi 4.8.4.12 --device-mem 16000 --accelerator-threads 8 --shm 2048 $OPT > job.log
# 32 nodes, same volume per node
srun ./select_gpu ./Test_dwf_mixedcg_prec --seconds 300 --grid 64.32.64.96 --mpi 2.2.2.4 --device-mem 16000 --accelerator-threads 8 --shm 2048 $OPT > job.log
+38
@@ -0,0 +1,38 @@
#!/bin/bash
##SBATCH -A m5294_g
#SBATCH -A mp13_g
#m3886_g
#SBATCH -C gpu
#SBATCH -q premium
#SBATCH -t 00:10:00
#SBATCH -c 32
#SBATCH -N 384
#SBATCH -n 1536
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --exclusive
#SBATCH --gpu-bind=none
export SLURM_CPU_BIND="cores"
export MPICH_GPU_SUPPORT_ENABLED=1
export MPICH_RDMA_ENABLED_CUDA=1
export MPICH_GPU_IPC_ENABLED=1
export MPICH_GPU_EAGER_REGISTER_HOST_MEM=0
export MPICH_GPU_NO_ASYNC_MEMCPY=0
#export MPICH_SMP_SINGLE_COPY_MODE=CMA
cat << EOF > select_gpu
#!/bin/bash
export GPU_MAP=(0 1 2 3)
export NUMA_MAP=( 0 1 2 3 )
export GPU=\$SLURM_LOCALID
export NUMA=\$SLURM_LOCALID
export CUDA_VISIBLE_DEVICES=\$GPU
exec numactl -m \$NUMA -N \$NUMA \$*
EOF
chmod +x ./select_gpu
OPT="--comms-overlap --shm-mpi 0"
srun ./select_gpu ./Test_dwf_mixedcg_prec --seconds 300 --grid 128.128.128.288 --mpi 4.8.4.12 --device-mem 16000 --accelerator-threads 8 --shm 2048 $OPT > job.log
+39
@@ -0,0 +1,39 @@
#!/bin/bash
#SBATCH -A m5294_g
#m3886_g
#SBATCH -C gpu
#SBATCH -q debug
#SBATCH -t 0:30:00
#SBATCH -c 32
#SBATCH -N 4
#SBATCH -n 16
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --exclusive
#SBATCH --gpu-bind=none
export SLURM_CPU_BIND="cores"
export MPICH_GPU_SUPPORT_ENABLED=1
export MPICH_RDMA_ENABLED_CUDA=1
export MPICH_GPU_IPC_ENABLED=1
export MPICH_GPU_EAGER_REGISTER_HOST_MEM=0
export MPICH_GPU_NO_ASYNC_MEMCPY=0
#export MPICH_SMP_SINGLE_COPY_MODE=CMA
cat << EOF > select_gpu
#!/bin/bash
export GPU_MAP=(0 1 2 3)
export NUMA_MAP=( 0 1 2 3 )
export GPU=\$SLURM_LOCALID
export NUMA=\$SLURM_LOCALID
export CUDA_VISIBLE_DEVICES=\$GPU
exec numactl -m \$NUMA -N \$NUMA \$*
EOF
chmod +x ./select_gpu
OPT="--comms-sequential --shm-mpi 0"
VOL=64.64.32.32
srun ./select_gpu ./benchmarks/Benchmark_usqcd --mpi 2.2.2.2 --grid $VOL --device-mem 16000 --accelerator-threads 8 --shm 2048 $OPT > usqcd.log
+4 -3
@@ -25,6 +25,9 @@ Author: Peter Boyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_tests_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -273,8 +276,6 @@ void TestWhat(What & Ddwf,
err = phi-chi;
std::cout<<GridLogMessage << "norm diff "<< norm2(err)<< std::endl;
}
#endif
@@ -30,6 +30,9 @@ Author: Peter Boyle <paboyle@ph.ed.ac.uk>
* Reimplement the badly named "multigrid" lanczos as compressed Lanczos using the features
* in Grid that were intended to be used to support blocked Aggregates, from
*/
#include "disable_tests_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
#include <Grid/algorithms/iterative/ImplicitlyRestartedLanczos.h>
#include <Grid/algorithms/iterative/LocalCoherenceLanczos.h>
@@ -256,3 +259,4 @@ int main (int argc, char ** argv) {
Grid_finalize();
}
#endif
+5
@@ -25,6 +25,9 @@ Author: Peter Boyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_tests_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -237,3 +240,5 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+5
@@ -25,6 +25,9 @@ Author: Peter Boyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_tests_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -222,3 +225,5 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
+4
@@ -25,6 +25,9 @@ Author: Peter Boyle <paboyle@ph.ed.ac.uk>
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include "disable_tests_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
using namespace std;
@@ -118,3 +121,4 @@ int main (int argc, char ** argv)
Grid_finalize();
}
#endif
#endif
+4
@@ -24,6 +24,8 @@ with this program; if not, write to the Free Software Foundation, Inc.,
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
#include "disable_tests_without_instantiations.h"
#ifdef ENABLE_FERMION_INSTANTIATIONS
#include <Grid/Grid.h>
#include <Grid/qcd/utils/A2Autils.h>
@@ -157,3 +159,5 @@ int main(int argc, char *argv[])
return EXIT_SUCCESS;
}
#endif
+30 -13
@@ -128,6 +128,10 @@ int main (int argc, char ** argv)
typedef HermOpAdaptor<LatticeFermionD> HermFineMatrix;
HermFineMatrix FineHermOp(HermOpEO);
LatticeFermionD src(FrbGrid);
src = ComplexD(1.0);
PowerMethod<LatticeFermionD> PM; PM(HermOpEO,src);
////////////////////////////////////////////////////////////
///////////// Coarse basis and Little Dirac Operator ///////
////////////////////////////////////////////////////////////
@@ -150,7 +154,7 @@ int main (int argc, char ** argv)
std::cout << "**************************************"<<std::endl;
std::cout << "Create Subspace"<<std::endl;
std::cout << "**************************************"<<std::endl;
Aggregates.CreateSubspaceChebyshevNew(RNG5,HermOpEO,95.);
Aggregates.CreateSubspaceChebyshev(RNG5,HermOpEO,nbasis,35.,0.01,500);// <== last run
std::cout << "**************************************"<<std::endl;
std::cout << "Refine Subspace"<<std::endl;
@@ -185,7 +189,7 @@ int main (int argc, char ** argv)
std::cout << "**************************************"<<std::endl;
typedef HermitianLinearOperator<MultiGeneralCoarsenedMatrix_t,CoarseVector> MrhsHermMatrix;
Chebyshev<CoarseVector> IRLCheby(0.05,40.0,101); // 1 iter
Chebyshev<CoarseVector> IRLCheby(0.01,16.0,201); // 1 iter
MrhsHermMatrix MrhsCoarseOp (mrhs);
CoarseVector pm_src(CoarseMrhs);
@@ -193,10 +197,10 @@ int main (int argc, char ** argv)
PowerMethod<CoarseVector> cPM;
cPM(MrhsCoarseOp,pm_src);
int Nk=nrhs;
int Nm=Nk*3;
// int Nk=36;
// int Nm=144;
// int Nk=16;
// int Nm=Nk*3;
int Nk=32;
int Nm=128;
int Nstop=Nk;
int Nconv_test_interval=1;
@@ -210,7 +214,7 @@ int main (int argc, char ** argv)
nrhs,
Nk,
Nm,
1e-4,10);
1e-4,100);
int Nconv;
std::vector<RealD> eval(Nm);
@@ -231,8 +235,6 @@ int main (int argc, char ** argv)
std::cout << "**************************************"<<std::endl;
std::cout << " Recompute coarse evecs "<<std::endl;
std::cout << "**************************************"<<std::endl;
evec.resize(Nm,Coarse5d);
eval.resize(Nm);
for(int r=0;r<nrhs;r++){
random(CRNG,c_src[r]);
}
@@ -243,7 +245,7 @@ int main (int argc, char ** argv)
// Deflation guesser object
///////////////////////
std::cout << "**************************************"<<std::endl;
std::cout << " Reimport coarse evecs "<<std::endl;
std::cout << " Reimport coarse evecs "<<evec.size()<<" "<<eval.size()<<std::endl;
std::cout << "**************************************"<<std::endl;
MultiRHSDeflation<CoarseVector> MrhsGuesser;
MrhsGuesser.ImportEigenBasis(evec,eval);
@@ -252,9 +254,11 @@ int main (int argc, char ** argv)
// Extra HDCG parameters
//////////////////////////
int maxit=3000;
ConjugateGradient<CoarseVector> CG(2.0e-1,maxit,false);
RealD lo=2.0;
int ord = 9;
// ConjugateGradient<CoarseVector> CG(2.0e-1,maxit,false);
// ConjugateGradient<CoarseVector> CG(1.0e-2,maxit,false);
ConjugateGradient<CoarseVector> CG(5.0e-2,maxit,false);
RealD lo=0.2;
int ord = 7;
DoNothingGuesser<CoarseVector> DoNothing;
HPDSolver<CoarseVector> HPDSolveMrhs(MrhsCoarseOp,CG,DoNothing);
@@ -300,6 +304,19 @@ int main (int argc, char ** argv)
ConjugateGradient<LatticeFermionD> CGfine(1.0e-8,30000,false);
CGfine(HermOpEO, src, result);
}
{
std::cout << "**************************************"<<std::endl;
std::cout << "Calling MdagM CG"<<std::endl;
std::cout << "**************************************"<<std::endl;
LatticeFermion result(FGrid); result=Zero();
LatticeFermion src(FGrid); random(RNG5,src);
result=Zero();
MdagMLinearOperator<MobiusFermionD, LatticeFermionD> HermOp(Ddwf);
ConjugateGradient<LatticeFermionD> CGfine(1.0e-8,30000,false);
CGfine(HermOp, src, result);
}
#endif
Grid_finalize();
return 0;
+5 -2
@@ -368,7 +368,10 @@ int main (int argc, char ** argv)
TrivialPrecon<CoarseVector> simple;
NonHermitianLinearOperator<LittleDiracOperator,CoarseVector> LinOpCoarse(LittleDiracOpPV);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-4, 100, LinOpCoarse,simple,10,10);
PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(3.0e-2, 100, LinOpCoarse,simple,10,10);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(3.0e-2, 100, LinOpCoarse,simple,12,12); // 35 outer
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(5.0e-2, 100, LinOpCoarse,simple,12,12); // 36 outer, 12s
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-1, 100, LinOpCoarse,simple,12,12); // 36 ; 11s
PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(3.0e-1, 100, LinOpCoarse,simple,12,12);
L2PGCR.Level(3);
c_res=Zero();
L2PGCR(c_src,c_res);
@@ -400,7 +403,7 @@ int main (int argc, char ** argv)
LinOpCoarse,
L2PGCR);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermion> L1PGCR(1.0e-8,1000,PVdagM,TwoLevelPrecon,16,16);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermion> L1PGCR(1.0e-8,100,PVdagM,TwoLevelPrecon,10,10);
L1PGCR.Level(1);
f_res=Zero();
@@ -0,0 +1,493 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./tests/Test_padded_cell.cc
Copyright (C) 2023
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include <Grid/lattice/PaddedCell.h>
#include <Grid/stencil/GeneralLocalStencil.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidual.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidualNonHermitian.h>
#include <Grid/algorithms/iterative/BiCGSTAB.h>
using namespace std;
using namespace Grid;
template<class Matrix,class Field>
class PVdagMLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
Matrix &_PV;
public:
PVdagMLinearOperator(Matrix &Mat,Matrix &PV): _Mat(Mat),_PV(PV){};
void OpDiag (const Field &in, Field &out) { assert(0); }
void OpDir (const Field &in, Field &out,int dir,int disp) { assert(0); }
void OpDirAll (const Field &in, std::vector<Field> &out){ assert(0); };
void Op (const Field &in, Field &out){
// std::cout << GridLogMessage<< "Op: PVdag M "<<std::endl;
Field tmp(in.Grid());
_Mat.M(in,tmp);
_PV.Mdag(tmp,out);
}
void AdjOp (const Field &in, Field &out){
// std::cout << GridLogMessage<<"AdjOp: Mdag PV "<<std::endl;
Field tmp(in.Grid());
_PV.M(in,tmp);
_Mat.Mdag(tmp,out);
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
assert(0);
}
void HermOp(const Field &in, Field &out){
// std::cout <<GridLogMessage<< "HermOp: Mdag PV PVdag M"<<std::endl;
Field tmp(in.Grid());
Op(in,tmp);
AdjOp(tmp,out);
// std::cout << "HermOp done "<<norm2(out)<<std::endl;
}
};
template<class Matrix,class Field>
class MdagPVLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
Matrix &_PV;
public:
MdagPVLinearOperator(Matrix &Mat,Matrix &PV): _Mat(Mat),_PV(PV){};
void OpDiag (const Field &in, Field &out) { assert(0); }
void OpDir (const Field &in, Field &out,int dir,int disp) { assert(0); }
void OpDirAll (const Field &in, std::vector<Field> &out){ assert(0); };
void Op (const Field &in, Field &out){
Field tmp(in.Grid());
// std::cout <<GridLogMessage<< "Op: Mdag PV "<<std::endl;
_PV.M(in,tmp);
_Mat.Mdag(tmp,out);
}
void AdjOp (const Field &in, Field &out){
// std::cout <<GridLogMessage<< "AdjOp: PVdag M "<<std::endl;
Field tmp(in.Grid());
_Mat.M(in,tmp);
_PV.Mdag(tmp,out);
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
assert(0);
}
void HermOp(const Field &in, Field &out){
// std::cout << GridLogMessage<<"HermOp: PVdag M Mdag PV "<<std::endl;
Field tmp(in.Grid());
Op(in,tmp);
AdjOp(tmp,out);
// std::cout << "HermOp done "<<norm2(out)<<std::endl;
}
};
template<class Matrix,class Field>
class ShiftedPVdagMLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
Matrix &_PV;
RealD shift;
public:
ShiftedPVdagMLinearOperator(RealD _shift,Matrix &Mat,Matrix &PV): shift(_shift),_Mat(Mat),_PV(PV){};
void OpDiag (const Field &in, Field &out) { assert(0); }
void OpDir (const Field &in, Field &out,int dir,int disp) { assert(0); }
void OpDirAll (const Field &in, std::vector<Field> &out){ assert(0); };
void Op (const Field &in, Field &out){
// std::cout << "Op: PVdag M "<<std::endl;
Field tmp(in.Grid());
_Mat.M(in,tmp);
_PV.Mdag(tmp,out);
out = out + shift * in;
}
void AdjOp (const Field &in, Field &out){
// std::cout << "AdjOp: Mdag PV "<<std::endl;
Field tmp(in.Grid());
_PV.M(in,tmp);
_Mat.Mdag(tmp,out);
out = out + shift * in;
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){ assert(0); }
void HermOp(const Field &in, Field &out){
// std::cout << "HermOp: Mdag PV PVdag M"<<std::endl;
Field tmp(in.Grid());
Op(in,tmp);
AdjOp(tmp,out);
}
};
template<class Fobj,class CComplex,int nbasis>
class MGPreconditionerSVD : public LinearFunction< Lattice<Fobj> > {
public:
using LinearFunction<Lattice<Fobj> >::operator();
typedef Aggregation<Fobj,CComplex,nbasis> Aggregates;
typedef typename Aggregation<Fobj,CComplex,nbasis>::FineField FineField;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseVector CoarseVector;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseMatrix CoarseMatrix;
typedef LinearOperatorBase<FineField> FineOperator;
typedef LinearFunction <FineField> FineSmoother;
typedef LinearOperatorBase<CoarseVector> CoarseOperator;
typedef LinearFunction <CoarseVector> CoarseSolver;
Aggregates & _FineToCoarse;
Aggregates & _CoarseToFine;
FineOperator & _FineOperator;
FineSmoother & _PreSmoother;
FineSmoother & _PostSmoother;
CoarseOperator & _CoarseOperator;
CoarseSolver & _CoarseSolve;
int level; void Level(int lv) {level = lv; };
MGPreconditionerSVD(Aggregates &FtoC,
Aggregates &CtoF,
FineOperator &Fine,
FineSmoother &PreSmoother,
FineSmoother &PostSmoother,
CoarseOperator &CoarseOperator_,
CoarseSolver &CoarseSolve_)
: _FineToCoarse(FtoC),
_CoarseToFine(CtoF),
_FineOperator(Fine),
_PreSmoother(PreSmoother),
_PostSmoother(PostSmoother),
_CoarseOperator(CoarseOperator_),
_CoarseSolve(CoarseSolve_),
level(1) { }
virtual void operator()(const FineField &in, FineField & out)
{
GridBase *CoarseGrid = _FineToCoarse.CoarseGrid;
// auto CoarseGrid = _CoarseOperator.Grid();
CoarseVector Csrc(CoarseGrid);
CoarseVector Csol(CoarseGrid);
FineField vec1(in.Grid());
FineField vec2(in.Grid());
std::cout<<GridLogMessage << "Calling PreSmoother " <<std::endl;
// std::cout<<GridLogMessage << "Calling PreSmoother input residual "<<norm2(in) <<std::endl;
double t;
// Fine Smoother
// out = in;
out = Zero();
t=-usecond();
_PreSmoother(in,out);
t+=usecond();
std::cout<<GridLogMessage << "PreSmoother took "<< t/1000.0<< "ms" <<std::endl;
// Update the residual
_FineOperator.Op(out,vec1); sub(vec1, in ,vec1);
// std::cout<<GridLogMessage <<"Residual-1 now " <<norm2(vec1)<<std::endl;
// Fine to Coarse
t=-usecond();
_FineToCoarse.ProjectToSubspace (Csrc,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Project to coarse took "<< t/1000.0<< "ms" <<std::endl;
// Coarse correction
t=-usecond();
Csol = Zero();
_CoarseSolve(Csrc,Csol);
//Csol=Zero();
t+=usecond();
std::cout<<GridLogMessage << "Coarse solve took "<< t/1000.0<< "ms" <<std::endl;
// Coarse to Fine
t=-usecond();
// _CoarseOperator.PromoteFromSubspace(_Aggregates,Csol,vec1);
_CoarseToFine.PromoteFromSubspace(Csol,vec1);
add(out,out,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Promote to this level took "<< t/1000.0<< "ms" <<std::endl;
// Residual
_FineOperator.Op(out,vec1); sub(vec1 ,in , vec1);
// std::cout<<GridLogMessage <<"Residual-2 now " <<norm2(vec1)<<std::endl;
// Fine Smoother
t=-usecond();
// vec2=vec1;
vec2=Zero();
_PostSmoother(vec1,vec2);
t+=usecond();
std::cout<<GridLogMessage << "PostSmoother took "<< t/1000.0<< "ms" <<std::endl;
add( out,out,vec2);
std::cout<<GridLogMessage << "Done " <<std::endl;
}
};
int main (int argc, char ** argv)
{
Grid_init(&argc,&argv);
const int Ls=16;
GridCartesian * UGrid = SpaceTimeGrid::makeFourDimGrid(GridDefaultLatt(), GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridRedBlackCartesian * UrbGrid = SpaceTimeGrid::makeFourDimRedBlackGrid(UGrid);
GridCartesian * FGrid = SpaceTimeGrid::makeFiveDimGrid(Ls,UGrid);
GridRedBlackCartesian * FrbGrid = SpaceTimeGrid::makeFiveDimRedBlackGrid(Ls,UGrid);
// Construct a coarsened grid
Coordinate clatt = GridDefaultLatt();
for(int d=0;d<clatt.size();d++){
clatt[d] = clatt[d]/2;
// clatt[d] = clatt[d]/4;
}
GridCartesian *Coarse4d = SpaceTimeGrid::makeFourDimGrid(clatt, GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridCartesian *Coarse5d = SpaceTimeGrid::makeFiveDimGrid(1,Coarse4d);
std::vector<int> seeds4({1,2,3,4});
std::vector<int> seeds5({5,6,7,8});
std::vector<int> cseeds({5,6,7,8});
GridParallelRNG RNG5(FGrid); RNG5.SeedFixedIntegers(seeds5);
GridParallelRNG RNG4(UGrid); RNG4.SeedFixedIntegers(seeds4);
GridParallelRNG CRNG(Coarse5d);CRNG.SeedFixedIntegers(cseeds);
LatticeFermion src(FGrid); random(RNG5,src);
LatticeFermion result(FGrid); result=Zero();
LatticeFermion ref(FGrid); ref=Zero();
LatticeFermion tmp(FGrid);
LatticeFermion err(FGrid);
LatticeGaugeField Umu(UGrid);
FieldMetaData header;
std::string file("ckpoint_lat.4000");
NerscIO::readConfiguration(Umu,header,file);
RealD mass=0.01;
RealD M5=1.8;
DomainWallFermionD Ddwf(Umu,*FGrid,*FrbGrid,*UGrid,*UrbGrid,mass,M5);
DomainWallFermionD Dpv(Umu,*FGrid,*FrbGrid,*UGrid,*UrbGrid,1.0,M5);
const int nbasis = 30;
const int cb = 0 ;
NextToNearestStencilGeometry5D geom(Coarse5d);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
typedef PVdagMLinearOperator<DomainWallFermionD,LatticeFermionD> PVdagM_t;
typedef MdagPVLinearOperator<DomainWallFermionD,LatticeFermionD> MdagPV_t;
typedef ShiftedPVdagMLinearOperator<DomainWallFermionD,LatticeFermionD> ShiftedPVdagM_t;
PVdagM_t PVdagM(Ddwf,Dpv);
MdagPV_t MdagPV(Ddwf,Dpv);
// ShiftedPVdagM_t ShiftedPVdagM(2.0,Ddwf,Dpv); // 355
// ShiftedPVdagM_t ShiftedPVdagM(1.0,Ddwf,Dpv); // 246
// ShiftedPVdagM_t ShiftedPVdagM(0.5,Ddwf,Dpv); // 183
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // 145
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 134
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 127 -- NULL space via inverse iteration
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 57 -- NULL space via inverse iteration; 3 iterations
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // 57 , tighter inversion
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // nbasis 20 -- 49 iters
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // nbasis 20 -- 70 iters; asymmetric
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // 58; Loosen coarse, tighten fine
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 56 ...
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 51 ... with 24 vecs
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 31 ... with 24 vecs and 2^4 blocking
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 43 ... with 16 vecs and 2^4 blocking, sloppier
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 35 ... with 20 vecs and 2^4 blocking
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 35 ... with 20 vecs and 2^4 blocking, looser coarse
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 64 ... with 20 vecs, Christoph setup, and 2^4 blocking, looser coarse
ShiftedPVdagM_t ShiftedPVdagM(0.01,Ddwf,Dpv); //
// Run power method on HOA??
PowerMethod<LatticeFermion> PM;
// PM(PVdagM,src);
// PM(MdagPV,src);
// Warning: This routine calls PVdagM.Op, not PVdagM.HermOp
typedef Aggregation<vSpinColourVector,vTComplex,nbasis> Subspace;
Subspace V(Coarse5d,FGrid,cb);
Subspace U(Coarse5d,FGrid,cb);
// Breeds right singular vectors with call to HermOp (V)
V.CreateSubspaceChebyshev(RNG5,PVdagM,
nbasis,
4000.0,0.003,
500);
// Breeds left singular vectors with call to HermOp (U)
// U.CreateSubspaceChebyshev(RNG5,PVdagM,
U.CreateSubspaceChebyshev(RNG5,MdagPV,
nbasis,
4000.0,0.003,
500);
typedef Aggregation<vSpinColourVector,vTComplex,2*nbasis> CombinedSubspace;
CombinedSubspace CombinedUV(Coarse5d,FGrid,cb);
for(int b=0;b<nbasis;b++){
CombinedUV.subspace[b] = V.subspace[b];
CombinedUV.subspace[b+nbasis] = U.subspace[b];
}
int bl, br;
std::cout <<" <V| PVdagM| V> " <<std::endl;
for(bl=0;bl<nbasis;bl++){
for(br=0;br<nbasis;br++){
PVdagM.Op(V.subspace[br],src);
std::cout <<bl<<" "<<br<<"\t"<<innerProduct(V.subspace[bl],src)<<std::endl;
}}
std::cout <<" <V| PVdagM| U> " <<std::endl;
for(bl=0;bl<nbasis;bl++){
for(br=0;br<nbasis;br++){
PVdagM.Op(U.subspace[br],src);
std::cout <<bl<<" "<<br<<"\t"<<innerProduct(V.subspace[bl],src)<<std::endl;
}}
std::cout <<" <U| PVdagM| V> " <<std::endl;
for(bl=0;bl<nbasis;bl++){
for(br=0;br<nbasis;br++){
PVdagM.Op(V.subspace[br],src);
std::cout <<bl<<" "<<br<<"\t"<<innerProduct(U.subspace[bl],src)<<std::endl;
}}
std::cout <<" <U| PVdagM| U> " <<std::endl;
for(bl=0;bl<nbasis;bl++){
for(br=0;br<nbasis;br++){
PVdagM.Op(U.subspace[br],src);
std::cout <<bl<<" "<<br<<"\t"<<innerProduct(U.subspace[bl],src)<<std::endl;
}}
typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,nbasis> LittleDiracOperatorV;
typedef LittleDiracOperatorV::CoarseVector CoarseVectorV;
typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,2*nbasis> LittleDiracOperator;
typedef LittleDiracOperator::CoarseVector CoarseVector;
V.Orthogonalise();
for(int b =0 ; b<nbasis;b++){
CoarseVectorV c_src (Coarse5d);
V.ProjectToSubspace (c_src,U.subspace[b]);
V.PromoteFromSubspace(c_src,src);
std::cout << " Completeness of U in V ["<< b<<"] "<< std::sqrt(norm2(src)/norm2(U.subspace[b]))<<std::endl;
}
CoarseVector c_src (Coarse5d);
CoarseVector c_res (Coarse5d);
CoarseVector c_proj(Coarse5d);
LittleDiracOperator LittleDiracOpPV(geom,FGrid,Coarse5d);
LittleDiracOpPV.CoarsenOperator(PVdagM,CombinedUV,CombinedUV);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"Testing coarsened operator "<<std::endl;
Complex one(1.0);
c_src = one; // 1 in every element for vector 1.
blockPromote(c_src,err,CombinedUV.subspace);
LatticeFermion prom(FGrid);
prom=Zero();
for(int b=0;b<nbasis*2;b++){
prom=prom+CombinedUV.subspace[b];
}
std::cout<<GridLogMessage<<"c_src "<<norm2(c_src)<<std::endl;
std::cout<<GridLogMessage<<"prom "<<norm2(prom)<<std::endl;
PVdagM.Op(prom,tmp);
blockProject(c_proj,tmp,CombinedUV.subspace);
std::cout<<GridLogMessage<<" Called Big Dirac Op "<<norm2(tmp)<<std::endl;
LittleDiracOpPV.M(c_src,c_res);
std::cout<<GridLogMessage<<" Called Little Dirac Op c_src "<< norm2(c_src) << " c_res "<< norm2(c_res) <<std::endl;
std::cout<<GridLogMessage<<"Little dop : "<<norm2(c_res)<<std::endl;
std::cout<<GridLogMessage<<"Big dop in subspace : "<<norm2(c_proj)<<std::endl;
c_proj = c_proj - c_res;
std::cout<<GridLogMessage<<" ldop error: "<<norm2(c_proj)<<std::endl;
/**********
* Some solvers
**********
*/
///////////////////////////////////////
// Coarse grid solver test
///////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Coarse Grid Solve -- Level 3 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<CoarseVector> simple;
NonHermitianLinearOperator<LittleDiracOperator,CoarseVector> LinOpCoarse(LittleDiracOpPV);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-4, 100, LinOpCoarse,simple,10,10);
PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-2, 10, LinOpCoarse,simple,20,20);
L2PGCR.Level(3);
c_res=Zero();
L2PGCR(c_src,c_res);
////////////////////////////////////////
// Fine grid smoother
////////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Fine Grid Smoother -- Level 2 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<LatticeFermionD> simple_fine;
// NonHermitianLinearOperator<PVdagM_t,LatticeFermionD> LinOpSmooth(PVdagM);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermionD> SmootherGCR(0.01,1,ShiftedPVdagM,simple_fine,16,16);
SmootherGCR.Level(2);
LatticeFermionD f_src(FGrid);
LatticeFermionD f_res(FGrid);
f_src = one; // 1 in every element for vector 1.
f_res=Zero();
SmootherGCR(f_src,f_res);
typedef MGPreconditionerSVD<vSpinColourVector, vTComplex,nbasis*2> TwoLevelMG;
TwoLevelMG TwoLevelPrecon(CombinedUV,CombinedUV,
PVdagM,
simple_fine,
SmootherGCR,
LinOpCoarse,
L2PGCR);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermion> L1PGCR(1.0e-8,1000,PVdagM,TwoLevelPrecon,20,20);
L1PGCR.Level(1);
f_res=Zero();
L1PGCR(f_src,f_res);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage << "Done "<< std::endl;
Grid_finalize();
return 0;
}
@@ -0,0 +1,492 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./tests/Test_padded_cell.cc
Copyright (C) 2023
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include <Grid/lattice/PaddedCell.h>
#include <Grid/stencil/GeneralLocalStencil.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidual.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidualNonHermitian.h>
#include <Grid/algorithms/iterative/BiCGSTAB.h>
using namespace std;
using namespace Grid;
template<class Matrix,class Field>
class PVdagMLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
Matrix &_PV;
public:
PVdagMLinearOperator(Matrix &Mat,Matrix &PV): _Mat(Mat),_PV(PV){};
void OpDiag (const Field &in, Field &out) { assert(0); }
void OpDir (const Field &in, Field &out,int dir,int disp) { assert(0); }
void OpDirAll (const Field &in, std::vector<Field> &out){ assert(0); };
void Op (const Field &in, Field &out){
// std::cout << GridLogMessage<< "Op: PVdag M "<<std::endl;
Field tmp(in.Grid());
_Mat.M(in,tmp);
_PV.Mdag(tmp,out);
}
void AdjOp (const Field &in, Field &out){
// std::cout << GridLogMessage<<"AdjOp: Mdag PV "<<std::endl;
Field tmp(in.Grid());
_PV.M(in,tmp);
_Mat.Mdag(tmp,out);
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
HermOp(in,out);
ComplexD dot = innerProduct(in,out);
n1=real(dot);
n2=norm2(out);
}
void HermOp(const Field &in, Field &out){
// std::cout <<GridLogMessage<< "HermOp: Mdag PV PVdag M"<<std::endl;
Field tmp(in.Grid());
Op(in,tmp);
AdjOp(tmp,out);
// std::cout << "HermOp done "<<norm2(out)<<std::endl;
}
};
template<class Matrix,class Field>
class MdagPVLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
Matrix &_PV;
public:
MdagPVLinearOperator(Matrix &Mat,Matrix &PV): _Mat(Mat),_PV(PV){};
void OpDiag (const Field &in, Field &out) { assert(0); }
void OpDir (const Field &in, Field &out,int dir,int disp) { assert(0); }
void OpDirAll (const Field &in, std::vector<Field> &out){ assert(0); };
void Op (const Field &in, Field &out){
Field tmp(in.Grid());
// std::cout <<GridLogMessage<< "Op: Mdag PV "<<std::endl;
_PV.M(in,tmp);
_Mat.Mdag(tmp,out);
}
void AdjOp (const Field &in, Field &out){
// std::cout <<GridLogMessage<< "AdjOp: PVdag M "<<std::endl;
Field tmp(in.Grid());
_Mat.M(in,tmp);
_PV.Mdag(tmp,out);
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
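// Apply the operator first so that 'out' holds HermOp(in) before the norms are taken
HermOp(in,out);
ComplexD dot = innerProduct(in,out);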
n1=real(dot);
n2=norm2(out);
}
void HermOp(const Field &in, Field &out){
// std::cout << GridLogMessage<<"HermOp: PVdag M Mdag PV "<<std::endl;
Field tmp(in.Grid());
Op(in,tmp);
AdjOp(tmp,out);
// std::cout << "HermOp done "<<norm2(out)<<std::endl;
}
};
template<class Matrix,class Field>
class ShiftedPVdagMLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
Matrix &_PV;
RealD shift;
public:
ShiftedPVdagMLinearOperator(RealD _shift,Matrix &Mat,Matrix &PV): _Mat(Mat),_PV(PV),shift(_shift){};
void OpDiag (const Field &in, Field &out) { assert(0); }
void OpDir (const Field &in, Field &out,int dir,int disp) { assert(0); }
void OpDirAll (const Field &in, std::vector<Field> &out){ assert(0); };
void Op (const Field &in, Field &out){
// std::cout << "Op: PVdag M "<<std::endl;
Field tmp(in.Grid());
_Mat.M(in,tmp);
_PV.Mdag(tmp,out);
out = out + shift * in;
}
void AdjOp (const Field &in, Field &out){
// std::cout << "AdjOp: Mdag PV "<<std::endl;
Field tmp(in.Grid());
// Adjoint of (PVdag M + shift) is (Mdag PV + shift): apply PV to the input, then Mdag
_PV.M(in,tmp);
_Mat.Mdag(tmp,out);
out = out + shift * in;
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){ assert(0); }
void HermOp(const Field &in, Field &out){
// std::cout << "HermOp: Mdag PV PVdag M"<<std::endl;
Field tmp(in.Grid());
Op(in,tmp);
AdjOp(tmp,out);
}
};
template<class Fobj,class CComplex,int nbasis>
class MGPreconditionerSVD : public LinearFunction< Lattice<Fobj> > {
public:
using LinearFunction<Lattice<Fobj> >::operator();
typedef Aggregation<Fobj,CComplex,nbasis> Aggregates;
typedef typename Aggregation<Fobj,CComplex,nbasis>::FineField FineField;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseVector CoarseVector;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseMatrix CoarseMatrix;
typedef LinearOperatorBase<FineField> FineOperator;
typedef LinearFunction <FineField> FineSmoother;
typedef LinearOperatorBase<CoarseVector> CoarseOperator;
typedef LinearFunction <CoarseVector> CoarseSolver;
Aggregates & _FineToCoarse;
Aggregates & _CoarseToFine;
FineOperator & _FineOperator;
FineSmoother & _PreSmoother;
FineSmoother & _PostSmoother;
CoarseOperator & _CoarseOperator;
CoarseSolver & _CoarseSolve;
int level; void Level(int lv) {level = lv; };
MGPreconditionerSVD(Aggregates &FtoC,
Aggregates &CtoF,
FineOperator &Fine,
FineSmoother &PreSmoother,
FineSmoother &PostSmoother,
CoarseOperator &CoarseOperator_,
CoarseSolver &CoarseSolve_)
: _FineToCoarse(FtoC),
_CoarseToFine(CtoF),
_FineOperator(Fine),
_PreSmoother(PreSmoother),
_PostSmoother(PostSmoother),
_CoarseOperator(CoarseOperator_),
_CoarseSolve(CoarseSolve_),
level(1) { }
virtual void operator()(const FineField &in, FineField & out)
{
GridBase *CoarseGrid = _FineToCoarse.CoarseGrid;
// auto CoarseGrid = _CoarseOperator.Grid();
CoarseVector Csrc(CoarseGrid);
CoarseVector Csol(CoarseGrid);
FineField vec1(in.Grid());
FineField vec2(in.Grid());
std::cout<<GridLogMessage << "Calling PreSmoother " <<std::endl;
// std::cout<<GridLogMessage << "Calling PreSmoother input residual "<<norm2(in) <<std::endl;
double t;
// Fine Smoother
// out = in;
out = Zero();
t=-usecond();
_PreSmoother(in,out);
t+=usecond();
std::cout<<GridLogMessage << "PreSmoother took "<< t/1000.0<< "ms" <<std::endl;
// Update the residual
_FineOperator.Op(out,vec1); sub(vec1, in ,vec1);
// std::cout<<GridLogMessage <<"Residual-1 now " <<norm2(vec1)<<std::endl;
// Fine to Coarse
t=-usecond();
_FineToCoarse.ProjectToSubspace (Csrc,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Project to coarse took "<< t/1000.0<< "ms" <<std::endl;
// Coarse correction
t=-usecond();
Csol = Zero();
_CoarseSolve(Csrc,Csol);
//Csol=Zero();
t+=usecond();
std::cout<<GridLogMessage << "Coarse solve took "<< t/1000.0<< "ms" <<std::endl;
// Coarse to Fine
t=-usecond();
// _CoarseOperator.PromoteFromSubspace(_Aggregates,Csol,vec1);
_CoarseToFine.PromoteFromSubspace(Csol,vec1);
add(out,out,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Promote to this level took "<< t/1000.0<< "ms" <<std::endl;
// Residual
_FineOperator.Op(out,vec1); sub(vec1 ,in , vec1);
// std::cout<<GridLogMessage <<"Residual-2 now " <<norm2(vec1)<<std::endl;
// Fine Smoother
t=-usecond();
// vec2=vec1;
vec2=Zero();
_PostSmoother(vec1,vec2);
t+=usecond();
std::cout<<GridLogMessage << "PostSmoother took "<< t/1000.0<< "ms" <<std::endl;
add( out,out,vec2);
std::cout<<GridLogMessage << "Done " <<std::endl;
}
};
int main (int argc, char ** argv)
{
Grid_init(&argc,&argv);
const int Ls=16;
GridCartesian * UGrid = SpaceTimeGrid::makeFourDimGrid(GridDefaultLatt(), GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridRedBlackCartesian * UrbGrid = SpaceTimeGrid::makeFourDimRedBlackGrid(UGrid);
GridCartesian * FGrid = SpaceTimeGrid::makeFiveDimGrid(Ls,UGrid);
GridRedBlackCartesian * FrbGrid = SpaceTimeGrid::makeFiveDimRedBlackGrid(Ls,UGrid);
// Construct a coarsened grid
Coordinate clatt = GridDefaultLatt();
for(int d=0;d<clatt.size();d++){
clatt[d] = clatt[d]/2;
// clatt[d] = clatt[d]/4;
}
GridCartesian *Coarse4d = SpaceTimeGrid::makeFourDimGrid(clatt, GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridCartesian *Coarse5d = SpaceTimeGrid::makeFiveDimGrid(1,Coarse4d);
std::vector<int> seeds4({1,2,3,4});
std::vector<int> seeds5({5,6,7,8});
std::vector<int> cseeds({5,6,7,8});
GridParallelRNG RNG5(FGrid); RNG5.SeedFixedIntegers(seeds5);
GridParallelRNG RNG4(UGrid); RNG4.SeedFixedIntegers(seeds4);
GridParallelRNG CRNG(Coarse5d);CRNG.SeedFixedIntegers(cseeds);
LatticeFermion src(FGrid); random(RNG5,src);
LatticeFermion result(FGrid); result=Zero();
LatticeFermion ref(FGrid); ref=Zero();
LatticeFermion tmp(FGrid);
LatticeFermion err(FGrid);
LatticeGaugeField Umu(UGrid);
FieldMetaData header;
std::string file("ckpoint_lat.4000");
NerscIO::readConfiguration(Umu,header,file);
RealD mass=0.01;
RealD M5=1.8;
DomainWallFermionD Ddwf(Umu,*FGrid,*FrbGrid,*UGrid,*UrbGrid,mass,M5);
DomainWallFermionD Dpv(Umu,*FGrid,*FrbGrid,*UGrid,*UrbGrid,1.0,M5);
const int nbasis = 20;
const int cb = 0 ;
NextToNearestStencilGeometry5D geom(Coarse5d);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
typedef PVdagMLinearOperator<DomainWallFermionD,LatticeFermionD> PVdagM_t;
typedef MdagPVLinearOperator<DomainWallFermionD,LatticeFermionD> MdagPV_t;
typedef ShiftedPVdagMLinearOperator<DomainWallFermionD,LatticeFermionD> ShiftedPVdagM_t;
PVdagM_t PVdagM(Ddwf,Dpv);
MdagPV_t MdagPV(Ddwf,Dpv);
// ShiftedPVdagM_t ShiftedPVdagM(2.0,Ddwf,Dpv); // 355
// ShiftedPVdagM_t ShiftedPVdagM(1.0,Ddwf,Dpv); // 246
// ShiftedPVdagM_t ShiftedPVdagM(0.5,Ddwf,Dpv); // 183
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // 145
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 134
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 127 -- NULL space via inverse iteration
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 57 -- NULL space via inverse iteration; 3 iterations
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // 57 , tighter inversion
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // nbasis 20 -- 49 iters
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // nbasis 20 -- 70 iters; asymmetric
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // 58; Loosen coarse, tighten fine
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 56 ...
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 51 ... with 24 vecs
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 31 ... with 24 vecs and 2^4 blocking
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 43 ... with 16 vecs and 2^4 blocking, sloppier
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 35 ... with 20 vecs and 2^4 blocking
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 35 ... with 20 vecs and 2^4 blocking, looser coarse
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 64 ... with 20 vecs, Christoph setup, and 2^4 blocking, looser coarse
ShiftedPVdagM_t ShiftedPVdagM(0.01,Ddwf,Dpv); //
// Run power method on HOA??
PowerMethod<LatticeFermion> PM;
// PM(PVdagM,src);
// PM(MdagPV,src);
// Warning: This routine calls PVdagM.Op, not PVdagM.HermOp
typedef Aggregation<vSpinColourVector,vTComplex,nbasis> Subspace;
Subspace V(Coarse5d,FGrid,cb);
Subspace U(Coarse5d,FGrid,cb);
// Breeds right singular vectors with call to HermOp (V)
V.CreateSubspace(RNG5,PVdagM,nbasis);
// Breeds left singular vectors with call to HermOp (U)
// U.CreateSubspaceChebyshev(RNG5,MdagPV,
U.CreateSubspace(RNG5,PVdagM,nbasis);
typedef Aggregation<vSpinColourVector,vTComplex,2*nbasis> CombinedSubspace;
CombinedSubspace CombinedUV(Coarse5d,FGrid,cb);
for(int b=0;b<nbasis;b++){
CombinedUV.subspace[b] = V.subspace[b];
CombinedUV.subspace[b+nbasis] = U.subspace[b];
}
int bl, br;
std::cout <<" <V| PVdagM| V> " <<std::endl;
for(bl=0;bl<nbasis;bl++){
for(br=0;br<nbasis;br++){
PVdagM.Op(V.subspace[br],src);
std::cout <<bl<<" "<<br<<"\t"<<innerProduct(V.subspace[bl],src)<<std::endl;
}}
std::cout <<" <V| PVdagM| U> " <<std::endl;
for(bl=0;bl<nbasis;bl++){
for(br=0;br<nbasis;br++){
PVdagM.Op(U.subspace[br],src);
std::cout <<bl<<" "<<br<<"\t"<<innerProduct(V.subspace[bl],src)<<std::endl;
}}
std::cout <<" <U| PVdagM| V> " <<std::endl;
for(bl=0;bl<nbasis;bl++){
for(br=0;br<nbasis;br++){
PVdagM.Op(V.subspace[br],src);
std::cout <<bl<<" "<<br<<"\t"<<innerProduct(U.subspace[bl],src)<<std::endl;
}}
std::cout <<" <U| PVdagM| U> " <<std::endl;
for(bl=0;bl<nbasis;bl++){
for(br=0;br<nbasis;br++){
PVdagM.Op(U.subspace[br],src);
std::cout <<bl<<" "<<br<<"\t"<<innerProduct(U.subspace[bl],src)<<std::endl;
}}
typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,nbasis> LittleDiracOperatorV;
typedef LittleDiracOperatorV::CoarseVector CoarseVectorV;
typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,2*nbasis> LittleDiracOperator;
typedef LittleDiracOperator::CoarseVector CoarseVector;
V.Orthogonalise();
for(int b =0 ; b<nbasis;b++){
CoarseVectorV c_src (Coarse5d);
V.ProjectToSubspace (c_src,U.subspace[b]);
V.PromoteFromSubspace(c_src,src);
std::cout << " Completeness of U in V ["<< b<<"] "<< std::sqrt(norm2(src)/norm2(U.subspace[b]))<<std::endl;
}
CoarseVector c_src (Coarse5d);
CoarseVector c_res (Coarse5d);
CoarseVector c_proj(Coarse5d);
LittleDiracOperator LittleDiracOpPV(geom,FGrid,Coarse5d);
LittleDiracOpPV.CoarsenOperator(PVdagM,CombinedUV,CombinedUV);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"Testing coarsened operator "<<std::endl;
Complex one(1.0);
c_src = one; // 1 in every element for vector 1.
blockPromote(c_src,err,CombinedUV.subspace);
LatticeFermion prom(FGrid);
prom=Zero();
for(int b=0;b<nbasis*2;b++){
prom=prom+CombinedUV.subspace[b];
}
std::cout<<GridLogMessage<<"c_src "<<norm2(c_src)<<std::endl;
std::cout<<GridLogMessage<<"prom "<<norm2(prom)<<std::endl;
PVdagM.Op(prom,tmp);
blockProject(c_proj,tmp,CombinedUV.subspace);
std::cout<<GridLogMessage<<" Called Big Dirac Op "<<norm2(tmp)<<std::endl;
LittleDiracOpPV.M(c_src,c_res);
std::cout<<GridLogMessage<<" Called Little Dirac Op c_src "<< norm2(c_src) << " c_res "<< norm2(c_res) <<std::endl;
std::cout<<GridLogMessage<<"Little dop : "<<norm2(c_res)<<std::endl;
std::cout<<GridLogMessage<<"Big dop in subspace : "<<norm2(c_proj)<<std::endl;
c_proj = c_proj - c_res;
std::cout<<GridLogMessage<<" ldop error: "<<norm2(c_proj)<<std::endl;
/**********
* Some solvers
**********
*/
///////////////////////////////////////
// Coarse grid solver test
///////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Coarse Grid Solve -- Level 3 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<CoarseVector> simple;
NonHermitianLinearOperator<LittleDiracOperator,CoarseVector> LinOpCoarse(LittleDiracOpPV);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-4, 100, LinOpCoarse,simple,10,10);
PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-2, 10, LinOpCoarse,simple,20,20);
L2PGCR.Level(3);
c_res=Zero();
L2PGCR(c_src,c_res);
////////////////////////////////////////
// Fine grid smoother
////////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Fine Grid Smoother -- Level 2 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<LatticeFermionD> simple_fine;
// NonHermitianLinearOperator<PVdagM_t,LatticeFermionD> LinOpSmooth(PVdagM);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermionD> SmootherGCR(0.01,1,ShiftedPVdagM,simple_fine,16,16);
SmootherGCR.Level(2);
LatticeFermionD f_src(FGrid);
LatticeFermionD f_res(FGrid);
f_src = one; // 1 in every element for vector 1.
f_res=Zero();
SmootherGCR(f_src,f_res);
typedef MGPreconditionerSVD<vSpinColourVector, vTComplex,nbasis*2> TwoLevelMG;
TwoLevelMG TwoLevelPrecon(CombinedUV,CombinedUV,
PVdagM,
simple_fine,
SmootherGCR,
LinOpCoarse,
L2PGCR);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermion> L1PGCR(1.0e-8,1000,PVdagM,TwoLevelPrecon,20,20);
L1PGCR.Level(1);
f_res=Zero();
L1PGCR(f_src,f_res);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage << "Done "<< std::endl;
Grid_finalize();
return 0;
}
@@ -0,0 +1,479 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./tests/Test_padded_cell.cc
Copyright (C) 2023
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include <Grid/lattice/PaddedCell.h>
#include <Grid/stencil/GeneralLocalStencil.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidual.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidualNonHermitian.h>
#include <Grid/algorithms/iterative/BiCGSTAB.h>
using namespace std;
using namespace Grid;
template<class Matrix,class Field>
class PVdagMLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
Matrix &_PV;
public:
PVdagMLinearOperator(Matrix &Mat,Matrix &PV): _Mat(Mat),_PV(PV){};
void OpDiag (const Field &in, Field &out) { assert(0); }
void OpDir (const Field &in, Field &out,int dir,int disp) { assert(0); }
void OpDirAll (const Field &in, std::vector<Field> &out){ assert(0); };
void Op (const Field &in, Field &out){
// std::cout << GridLogMessage<< "Op: PVdag M "<<std::endl;
Field tmp(in.Grid());
_Mat.M(in,tmp);
_PV.Mdag(tmp,out);
}
void AdjOp (const Field &in, Field &out){
// std::cout << GridLogMessage<<"AdjOp: Mdag PV "<<std::endl;
Field tmp(in.Grid());
_PV.M(in,tmp);
_Mat.Mdag(tmp,out);
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
assert(0);
}
void HermOp(const Field &in, Field &out){
// std::cout <<GridLogMessage<< "HermOp: Mdag PV PVdag M"<<std::endl;
Field tmp(in.Grid());
Op(in,tmp);
AdjOp(tmp,out);
// std::cout << "HermOp done "<<norm2(out)<<std::endl;
}
};
template<class Matrix,class Field>
class MdagPVLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
Matrix &_PV;
public:
MdagPVLinearOperator(Matrix &Mat,Matrix &PV): _Mat(Mat),_PV(PV){};
void OpDiag (const Field &in, Field &out) { assert(0); }
void OpDir (const Field &in, Field &out,int dir,int disp) { assert(0); }
void OpDirAll (const Field &in, std::vector<Field> &out){ assert(0); };
void Op (const Field &in, Field &out){
Field tmp(in.Grid());
// std::cout <<GridLogMessage<< "Op: PVdag M "<<std::endl;
_PV.M(in,tmp);
_Mat.Mdag(tmp,out);
}
void AdjOp (const Field &in, Field &out){
// std::cout <<GridLogMessage<< "AdjOp: PVdag M "<<std::endl;
Field tmp(in.Grid());
_Mat.M(in,tmp);
_PV.Mdag(tmp,out);
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
assert(0);
}
void HermOp(const Field &in, Field &out){
// std::cout << GridLogMessage<<"HermOp: PVdag M Mdag PV "<<std::endl;
Field tmp(in.Grid());
Op(in,tmp);
AdjOp(tmp,out);
// std::cout << "HermOp done "<<norm2(out)<<std::endl;
}
};
template<class Matrix,class Field>
class ShiftedPVdagMLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
Matrix &_PV;
RealD shift;
public:
ShiftedPVdagMLinearOperator(RealD _shift,Matrix &Mat,Matrix &PV): _Mat(Mat),_PV(PV),shift(_shift){};
void OpDiag (const Field &in, Field &out) { assert(0); }
void OpDir (const Field &in, Field &out,int dir,int disp) { assert(0); }
void OpDirAll (const Field &in, std::vector<Field> &out){ assert(0); };
void Op (const Field &in, Field &out){
// std::cout << "Op: PVdag M "<<std::endl;
Field tmp(in.Grid());
_Mat.M(in,tmp);
_PV.Mdag(tmp,out);
out = out + shift * in;
}
void AdjOp (const Field &in, Field &out){
// std::cout << "AdjOp: Mdag PV "<<std::endl;
Field tmp(in.Grid());
// Adjoint of Op: (PVdag M + shift)^dag = Mdag PV + shift
_PV.M(in,tmp);
_Mat.Mdag(tmp,out);
out = out + shift * in;
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){ assert(0); }
void HermOp(const Field &in, Field &out){
// std::cout << "HermOp: Mdag PV PVdag M"<<std::endl;
Field tmp(in.Grid());
Op(in,tmp);
AdjOp(tmp,out);
}
};
template<class Fobj,class CComplex,int nbasis>
class MGPreconditionerSVD : public LinearFunction< Lattice<Fobj> > {
public:
using LinearFunction<Lattice<Fobj> >::operator();
typedef Aggregation<Fobj,CComplex,nbasis> Aggregates;
typedef typename Aggregation<Fobj,CComplex,nbasis>::FineField FineField;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseVector CoarseVector;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseMatrix CoarseMatrix;
typedef LinearOperatorBase<FineField> FineOperator;
typedef LinearFunction <FineField> FineSmoother;
typedef LinearOperatorBase<CoarseVector> CoarseOperator;
typedef LinearFunction <CoarseVector> CoarseSolver;
///////////////////////////////
// SVD is M = U S Vdag
//
// Define a subset of Vc and Uc in Complex_f,c matrix
// - these are the coarsening, non-square matrices
//
// Solve a coarse approx to
//
// M psi = eta
//
// via
//
// Uc^dag U S Vdag Vc Vc^dag psi = Uc^dag eta
//
// M_coarse Vc^dag psi = M_coarse psi_c = eta_c
//
///////////////////////////////
Aggregates & _U;
Aggregates & _V;
FineOperator & _FineOperator;
FineSmoother & _PreSmoother;
FineSmoother & _PostSmoother;
CoarseOperator & _CoarseOperator;
CoarseSolver & _CoarseSolve;
int level; void Level(int lv) {level = lv; };
MGPreconditionerSVD(Aggregates &U,
Aggregates &V,
FineOperator &Fine,
FineSmoother &PreSmoother,
FineSmoother &PostSmoother,
CoarseOperator &CoarseOperator_,
CoarseSolver &CoarseSolve_)
: _U(U),
_V(V),
_FineOperator(Fine),
_PreSmoother(PreSmoother),
_PostSmoother(PostSmoother),
_CoarseOperator(CoarseOperator_),
_CoarseSolve(CoarseSolve_),
level(1) { }
virtual void operator()(const FineField &in, FineField & out)
{
GridBase *CoarseGrid = _U.CoarseGrid;
// auto CoarseGrid = _CoarseOperator.Grid();
CoarseVector Csrc(CoarseGrid);
CoarseVector Csol(CoarseGrid);
FineField vec1(in.Grid());
FineField vec2(in.Grid());
std::cout<<GridLogMessage << "Calling PreSmoother " <<std::endl;
// std::cout<<GridLogMessage << "Calling PreSmoother input residual "<<norm2(in) <<std::endl;
double t;
// Fine Smoother
// out = in;
out = Zero();
t=-usecond();
_PreSmoother(in,out);
t+=usecond();
std::cout<<GridLogMessage << "PreSmoother took "<< t/1000.0<< "ms" <<std::endl;
// Update the residual
_FineOperator.Op(out,vec1); sub(vec1, in ,vec1);
// std::cout<<GridLogMessage <<"Residual-1 now " <<norm2(vec1)<<std::endl;
// Uc^dag U S Vdag Vc Vc^dag psi = Uc^dag eta
// Fine to Coarse
t=-usecond();
_U.ProjectToSubspace (Csrc,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Project to coarse took "<< t/1000.0<< "ms" <<std::endl;
// Coarse correction
t=-usecond();
Csol = Zero();
_CoarseSolve(Csrc,Csol);
//Csol=Zero();
t+=usecond();
std::cout<<GridLogMessage << "Coarse solve took "<< t/1000.0<< "ms" <<std::endl;
// Coarse to Fine
t=-usecond();
// _CoarseOperator.PromoteFromSubspace(_Aggregates,Csol,vec1);
_V.PromoteFromSubspace(Csol,vec1);
add(out,out,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Promote to this level took "<< t/1000.0<< "ms" <<std::endl;
// Residual
_FineOperator.Op(out,vec1); sub(vec1 ,in , vec1);
// std::cout<<GridLogMessage <<"Residual-2 now " <<norm2(vec1)<<std::endl;
// Fine Smoother
t=-usecond();
// vec2=vec1;
vec2=Zero();
_PostSmoother(vec1,vec2);
t+=usecond();
std::cout<<GridLogMessage << "PostSmoother took "<< t/1000.0<< "ms" <<std::endl;
add( out,out,vec2);
std::cout<<GridLogMessage << "Done " <<std::endl;
}
};
int main (int argc, char ** argv)
{
Grid_init(&argc,&argv);
const int Ls=16;
GridCartesian * UGrid = SpaceTimeGrid::makeFourDimGrid(GridDefaultLatt(), GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridRedBlackCartesian * UrbGrid = SpaceTimeGrid::makeFourDimRedBlackGrid(UGrid);
GridCartesian * FGrid = SpaceTimeGrid::makeFiveDimGrid(Ls,UGrid);
GridRedBlackCartesian * FrbGrid = SpaceTimeGrid::makeFiveDimRedBlackGrid(Ls,UGrid);
// Construct a coarsened grid
Coordinate clatt = GridDefaultLatt();
for(int d=0;d<clatt.size();d++){
clatt[d] = clatt[d]/2;
// clatt[d] = clatt[d]/4;
}
GridCartesian *Coarse4d = SpaceTimeGrid::makeFourDimGrid(clatt, GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridCartesian *Coarse5d = SpaceTimeGrid::makeFiveDimGrid(1,Coarse4d);
std::vector<int> seeds4({1,2,3,4});
std::vector<int> seeds5({5,6,7,8});
std::vector<int> cseeds({5,6,7,8});
GridParallelRNG RNG5(FGrid); RNG5.SeedFixedIntegers(seeds5);
GridParallelRNG RNG4(UGrid); RNG4.SeedFixedIntegers(seeds4);
GridParallelRNG CRNG(Coarse5d);CRNG.SeedFixedIntegers(cseeds);
LatticeFermion src(FGrid); random(RNG5,src);
LatticeFermion result(FGrid); result=Zero();
LatticeFermion ref(FGrid); ref=Zero();
LatticeFermion tmp(FGrid);
LatticeFermion err(FGrid);
LatticeGaugeField Umu(UGrid);
FieldMetaData header;
std::string file("ckpoint_lat.4000");
NerscIO::readConfiguration(Umu,header,file);
RealD mass=0.01;
RealD M5=1.8;
DomainWallFermionD Ddwf(Umu,*FGrid,*FrbGrid,*UGrid,*UrbGrid,mass,M5);
DomainWallFermionD Dpv(Umu,*FGrid,*FrbGrid,*UGrid,*UrbGrid,1.0,M5);
const int nbasis = 60;
const int cb = 0 ;
NextToNearestStencilGeometry5D geom(Coarse5d);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
typedef PVdagMLinearOperator<DomainWallFermionD,LatticeFermionD> PVdagM_t;
typedef MdagPVLinearOperator<DomainWallFermionD,LatticeFermionD> MdagPV_t;
typedef ShiftedPVdagMLinearOperator<DomainWallFermionD,LatticeFermionD> ShiftedPVdagM_t;
PVdagM_t PVdagM(Ddwf,Dpv);
MdagPV_t MdagPV(Ddwf,Dpv);
// ShiftedPVdagM_t ShiftedPVdagM(2.0,Ddwf,Dpv); // 355
// ShiftedPVdagM_t ShiftedPVdagM(1.0,Ddwf,Dpv); // 246
// ShiftedPVdagM_t ShiftedPVdagM(0.5,Ddwf,Dpv); // 183
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // 145
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 134
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 127 -- NULL space via inverse iteration
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 57 -- NULL space via inverse iteration; 3 iterations
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // 57 , tighter inversion
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // nbasis 20 -- 49 iters
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // nbasis 20 -- 70 iters; asymmetric
// ShiftedPVdagM_t ShiftedPVdagM(0.25,Ddwf,Dpv); // 58; Loosen coarse, tighten fine
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 56 ...
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 51 ... with 24 vecs
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 31 ... with 24 vecs and 2^4 blocking
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 43 ... with 16 vecs and 2^4 blocking, sloppier
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 35 ... with 20 vecs and 2^4 blocking
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 35 ... with 20 vecs and 2^4 blocking, looser coarse
// ShiftedPVdagM_t ShiftedPVdagM(0.1,Ddwf,Dpv); // 64 ... with 20 vecs, Christoph setup, and 2^4 blocking, looser coarse
ShiftedPVdagM_t ShiftedPVdagM(0.01,Ddwf,Dpv); //
// Run power method on HOA??
PowerMethod<LatticeFermion> PM;
PM(PVdagM,src);
PM(MdagPV,src);
// Warning: This routine calls PVdagM.Op, not PVdagM.HermOp
typedef Aggregation<vSpinColourVector,vTComplex,nbasis> Subspace;
Subspace V(Coarse5d,FGrid,cb);
// Subspace U(Coarse5d,FGrid,cb);
// Breeds right singular vectors with call to HermOp
V.CreateSubspaceChebyshev(RNG5,PVdagM,
nbasis,
4000.0,0.003,
300);
// Breeds left singular vectors with call to HermOp
// U.CreateSubspaceChebyshev(RNG5,MdagPV,
// nbasis,
// 4000.0,0.003,
// 300);
// U.subspace=V.subspace;
// typedef Aggregation<vSpinColourVector,vTComplex,2*nbasis> CombinedSubspace;
// CombinedSubspace CombinedUV(Coarse5d,FGrid,cb);
// for(int b=0;b<nbasis;b++){
// CombinedUV.subspace[b] = V.subspace[b];
// CombinedUV.subspace[b+nbasis] = U.subspace[b];
// }
// typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,2*nbasis> LittleDiracOperator;
typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,nbasis> LittleDiracOperator;
typedef LittleDiracOperator::CoarseVector CoarseVector;
LittleDiracOperator LittleDiracOpPV(geom,FGrid,Coarse5d);
LittleDiracOpPV.CoarsenOperator(PVdagM,V,V);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"Testing coarsened operator "<<std::endl;
CoarseVector c_src (Coarse5d);
CoarseVector c_res (Coarse5d);
CoarseVector c_proj(Coarse5d);
Complex one(1.0);
c_src = one; // 1 in every element for vector 1.
// blockPromote(c_src,err,CoarseToFine.subspace);
LatticeFermion prom(FGrid);
prom=Zero();
for(int b=0;b<nbasis;b++){
prom=prom+V.subspace[b];
}
std::cout<<GridLogMessage<<"c_src "<<norm2(c_src)<<std::endl;
std::cout<<GridLogMessage<<"prom "<<norm2(prom)<<std::endl;
PVdagM.Op(prom,tmp);
blockProject(c_proj,tmp,V.subspace);
std::cout<<GridLogMessage<<" Called Big Dirac Op "<<norm2(tmp)<<std::endl;
LittleDiracOpPV.M(c_src,c_res);
std::cout<<GridLogMessage<<" Called Little Dirac Op c_src "<< norm2(c_src) << " c_res "<< norm2(c_res) <<std::endl;
std::cout<<GridLogMessage<<"Little dop : "<<norm2(c_res)<<std::endl;
std::cout<<GridLogMessage<<"Big dop in subspace : "<<norm2(c_proj)<<std::endl;
c_proj = c_proj - c_res;
std::cout<<GridLogMessage<<" ldop error: "<<norm2(c_proj)<<std::endl;
/**********
* Some solvers
**********
*/
///////////////////////////////////////
// Coarse grid solver test
///////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Coarse Grid Solve -- Level 3 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<CoarseVector> simple;
NonHermitianLinearOperator<LittleDiracOperator,CoarseVector> LinOpCoarse(LittleDiracOpPV);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-4, 100, LinOpCoarse,simple,10,10);
PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L3PGCR(1.0e-4, 10, LinOpCoarse,simple,20,20);
L3PGCR.Level(3);
c_res=Zero();
L3PGCR(c_src,c_res);
////////////////////////////////////////
// Fine grid smoother
////////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Fine Grid Smoother -- Level 2 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<LatticeFermionD> simple_fine;
// NonHermitianLinearOperator<PVdagM_t,LatticeFermionD> LinOpSmooth(PVdagM);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermionD> SmootherGCR(0.01,1,ShiftedPVdagM,simple_fine,16,16);
SmootherGCR.Level(2);
LatticeFermionD f_src(FGrid);
LatticeFermionD f_res(FGrid);
f_src = one; // 1 in every element for vector 1.
f_res=Zero();
SmootherGCR(f_src,f_res);
// typedef MGPreconditionerSVD<vSpinColourVector, vTComplex,nbasis*2> TwoLevelMG;
typedef MGPreconditionerSVD<vSpinColourVector, vTComplex,nbasis> TwoLevelMG;
// TwoLevelMG TwoLevelPrecon(CombinedUV,CombinedUV,
TwoLevelMG TwoLevelPrecon(V,V,
PVdagM,
simple_fine,
SmootherGCR,
LinOpCoarse,
L3PGCR);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermion> L1PGCR(1.0e-8,1000,PVdagM,TwoLevelPrecon,16,16);
L1PGCR.Level(1);
f_res=Zero();
L1PGCR(f_src,f_res);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage << "Done "<< std::endl;
Grid_finalize();
return 0;
}
@@ -0,0 +1,333 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./tests/Test_padded_cell.cc
Copyright (C) 2023
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include <Grid/lattice/PaddedCell.h>
#include <Grid/stencil/GeneralLocalStencil.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidual.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidualNonHermitian.h>
#include <Grid/algorithms/iterative/BiCGSTAB.h>
using namespace std;
using namespace Grid;
template<class Fobj,class CComplex,int nbasis>
class MGPreconditioner : public LinearFunction< Lattice<Fobj> > {
public:
using LinearFunction<Lattice<Fobj> >::operator();
typedef Aggregation<Fobj,CComplex,nbasis> Aggregates;
typedef typename Aggregation<Fobj,CComplex,nbasis>::FineField FineField;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseVector CoarseVector;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseMatrix CoarseMatrix;
typedef LinearOperatorBase<FineField> FineOperator;
typedef LinearFunction <FineField> FineSmoother;
typedef LinearOperatorBase<CoarseVector> CoarseOperator;
typedef LinearFunction <CoarseVector> CoarseSolver;
Aggregates & _Aggregates;
FineOperator & _FineOperator;
FineSmoother & _PreSmoother;
FineSmoother & _PostSmoother;
CoarseOperator & _CoarseOperator;
CoarseSolver & _CoarseSolve;
int level; void Level(int lv) {level = lv; };
MGPreconditioner(Aggregates &Agg,
FineOperator &Fine,
FineSmoother &PreSmoother,
FineSmoother &PostSmoother,
CoarseOperator &CoarseOperator_,
CoarseSolver &CoarseSolve_)
: _Aggregates(Agg),
_FineOperator(Fine),
_PreSmoother(PreSmoother),
_PostSmoother(PostSmoother),
_CoarseOperator(CoarseOperator_),
_CoarseSolve(CoarseSolve_),
level(1) { }
virtual void operator()(const FineField &in, FineField & out)
{
GridBase *CoarseGrid = _Aggregates.CoarseGrid;
// auto CoarseGrid = _CoarseOperator.Grid();
CoarseVector Csrc(CoarseGrid);
CoarseVector Csol(CoarseGrid);
FineField vec1(in.Grid());
FineField vec2(in.Grid());
std::cout<<GridLogMessage << "Calling PreSmoother " <<std::endl;
// std::cout<<GridLogMessage << "Calling PreSmoother input residual "<<norm2(in) <<std::endl;
double t;
// Fine Smoother
// out = in;
out = Zero();
t=-usecond();
_PreSmoother(in,out);
t+=usecond();
std::cout<<GridLogMessage << "PreSmoother took "<< t/1000.0<< "ms" <<std::endl;
// Update the residual
_FineOperator.Op(out,vec1); sub(vec1, in ,vec1);
// std::cout<<GridLogMessage <<"Residual-1 now " <<norm2(vec1)<<std::endl;
// Fine to Coarse
t=-usecond();
_Aggregates.ProjectToSubspace (Csrc,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Project to coarse took "<< t/1000.0<< "ms" <<std::endl;
// Coarse correction
t=-usecond();
Csol = Zero();
_CoarseSolve(Csrc,Csol);
//Csol=Zero();
t+=usecond();
std::cout<<GridLogMessage << "Coarse solve took "<< t/1000.0<< "ms" <<std::endl;
// Coarse to Fine
t=-usecond();
// _CoarseOperator.PromoteFromSubspace(_Aggregates,Csol,vec1);
_Aggregates.PromoteFromSubspace(Csol,vec1);
add(out,out,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Promote to this level took "<< t/1000.0<< "ms" <<std::endl;
// Residual
_FineOperator.Op(out,vec1); sub(vec1 ,in , vec1);
// std::cout<<GridLogMessage <<"Residual-2 now " <<norm2(vec1)<<std::endl;
// Fine Smoother
t=-usecond();
// vec2=vec1;
vec2=Zero();
_PostSmoother(vec1,vec2);
t+=usecond();
std::cout<<GridLogMessage << "PostSmoother took "<< t/1000.0<< "ms" <<std::endl;
add( out,out,vec2);
std::cout<<GridLogMessage << "Done " <<std::endl;
}
};
int main (int argc, char ** argv)
{
Grid_init(&argc,&argv);
const int Ls=16;
GridCartesian * UGrid = SpaceTimeGrid::makeFourDimGrid(GridDefaultLatt(), GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridRedBlackCartesian * UrbGrid = SpaceTimeGrid::makeFourDimRedBlackGrid(UGrid);
GridCartesian * FGrid = UGrid;
GridRedBlackCartesian * FrbGrid = UrbGrid;
// Construct a coarsened grid
Coordinate clatt = GridDefaultLatt();
for(int d=0;d<clatt.size();d++){
clatt[d] = clatt[d]/2;
//clatt[d] = clatt[d]/4;
}
GridCartesian *Coarse4d = SpaceTimeGrid::makeFourDimGrid(clatt, GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
std::vector<int> seeds4({1,2,3,4});
std::vector<int> cseeds({5,6,7,8});
GridParallelRNG RNG4(UGrid); RNG4.SeedFixedIntegers(seeds4);
GridParallelRNG CRNG(Coarse4d);CRNG.SeedFixedIntegers(cseeds);
Complex one(1.0);
LatticeFermion src(FGrid); src=one;
LatticeFermion result(FGrid); result=Zero();
LatticeFermion ref(FGrid); ref=Zero();
LatticeFermion tmp(FGrid);
LatticeFermion err(FGrid);
LatticeFermion precsrc(FGrid);
LatticeGaugeField Umu(UGrid);
FieldMetaData header;
std::string file("ckpoint_lat");
NerscIO::readConfiguration(Umu,header,file);
RealD csw =0.0;
RealD mass=-0.92;
WilsonCloverFermionD Dw(Umu,*UGrid,*UrbGrid,mass,csw,csw);
const int nbasis = 20;
const int cb = 0 ;
LatticeFermion prom(FGrid);
typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,2*nbasis> LittleDiracOperator;
typedef LittleDiracOperator::CoarseVector CoarseVector;
NearestStencilGeometry4D geom(Coarse4d);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
// Warning: This routine calls Linop.Op, not Linop.HermOp
typedef Aggregation<vSpinColourVector,vTComplex,nbasis> Subspace;
Subspace Aggregates(Coarse4d,FGrid,cb);
NonHermitianLinearOperator<WilsonCloverFermionD,LatticeFermion> LinOpDw(Dw);
ShiftedNonHermitianLinearOperator<WilsonCloverFermionD,LatticeFermion> ShiftedLinOpDw(Dw,0.01);
Aggregates.CreateSubspaceGCR(RNG4,
LinOpDw,
nbasis);
typedef Aggregation<vSpinColourVector,vTComplex,2*nbasis> CombinedSubspace;
CombinedSubspace CombinedUV(Coarse4d,UGrid,cb);
for(int b=0;b<nbasis;b++){
Gamma G5(Gamma::Algebra::Gamma5);
CombinedUV.subspace[b] = Aggregates.subspace[b];
CombinedUV.subspace[b+nbasis] = G5*Aggregates.subspace[b];
}
LittleDiracOperator LittleDiracOp(geom,FGrid,Coarse4d);
LittleDiracOp.CoarsenOperator(LinOpDw,CombinedUV);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"Testing coarsened operator "<<std::endl;
CoarseVector c_src (Coarse4d);
CoarseVector c_res (Coarse4d);
CoarseVector c_proj(Coarse4d);
std::vector<LatticeFermion> subspace(2*nbasis,FGrid);
subspace=CombinedUV.subspace;
c_src = one; // 1 in every element for vector 1.
blockPromote(c_src,err,subspace);
prom=Zero();
for(int b=0;b<2*nbasis;b++){
prom=prom+subspace[b];
}
err=err-prom;
std::cout<<GridLogMessage<<"Promoted back from subspace: err "<<norm2(err)<<std::endl;
std::cout<<GridLogMessage<<"c_src "<<norm2(c_src)<<std::endl;
std::cout<<GridLogMessage<<"prom "<<norm2(prom)<<std::endl;
LinOpDw.Op(prom,tmp);
blockProject(c_proj,tmp,subspace);
std::cout<<GridLogMessage<<" Called Big Dirac Op "<<norm2(tmp)<<std::endl;
LittleDiracOp.M(c_src,c_res);
std::cout<<GridLogMessage<<" Called Little Dirac Op c_src "<< norm2(c_src) << " c_res "<< norm2(c_res) <<std::endl;
std::cout<<GridLogMessage<<"Little dop : "<<norm2(c_res)<<std::endl;
// std::cout<<GridLogMessage<<" Little "<< c_res<<std::endl;
std::cout<<GridLogMessage<<"Big dop in subspace : "<<norm2(c_proj)<<std::endl;
// std::cout<<GridLogMessage<<" Big "<< c_proj<<std::endl;
c_proj = c_proj - c_res;
std::cout<<GridLogMessage<<" ldop error: "<<norm2(c_proj)<<std::endl;
// std::cout<<GridLogMessage<<" error "<< c_proj<<std::endl;
/**********
* Some solvers
**********
*/
// CG
{
MdagMLinearOperator<WilsonFermionD,LatticeFermion> HermOp(Dw);
ConjugateGradient<LatticeFermion> CG(1.0e-8,10000);
Dw.Mdag(src,precsrc);
CG(HermOp,precsrc,result);
result=Zero();
}
///////////////////////////////////////
// Coarse grid solver test
///////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Coarse Grid Solve -- Level 3 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<CoarseVector> simple;
NonHermitianLinearOperator<LittleDiracOperator,CoarseVector> LinOpCoarse(LittleDiracOp);
ShiftedNonHermitianLinearOperator<LittleDiracOperator,CoarseVector> ShiftedLinOpCoarse(LittleDiracOp,0.001);
// ShiftedNonHermitianLinearOperator<LittleDiracOperator,CoarseVector> ShiftedLinOpCoarse(LittleDiracOp,0.01);
// ShiftedNonHermitianLinearOperator<LittleDiracOperator,CoarseVector> ShiftedLinOpCoarse(LinOpCoarse,0.001);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-4, 100, LinOpCoarse,simple,10,10);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-1, 100, LinOpCoarse,simple,30,30);
PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(2.0e-1, 50, ShiftedLinOpCoarse,simple,50,50);
L2PGCR.Level(3);
c_res=Zero();
L2PGCR(c_src,c_res);
////////////////////////////////////////
// Fine grid smoother
////////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Fine Grid Smoother -- Level 2 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<LatticeFermionD> simple_fine;
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermionD> SmootherGCR(0.1,1,ShiftedLinOpDw,simple_fine,4,4);
SmootherGCR.Level(2);
LatticeFermionD f_src(FGrid);
LatticeFermionD f_res(FGrid);
f_src = one; // 1 in every element for vector 1.
f_res=Zero();
SmootherGCR(f_src,f_res);
typedef MGPreconditioner<vSpinColourVector, vTComplex,2*nbasis> TwoLevelMG;
TwoLevelMG TwoLevelPrecon(CombinedUV,
LinOpDw,
simple_fine,
SmootherGCR,
LinOpCoarse,
L2PGCR);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermion> L1PGCR(1.0e-8,1000,LinOpDw,TwoLevelPrecon,16,16);
L1PGCR.Level(1);
f_res=Zero();
L1PGCR(f_src,f_res);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage << "Done "<< std::endl;
Grid_finalize();
return 0;
}
@@ -0,0 +1,326 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./tests/Test_padded_cell.cc
Copyright (C) 2023
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include <Grid/lattice/PaddedCell.h>
#include <Grid/stencil/GeneralLocalStencil.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidual.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidualNonHermitian.h>
#include <Grid/algorithms/iterative/BiCGSTAB.h>
using namespace std;
using namespace Grid;
template<class Fobj,class CComplex,int nbasis>
class MGPreconditioner : public LinearFunction< Lattice<Fobj> > {
public:
using LinearFunction<Lattice<Fobj> >::operator();
typedef Aggregation<Fobj,CComplex,nbasis> Aggregates;
typedef typename Aggregation<Fobj,CComplex,nbasis>::FineField FineField;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseVector CoarseVector;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseMatrix CoarseMatrix;
typedef LinearOperatorBase<FineField> FineOperator;
typedef LinearFunction <FineField> FineSmoother;
typedef LinearOperatorBase<CoarseVector> CoarseOperator;
typedef LinearFunction <CoarseVector> CoarseSolver;
Aggregates & _Aggregates;
FineOperator & _FineOperator;
FineSmoother & _PreSmoother;
FineSmoother & _PostSmoother;
CoarseOperator & _CoarseOperator;
CoarseSolver & _CoarseSolve;
int level; void Level(int lv) {level = lv; };
MGPreconditioner(Aggregates &Agg,
FineOperator &Fine,
FineSmoother &PreSmoother,
FineSmoother &PostSmoother,
CoarseOperator &CoarseOperator_,
CoarseSolver &CoarseSolve_)
: _Aggregates(Agg),
_FineOperator(Fine),
_PreSmoother(PreSmoother),
_PostSmoother(PostSmoother),
_CoarseOperator(CoarseOperator_),
_CoarseSolve(CoarseSolve_),
level(1) { }
virtual void operator()(const FineField &in, FineField & out)
{
GridBase *CoarseGrid = _Aggregates.CoarseGrid;
// auto CoarseGrid = _CoarseOperator.Grid();
CoarseVector Csrc(CoarseGrid);
CoarseVector Csol(CoarseGrid);
FineField vec1(in.Grid());
FineField vec2(in.Grid());
std::cout<<GridLogMessage << "Calling PreSmoother " <<std::endl;
// std::cout<<GridLogMessage << "Calling PreSmoother input residual "<<norm2(in) <<std::endl;
double t;
// Fine Smoother
// out = in;
out = Zero();
t=-usecond();
_PreSmoother(in,out);
t+=usecond();
std::cout<<GridLogMessage << "PreSmoother took "<< t/1000.0<< "ms" <<std::endl;
// Update the residual
_FineOperator.Op(out,vec1); sub(vec1, in ,vec1);
// std::cout<<GridLogMessage <<"Residual-1 now " <<norm2(vec1)<<std::endl;
// Fine to Coarse
t=-usecond();
_Aggregates.ProjectToSubspace (Csrc,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Project to coarse took "<< t/1000.0<< "ms" <<std::endl;
// Coarse correction
t=-usecond();
Csol = Zero();
_CoarseSolve(Csrc,Csol);
//Csol=Zero();
t+=usecond();
std::cout<<GridLogMessage << "Coarse solve took "<< t/1000.0<< "ms" <<std::endl;
// Coarse to Fine
t=-usecond();
// _CoarseOperator.PromoteFromSubspace(_Aggregates,Csol,vec1);
_Aggregates.PromoteFromSubspace(Csol,vec1);
add(out,out,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Promote to this level took "<< t/1000.0<< "ms" <<std::endl;
// Residual
_FineOperator.Op(out,vec1); sub(vec1 ,in , vec1);
// std::cout<<GridLogMessage <<"Residual-2 now " <<norm2(vec1)<<std::endl;
// Fine Smoother
t=-usecond();
// vec2=vec1;
vec2=Zero();
_PostSmoother(vec1,vec2);
t+=usecond();
std::cout<<GridLogMessage << "PostSmoother took "<< t/1000.0<< "ms" <<std::endl;
add( out,out,vec2);
std::cout<<GridLogMessage << "Done " <<std::endl;
}
};
int main (int argc, char ** argv)
{
Grid_init(&argc,&argv);
const int Ls=16;
GridCartesian * UGrid = SpaceTimeGrid::makeFourDimGrid(GridDefaultLatt(), GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridRedBlackCartesian * UrbGrid = SpaceTimeGrid::makeFourDimRedBlackGrid(UGrid);
GridCartesian * FGrid = UGrid;
GridRedBlackCartesian * FrbGrid = UrbGrid;
// Construct a coarsened grid
Coordinate clatt = GridDefaultLatt();
for(int d=0;d<clatt.size();d++){
clatt[d] = clatt[d]/2;
// clatt[d] = clatt[d]/4;
}
GridCartesian *Coarse4d = SpaceTimeGrid::makeFourDimGrid(clatt, GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
std::vector<int> seeds4({1,2,3,4});
std::vector<int> cseeds({5,6,7,8});
GridParallelRNG RNG4(UGrid); RNG4.SeedFixedIntegers(seeds4);
GridParallelRNG CRNG(Coarse4d);CRNG.SeedFixedIntegers(cseeds);
Complex one(1.0);
LatticeFermion src(FGrid); src=one;
LatticeFermion result(FGrid); result=Zero();
LatticeFermion ref(FGrid); ref=Zero();
LatticeFermion tmp(FGrid);
LatticeFermion err(FGrid);
LatticeFermion precsrc(FGrid);
LatticeGaugeField Umu(UGrid);
FieldMetaData header;
std::string file("ckpoint_lat");
NerscIO::readConfiguration(Umu,header,file);
RealD csw =0.0;
RealD mass=-0.92;
WilsonCloverFermionD Dw(Umu,*UGrid,*UrbGrid,mass,csw,csw);
const int nbasis = 40;
const int cb = 0 ;
LatticeFermion prom(FGrid);
typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,nbasis> LittleDiracOperator;
typedef LittleDiracOperator::CoarseVector CoarseVector;
NearestStencilGeometry4D geom(Coarse4d);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
// Warning: This routine calls Linop.Op, not Linop.HermOp
typedef Aggregation<vSpinColourVector,vTComplex,nbasis> Subspace;
Subspace Aggregates(Coarse4d,FGrid,cb);
NonHermitianLinearOperator<WilsonCloverFermionD,LatticeFermion> LinOpDw(Dw);
ShiftedNonHermitianLinearOperator<WilsonCloverFermionD,LatticeFermion> ShiftedLinOpDw(Dw,0.01);
Aggregates.CreateSubspaceGCR(RNG4,
LinOpDw,
nbasis);
LittleDiracOperator LittleDiracOp(geom,FGrid,Coarse4d);
LittleDiracOp.CoarsenOperator(LinOpDw,Aggregates);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"Testing coarsened operator "<<std::endl;
CoarseVector c_src (Coarse4d);
CoarseVector c_res (Coarse4d);
CoarseVector c_proj(Coarse4d);
std::vector<LatticeFermion> subspace(nbasis,FGrid);
subspace=Aggregates.subspace;
c_src = one; // 1 in every element for vector 1.
blockPromote(c_src,err,subspace);
prom=Zero();
for(int b=0;b<nbasis;b++){
prom=prom+subspace[b];
}
err=err-prom;
std::cout<<GridLogMessage<<"Promoted back from subspace: err "<<norm2(err)<<std::endl;
std::cout<<GridLogMessage<<"c_src "<<norm2(c_src)<<std::endl;
std::cout<<GridLogMessage<<"prom "<<norm2(prom)<<std::endl;
LinOpDw.Op(prom,tmp);
blockProject(c_proj,tmp,subspace);
std::cout<<GridLogMessage<<" Called Big Dirac Op "<<norm2(tmp)<<std::endl;
LittleDiracOp.M(c_src,c_res);
std::cout<<GridLogMessage<<" Called Little Dirac Op c_src "<< norm2(c_src) << " c_res "<< norm2(c_res) <<std::endl;
std::cout<<GridLogMessage<<"Little dop : "<<norm2(c_res)<<std::endl;
// std::cout<<GridLogMessage<<" Little "<< c_res<<std::endl;
std::cout<<GridLogMessage<<"Big dop in subspace : "<<norm2(c_proj)<<std::endl;
// std::cout<<GridLogMessage<<" Big "<< c_proj<<std::endl;
c_proj = c_proj - c_res;
std::cout<<GridLogMessage<<" ldop error: "<<norm2(c_proj)<<std::endl;
// std::cout<<GridLogMessage<<" error "<< c_proj<<std::endl;
/**********
* Some solvers
**********
*/
// CG
{
MdagMLinearOperator<WilsonFermionD,LatticeFermion> HermOp(Dw);
ConjugateGradient<LatticeFermion> CG(1.0e-8,10000);
Dw.Mdag(src,precsrc);
CG(HermOp,precsrc,result);
result=Zero();
}
///////////////////////////////////////
// Coarse grid solver test
///////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Coarse Grid Solve -- Level 3 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<CoarseVector> simple;
NonHermitianLinearOperator<LittleDiracOperator,CoarseVector> LinOpCoarse(LittleDiracOp);
ShiftedNonHermitianLinearOperator<LittleDiracOperator,CoarseVector> ShiftedLinOpCoarse(LittleDiracOp,0.001);
// ShiftedNonHermitianLinearOperator<LittleDiracOperator,CoarseVector> ShiftedLinOpCoarse(LittleDiracOp,0.01);
// ShiftedNonHermitianLinearOperator<LittleDiracOperator,CoarseVector> ShiftedLinOpCoarse(LinOpCoarse,0.001);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-4, 100, LinOpCoarse,simple,10,10);
PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-1, 100, LinOpCoarse,simple,30,30);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(2.0e-1, 50, ShiftedLinOpCoarse,simple,50,50);
L2PGCR.Level(3);
c_res=Zero();
L2PGCR(c_src,c_res);
////////////////////////////////////////
// Fine grid smoother
////////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Fine Grid Smoother -- Level 2 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<LatticeFermionD> simple_fine;
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermionD> SmootherGCR(0.1,1,ShiftedLinOpDw,simple_fine,6,6);
SmootherGCR.Level(2);
LatticeFermionD f_src(FGrid);
LatticeFermionD f_res(FGrid);
f_src = one; // 1 in every element for vector 1.
f_res=Zero();
SmootherGCR(f_src,f_res);
typedef MGPreconditioner<vSpinColourVector, vTComplex,nbasis> TwoLevelMG;
TwoLevelMG TwoLevelPrecon(Aggregates,
LinOpDw,
simple_fine,
SmootherGCR,
LinOpCoarse,
L2PGCR);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermion> L1PGCR(1.0e-8,1000,LinOpDw,TwoLevelPrecon,16,16);
L1PGCR.Level(1);
f_res=Zero();
L1PGCR(f_src,f_res);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage << "Done "<< std::endl;
Grid_finalize();
return 0;
}
@@ -0,0 +1,320 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./tests/Test_padded_cell.cc
Copyright (C) 2023
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include <Grid/lattice/PaddedCell.h>
#include <Grid/stencil/GeneralLocalStencil.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidual.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidualNonHermitian.h>
#include <Grid/algorithms/iterative/BiCGSTAB.h>
using namespace std;
using namespace Grid;
template<class Fobj,class CComplex,int nbasis>
class MGPreconditioner : public LinearFunction< Lattice<Fobj> > {
public:
using LinearFunction<Lattice<Fobj> >::operator();
typedef Aggregation<Fobj,CComplex,nbasis> Aggregates;
typedef typename Aggregation<Fobj,CComplex,nbasis>::FineField FineField;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseVector CoarseVector;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseMatrix CoarseMatrix;
typedef LinearOperatorBase<FineField> FineOperator;
typedef LinearFunction <FineField> FineSmoother;
typedef LinearOperatorBase<CoarseVector> CoarseOperator;
typedef LinearFunction <CoarseVector> CoarseSolver;
Aggregates & _Aggregates;
FineOperator & _FineOperator;
FineSmoother & _PreSmoother;
FineSmoother & _PostSmoother;
CoarseOperator & _CoarseOperator;
CoarseSolver & _CoarseSolve;
int level; void Level(int lv) {level = lv; };
MGPreconditioner(Aggregates &Agg,
FineOperator &Fine,
FineSmoother &PreSmoother,
FineSmoother &PostSmoother,
CoarseOperator &CoarseOperator_,
CoarseSolver &CoarseSolve_)
: _Aggregates(Agg),
_FineOperator(Fine),
_PreSmoother(PreSmoother),
_PostSmoother(PostSmoother),
_CoarseOperator(CoarseOperator_),
_CoarseSolve(CoarseSolve_),
level(1) { }
virtual void operator()(const FineField &in, FineField & out)
{
GridBase *CoarseGrid = _Aggregates.CoarseGrid;
// auto CoarseGrid = _CoarseOperator.Grid();
CoarseVector Csrc(CoarseGrid);
CoarseVector Csol(CoarseGrid);
FineField vec1(in.Grid());
FineField vec2(in.Grid());
std::cout<<GridLogMessage << "Calling PreSmoother " <<std::endl;
// std::cout<<GridLogMessage << "Calling PreSmoother input residual "<<norm2(in) <<std::endl;
double t;
// Fine Smoother
// out = in;
out = Zero();
t=-usecond();
_PreSmoother(in,out);
t+=usecond();
std::cout<<GridLogMessage << "PreSmoother took "<< t/1000.0<< "ms" <<std::endl;
// Update the residual
_FineOperator.Op(out,vec1); sub(vec1, in ,vec1);
// std::cout<<GridLogMessage <<"Residual-1 now " <<norm2(vec1)<<std::endl;
// Fine to Coarse
t=-usecond();
_Aggregates.ProjectToSubspace (Csrc,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Project to coarse took "<< t/1000.0<< "ms" <<std::endl;
// Coarse correction
t=-usecond();
Csol = Zero();
_CoarseSolve(Csrc,Csol);
//Csol=Zero();
t+=usecond();
std::cout<<GridLogMessage << "Coarse solve took "<< t/1000.0<< "ms" <<std::endl;
// Coarse to Fine
t=-usecond();
// _CoarseOperator.PromoteFromSubspace(_Aggregates,Csol,vec1);
_Aggregates.PromoteFromSubspace(Csol,vec1);
add(out,out,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Promote to this level took "<< t/1000.0<< "ms" <<std::endl;
// Residual
_FineOperator.Op(out,vec1); sub(vec1 ,in , vec1);
// std::cout<<GridLogMessage <<"Residual-2 now " <<norm2(vec1)<<std::endl;
// Fine Smoother
t=-usecond();
// vec2=vec1;
vec2=Zero();
_PostSmoother(vec1,vec2);
t+=usecond();
std::cout<<GridLogMessage << "PostSmoother took "<< t/1000.0<< "ms" <<std::endl;
add( out,out,vec2);
std::cout<<GridLogMessage << "Done " <<std::endl;
}
};
int main (int argc, char ** argv)
{
Grid_init(&argc,&argv);
const int Ls=16;
GridCartesian * UGrid = SpaceTimeGrid::makeFourDimGrid(GridDefaultLatt(), GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridRedBlackCartesian * UrbGrid = SpaceTimeGrid::makeFourDimRedBlackGrid(UGrid);
GridCartesian * FGrid = UGrid;
GridRedBlackCartesian * FrbGrid = UrbGrid;
// Construct a coarsened grid
Coordinate clatt = GridDefaultLatt();
for(int d=0;d<clatt.size();d++){
clatt[d] = clatt[d]/2;
// clatt[d] = clatt[d]/4;
}
GridCartesian *Coarse4d = SpaceTimeGrid::makeFourDimGrid(clatt, GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
std::vector<int> seeds4({1,2,3,4});
std::vector<int> cseeds({5,6,7,8});
GridParallelRNG RNG4(UGrid); RNG4.SeedFixedIntegers(seeds4);
GridParallelRNG CRNG(Coarse4d);CRNG.SeedFixedIntegers(cseeds);
LatticeFermion src(FGrid); random(RNG4,src);
LatticeFermion result(FGrid); result=Zero();
LatticeFermion ref(FGrid); ref=Zero();
LatticeFermion tmp(FGrid);
LatticeFermion err(FGrid);
LatticeGaugeField Umu(UGrid);
FieldMetaData header;
std::string file("ckpoint_lat");
NerscIO::readConfiguration(Umu,header,file);
RealD csw =0.0;
RealD mass=-0.92;
WilsonCloverFermionD Dw(Umu,*UGrid,*UrbGrid,mass,csw,csw);
const int nbasis = 20;
const int cb = 0 ;
LatticeFermion prom(FGrid);
typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,2*nbasis> LittleDiracOperator;
typedef LittleDiracOperator::CoarseVector CoarseVector;
NearestStencilGeometry4D geom(Coarse4d);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
// Warning: This routine calls LinOp.Op, not LinOp.HermOp
typedef Aggregation<vSpinColourVector,vTComplex,nbasis> Subspace;
Subspace Aggregates(Coarse4d,FGrid,cb);
MdagMLinearOperator<WilsonCloverFermionD,LatticeFermion> MdagMOpDw(Dw);
NonHermitianLinearOperator<WilsonCloverFermionD,LatticeFermion> LinOpDw(Dw);
ShiftedNonHermitianLinearOperator<WilsonCloverFermionD,LatticeFermion> ShiftedLinOpDw(Dw,0.5);
// Aggregates.CreateSubspaceGCR(RNG4,
// LinOpDw,
// nbasis);
Aggregates.CreateSubspace(RNG4,MdagMOpDw,nbasis);
typedef Aggregation<vSpinColourVector,vTComplex,2*nbasis> CombinedSubspace;
CombinedSubspace CombinedUV(Coarse4d,UGrid,cb);
for(int b=0;b<nbasis;b++){
Gamma G5(Gamma::Algebra::Gamma5);
CombinedUV.subspace[b] = Aggregates.subspace[b];
CombinedUV.subspace[b+nbasis] = G5*Aggregates.subspace[b];
}
LittleDiracOperator LittleDiracOp(geom,FGrid,Coarse4d);
LittleDiracOp.CoarsenOperator(LinOpDw,CombinedUV);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"Testing coarsened operator "<<std::endl;
CoarseVector c_src (Coarse4d);
CoarseVector c_res (Coarse4d);
CoarseVector c_proj(Coarse4d);
std::vector<LatticeFermion> subspace(2*nbasis,FGrid);
subspace=CombinedUV.subspace;
Complex one(1.0);
c_src = one; // 1 in every element for vector 1.
blockPromote(c_src,err,subspace);
prom=Zero();
for(int b=0;b<2*nbasis;b++){
prom=prom+subspace[b];
}
err=err-prom;
std::cout<<GridLogMessage<<"Promoted back from subspace: err "<<norm2(err)<<std::endl;
std::cout<<GridLogMessage<<"c_src "<<norm2(c_src)<<std::endl;
std::cout<<GridLogMessage<<"prom "<<norm2(prom)<<std::endl;
LinOpDw.Op(prom,tmp);
blockProject(c_proj,tmp,subspace);
std::cout<<GridLogMessage<<" Called Big Dirac Op "<<norm2(tmp)<<std::endl;
LittleDiracOp.M(c_src,c_res);
std::cout<<GridLogMessage<<" Called Little Dirac Op c_src "<< norm2(c_src) << " c_res "<< norm2(c_res) <<std::endl;
std::cout<<GridLogMessage<<"Little dop : "<<norm2(c_res)<<std::endl;
// std::cout<<GridLogMessage<<" Little "<< c_res<<std::endl;
std::cout<<GridLogMessage<<"Big dop in subspace : "<<norm2(c_proj)<<std::endl;
// std::cout<<GridLogMessage<<" Big "<< c_proj<<std::endl;
c_proj = c_proj - c_res;
std::cout<<GridLogMessage<<" ldop error: "<<norm2(c_proj)<<std::endl;
// std::cout<<GridLogMessage<<" error "<< c_proj<<std::endl;
/**********
* Some solvers
**********
*/
///////////////////////////////////////
// Coarse grid solver test
///////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Coarse Grid Solve -- Level 3 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<CoarseVector> simple;
NonHermitianLinearOperator<LittleDiracOperator,CoarseVector> LinOpCoarse(LittleDiracOp);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-4, 100, LinOpCoarse,simple,10,10);
PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-2, 100, LinOpCoarse,simple,30,30);
L2PGCR.Level(3);
c_res=Zero();
L2PGCR(c_src,c_res);
////////////////////////////////////////
// Fine grid smoother
////////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Fine Grid Smoother -- Level 2 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<LatticeFermionD> simple_fine;
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermionD> SmootherGCR(0.01,1,ShiftedLinOpDw,simple_fine,4,4);
SmootherGCR.Level(2);
LatticeFermionD f_src(FGrid);
LatticeFermionD f_res(FGrid);
f_src = one; // 1 in every element for vector 1.
f_res=Zero();
SmootherGCR(f_src,f_res);
typedef MGPreconditioner<vSpinColourVector, vTComplex,2*nbasis> TwoLevelMG;
TwoLevelMG TwoLevelPrecon(CombinedUV,
LinOpDw,
simple_fine,
SmootherGCR,
LinOpCoarse,
L2PGCR);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermion> L1PGCR(1.0e-8,1000,LinOpDw,TwoLevelPrecon,32,32);
L1PGCR.Level(1);
f_res=Zero();
L1PGCR(f_src,f_res);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage << "Done "<< std::endl;
Grid_finalize();
return 0;
}
@@ -0,0 +1,312 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./tests/Test_padded_cell.cc
Copyright (C) 2023
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
#include <Grid/lattice/PaddedCell.h>
#include <Grid/stencil/GeneralLocalStencil.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidual.h>
#include <Grid/algorithms/iterative/PrecGeneralisedConjugateResidualNonHermitian.h>
#include <Grid/algorithms/iterative/BiCGSTAB.h>
using namespace std;
using namespace Grid;
template<class Fobj,class CComplex,int nbasis>
class MGPreconditioner : public LinearFunction< Lattice<Fobj> > {
public:
using LinearFunction<Lattice<Fobj> >::operator();
typedef Aggregation<Fobj,CComplex,nbasis> Aggregates;
typedef typename Aggregation<Fobj,CComplex,nbasis>::FineField FineField;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseVector CoarseVector;
typedef typename Aggregation<Fobj,CComplex,nbasis>::CoarseMatrix CoarseMatrix;
typedef LinearOperatorBase<FineField> FineOperator;
typedef LinearFunction <FineField> FineSmoother;
typedef LinearOperatorBase<CoarseVector> CoarseOperator;
typedef LinearFunction <CoarseVector> CoarseSolver;
Aggregates & _Aggregates;
FineOperator & _FineOperator;
FineSmoother & _PreSmoother;
FineSmoother & _PostSmoother;
CoarseOperator & _CoarseOperator;
CoarseSolver & _CoarseSolve;
int level; void Level(int lv) {level = lv; };
MGPreconditioner(Aggregates &Agg,
FineOperator &Fine,
FineSmoother &PreSmoother,
FineSmoother &PostSmoother,
CoarseOperator &CoarseOperator_,
CoarseSolver &CoarseSolve_)
: _Aggregates(Agg),
_FineOperator(Fine),
_PreSmoother(PreSmoother),
_PostSmoother(PostSmoother),
_CoarseOperator(CoarseOperator_),
_CoarseSolve(CoarseSolve_),
level(1) { }
virtual void operator()(const FineField &in, FineField & out)
{
GridBase *CoarseGrid = _Aggregates.CoarseGrid;
// auto CoarseGrid = _CoarseOperator.Grid();
CoarseVector Csrc(CoarseGrid);
CoarseVector Csol(CoarseGrid);
FineField vec1(in.Grid());
FineField vec2(in.Grid());
std::cout<<GridLogMessage << "Calling PreSmoother " <<std::endl;
// std::cout<<GridLogMessage << "Calling PreSmoother input residual "<<norm2(in) <<std::endl;
double t;
// Fine Smoother
// out = in;
out = Zero();
t=-usecond();
_PreSmoother(in,out);
t+=usecond();
std::cout<<GridLogMessage << "PreSmoother took "<< t/1000.0<< "ms" <<std::endl;
// Update the residual
_FineOperator.Op(out,vec1); sub(vec1, in ,vec1);
// std::cout<<GridLogMessage <<"Residual-1 now " <<norm2(vec1)<<std::endl;
// Fine to Coarse
t=-usecond();
_Aggregates.ProjectToSubspace (Csrc,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Project to coarse took "<< t/1000.0<< "ms" <<std::endl;
// Coarse correction
t=-usecond();
Csol = Zero();
_CoarseSolve(Csrc,Csol);
//Csol=Zero();
t+=usecond();
std::cout<<GridLogMessage << "Coarse solve took "<< t/1000.0<< "ms" <<std::endl;
// Coarse to Fine
t=-usecond();
// _CoarseOperator.PromoteFromSubspace(_Aggregates,Csol,vec1);
_Aggregates.PromoteFromSubspace(Csol,vec1);
add(out,out,vec1);
t+=usecond();
std::cout<<GridLogMessage << "Promote to this level took "<< t/1000.0<< "ms" <<std::endl;
// Residual
_FineOperator.Op(out,vec1); sub(vec1 ,in , vec1);
// std::cout<<GridLogMessage <<"Residual-2 now " <<norm2(vec1)<<std::endl;
// Fine Smoother
t=-usecond();
// vec2=vec1;
vec2=Zero();
_PostSmoother(vec1,vec2);
t+=usecond();
std::cout<<GridLogMessage << "PostSmoother took "<< t/1000.0<< "ms" <<std::endl;
add( out,out,vec2);
std::cout<<GridLogMessage << "Done " <<std::endl;
}
};
int main (int argc, char ** argv)
{
Grid_init(&argc,&argv);
const int Ls=16;
GridCartesian * UGrid = SpaceTimeGrid::makeFourDimGrid(GridDefaultLatt(), GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
GridRedBlackCartesian * UrbGrid = SpaceTimeGrid::makeFourDimRedBlackGrid(UGrid);
GridCartesian * FGrid = UGrid;
GridRedBlackCartesian * FrbGrid = UrbGrid;
// Construct a coarsened grid
Coordinate clatt = GridDefaultLatt();
for(int d=0;d<clatt.size();d++){
clatt[d] = clatt[d]/2;
// clatt[d] = clatt[d]/4;
}
GridCartesian *Coarse4d = SpaceTimeGrid::makeFourDimGrid(clatt, GridDefaultSimd(Nd,vComplex::Nsimd()),GridDefaultMpi());
std::vector<int> seeds4({1,2,3,4});
std::vector<int> cseeds({5,6,7,8});
GridParallelRNG RNG4(UGrid); RNG4.SeedFixedIntegers(seeds4);
GridParallelRNG CRNG(Coarse4d);CRNG.SeedFixedIntegers(cseeds);
LatticeFermion src(FGrid); random(RNG4,src);
LatticeFermion result(FGrid); result=Zero();
LatticeFermion ref(FGrid); ref=Zero();
LatticeFermion tmp(FGrid);
LatticeFermion err(FGrid);
LatticeGaugeField Umu(UGrid);
FieldMetaData header;
std::string file("ckpoint_lat");
NerscIO::readConfiguration(Umu,header,file);
RealD csw =0.0;
RealD mass=-0.92;
WilsonCloverFermionD Dw(Umu,*UGrid,*UrbGrid,mass,csw,csw);
const int nbasis = 40;
const int cb = 0 ;
LatticeFermion prom(FGrid);
typedef GeneralCoarsenedMatrix<vSpinColourVector,vTComplex,nbasis> LittleDiracOperator;
typedef LittleDiracOperator::CoarseVector CoarseVector;
NearestStencilGeometry4D geom(Coarse4d);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
// Warning: This routine calls LinOp.Op, not LinOp.HermOp
typedef Aggregation<vSpinColourVector,vTComplex,nbasis> Subspace;
Subspace Aggregates(Coarse4d,FGrid,cb);
MdagMLinearOperator<WilsonCloverFermionD,LatticeFermion> MdagMOpDw(Dw);
NonHermitianLinearOperator<WilsonCloverFermionD,LatticeFermion> LinOpDw(Dw);
ShiftedNonHermitianLinearOperator<WilsonCloverFermionD,LatticeFermion> ShiftedLinOpDw(Dw,0.5);
// Aggregates.CreateSubspaceGCR(RNG4,
// LinOpDw,
// nbasis);
Aggregates.CreateSubspace(RNG4,MdagMOpDw,nbasis);
LittleDiracOperator LittleDiracOp(geom,FGrid,Coarse4d);
LittleDiracOp.CoarsenOperator(LinOpDw,Aggregates);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"Testing coarsened operator "<<std::endl;
CoarseVector c_src (Coarse4d);
CoarseVector c_res (Coarse4d);
CoarseVector c_proj(Coarse4d);
std::vector<LatticeFermion> subspace(nbasis,FGrid);
subspace=Aggregates.subspace;
Complex one(1.0);
c_src = one; // 1 in every element for vector 1.
blockPromote(c_src,err,subspace);
prom=Zero();
for(int b=0;b<nbasis;b++){
prom=prom+subspace[b];
}
err=err-prom;
std::cout<<GridLogMessage<<"Promoted back from subspace: err "<<norm2(err)<<std::endl;
std::cout<<GridLogMessage<<"c_src "<<norm2(c_src)<<std::endl;
std::cout<<GridLogMessage<<"prom "<<norm2(prom)<<std::endl;
LinOpDw.Op(prom,tmp);
blockProject(c_proj,tmp,subspace);
std::cout<<GridLogMessage<<" Called Big Dirac Op "<<norm2(tmp)<<std::endl;
LittleDiracOp.M(c_src,c_res);
std::cout<<GridLogMessage<<" Called Little Dirac Op c_src "<< norm2(c_src) << " c_res "<< norm2(c_res) <<std::endl;
std::cout<<GridLogMessage<<"Little dop : "<<norm2(c_res)<<std::endl;
// std::cout<<GridLogMessage<<" Little "<< c_res<<std::endl;
std::cout<<GridLogMessage<<"Big dop in subspace : "<<norm2(c_proj)<<std::endl;
// std::cout<<GridLogMessage<<" Big "<< c_proj<<std::endl;
c_proj = c_proj - c_res;
std::cout<<GridLogMessage<<" ldop error: "<<norm2(c_proj)<<std::endl;
// std::cout<<GridLogMessage<<" error "<< c_proj<<std::endl;
/**********
* Some solvers
**********
*/
///////////////////////////////////////
// Coarse grid solver test
///////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Coarse Grid Solve -- Level 3 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<CoarseVector> simple;
NonHermitianLinearOperator<LittleDiracOperator,CoarseVector> LinOpCoarse(LittleDiracOp);
// PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-4, 100, LinOpCoarse,simple,10,10);
PrecGeneralisedConjugateResidualNonHermitian<CoarseVector> L2PGCR(1.0e-2, 100, LinOpCoarse,simple,30,30);
L2PGCR.Level(3);
c_res=Zero();
L2PGCR(c_src,c_res);
////////////////////////////////////////
// Fine grid smoother
////////////////////////////////////////
std::cout<<GridLogMessage<<"******************* "<<std::endl;
std::cout<<GridLogMessage<<" Fine Grid Smoother -- Level 2 "<<std::endl;
std::cout<<GridLogMessage<<"******************* "<<std::endl;
TrivialPrecon<LatticeFermionD> simple_fine;
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermionD> SmootherGCR(0.01,1,ShiftedLinOpDw,simple_fine,6,6);
SmootherGCR.Level(2);
LatticeFermionD f_src(FGrid);
LatticeFermionD f_res(FGrid);
f_src = one; // 1 in every element for vector 1.
f_res=Zero();
SmootherGCR(f_src,f_res);
typedef MGPreconditioner<vSpinColourVector, vTComplex,nbasis> TwoLevelMG;
TwoLevelMG TwoLevelPrecon(Aggregates,
LinOpDw,
simple_fine,
SmootherGCR,
LinOpCoarse,
L2PGCR);
PrecGeneralisedConjugateResidualNonHermitian<LatticeFermion> L1PGCR(1.0e-8,1000,LinOpDw,TwoLevelPrecon,32,32);
L1PGCR.Level(1);
f_res=Zero();
L1PGCR(f_src,f_res);
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage<<"*******************************************"<<std::endl;
std::cout<<GridLogMessage<<std::endl;
std::cout<<GridLogMessage << "Done "<< std::endl;
Grid_finalize();
return 0;
}
+1 -1
@@ -490,7 +490,7 @@ public:
}
}
GRID_ASSERT(s==nshift);
assert(s==nshift);
coalescedWrite(gStaple_v[ss],stencil_ss);
}
);
+206
@@ -0,0 +1,206 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./tests/debug/Test_reduction.cc
Copyright (C) 2024
Author: Peter Boyle <pboyle@bnl.gov>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/Grid.h>
using namespace std;
using namespace Grid;
static int passed = 0;
static int failed = 0;
static void check(bool ok, const std::string &msg)
{
if (ok) {
std::cout << GridLogMessage << "PASS " << msg << std::endl;
passed++;
} else {
std::cout << GridLogMessage << "FAIL " << msg << std::endl;
failed++;
}
}
// Squared magnitude of a Grid scalar tensor aggregate: innerProduct(a,a).
// For iScalar: real(conj(a)*a)
// For iMatrix<T,N>: sum_{i,j} real(conj(a_ij)*a_ij) (Frobenius)
// Named squaredSum to make clear the squaring is applied to the aggregate
// (the sum), not to individual site values before summing.
template<class T>
RealD squaredSum(const T &a)
{
return (RealD)real(TensorRemove(innerProduct(a, a)));
}
template<class Field>
void testReduction(GridCartesian *grid, GridParallelRNG &rng,
const std::string &name, int Ncomp)
{
typedef typename Field::vector_object vobj;
typedef typename vobj::scalar_object sobj;
typedef typename vobj::scalar_type scalar_type;
const Integer V = grid->_gsites;
const Integer osites = grid->oSites();
// Detect single vs double precision by comparing fundamental scalar sizes.
const bool isFloat = (sizeof(scalar_type) < sizeof(ComplexD));
std::cout << GridLogMessage << "=== " << name << " ===" << std::endl;
Field field(grid);
//--------------------------------------------------------------------
// a) Gaussian random field: sum_gpu (new CUB path) vs sum_gpu_old
// (preserved hand-rolled shared-memory path). Both promote lanes
// to double internally, so results should agree to near-roundoff.
//--------------------------------------------------------------------
#if defined(GRID_CUDA) || defined(GRID_HIP) || defined(GRID_SYCL)
{
gaussian(rng, field);
autoView(v, field, AcceleratorRead);
sobj new_result = sum_gpu (&v[0], osites);
sobj old_result = sum_gpu_old(&v[0], osites);
sobj diff = new_result - old_result;
RealD diffn = squaredSum(diff);
RealD refn = squaredSum(old_result);
RealD reldiff = (refn > 0.0) ? std::sqrt(diffn / refn) : std::sqrt(diffn);
// Float fields: both paths cast from double to float, expect O(eps_float).
// Double fields: ordering differences at most O(V * eps_double).
RealD tol = isFloat ? 1e-6 : 1e-10;
std::cout << GridLogMessage
<< name << " random reldiff = " << reldiff << std::endl;
check(reldiff < tol, name + " random: sum_gpu agrees with sum_gpu_old");
}
#endif
//--------------------------------------------------------------------
// b) Timing: new (CUB/sycl::reduction) vs old (hand-rolled) path.
// Warmup first, then Niter timed calls; report us/call and GB/s.
//--------------------------------------------------------------------
#if defined(GRID_CUDA) || defined(GRID_HIP) || defined(GRID_SYCL)
{
const int Nwarm = 5;
const int Niter = 100;
gaussian(rng, field);
{
autoView(v, field, AcceleratorRead);
for (int i = 0; i < Nwarm; i++) sum_gpu (&v[0], osites);
for (int i = 0; i < Nwarm; i++) sum_gpu_old(&v[0], osites);
}
RealD t_new, t_old;
{
autoView(v, field, AcceleratorRead);
t_new = -usecond();
for (int i = 0; i < Niter; i++) sum_gpu(&v[0], osites);
t_new += usecond();
}
{
autoView(v, field, AcceleratorRead);
t_old = -usecond();
for (int i = 0; i < Niter; i++) sum_gpu_old(&v[0], osites);
t_old += usecond();
}
RealD bytes = (RealD)osites * sizeof(vobj);
RealD GBs_new = bytes / (t_new / Niter) * 1e-3;
RealD GBs_old = bytes / (t_old / Niter) * 1e-3;
std::cout << GridLogMessage << name << " timing (" << Niter << " calls):" << std::endl;
std::cout << GridLogMessage
<< " sum_gpu " << t_new/Niter << " us " << GBs_new << " GB/s" << std::endl;
std::cout << GridLogMessage
<< " sum_gpu_old " << t_old/Niter << " us " << GBs_old << " GB/s" << std::endl;
}
#endif
//--------------------------------------------------------------------
// c) Constant field via field = 1.0.
//
// Grid's iMatrix::operator=(scalar) sets only the diagonal, so:
// LatticeComplex -> scalar 1.0 (Ncomp = 1 nonzero per site)
// LatticeColourMatrix -> Nc x Nc identity (Ncomp = Nc nonzero per site)
// LatticePropagator -> (Ns*Nc)^2 identity (Ncomp = Ns*Nc nonzero per site)
//
// After GlobalSum: sum_result has Ncomp diagonal entries each equal to V,
// all off-diagonal entries zero. Grid's recursive innerProduct computes
// the Frobenius inner product (sum of |element|^2 over all indices), giving
//
// innerProduct(sum_result, sum_result) = Ncomp * V^2
//--------------------------------------------------------------------
{
field = 1.0;
sobj sum_result = sum(field); // uses new GPU path + GlobalSum
RealD got = squaredSum(sum_result);
RealD expected = (RealD)Ncomp * (RealD)V * (RealD)V;
RealD reldiff = std::abs(got - expected) / expected;
std::cout << GridLogMessage
<< name << " const: got " << got
<< " expected " << expected
<< " reldiff " << reldiff << std::endl;
check(reldiff < 1e-8, name + " const: innerProduct(sum,sum) = Ncomp*V^2");
}
}
int main(int argc, char **argv)
{
Grid_init(&argc, &argv);
Coordinate latt = GridDefaultLatt();
Coordinate mpi = GridDefaultMpi();
GridCartesian *UGrid = SpaceTimeGrid::makeFourDimGrid(latt, GridDefaultSimd(Nd, vComplexD::Nsimd()), mpi);
GridCartesian *UGrid_f = SpaceTimeGrid::makeFourDimGrid(latt, GridDefaultSimd(Nd, vComplexF::Nsimd()), mpi);
GridParallelRNG rng(UGrid);
rng.SeedFixedIntegers({1, 2, 3, 4});
GridParallelRNG rng_f(UGrid_f);
rng_f.SeedFixedIntegers({1, 2, 3, 4});
std::cout << GridLogMessage << "Lattice : " << latt << std::endl;
std::cout << GridLogMessage << "Volume : " << UGrid->_gsites << std::endl;
testReduction<LatticeComplexF> (UGrid_f, rng_f, "LatticeComplexF", 1 );
testReduction<LatticeComplexD> (UGrid, rng, "LatticeComplexD", 1 );
testReduction<LatticeColourMatrixF> (UGrid_f, rng_f, "LatticeColourMatrixF", Nc );
testReduction<LatticeColourMatrixD> (UGrid, rng, "LatticeColourMatrixD", Nc );
testReduction<LatticePropagatorF> (UGrid_f, rng_f, "LatticePropagatorF", Ns*Nc );
testReduction<LatticePropagatorD> (UGrid, rng, "LatticePropagatorD", Ns*Nc );
std::cout << GridLogMessage << "==============================" << std::endl;
std::cout << GridLogMessage << passed << " PASSED " << failed << " FAILED" << std::endl;
Grid_finalize();
return (failed > 0) ? EXIT_FAILURE : EXIT_SUCCESS;
}
@@ -0,0 +1,16 @@
#include <Grid/Grid.h>
#pragma once
#ifndef ENABLE_FERMION_INSTANTIATIONS
#include <iostream>
int main(void) {
std::cout << "This build of Grid was configured to exclude fermion instantiations, "
<< "which this test relies on. "
<< "Please reconfigure and rebuild Grid with --enable-fermion-instantiations"
<< "to run this test."
<< std::endl;
return 1;
}
#endif
+171
@@ -0,0 +1,171 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What This Is
VTK-based visualisation and analysis tools for Grid lattice QCD eigenvector density and HMC force data. All programmes link against both Grid (for reading Scidac/ILDG lattice files) and VTK (for rendering).
## Build
```bash
cd /Users/peterboyle/QCD/AmSC/Grid/visualisation/build
cmake .. -DVTK_DIR=$HOME/QCD/vtk/VTK-9.4.2-install/lib/cmake/vtk-9.4
make <target> # e.g. make ControlledVisualise5D
```
All executables are built as macOS bundles (`.app`) except `ForceAnalysis`, `FindPeak`, and `DumpField`.
## Programmes
### ControlledVisualise5D
Interactive VTK renderer for 5D DWF eigenvector density (`LatticeComplexD`). Driven via named pipe `/tmp/visualise_cmd`.
**Launch script**: `/Volumes/X9Pro/visualisation/Grid/visualisation/build/Hdwf_1_long/visualise_controlled.sh`
**Wire protocol** (one command per line to `/tmp/visualise_cmd`):
| Command | Effect |
|---------|--------|
| `file <N>` / `file +N` / `file -N` | Jump to file by index or relative |
| `slice <dim> <N>` / `+N` / `-N` | Set or shift slice coordinate in dimension dim |
| `spin <deg>` | Continuous azimuth rotation at deg/tick (100ms tick); `spin 0` stops |
| `azimuth <deg>` | Single azimuth rotation step |
| `elevation <deg>` | Single elevation rotation step |
| `zoom <factor>` | Camera dolly (>1 = in) |
| `iso <value>` | Isosurface threshold in RMS units |
| `caption user: <text>` / `caption claude: <text>` / `caption` | Typewriter caption on the upper (user) or lower (Claude) line; bare `caption` clears both |
| `record start <file>` / `record stop` | Start or stop video recording |
| `render` | Force a render with current state |
| `status` | Print current state |
| `quit` | Exit |
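The absolute and relative forms of `file` and `slice` both wrap around the available range. A minimal C++ sketch of that arithmetic, following the parsing in `ControlledVisualise5D.cxx`; the helper name `applyIndexArg` is hypothetical and does not exist in the programme:

```cpp
#include <string>

// Hypothetical helper (not in the source tree): apply a wire-protocol index
// argument to `current`, wrapping the result into [0, n).
// "+N" / "-N" are relative shifts; a bare "N" is an absolute index.
// std::stoi throws std::invalid_argument on non-numeric input.
int applyIndexArg(int current, const std::string& arg, int n)
{
  bool relative = !arg.empty() && (arg[0] == '+' || arg[0] == '-');
  int value = std::stoi(arg);              // stoi accepts a leading sign
  int raw = relative ? current + value : value;
  return ((raw % n) + n) % n;              // C++ % can be negative, so re-add n
}
```

Because a leading `-` always means a relative shift, a negative absolute index cannot be expressed; `file -1` from index 0 lands on the last file.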
**Dimension indices for 5D DWF grid** (`--grid 48.32.32.32.32`):
| dim | axis | size |
|-----|------|------|
| 0 | s (Ls) | 48 |
| 1 | x | 32 |
| 2 | y | 32 |
| 3 | z | 32 |
| 4 | t | 32 |
**MD time mapping** for trajectory 702 (241 files, τ=3.3 to 4.0):
- File index N → τ = 3.300000 + N × (1/480)
- τ → file index = round((τ - 3.3) × 480)
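The mapping above can be sketched in C++ as a pair of helpers. The function and constant names are hypothetical; only the offset 3.3 and the factor 480 come from this document:

```cpp
#include <cmath>

// Constants from the trajectory-702 mapping described above.
constexpr double kTau0 = 3.3;          // tau of file index 0
constexpr int kFilesPerUnitTau = 480;  // files per unit of MD time

// File index -> tau.
double tauFromIndex(int n)
{
  return kTau0 + static_cast<double>(n) / kFilesPerUnitTau;
}

// tau -> nearest file index.
int indexFromTau(double tau)
{
  return static_cast<int>(std::lround((tau - kTau0) * kFilesPerUnitTau));
}
```

With these definitions, `indexFromTau(3.670833)` gives 178, the file that holds the s=0 hotsite quoted under FindPeak.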
**Display axes**: `--xyz 0.3.4` shows s, z, t. The `--slice` argument sets initial values for all dims; dims not in `--xyz`, `--sum`, or `--loop` are the fixed slice dimensions (x=dim1, y=dim2 with `--xyz 0.3.4`).
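A minimal sketch of how the fixed slice dimensions follow from the other three options. It assumes `--sum` and `--loop` carry one 0/1 flag per dimension and that `--xyz` lists three dimension indices; `sliceDimensions` is a hypothetical name, and the programme itself tests the displayed extent rather than list membership:

```cpp
#include <vector>

// Hypothetical helper: return the dimensions that stay fixed "slice"
// dimensions, i.e. those that are neither displayed (--xyz), summed (--sum)
// nor looped (--loop).
std::vector<int> sliceDimensions(int nd,
                                 const std::vector<int>& xyz,
                                 const std::vector<int>& sumMask,
                                 const std::vector<int>& loopMask)
{
  std::vector<int> out;
  for (int d = 0; d < nd; ++d) {
    bool displayed = false;
    for (int x : xyz) {
      if (x == d) displayed = true;
    }
    if (displayed || sumMask[d] || loopMask[d]) continue;
    out.push_back(d);
  }
  return out;
}
```

For a 5D grid with `--xyz 0.3.4` and nothing summed or looped, the result is dimensions 1 and 2 (x and y), matching the example above.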
**Spin**: Implemented via `g_spinDeg` global applied on every 100ms poll timer tick inside `CommandHandler::Execute()`. Does not flood the pipe.
### FindPeak
Reads a `LatticeComplexD` Scidac file, prints the top-N sites by real value to stderr.
```bash
./FindPeak --grid 48.32.32.32.32 --mpi 1.1.1.1.1 <file> 2>peaks.txt
```
Key result: At τ=3.670833 the tunneling hotsite on the s=0 wall is (x=21, y=24, z=2, t=23).
### ForceAnalysis
Reads 4D `LatticeComplexD` force snapshot files (Shuhei's snapshots at `/Volumes/X9Pro/visualisation/Shuhei/snapshots/`). Outputs TSV of RMS and hotsite value per file to stderr.
```bash
./ForceAnalysis --grid 32.32.32.32 --mpi 1.1.1.1 --hotsite 21.24.2.23 \
<files...> 2>force.tsv 1>/dev/null
```
Force components: `Gauge_lat`, `Gauge_smr`, `Jacobian_smr`, `Ferm0047_lat`, `Ferm0047_smr`.
### DumpField
Reads a `LatticeComplexD` and dumps via Grid's `<<` operator to stdout for verification.
### TranscriptToVideo
Renders a conversation transcript to an MP4 video (1280×720, 10 fps) with a typewriter animation effect, scrolling history, and optional captions. Does **not** link against Grid — pure VTK only.
#### Transcript format
```
[USER] First question text, possibly
continuing on the next line.
A blank line within a turn creates a paragraph break (visual spacer).
[ASSISTANT] Response text.
Multiple continuation lines are preserved
as separate display lines, not merged.
[CAPTION] Caption text shown at bottom of screen in white italic.
[CAPTION] (whitespace-only body clears the caption)
[USER] Next question...
```
- Lines beginning `[USER]`, `[ASSISTANT]`, `[CAPTION]` start a new turn.
- Continuation lines (no `[TAG]` prefix) are joined with `\n` — each becomes its own wrapped display line.
- Blank lines within a turn become paragraph-break spacers.
- Markdown emphasis markers (`**`, `*`, `` ` ``) are stripped automatically.
- UTF-8 smart quotes, em-dashes, ellipses, arrows are transliterated to ASCII.
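The turn-splitting rules above can be sketched as follows. This is an illustration only, not the parser used by TranscriptToVideo: the `Turn` struct and `parseTranscript` are hypothetical names, and Markdown stripping, transliteration and word-wrapping are left out.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one transcript turn.
struct Turn {
  std::string tag;                 // "USER", "ASSISTANT" or "CAPTION"
  std::vector<std::string> lines;  // display lines; "" marks a paragraph break
};

// Split transcript text into turns. A line starting with a known [TAG]
// opens a new turn; any other line (including a blank one) is appended to
// the current turn. Lines before the first tag are ignored.
std::vector<Turn> parseTranscript(const std::string& text)
{
  static const char* kTags[] = {"USER", "ASSISTANT", "CAPTION"};
  std::vector<Turn> turns;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    bool started = false;
    for (const char* tag : kTags) {
      const std::string prefix = std::string("[") + tag + "]";
      if (line.compare(0, prefix.size(), prefix) == 0) {
        std::string body = line.substr(prefix.size());
        if (!body.empty() && body[0] == ' ') body.erase(0, 1);  // drop one separator space
        Turn t;
        t.tag = tag;
        t.lines.push_back(body);
        turns.push_back(t);
        started = true;
        break;
      }
    }
    if (!started && !turns.empty()) turns.back().lines.push_back(line);
  }
  return turns;
}
```

Keeping continuation lines as separate entries, rather than concatenating them, matches the rule that each becomes its own display line.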
#### Usage
```bash
cd /Users/peterboyle/QCD/AmSC/Grid/visualisation/build
# Set runtime library paths first (see Runtime Environment below)
./TranscriptToVideo <transcript_file> <output.mp4>
```
Transcript files live in `/Users/peterboyle/QCD/AmSC/Grid/visualisation/` (e.g. `transcript`, `transcript2`, `transcript3`).
#### Visual layout
| Element | Detail |
|---------|--------|
| Background | Near-black navy `(0.04, 0.04, 0.10)` |
| `[USER]` text | Gold `(1.00, 0.84, 0.00)` |
| `[ASSISTANT]` text | Steel blue `(0.68, 0.85, 0.90)` |
| History | Up to 18 lines; brightness fades linearly from 0.85 (newest) to 0.20 (oldest) |
| Caption | Arial italic 20pt white with shadow, centred at bottom |
| Progress bar | Blue, top of frame |
| Typewriter speed | 50 chars/sec (5 chars/frame at 10 fps) |
| Pause between lines | 3 frames (0.3 s) |
| Word-wrap column | 60 chars (body only, after prefix) |
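The history fade and typewriter timing in the table reduce to two small formulas. The sketch below assumes the fade spans the lines currently shown (the programme may instead always fade over the full 18-line window); both function names are hypothetical:

```cpp
// Hypothetical sketch: brightness of history line `age` (0 = newest) when
// `count` lines are shown, fading linearly from 0.85 down to 0.20.
double historyBrightness(int age, int count)
{
  const double newest = 0.85;
  const double oldest = 0.20;
  if (count <= 1) return newest;
  return newest - (newest - oldest) * age / (count - 1);
}

// Frames needed to type `chars` characters at `charsPerFrame`, rounding up.
// The default of 5 is 50 chars/sec at 10 fps.
int framesToType(int chars, int charsPerFrame = 5)
{
  return (chars + charsPerFrame - 1) / charsPerFrame;
}
```

A full 60-character wrapped line therefore takes 12 frames (1.2 s) to type, plus the 3-frame pause before the next line.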
#### Key implementation notes
- **Persistent render context**: a single `vtkRenderWindow` is created once and reused for all frames. Creating a new window per frame exhausts the macOS Metal GPU command buffer after ~33 frames (`MTLCommandBufferErrorDomain Code=8`).
- **`SanitiseASCII()`**: replaces multi-byte UTF-8 sequences before passing to VTK's font renderer (which crashes on non-ASCII input).
- Output format is MP4 via `vtkFFMPEGWriter`. `SetOffScreenRendering(1)` is required for headless rendering.
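A minimal sketch of the kind of transliteration `SanitiseASCII()` performs. The replacement table, the `?` placeholder for unknown sequences and the name `sanitiseAsciiSketch` are all assumptions made for illustration; the real function's table is not shown here.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SanitiseASCII(): map a few common multi-byte
// UTF-8 sequences to ASCII and replace anything else non-ASCII with '?'.
std::string sanitiseAsciiSketch(const std::string& in)
{
  static const std::vector<std::pair<std::string, std::string>> table = {
    {"\xE2\x80\x98", "'"},  {"\xE2\x80\x99", "'"},   // single smart quotes
    {"\xE2\x80\x9C", "\""}, {"\xE2\x80\x9D", "\""},  // double smart quotes
    {"\xE2\x80\x94", "--"},                          // em-dash
    {"\xE2\x80\xA6", "..."},                         // ellipsis
    {"\xE2\x86\x92", "->"},                          // rightwards arrow
  };
  std::string out;
  for (std::size_t i = 0; i < in.size();) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) { out += in[i++]; continue; }      // plain ASCII passes through
    bool matched = false;
    for (const auto& entry : table) {
      if (in.compare(i, entry.first.size(), entry.first) == 0) {
        out += entry.second;
        i += entry.first.size();
        matched = true;
        break;
      }
    }
    if (matched) continue;
    out += '?';                                      // unknown sequence: one placeholder
    ++i;
    // Skip the continuation bytes (10xxxxxx) of the unknown sequence.
    while (i < in.size() && (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80) ++i;
  }
  return out;
}
```

The output contains only bytes below 0x80, which is the property the font renderer needs.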
## Runtime Environment
All executables in `build/` require Spack-installed HDF5/FFTW/GMP/MPFR on the dynamic linker path:
```bash
SPACK=/Users/peterboyle/QCD/Spack/spack/opt/spack/darwin-m1
export DYLD_LIBRARY_PATH=\
$SPACK/hdf5-1.14.6-2265ms4kymgw6hcnwi6vqehslyfv74t4/lib:\
$SPACK/fftw-3.3.10-aznn6h3nac5cycidlhrhgjxvntpcbg57/lib:\
$SPACK/gmp-6.3.0-cwiz4n7ww33fnb3aban2iup4orcr6c7i/lib:\
$SPACK/mpfr-4.2.1-exgbz4qshmet6tmmuttdewdlunfvtrlb/lib:\
$DYLD_LIBRARY_PATH
```
(These paths are also set by the ControlledVisualise5D launch script.)
## Key Physics Context
See `/Volumes/X9Pro/visualisation/analysis_notes_20260407.md` for full analysis. Summary:
- Near-zero mode of H_DWF localises on the two walls (s=0 and s=47) of the 5D domain wall geometry
- Topology change transfers norm between walls, mediated by a near-zero mode of H_w (Hermitian Wilson at m=1.8)
- Tunneling hotsite on s=0 wall: (x=21, y=24, z=2, t=23); s=47 wall: (x=4, y=8, z=0, t=20)
- Light fermion pseudofermion force (Ferm0047_smr) peaks at ~20× RMS at the hotsite during tunneling — this is the restoring force that causes topological bounces
## Grid/VTK interaction notes
- Grid log messages go to stdout; all data output in analysis programmes uses stderr to avoid interleaving
- `TensorRemove()` is required when extracting a scalar from `peekSite()` result: `real(TensorRemove(peekSite(field, site)))`
- For runtime-determined grid dimensionality use `GridDefaultSimd(latt_size.size(), vComplex::Nsimd())`
- DYLD_LIBRARY_PATH must include Spack HDF5/FFTW/GMP/MPFR paths (see launch script)
@@ -0,0 +1,891 @@
// ControlledVisualise5D.cxx
// Derived from Visualise5D.cxx by Peter Boyle
//
// A minimal-protocol rendering engine for 5D DWF eigenvector-density data.
// Intended to be driven by an external intelligent controller (e.g. Claude)
// that handles all natural-language interpretation and state tracking.
//
// Commands are sent one per line to the named pipe /tmp/visualise_cmd.
// State is reported to stdout after every command.
//
// Wire protocol (all fields whitespace-separated):
//
// slice <dim> <N> set Slice[dim] = N (0-based, wraps to lattice size)
// slice <dim> +<N> increment Slice[dim] by N
// slice <dim> -<N> decrement Slice[dim] by N
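// zoom <factor> camera Dolly by factor (>1 = in, <1 = out)
// azimuth <deg> single azimuth rotation step, in degrees
// elevation <deg> single elevation rotation step, in degrees
// spin <deg> continuous azimuth rotation, deg per 100ms poll tick; spin 0 stops
// iso <value> set isosurface threshold to <value> x RMS (clamped to [0, 10])
// file <index> jump to file by absolute index
// file +<N> advance N files
// file -<N> go back N files
// caption user: <text> type a caption on the upper (gold) line
// caption claude: <text> type a caption on the lower (light blue) line
// caption clear both caption lines
// record start <file> begin recording video to <file> (default recording.avi)
// record stop finish recording and mux video and audio to MP4 via ffmpeg
// render force a render with current state
// status print current state to stdout
// quit exit cleanly (exit is accepted as a synonym)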
//
// Dimension indices for 5D DWF grid (e.g. --grid 48.32.32.32.32):
// s=0 (Ls) x=1 y=2 z=3 t=4
// For a 4D grid (--grid 32.32.32.32):
// x=0 y=1 z=2 t=3
#include <vtkActor.h>
#include <vtkCamera.h>
#include <vtkNamedColors.h>
#include <vtkNew.h>
#include <vtkOutlineFilter.h>
#include <vtkPolyDataMapper.h>
#include <vtkProperty.h>
#include <vtkRenderWindow.h>
#include <vtkRenderWindowInteractor.h>
#include <vtkRenderer.h>
#include <vtkStripper.h>
#include <vtkImageData.h>
#include <vtkCallbackCommand.h>
#include <vtkTextActor.h>
#include <vtkTextProperty.h>
#include <vtkProperty2D.h>
#include <vtkWindowToImageFilter.h>
#define MPEG
#ifdef MPEG
#include <vtkFFMPEGWriter.h>
#endif
#include <array>
#include <string>
#include <vector>
#include <queue>
#include <mutex>
#include <thread>
#include <atomic>
#include <sstream>
#include <iostream>
#include <fstream>
#include <cmath>
#include <cstdlib>
#include <cstdint>   // int16_t for the PCM audio buffer
#include <exception>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <Grid/Grid.h>
#define USE_FLYING_EDGES
#ifdef USE_FLYING_EDGES
#include <vtkFlyingEdges3D.h>
typedef vtkFlyingEdges3D isosurface;
#else
#include <vtkMarchingCubes.h>
typedef vtkMarchingCubes isosurface;
#endif
#define CMD_PIPE "/tmp/visualise_cmd"
static int g_mpeg = 0;
static int g_framerate = 10;
// ─── Thread-safe command queue ────────────────────────────────────────────────
static std::queue<std::string> g_cmdQueue;
static std::mutex g_cmdMutex;
static std::atomic<bool> g_running{true};
static double g_spinDeg = 0.0; // degrees per poll tick; 0 = stopped
// ─── MPEG recording state ─────────────────────────────────────────────────────
static bool g_recording = false;
static vtkFFMPEGWriter* g_mpegWriter = nullptr;
static vtkWindowToImageFilter* g_imageFilter = nullptr;
static std::string g_recordingFile; // AVI filename for mux step
// ─── Audio state (PCM audio track, synced to video frames) ───────────────────
static const int AUDIO_RATE = 44100;
static const double BEEP_FREQ = 800.0;
static const int BEEP_SAMPLES = AUDIO_RATE * 4 / 100; // 40ms beep
static std::vector<int16_t> g_audioBuffer;
static int g_beepRemaining = 0;
static double g_beepPhase = 0.0;
static int g_samplesPerFrame = AUDIO_RATE / 10; // updated at record start
// Write one video frame worth of audio samples (beep or silence) to the buffer.
static void GenerateAudioFrame()
{
for (int i = 0; i < g_samplesPerFrame; i++) {
int16_t s = 0;
if (g_beepRemaining > 0) {
int pos = BEEP_SAMPLES - g_beepRemaining;
double env = 1.0;
int fade = AUDIO_RATE / 100; // 10ms fade
if (pos < fade) env = (double)pos / fade;
if (g_beepRemaining < fade) env = (double)g_beepRemaining / fade;
s = (int16_t)(16000.0 * env * std::sin(2.0 * M_PI * BEEP_FREQ * g_beepPhase / AUDIO_RATE));
g_beepPhase += 1.0;
--g_beepRemaining;
} else {
g_beepPhase = 0.0;
}
g_audioBuffer.push_back(s);
}
}
static void TriggerBeep() { g_beepRemaining = BEEP_SAMPLES; }
// Simple mono 16-bit PCM WAV writer.
static void WriteWAV(const std::string& path, const std::vector<int16_t>& buf, int rate)
{
std::ofstream f(path, std::ios::binary);
int dataBytes = (int)(buf.size() * 2);
int chunkSize = 36 + dataBytes;
int byteRate = rate * 2;
f.write("RIFF", 4); f.write((char*)&chunkSize, 4);
f.write("WAVE", 4);
f.write("fmt ", 4);
int fmtSz = 16; f.write((char*)&fmtSz, 4);
int16_t pcm = 1; f.write((char*)&pcm, 2);
int16_t ch = 1; f.write((char*)&ch, 2);
f.write((char*)&rate, 4);
f.write((char*)&byteRate, 4);
int16_t blk = 2; f.write((char*)&blk, 2);
int16_t bps = 16; f.write((char*)&bps, 2);
f.write("data", 4); f.write((char*)&dataBytes, 4);
f.write((char*)buf.data(), dataBytes);
}
// Play a short audible beep on the local machine (non-blocking).
static void PlayBeepAudible()
{
system("afplay /System/Library/Sounds/Tink.aiff -v 0.4 &");
}
// ─── Grid I/O ─────────────────────────────────────────────────────────────────
template <class T>
void readFile(T& out, const std::string& fname)
{
Grid::emptyUserRecord record;
Grid::ScidacReader RD;
RD.open(fname);
RD.readScidacFieldRecord(out, record);
RD.close();
}
using namespace Grid;
// ─── Command reader thread ────────────────────────────────────────────────────
void CommandReaderThread()
{
mkfifo(CMD_PIPE, 0666);
std::cout << "[cmd] Listening on " << CMD_PIPE << std::endl;
while (g_running) {
int fd = open(CMD_PIPE, O_RDONLY | O_NONBLOCK);
if (fd < 0) { usleep(200000); continue; }
int flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
char buf[4096];
std::string partial;
ssize_t n;
while (g_running && (n = read(fd, buf, sizeof(buf) - 1)) > 0) {
buf[n] = '\0';
partial += buf;
size_t pos;
while ((pos = partial.find('\n')) != std::string::npos) {
std::string line = partial.substr(0, pos);
if (!line.empty() && line.back() == '\r') line.pop_back();
if (!line.empty()) {
std::lock_guard<std::mutex> lk(g_cmdMutex);
g_cmdQueue.push(line);
}
partial = partial.substr(pos + 1);
}
}
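close(fd);
// With no writer attached, read() returns 0 at once; pause so this loop does not spin.
if (g_running) usleep(50000);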
}
}
// ─── FrameUpdater ─────────────────────────────────────────────────────────────
class FrameUpdater : public vtkCallbackCommand
{
public:
FrameUpdater() : ffile(0), TimerCount(0), old_file(-1), timerId(-2), maxCount(-1) {}
static FrameUpdater* New() { return new FrameUpdater; }
int ffile;
int old_file;
int timerId;
int maxCount;
Coordinate latt;
Coordinate xyz_dims, xyz_ranges, g_xyz_ranges;
uint64_t xyz_vol;
Coordinate loop_dims, loop_ranges;
uint64_t loop_vol;
Coordinate sum_dims, sum_ranges;
uint64_t sum_vol;
Coordinate slice_dims;
Coordinate Slice;
std::vector<std::string> files;
int Nd;
GridBase* grid;
Grid::LatticeComplexD* grid_data;
double rms;
vtkImageData* imageData = nullptr;
vtkTextActor* text = nullptr;
isosurface* posExtractor = nullptr;
isosurface* negExtractor = nullptr;
void SetGrid(GridBase* _grid)
{
grid = _grid;
Nd = grid->Nd();
latt = grid->GlobalDimensions();
grid_data = new Grid::LatticeComplexD(grid);
}
void SetFiles(std::vector<std::string> list) { files = list; old_file = -1; }
void SetSlice(Coordinate _Slice) { Slice = _Slice; }
void SetSumDimensions(Coordinate _SumDims)
{
sum_dims = _SumDims; sum_ranges = Coordinate(Nd); sum_vol = 1;
for (int d = 0; d < Nd; d++) { sum_ranges[d] = sum_dims[d] ? latt[d] : 1; sum_vol *= sum_ranges[d]; }
}
void SetLoopDimensions(Coordinate _LoopDims)
{
loop_dims = _LoopDims; loop_ranges = Coordinate(Nd); loop_vol = 1;
for (int d = 0; d < Nd; d++) { loop_ranges[d] = loop_dims[d] ? latt[d] : 1; loop_vol *= loop_ranges[d]; }
}
void SetDisplayDimensions(Coordinate _xyz_dims)
{
xyz_dims = _xyz_dims; g_xyz_ranges = Coordinate(Nd); xyz_ranges = Coordinate(3); xyz_vol = 1;
for (int d = 0; d < 3; d++) { xyz_ranges[d] = latt[xyz_dims[d]]; xyz_vol *= xyz_ranges[d]; }
for (int d = 0; d < Nd; d++) {
g_xyz_ranges[d] = 1;
for (int dd = 0; dd < 3; dd++) if (xyz_dims[dd] == d) g_xyz_ranges[d] = latt[d];
}
}
void SetSliceDimensions()
{
Coordinate sd;
for (int d = 0; d < Nd; d++) {
if (g_xyz_ranges[d] > 1 || loop_dims[d] || sum_dims[d]) continue;
sd.push_back(d);
}
slice_dims = sd;
std::cout << " Slice dimensions: " << slice_dims << std::endl;
}
void FillImageData(int loop_idx)
{
Coordinate loop_coor;
Lexicographic::CoorFromIndex(loop_coor, loop_idx, loop_ranges);
Coordinate xyz_coor(3), g_xyz_coor(Nd), sum_coor(Nd);
for (uint64_t xyz = 0; xyz < xyz_vol; xyz++) {
Lexicographic::CoorFromIndex(xyz_coor, xyz, xyz_ranges);
Lexicographic::CoorFromIndex(g_xyz_coor, xyz, g_xyz_ranges);
RealD val = 0.0;
for (uint64_t si = 0; si < sum_vol; si++) {
Lexicographic::CoorFromIndex(sum_coor, si, sum_ranges);
Coordinate site(Nd);
for (int d = 0; d < Nd; d++)
site[d] = (sum_coor[d] + loop_coor[d] + g_xyz_coor[d] + Slice[d]) % latt[d];
val += real(peekSite(*grid_data, site));
}
imageData->SetScalarComponentFromDouble(xyz_coor[0], xyz_coor[1], xyz_coor[2], 0, val);
}
imageData->Modified();
}
// Reload if needed, fill image, update label, render — no timer advance.
void ForceRender(vtkRenderWindowInteractor* iren)
{
int file = ((TimerCount / (int)loop_vol) + ffile) % (int)files.size();
if (file != old_file) {
std::cout << "[render] Loading " << files[file] << std::endl;
readFile(*grid_data, files[file]);
old_file = file;
}
FillImageData(TimerCount % (int)loop_vol);
UpdateLabel(file, TimerCount % (int)loop_vol);
iren->GetRenderWindow()->Render();
}
virtual void Execute(vtkObject* caller, unsigned long eventId, void* callData)
{
if (vtkCommand::KeyPressEvent == eventId) {
vtkRenderWindowInteractor* iren = static_cast<vtkRenderWindowInteractor*>(caller);
std::string key = iren->GetKeySym();
if (slice_dims.size() > 0) {
int vert = slice_dims[slice_dims.size() - 1];
int horz = slice_dims[0];
if (key == "Up") Slice[vert] = (Slice[vert] + 1) % latt[vert];
if (key == "Down") Slice[vert] = (Slice[vert] + latt[vert] - 1) % latt[vert];
if (key == "Right") Slice[horz] = (Slice[horz] + 1) % latt[horz];
if (key == "Left") Slice[horz] = (Slice[horz] + latt[horz] - 1) % latt[horz];
}
if (key == "greater") ffile = (ffile + 1) % (int)files.size();
if (key == "less") ffile = (ffile - 1 + (int)files.size()) % (int)files.size();
ForceRender(iren);
return;
}
if (vtkCommand::TimerEvent == eventId) {
// timerId == -2: no animation timer (--notime), ignore all timer events
if (timerId < 0) return;
int tid = *(reinterpret_cast<int*>(callData));
if (tid != timerId) return;
int file = ((TimerCount / (int)loop_vol) + ffile) % (int)files.size();
if (file != old_file) { readFile(*grid_data, files[file]); old_file = file; }
FillImageData(TimerCount % (int)loop_vol);
UpdateLabel(file, TimerCount % (int)loop_vol);
dynamic_cast<vtkRenderWindowInteractor*>(caller)->GetRenderWindow()->Render();
++TimerCount;
if (TimerCount >= maxCount && timerId > -1)
dynamic_cast<vtkRenderWindowInteractor*>(caller)->DestroyTimer(timerId);
}
}
private:
int TimerCount;
void UpdateLabel(int file, int loop_idx)
{
Coordinate loop_coor;
Lexicographic::CoorFromIndex(loop_coor, loop_idx, loop_ranges);
// Extract tau value from filename (last '_'-delimited field)
const std::string& path = files[file];
std::string tau = path.substr(path.rfind('_') + 1);
std::stringstream ss;
ss << "tau = " << tau << "\nSlice " << Slice;
text->SetInput(ss.str().c_str());
}
};
// ─── Typewriter caption state ─────────────────────────────────────────────────
// User caption (gold, upper line) — cleared on new user: instruction
static std::string g_userCaptionFull;
static size_t g_userCaptionPos = 0;
// Claude caption (light blue, lower line) — cleared when user: arrives
static std::string g_claudeCaptionFull;
static size_t g_claudeCaptionPos = 0;
static int g_captionTick = 0;
static const int g_captionRate = 1; // ticks per character (1 x 100ms = 10 chars/sec)
static std::string WrapText(const std::string& s, int maxCols = 45) {
std::istringstream words(s);
std::string word, line, result;
while (words >> word) {
if (!line.empty() && (int)(line.size() + 1 + word.size()) > maxCols) {
result += line + "\n";
line = word;
} else {
if (!line.empty()) line += " ";
line += word;
}
}
if (!line.empty()) result += line;
return result;
}
// ─── CommandHandler ───────────────────────────────────────────────────────────
// Minimal parser for the wire protocol. Natural-language interpretation
// is handled externally (by Claude) before commands reach this program.
class CommandHandler : public vtkCallbackCommand
{
public:
static CommandHandler* New() { return new CommandHandler; }
FrameUpdater* fu;
vtkCamera* camera;
vtkRenderer* renderer;
vtkRenderWindowInteractor* iren;
vtkTextActor* captionActor = nullptr; // claude (light blue, lower)
vtkTextActor* userCaptionActor = nullptr; // user (gold, upper)
int pollTimerId = -1;
double isosurfaceLevel = 1.0; // in RMS units
void CaptureFrame() {
if (g_recording && g_mpegWriter && g_imageFilter) {
GenerateAudioFrame();
g_imageFilter->Modified();
g_mpegWriter->Write();
}
}
void SetIsosurface(double level)
{
isosurfaceLevel = std::max(0.0, std::min(10.0, level));
fu->posExtractor->SetValue(0, isosurfaceLevel * fu->rms);
fu->negExtractor->SetValue(0, -isosurfaceLevel * fu->rms);
fu->posExtractor->Modified();
fu->negExtractor->Modified();
}
void PrintStatus()
{
std::cout << "[status] file = " << fu->ffile
<< " : " << fu->files[fu->ffile] << "\n"
<< "[status] Slice = " << fu->Slice << "\n"
<< "[status] latt = " << fu->latt << "\n"
<< "[status] isosurface = " << isosurfaceLevel
<< " x RMS (" << isosurfaceLevel * fu->rms << ")" << std::endl;
}
// Execute one line of the wire protocol.
void RunLine(const std::string& line)
{
std::istringstream iss(line);
std::string verb;
if (!(iss >> verb)) return;
// ── slice <dim> <N|+N|-N> ────────────────────────────────────────────
if (verb == "slice") {
int dim; std::string valstr;
if (!(iss >> dim >> valstr)) { std::cout << "[cmd] slice: expected dim value" << std::endl; return; }
if (dim < 0 || dim >= fu->Nd) { std::cout << "[cmd] slice: dim out of range" << std::endl; return; }
int n = (int)fu->latt[dim];
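int newval;
try {
if (!valstr.empty() && (valstr[0] == '+' || valstr[0] == '-')) {
int delta = std::stoi(valstr);
newval = ((fu->Slice[dim] + delta) % n + n) % n;
} else {
newval = ((std::stoi(valstr) % n) + n) % n;
}
} catch (const std::exception&) {
// std::stoi throws on non-numeric or out-of-range input; reject the command instead of aborting
std::cout << "[cmd] slice: invalid value '" << valstr << "'" << std::endl; return;
}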
fu->Slice[dim] = newval;
fu->ForceRender(iren);
PrintStatus();
}
// ── zoom <factor> ────────────────────────────────────────────────────
else if (verb == "zoom") {
double factor;
if (!(iss >> factor)) { std::cout << "[cmd] zoom: expected factor" << std::endl; return; }
camera->Dolly(factor);
renderer->ResetCameraClippingRange();
iren->GetRenderWindow()->Render();
}
// ── azimuth <degrees> ────────────────────────────────────────────────
else if (verb == "azimuth") {
double deg;
if (!(iss >> deg)) { std::cout << "[cmd] azimuth: expected degrees" << std::endl; return; }
camera->Azimuth(deg);
renderer->ResetCameraClippingRange();
iren->GetRenderWindow()->Render();
}
// ── elevation <degrees> ──────────────────────────────────────────────
else if (verb == "elevation") {
double deg;
if (!(iss >> deg)) { std::cout << "[cmd] elevation: expected degrees" << std::endl; return; }
camera->Elevation(deg);
renderer->ResetCameraClippingRange();
iren->GetRenderWindow()->Render();
}
// ── spin <degrees_per_tick> ──────────────────────────────────────────
// Applies azimuth rotation on every 100ms poll tick. spin 0 stops.
else if (verb == "spin") {
double deg;
if (!(iss >> deg)) { std::cout << "[cmd] spin: expected degrees" << std::endl; return; }
g_spinDeg = deg;
std::cout << "[cmd] spin rate = " << g_spinDeg << " deg/tick" << std::endl;
}
// ── caption user: <text> / caption claude: <text> / caption ─────────
// user: clears both lines, types user text (gold) on upper line.
// claude: keeps user line, types response (light blue) on lower line.
// caption alone clears both immediately.
else if (verb == "caption") {
std::string rest;
std::getline(iss, rest);
if (!rest.empty() && rest[0] == ' ') rest = rest.substr(1);
if (rest.empty()) {
g_userCaptionFull = ""; g_userCaptionPos = 0;
g_claudeCaptionFull = ""; g_claudeCaptionPos = 0;
g_captionTick = 0;
if (userCaptionActor) userCaptionActor->SetInput("");
if (captionActor) captionActor->SetInput("");
iren->GetRenderWindow()->Render(); CaptureFrame();
} else if (rest.substr(0,5) == "user:") {
// New instruction: clear both, start typing user text
g_claudeCaptionFull = ""; g_claudeCaptionPos = 0;
g_userCaptionFull = WrapText(rest); g_userCaptionPos = 0;
g_captionTick = 0;
if (userCaptionActor) userCaptionActor->SetInput("");
if (captionActor) captionActor->SetInput("");
iren->GetRenderWindow()->Render(); CaptureFrame();
} else {
// claude: or unlabelled — keep user line, type below
g_claudeCaptionFull = WrapText(rest); g_claudeCaptionPos = 0;
g_captionTick = 0;
}
}
// ── record start <filename> / record stop ────────────────────────────
else if (verb == "record") {
#ifdef MPEG
std::string sub;
if (!(iss >> sub)) { std::cout << "[cmd] record: expected start <file> or stop" << std::endl; return; }
if (sub == "stop") {
if (g_recording && g_mpegWriter) {
g_mpegWriter->End();
g_mpegWriter->Delete(); g_mpegWriter = nullptr;
g_imageFilter->Delete(); g_imageFilter = nullptr;
g_recording = false;
std::cout << "[cmd] recording stopped: " << g_recordingFile << std::endl;
// Write WAV and mux to MP4
std::string wavFile = g_recordingFile + ".wav";
WriteWAV(wavFile, g_audioBuffer, AUDIO_RATE);
g_audioBuffer.clear();
std::string mp4File = g_recordingFile;
if (mp4File.size() > 4 && mp4File.substr(mp4File.size()-4) == ".avi")
mp4File = mp4File.substr(0, mp4File.size()-4) + ".mp4";
else
mp4File += ".mp4";
std::string cmd = "ffmpeg -y -i \"" + g_recordingFile + "\" -i \"" + wavFile +
"\" -c:v copy -c:a aac -shortest \"" + mp4File + "\" 2>/dev/null";
int ret = system(cmd.c_str());
if (ret == 0) {
std::cout << "[cmd] muxed output: " << mp4File << std::endl;
unlink(wavFile.c_str()); // clean up intermediate WAV
} else {
std::cout << "[cmd] mux failed (ffmpeg not found?). WAV kept: " << wavFile << std::endl;
}
} else {
std::cout << "[cmd] not recording" << std::endl;
}
} else if (sub == "start") {
std::string fname = "recording.avi";
iss >> fname;
if (g_recording) { std::cout << "[cmd] already recording" << std::endl; return; }
g_recordingFile = fname;
g_audioBuffer.clear();
g_samplesPerFrame = AUDIO_RATE / std::max(1, g_framerate);
g_beepRemaining = 0;
g_beepPhase = 0.0;
g_imageFilter = vtkWindowToImageFilter::New();
g_imageFilter->SetInput(iren->GetRenderWindow());
g_imageFilter->SetInputBufferTypeToRGB();
g_mpegWriter = vtkFFMPEGWriter::New();
g_mpegWriter->SetFileName(fname.c_str());
g_mpegWriter->SetRate(g_framerate);
g_mpegWriter->SetInputConnection(g_imageFilter->GetOutputPort());
g_mpegWriter->Start();
g_recording = true;
std::cout << "[cmd] recording started: " << fname << std::endl;
} else {
std::cout << "[cmd] record: unknown subcommand '" << sub << "'" << std::endl;
}
#else
std::cout << "[cmd] record: MPEG support not compiled" << std::endl;
#endif
}
// ── iso <value> ──────────────────────────────────────────────────────
else if (verb == "iso") {
double val;
if (!(iss >> val)) { std::cout << "[cmd] iso: expected value" << std::endl; return; }
SetIsosurface(val);
fu->ForceRender(iren);
PrintStatus();
}
// ── file <index|+N|-N> ───────────────────────────────────────────────
else if (verb == "file") {
std::string valstr;
if (!(iss >> valstr)) { std::cout << "[cmd] file: expected index" << std::endl; return; }
int n = (int)fu->files.size();
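int newval;
try {
if (!valstr.empty() && (valstr[0] == '+' || valstr[0] == '-')) {
int delta = std::stoi(valstr);
newval = ((fu->ffile + delta) % n + n) % n;
} else {
newval = ((std::stoi(valstr) % n) + n) % n;
}
} catch (const std::exception&) {
// std::stoi throws on non-numeric or out-of-range input; reject the command instead of aborting
std::cout << "[cmd] file: invalid index '" << valstr << "'" << std::endl; return;
}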
fu->ffile = newval;
fu->old_file = -1;
fu->ForceRender(iren);
PrintStatus();
}
// ── render ───────────────────────────────────────────────────────────
else if (verb == "render") {
fu->ForceRender(iren);
}
// ── status ───────────────────────────────────────────────────────────
else if (verb == "status") {
PrintStatus();
}
// ── quit ─────────────────────────────────────────────────────────────
else if (verb == "quit" || verb == "exit") {
g_running = false;
iren->TerminateApp();
}
else {
std::cout << "[cmd] Unknown command: '" << line << "'" << std::endl;
}
}
virtual void Execute(vtkObject*, unsigned long eventId, void* callData)
{
if (eventId != vtkCommand::TimerEvent) return;
if (pollTimerId >= 0) {
int tid = *(reinterpret_cast<int*>(callData));
if (tid != pollTimerId) return;
}
std::vector<std::string> pending;
{
std::lock_guard<std::mutex> lk(g_cmdMutex);
while (!g_cmdQueue.empty()) { pending.push_back(g_cmdQueue.front()); g_cmdQueue.pop(); }
}
for (const auto& line : pending) {
std::cout << "[cmd] >> " << line << std::endl;
RunLine(line);
// CaptureFrame() is also called inside RunLine for some caption commands,
// so those write one extra frame; each call adds matching audio, so video and audio stay in step.
CaptureFrame();
}
// Typewriter: advance one character every g_captionRate ticks.
// User line types first; claude line starts once user line is complete.
bool typing = (g_userCaptionPos < g_userCaptionFull.size()) ||
(g_claudeCaptionPos < g_claudeCaptionFull.size());
if (typing) {
if (++g_captionTick >= g_captionRate) {
g_captionTick = 0;
bool rendered = false;
if (g_userCaptionPos < g_userCaptionFull.size()) {
++g_userCaptionPos;
if (userCaptionActor)
userCaptionActor->SetInput(g_userCaptionFull.substr(0, g_userCaptionPos).c_str());
PlayBeepAudible();
TriggerBeep();
rendered = true;
} else if (g_claudeCaptionPos < g_claudeCaptionFull.size()) {
++g_claudeCaptionPos;
if (captionActor)
captionActor->SetInput(g_claudeCaptionFull.substr(0, g_claudeCaptionPos).c_str());
rendered = true;
}
if (rendered) {
iren->GetRenderWindow()->Render();
CaptureFrame();
}
}
}
// Apply continuous spin (if active) at poll-timer rate
if (g_spinDeg != 0.0) {
camera->Azimuth(g_spinDeg);
renderer->ResetCameraClippingRange();
iren->GetRenderWindow()->Render();
CaptureFrame();
}
}
};
// ─── main ─────────────────────────────────────────────────────────────────────
int main(int argc, char* argv[])
{
using namespace Grid;
Grid_init(&argc, &argv);
GridLogLayout();
auto latt_size = GridDefaultLatt();
auto simd_layout = GridDefaultSimd(latt_size.size(), vComplex::Nsimd());
auto mpi_layout = GridDefaultMpi();
GridCartesian Grid(latt_size, simd_layout, mpi_layout);
double default_contour = 1.0;
std::string arg;
std::vector<std::string> file_list({"file1","file2","file3","file4",
"file5","file6","file7","file8"});
if (GridCmdOptionExists(argv, argv+argc, "--files")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--files");
GridCmdOptionCSL(arg, file_list);
}
#ifdef MPEG
if (GridCmdOptionExists(argv, argv+argc, "--mpeg")) g_mpeg = 1;
#endif
if (GridCmdOptionExists(argv, argv+argc, "--fps")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--fps");
GridCmdOptionInt(arg, g_framerate);
}
if (GridCmdOptionExists(argv, argv+argc, "--isosurface")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--isosurface");
GridCmdOptionFloat(arg, default_contour);
}
int NoTime = 0, Nd = Grid.Nd();
Coordinate Slice(Nd,0), SumDims(Nd,0), LoopDims(Nd,0), XYZDims({0,1,2});
if (GridCmdOptionExists(argv, argv+argc, "--slice")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--slice");
GridCmdOptionIntVector(arg, Slice);
}
if (GridCmdOptionExists(argv, argv+argc, "--sum")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--sum");
GridCmdOptionIntVector(arg, SumDims);
}
if (GridCmdOptionExists(argv, argv+argc, "--loop")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--loop");
GridCmdOptionIntVector(arg, LoopDims);
}
if (GridCmdOptionExists(argv, argv+argc, "--xyz")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--xyz");
GridCmdOptionIntVector(arg, XYZDims);
}
if (GridCmdOptionExists(argv, argv+argc, "--notime")) { NoTime = 1; }
std::thread cmdThread(CommandReaderThread);
cmdThread.detach();
// ── VTK scene ────────────────────────────────────────────────────────────
vtkNew<vtkNamedColors> colors;
std::array<unsigned char,4> posColor{{240,184,160,255}}; colors->SetColor("posColor", posColor.data());
std::array<unsigned char,4> bkg{{51,77,102,255}}; colors->SetColor("BkgColor", bkg.data());
vtkNew<vtkRenderWindow> renWin;
vtkNew<vtkRenderWindowInteractor> iren;
iren->SetRenderWindow(renWin);
int frameCount = (int)file_list.size();
for (int d = 0; d < Nd; d++) if (LoopDims[d]) frameCount *= latt_size[d];
vtkNew<vtkCamera> aCamera;
aCamera->SetViewUp(0,0,-1); aCamera->SetPosition(0,-1000,0); aCamera->SetFocalPoint(0,0,0);
aCamera->ComputeViewPlaneNormal(); aCamera->Azimuth(30.0); aCamera->Elevation(30.0);
vtkNew<vtkRenderer> aRenderer;
renWin->AddRenderer(aRenderer);
double nrm, rms, contour;
{ LatticeComplexD data(&Grid); readFile(data, file_list[0]); nrm = norm2(data); }
rms = std::sqrt(nrm / Grid.gSites());
contour = default_contour * rms;
vtkNew<vtkImageData> imageData;
imageData->SetDimensions(latt_size[XYZDims[0]], latt_size[XYZDims[1]], latt_size[XYZDims[2]]);
imageData->AllocateScalars(VTK_DOUBLE, 1);
for (int xx=0;xx<latt_size[XYZDims[0]];xx++)
for (int yy=0;yy<latt_size[XYZDims[1]];yy++)
for (int zz=0;zz<latt_size[XYZDims[2]];zz++)
imageData->SetScalarComponentFromDouble(xx,yy,zz,0,0.0);
vtkNew<isosurface> posExtractor; posExtractor->SetInputData(imageData); posExtractor->SetValue(0, contour);
vtkNew<vtkStripper> posStripper; posStripper->SetInputConnection(posExtractor->GetOutputPort());
vtkNew<vtkPolyDataMapper> posMapper; posMapper->SetInputConnection(posStripper->GetOutputPort()); posMapper->ScalarVisibilityOff();
vtkNew<vtkActor> pos; pos->SetMapper(posMapper);
pos->GetProperty()->SetDiffuseColor(colors->GetColor3d("posColor").GetData());
pos->GetProperty()->SetSpecular(0.3); pos->GetProperty()->SetSpecularPower(20); pos->GetProperty()->SetOpacity(0.5);
vtkNew<isosurface> negExtractor; negExtractor->SetInputData(imageData); negExtractor->SetValue(0, -contour);
vtkNew<vtkStripper> negStripper; negStripper->SetInputConnection(negExtractor->GetOutputPort());
vtkNew<vtkPolyDataMapper> negMapper; negMapper->SetInputConnection(negStripper->GetOutputPort()); negMapper->ScalarVisibilityOff();
vtkNew<vtkActor> neg; neg->SetMapper(negMapper);
neg->GetProperty()->SetDiffuseColor(colors->GetColor3d("Ivory").GetData());
vtkNew<vtkOutlineFilter> outlineData; outlineData->SetInputData(imageData);
vtkNew<vtkPolyDataMapper> mapOutline; mapOutline->SetInputConnection(outlineData->GetOutputPort());
vtkNew<vtkActor> outline; outline->SetMapper(mapOutline);
outline->GetProperty()->SetColor(colors->GetColor3d("Black").GetData());
vtkNew<vtkTextActor> TextT;
TextT->SetInput("Initialising...");
TextT->SetPosition(10, 920);
TextT->GetTextProperty()->SetFontSize(24);
TextT->GetTextProperty()->SetColor(colors->GetColor3d("Gold").GetData());
// Claude response caption (light blue, lower line)
vtkNew<vtkTextActor> CaptionT;
CaptionT->SetInput("");
CaptionT->SetPosition(512, 38);
CaptionT->GetTextProperty()->SetFontSize(32);
CaptionT->GetTextProperty()->SetColor(0.6, 0.9, 1.0);
CaptionT->GetTextProperty()->SetJustificationToCentered();
CaptionT->GetTextProperty()->SetBackgroundColor(0.0, 0.0, 0.0);
CaptionT->GetTextProperty()->SetBackgroundOpacity(0.6);
CaptionT->GetTextProperty()->BoldOn();
// User instruction caption (gold, upper line)
vtkNew<vtkTextActor> UserCaptionT;
UserCaptionT->SetInput("");
UserCaptionT->SetPosition(512, 82);
UserCaptionT->GetTextProperty()->SetFontSize(32);
UserCaptionT->GetTextProperty()->SetColor(1.0, 0.85, 0.0);
UserCaptionT->GetTextProperty()->SetJustificationToCentered();
UserCaptionT->GetTextProperty()->SetBackgroundColor(0.0, 0.0, 0.0);
UserCaptionT->GetTextProperty()->SetBackgroundOpacity(0.6);
UserCaptionT->GetTextProperty()->BoldOn();
aRenderer->AddActor(TextT); aRenderer->AddActor(CaptionT); aRenderer->AddActor(UserCaptionT); aRenderer->AddActor(outline);
aRenderer->AddActor(pos); aRenderer->AddActor(neg);
vtkNew<FrameUpdater> fu;
fu->SetGrid(&Grid); fu->SetFiles(file_list); fu->SetSlice(Slice);
fu->SetSumDimensions(SumDims); fu->SetLoopDimensions(LoopDims);
fu->SetDisplayDimensions(XYZDims); fu->SetSliceDimensions();
fu->imageData = imageData; fu->text = TextT; fu->maxCount = frameCount;
fu->posExtractor = posExtractor; fu->negExtractor = negExtractor; fu->rms = rms;
iren->AddObserver(vtkCommand::TimerEvent, fu);
iren->AddObserver(vtkCommand::KeyPressEvent, fu);
aRenderer->SetActiveCamera(aCamera); aRenderer->ResetCamera();
aRenderer->SetBackground(colors->GetColor3d("BkgColor").GetData());
aCamera->Dolly(1.0); aRenderer->SetViewport(0.0,0.0,1.0,1.0);
aRenderer->ResetCameraClippingRange();
renWin->SetSize(1024,1024); renWin->SetWindowName("ControlledFieldDensity");
renWin->Render(); iren->Initialize();
// CommandHandler on fast poll timer
vtkNew<CommandHandler> cmdHandler;
cmdHandler->fu = fu;
cmdHandler->camera = aCamera;
cmdHandler->renderer = aRenderer;
cmdHandler->iren = iren;
cmdHandler->captionActor = CaptionT;
cmdHandler->userCaptionActor = UserCaptionT;
cmdHandler->isosurfaceLevel = default_contour;
iren->AddObserver(vtkCommand::TimerEvent, cmdHandler);
cmdHandler->pollTimerId = iren->CreateRepeatingTimer(100);
if (g_mpeg == 0 && NoTime == 0) {
fu->timerId = iren->CreateRepeatingTimer(10000 / g_framerate);
}
if (g_mpeg) {
#ifdef MPEG
vtkWindowToImageFilter* imageFilter = vtkWindowToImageFilter::New();
imageFilter->SetInput(renWin); imageFilter->SetInputBufferTypeToRGB();
vtkFFMPEGWriter* writer = vtkFFMPEGWriter::New();
writer->SetFileName("movie.avi"); writer->SetRate(g_framerate);
writer->SetInputConnection(imageFilter->GetOutputPort()); writer->Start();
for (int i = 0; i < fu->maxCount; i++) {
fu->Execute(iren, vtkCommand::TimerEvent, &fu->timerId);
imageFilter->Modified(); writer->Write();
}
writer->End(); writer->Delete(); imageFilter->Delete();
#else
assert(0 && "MPEG support not compiled");
#endif
} else {
iren->Start();
}
g_running = false;
Grid_finalize();
return EXIT_SUCCESS;
}
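// Illustrative sketch (not part of the source): the frame count computed in
// main() above is the number of input files multiplied by the extent of every
// dimension flagged for looping. The helper name countFrames is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: frames = files x product of the looped dimensions.
long countFrames(std::size_t nFiles,
                 const std::vector<int>& lattSize,
                 const std::vector<int>& loopDims)
{
  assert(lattSize.size() == loopDims.size());
  long frames = static_cast<long>(nFiles);
  for (std::size_t d = 0; d < lattSize.size(); ++d)
    if (loopDims[d]) frames *= lattSize[d];   // only looped dims contribute
  return frames;
}
```

// For example, three files on a 32^4 lattice looping over t alone would give
// countFrames(3, {32,32,32,32}, {0,0,0,1}) frames, i.e. 3 * 32.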
+633
@@ -0,0 +1,633 @@
// ForceAnalysis.cxx
//
// Reads a sequence of force snapshot files (LatticeComplexD, real part = force magnitude)
// and produces two outputs:
//
// 1. Tab-separated timeseries to stderr (written via std::cerr):
// idx Gauge_lat_rms Gauge_lat_hot Gauge_smr_rms ...
// where _rms is the lattice RMS and _hot is the value at --hotsite.
//
// 2. PNG images (one per force component per snapshot) rendered via VTK
// as isosurfaces of the force density, using the same pipeline as
// Visualise5D. Images are written to --pngdir/<label>_<idx>.png.
// These can be read back by Claude to interpret spatial structure.
//
// Usage:
// ForceAnalysis --grid 32.32.32.32 --mpi 1.1.1.1
// --snapdir /path/to/snapshots
// --first 0 --last 1920 --step 10
// --hotsite x.y.z.t
// --pngdir /path/to/output/pngs
// --isosurface 1.0 (contour in units of field RMS)
// --fixediso 0.05 (fixed absolute contour, overrides --isosurface)
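// --slicedim 3 (index of the dimension to fix for 3D display, default: 3 = t)
// --sliceval 2 (value of that dimension, default: 0)
// --heatmap --slicedim2 2 --sliceval2 0 (2D heatmap: fix a second dimension)
// --heatscale 0.07 (fixed symmetric heatmap colour scale, default: auto)
// --taustart 0.0 --taustep 0.01 (label frames with MD time instead of idx)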
//
// Dimension order on the 32^4 lattice: x=0 y=1 z=2 t=3
#include <vtkActor.h>
#include <vtkActor2D.h>
#include <vtkCamera.h>
#include <vtkImageActor.h>
#include <vtkImageMapper3D.h>
#include <vtkImageData.h>
#include <vtkImageMapToColors.h>
#include <vtkLookupTable.h>
#include <vtkNamedColors.h>
#include <vtkNew.h>
#include <vtkOutlineFilter.h>
#include <vtkPolyData.h>
#include <vtkPolyDataMapper.h>
#include <vtkPolyDataMapper2D.h>
#include <vtkProperty.h>
#include <vtkProperty2D.h>
#include <vtkPoints.h>
#include <vtkCellArray.h>
#include <vtkRenderWindow.h>
#include <vtkRenderWindowInteractor.h>
#include <vtkRenderer.h>
#include <vtkStripper.h>
#include <vtkCallbackCommand.h>
#include <vtkTextActor.h>
#include <vtkTextProperty.h>
#include <vtkWindowToImageFilter.h>
#include <vtkPNGWriter.h>
#define USE_FLYING_EDGES
#ifdef USE_FLYING_EDGES
#include <vtkFlyingEdges3D.h>
typedef vtkFlyingEdges3D isosurface;
#else
#include <vtkMarchingCubes.h>
typedef vtkMarchingCubes isosurface;
#endif
#include <Grid/Grid.h>
#include <algorithm>
#include <array>
#include <iostream>
#include <fstream>
#include <sstream>
#include <iomanip>
#include <string>
#include <vector>
#include <cmath>
#include <memory>
#include <sys/stat.h>
using namespace Grid;
// ─── I/O ─────────────────────────────────────────────────────────────────────
template <class T>
bool tryReadFile(T& out, const std::string& fname)
{
std::ifstream test(fname);
if (!test.good()) return false;
test.close();
emptyUserRecord record;
ScidacReader RD;
RD.open(fname);
RD.readScidacFieldRecord(out, record);
RD.close();
return true;
}
// ─── Fill a 3D vtkImageData slice from a 4D lattice field ────────────────────
// Fixes the sliced dimension at sliceval and displays the remaining 3 dims.
void fillImageData(vtkImageData* img,
LatticeComplexD& field,
const Coordinate& latt_size,
int slice_dim, int sliceval)
{
// Display dims = all dims except slice_dim, in order
std::vector<int> disp;
for (int d = 0; d < 4; d++) if (d != slice_dim) disp.push_back(d);
int Nx = latt_size[disp[0]];
int Ny = latt_size[disp[1]];
int Nz = latt_size[disp[2]];
for (int ix = 0; ix < Nx; ix++)
for (int iy = 0; iy < Ny; iy++)
for (int iz = 0; iz < Nz; iz++) {
Coordinate site(4);
site[disp[0]] = ix;
site[disp[1]] = iy;
site[disp[2]] = iz;
site[slice_dim] = sliceval;
RealD val = real(TensorRemove(peekSite(field, site)));
img->SetScalarComponentFromDouble(ix, iy, iz, 0, val);
}
img->Modified();
}
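// Illustrative sketch (not part of the source): the same "fix one dimension,
// keep the other three" slicing as fillImageData, on a plain flat array so it
// runs without Grid or VTK. The storage order (dimension 0 fastest) and the
// name sliceTo3D are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Extract the 3D slice of a flat 4D field at latt-coordinate sliceDim == sliceVal.
// Output is indexed as ix + Nx*(iy + Ny*iz) over the three remaining dims.
std::vector<double> sliceTo3D(const std::vector<double>& field,
                              const std::vector<int>& latt,
                              int sliceDim, int sliceVal)
{
  assert(latt.size() == 4 && sliceDim >= 0 && sliceDim < 4);
  assert(sliceVal >= 0 && sliceVal < latt[sliceDim]);
  std::vector<int> disp;                       // display dims, in order
  for (int d = 0; d < 4; ++d) if (d != sliceDim) disp.push_back(d);
  const int Nx = latt[disp[0]], Ny = latt[disp[1]], Nz = latt[disp[2]];
  std::vector<double> out(static_cast<std::size_t>(Nx) * Ny * Nz);
  int site[4];
  site[sliceDim] = sliceVal;
  for (int ix = 0; ix < Nx; ++ix)
    for (int iy = 0; iy < Ny; ++iy)
      for (int iz = 0; iz < Nz; ++iz) {
        site[disp[0]] = ix; site[disp[1]] = iy; site[disp[2]] = iz;
        // Assumed lexicographic index, dimension 0 fastest
        std::size_t idx = site[0] + latt[0] * (site[1] + latt[1] * (site[2]
                        + static_cast<std::size_t>(latt[2]) * site[3]));
        out[ix + Nx * (iy + static_cast<std::size_t>(Ny) * iz)] = field[idx];
      }
  return out;
}
```

// Slicing a 2^4 field at t=1 returns the eight sites whose t coordinate is 1;
// slicing at x=1 instead returns every second element of the flat array.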
// ─── 2D heatmap: persistent context ───────────────────────────────────────────
// Renders a fixed (dim1=v1, dim2=v2) slice of the 4D force field as a
// diverging blue→white→red colour map, with a fixed symmetric colour scale
// so brightness directly encodes force magnitude across all frames.
// A white cross-hair marks the hotsite projection onto the slice.
struct HeatmapCtx {
// image pipeline
vtkNew<vtkImageData> img;
vtkNew<vtkLookupTable> lut;
vtkNew<vtkImageMapToColors> colorMap;
vtkNew<vtkImageActor> imgActor;
// colour scale legend (text, avoids needing RenderingAnnotation module)
vtkNew<vtkTextActor> cbar;
// hotsite cross-hair (2D overlay actors)
vtkNew<vtkPolyData> crossPD;
vtkNew<vtkPoints> crossPts;
vtkNew<vtkCellArray> crossLines;
vtkNew<vtkActor2D> crossActor;
// title
vtkNew<vtkTextActor> titleAct;
// renderer / window
vtkNew<vtkRenderer> ren;
vtkNew<vtkRenderWindow> renWin;
vtkNew<vtkWindowToImageFilter> w2i;
vtkNew<vtkPNGWriter> writer;
int Nx = 0, Ny = 0; // display dimensions of the slice
double scale = 0.07; // colour range: [-scale, +scale]
int hotX = -1, hotY = -1; // hotsite projection onto (Nx,Ny) plane
// pixel coords of the image origin in the render window
int imgOffX = 60, imgOffY = 40;
int imgW = 0, imgH = 0; // rendered pixel size of each lattice cell
void init(int nx, int ny, double sc, int hx, int hy)
{
Nx = nx; Ny = ny; scale = sc;
hotX = hx; hotY = hy;
const int WIN_W = 900, WIN_H = 700;
// Make cells square and as large as possible within the central area
int cellW = (WIN_W - 160) / Nx;
int cellH = (WIN_H - 120) / Ny;
imgW = std::min(cellW, cellH);
imgH = imgW;
imgOffX = (WIN_W - Nx * imgW) / 2;
imgOffY = 60;
// --- Image data (scalar field, one component) ---
img->SetDimensions(Nx, Ny, 1);
img->SetSpacing(imgW, imgH, 1);
img->SetOrigin(imgOffX, imgOffY, 0);
img->AllocateScalars(VTK_DOUBLE, 1);
// --- Diverging LUT: blue(-scale) → white(0) → red(+scale) ---
lut->SetNumberOfTableValues(512);
lut->SetRange(-scale, scale);
lut->SetNanColor(0.2, 0.2, 0.2, 1.0);
for (int i = 0; i < 512; ++i) {
double t = i / 511.0; // 0=blue, 0.5=white, 1=red
double r = (t > 0.5) ? 1.0 : 2.0 * t;
double g = (t < 0.5) ? 2.0 * t : 2.0 * (1.0 - t);
double b = (t < 0.5) ? 1.0 : 2.0 * (1.0 - t);
lut->SetTableValue(i, r, g, b, 1.0);
}
lut->Build();
// --- Colour map pipeline ---
colorMap->SetInputData(img);
colorMap->SetLookupTable(lut);
colorMap->Update();
imgActor->GetMapper()->SetInputConnection(colorMap->GetOutputPort());
// --- Colour scale legend (text) ---
{
std::ostringstream ss;
ss << std::scientific << std::setprecision(2)
<< "blue=-" << sc << " white=0 red=+" << sc;
cbar->SetInput(ss.str().c_str());
}
cbar->GetTextProperty()->SetFontFamilyToCourier();
cbar->GetTextProperty()->SetFontSize(13);
cbar->GetTextProperty()->SetColor(0.9, 0.9, 0.9);
cbar->SetDisplayPosition(10, 10);
// --- Cross-hair at hotsite (2D display coords) ---
if (hotX >= 0 && hotY >= 0) {
double cx = imgOffX + (hotX + 0.5) * imgW;
double cy = imgOffY + (hotY + 0.5) * imgH;
double arm = imgW * 0.8;
crossPts->InsertNextPoint(cx - arm, cy, 0);
crossPts->InsertNextPoint(cx + arm, cy, 0);
crossPts->InsertNextPoint(cx, cy - arm, 0);
crossPts->InsertNextPoint(cx, cy + arm, 0);
vtkIdType seg0[2] = {0, 1};
vtkIdType seg1[2] = {2, 3};
crossLines->InsertNextCell(2, seg0);
crossLines->InsertNextCell(2, seg1);
crossPD->SetPoints(crossPts);
crossPD->SetLines(crossLines);
vtkNew<vtkPolyDataMapper2D> crossMap;
crossMap->SetInputData(crossPD);
crossActor->SetMapper(crossMap);
crossActor->GetProperty()->SetColor(1, 1, 1);
crossActor->GetProperty()->SetLineWidth(2.0);
}
// --- Title ---
titleAct->GetTextProperty()->SetFontFamilyToCourier();
titleAct->GetTextProperty()->SetFontSize(16);
titleAct->GetTextProperty()->SetColor(1, 1, 0);
titleAct->SetDisplayPosition(10, WIN_H - 30);
// --- Renderer (2D parallel projection so image fills correctly) ---
ren->SetBackground(0.08, 0.08, 0.12);
ren->AddActor(imgActor);
ren->AddActor2D(cbar);
ren->AddActor2D(crossActor);
ren->AddActor2D(titleAct);
ren->GetActiveCamera()->ParallelProjectionOn();
// Set up camera to look straight down at the image plane
ren->GetActiveCamera()->SetPosition(WIN_W/2.0, WIN_H/2.0, 1000);
ren->GetActiveCamera()->SetFocalPoint(WIN_W/2.0, WIN_H/2.0, 0);
ren->GetActiveCamera()->SetViewUp(0, 1, 0);
ren->GetActiveCamera()->SetParallelScale(WIN_H / 2.0);
ren->ResetCameraClippingRange();
renWin->AddRenderer(ren);
renWin->SetSize(WIN_W, WIN_H);
renWin->SetOffScreenRendering(1);
renWin->SetMultiSamples(0);
w2i->SetInput(renWin);
w2i->SetInputBufferTypeToRGB();
w2i->ReadFrontBufferOff();
}
};
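// Illustrative sketch (not part of the source): the diverging colour ramp
// built in HeatmapCtx::init, isolated as a pure function so the end points
// can be checked. normalise() is a hypothetical stand-in for the linear,
// clamped value-to-table mapping that the lookup table is assumed to apply.

```cpp
#include <array>
#include <cassert>

// Map t in [0,1] to RGB: 0 = blue, 0.5 = white, 1 = red.
std::array<double, 3> divergingColour(double t)
{
  assert(t >= 0.0 && t <= 1.0);
  double r = (t > 0.5) ? 1.0 : 2.0 * t;
  double g = (t < 0.5) ? 2.0 * t : 2.0 * (1.0 - t);
  double b = (t < 0.5) ? 1.0 : 2.0 * (1.0 - t);
  return {r, g, b};
}

// Hypothetical helper: map a field value in [-scale, +scale] to t in [0,1],
// clamping values that fall outside the range.
double normalise(double value, double scale)
{
  assert(scale > 0.0);
  double t = (value + scale) / (2.0 * scale);
  if (t < 0.0) t = 0.0;
  if (t > 1.0) t = 1.0;
  return t;
}
```

// A value of zero lands on t = 0.5 (white) for any scale, which is what makes
// the sign of the force readable directly from the hue.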
void renderHeatmap(HeatmapCtx& ctx,
LatticeComplexD& field,
const Coordinate& latt_size,
int dim1, int val1, // first fixed dimension
int dim2, int val2, // second fixed dimension
const std::string& title,
const std::string& outpath)
{
// Display dimensions: the two dims that are NOT fixed
std::vector<int> disp;
for (int d = 0; d < 4; d++)
if (d != dim1 && d != dim2) disp.push_back(d);
int Nx = latt_size[disp[0]];
int Ny = latt_size[disp[1]];
// Fill image data
for (int ix = 0; ix < Nx; ix++) {
for (int iy = 0; iy < Ny; iy++) {
Coordinate site(4);
site[disp[0]] = ix;
site[disp[1]] = iy;
site[dim1] = val1;
site[dim2] = val2;
RealD val = real(TensorRemove(peekSite(field, site)));
ctx.img->SetScalarComponentFromDouble(ix, iy, 0, 0, val);
}
}
ctx.img->Modified();
ctx.colorMap->Update();
ctx.titleAct->SetInput(title.c_str());
ctx.renWin->Render();
ctx.w2i->Modified();
ctx.w2i->Update();
ctx.writer->SetFileName(outpath.c_str());
ctx.writer->SetInputConnection(ctx.w2i->GetOutputPort());
ctx.writer->Write();
}
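// Illustrative sketch (not part of the source): the pixel layout arithmetic
// used by the heatmap, i.e. square cells sized to fit the window and the
// display-space centre of a given cell (used for the hotsite cross-hair).
// The struct and function names are hypothetical; the margins (160, 120, 60)
// and the 900x700 window are the values that appear in HeatmapCtx::init.

```cpp
#include <algorithm>
#include <cassert>

struct HeatmapLayout { int cell; int offX; int offY; };

// Largest square cell that fits, centred horizontally in the window.
HeatmapLayout heatmapLayout(int nx, int ny, int winW = 900, int winH = 700)
{
  assert(nx > 0 && ny > 0);
  int cell = std::min((winW - 160) / nx, (winH - 120) / ny);
  return {cell, (winW - nx * cell) / 2, 60};
}

// Display coordinate of the centre of cell number idx along one axis.
double cellCentre(int offset, int idx, int cell)
{
  return offset + (idx + 0.5) * cell;
}
```

// On a 32x32 slice the height is the limiting direction, so cells come out
// 18 pixels wide and the image starts 162 pixels from the left edge.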
// ─── Persistent rendering context (created once, reused every frame) ──────────
// Avoids Metal GPU context exhaustion on macOS when rendering hundreds of frames.
struct RenderCtx {
vtkNew<vtkNamedColors> colors;
vtkNew<vtkImageData> imageData;
vtkNew<isosurface> posEx, negEx;
vtkNew<vtkStripper> posSt, negSt;
vtkNew<vtkPolyDataMapper> posMap, negMap, outMap;
vtkNew<vtkActor> posAct, negAct, outAct;
vtkNew<vtkOutlineFilter> outF;
vtkNew<vtkTextActor> label;
vtkNew<vtkRenderer> ren;
vtkNew<vtkCamera> cam;
vtkNew<vtkRenderWindow> renWin;
vtkNew<vtkWindowToImageFilter> w2i;
vtkNew<vtkPNGWriter> writer;
void init(int Nx, int Ny, int Nz)
{
std::array<unsigned char,4> posColor{{240,184,160,255}};
colors->SetColor("posColor", posColor.data());
std::array<unsigned char,4> bkg{{51,77,102,255}};
colors->SetColor("BkgColor", bkg.data());
imageData->SetDimensions(Nx, Ny, Nz);
imageData->AllocateScalars(VTK_DOUBLE, 1);
posEx->SetInputData(imageData); posEx->SetValue(0, 1.0);
posSt->SetInputConnection(posEx->GetOutputPort());
posMap->SetInputConnection(posSt->GetOutputPort());
posMap->ScalarVisibilityOff();
posAct->SetMapper(posMap);
posAct->GetProperty()->SetDiffuseColor(colors->GetColor3d("posColor").GetData());
posAct->GetProperty()->SetSpecular(0.3);
posAct->GetProperty()->SetSpecularPower(20);
posAct->GetProperty()->SetOpacity(0.6);
negEx->SetInputData(imageData); negEx->SetValue(0, -1.0);
negSt->SetInputConnection(negEx->GetOutputPort());
negMap->SetInputConnection(negSt->GetOutputPort());
negMap->ScalarVisibilityOff();
negAct->SetMapper(negMap);
negAct->GetProperty()->SetDiffuseColor(colors->GetColor3d("Ivory").GetData());
negAct->GetProperty()->SetOpacity(0.6);
outF->SetInputData(imageData);
outMap->SetInputConnection(outF->GetOutputPort());
outAct->SetMapper(outMap);
outAct->GetProperty()->SetColor(colors->GetColor3d("Black").GetData());
label->SetPosition(10, 10);
label->GetTextProperty()->SetFontFamilyToCourier();
label->GetTextProperty()->SetFontSize(18);
label->GetTextProperty()->SetColor(colors->GetColor3d("Gold").GetData());
ren->AddActor(posAct);
ren->AddActor(negAct);
ren->AddActor(outAct);
ren->AddActor2D(label);
ren->SetBackground(colors->GetColor3d("BkgColor").GetData());
cam->SetViewUp(0,0,-1);
cam->SetPosition(0,-1000,0);
cam->SetFocalPoint(0,0,0);
cam->ComputeViewPlaneNormal();
cam->Azimuth(30.0);
cam->Elevation(30.0);
ren->SetActiveCamera(cam);
renWin->AddRenderer(ren);
renWin->SetSize(800, 600);
renWin->SetOffScreenRendering(1);
renWin->SetMultiSamples(0);
w2i->SetInput(renWin);
w2i->SetInputBufferTypeToRGB();
w2i->ReadFrontBufferOff();
}
};
// ─── Render one force field snapshot to a PNG (reuses existing RenderCtx) ─────
void renderPNG(RenderCtx& ctx,
LatticeComplexD& field,
const Coordinate& latt_size,
int slice_dim, int sliceval,
double contour,
const std::string& title,
const std::string& outpath)
{
// Update image data
fillImageData(ctx.imageData, field, latt_size, slice_dim, sliceval);
// Update isosurface levels
ctx.posEx->SetValue(0, contour);
ctx.negEx->SetValue(0, -contour);
// Update label
ctx.label->SetInput(title.c_str());
// Reset camera to fit the (possibly new) data bounds
ctx.ren->ResetCamera();
ctx.cam->Dolly(1.2);
ctx.ren->ResetCameraClippingRange();
ctx.renWin->Render();
ctx.w2i->Modified();
ctx.w2i->Update();
ctx.writer->SetFileName(outpath.c_str());
ctx.writer->SetInputConnection(ctx.w2i->GetOutputPort());
ctx.writer->Write();
}
// ─── main ─────────────────────────────────────────────────────────────────────
int main(int argc, char* argv[])
{
Grid_init(&argc, &argv);
GridLogMessage.Active(0);
GridLogIterative.Active(0);
GridLogDebug.Active(0);
GridLogPerformance.Active(0);
GridLogComms.Active(0);
GridLogDslash.Active(0);
GridLogMemory.Active(0);
// ── CLI ──────────────────────────────────────────────────────────────────
std::string snapdir = ".";
std::string pngdir = "";
int first = 0, last = 1920, step = 1;
int slice_dim = 3, sliceval = 0; // default: fix t=0, display xyz
double iso_rms = 1.0;
double fixed_iso = -1.0; // if >0, use this absolute contour
double tau_start = -1.0; // if >=0, display MD time tau = tau_start + idx*tau_step
double tau_step = 0.0;
// Heatmap mode: fix two dimensions, show 2D colour map
bool do_heatmap = false;
int slice_dim2 = -1, sliceval2 = 0;
double heat_scale = -1.0; // if >0, fixed symmetric colour scale; else auto
Coordinate hotsite({0,0,0,0});
bool has_hotsite = false;
std::string arg;
if (GridCmdOptionExists(argv, argv+argc, "--snapdir"))
snapdir = GridCmdOptionPayload(argv, argv+argc, "--snapdir");
if (GridCmdOptionExists(argv, argv+argc, "--pngdir"))
pngdir = GridCmdOptionPayload(argv, argv+argc, "--pngdir");
if (GridCmdOptionExists(argv, argv+argc, "--first")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--first");
GridCmdOptionInt(arg, first);
}
if (GridCmdOptionExists(argv, argv+argc, "--last")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--last");
GridCmdOptionInt(arg, last);
}
if (GridCmdOptionExists(argv, argv+argc, "--step")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--step");
GridCmdOptionInt(arg, step);
}
if (GridCmdOptionExists(argv, argv+argc, "--slicedim")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--slicedim");
GridCmdOptionInt(arg, slice_dim);
}
if (GridCmdOptionExists(argv, argv+argc, "--sliceval")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--sliceval");
GridCmdOptionInt(arg, sliceval);
}
if (GridCmdOptionExists(argv, argv+argc, "--isosurface")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--isosurface");
GridCmdOptionFloat(arg, iso_rms);
}
if (GridCmdOptionExists(argv, argv+argc, "--fixediso")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--fixediso");
GridCmdOptionFloat(arg, fixed_iso);
}
if (GridCmdOptionExists(argv, argv+argc, "--taustart")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--taustart");
GridCmdOptionFloat(arg, tau_start);
}
if (GridCmdOptionExists(argv, argv+argc, "--taustep")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--taustep");
GridCmdOptionFloat(arg, tau_step);
}
if (GridCmdOptionExists(argv, argv+argc, "--hotsite")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--hotsite");
GridCmdOptionIntVector(arg, hotsite);
has_hotsite = true;
}
if (GridCmdOptionExists(argv, argv+argc, "--heatmap"))
do_heatmap = true;
if (GridCmdOptionExists(argv, argv+argc, "--slicedim2")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--slicedim2");
GridCmdOptionInt(arg, slice_dim2);
}
if (GridCmdOptionExists(argv, argv+argc, "--sliceval2")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--sliceval2");
GridCmdOptionInt(arg, sliceval2);
}
if (GridCmdOptionExists(argv, argv+argc, "--heatscale")) {
arg = GridCmdOptionPayload(argv, argv+argc, "--heatscale");
GridCmdOptionFloat(arg, heat_scale);
}
bool do_png = !pngdir.empty();
if (do_png) mkdir(pngdir.c_str(), 0755);
// ── Grid setup ───────────────────────────────────────────────────────────
auto latt_size = GridDefaultLatt();
auto simd_layout = GridDefaultSimd(Nd, vComplex::Nsimd());
auto mpi_layout = GridDefaultMpi();
GridCartesian grid(latt_size, simd_layout, mpi_layout);
LatticeComplexD field(&grid);
// Force components
struct ForceSpec { std::string prefix; std::string label; };
std::vector<ForceSpec> forces = {
{ "F_IwasakiGaugeAction_lat", "Gauge_lat" },
{ "F_IwasakiGaugeAction_smr", "Gauge_smr" },
{ "F_JacobianAction_smr", "Jacobian" },
{ "F_TwoFlavourEvenOddRatioPseudoFermionActiondet_0.0047_det_0.05_lat", "Ferm0047_lat" },
{ "F_TwoFlavourEvenOddRatioPseudoFermionActiondet_0.0047_det_0.05_smr", "Ferm0047_smr" },
{ "F_TwoFlavourEvenOddRatioPseudoFermionActiondet_0.05_det_0.1_lat", "Ferm005_lat" },
{ "F_TwoFlavourEvenOddRatioPseudoFermionActiondet_0.1_det_0.25_lat", "Ferm01_lat" },
{ "F_TwoFlavourEvenOddRatioPseudoFermionActiondet_0.25_det_0.5_lat", "Ferm025_lat" },
{ "F_TwoFlavourEvenOddRatioPseudoFermionActiondet_0.5_det_1_lat", "Ferm05_lat" },
};
// ── Timeseries header (written to std::cerr) ──────────────────────────────
std::cerr << "idx";
for (auto& fs : forces) {
std::cerr << "\t" << fs.label << "_rms";
if (has_hotsite) std::cerr << "\t" << fs.label << "_hot";
}
std::cerr << "\n";
// ── Persistent render contexts (one GPU context for all frames) ──────────
std::unique_ptr<RenderCtx> ctx; // isosurface mode
std::unique_ptr<HeatmapCtx> hctx; // heatmap mode
// ── Main loop ─────────────────────────────────────────────────────────────
for (int idx = first; idx <= last; idx += step) {
std::cerr << idx;
for (auto& fs : forces) {
std::string fname = snapdir + "/" + fs.prefix + "." + std::to_string(idx);
if (!tryReadFile(field, fname)) {
std::cerr << "\t-";
if (has_hotsite) std::cerr << "\t-";
continue;
}
// RMS (real part)
RealD sumsq = 0.0;
for (int i = 0; i < grid.gSites(); i++) {
Coordinate site;
Lexicographic::CoorFromIndex(site, i, latt_size);
RealD v = real(TensorRemove(peekSite(field, site)));
sumsq += v * v;
}
RealD rms = std::sqrt(sumsq / grid.gSites());
std::cerr << "\t" << rms;
if (has_hotsite) {
RealD hval = real(TensorRemove(peekSite(field, hotsite)));
std::cerr << "\t" << hval;
}
// PNG output (isosurface or heatmap)
if (do_png) {
// Build title string
std::ostringstream title;
title << fs.label << " ";
if (tau_start >= 0.0 && tau_step > 0.0) {
double tau = tau_start + idx * tau_step;
title << std::fixed << std::setprecision(6) << "tau=" << tau;
} else {
title << "idx=" << idx;
}
title << " rms=" << std::scientific << std::setprecision(3) << rms;
std::ostringstream outpath;
outpath << pngdir << "/" << fs.label
<< "_" << std::setfill('0') << std::setw(6) << idx << ".png";
if (do_heatmap && slice_dim2 >= 0) {
// ── Heatmap mode ────────────────────────────────────────
// Display dims = the two that are NOT fixed
std::vector<int> disp;
for (int d = 0; d < 4; d++)
if (d != slice_dim && d != slice_dim2) disp.push_back(d);
if (!hctx) {
double sc = (heat_scale > 0) ? heat_scale : rms * 20.0;
// Hotsite projection onto display plane
int hx = -1, hy = -1;
if (has_hotsite) {
hx = hotsite[disp[0]];
hy = hotsite[disp[1]];
}
hctx = std::make_unique<HeatmapCtx>();
hctx->init(latt_size[disp[0]], latt_size[disp[1]], sc, hx, hy);
}
title << " scale=+-" << std::fixed << std::setprecision(4) << hctx->scale;
renderHeatmap(*hctx, field, latt_size,
slice_dim, sliceval,
slice_dim2, sliceval2,
title.str(), outpath.str());
} else {
// ── Isosurface mode ─────────────────────────────────────
double contour = (fixed_iso > 0) ? fixed_iso : iso_rms * rms;
title << " iso=" << contour;
if (!ctx) {
std::vector<int> disp;
for (int d = 0; d < 4; d++) if (d != slice_dim) disp.push_back(d);
ctx = std::make_unique<RenderCtx>();
ctx->init(latt_size[disp[0]], latt_size[disp[1]], latt_size[disp[2]]);
}
renderPNG(*ctx, field, latt_size, slice_dim, sliceval,
contour, title.str(), outpath.str());
}
}
}
std::cerr << "\n";
std::cerr.flush();
}
Grid_finalize();
return EXIT_SUCCESS;
}
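// Illustrative sketch (not part of the source): the two scalar quantities the
// main loop derives per snapshot, the RMS of the real part and the contour
// level chosen from it. Plain vectors stand in for the lattice field, and the
// function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Root-mean-square of a set of values: sqrt(sum(v^2) / N).
double rmsOf(const std::vector<double>& values)
{
  assert(!values.empty());
  double sumsq = 0.0;
  for (double v : values) sumsq += v * v;
  return std::sqrt(sumsq / values.size());
}

// A positive fixedIso is used as an absolute contour; otherwise the contour
// is isoRms expressed in units of the field RMS.
double contourLevel(double rms, double isoRms, double fixedIso)
{
  return (fixedIso > 0.0) ? fixedIso : isoRms * rms;
}
```

// With --isosurface 2 and no --fixediso, a field of RMS 3 is contoured at 6;
// passing --fixediso 0.05 would pin the contour at 0.05 regardless of RMS.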
+742
@@ -0,0 +1,742 @@
// TranscriptToVideo.cxx
//
// Reads a conversation transcript file with [USER] / [ASSISTANT] turns and
// renders it to an AVI using vtkFFMPEGWriter at 1280x720, 10 fps.
//
// Transcript format:
// [USER] Some question or command, possibly spanning
// multiple continuation lines.
// [ASSISTANT] A response, also possibly
// spanning multiple lines.
// ...
//
// Rules:
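// - A line beginning "[USER]", "[ASSISTANT]" or "[CAPTION]" starts a new turn.
// - Any subsequent non-blank line that does not begin with one of those tags
// is a continuation of the previous turn (kept as a separate line).
// - A blank line inside a dialogue turn marks a paragraph break; blank lines
// are ignored inside caption turns and before the first turn.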
//
// Usage:
// ./TranscriptToVideo <transcript.txt> <output.avi>
//
// Typewriter speed : 50 chars/sec → 5 chars/frame at 10 fps
// Pause after turn : 0.3 s → 3 frames
// Word-wrap column : 60
#include <vtkActor.h>
#include <vtkActor2D.h>
#include <vtkCellArray.h>
#include <vtkFFMPEGWriter.h>
#include <vtkNamedColors.h>
#include <vtkNew.h>
#include <vtkPoints.h>
#include <vtkPolyData.h>
#include <vtkPolyDataMapper2D.h>
#include <vtkProperty2D.h>
#include <vtkRenderWindow.h>
#include <vtkRenderer.h>
#include <vtkTextActor.h>
#include <vtkTextProperty.h>
#include <vtkWindowToImageFilter.h>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
// ---------------------------------------------------------------------------
// Constants
// ---------------------------------------------------------------------------
static const int WIDTH = 1280;
static const int HEIGHT = 720;
static const int FPS = 10;
static const int CHARS_PER_FRAME = 5; // 50 chars/sec at 10 fps
static const int PAUSE_FRAMES = 3; // 0.3 s
static const int WRAP_COLS = 60;
static const int MAX_HISTORY = 18; // visible completed lines
static const int FONT_SIZE = 18;
static const int TITLE_SIZE = 23;
static const int LINE_HEIGHT = 28; // pixels between lines
static const int MARGIN_LEFT = 48;
static const int MARGIN_TOP = 58; // below title bar
// Colours (R, G, B in [0, 1])
static const double COL_BG[3] = { 0.04, 0.04, 0.10 }; // near-black navy
static const double COL_USER[3] = { 1.00, 0.84, 0.00 }; // gold
static const double COL_CLAUDE[3] = { 0.68, 0.85, 0.90 }; // light steel blue
static const double COL_TITLE[3] = { 1.00, 1.00, 1.00 }; // white
static const double COL_BAR[3] = { 0.30, 0.55, 0.80 }; // progress bar blue
static const double COL_LABEL[3] = { 0.65, 0.65, 0.65 }; // dim grey for speaker tag
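// Illustrative sketch (not part of the source): how the timing constants
// above would translate into a frame budget for one turn, assuming the
// typewriter reveals charsPerFrame characters per frame and then holds for
// pauseFrames. The rendering loop is not shown in this excerpt, so this
// formula is an assumption, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Frames needed to type nChars characters, plus the pause after the turn.
long framesForTurn(std::size_t nChars, int charsPerFrame, int pauseFrames)
{
  assert(charsPerFrame > 0 && pauseFrames >= 0);
  long typing = static_cast<long>((nChars + charsPerFrame - 1) / charsPerFrame);
  return typing + pauseFrames;   // round up: a partial group still costs a frame
}

// Wall-clock length of that turn at the given frame rate.
double secondsForTurn(std::size_t nChars, int charsPerFrame,
                      int pauseFrames, int fps)
{
  assert(fps > 0);
  return static_cast<double>(framesForTurn(nChars, charsPerFrame, pauseFrames)) / fps;
}
```

// Under these assumptions a 12-character turn at 5 chars/frame with a 3-frame
// pause takes 6 frames, i.e. 0.6 s at 10 fps.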
// ---------------------------------------------------------------------------
// Structs
// ---------------------------------------------------------------------------
enum class Speaker { User, Claude, Caption };
struct Turn {
Speaker speaker;
std::string text; // full unwrapped text
};
// A rendered display line (already word-wrapped, tagged with speaker)
struct DisplayLine {
Speaker speaker;
std::string prefix; // "[USER] " or "[ASSISTANT] "
std::string body; // wrapped segment
bool isFirst; // first line of this turn (prefix printed)
bool isBlank; // spacer between turns (no text rendered)
bool isCaption; // caption update — body holds text (empty = clear)
DisplayLine() : speaker(Speaker::User), isFirst(false), isBlank(false), isCaption(false) {}
};
// ---------------------------------------------------------------------------
// Replace common multi-byte UTF-8 sequences with ASCII equivalents so that
// VTK's font renderer (which only handles ASCII reliably) does not crash.
// Unrecognised 0xE2-led sequences become '~'; any other non-ASCII sequence
// is replaced with '?'.
// ---------------------------------------------------------------------------
static std::string SanitiseASCII(const std::string& s)
{
std::string out;
out.reserve(s.size());
const unsigned char* p = (const unsigned char*)s.data();
const unsigned char* end = p + s.size();
while (p < end) {
unsigned char c = *p;
if (c < 0x80) {
out += (char)c;
++p;
} else if (c == 0xE2 && (p + 2) < end) {
// 3-byte sequence starting 0xE2
unsigned char b1 = *(p+1), b2 = *(p+2);
// U+2018 ' U+2019 ' (0xE2 0x80 0x98 / 0x99)
if (b1 == 0x80 && (b2 == 0x98 || b2 == 0x99)) { out += '\''; p += 3; }
// U+201C " U+201D " (0xE2 0x80 0x9C / 0x9D)
else if (b1 == 0x80 && (b2 == 0x9C || b2 == 0x9D)) { out += '"'; p += 3; }
// U+2013 en-dash (0xE2 0x80 0x93)
else if (b1 == 0x80 && b2 == 0x93) { out += '-'; p += 3; }
// U+2014 em-dash (0xE2 0x80 0x94)
else if (b1 == 0x80 && b2 == 0x94) { out += '-'; p += 3; }
// U+2026 ellipsis (0xE2 0x80 0xA6)
else if (b1 == 0x80 && b2 == 0xA6) { out += "..."; p += 3; }
// U+2192 arrow (0xE2 0x86 0x92)
else if (b1 == 0x86 && b2 == 0x92) { out += "->"; p += 3; }
// Any other 0xE2-led sequence (e.g. U+22xx math operators): replace with '~'.
// (U+00D7, the multiplication sign, is 0xC3 0x97 and is handled below.)
else { out += '~'; p += 3; }
} else if (c == 0xC3 && (p + 1) < end) {
// 2-byte latin-1 supplement
unsigned char b1 = *(p+1);
if (b1 == 0x97) { out += 'x'; p += 2; } // U+00D7 ×
else if (b1 == 0xB7) { out += '/'; p += 2; } // U+00F7 ÷
else { out += '?'; p += 2; }
} else {
// Unknown multi-byte: skip the whole sequence
out += '?';
++p;
while (p < end && (*p & 0xC0) == 0x80) ++p; // skip continuation bytes
}
}
return out;
}
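// Illustrative sketch (not part of the source): decoding a 3-byte UTF-8
// sequence by hand, to show where the byte triples matched in SanitiseASCII
// come from. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// A 3-byte sequence is 1110xxxx 10yyyyyy 10zzzzzz and encodes the code point
// xxxx yyyyyy zzzzzz (16 bits).
std::uint32_t decodeUtf8Three(unsigned char b0, unsigned char b1, unsigned char b2)
{
  assert((b0 & 0xF0) == 0xE0);                            // lead byte
  assert((b1 & 0xC0) == 0x80 && (b2 & 0xC0) == 0x80);     // continuation bytes
  return (static_cast<std::uint32_t>(b0 & 0x0F) << 12)
       | (static_cast<std::uint32_t>(b1 & 0x3F) << 6)
       |  static_cast<std::uint32_t>(b2 & 0x3F);
}
```

// For instance 0xE2 0x86 0x92 decodes to U+2192 (right arrow) and
// 0xE2 0x80 0xA6 to U+2026 (ellipsis), matching the cases handled above.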
// ---------------------------------------------------------------------------
// Strip markdown emphasis markers (* and `) so they don't appear in the video.
// ---------------------------------------------------------------------------
static std::string StripMarkdown(const std::string& s)
{
std::string out;
out.reserve(s.size());
for (char c : s) {
if (c == '*' || c == '`') continue;
out += c;
}
return out;
}
// ---------------------------------------------------------------------------
// Word-wrap: splits `text` into lines of at most maxCols characters.
// A single word longer than maxCols is kept whole on its own line.
// ---------------------------------------------------------------------------
static std::vector<std::string> WrapText(const std::string& text, int maxCols)
{
std::vector<std::string> lines;
std::istringstream words(text);
std::string word, line;
while (words >> word) {
if (!line.empty() && (int)(line.size() + 1 + word.size()) > maxCols) {
lines.push_back(line);
line = word;
} else {
if (!line.empty()) line += ' ';
line += word;
}
}
if (!line.empty()) lines.push_back(line);
if (lines.empty()) lines.push_back("");
return lines;
}
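// Illustrative sketch (not part of the source): a standalone restatement of
// the greedy wrap above under the hypothetical name wrapGreedy, so that its
// behaviour on short input and on an overlong word can be checked in isolation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Greedy word wrap: append words to the current line until the next word
// would push it past maxCols, then start a new line.
std::vector<std::string> wrapGreedy(const std::string& text, int maxCols)
{
  assert(maxCols > 0);
  std::vector<std::string> lines;
  std::istringstream words(text);
  std::string word, line;
  while (words >> word) {
    if (!line.empty() && static_cast<int>(line.size() + 1 + word.size()) > maxCols) {
      lines.push_back(line);
      line = word;
    } else {
      if (!line.empty()) line += ' ';
      line += word;
    }
  }
  if (!line.empty()) lines.push_back(line);
  if (lines.empty()) lines.push_back("");   // always return at least one line
  return lines;
}
```

// Wrapping "the quick brown fox" at 10 columns gives two lines of nine
// characters each; a word longer than the limit is emitted unbroken.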
// ---------------------------------------------------------------------------
// Parse transcript file into list of Turns.
// ---------------------------------------------------------------------------
static std::vector<Turn> ParseTranscript(const std::string& path)
{
std::ifstream f(path);
if (!f) {
std::cerr << "Cannot open transcript: " << path << "\n";
std::exit(1);
}
std::vector<Turn> turns;
std::string line;
while (std::getline(f, line)) {
// Trim trailing CR (Windows files)
if (!line.empty() && line.back() == '\r') line.pop_back();
if (line.empty()) {
// Blank line within a dialogue turn = paragraph break.
// Caption turns don't support paragraph breaks.
if (!turns.empty() && turns.back().speaker != Speaker::Caption) {
turns.back().text += '\n'; // '\n' sentinel: expands to blank spacer
}
continue;
}
if (line.size() >= 6 && line.substr(0, 6) == "[USER]") {
Turn t;
t.speaker = Speaker::User;
t.text = line.substr(6);
while (!t.text.empty() && t.text.front() == ' ') t.text.erase(t.text.begin());
t.text = SanitiseASCII(t.text);
turns.push_back(std::move(t));
} else if (line.size() >= 11 && line.substr(0, 11) == "[ASSISTANT]") {
Turn t;
t.speaker = Speaker::Claude;
t.text = line.substr(11);
while (!t.text.empty() && t.text.front() == ' ') t.text.erase(t.text.begin());
t.text = SanitiseASCII(t.text);
turns.push_back(std::move(t));
} else if (line.size() >= 9 && line.substr(0, 9) == "[CAPTION]") {
Turn t;
t.speaker = Speaker::Caption;
t.text = line.substr(9);
while (!t.text.empty() && t.text.front() == ' ') t.text.erase(t.text.begin());
t.text = SanitiseASCII(t.text);
turns.push_back(std::move(t));
} else if (!turns.empty()) {
// Continuation line — strip leading whitespace, preserve as separate line
size_t start = line.find_first_not_of(" \t");
if (start != std::string::npos) {
if (!turns.back().text.empty()) turns.back().text += '\n';
turns.back().text += SanitiseASCII(line.substr(start));
}
}
}
return turns;
}
// ---------------------------------------------------------------------------
// Expand all Turns into DisplayLines (word-wrapped).
// ---------------------------------------------------------------------------
static std::vector<DisplayLine> ExpandToDisplayLines(const std::vector<Turn>& turns)
{
std::vector<DisplayLine> out;
// Speaker prefixes; they differ in length, so bodyWidth is computed per turn below
const std::string userPfx = "[USER] ";
const std::string claudePfx = "[ASSISTANT] ";
// Track previous non-caption speaker to know when to insert blank spacers
Speaker prevSpeaker = Speaker::Caption; // sentinel: no spacer before first real turn
for (size_t ti = 0; ti < turns.size(); ++ti) {
const Turn& t = turns[ti];
// Caption turn: emit one special DisplayLine, no spacer, no history entry
if (t.speaker == Speaker::Caption) {
DisplayLine dl;
dl.isCaption = true;
dl.speaker = Speaker::Caption;
// body: whitespace-only → clear; otherwise wrap lines joined with \n
std::string trimmed = t.text;
size_t first = trimmed.find_first_not_of(" \t\r\n");
if (first == std::string::npos) {
dl.body = ""; // signal to clear caption
} else {
// Wrap to ~90 cols for the wider caption zone
auto lines = WrapText(trimmed, 90);
for (size_t i = 0; i < lines.size(); ++i) {
if (i > 0) dl.body += "\n";
dl.body += lines[i];
}
}
out.push_back(dl);
continue;
}
// Insert a blank spacer before each new dialogue turn (not before the first)
if (prevSpeaker != Speaker::Caption) {
DisplayLine blank;
blank.isBlank = true;
blank.speaker = t.speaker;
out.push_back(blank);
}
prevSpeaker = t.speaker;
const std::string& pfx = (t.speaker == Speaker::User) ? userPfx : claudePfx;
int bodyWidth = WRAP_COLS - (int)pfx.size();
if (bodyWidth < 20) bodyWidth = 20;
// Split the turn text on '\n' to get individual source lines.
// Empty source lines become blank spacers (paragraph breaks within a turn).
// Non-empty source lines are stripped of markdown markers and word-wrapped.
std::vector<std::string> srcLines;
{
std::string seg;
for (char ch : t.text) {
if (ch == '\n') { srcLines.push_back(seg); seg.clear(); }
else seg += ch;
}
srcLines.push_back(seg);
}
bool firstOfTurn = true;
for (const auto& srcLine : srcLines) {
// Strip markdown emphasis markers
std::string stripped = StripMarkdown(srcLine);
// Trim leading/trailing whitespace
size_t f = stripped.find_first_not_of(" \t");
if (f == std::string::npos) {
// Blank — paragraph break spacer within the turn
DisplayLine blank;
blank.isBlank = true;
blank.speaker = t.speaker;
out.push_back(blank);
continue;
}
size_t l = stripped.find_last_not_of(" \t");
stripped = stripped.substr(f, l - f + 1);
auto wrapped = WrapText(stripped, bodyWidth);
for (size_t i = 0; i < wrapped.size(); ++i) {
DisplayLine dl;
dl.speaker = t.speaker;
dl.prefix = pfx;
dl.body = wrapped[i];
dl.isFirst = firstOfTurn && (i == 0);
out.push_back(dl);
}
firstOfTurn = false;
}
}
return out;
}
// ---------------------------------------------------------------------------
// Helper: configure an actor's font, size and colour (Courier, left/bottom
// justified). The caller sets the text and the display position.
// ---------------------------------------------------------------------------
static void ConfigureTextActor(vtkTextActor* a, int fontSize,
double r, double g, double b)
{
a->GetTextProperty()->SetFontFamilyToCourier();
a->GetTextProperty()->SetFontSize(fontSize);
a->GetTextProperty()->SetColor(r, g, b);
a->GetTextProperty()->SetBold(0);
a->GetTextProperty()->SetItalic(0);
a->GetTextProperty()->ShadowOff();
a->GetTextProperty()->SetJustificationToLeft();
a->GetTextProperty()->SetVerticalJustificationToBottom();
}
// ---------------------------------------------------------------------------
// Create a thin horizontal progress bar actor (2D polygon).
// Returns the actor; caller adds to renderer.
// ---------------------------------------------------------------------------
static vtkActor2D* MakeProgressBar(vtkPolyData*& pd, vtkPoints*& pts)
{
pts = vtkPoints::New();
vtkCellArray* cells = vtkCellArray::New();
// 4 points, updated every frame
pts->SetNumberOfPoints(4);
pts->SetPoint(0, 0, HEIGHT - 8, 0);
pts->SetPoint(1, 0, HEIGHT - 2, 0);
pts->SetPoint(2, 100, HEIGHT - 2, 0);
pts->SetPoint(3, 100, HEIGHT - 8, 0);
vtkIdType quad[4] = { 0, 1, 2, 3 };
cells->InsertNextCell(4, quad);
pd = vtkPolyData::New();
pd->SetPoints(pts);
pd->SetPolys(cells);
cells->Delete();
vtkPolyDataMapper2D* mapper = vtkPolyDataMapper2D::New();
mapper->SetInputData(pd);
vtkActor2D* actor = vtkActor2D::New();
actor->SetMapper(mapper);
actor->GetProperty()->SetColor(COL_BAR[0], COL_BAR[1], COL_BAR[2]);
mapper->Delete();
return actor;
}
// ---------------------------------------------------------------------------
// Main
// ---------------------------------------------------------------------------
int main(int argc, char* argv[])
{
if (argc < 3) {
std::cerr << "Usage: TranscriptToVideo <transcript.txt> <output.avi>\n";
return 1;
}
const std::string transcriptPath = argv[1];
const std::string outputPath = argv[2];
// -----------------------------------------------------------------------
// Parse & expand
// -----------------------------------------------------------------------
auto turns = ParseTranscript(transcriptPath);
auto displayLines = ExpandToDisplayLines(turns);
const int totalLines = (int)displayLines.size();
if (totalLines == 0) {
std::cerr << "Transcript has no parseable turns.\n";
return 1;
}
// Total frames, used as the progress-bar denominator.
// A caption costs one frame, a blank spacer none, and a typed line costs
// ceil(chars / CHARS_PER_FRAME) + 1 typewriter frames plus PAUSE_FRAMES.
int totalFrames = 0;
for (const auto& dl : displayLines) {
if (dl.isCaption) { ++totalFrames; continue; }
if (dl.isBlank) continue;
int chars = (int)dl.body.size();
totalFrames += (chars + CHARS_PER_FRAME - 1) / CHARS_PER_FRAME + 1 + PAUSE_FRAMES;
}
// -----------------------------------------------------------------------
// VTK Setup
// -----------------------------------------------------------------------
vtkNew<vtkRenderer> ren;
vtkNew<vtkRenderWindow> renWin;
ren->SetBackground(COL_BG[0], COL_BG[1], COL_BG[2]);
renWin->AddRenderer(ren);
renWin->SetSize(WIDTH, HEIGHT);
renWin->SetOffScreenRendering(1);
renWin->SetMultiSamples(0);
// -----------------------------------------------------------------------
// Title actor
// -----------------------------------------------------------------------
vtkNew<vtkTextActor> titleActor;
titleActor->SetInput("Transcript");
ConfigureTextActor(titleActor, TITLE_SIZE,
COL_TITLE[0], COL_TITLE[1], COL_TITLE[2]);
titleActor->GetTextProperty()->SetBold(1);
titleActor->GetTextProperty()->SetJustificationToCentered();
titleActor->SetDisplayPosition(WIDTH / 2, HEIGHT - 36);
ren->AddActor2D(titleActor);
// Thin separator line below title (drawn as a narrow quad)
{
vtkPoints* lpts = vtkPoints::New();
vtkCellArray* lcell = vtkCellArray::New();
lpts->InsertNextPoint(0, HEIGHT - 44, 0);
lpts->InsertNextPoint(WIDTH, HEIGHT - 44, 0);
lpts->InsertNextPoint(WIDTH, HEIGHT - 42, 0);
lpts->InsertNextPoint(0, HEIGHT - 42, 0);
vtkIdType q[4] = {0,1,2,3};
lcell->InsertNextCell(4, q);
vtkPolyData* lpd = vtkPolyData::New();
lpd->SetPoints(lpts);
lpd->SetPolys(lcell);
vtkPolyDataMapper2D* lmap = vtkPolyDataMapper2D::New();
lmap->SetInputData(lpd);
vtkActor2D* lineActor = vtkActor2D::New();
lineActor->SetMapper(lmap);
lineActor->GetProperty()->SetColor(0.3, 0.3, 0.5);
ren->AddActor2D(lineActor);
lpts->Delete(); lcell->Delete(); lpd->Delete();
lmap->Delete(); lineActor->Delete();
}
// -----------------------------------------------------------------------
// History text actors (MAX_HISTORY lines, reused with shifting content)
// -----------------------------------------------------------------------
std::vector<vtkTextActor*> histActors(MAX_HISTORY);
for (int i = 0; i < MAX_HISTORY; ++i) {
histActors[i] = vtkTextActor::New();
histActors[i]->SetInput("");
ConfigureTextActor(histActors[i], FONT_SIZE, 0.5, 0.5, 0.5);
// Position: bottom of history area = just above current-line area
int y = MARGIN_TOP + (MAX_HISTORY - 1 - i) * LINE_HEIGHT;
histActors[i]->SetDisplayPosition(MARGIN_LEFT, HEIGHT - y);
ren->AddActor2D(histActors[i]);
}
// Current (actively typing) line actor — two actors: prefix + body
vtkNew<vtkTextActor> curPfxActor; // speaker tag, e.g. "[USER] "; colour set per line
vtkNew<vtkTextActor> curBodyActor; // body text in vivid colour
vtkNew<vtkTextActor> cursorActor; // blinking block
ConfigureTextActor(curPfxActor, FONT_SIZE, COL_LABEL[0], COL_LABEL[1], COL_LABEL[2]);
ConfigureTextActor(curBodyActor, FONT_SIZE, 1, 1, 1); // will be overridden per turn
ConfigureTextActor(cursorActor, FONT_SIZE, 1, 1, 1);
cursorActor->SetInput("|");
int curLineY = HEIGHT - (MARGIN_TOP + MAX_HISTORY * LINE_HEIGHT);
// Place current line below history
curPfxActor->SetDisplayPosition(MARGIN_LEFT, curLineY);
ren->AddActor2D(curPfxActor);
ren->AddActor2D(curBodyActor);
ren->AddActor2D(cursorActor);
// -----------------------------------------------------------------------
// Progress bar
// -----------------------------------------------------------------------
vtkPolyData* barPD = nullptr;
vtkPoints* barPts = nullptr;
vtkActor2D* barActor = MakeProgressBar(barPD, barPts);
ren->AddActor2D(barActor);
// Progress label
vtkNew<vtkTextActor> progLabelActor;
progLabelActor->SetInput("");
ConfigureTextActor(progLabelActor, 13,
COL_LABEL[0], COL_LABEL[1], COL_LABEL[2]);
progLabelActor->SetDisplayPosition(MARGIN_LEFT, 12);
ren->AddActor2D(progLabelActor);
// -----------------------------------------------------------------------
// Caption actor — bottom-centre, Arial, white, initially hidden
// -----------------------------------------------------------------------
vtkNew<vtkTextActor> captionActor;
captionActor->SetInput("");
captionActor->GetTextProperty()->SetFontFamilyToArial();
captionActor->GetTextProperty()->SetFontSize(20);
captionActor->GetTextProperty()->SetColor(1.0, 1.0, 1.0);
captionActor->GetTextProperty()->SetBold(0);
captionActor->GetTextProperty()->SetItalic(1);
captionActor->GetTextProperty()->ShadowOn();
captionActor->GetTextProperty()->SetShadowOffset(1, -1);
captionActor->GetTextProperty()->SetJustificationToCentered();
captionActor->GetTextProperty()->SetVerticalJustificationToBottom();
// Position: centred horizontally, in the gap between typing line and progress label
captionActor->SetDisplayPosition(WIDTH / 2, 32);
ren->AddActor2D(captionActor);
// -----------------------------------------------------------------------
// FFMPEG writer
// -----------------------------------------------------------------------
vtkNew<vtkWindowToImageFilter> w2i;
w2i->SetInput(renWin);
w2i->SetScale(1);
w2i->ReadFrontBufferOff();
vtkNew<vtkFFMPEGWriter> writer;
writer->SetInputConnection(w2i->GetOutputPort());
writer->SetFileName(outputPath.c_str());
writer->SetRate(FPS);
writer->SetBitRate(4000);
writer->SetBitRateTolerance(400);
writer->Start();
// -----------------------------------------------------------------------
// Helper: measure pixel width of a string in the current font
// (approximate: Courier is monospace, so width ≈ chars × charWidth)
// At font size 17 in Courier, one character ≈ 10.2px wide.
// -----------------------------------------------------------------------
const double CHAR_PX = 10.2;
auto bodyX = [&](const std::string& pfx) -> int {
return MARGIN_LEFT + (int)(pfx.size() * CHAR_PX);
};
// -----------------------------------------------------------------------
// Render one frame
// -----------------------------------------------------------------------
int frameCount = 0;
auto renderFrame = [&]() {
renWin->Render();
w2i->Modified();
writer->Write();
++frameCount;
};
// -----------------------------------------------------------------------
// History ring — holds completed display lines (most-recent last)
// -----------------------------------------------------------------------
std::vector<DisplayLine> history;
auto refreshHistory = [&]() {
// slot 0 = bottom row (newest), slot MAX_HISTORY-1 = top row (oldest).
// Brightness fades linearly from 0.85 (slot 0) to 0.20 (slot MAX-1).
// Blank spacer lines show as empty strings.
// The first line of a turn, which carries the [USER]/[ASSISTANT] prefix,
// is drawn 0.25 brighter (capped at 1.0) so the speaker stays identifiable.
int n = (int)history.size();
for (int slot = 0; slot < MAX_HISTORY; ++slot) {
int idx = n - 1 - slot; // slot 0 → newest, slot MAX-1 → oldest
if (idx < 0) {
histActors[slot]->SetInput("");
continue;
}
const auto& hl = history[idx];
if (hl.isBlank) {
histActors[slot]->SetInput("");
continue;
}
// Graduated brightness: bright near bottom, dim near top
double bodyBright = 0.20 + 0.65 * (1.0 - (double)slot / (MAX_HISTORY - 1));
const double* col = (hl.speaker == Speaker::User) ? COL_USER : COL_CLAUDE;
if (hl.isFirst) {
// vtkTextActor is single-colour, so prefix and body are drawn as one string.
// As a compromise the whole line gets a brightness boost of 0.25 (capped
// at 1.0) so the speaker tag stays readable.
double pfxBright = std::min(1.0, bodyBright + 0.25);
std::string txt = hl.prefix + hl.body;
histActors[slot]->SetInput(txt.c_str());
histActors[slot]->GetTextProperty()->SetColor(
col[0] * pfxBright, col[1] * pfxBright, col[2] * pfxBright);
} else {
std::string txt = std::string(hl.prefix.size(), ' ') + hl.body;
histActors[slot]->SetInput(txt.c_str());
histActors[slot]->GetTextProperty()->SetColor(
col[0] * bodyBright, col[1] * bodyBright, col[2] * bodyBright);
}
}
};
// -----------------------------------------------------------------------
// Main animation loop
// -----------------------------------------------------------------------
// Count total turns for the progress label (one isFirst line per non-empty turn)
int totalTurns = 0;
for (const auto& dl : displayLines) if (dl.isFirst) ++totalTurns;
int turnsSeen = 0;
for (int li = 0; li < totalLines; ++li) {
const DisplayLine& dl = displayLines[li];
// Caption update: swap the bottom caption, no history, no typewriter
if (dl.isCaption) {
captionActor->SetInput(dl.body.c_str());
renderFrame();
continue;
}
// Blank spacer: push to history silently with no typewriter frames
if (dl.isBlank) {
history.push_back(dl);
refreshHistory();
continue;
}
// Vivid colour for current speaker
double cr, cg, cb;
if (dl.speaker == Speaker::User) {
cr = COL_USER[0]; cg = COL_USER[1]; cb = COL_USER[2];
} else {
cr = COL_CLAUDE[0]; cg = COL_CLAUDE[1]; cb = COL_CLAUDE[2];
}
curBodyActor->GetTextProperty()->SetColor(cr, cg, cb);
cursorActor->GetTextProperty()->SetColor(cr, cg, cb);
// Prefix: vivid speaker colour for first line, indent for continuation
if (dl.isFirst) {
curPfxActor->SetInput(dl.prefix.c_str());
curPfxActor->GetTextProperty()->SetColor(cr, cg, cb);
} else {
curPfxActor->SetInput(std::string(dl.prefix.size(), ' ').c_str());
curPfxActor->GetTextProperty()->SetColor(0, 0, 0);
}
int bx = bodyX(dl.prefix);
curBodyActor->SetDisplayPosition(bx, curLineY);
cursorActor->SetDisplayPosition(bx, curLineY);
if (dl.isFirst) ++turnsSeen;
// Update progress label
{
char buf[64];
std::snprintf(buf, sizeof(buf), "Turn %d / %d", turnsSeen, totalTurns);
progLabelActor->SetInput(buf);
}
// Update progress bar width
{
double frac = (totalFrames > 0) ? (double)frameCount / totalFrames : 0.0;
double barW = frac * (WIDTH - 2 * MARGIN_LEFT);
barPts->SetPoint(0, MARGIN_LEFT, HEIGHT - 8, 0);
barPts->SetPoint(1, MARGIN_LEFT, HEIGHT - 2, 0);
barPts->SetPoint(2, MARGIN_LEFT + barW, HEIGHT - 2, 0);
barPts->SetPoint(3, MARGIN_LEFT + barW, HEIGHT - 8, 0);
barPts->Modified();
barPD->Modified();
}
// Typewriter: reveal CHARS_PER_FRAME characters per rendered frame.
// Always render the fully-complete state last.
const std::string& body = dl.body;
int ci = 0;
while (true) {
curBodyActor->SetInput(body.substr(0, ci).c_str());
double cx = bx + ci * CHAR_PX;
cursorActor->SetDisplayPosition((int)cx, curLineY);
cursorActor->SetVisibility((frameCount % 2 == 0) ? 1 : 0);
renderFrame();
if (ci >= (int)body.size()) break;
ci = std::min(ci + CHARS_PER_FRAME, (int)body.size());
}
// Line is complete — move it to history
history.push_back(dl);
refreshHistory();
// Clear current line display
curPfxActor->SetInput("");
curBodyActor->SetInput("");
cursorActor->SetVisibility(0);
// Pause frames
for (int p = 0; p < PAUSE_FRAMES; ++p) {
// Update progress bar
{
double frac = (totalFrames > 0) ? (double)frameCount / totalFrames : 0.0;
double barW = frac * (WIDTH - 2 * MARGIN_LEFT);
barPts->SetPoint(0, MARGIN_LEFT, HEIGHT - 8, 0);
barPts->SetPoint(1, MARGIN_LEFT, HEIGHT - 2, 0);
barPts->SetPoint(2, MARGIN_LEFT + barW, HEIGHT - 2, 0);
barPts->SetPoint(3, MARGIN_LEFT + barW, HEIGHT - 8, 0);
barPts->Modified();
barPD->Modified();
}
renderFrame();
}
}
// -----------------------------------------------------------------------
// Finish
// -----------------------------------------------------------------------
writer->End();
// Tidy up manual ref-counted objects
barActor->Delete();
barPD->Delete();
barPts->Delete();
for (auto* a : histActors) a->Delete();
std::cerr << "Wrote " << frameCount << " frames ("
<< frameCount / FPS << " s) to " << outputPath << "\n";
return 0;
}
@@ -0,0 +1,729 @@
// Derived from VTK/Examples/Cxx/Medical2.cxx
// The example reads a volume dataset, extracts two isosurfaces that
// represent the skin and bone, and then displays them.
//
// Modified heavily by Peter Boyle to display lattice field theory data as movies and compare multiple files
#include <vtkActor.h>
#include <vtkCamera.h>
#include <vtkMetaImageReader.h>
#include <vtkNamedColors.h>
#include <vtkNew.h>
#include <vtkOutlineFilter.h>
#include <vtkPolyDataMapper.h>
#include <vtkProperty.h>
#include <vtkRenderWindow.h>
#include <vtkRenderWindowInteractor.h>
#include <vtkRenderer.h>
#include <vtkStripper.h>
#include <vtkImageData.h>
#include <vtkVersion.h>
#include <vtkCallbackCommand.h>
#include <vtkTextActor.h>
#include <vtkTextProperty.h>
#define MPEG
#ifdef MPEG
#include <vtkFFMPEGWriter.h>
#endif
#include <vtkProperty2D.h>
#include <vtkSliderWidget.h>
#include <vtkSliderRepresentation2D.h>
#include <vtkWindowToImageFilter.h>
#include <array>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <Grid/Grid.h>
#define USE_FLYING_EDGES
#ifdef USE_FLYING_EDGES
#include <vtkFlyingEdges3D.h>
typedef vtkFlyingEdges3D isosurface;
#else
#include <vtkMarchingCubes.h>
typedef vtkMarchingCubes isosurface;
#endif
int mpeg = 0 ;
int framerate = 10;
template <class T> void readFile(T& out, std::string const fname){
Grid::emptyUserRecord record;
Grid::ScidacReader RD;
RD.open(fname);
RD.readScidacFieldRecord(out,record);
RD.close();
}
using namespace Grid;
class FrameUpdater : public vtkCallbackCommand
{
public:
FrameUpdater() {
ffile=0;
TimerCount = 0;
xoff = 0;
t = 0;
imageData = nullptr;
timerId = 0;
maxCount = -1;
old_file=-1;
}
static FrameUpdater* New()
{
FrameUpdater* cb = new FrameUpdater;
cb->TimerCount = 0;
return cb;
}
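//
// Maps an (x,y,z) display site plus a frame index onto
//   i) a d-dimensional lattice site Coordinate
//  ii) a file name
// using three sets of ranges:
//   xyz_ranges  -- extents of the three displayed dimensions
//   loop_ranges -- extents of the dimensions stepped through as movie time
//   sum_ranges  -- extents of the dimensions summed over
// Indices are converted to coordinates with Lexicographic::CoorFromIndex.
//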
int old_file ; // Cache, avoid reread
Coordinate latt;
Coordinate xyz_dims ; // Lattice dimensions mapped to the displayed x,y,z axes
Coordinate xyz_ranges ; // 3-vector
Coordinate g_xyz_ranges; // Nd-vector
uint64_t xyz_vol ;
Coordinate loop_dims; // List lattice dimensions put into movie time
Coordinate loop_ranges; // movie time ranges
uint64_t loop_vol;
Coordinate sum_dims; // List lattice dimensions summed
Coordinate sum_ranges; // summation ranges
uint64_t sum_vol;
Coordinate slice_dims; // List slice dimensions
Coordinate Slice;
std::vector<std::string> files; // file list that is looped over
int Nd;
GridBase *grid;
Grid::LatticeComplexD *grid_data;
void SetGrid(GridBase *_grid)
{
grid = _grid;
Nd=grid->Nd();
latt = grid->GlobalDimensions();
grid_data = new Grid::LatticeComplexD(grid);
}
void SetFiles(std::vector<std::string> list) { files = list; old_file = -1; }
void SetSlice(Coordinate _Slice) { Slice = _Slice;} // Offset / skew for lattice coords
void SetSumDimensions (Coordinate _SumDims ) {
sum_ranges=Coordinate(Nd);
sum_dims = _SumDims; // 1 hot for dimensions summed
sum_vol = 1;
for(int d=0;d<sum_dims.size();d++){
if ( sum_dims[d] == 1 ) sum_ranges[d] = latt[d];
else sum_ranges[d] = 1;
sum_vol*=sum_ranges[d];
}
}
void SetLoopDimensions(Coordinate _LoopDims) {
loop_ranges=Coordinate(Nd);
loop_dims= _LoopDims;
loop_vol = 1;
for(int d=0;d<loop_dims.size();d++){
if ( loop_dims[d] == 1 ) loop_ranges[d] = latt[d];
else loop_ranges[d] = 1;
loop_vol*=loop_ranges[d];
}
} //
void SetDisplayDimensions(Coordinate _xyz_dims ) {
g_xyz_ranges=Coordinate(Nd);
xyz_ranges=Coordinate(3);
xyz_dims = _xyz_dims;
xyz_vol = 1;
for(int d=0;d<3;d++){
xyz_ranges[d] = latt[xyz_dims[d]];
xyz_vol *= xyz_ranges[d];
}
// Extent of each lattice dimension: full size if displayed, else 1
for(int d=0;d<Nd;d++){
g_xyz_ranges[d] = 1;
for(int dd=0;dd<3;dd++) {
if ( xyz_dims[dd]==d ) {
g_xyz_ranges[d] = latt[d];
}
}
}
}
void SetSliceDimensions(void) {
Coordinate _slice_dims;
for ( int d=0;d<Nd;d++){
int is_slice = 1;
if(g_xyz_ranges[d]>1) is_slice = 0;
if(loop_dims[d]) is_slice = 0;
if(sum_dims[d] ) is_slice = 0;
if(is_slice) _slice_dims.push_back(d);
}
slice_dims = _slice_dims;
std::cout << " Setting Slice Dimensions to "<<slice_dims<<std::endl;
}
virtual void Execute(vtkObject* caller, unsigned long eventId,void* vtkNotUsed(callData))
{
if ( vtkCommand::KeyPressEvent == eventId ) {
vtkRenderWindowInteractor* iren = static_cast<vtkRenderWindowInteractor*>(caller);
std::string key = iren->GetKeySym();
std::cout << "Pressed: " << key << std::endl;
if (slice_dims.size()>0) {
int vert = slice_dims[slice_dims.size()-1];
int horz = slice_dims[0];
if ( key == "Up" ) {
Slice[vert] = (Slice[vert]+1)%latt[vert];
}
if ( key == "Down" ) {
Slice[vert] = (Slice[vert]+latt[vert]-1)%latt[vert];
}
if ( key == "Right" ) {
Slice[horz] = (Slice[horz]+1)%latt[horz];
}
if ( key == "Left" ) {
Slice[horz] = (Slice[horz]+latt[horz]-1)%latt[horz];
}
}
if ( key == "greater" ) {
ffile = (ffile + 1) % files.size();
}
if ( key == "less" ) {
ffile = (ffile - 1 + files.size()) % files.size();
}
std::cout <<"Slice " <<Slice <<std::endl;
std::cout <<"File " <<ffile <<std::endl;
}
// Make a new frame for frame index TimerCount
if ( vtkCommand::TimerEvent == eventId || vtkCommand::KeyPressEvent == eventId)
{
int file = ((this->TimerCount / loop_vol) + ffile )%files.size();
if ( file != old_file ) {
readFile(*grid_data,files[file]);
old_file = file;
}
RealD max, min, max_abs,min_abs;
Coordinate max_site;
Coordinate min_site;
Coordinate max_abs_site;
Coordinate min_abs_site;
for(int idx=0;idx<grid->gSites();idx++){
Coordinate site;
Lexicographic::CoorFromIndex (site,idx,latt);
RealD val=real(peekSite(*grid_data,site));
if (idx==0){
max = min = val;
max_abs = min_abs = fabs(val);
max_site = site;
min_site = site;
min_abs_site = site;
max_abs_site = site;
} else {
if ( val > max ) {
max=val;
max_site = site;
}
if ( fabs(val) > max_abs ) {
max_abs=fabs(val);
max_abs_site = site;
}
if ( val < min ) {
min=val;
min_site = site;
}
if ( fabs(val) < min_abs ) {
min_abs=fabs(val);
min_abs_site = site;
}
}
}
std::cout << " abs_max "<<max_abs<<" at " << max_abs_site<<std::endl;
std::cout << " abs_min "<<min_abs<<" at " << min_abs_site<<std::endl;
std::cout << " max "<<max<<" at " << max_site<<std::endl;
std::cout << " min "<<min<<" at " << min_site<<std::endl;
// Looped dimensions, map index to coordinate
int loop_idx = this->TimerCount % loop_vol;
Coordinate loop_coor;
Lexicographic::CoorFromIndex (loop_coor,loop_idx,loop_ranges);
// Loop over xyz sites
Coordinate xyz_coor(3);
Coordinate g_xyz_coor(Nd);
Coordinate sum_coor(Nd);
for(uint64_t xyz = 0 ; xyz< xyz_vol; xyz++){
Lexicographic::CoorFromIndex (xyz_coor,xyz,xyz_ranges);
Lexicographic::CoorFromIndex (g_xyz_coor,xyz,g_xyz_ranges);
RealD sum_value = 0.0;
for(uint64_t sum_idx = 0 ; sum_idx< sum_vol; sum_idx++){
Lexicographic::CoorFromIndex (sum_coor,sum_idx,sum_ranges);
Coordinate site(Nd);
for(int d=0;d<Nd;d++){
site[d] = (sum_coor[d] + loop_coor[d] + g_xyz_coor[d] + Slice[d])%latt[d];
}
sum_value+= real(peekSite(*grid_data,site));
if(xyz==0) std::cout << "sum "<<sum_idx<<" "<<sum_value<<std::endl;
}
imageData->SetScalarComponentFromDouble(xyz_coor[0],xyz_coor[1],xyz_coor[2],0,sum_value);
}
imageData->Modified();
std::stringstream ss;
ss<< files[file] <<"\nSlice "<<Slice << "\nLoop " <<loop_coor<<"\nSummed "<<sum_dims;
text->SetInput(ss.str().c_str());
vtkRenderWindowInteractor* iren = vtkRenderWindowInteractor::SafeDownCast(caller);
if (iren) iren->GetRenderWindow()->Render();
}
if ( vtkCommand::TimerEvent == eventId ) {
++this->TimerCount;
std::cout << " This was a timer event count "<<this->TimerCount << std::endl;
}
if (this->TimerCount >= this->maxCount) {
vtkRenderWindowInteractor* iren = vtkRenderWindowInteractor::SafeDownCast(caller);
if (iren && this->timerId > -1)
{
iren->DestroyTimer(this->timerId);
}
}
}
private:
int TimerCount;
int ffile;
int xoff;
int t;
public:
vtkImageData* imageData = nullptr;
vtkTextActor* text = nullptr;
vtkFFMPEGWriter *writer = nullptr;
int timerId ;
int maxCount ;
double rms;
isosurface * posExtractor;
isosurface * negExtractor;
};
class SliderCallback : public vtkCommand
{
public:
static SliderCallback* New()
{
return new SliderCallback;
}
virtual void Execute(vtkObject* caller, unsigned long eventId, void* callData)
{
vtkSliderWidget *sliderWidget = vtkSliderWidget::SafeDownCast(caller);
if (sliderWidget)
{
contour = ((vtkSliderRepresentation *)sliderWidget->GetRepresentation())->GetValue();
}
fu->posExtractor->SetValue(0, SliderCallback::contour*fu->rms);
fu->negExtractor->SetValue(0, -SliderCallback::contour*fu->rms);
fu->posExtractor->Modified();
fu->negExtractor->Modified();
}
public:
static double contour;
FrameUpdater * fu;
};
FrameUpdater * KBfu;
void KeypressCallbackFunction(vtkObject* caller, long unsigned int eventId,
void* clientData, void* callData)
{
std::cout << "Keypress callback" << std::endl;
vtkRenderWindowInteractor* iren = static_cast<vtkRenderWindowInteractor*>(caller);
std::cout << "Pressed: " << iren->GetKeySym() << std::endl;
// imageData->Modified();
}
double SliderCallback::contour;
int main(int argc, char* argv[])
{
using namespace Grid;
Grid_init(&argc, &argv);
GridLogLayout();
auto latt_size = GridDefaultLatt();
auto simd_layout = GridDefaultSimd(latt_size.size(), vComplex::Nsimd());
auto mpi_layout = GridDefaultMpi();
GridCartesian Grid(latt_size, simd_layout, mpi_layout);
double default_contour = 1.0;
std::string arg;
std::cout << argc << " command Line arguments "<<std::endl;
for(int c=0;c<argc;c++) {
std::cout << " - "<<argv[c]<<std::endl;
}
std::vector<std::string> file_list({
"file1",
"file2",
"file3",
"file4",
"file5",
"file6",
"file7",
"file8"
});
if( GridCmdOptionExists(argv,argv+argc,"--files") ){
arg=GridCmdOptionPayload(argv,argv+argc,"--files");
GridCmdOptionCSL(arg, file_list);
}
#ifdef MPEG
if( GridCmdOptionExists(argv,argv+argc,"--mpeg") ){
mpeg = 1;
}
#endif
if( GridCmdOptionExists(argv,argv+argc,"--fps") ){
arg=GridCmdOptionPayload(argv,argv+argc,"--fps");
GridCmdOptionInt(arg,framerate);
}
if( GridCmdOptionExists(argv,argv+argc,"--isosurface") ){
arg=GridCmdOptionPayload(argv,argv+argc,"--isosurface");
GridCmdOptionFloat(arg,default_contour);
}
for(int c=0;c<file_list.size();c++) {
std::cout << " file: "<<file_list[c]<<std::endl;
}
int NoTime = 0;
int Nd = Grid.Nd();
Coordinate Slice(Nd,0);
Coordinate SumDims(Nd,0);
Coordinate LoopDims(Nd,0);
Coordinate XYZDims({0,1,2});
if( GridCmdOptionExists(argv,argv+argc,"--slice") ){
arg=GridCmdOptionPayload(argv,argv+argc,"--slice");
GridCmdOptionIntVector(arg,Slice);
}
if( GridCmdOptionExists(argv,argv+argc,"--sum") ){
arg=GridCmdOptionPayload(argv,argv+argc,"--sum");
GridCmdOptionIntVector(arg,SumDims);
}
if( GridCmdOptionExists(argv,argv+argc,"--loop") ){
arg=GridCmdOptionPayload(argv,argv+argc,"--loop");
GridCmdOptionIntVector(arg,LoopDims);
}
if( GridCmdOptionExists(argv,argv+argc,"--xyz") ){
arg=GridCmdOptionPayload(argv,argv+argc,"--xyz");
GridCmdOptionIntVector(arg,XYZDims);
std::cout << "xyz : "<<XYZDims<<std::endl;
}
if( GridCmdOptionExists(argv,argv+argc,"--notime") ){
NoTime = 1;
std::cout << "Suppressing time loop"<<std::endl;
}
// Common things:
vtkNew<vtkNamedColors> colors;
std::array<unsigned char, 4> posColor{{240, 184, 160, 255}}; colors->SetColor("posColor", posColor.data());
std::array<unsigned char, 4> bkg{{51, 77, 102, 255}}; colors->SetColor("BkgColor", bkg.data());
// Create the renderer, the render window, and the interactor. The renderer
// draws into the render window, the interactor enables mouse- and
// keyboard-based interaction with the data within the render window.
//
vtkNew<vtkRenderWindow> renWin;
vtkNew<vtkRenderWindowInteractor> iren;
iren->SetRenderWindow(renWin);
// std::vector<LatticeComplexD> data(file_list.size(),&Grid);
// FieldMetaData header;
int frameCount = file_list.size();
for(int d=0;d<Grid.Nd();d++) {
if ( LoopDims[d] ) frameCount*= latt_size[d];
}
// It is convenient to create an initial view of the data. The FocalPoint
// and Position form a vector direction. Later on (ResetCamera() method)
// this vector is used to position the camera to look at the data in
// this direction.
vtkNew<vtkCamera> aCamera;
aCamera->SetViewUp(0, 0, -1);
aCamera->SetPosition(0, -1000, 0);
aCamera->SetFocalPoint(0, 0, 0);
aCamera->ComputeViewPlaneNormal();
aCamera->Azimuth(30.0);
aCamera->Elevation(30.0);
vtkNew<vtkRenderer> aRenderer;
renWin->AddRenderer(aRenderer);
double vol = Grid.gSites();
std::cout << "Reading "<<file_list[0]<<std::endl;
double nrm, nrmbar,rms, contour;
{
LatticeComplexD data(&Grid);
readFile(data,file_list[0]);
nrm = norm2(data);
}
nrmbar = nrm/vol;
rms = sqrt(nrmbar);
contour = default_contour * rms; // default to 1 x RMS
// The volume is held in an in-memory vtkImageData, one double per displayed
// site. It is zero-filled here; FrameUpdater::Execute writes the summed
// field values into it for each frame.
vtkNew<vtkImageData> imageData;
// Size the image by the displayed dimensions (XYZDims, overridable with --xyz)
imageData->SetDimensions(latt_size[XYZDims[0]],latt_size[XYZDims[1]],latt_size[XYZDims[2]]);
imageData->AllocateScalars(VTK_DOUBLE, 1);
for(int xx=0;xx<latt_size[XYZDims[0]];xx++){
for(int yy=0;yy<latt_size[XYZDims[1]];yy++){
for(int zz=0;zz<latt_size[XYZDims[2]];zz++){
imageData->SetScalarComponentFromDouble(xx,yy,zz,0,0.0);
}}}
vtkNew<isosurface> posExtractor;
posExtractor->SetInputData(imageData);
posExtractor->SetValue(0, contour);
vtkNew<vtkStripper> posStripper;
posStripper->SetInputConnection(posExtractor->GetOutputPort());
vtkNew<vtkPolyDataMapper> posMapper;
posMapper->SetInputConnection(posStripper->GetOutputPort());
posMapper->ScalarVisibilityOff();
vtkNew<vtkActor> pos;
pos->SetMapper(posMapper);
pos->GetProperty()->SetDiffuseColor(colors->GetColor3d("posColor").GetData());
pos->GetProperty()->SetSpecular(0.3);
pos->GetProperty()->SetSpecularPower(20);
pos->GetProperty()->SetOpacity(0.5);
// The negative isosurface is extracted at -contour.
// The triangle stripper is used to create triangle strips from the
// isosurface; these render much faster on many systems.
vtkNew<isosurface> negExtractor;
negExtractor->SetInputData(imageData);
negExtractor->SetValue(0, -contour);
vtkNew<vtkStripper> negStripper;
negStripper->SetInputConnection(negExtractor->GetOutputPort());
vtkNew<vtkPolyDataMapper> negMapper;
negMapper->SetInputConnection(negStripper->GetOutputPort());
negMapper->ScalarVisibilityOff();
vtkNew<vtkActor> neg;
neg->SetMapper(negMapper);
neg->GetProperty()->SetDiffuseColor(colors->GetColor3d("Ivory").GetData());
// An outline provides context around the data.
vtkNew<vtkOutlineFilter> outlineData;
outlineData->SetInputData(imageData);
vtkNew<vtkPolyDataMapper> mapOutline;
mapOutline->SetInputConnection(outlineData->GetOutputPort());
vtkNew<vtkActor> outline;
outline->SetMapper(mapOutline);
outline->GetProperty()->SetColor(colors->GetColor3d("Black").GetData());
vtkNew<vtkTextActor> Text;
// Text->SetInput(file_list[f].c_str());
Text->SetPosition2(0,0);
Text->GetTextProperty()->SetFontSize(24);
Text->GetTextProperty()->SetColor(colors->GetColor3d("Gold").GetData());
vtkNew<vtkTextActor> TextT;
TextT->SetInput("T=0");
TextT->SetPosition(0,.7*1025);
TextT->GetTextProperty()->SetFontSize(24);
TextT->GetTextProperty()->SetColor(colors->GetColor3d("Gold").GetData());
// Actors are added to the renderer. An initial camera view is created.
// The Dolly() method moves the camera towards the FocalPoint,
// thereby enlarging the image.
// aRenderer->AddActor(Text);
aRenderer->AddActor(TextT);
aRenderer->AddActor(outline);
aRenderer->AddActor(pos);
aRenderer->AddActor(neg);
// Configure the FrameUpdater and register it for TimerEvent and KeyPressEvent
vtkNew<FrameUpdater> fu;
fu->SetGrid(&Grid);
fu->SetFiles(file_list);
fu->SetSlice(Slice);
fu->SetSumDimensions (SumDims);
fu->SetLoopDimensions(LoopDims);
fu->SetDisplayDimensions(XYZDims);
fu->SetSliceDimensions();
fu->imageData = imageData;
// fu->grid_data = &data[f];
fu->text = TextT;
fu->maxCount = frameCount;
fu->posExtractor = posExtractor;
fu->negExtractor = negExtractor;
fu->rms = rms;
iren->AddObserver(vtkCommand::TimerEvent, fu);
iren->AddObserver(vtkCommand::KeyPressEvent, fu);
aRenderer->SetActiveCamera(aCamera);
aRenderer->ResetCamera();
aRenderer->SetBackground(colors->GetColor3d("BkgColor").GetData());
aCamera->Dolly(1.0);
// double nf = file_list.size();
// std::cout << " Adding renderer " <<f<<" of "<<nf<<std::endl;
aRenderer->SetViewport(0.0, 0.0,1.0 , 1.0);
// Note that when camera movement occurs (as it does in the Dolly()
// method), the clipping planes often need adjusting. Clipping planes
// consist of two planes: near and far along the view direction. The
// near plane clips out objects in front of the plane; the far plane
// clips out objects behind the plane. This way only what is drawn
// between the planes is actually rendered.
aRenderer->ResetCameraClippingRange();
// Set the size of the render window (expressed in pixels) and its
// title, then render an initial frame. The background color was set
// above; the event loop is initialized below and started later.
renWin->SetSize(1024, 1024);
renWin->SetWindowName("FieldDensity");
renWin->Render();
// Take a pointer to the FrameUpdater for keypress mgt.
// KBfu = fu;
// vtkNew<vtkCallbackCommand> keypressCallback;
// keypressCallback->SetCallback(KeypressCallbackFunction);
// iren->AddObserver(vtkCommand::KeyPressEvent,keypressCallback);
iren->Initialize();
if ( mpeg ) {
#ifdef MPEG
vtkWindowToImageFilter *imageFilter = vtkWindowToImageFilter::New();
imageFilter->SetInput( renWin );
imageFilter->SetInputBufferTypeToRGB();
vtkFFMPEGWriter *writer = vtkFFMPEGWriter::New();
writer->SetFileName("movie.avi");
writer->SetRate(framerate);
writer->SetInputConnection(imageFilter->GetOutputPort());
writer->Start();
for(int i=0;i<fu->maxCount;i++){
fu->Execute(iren,vtkCommand::TimerEvent,nullptr);
imageFilter->Modified();
writer->Write();
}
writer->End();
writer->Delete();
imageFilter->Delete(); // created with New() above, so release it as well
#else
assert(0 && "MPEG support not compiled"); // -1 is nonzero, so the original assert could never fire
#endif
} else {
// Add control of contour threshold
// Create a slider widget
vtkSmartPointer<vtkSliderRepresentation2D> sliderRep = vtkSmartPointer<vtkSliderRepresentation2D>::New();
sliderRep->SetMinimumValue(0.0);
sliderRep->SetMaximumValue(10.0);
sliderRep->SetValue(1.0);
sliderRep->SetTitleText("Fraction RMS");
// Set color properties:
// Change the color of the knob that slides
// sliderRep->GetSliderProperty()->SetColor(colors->GetColor3d("Green").GetData());
sliderRep->GetTitleProperty()->SetColor(colors->GetColor3d("AliceBlue").GetData());
sliderRep->GetLabelProperty()->SetColor(colors->GetColor3d("AliceBlue").GetData());
sliderRep->GetSelectedProperty()->SetColor(colors->GetColor3d("DeepPink").GetData());
// Change the color of the bar
sliderRep->GetTubeProperty()->SetColor(colors->GetColor3d("MistyRose").GetData());
sliderRep->GetCapProperty()->SetColor(colors->GetColor3d("Yellow").GetData());
sliderRep->SetSliderLength(0.05);
sliderRep->SetSliderWidth(0.025);
sliderRep->SetEndCapLength(0.02);
sliderRep->GetPoint1Coordinate()->SetCoordinateSystemToNormalizedDisplay();
sliderRep->GetPoint1Coordinate()->SetValue(0.1, 0.1);
sliderRep->GetPoint2Coordinate()->SetCoordinateSystemToNormalizedDisplay();
sliderRep->GetPoint2Coordinate()->SetValue(0.9, 0.1);
vtkSmartPointer<vtkSliderWidget> sliderWidget = vtkSmartPointer<vtkSliderWidget>::New();
sliderWidget->SetInteractor(iren);
sliderWidget->SetRepresentation(sliderRep);
sliderWidget->SetAnimationModeToAnimate();
sliderWidget->EnabledOn();
// Create the slider callback
vtkSmartPointer<SliderCallback> slidercallback = vtkSmartPointer<SliderCallback>::New();
slidercallback->fu = fu;
sliderWidget->AddObserver(vtkCommand::InteractionEvent, slidercallback);
if ( NoTime==0 ) {
// Timer period is in milliseconds; 10000/framerate fires at framerate/10 Hz.
int timerId = iren->CreateRepeatingTimer(10000/framerate);
std::cout << "timerId "<<timerId<<std::endl;
}
// Start the interaction and timer
iren->Start();
}
Grid_finalize();
return EXIT_SUCCESS;
}