gfilaci
|
f7373e97a4
|
Missing conjugate in MooeeInvDag
|
2019-12-16 10:05:50 +01:00 |
|
Peter Boyle
|
848079e8ba
|
Merge pull request #235 from grid-test-organisation/feature/5d-improvement
MooeeInv and M5D optimisations + enable threading with nvcc
|
2019-12-10 21:45:03 -05:00 |
|
Peter Boyle
|
9b6b0caa55
|
Junk commit fix
|
2019-12-09 03:01:58 -05:00 |
|
Peter Boyle
|
3d2fe80780
|
Temporary size depends on checkerboard/uncheckerboard. The Mdir cares
|
2019-12-09 02:58:24 -05:00 |
|
gfilaci
|
a7fa86dc29
|
MooeeInv improvement for DW EOFA + comments
|
2019-09-05 12:05:21 +01:00 |
|
gfilaci
|
fdd9b14e82
|
speed up MooeeInvDag for DWF EOFA
|
2019-09-02 14:49:51 +01:00 |
|
gfilaci
|
e66669d300
|
fast MooeeInv for EOFA
|
2019-09-02 14:26:13 +01:00 |
|
gfilaci
|
0efaf3c4fa
|
access M5D coeffs through pointers
|
2019-09-02 11:33:00 +01:00 |
|
gfilaci
|
3ef519aaa4
|
fast MooeeInv
|
2019-09-02 11:18:14 +01:00 |
|
Peter Boyle
|
48e6efc7c9
|
Merge branch 'develop' into feature/gpu-port
Conflicts:
Grid/qcd/action/fermion/WilsonKernelsAsm.cc
Grid/qcd/action/fermion/implementation/ImprovedStaggeredFermionImplementation.h
Grid/qcd/action/fermion/implementation/StaggeredKernelsAsm.h
benchmarks/Benchmark_comms.cc
|
2019-08-14 18:56:54 +01:00 |
|
Peter Boyle
|
53e3ab4131
|
Fix force term
|
2019-08-11 11:06:13 +01:00 |
|
Peter Boyle
|
1282e1067f
|
Do the force term on the accelerator too. Needed particularly because comms buffers
are device memory.
|
2019-07-29 22:58:35 +01:00 |
|
Peter Boyle
|
fe700a183a
|
Getting HMC to run
|
2019-07-26 12:18:29 +01:00 |
|
Peter Boyle
|
fa9cd50c5b
|
Merge branch 'develop' into feature/gpu-port
|
2019-07-16 11:55:17 +01:00 |
|
Peter Boyle
|
bd155ca5c0
|
Overlap comms with compute now supported
|
2019-07-12 09:09:40 +01:00 |
|
Peter Boyle
|
d7b3efe893
|
Compile fix
|
2019-06-15 17:03:15 +01:00 |
|
Peter Boyle
|
decc99ca76
|
Accelerator version
|
2019-06-15 12:43:00 +01:00 |
|
Peter Boyle
|
464cd65931
|
Still to test this fully
|
2019-06-15 12:35:14 +01:00 |
|
Peter Boyle
|
a1ec2f4723
|
Still to test this routine fully
|
2019-06-15 12:33:55 +01:00 |
|
Peter Boyle
|
ea9662ec85
|
Thread loop changes
|
2019-06-15 09:09:57 +01:00 |
|
Peter Boyle
|
52c74f1cac
|
Thread loop changes
|
2019-06-15 09:08:16 +01:00 |
|
Peter Boyle
|
9a13d2992c
|
Clean up
|
2019-06-15 09:05:16 +01:00 |
|
Peter Boyle
|
b0449ae270
|
Thread loop changes
|
2019-06-15 09:04:19 +01:00 |
|
Peter Boyle
|
1299225105
|
Accelerator loop changes
|
2019-06-15 09:03:46 +01:00 |
|
Peter Boyle
|
5925e7f405
|
Thread for changes
|
2019-06-15 09:01:30 +01:00 |
|
Peter Boyle
|
36f06555a2
|
Simplify Impl
|
2019-06-09 22:26:27 +01:00 |
|
Peter Boyle
|
d6c0e0756d
|
Remove GPU version
|
2019-06-09 11:23:42 +01:00 |
|
Peter Boyle
|
3e41b1055c
|
Remove GPU-only kernels.
|
2019-06-09 11:20:01 +01:00 |
|
Peter Boyle
|
e78a5e7838
|
ASM instantiation without link errors
|
2019-06-09 01:25:21 +01:00 |
|
Peter Boyle
|
c933ac2248
|
Temporarily introduce a SIMT_loop to test out approaches prior to making a global change to
accelerator_loop
|
2019-06-08 13:44:27 +01:00 |
|
Peter Boyle
|
ad2c433574
|
Instantiations move. Tried using Gianluca's suggestion about avoiding threadIdx but doesn't
seem to make a difference. Will revisit this and probably remove the lane parameter from the coalescedRead
|
2019-06-08 13:43:12 +01:00 |
|
Peter Boyle
|
86e7fb6e86
|
Instantiation relocation
|
2019-06-08 13:42:46 +01:00 |
|
Peter Boyle
|
fb91dda7be
|
Hand instantiation moved location
|
2019-06-08 13:42:26 +01:00 |
|
Peter Boyle
|
82cf7bc5ab
|
Move instantiation into fermion/instantiation
|
2019-06-08 13:41:46 +01:00 |
|
Peter Boyle
|
e452cc0a22
|
Move static variables into instantiation .cc file
|
2019-06-08 13:41:20 +01:00 |
|
Peter Boyle
|
4d2b938166
|
Remove explicit instantiation from here
|
2019-06-08 13:41:01 +01:00 |
|
Peter Boyle
|
10d16ab76c
|
Remove explicit instantiation from here
|
2019-06-08 13:40:32 +01:00 |
|
Peter Boyle
|
0ee6e77cbc
|
Compiles GPU and CPU, still gives good performance on CPU
|
2019-06-05 13:28:16 +01:00 |
|
Peter Boyle
|
7323099966
|
Instantiation fix
|
2019-06-05 00:14:38 +01:00 |
|
Peter Boyle
|
6379651cdd
|
Generic or GPU ready for benchmark test on GPU
|
2019-06-05 00:13:52 +01:00 |
|
Peter Boyle
|
ba4fd756b9
|
Fix signature, but deprecating this loop style
|
2019-06-05 00:12:36 +01:00 |
|
Peter Boyle
|
d185fc1ebf
|
clean up instantiation
|
2019-06-05 00:11:52 +01:00 |
|
Peter Boyle
|
96b36d8367
|
Instantiation clean up
|
2019-06-05 00:11:27 +01:00 |
|
Peter Boyle
|
899f8b5065
|
Instantiation clean up 5d vec removal
|
2019-06-05 00:11:05 +01:00 |
|
Peter Boyle
|
c8d0483fe9
|
Remove 5d vectorisation
|
2019-06-05 00:10:37 +01:00 |
|
Peter Boyle
|
0f214e5f76
|
Clean up instantiation
|
2019-06-05 00:10:13 +01:00 |
|
Peter Boyle
|
ade4a126da
|
Getting closer on the GPU port, but will start deleting 5th dim vectorised variants
for code maintainability
|
2019-06-04 11:53:44 +01:00 |
|
Peter Boyle
|
7b59ab5bd7
|
Compiling after reorganisation
|
2019-06-03 15:46:26 +01:00 |
|
Peter Boyle
|
fcd8cfe257
|
Gparity in
|
2019-06-03 15:45:09 +01:00 |
|
Peter Boyle
|
b4b53812cb
|
Move implementation to specific implementation headers
|
2019-06-03 15:43:01 +01:00 |
|