Dennis Bollweg
|
b8b9dc952d
|
Async memcpy's and cleanup
|
2024-02-01 17:55:35 -05:00 |
|
Dennis Bollweg
|
79a6ed32d8
|
Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies
|
2024-02-01 16:41:03 -05:00 |
|
dbollweg
|
caa5f97723
|
Add sliceSum gpu using cub/hipcub
|
2024-01-31 16:50:06 -05:00 |
|
david clarke
|
4924b3209e
|
projectU3 yields a unitary matrix
|
2024-01-23 14:43:58 -07:00 |
|
Peter Boyle
|
eb702f581b
|
Running on 12 rhs on 18 nodes of frontier
|
2024-01-22 17:44:15 -05:00 |
|
david clarke
|
f5b3d582b0
|
first attempt at U3 projection
|
2024-01-22 02:49:40 -07:00 |
|
david clarke
|
981c93d67a
|
update Test_fatLinks to accept Naik
|
2024-01-21 21:09:19 -07:00 |
|
Peter Boyle
|
d967eb53de
|
Working for first time
|
2024-01-17 16:31:12 -05:00 |
|
Peter Boyle
|
25f71913b7
|
MultiRHS coarse
|
2024-01-04 12:01:17 -05:00 |
|
Peter Boyle
|
d5fd90b2f3
|
Add 48^3 rtest
|
2024-01-04 12:00:01 -05:00 |
|
Peter Boyle
|
22c611bd1a
|
Delete temp file
|
2023-12-21 18:32:31 -05:00 |
|
Peter Boyle
|
c9bb1bf8ea
|
Passing new BLAS based
|
2023-12-21 18:31:17 -05:00 |
|
Peter Boyle
|
9e489887cf
|
General coarse multiRHS move to BLAS implementation
|
2023-12-21 15:24:48 -05:00 |
|
Peter Boyle
|
abcd6b8cb6
|
Faster version
|
2023-12-19 15:17:46 -05:00 |
|
Peter Boyle
|
6835a7f208
|
Better logging, test on 81 point stencil
|
2023-11-29 19:20:47 -05:00 |
|
Peter Boyle
|
f59993b979
|
Nbasis
|
2023-11-29 09:47:36 -05:00 |
|
Peter Boyle
|
e859a199df
|
Reduce volume to interior for coarse stencil -- worth up to 4x gain
|
2023-11-28 10:23:16 -05:00 |
|
Peter Boyle
|
0a3682ad0b
|
MultiRHS work
|
2023-11-28 07:43:37 -05:00 |
|
Peter Boyle
|
59abaeb5cd
|
Time stamp
|
2023-11-24 12:56:45 -05:00 |
|
Peter Boyle
|
b302ad3d49
|
multiRHS test in place, passes Yay!
|
2023-11-23 18:20:15 -05:00 |
|
Peter Boyle
|
09946cf1ba
|
Improved, works on 48^3 moving to multiRHS optimisations
|
2023-11-15 18:03:05 -05:00 |
|
david clarke
|
9cd4128833
|
fix naik bug
|
2023-11-03 14:11:38 -06:00 |
|
david clarke
|
df9b958c40
|
naik now returns separately
|
2023-10-30 17:40:53 -06:00 |
|
david clarke
|
3d3376d1a3
|
LePage works, trying Naik
|
2023-10-27 16:26:31 -06:00 |
|
Peter Boyle
|
9c9c42d0df
|
Tests on Frontier with real speed up: 3.5x on 16^3 at mq=0.01
|
2023-10-20 19:27:13 -04:00 |
|
Peter Boyle
|
0ae4478cd9
|
Checkpoint the subspace and ldop
|
2023-10-20 19:27:13 -04:00 |
|
Peter Boyle
|
ae4e705e09
|
Use random vec as easier for debug
|
2023-10-20 19:27:13 -04:00 |
|
david clarke
|
21ed6ac0f4
|
added floating-point support
|
2023-10-20 13:54:26 -06:00 |
|
david clarke
|
7bb8ab7000
|
improve smearing templating
|
2023-10-20 08:41:02 -06:00 |
|
david clarke
|
391fd9cc6a
|
try lepage term
|
2023-10-17 14:57:15 -06:00 |
|
david clarke
|
36600899e2
|
working 7-link; Grid_log; generalShift
|
2023-10-12 11:11:39 -06:00 |
|
david clarke
|
b9c70d156b
|
Merge branch 'develop' into hisq_fat_links
|
2023-10-10 22:44:17 -06:00 |
|
david clarke
|
eb89579fe7
|
Merge remote-tracking branch 'origin/develop' into develop
|
2023-10-10 22:43:51 -06:00 |
|
david clarke
|
0cfd13d18b
|
7-link working
|
2023-10-10 22:41:52 -06:00 |
|
Peter Boyle
|
2111e7ab5f
|
Run at physical mass
|
2023-10-06 21:20:21 -04:00 |
|
Peter Boyle
|
a751c42cc5
|
Checkpoint restore the setup
|
2023-10-06 21:03:08 -04:00 |
|
Peter Boyle
|
b58fd80379
|
I/O for coarse op and reorganise multigrid headers
|
2023-10-06 13:43:46 -04:00 |
|
Peter Boyle
|
3bc2da5321
|
Merge branch 'feature/scidac-wp1' of https://github.com/paboyle/Grid into feature/scidac-wp1
|
2023-10-05 16:57:59 -04:00 |
|
Peter Boyle
|
2d710d6bfd
|
Optimised parameters for 16^3
|
2023-10-05 16:56:55 -04:00 |
|
Peter Boyle
|
6532b7f32b
|
Eliminate older inefficient coarsening implementation
|
2023-10-05 16:56:15 -04:00 |
|
Peter Boyle
|
fcf5023845
|
Running on Frontier
|
2023-10-05 16:50:59 -04:00 |
|
Peter Boyle
|
737d3ffb98
|
ADEF1 and 1 hop projection
|
2023-10-03 14:22:18 -04:00 |
|
Peter Boyle
|
8a70314f54
|
Merge branch 'develop' into feature/scidac-wp1
|
2023-10-02 17:24:55 -04:00 |
|
Peter Boyle
|
c5f1420dea
|
Merge remote-tracking branch 'LupoA/develop' into LupoA-develop
|
2023-10-02 16:22:35 -04:00 |
|
Peter Boyle
|
018e6da872
|
Merge pull request #440 from giltirn/feature/paddedcellgauge
Feature/paddedcellgauge
|
2023-10-02 10:00:42 -04:00 |
|
Peter Boyle
|
e187bcb85c
|
Updating
|
2023-09-29 17:10:17 -04:00 |
|
Peter Boyle
|
be18ffe3b4
|
Further tuning and lanczos
|
2023-09-27 16:21:58 -04:00 |
|
Peter Boyle
|
3a86cce8c1
|
Compile
|
2023-09-27 16:19:18 -04:00 |
|
Peter Boyle
|
37884d369f
|
Coarse space is expensive, but gives a speed up in fine matrix multiplies now.
Down to optimisation
|
2023-09-25 17:24:19 -04:00 |
|
Peter Boyle
|
9246e653cd
|
Basic non-local coarsening of operator test
|
2023-09-25 17:20:58 -04:00 |
|