mirror of
				https://github.com/paboyle/Grid.git
				synced 2025-10-21 16:34:44 +01:00 
			
		
		
		
	Number of IO MPI tasks can be varied by selecting which
dimensions use parallel IO and which dimensions use Serial send to boss
I/O.
Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes
doing the I/O.
Interpolates nicely between ALL nodes write their data, a single boss per time-plane
in processor space [old UKQCD fortran code did this], and a single node doing all I/O.
Not sure I have the transfer sizes big enough and am not overly convinced fstream
is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero.
Practically it has worked on 8 tasks, 2x1x2x2 writing /cloning NERSC configurations
on my MacOS + OpenMPI and Clang environment.
It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from
each node in order to gather bigger chunks at the syscall level.
That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non
parallel we get to 16MB contiguous chunks written in multi 4KB transactions
per IOnode in 64^3 lattices for configuration I/O.
I suspect this is fine for system performance.
		
	
		
			
				
	
	
		
			17 lines
		
	
	
		
			133 B
		
	
	
	
		
			Bash
		
	
	
		
			Executable File
		
	
	
	
	
			
		
		
	
	
			17 lines
		
	
	
		
			133 B
		
	
	
	
		
			Bash
		
	
	
		
			Executable File
		
	
	
	
	
| #!/bin/bash
 | |
| 
 | |
| export LANG=C
 | |
| 
 | |
| WAS=$1
 | |
| shift
 | |
| IS=$1
 | |
| shift
 | |
| 
 | |
| while (( "$#" )); do
 | |
| 
 | |
| echo $1
 | |
| sed -e "s@${WAS}@${IS}@g" -i .bak $1
 | |
| 
 | |
| shift
 | |
| 
 | |
| done |