Nirvana Queue Question
Jeffrey Oldham
oldham at codesourcery.com
Tue Oct 16 23:01:13 UTC 2001
I was trying to execute:
bsub -K -q shared -n 49 'mpirun -np 49 ./SGI64KCC/Doof2d --sim-params 2800 0 1 --num-patches 7 --run-impls 5 --pooma-stats --pooma-nocompress -mpi > results_mpi_5_p49_400_7' && pprof -m >> field_mpi_5_p49_400_7
The job got stuck for over an hour. Do you think this is user error,
or did the job allocator refuse to allocate the job because every
possible 49-processor chunk already had something running? Note that the
latter does not make sense, because there are more shared jobs running
than the 112 nodes running them. Also note that I experienced no such
problem yesterday with similar jobs using up to 36 processors.
Thanks for the information,
Jeffrey D. Oldham
oldham at codesourcery.com