Parallel Computing at IMM UrB RAS
 
 

Achieving High Performance on a Distributed Memory Computer

      

Assuming that ScaLAPACK was installed correctly, users need only make sure that they are using an appropriate number of processors and that their matrices are distributed efficiently. Here is a checklist to get started.

  • Use the right number of processors.
    • Rule of thumb: $P = M \times N / 10^6$ for an $M \times N$ matrix. This provides a local matrix of size approximately 1000 by 1000.
    • Do not try to solve a small problem on too many processors.
    • Do not exceed physical memory.
  • Use an efficient data distribution.
    • Block size (i.e., MB, NB) = 64.
    • Square processor grid, $P_r \approx P_c$.
  • Use efficient machine-specific BLAS (not the Fortran 77 reference implementation) and an efficient BLACS build (nondebug, i.e., BLACSDBGLVL=0 in Bmake.inc).
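
The first two checklist items can be sketched as a small sizing helper. This is only an illustration of the rules of thumb above, not part of ScaLAPACK: the function names `suggest_grid` and `local_bytes`, the 8-byte element size, and the choice to round the grid down to a near-square shape are all assumptions made for the example.

```python
import math

def suggest_grid(m, n, target_local=1000):
    """Suggest a process grid (pr, pc) for an m-by-n matrix.

    Hypothetical helper: applies the rule of thumb P = M*N / 10^6
    (a local matrix of about target_local by target_local) and then
    picks a near-square grid with pr <= pc and pr * pc <= P.
    """
    # Rule of thumb for the process count; never fewer than one process.
    p = max(1, round(m * n / target_local**2))
    # Near-square grid: pr is the integer square root of P and pc uses
    # as many of the remaining processes as fit.
    pr = math.isqrt(p)
    pc = p // pr
    return pr, pc

def local_bytes(m, n, pr, pc, itemsize=8):
    """Rough per-process storage for the distributed matrix alone.

    Assumes double precision (8 bytes) by default and ignores
    block-size rounding and workspace, so treat it as a lower bound
    when checking against physical memory.
    """
    return math.ceil(m / pr) * math.ceil(n / pc) * itemsize

if __name__ == "__main__":
    pr, pc = suggest_grid(10000, 10000)
    print("grid:", pr, "x", pc)
    print("approx. bytes per process:", local_bytes(10000, 10000, pr, pc))
```

For a small problem the helper falls back to a single process, in line with the advice not to spread a small problem over too many processors. The per-process estimate covers only the matrix itself, so actual memory use will be higher once workspace and other arrays are included.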

If performance is still below what is expected, see Section 5.3. For guidelines on tuning for higher performance, see Section 5.4.



Susan Blackford
Tue May 13 09:21:01 EDT 1997