Parallel Computing at IMM UB RAS
 
 

Obtaining High Performance with ScaLAPACK Codes

We suggest the following approach to obtain high performance with ScaLAPACK codes:

  • Use the best BLAS and BLACS libraries available.
  • Start with a standard data distribution.
    • A square processor grid ($P_r = P_c = \lfloor\sqrt{P}\rfloor$) if $P \ge 9$
    • A one-dimensional processor grid ($P_r = 1$, $P_c = P$) if $P < 9$
    • Block size = 64 
  • Determine whether reasonable performance is being achieved.
  • Identify the performance bottleneck(s), if any.
  • Tune the distribution or routine parameters to improve performance further.
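
The standard starting distribution from the list above can be sketched as a small helper. This is only an illustration, not part of ScaLAPACK: the function name `standard_distribution` is hypothetical, and the sketch assumes that the square grid uses the integer part of the square root of P in each dimension, so that any leftover processes are simply not used.

```python
import math


def standard_distribution(p, block_size=64):
    """Suggest a starting grid for p processes (hypothetical helper).

    Returns (p_r, p_c, block_size, unused), where unused is the number
    of processes left out of the p_r x p_c grid.
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    if p < 9:
        # One-dimensional grid: a single process row, p process columns.
        p_r, p_c = 1, p
    else:
        # Square grid; assumes floor(sqrt(p)) processes per dimension.
        p_r = p_c = math.isqrt(p)
    unused = p - p_r * p_c
    return p_r, p_c, block_size, unused


if __name__ == "__main__":
    for p in (4, 8, 9, 16, 20):
        print(p, standard_distribution(p))
```

With 20 processes, for example, the sketch suggests a 4 x 4 grid and leaves four processes idle, which is the kind of loss the performance estimate has to account for.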

The standard data distribution will typically achieve 25-50% of the peak performance possible (depending in part on how many processors are ignored, i.e., the difference between $P_r \times P_c$ and $P$). We do not recommend experimenting with different data distributions until acceptable (or nearly acceptable) performance has been achieved. If an individual node needs a block size larger than 64 to reach near-peak performance on its local matrix-matrix multiply, the block size may have to be increased. This is unlikely to be necessary, however, unless each node of the computer is a shared-memory multiprocessor with more than four processors.
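
One way to judge whether a run falls in the 25-50% range is to compare the achieved floating-point rate with the aggregate peak of all available processes. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the operation count 2n^3/3 is the usual leading-order estimate for LU factorization of an n x n matrix, and the per-process peak and run time in the example are made-up numbers, not measurements.

```python
def lu_flops(n):
    """Leading-order operation count for LU factorization of an n x n matrix."""
    return 2.0 * n**3 / 3.0


def fraction_of_peak(flops, seconds, processes, peak_per_process):
    """Achieved rate divided by the aggregate peak rate (hypothetical helper).

    processes counts all available processes, including any left out of
    the grid, so idle processes lower the reported fraction.
    """
    if seconds <= 0 or processes < 1 or peak_per_process <= 0:
        raise ValueError("seconds, processes and peak_per_process must be positive")
    achieved = flops / seconds
    return achieved / (processes * peak_per_process)


if __name__ == "__main__":
    # Made-up example: n = 10000 on 16 processes, each with an assumed
    # peak of 0.5 Gflop/s, taking an assumed 250 seconds.
    frac = fraction_of_peak(lu_flops(10000), 250.0, 16, 0.5e9)
    print(f"fraction of peak: {frac:.2f}")
```

A result well below 0.25 would suggest looking for a bottleneck before tuning the distribution.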



Susan Blackford
Tue May 13 09:21:01 EDT 1997