==============================================================
 High Performance Computing Linpack Benchmark (HPL)
 HPL 2.0 - September 10, 2008
==============================================================

 1) Retrieve the tar file, then

    gunzip hpl.tgz; tar -xvf hpl.tar

 this  will create an  hpl  directory,  that we call below the
 top-level directory.

 2) Create a file Make.<arch> in the  top-level directory. For
 this purpose,  you  may  want  to re-use one contained in the
 setup directory. This file essentially contains the compilers
 and librairies with their paths to be used.

 3) Type "make arch=<arch>". This  should create an executable
 in the bin/<arch> directory called xhpl.

 For example, on our Linux PII cluster, I create a file called
 Make.Linux_PII in the top-level directory. Then, I type
    "make arch=Linux_PII" 
 This creates the executable file bin/Linux_PII/xhpl.

 4) Quick check: run a few tests:

    cd bin/<arch>
    mpirun -np 4 xhpl

 5) Tuning: Most of the performance  parameters can be tuned,
 by modifying the input file bin/HPL.dat. See the file TUNING
 in the top-level directory.

==============================================================

 Compile time options:  At the end of the "model" Make.<arch>,
 ---------------------  the  user  is given the opportunity to
 compile the software with some specific compile options.  The
 list of this options and their meaning are:

    -DHPL_COPY_L
       force the copy of the panel L before bcast;

    -DHPL_CALL_CBLAS
       call the cblas interface;

    -DHPL_CALL_VSIPL
       call the vsip  library;

    -DHPL_DETAILED_TIMING
       enables detail timers;

 The  user  must  choose  between  either  the BLAS Fortran 77
 interface,  or the  BLAS  C  interface,  or the VSIPL library
 depending on which computational kernels are available on his
 system. Only one of these options should be selected.  If you
 choose the BLAS Fortran 77 interface, it is necessary to fill
 out the machine-specific C to Fortran 77 interface section of
 the  Make.<arch>  file.  To  do this,  please  refer  to  the 
 Make.<arch> examples contained in the setup directory.

 By default HPL will:
    *) not copy L before broadcast,
    *) call the BLAS Fortran 77 interface,
    *) not display detailed timing information.

 As an example,  suppose  one wants  HPL  to copy the panel of
 columns  into  a  contiguous buffer  before broadcasting.  In
 theory,  it  would be more efficient to let  HPL  create  the
 appropriate  MPI  user-defined data type since this may avoid 
 the data copy. So, it is a strange idea, but one insists.  To
 achieve this one would add -DHPL_COPY_L  to the definition of
 HPL_OPTS  at the end of the file  Make.<arch>.  Issue  then a
 "make clean arch=<arch>; make build arch=<arch>" and the xhpl
 executable will be re-build with that feature in.
==============================================================
 
 Check out  the website  www.netlib.org/benchmark/hpl  for the
 latest information.
==============================================================
