From owner-hpff  Sat May 11 12:38:54 1996
From: jtaylor@umiacs.UMD.EDU (Jason Taylor)
Message-Id: <199605111738.NAA08808@tsunami.umiacs.UMD.EDU>
Subject: hpff: Early opinions about DEC F90 and HPF in general
To: hpff@cs.rice.edu
Date: Sat, 11 May 1996 13:38:49 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hpff
Precedence: bulk

---------------------------------------------------------------------------
hpff@cs.rice.edu is a mailing list for announcements related to High
Performance Fortran.  Instructions for adding or deleting yourself
from this list appear at the bottom of this message.
---------------------------------------------------------------------------

SUMMARY OF THIS MESSAGE: In my opinion, DEC's F90 HPF must have been
written by computer science majors, because it, and to a lesser extent
HPF in general, is NOT geared toward simple implementation of the most
common real-world applications of Fortran.

DETAILS: I got DEC's HPF F90 documentation kit yesterday.  From page
6-3 of D.H.P.F.90:

 "For new code, HPF is generally useful in the following cases:

  o Problems utilizing iterative solution methods
  o Signal processing
  o Image processing
  o Modeling
  o Grid-based problems in general
  o In general, most problems expressible as operations on large arrays"

	Reading that pretty much solidified my very negative opinions
as to how simple writing my programs using DEC's HPF to take advantage
of multiprocessing would be.  You see, the above is mostly geared
toward problems that manipulate array elements.  But I am not writing
programs that do time-evolution with very large arrays, etc.  Most of
my programs spend their cpu time under a top-level DO loop which in
many cases just calculates a complicated function f(x) where, e.g.,
x=x_i and i=1,100.  Frequently, to evaluate f(x), I do a two- or
three-dimensional integration.  Thus the number of lines of code in the
main program that represent f(x) is generally quite small.

	Let me be more precise.  The following is a real-world example
of such code.  The main block of this program is 1000 lines or so.
The code was written in RATFOR.  In the code, first, several variables
are set up.  Then a DO loop starts.  Then the integration routine QAGS
is called.  Then the results are printed out.  On a 277 MHz Alpha
processor, each iteration (of I) takes ~5 seconds to evaluate, for a
total time of ~200 seconds.

...  (many setup statements removed)
#The following steps through the values of Vz and integrates over r.
Vzmax=1E5; Vzmin=50
IRES = 40  # IRES =(#of steps in Vz>1 plot).
VzSCALE=IRES/LOG(Vzmax); IMIN=INT(LOG(Vzmin)*VzSCALE)
Ithelpres=IRES-IMIN+1
Vzlast=Vzmax;dLdVzlast=0.
DO I=IRES,IMIN,-1
{Vz=(exp(I/VzSCALE)-exp(IMIN/VzSCALE))
 Rmax=Rmin
 While ((Sqrt(-2*(PHI(Rmax)))>Vz)&(Rmax<Rc/1.1)) {Rmax=Rmax*1.1}
 CALL QAGS(FrEXP,DLOG(Rmin),DLOG(Rmax),EPSABS,EPSREL,RESULT,ABSERR,INEVAL,IER)
 IF (IER.NE.0) {WRITE(6,*)'"RERROR!;ABSERR,INEVAL,IER="',ABSERR,INEVAL,IER}
 dLdVz=RESULT*sigma*2*PI/(Lumb/Lumi)
 #In units of 1/(km/sec);sigma=1km/sec=>neednt mult.
 WRITE(2,12)Vz*sigma,dLdVz,Lumb,INEVAL,ILASTVNEVAL,dLdVz*ABSERR,
  Rmax,Rmin,il*1.
 WRITE(6,14)Vz*sigma,dLdVz,Lumb,INEVAL,ILASTVNEVAL,IER,dLdVz*ABSERR,
  Rmax,Rmin,il*1.
 WRITE(14,12)dLdVz                                       #redlp.d
} # I
...  (many other statements removed)

Notes: (1) In theory, the above code should be `embarrassingly' simple
to parallelize; each value of dLdVz(Vz(I)) is independent of the
others.  (2) In normal F77 (after RATFOR preprocessing completes),
there are no arrays here.  (3) Almost all of the FORTRAN programs I
write are very similar in nature.  [Realize, there are SEVERAL points
where a code can be parallelized, and speed-wise it probably doesn't
matter too much which point is selected, so long as it is not at such
a `low' or `detailed' level that inter-processor communication
bandwidth becomes relevant.]
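
As a rough check on note (1), the loop bounds and the Vz grid from the
excerpt can be reproduced in a few lines.  This is only a sketch in
Python, not the original RATFOR; the helper name vz_grid is made up for
illustration, and the constants are the ones posted in the excerpt.  It
shows that each Vz depends only on the loop index I, which is the
property the independence claim rests on.

```python
import math

def vz_grid(vzmax=1e5, vzmin=50.0, ires=40):
    """Reproduce the loop bounds and Vz values from the RATFOR excerpt.

    vz_grid is a hypothetical helper; only the arithmetic comes from
    the posted code.
    """
    vzscale = ires / math.log(vzmax)         # VzSCALE=IRES/LOG(Vzmax)
    imin = int(math.log(vzmin) * vzscale)    # IMIN=INT(LOG(Vzmin)*VzSCALE)
    # DO I=IRES,IMIN,-1 : count down from IRES to IMIN inclusive.
    grid = [
        (i, math.exp(i / vzscale) - math.exp(imin / vzscale))
        for i in range(ires, imin - 1, -1)
    ]
    return imin, grid

imin, grid = vz_grid()
print(imin, len(grid))   # loop lower bound and number of iterations
```

With the posted constants this gives IMIN = 13, so the loop runs for 28
values of I, and the last one (I = IMIN) has Vz = 0.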

	I was thinking that I should be able to add one !HPF$
INDEPENDENT line to my code.  Ideally, the compiler should be able to
figure out the rest, since the above case is so simple.

	Instead, the manuals state that (1) the !HPF$ INDEPENDENT
command is not yet implemented.  (2) I must move the above lines into
a new "PURE" subroutine.  (3) I must add the necessary common blocks
such that the various functions and initialization variables are
passed to the new subroutine.  (4) I must convert the code to use
arrays, making, I presume, dLdVz(Vz) an array [dLdVz(I)].  (5) I must
replace the DO statement with the FORALL statement.  (6) I must add a
DISTRIBUTE statement describing how the target array is to be
distributed across the microprocessors.

	This is not just a lot of work, it is against the fundamental
principles of elegant TOP-DOWN programming.  If I am misunderstanding
the manuals, please correct me.  But it would appear to me that I will
end up making the situation MUCH more complicated than it really is.
Why?  Why?

	If I was REALLY young and willing to do this, I wouldn't be
using F90; I'd probably be using C.  So from a sales perspective, I
would have thought that there is good reason to make the transition to
F90 from F77 as simple and painless as possible.  But I am finding this
transition not only to be painful, but even a step backward from what
I am used to.

	I can only speculate as to why the very typical type of program
shown above is not handled easily by DEC HPF90.  I suppose that most
parallel benchmarking programs of interest to ex-computer-science
majors require high inter-processor bandwidth.  Perhaps that is what
the software designers are concerned with (passing of large arrays,
etc.)  However, most non-time-evolution-like codes that take
significant computer time, such as the one above, should not require
significant cross-cpu bandwidth.  (Namely, a minimum of only 1000
bytes every second.)  THEY should be parallelizable without any
significant effort by the programmer.  But I am finding that this is
not the way HPF is being designed.  How sad.  

Comments anyone?

Regards,
Jason
-- 
Jason A. Taylor
Grad. Research Asst.  
Dept. of Physics University of Maryland 
USA
---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-request@cs.rice.edu.  Leave
the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff  Mon May 13 11:40:24 1996
Date: Mon, 13 May 1996 17:29:05 +0100
Message-Id: <96051317290577@alfa.elis.rug.ac.be>
From: dhollander@alfa.elis.rug.ac.be (Erik D'Hollander)
To: hpff@cs.rice.edu
Subject: Re: hpff: Early opinions about DEC F90 and HPF in general
X-Vms-To: SMTP%"hpff@cs.rice.edu"
X-Vms-Cc: DHOLLANDER
Sender: owner-hpff
Precedence: bulk

On 11-MAY-1996 jtaylor@umiacs.umd.edu wrote:

>SUMMARY OF THIS MESSAGE: In my opinion, DEC's F90 HPF must have been
>written by computer science majors, because it, and to a lesser extent
>HPF in general, is NOT geared toward simple implementation of the most
>common real-world applications of Fortran.

You are right when you assume that HPF is more complex than f77. The 
reason is that HPF (at this moment) is geared towards distributed 
computer platforms, where the communication is a bottleneck. Therefore 
HPF contains directives which allow the user to specify which data 
belongs together. An "intelligent" HPF-compiler will use these 
directives to generate good code and more importantly minimal 
communication overhead.

HPF is not a language which "detects" the parallelism in the program.
It doesn't even detect the vectorizable loops, i.e. you have to write
the FORALL loops yourself (for vector instructions) or add the INDEPENDENT
directive (for DOALL loops in which iterations run independently).
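
The difference between the two constructs can be illustrated outside
Fortran.  The following is a rough Python sketch, not HPF:
forall_assign mimics the FORALL rule that every right-hand side is
evaluated before any element is assigned, sequential_do is an ordinary
DO loop with a carried dependence, and independent_loop mimics a loop
whose iterations may run in any order because none reads what another
writes.  All three function names are invented here for illustration.

```python
import random

def forall_assign(a):
    """Mimic FORALL (I=2:N) A(I) = A(I-1) + 1.

    All right-hand sides are evaluated from the old values first,
    then all assignments are made.
    """
    rhs = [a[i - 1] + 1 for i in range(1, len(a))]
    return [a[0]] + rhs

def sequential_do(a):
    """Ordinary DO loop: each iteration sees the previous one's update,
    so the iterations are not independent."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + 1
    return a

def independent_loop(f, xs, seed=0):
    """Mimic a DO loop marked INDEPENDENT by visiting the iterations in
    a shuffled order.  The result matches the in-order loop only because
    each iteration writes its own slot and reads no other slot."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    out = [None] * len(xs)
    for i in order:
        out[i] = f(xs[i])
    return out

a = [10, 20, 30, 40]
print(forall_assign(a))      # [10, 11, 21, 31]
print(sequential_do(a))      # [10, 11, 12, 13]
print(independent_loop(lambda x: x * x, [1, 2, 3]))   # [1, 4, 9]
```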

According to your experience, DEC has not yet implemented the
INDEPENDENT directive. This is clearly what you need.

However, there exist parallelizing compilers, e.g. the FortranMP 
compiler of Sun Microsystems. There you can specify C$pardoall
as a synonym for INDEPENDENT. Also, the compiler is able to recognize 
simple forms of parallel loops. I ran your code through that compiler
(after recoding it in f77) and found that your loop is automatically 
parallelized by FortranMP, provided the WRITE statements are removed.

Hope this helps.

Erik D'Hollander

RUG - Dept. of Electrical Engineering
Parallel Information Systems
St.-Pietersnieuwstraat 41
B-9000 Gent, BELGIUM
dhollander@elis.rug.ac.be

From owner-hpff  Thu May 16 15:21:18 1996
From: jharris@hpc.pko.dec.com (Jonathan Harris)
Date: Thu, 16 May 1996 16:11:18 -0400
Message-Id: <9605162011.AA14266@nature.hpc.pko.dec.com>
To: hpff@cs.rice.edu
Subject: hpff: Re: Early opinions about DEC F90 and HPF in general
Cc: gale@hpc.pko.dec.com, jharris@hpc.pko.dec.com, loveman@msbcs.enet.dec.com,
        offner@hpc.pko.dec.com, wcelmast@hpc.pko.dec.com
Sender: owner-hpff
Precedence: bulk

Jason,

Let me offer some specific comments on your objections, and then some more
general comments on Digital Fortran, the Fortran and HPF languages and
developing parallel programs.

> "[Digital's HPF compiler and ... the HPF language are] ... NOT geared toward
> simple implementation of the most common real-world applications of Fortran."

and

> DETAILS: I got DEC's HPF F90 documentation kit yesterday.  From page
> 6-3 of D.H.P.F.90:
>
> "For new code, HPF is generally useful in the following cases:
> 
>   o Problems utilizing iterative solution methods
>   o Signal processing
>   o Image processing
>   o Modeling
>   o Grid-based problems in general
>   o In general, most problems expressible as operations on large arrays"
> 
> 	Reading that pretty much solidified my very negative opinions
> as to how simple writing my programs ... would be."

The design center for the first version of HPF was to support data parallel
programming.  Nevertheless, even though your program is not data parallel, HPF
provides several methods for efficiently parallelizing your program.  As it
turns out -- for your program -- the simplest way to do this would be to use
the INDEPENDENT directive (however, see comments below about a hazard you would
run into if you did that).  The INDEPENDENT directive is not yet as
well-optimized under Digital Fortran as it should be -- and will be.  Full HPF
is a big language (in part because it includes full Fortran 90); our customers
told us they would prefer we implement full Fortran 90 and the other HPF
features we implemented before finishing support for INDEPENDENT (we have
supported full Fortran 90 since our first release over a year ago). This
probably comes about because of the prevalence, in scientific computing, of
large physical simulations that are modeled on 2- and 3- dimensional grids.  In
an upcoming release, we will be turning our attention towards some of the
features, such as INDEPENDENT, that users such as yourself have recently begun
to request.

> I was thinking that I should be able to add one !HPF$
> INDEPENDENT line to my code.  Ideally, the compiler should be able to
> figure out the rest, since the above case is so simple.

However, where would you suggest inserting the INDEPENDENT directive?  The DO
loop in your program (as written) is NOT independent!  The WRITE statements
within the DO loop are not independent. Executing the WRITEs out of order
changes the semantics of the program.  This does not mean the code is not
parallelizable, of course; it just means that it's not as simple as inserting
a single INDEPENDENT directive.  The values written in the WRITE statements
could be stored in arrays and written out later.  Only after this would you
have a DO loop that you could safely mark as INDEPENDENT. OR--would you want us
to relax the restriction and allow the WRITEs out of order?  And, this analysis
assumes that the QAGS and PHI routines can be executed independently.
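
The restructuring described above (store each iteration's values, then
write them out afterwards) can be sketched as follows.  This is an
illustration in Python rather than Fortran, with a thread pool standing
in for whatever parallel execution a compiler would generate.  The
function evaluate is a hypothetical placeholder for the QAGS
integration, and it is assumed here that such a routine depends only on
its argument and touches no shared state.

```python
from concurrent.futures import ThreadPoolExecutor
import io

def evaluate(i):
    """Hypothetical stand-in for the per-iteration work (the QAGS call).
    Assumed to depend only on i and to touch no shared state."""
    return i * i

def run(indices, out, workers=4):
    indices = list(indices)
    results = [None] * len(indices)      # one slot per iteration

    def body(k):
        # Iterations may finish in any order; each writes only its own slot.
        results[k] = evaluate(indices[k])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, range(len(indices))))   # wait for all iterations

    # The writes happen afterwards, in the original loop order.
    for i, r in zip(indices, results):
        out.write(f"{i} {r}\n")

buf = io.StringIO()
run(range(40, 12, -1), buf)              # same bounds as DO I=40,13,-1
print(buf.getvalue().splitlines()[:3])   # ['40 1600', '39 1521', '38 1444']
```

Because the output is produced only after every slot is filled, the
file contents are the same as in the sequential loop regardless of the
order in which the iterations actually ran.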

> If I was REALLY young and willing to do this, I wouldn't be using F90; I'd
> probably be using C.

There is no widely-available parallel C for distributed-memory machines.
Further, the C language provides no facilities to allow you to exploit
parallelism.  If you are interested in shared-memory machines, you can use
Digital's "KAP C for Digital Unix" preprocessor, in conjunction with Digital's
C compiler, to parallelize your C program.  However, you have to write your
program in a Fortran style to get good results with this tool.  Alternatively,
you could always write in C with explicit message passing in PVM or MPI.  If
you really want to program in C, your Alpha running DEC UNIX comes with an
excellent C compiler.

Here are some more general comments on Fortran and developing parallel programs:

Digital Fortran is intended to be Digital's premier language for technical and
scientific computing, supported well across all of Digital's platforms. The
objectives of Digital Fortran include:

    support for the Fortran 90, and soon Fortran 95, standards;
    
    compatibility with de facto standard extensions to Fortran 77 so that
    existing programs will compile correctly and run well with the Digital
    Fortran compiler; and
    
    support for the HPF specification to encourage the objectives of the
    HPF Forum to promulgate de facto standard extensions.

Digital provides tools that support a number of programming models for
technical and scientific programming:

    If you are developing new applications that are data parallel and
    appropriate for the array features of Fortran 90 and HPF, Digital Fortran
    with its HPF features will allow you to program at a reasonably high level
    while, at the same time, obtaining reasonable efficiency and portability
    across shared memory and distributed memory architectures.  At last year's
    Supercomputing'95, we demonstrated HPF programs running on a 96-processor
    cluster composed of eight interconnected 12-processor shared memory
    systems.  As you observed, we clearly stated what kinds of
    computations our HPF extensions are tailored for.
    
    If you have existing Fortran 77 code written in a style that allows
    for automatic parallelization, you have several alternatives:
    
        You can use Digital Fortran to obtain good performance on a single
        processor Alpha, relying on Alpha's speed and the quality of the code
        generated by the compiler to provide the performance you need.

        You can use the KAP preprocessor to compile your code for one of
        Digital's shared memory systems, which can contain up to 12 Alpha
        processors. KAP is the tool used by most vendors to provide shared
        memory parallelization.

        You can use PSR's VAST or APR's products to translate your Fortran 77
        into Fortran 90 with HPF directives and use Digital Fortran 90 to
        provide shared memory or distributed memory parallelization.

        You can hand parallelize your code to use DECThreads, Digital's
        implementation of Posix threads, for Digital's shared memory systems.

        You can hand parallelize your code to use message passing, using
        either PVM or MPI, both of which are available for Alpha.

The intention of the HPF Forum is to come to consensus on extensions to Fortran
for high performance technical and scientific programming.  The key is the idea
of consensus; the objective is to raise the level of language commonality
between vendors and not to solve all possible programming problems. HPF has
been successful - a number of language features from HPF 1.1 are being
incorporated into the ISO and ANSI standard definition of Fortran.

We welcome comments by e-mail, fax, reader comment card, or letter about
Digital Fortran. Such comments from users help us identify areas for improvement.
Fortran 90 and HPF are powerful new tools - it takes time for users to
understand the new and powerful capabilities available to them, and it takes
time for compiler developers to implement these capabilities both correctly and
efficiently.

		The HPF Team at Digital Equipment Corporation
		          fortran@digital.com