From owner-hpff-core  Wed Jan  3 11:17:46 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id LAA14181 for hpff-core-out; Wed, 3 Jan 1996 11:17:46 -0600 (CST)
Received: from [128.42.1.241] (tanete.cs.rice.edu [128.42.1.241]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id LAA14176; Wed, 3 Jan 1996 11:17:41 -0600 (CST)
Date: Wed, 3 Jan 1996 11:17:41 -0600 (CST)
X-Sender: tlc@cs.rice.edu
Message-Id: <v02110116ad0f189aa24b@[128.42.1.241]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Priority: 1 (Highest)
To: hpff-core
From: tlc@rice.edu (Theresa Chatman)
Subject: hpff-core: PLEASE READ - details for the January 9-12 HPFF meeting 
Cc: tlc, sfink@cs.ucsd.edu, nnemer@mcs213k.cs.umr.edu, hatchell@cs.yale.edu,
        butler@rice.edu
Sender: owner-hpff-core
Precedence: bulk

---------------------------------------------------------------------------
hpff-core@cs.rice.edu is a mailing list for announcements related to High
Performance Fortran.  Instructions for adding or deleting yourself
from this list appear at the bottom of this message.
---------------------------------------------------------------------------
To the HPFF core group:

The next HPFF meeting will be held on January 9-12, 1996.  The meeting will
be held at the Wyndham Greenspoint Hotel, 12400 Greenspoint Drive, Houston,
TX, 77060-1998, 713-875-2222, 713-875-1652 fax.

The registration fee for this meeting will be $250, and will be collected
on site.  Please make your checks and POs payable to Rice University (no
credit cards or cash, please).  Since this notice is late, if your
institution requires you to submit check requests in advance, please let
me know; I am willing to work with you on having the check mailed to me.

If you have documents that you would like to have distributed at the
meeting, please send them to me no later than 4pm CST on Friday, January 5.
Please send the file in PostScript to tlc@rice.edu AND
crpc.tr@cs.rice.edu.

In order to adequately prepare for the meeting, I will need to know if you
plan to attend, so please RSVP to tlc@rice.edu.  If you are a vegetarian or
have any other special needs, I will need to know this as well.

At this point, we only have 12 reservations under our room block.  They
are:  Boland, Choudhary, Fink (Baden's student), Mehrotra, Meltzer, Munroe,
Offner, Saltz, Schreiber, Schuster, Williamson, and Zosel.  If you plan to
attend and your name is not on this list, please let me know ASAP (along
with the name of the hotel in which you are staying, if you are not going
to stay at the Wyndham).

If you are still in need of a reservation, call 713-875-2222 no later than
midnight on Thursday, January 4, 1996.  Be sure to refer to the HPFF
meeting (this is important - we have to meet a certain quota to avoid
meeting room rental fees).  Our sleeping room rate is $99 for single and
double occupancy.  The hotel sleeping rooms are equipped with dataport
lines, so you will be able to use your portable computer from your room.

The Wyndham Greenspoint Hotel offers complimentary shuttle service to and
from Intercontinental Airport.  On arrival, you can call for a pickup
from the Wyndham courtesy phone in the phone bank of the baggage claim
area.  The shuttle runs from 6am to 11pm daily.  When departing the
hotel for the airport, please note that the shuttle leaves the hotel
every 30 minutes.

The meeting schedule is as follows:

January 9th through January 11th

8am
Continental breakfast

8:30am - 5pm
One U-shaped room for 40 people
Two breakout rooms for 20 people each

Lunch and morning and afternoon breaks will be provided.  Dinner is at your own expense.


January 12th
8am
Continental breakfast

8:30am - noon
One U-shaped room for 40 people
Two breakout rooms for 20 people each

A morning break will be provided.


If you have any questions regarding the logistics for the HPFF meeting,
please let me know.  If you have any questions regarding technical issues,
please direct them to Mary Zosel (zosel@llnl.gov).

Also, for your information, our contact person at the Wyndham Greenspoint
Hotel is Michael Sanders.

Sincerely,
Theresa Chatman
Manager of Outreach Programs, CRPC
Rice University
MS-41
6100 South Main Street
Houston, TX  77005
713-285-5180 Phone
713-285-5136 Fax
tlc@rice.edu


---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-core-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-core  Wed Jan  3 11:41:30 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id LAA15237 for hpff-core-out; Wed, 3 Jan 1996 11:41:30 -0600 (CST)
Received: from [128.42.5.167] (pasyn-39.rice.edu [128.42.5.167]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id LAA15231; Wed, 3 Jan 1996 11:41:20 -0600 (CST)
Date: Wed, 3 Jan 1996 11:41:20 -0600 (CST)
X-Sender: chk@titan.cs.rice.edu
Message-Id: <v01530506ad101a762b7d@[128.42.5.167]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Priority: 1 (Highest)
To: hpff-core@cs.rice.edu
From: chk@cs.rice.edu (Chuck Koelbel)
Subject: hpff-core: Re: PLEASE READ - details for the January 9-12 HPFF meeting
Cc: tlc@cs.rice.edu, sfink@cs.ucsd.edu, nnemer@mcs213k.cs.umr.edu,
        hatchell@cs.yale.edu, butler@rice.edu
Sender: owner-hpff-core
Precedence: bulk

An addition to Theresa's information, applicable to US academics -

Due to the extra costs of this meeting (e.g., extra hotel nights), we
are increasing the reimbursement amount for the January meeting.  The new
amount is $1000, for this meeting only.  If you (or your advisor, or your
student) have been supported for either of the last two meetings, we will
support you this time as well.  Please confirm to me (chk@cs.rice.edu) and
Theresa (tlc@cs.rice.edu) that you are coming so we can be sure to set up
the accounting.

                                                Chuck



From owner-hpff-core  Thu Jan  4 17:26:10 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id RAA11419 for hpff-core-out; Thu, 4 Jan 1996 17:26:10 -0600 (CST)
Received: from coral.llnl.gov (coral.llnl.gov [128.115.235.21]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id RAA11412 for <hpff-core@cs.rice.edu>; Thu, 4 Jan 1996 17:26:07 -0600 (CST)
Message-Id: <199601042326.RAA11412@cs.rice.edu>
Received: by coral.llnl.gov
	(1.38.110.45/16.2) id AA056017968; Thu, 4 Jan 1996 15:26:09 -0800
Date: Thu, 4 Jan 1996 15:26:09 -0800
From: Mary E Zosel <zosel@coral.llnl.gov>
To: hpff-core@cs.rice.edu
Subject: hpff-core: clean proposal texts
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-core
Precedence: bulk

HPFF-Core Group
I would like to ask any of you who are authors of proposals -
especially those that have passed a second reading - to have
a "clean copy" (i.e., with all updates incorporated) of the full
proposal, as approved, available for the HPFF meeting next week.  It
would be useful to have both paper and electronic forms.  We will be
trying to get some draft document work done, and this will be greatly
facilitated by having the complete proposal text.

There has also been a request that we make these available
on the web for review and comment - we will talk about this.

   -mary-


From owner-hpff-core  Fri Jan  5 07:55:58 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id HAA22627 for hpff-core-out; Fri, 5 Jan 1996 07:55:58 -0600 (CST)
Received: from theory.tc.cornell.edu (THEORY.TC.CORNELL.EDU [132.236.98.174]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id HAA22622 for <hpff-core@cs.rice.edu>; Fri, 5 Jan 1996 07:55:55 -0600 (CST)
Received: (from presberg@localhost) by theory.tc.cornell.edu (8.6.9/8.6.6) id IAA56864; Fri, 5 Jan 1996 08:55:51 -0500
Date: Fri, 5 Jan 1996 08:55:51 -0500
From: David Presberg <presberg@tc.cornell.edu>
Message-Id: <199601051355.IAA56864@theory.tc.cornell.edu>
To: Mary E Zosel <zosel@coral.llnl.gov>
Subject: re: hpff-core: clean proposal texts
In-Reply-To: <199601042326.RAA11412@cs.rice.edu>
References: <199601042326.RAA11412@cs.rice.edu>
cc: presberg@tc.cornell.edu, hpff-core@cs.rice.edu, bergmark@tc.cornell.edu
Sender: owner-hpff-core
Precedence: bulk

RE:
 > There has also been a request that we make these [clean proposal docs]
 > available on the web for review and comment ...

I "second the motion"!!

-- Pres

From owner-hpff-core  Fri Jan  5 12:44:15 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id MAA00378 for hpff-core-out; Fri, 5 Jan 1996 12:44:15 -0600 (CST)
Received: from [128.42.5.165] (pasyn-37.rice.edu [128.42.5.165]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id MAA00345 for <hpff-core>; Fri, 5 Jan 1996 12:43:30 -0600 (CST)
Date: Fri, 5 Jan 1996 12:43:30 -0600 (CST)
X-Sender: chk@titan.cs.rice.edu
Message-Id: <v01530505ad12c14460a0@[128.42.5.165]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: hpff-core
From: John Merlin <jhm@ecs.soton.ac.uk> (by way of chk@cs.rice.edu (Chuck Koelbel))
Subject: hpff-core: Thoughts on HPF2 and 2000
Sender: owner-hpff-core
Precedence: bulk


Hi,

I recently learnt of the HPF2.0 and HPF2000 proposals, and
thought I'd present some thoughts on them as input for the
impending discussions.

I'll start with the higher denomination standard first.

HPF2000
-------
My first impression was -- Wow!  To define HPF2000 in 1996,
four years early, is certainly a bold and far-sighted move!
I can see the advantage.  It's nice to get things done ahead
of schedule  (at least I imagine so -- I don't speak from personal
experience you understand).

Despite this, I'd like to suggest an alternative idea.  (Sorry to
be a spoilsport.)   That is, don't present this language as a standard
at all, but instead put everything that isn't in HPF2 in one
or more official HPF technical reports as proposals for the
standard to follow HPF2.  This would be in line with the procedure
that I believe has been adopted for Fortran evolution by the ANSI
Fortran committee.  Roughly speaking (Jerry will have the full
picture), the idea is to use official TRs for features that, if
put into F96, would unduly delay its publication and implementation,
but which are felt to be too important to forget altogether until
F2000.  This way vendors have guidelines on useful extensions
to F96, and they can be officially standardised in F2000 with
the benefit of experience.

Similarly for HPF.  Putting the new features in TRs means
that users and the marketplace can test their usefulness,
efficiency, implementability, technical correctness, etc.
The HPFF need be less concerned about getting every detail
perfect, because implementation and use would iron out any
rough edges.  Then, come 2000, they can be incorporated in an
official HPF2000 standard with the benefit of experience, just
as Fortran and C generally standardise existing practice rather
than defining totally new things.  If any of the proposed features
aren't implemented by 2000, then clearly they aren't needed or are
technically flawed, so they shouldn't go in the standard.
To paraphrase a paraphrase of a well-known saying:
"Premature optimisation is the root of all evil".  As for the
name of this language, I suggest something like Extended
or Extra HPF, or HPF+, etc. -- don't put a date on it.

The argument against this approach is that it's nice to put
everything in a standard so that users can rely on it all being
universally available.  However, the name HPF2000 is an open
admission that this won't be the case for another four years.
Given that, the TR approach seems sensible.

HPF2.0
------
I can understand the desire to make the next full HPF standard
some sort of subset of HPF1.1, so that users have something they
can rely on being implemented and universally available.

However, I was alarmed by the suggestion that the official 'Subset
HPF' would somehow cease to exist.  I don't see how you can
undefine a published standard, even an unofficial one, and shifting
the goalposts like this is unfair on users and implementors alike,
and damages credibility.  My chagrin was heightened by
the fact that we've recently announced a public domain implementation
of Subset HPF (see the advert at the end of this message!), and
were proud of the fact that we've implemented the data mapping
features in virtually their full generality.  I didn't expect
all the interesting bits to be eliminated from the next HPF
standard, given that this is the core and most important part of HPF.

Given that HPF2 will effectively be the real standard for
some years to come, and that users will realistically have to restrict
themselves to what's in it, I believe it shouldn't be too spartan
and should include at least all of the alignment and distribution
features in Subset HPF.  This is also reasonable given that
Subset HPF is what most users and implementors have been
targeting so far.  I'll now try to give an even more cogent
argument for this.

Firstly, I don't think the proposed criterion for including
features in HPF2 (that they "give high performance on all
platforms") is a good one.  It seems like a cop-out by vendors.
A better criterion is to include the minimal set of features
necessary to allow programs to execute with the highest
(reasonably achievable) performance.  E.g., although CYCLIC (N)
distributions may be slower than BLOCK in a direct comparison,
there will be applications for which the use of CYCLIC (N) will
give better performance than being forced to use a simple BLOCK
distribution.  The HPF philosophy is that the user has to
experiment to get optimum efficiency;  I doubt that the user will
use relatively inefficient features like strided alignment
unnecessarily, so I don't see why we should protect him/her from it.
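A small model of the BLOCK versus CYCLIC (N) point: the Python sketch below computes the owning processor of each element of a 1-based array under each distribution, using the usual owner formulas with processors numbered from 0 (that numbering, the array size, and the helper names are illustrative assumptions, not HPF library routines). It then counts how many processors still hold part of a shrinking trailing region, which is roughly what happens to the active columns in Gaussian elimination.

```python
import math


def block_owner(i, n, p):
    """0-based processor owning element i (1-based) of an n-element array
    distributed BLOCK over p processors (block size ceil(n/p))."""
    return (i - 1) // math.ceil(n / p)


def cyclic_owner(i, k, p):
    """0-based processor owning element i (1-based) under CYCLIC(k)."""
    return ((i - 1) // k) % p


def busy_processors(owner, lo, n):
    """Number of distinct processors holding elements lo..n, e.g. the
    columns still active at step lo of Gaussian elimination."""
    return len({owner(i) for i in range(lo, n + 1)})


N, P, K = 16, 4, 2  # hypothetical array size, processor count, CYCLIC block


def blk(i):
    return block_owner(i, N, P)


def cyc(i):
    return cyclic_owner(i, K, P)


print([blk(i) for i in range(1, N + 1)])
print([cyc(i) for i in range(1, N + 1)])
for lo in (1, 5, 9, 13):
    # start of the active region, processors busy under BLOCK, under CYCLIC(2)
    print(lo, busy_processors(blk, lo, N), busy_processors(cyc, lo, N))
```

As the active region shrinks, BLOCK idles processors sooner (4, 3, 2, 1 busy) than CYCLIC(2) (4, 4, 4, 2). Whether that outweighs CYCLIC's extra overhead still depends on the application and machine, which is the experimentation the paragraph above describes.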

On the basis of this argument, and the fact that these features
have already been widely implemented and so obviously aren't too hard
to provide, I'd like to argue for the re-introduction of
general alignments and distributions.  Some justifications:

-- Alignment offsets and strides are needed to describe the
distribution of an array section when it is passed as an actual
argument into a procedure.  In this way no data movement is
required, and the mapping is still explicitly specified inside
the procedure.  Without this facility the argument would
either have to be remapped on entry and exit, so that it has a simple
mapping that can be described without alignment, or the mapping
would have to be inherited (which is probably also disallowed,
and is certainly less efficient than explicitly specifying the
mapping).  I gave an example of this for Gauss elimination in a
Sci. Prog.  paper about HPF a while back.

-- Alignment axis permutation is needed to efficiently implement
references like:
   FORALL (i=...,j=...) a (i,j) = b (j,i)

Anyway, I don't understand what complication or inefficiency this
facility could introduce.  Can someone explain?

-- Embedded alignment, e.g.:
     ALIGN s WITH a(m,n)
is needed to allow fine-scale control of where data is stored
and thus where computations are performed, and can avoid
unnecessary communications.  For example, in Gaussian elimination
with pivoting, intermediate values of the pivot element and value
can be stored and computed just on the processor owning
the column in question, and the final values can then be
broadcast by assigning them to globally replicated scalars.
If one could not embed the variables holding the intermediate
values on a particular processor, they would be globally broadcast
at each step of the pivot search, with massive overhead.  Using
an 'ON HOME' directive doesn't solve this problem;  though the
computation can be localised, the intermediate results are still
broadcast if assigned to a globally replicated variable, as they
must be to preserve HPF semantics.  (I got this example from a
paper I reviewed, which found that the embedding solution gave
immensely better performance.)

-- '*'s in the alignment are needed to specify partial replication,
e.g. replicating a vector over dimension 1 of a 2d processor array
and distributing it over dim 2.  Surely there is no debate over
the usefulness of this.

Thus, I argue for keeping Subset HPF's general alignment and
distribution features in HPF2.
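Two of the alignment cases above can be checked with a toy ownership model. The Python sketch below assumes an 8 x 8 array a distributed (BLOCK, BLOCK) onto a 2 x 2 processor grid (grid dimensions numbered 0 and 1 here, corresponding to Merlin's dimensions 1 and 2). It counts how many references in FORALL (i,j) a(i,j) = b(j,i) are off-processor when b is aligned identically with a versus by axis permutation (roughly ALIGN b(i,j) WITH a(j,i) in HPF terms). It also lists which processors hold an element of a vector aligned as (*, BLOCK) (roughly ALIGN v(k) WITH a(*,k)). The sizes and helper names are hypothetical.

```python
import math

N = 8          # a, b are N x N and v has N elements (1-based indices)
GRID = (2, 2)  # processor grid shape; grid dims are numbered 0 and 1


def block_coord(i, n, p):
    """0-based grid coordinate owning 1-based index i under BLOCK."""
    return (i - 1) // math.ceil(n / p)


def owner_a(i, j):
    """a is distributed (BLOCK, BLOCK) onto the grid."""
    return (block_coord(i, N, GRID[0]), block_coord(j, N, GRID[1]))


def owner_b_identity(i, j):
    """b aligned identically: b(i,j) lives with a(i,j)."""
    return owner_a(i, j)


def owner_b_transposed(i, j):
    """b aligned by axis permutation: b(i,j) lives with a(j,i)."""
    return owner_a(j, i)


def remote_refs(owner_b):
    """Count FORALL (i,j) a(i,j) = b(j,i) references that are off-processor."""
    return sum(
        owner_a(i, j) != owner_b(j, i)
        for i in range(1, N + 1)
        for j in range(1, N + 1)
    )


def owners_replicated(k):
    """Processors holding v(k): replicated over grid dim 0,
    BLOCK-distributed over grid dim 1."""
    col = block_coord(k, N, GRID[1])
    return {(r, col) for r in range(GRID[0])}


print(remote_refs(owner_b_identity), remote_refs(owner_b_transposed))
print(sorted(owners_replicated(1)), sorted(owners_replicated(N)))
```

With identity alignment, half of the 64 transposed references are remote; with the permuted alignment, none are. Each element of the partially replicated vector is held by both processors in one grid column.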

I have just two other comments on features for inclusion in
HPF2:

-- Why is it necessary for HPF2's base language to be full F90
(or F96), rather than the F90 subset selected for Subset HPF?
True, I expect most large vendors will wish to provide full F96
along with HPF, but they could do this anyway
even if HPF2 only required Subset F90.  (Perhaps they could
indicate that their HPF implementation supports full F96 by
calling it HPF96 rather than just HPF2, for instance.)

The problem I perceive with requiring full F96 as a basis
is that it tends to prevent small-scale research and academic
efforts, which may well provide public domain implementations
(e.g. shpf and ADAPTOR), from legitimately calling
themselves HPF.  I think public domain implementations can
greatly help to prime the market for HPF, by providing free
development and testing systems to get people interested, and
for education purposes, etc.  Thus I think the HPFF should
support and facilitate such efforts rather than putting big
hurdles in the way.

Also, some vendors may wish to expend more effort on the HPF
features important for performance (like general data mapping
support!) rather than nice but relatively unimportant F90
features.  Finally, F96 does introduce some extra complications
for HPF, like pointers, arrays-of-arrays, etc.  I'm not
convinced these have been well sorted-out, nor do I think they
are trivial.

-- I agree with the removal of sequence and storage association,
and the SEQUENCE directive.  This is one area where Subset HPF
provided features that were guaranteed to give low performance,
and were extremely hard to implement in full generality.

Also, of course, I support the introduction of PURE procedures
to HPF2!

-- Finally, since we seem to be in the business of tweaking and
correcting the last HPF standard, I'd like to plead again
for common blocks to be disallowed inside pure procedures.
(Please, don't sigh so loud Chuck! :-)).  In fact they only
cause problems if they contain distributed data (much like
sequence and storage association), so could be allowed
otherwise, but for regularity it's probably preferable to
disallow them altogether, in HPF2 at least.

Without common blocks, pure procedures are trivial to implement,
and very useful for getting high efficiency in many
circumstances as they permit functional parallelism in FORALLs
(or indeed in array syntax, using F96 ELEMENTAL procedures).
However, by allowing them to access external data that may be
stored on a different processor to that executing the pure
procedure, they violate one of the rules of Kernel HPF, namely,
that features should "allow high performance across all platforms",
because they become extremely hard (or perhaps virtually
impossible) to implement on distributed memory architectures,
which is still an important architectural class for HPF
(e.g. workstation networks).  Has anybody actually implemented
pure procedures in their full generality, with global data
accesses allowed, on any DM platforms yet?

Furthermore, I notice that the desire for 'purer than pure'
procedures has been voiced in several contexts at recent HPF
meetings.  Perhaps this is the time to bite the bullet and add
this constraint.

That's all for now, you'll be pleased to know.

Wishing you all a Happy and High Performance New Year,
                               John.


P.S.  And now for that advert I promised.  'shpf', a Subset
High Performance Fortran compilation system, is now freely
available by anonymous ftp from:

   ftp.hpcc.soton.ac.uk  in directory  /pub/packages/shpf.

It supports arbitrary alignments and distributions, in any
number of dimensions, in virtually any context (for what
it's worth!).  Full information is (or shortly will be)
available on the WWW at URL:
   http://www.hpcc.ecs.soton.ac.uk/shpf/shpf.html.

Please spread the good news to your friends and colleagues,
and enjoy it!
-----------------------------------------------------------
John Merlin,                                 jhm@ecs.soton.ac.uk
Dept of Electronics and Computer Science,    tel:  +44 1703 593943
University of Southampton,                   fax:  +44 1703 593045
Southampton, SO17 1BJ, U.K.
-----------------------------------------------------------------



From owner-hpff-core  Fri Jan  5 15:17:35 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id PAA05833 for hpff-core-out; Fri, 5 Jan 1996 15:17:35 -0600 (CST)
Received: from coral.llnl.gov (coral.llnl.gov [128.115.235.21]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id PAA05753 for <hpff-core@cs.rice.edu>; Fri, 5 Jan 1996 15:17:27 -0600 (CST)
Message-Id: <199601052117.PAA05753@cs.rice.edu>
Received: by coral.llnl.gov
	(1.38.110.45/16.2) id AA284716649; Fri, 5 Jan 1996 13:17:29 -0800
Date: Fri, 5 Jan 1996 13:17:29 -0800
From: Mary E Zosel <zosel@coral.llnl.gov>
To: hpff-core@cs.rice.edu
Subject: hpff-core: active cci
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-core
Precedence: bulk

TO HPFF-CORE
Here is a text copy of the active CCIs for next week's meeting. They are sorted
by group, then by item #. I think there are 9 total.  Some are essentially already
answered and just need verification that the answer is complete.
Almost all are for group C.  
    -mary-

_____________________________________________________
_____________________________________________________


CCI #18	Defined assignment in FORALL	Group C	Updated: 10/23/95

Henry Zongaro	05/04/95		

Current reply:
	Maybe Resolved.  CHK will provide clarification in HPF 2 document.   It will be a change to the "sequentialization" of the forall to account for defined assignments.   BUT Henry suggests an alternative based on new X3J3 action.
----------
Original question:

Hello,

     I was wondering whether there's not a problem with allowing defined
assignment to appear within a FORALL.  Consider the following example.

         module mod
           integer :: a(3) = (/1,2,3/)
         contains
           pure subroutine def_assign(lhs, rhs)
             integer, intent(inout) :: lhs
             character, intent(in) :: rhs

             lhs = a(ichar(rhs)+1)
           end subroutine def_assign
         end module mod

         program p
           use mod
           interface assignment(=)
             module procedure def_assign
           end interface

           forall (i = 1:2) a(i) = char(i)    ! A sneaky way of passing "i"
                                              ! to def_assign
         end program p

     The rules of forall specify that the right-hand side and the indices of
the left-hand side are evaluated, in any order, prior to assignment, which also
takes place in any order.  In the above example, we have

          a(1) = char(1)
          a(2) = char(2)

as the two defined assignments which take place.  Inside of def_assign, there's
a host-associated reference to a, so what ends up happening is the following:

          a(1) = a(2)
          a(2) = a(3)

     The order in which these assignments occur affects the result.  The value
of a after the forall statement is executed could be (/2,3,3/) or (/3,3,3/).

     Basically, the problem is that in defined assignment, completely
evaluating the right-hand side for all active combinations does not necessarily
let the compiler precompute everything which might also appear on the left-hand
side.

Thanks,

Henry
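Henry's order dependence can be confirmed by brute force. The sketch below is a Python model, not Fortran. It reduces def_assign to its net effect (the call for a(i) = char(i) performs a(i) = a(i+1), with 1-based indices mapped onto a Python list), replays the two defined assignments in every possible order, and records the final value of a. The helper names are hypothetical.

```python
from itertools import permutations


def defined_assign(a, i):
    """Model of def_assign(a(i), char(i)): lhs = a(ichar(rhs)+1) is
    a(i) = a(i+1) in 1-based terms, i.e. a[i-1] = a[i] in Python."""
    a[i - 1] = a[i]


def run_forall(order):
    """Run the FORALL's defined assignments in one particular order."""
    a = [1, 2, 3]            # initial value of module variable a
    for i in order:
        defined_assign(a, i)
    return tuple(a)


results = {order: run_forall(order) for order in permutations([1, 2])}
print(results)
```

Running a(1) first gives (2, 3, 3); running a(2) first gives (3, 3, 3), matching the two outcomes listed above.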
------------
	DISCUSSION AT MEETING:

y, Henry, Jerry to circulate proposed wording for this definition.

----
No action from the July meeting was recorded ... was this resolved or not?
---------
Rob's recollection:
Resolved.  CHK will provide clarification in HPF 2 document.   It will be a change to the "sequentialization" of the forall to account for defined assignments.
------------
Note from Henry

    X3J3 is taking a different approach on CCI 18.  They're trying to prohibit
 references in the procedure that defines the assignment to the variable that
 appears on the left-hand side of the defined assignment.  I believe WG5 will
 decide on this in their November meeting.

     We might want to pick up whatever they decide.  One advantage of the
restriction is that no extra compiler mechanisms are required for this somewhat
obscure case.
_____________________________________________________
_____________________________________________________


CCI #36	Independent and remapping	Group C
Larry Meadows	10/7/95	status: new	

In Section 4.4 of the standard, one of the conditions on INDEPENDENT is
that realignment and redistribution cannot occur, since they may change
the processor storing a particular array element.

I would argue that the same reasoning applies to remapping of arguments
in subroutines called inside of INDEPENDENT DO loops. For example:

!hpf$ distribute a(block)
!hpf$ independent
        do i = 1,n
            call sub(a)
        enddo

        subroutine sub(a)
        real a(:)
!hpf$ distribute a(cyclic)
        ...
        return
        end


From an implementation point of view, remapping of arguments is a
collective operation, just as realignment and redistribution are, and so
is difficult to implement inside INDEPENDENT DO loops.

Couple of other points:

- Would be nice to have some examples in 4.4.1 where subroutines were
  called, legally and illegally.

(see CCI #37 for the other point)


Thanks.

lfm

_____________________________________________________
_____________________________________________________

CCI #37	status of locals in procedures called inside Independent DO
Group C	
Larry Meadows	10/7/95	status: new	

----------original question---------
(continued item from #36 - but separate question)

Couple of other points:

- I seem to recall that we decided that local variables of subroutines
  called inside INDEPENDENT DO loops were automatically NEW (and I
  assume that this doesn't apply to SAVE or COMMON variables). Is this
  documented anywhere?


Thanks.

lfm

_____________________________________________________
_____________________________________________________


CCI #38	Library Coordinates	Group C	
Henry Zongaro	10/30/95	status: new	

---------- original question -------------
     A number of the procedures in the HPF_LOCAL_LIBRARY refer to their
arguments as containing or receiving "coordinates".  It's not clear whether the
values returned or required are supposed to be based on the corresponding lower
bounds of the array (or processor as the case may be) or whether they should be
one-based.
     For example,
           PROGRAM P
             INTERFACE
               EXTRINSIC(HPF_LOCAL) SUBROUTINE SUB(D)
                 INTEGER D(4:)
     !HPF$       PROCESSORS PROC(NUMBER_OF_PROCESSORS())
     !HPF$       DISTRIBUTE D(BLOCK) ONTO PROC
               END SUBROUTINE SUB
             END INTERFACE
             INTEGER A(5:10)

             CALL SUB(A)
           END PROGRAM P

           EXTRINSIC(HPF_LOCAL) SUBROUTINE SUB(D)
             USE HPF_LOCAL_LIBRARY
             INTEGER D(4:), L(1)

             CALL GLOBAL_TO_LOCAL(D, G_INDEX=(/5/), L_INDEX=L)
           END SUBROUTINE SUB
Assuming this program is run on one processor, what should be the value of L
after the call to GLOBAL_TO_LOCAL?  If G_INDEX and L_INDEX are indices that
are based on the actual lower bounds of A and D, respectively, then
G_INDEX=(/5/) would refer to the first element of A, and the corresponding
element of D would be D(4), and so L_INDEX should be (/4/).  If G_INDEX and
L_INDEX are based on one, then G_INDEX=(/5/) would refer to the fifth element
of A (i.e., A(9)), and the corresponding element of D would be D(8), which is
the fifth element of D, and so L_INDEX should be (/5/).

     This shows up in ABSTRACT_TO_PHYSICAL and PHYSICAL_TO_ABSTRACT (the INDEX
argument), LOCAL_TO_GLOBAL and GLOBAL_TO_LOCAL (the G_INDEX and L_INDEX
arguments) and (I think) the results of LOCAL_LINDEX and LOCAL_UINDEX.

     One thing to consider is that the results of GRADE_UP and GRADE_DOWN seem
to be values that can be used to index the array that's been sorted, but the
results of MAXLOC and MINLOC are one-based, so there's a precedent for each.
(Were the results of GRADE_UP and GRADE_DOWN intentionally designed to be
different from those of MAXLOC and MINLOC?)

     It should also be noted that the INDEX argument of ABSTRACT_TO_PHYSICAL
and PHYSICAL_TO_ABSTRACT gives indices in a processor arrangement, but there's
no direct way of finding out the bounds of such a processor arrangement -
HPF_DISTRIBUTION and GLOBAL_DISTRIBUTION will only give you the *shape*.
Thanks,
Henry
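The two readings of "coordinates" can be written down as index arithmetic. The sketch below is a Python model only. It is valid for the single-processor case in the example, where the local array is all of D, and the function names are hypothetical; they do not correspond to HPF_LOCAL_LIBRARY routines.

```python
def interp_bounds_based(g, a_lb, d_lb):
    """Reading 1: G_INDEX and L_INDEX are actual subscripts, relative to
    the declared lower bounds of A (a_lb) and D (d_lb)."""
    a_elem = g                        # G_INDEX names A(g) directly
    d_elem = d_lb + (a_elem - a_lb)   # same position within dummy D
    return a_elem, d_elem, d_elem     # L_INDEX is D's own subscript


def interp_one_based(g, a_lb, d_lb):
    """Reading 2: G_INDEX and L_INDEX are one-based positions."""
    a_elem = a_lb + g - 1             # the g-th element of A
    d_elem = d_lb + g - 1             # the g-th element of D
    return a_elem, d_elem, g          # one processor: local position is g


A_LB, D_LB = 5, 4                     # INTEGER A(5:10); INTEGER D(4:)
for reading in (interp_bounds_based, interp_one_based):
    a_elem, d_elem, l_index = reading(5, A_LB, D_LB)   # G_INDEX=(/5/)
    print(f"{reading.__name__}: A({a_elem}) -> D({d_elem}), L_INDEX=(/{l_index}/)")
```

The first reading maps G_INDEX=(/5/) to A(5) and D(4), giving L_INDEX=(/4/). The second maps it to A(9) and D(8), giving L_INDEX=(/5/), which reproduces the two answers in the question.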


_____________________________________________________
_____________________________________________________


CCI #40	loop index and new	Group C	
Henry Zongaro	11/30/95	status: new	

--------original question -------
Hello,

     One of my coworkers has the following question:

     May the index variable of a DO-loop appear in the NEW clause of an
INDEPENDENT directive on that DO?  For example,

         !HPF$ INDEPENDENT, NEW(I)
               DO I = 1, 10
               END DO

If so, what does it mean?  Does it simply mean that the programmer is
guaranteeing that the value of I is not used after the DO-loop?
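The question arises because Fortran 90 leaves the DO variable defined after the loop terminates. It keeps its last value, which is one increment past the final iteration, or the start value if the body never runs. The sketch below uses hypothetical module and function names and returns that value for an empty loop like the one above:

```fortran
module do_index_demo
  implicit none
contains
  ! Executes an empty DO loop over FIRST..LAST and returns the value the
  ! index variable holds once the loop has terminated.
  integer function index_after_loop(first, last)
    integer, intent(in) :: first, last
    integer :: i
    do i = first, last
    end do
    index_after_loop = i
  end function index_after_loop
end module do_index_demo
```

INDEX_AFTER_LOOP(1, 10) gives 11, and INDEX_AFTER_LOOP(5, 4) gives 5. On the reading suggested in the question, NEW(I) would amount to a promise that this final value is never used.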

Thanks,

Henry

_____________________________________________________
_____________________________________________________


CCI #41	interaction of distribute and forall	Group C	
Norton	12/21/95	status: new

--------- original question ---------
Given a data layout:

      real, dimension(25) :: a,b,c
!HPF$ DISTRIBUTE a (*)
!HPF$ DISTRIBUTE b (*)
!HPF$ DISTRIBUTE c (*)


Is the following legal syntax and could an implementor choose to
compute this in parallel even though the arrays have been declared
"serial"?

!HPF$ INDEPENDENT
      do i=1,25
         a(i) = b(i) + c(i)
      enddo

[The motivation for the inquiry - it would be nice if the INDEPENDENT
directive would be honored over the DISTRIBUTE directive to
parallelize execution on an SMP type system if the cost of data motion
on the bus is cheap, but the same code would honor the DISTRIBUTE
directive over the INDEPENDENT directive to run code serially on a
distributed memory system where the cost of data motion over the
network may be expensive.  Realizing that that is a bit much to ask of
implementors, it would be nice to know if there are any thoughts on
a defined behavior for this combination of directives.]

thanks
-dave

-------- DISCUSSION ---------
Chuck has already sent an answer; the group should verify it and decide whether any further answer or action is implied.
============= chuck's reply ===========

The quick answer:

Yes, the syntax is legal.
Yes, this is a correct use of INDEPENDENT.
Yes, a compiler could implement INDEPENDENT as you suggest.
No, it is not written in stone that the compiler must implement INDEPENDENT
this way.

A bit of philosophy:

INDEPENDENT makes a statement about side effects of loop iterations (i.e.
it states that there aren't any side effects on other iterations).  There
is language in the specification to the effect that "The compiler can use
this information in any way that it sees fit."  With that in mind, I hope
it's clear that INDEPENDENT and DISTRIBUTE are basically orthogonal
constructs; you can mix and match them as needed.

The implementation you suggest is (IMHO) an excellent compiler heuristic
for handling INDEPENDENT.  I hope that SMP vendors are listening to you.
However, HPF bends over backwards to keep from requiring any particular
implementation.  (Similarly, you won't find many implementation hints in
the F90 standard, or most other language definitions.)  In part, this is
because HPF is supposed to run on many platforms which may need different
implementations.  In part, this is because we don't want HPF to be locked
into an old implementation technique if new research comes along.  And, of
course, it's hard enough to get agreement on a language without debating
"should this be implemented the APR way or the PGI way?"

So, HPF doesn't define a "behavior" for INDEPENDENT, in the sense that it
is legal for a given compiler to ignore the directive (in some cases, or in
all cases).  Or the compiler can use the INDEPENDENT information in ways
you haven't thought of (for example, INDEPENDENT implies you can turn off
cache coherency protocols for data assigned in the loop, if you turn
coherence back on at the right time).  I argue this is what you want for
portable programs, where each platform can treat INDEPENDENT the way that
it needs to.

If you really want/need to define a program's behavior, you may want to
look at the ON HOME directive being discussed in HPF 2.  (I promise, the
revised proposal is coming!)

                                                Chuck

_____________________________________________________
_____________________________________________________


CCI #42	table look-up	Group C	
Norton	1/4/96	status: new	
----------original questions ----------
I'm having a bit of trouble understanding how to do table lookups. Using some information from The Book, the only thing I have been able to come up with is to go to local space to do the actual lookup (under the requirement of only one table/processor):

      program foo
      real, dimension(137,NUMBER_OF_PROCESSORS()) :: table
      real, dimension(1024,1024) :: AA, BB

!HPF$ DISTRIBUTE (*,BLOCK(1)) :: table
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: AA, BB

      interface
         EXTRINSIC(HPF_LOCAL) subroutine lookup (AA,BB,table)
            real, dimension(:,:), intent(INOUT) :: AA
            real, dimension(:,:), intent(IN) :: BB
            real, dimension(:,:), intent(IN)  :: table
!HPF$       DISTRIBUTE (:,:) :: AA,BB,table
         end subroutine lookup
      end interface

      data table/........./
      call lookup (AA,BB,table)
      stop
      end
      EXTRINSIC(HPF_LOCAL) subroutine lookup (AA,BB,table)

            real, dimension(:,:), intent(INOUT) :: AA
            real, dimension(:,:), intent(IN) :: BB
            real, dimension(:,:), intent(IN)  :: table
!HPF$ DISTRIBUTE (:,:) :: AA,BB,table

      integer i,j,k
!HPF$ INDEPENDENT
      do i=lbound(AA,1),ubound(AA,1)            ! 1,ubound(AA,1)
                                                ! should be enough
        do j=lbound(AA,2), ubound(AA,2)
          do k=lbound(table,2),ubound(table,2)  ! hopefully this == k=1,1
                                                ! so it's just here
                                                ! for fun
             AA(i,j) = AA(i,j) + table(INT(BB(i,j)),k)
          enddo
        enddo
      enddo

      return
      end
If this (or something close to this - I am temporarily without an HPF compiler) works under HPF, great.  However, I would really like to find out if there is a way to do the table lookups - one
table/processor - that will also compile under F90 on a single CPU system.  The above example has an HPF library function (NUMBER_OF_PROCESSORS) and the HPF EXTRINSIC syntax.

What might be a nice solution to code this for both HPF and F90 (if it were legal syntax) would be something like:

      program foo
      integer i,j
      real, dimension(137) :: table

!  I don't think this next statement is legal syntax
!  but it would be nice if it were - treated as a comment under
!  F90, but used under HPF.  Yes - I could use something like the
!  -dlines flag to a compiler or cpp, but that depends on the
!  compiler, rather than the language
!HPF$ real, dimension(137,NUMBER_OF_PROCESSORS()) :: table_align

      real, dimension(1024,1024) :: AA, BB

!HPF$ DISTRIBUTE (*,BLOCK(1)) :: table_align

!HPF$ ALIGN table(i) WITH table_align(i,*)

!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: AA, BB

! sorry - I can't currently check if this would work as intended
!HPF$ INDEPENDENT
      do i=1,1024
        do j=1,1024
          AA(i,j) = AA(i,j) + table(INT(BB(i,j)))
        enddo
      enddo
      ......
The short question then is - have I missed some other ways to do table lookups? 

-------- DISCUSSION -----------

ed note --- this is included in cci for completeness --- answers sent are probably sufficient, but the subgroup can verify completeness, and that there are no additional issues.
----------chk replies-------
Reading between a few lines, I think this is what you are trying to do:
1. Do many lookup operations in parallel.
2. Use the same (conceptual) table for all the lookups.
   (i.e. the table values are the same on all the lookups)
3. Store a copy of the conceptual table on all processors.
4. Have only one version of the code.

So, I would suggest using ALIGN to replicate the table, and using a
straight INDEPENDENT loop.  Since the replication happens in ALIGN, the
same code should work under both HPF and F90.  I don't have an HPF compiler
handy either, but the following should do it:

      real, dimension(1024,1024) :: AA, BB
      real, dimension(137) :: table
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: AA, BB
!HPF$ TEMPLATE dummy(NUMBER_OF_PROCESSORS())
!HPF$ ALIGN table(i) WITH dummy(*)
!HPF$ DISTRIBUTE dummy(BLOCK)

!HPF$ INDEPENDENT
      do i=1,1024
        do j=1,1024
          AA(i,j) = AA(i,j) + table( INT(BB(i,j)) )
        enddo
      enddo

I think your second syntax would also work, but this seems a bit simpler.
A good compiler would even do the right thing (i.e. one copy per processor)
if you said

      real, dimension(1024,1024) :: AA, BB
      real, dimension(137) :: table
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: AA, BB
!HPF$ ALIGN table(i) WITH AA(*,*)

but it's possible that some compilers would create one copy per element of
AA, definitely not the right thing to do.
- about the independent loop:
The loop is definitely dependence-free, so it's correct to say INDEPENDENT.
Most compilers ought to notice this on their own, but you never know.
Whether your current compiler actually executes the INDEPENDENT loop in
parallel is a different question - ask your vendor.
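A Fortran 90 compiler treats !HPF$ lines as ordinary comments, so the replicated-table version above can be checked on a single CPU. The sketch below is a hypothetical, scaled-down version. Its module and routine names, the N x N array size and the table contents (TABLE(k) = k) are assumptions for illustration only. INT converts the REAL values of BB into subscripts. The routine returns SUM(AA) so that the result can be checked.

```fortran
module lookup_demo
  implicit none
contains
  ! Hypothetical demo: fills small versions of AA, BB and TABLE, does the
  ! lookups, and returns SUM(AA).  Under plain F90 the !HPF$ lines below
  ! are comments; under HPF they replicate TABLE on every processor.
  subroutine run_lookup(n, total)
    integer, intent(in) :: n
    real, intent(out)   :: total
    real, dimension(n,n) :: aa, bb
    real, dimension(137) :: table
    integer :: i, j, k
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: aa, bb
!HPF$ TEMPLATE dummy(NUMBER_OF_PROCESSORS())
!HPF$ ALIGN table(i) WITH dummy(*)
!HPF$ DISTRIBUTE dummy(BLOCK)

    table = (/ (real(k), k = 1, 137) /)        ! assumed contents: table(k) = k
    aa = 0.0
    do j = 1, n
      do i = 1, n
        bb(i,j) = real(mod(i + j - 1, 137) + 1)  ! whole numbers in 1..137
      end do
    end do

!HPF$ INDEPENDENT
    do i = 1, n
      do j = 1, n
        aa(i,j) = aa(i,j) + table(int(bb(i,j)))
      end do
    end do
    total = sum(aa)
  end subroutine run_lookup
end module lookup_demo
```

With N = 4, each AA(i,j) ends up equal to i+j, so CALL RUN_LOOKUP(4, TOTAL) should leave TOTAL = 80.0 whether or not the directives are honored.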
--
   This is really two questions:
1. Are there other ways to do the lookups in parallel?
2. Are there other ways to copy the lookup tables?

In answer to 1., you could get the same effect with INDEPENDENT DO (see
above) and FORALL (with or without an INDEPENDENT).  This might or might
not make a difference in how your compiler handles the program.  The
EXTRINSIC routine you started with works too (assuming all the details are
right, I didn't check), but it seems like overkill: the lookups are
data-parallel, so you'd expect that HPF can handle them well without
resorting to extrinsics.

In answer to 2., ALIGN is the basic way to get replication on all
processors.  Adding a NUMBER_OF_PROCESSORS dimension as you did works, but
risks confusing the compiler (it has to realize which dimension is local).

                                                Chuck
======
adam marshall also replies ... 
Surely you could use a couple of cpp  "#defines" for the F90 version; you
could #define NUMBER_OF_PROCESSORS to be "1" and EXTRINSIC(HPF_LOCAL) to be
a blank space (or no space at all).

Adam Marshall

_____________________________________________________
_____________________________________________________


CCI #29	calling hpf_local from independent loop	Group C & E	10/23/95
Rob Schreiber	8/3/95	status: in progress
-------- original question------------
I have two CCI questions.
Question I.  Can an extrinsic(hpf_local) be invoked in an independent loop?
In a Forall?
     Ex:

Forall (i = 1:10) a(i) = f(i,a(i))

Note that part of the calling sequence, as specified in Ver 1.1, appendix A, is 
"The processors are synchronized.  In other words, all actions that logically precede the call are completed."

It seems clear that when this was written it was tacitly assumed that the call did not occur in an independent loop or forall.

Part 2:  May any other kind of extrinsic be called in a forall or independent loop?
================
Discussion begins here:     Summary provided by Rob S. 

Status:   Under discussion.   Summary of the discussion in September:

1.  Only an HPF-type routine can be PURE.  Thus, only an HPF_LOCAL
routine could be both local and PURE, and hence callable in a FORALL.

2.  There is no semantic problem with this, or with invocation of any
extrinsic routine in an independent loop.

3.  An example
        real x(N, 100000)
        !hpf$ distribute x(*, block)

        !hpf$ independent
        do i = 1, N
           call extrinsic_fn(x(i,:))
        enddo
The independent loop is a mechanism for spawning N threads per processor, each independent of the others;  the load per thread may be variable.  It would
possibly be useful here to NOT synchronize after each call to the extrinsic
routine!   Is there any semantic reason to force the synchronization?

4.  There are some very important issues in the implementation, with possible
language impacts.

Let us assume an MPI based implementation of HPF calling an extrinsic
local routine that uses MPI for communication.  Because of the unity of
purpose and of hotel between HPF and MPI, it is arguably necessary for
HPFF to make this work cleanly and efficiently.

Issue MPI-1:   Does HPF handle MPI_Init and MPI_Finalize, automatically.

Issue MPI-2:   Are any of the MPI routines PURE?

Issue MPI-3:   Thread safety.   In a naive implementation, HPF does a barrier
before and after the call to the extrinsic;  but there is no guarantee that
there are no outstanding, nonreceived messages in the messaging system.  Thus, to be safe, any extrinsic routine should use its own communicator.   To prevent
interference between separate calls to the routine, a new communicator should 
be created for every call.   An obvious way to do this is to call
      MPI_Comm_Dup(MPI_COMM_WORLD, New_comm)
at the beginning of every such extrinsic.    However, an extrinsic
that consumes all its messages would be justified in doing this once, on its
first invocation, and saving the communicator for reuse on later invocations.
   But consider what happens if the extrinsic is called in an independent DO loop
as in the example above, and there is no barrier used.  Now we really need a
separate communicator per thread. On the other hand, a call to MPI_Comm_Dup is a collective call, which synchronizes the processes.
     Perhaps this should be done by the calling HPF routine, so that the
MPI_COMM_WORLD communicator is different on every call.

Issue MPI-4:  If called from the range of an ON_HOME directive, what
set of processors does MPI_COMM_WORLD correspond to?   If it corresponds to the subset executing the ON block, then how can the called routine access 
nonresident data?   Should there be a way to access a communicator that 
corresponds to these executing processors, while MPI_COMM_WORLD 
always corresponds to all of the processors?

Issue MPI-5:  If called from separate ON_HOME blocks in the scope of
a TASK directive, with disjoint processor groups, so that the two
ON blocks may be executed concurrently, what communicators correspond to the two processor groups?   (If, in issue 4 above, the answer is that MPI_COMM_WORLD corresponds to the executing subset of the processors, then the answer here is MPI_COMM_WORLD.)

=========== comments from Jim Cownie =================

> Issue MPI-1:   Does HPF handle MPI_Init and MPI_Finalize, automatically.
I would say that the HPF run-time should have called MPI_Init before
any user code has run, therefore user extrinsic functions which need
MPI can just use it.

This is actually not a big deal, since the user routine can always
use MPI_Initialized() to guard her call to MPI_Init. (Though if this
is done, then the HPF run-time needs to do the same, since MPI_Init
should only be called once.) That's why it's simpler to say that
MPI_Init has already been called before any user HPF code has run.

> Issue MPI-2:   Are any of the MPI routines PURE?
Probably. For instance one could cast reductions as functions which
return the result, and only read the arguments (though why you'd want
to use an MPI reduction extrinsically rather than an HPF one is beyond
me).
 ... after all 5 MPI issues ...
     I would suggest that in all of these cases the HPF run-time should
provide a "current communicator" which includes the set of processes
running the current construct. In some cases this will be
MPI_COMM_WORLD (or a Comm_Dup of COMM_WORLD), in others (ON_HOME, task parallelism, processor subsets) it will represent a subset of the available processes. In MPI, MPI_COMM_WORLD is always available as the set of all processes (until MPI-2 introduces a dynamic process model,
though that shouldn't worry HPF implementations).
     Therefore I think
1) MPI_COMM_WORLD is *always* the set of all processes. (This is the
   current MPI view).
2) If you need subsets then you should create new communicators, and
   provide a way for the user code to access them.

MPI_COMM_WORLD should mean the same thing in a routine called from HPF extrinsic as it did in a "raw" MPI program. The HPF extrinsic MPI
environment should contain additions to the raw MPI environment (new
communicators, maybe pre-defined datatypes giving array distributions,
etc), but should not change the meaning of things in the raw MPI
world. In other words you may need to learn more to work in the HPF
extrinsic environment, but you shouldn't have to unlearn things you
already knew about MPI.
-- Jim
=======
ed. note: I think this was answered, but I didn't get the answer recorded here ...
_____________________________________________________
_____________________________________________________


CCI #39	Mapping Pointer Restrictions	Group D & E	
Larry Meadows	11/21/95	status: new	
---------- questions--------------
I've noticed an inconsistency in the 1.1 document wrt F90 pointers. You'll
recall CCI 6.3, from last year; this resulted in additional constraints
to the 1.1 document, that alignees and distributees could not have the
POINTER attribute (p. 27, line 5, and p. 32, line 5). However, later in the
same chapter, there is some text (p. 38, line 33):

A variable with the POINTER or ALLOCATABLE attribute may appear as an alignee
in an ALIGN directive or as a distributee in a DISTRIBUTE directive.

This is clearly a contradiction.

This is also somewhat related to CCI #11 (what is the SEQUENCE attribute
when associated with a pointer).

I'm assuming the following, in HPF 1.1:

1) Mapping statements can't apply to pointers. Note, therefore, that allocated
   pointers cannot be explicitly mapped.

2) Pointers can point to any object, regardless of their mapping attributes.

For HPF 2.0, there was some restriction on pointers that I've forgotten.
It really seems that a restriction similar to the following would be in
the spirit of Kernel HPF:

        Mapping statements can be applied to pointers. These statements
        assert that any object with which the pointer is associated will
        have the described mapping. Pointers with mapping statements may point
        only to whole arrays, not to subarrays.

        Pointers without mapping statements may point only to objects without
        explicit mappings, and may point to subarrays of those objects.

lfm
-------
ed note:   this has both a cci item for group D and kernel input for group E ... but is primarily a group D item.

End of active CCI items for Jan 96 HPFF meeting.
---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-core-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-core  Tue Jan  9 08:18:14 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id IAA27397 for hpff-core-out; Tue, 9 Jan 1996 08:18:14 -0600 (CST)
Received: from coral.llnl.gov (coral.llnl.gov [128.115.235.21]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id IAA27391 for <hpff-core@cs.rice.edu>; Tue, 9 Jan 1996 08:18:09 -0600 (CST)
Message-Id: <199601091418.IAA27391@cs.rice.edu>
Received: by coral.llnl.gov
	(1.38.110.45/16.2) id AA060847093; Tue, 9 Jan 1996 06:18:13 -0800
Date: Tue, 9 Jan 1996 06:18:13 -0800
From: Mary E Zosel <zosel@coral.llnl.gov>
To: hpff-core@cs.rice.edu
Subject: hpff-core: check address
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-core
Precedence: bulk

Hi ... I want to remind people to use just  zosel@llnl.gov for my
email address --- the old "phoenix" system that I used to use  is
dead - and email is not forwarded.  If you have an old address
squirreled away that has phoenix in it - zap it.
   -mary- 
---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-core-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-core  Mon Jan 15 18:48:19 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id SAA18515 for hpff-core-out; Mon, 15 Jan 1996 18:48:19 -0600 (CST)
Received: from moe.rice.edu (moe.rice.edu [128.42.5.4]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id SAA18498 for <hpff-core@cs.rice.edu>; Mon, 15 Jan 1996 18:48:16 -0600 (CST)
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by moe.rice.edu (8.7.1/8.7.1) with ESMTP id SAA15656 for <hpff-core@cs.rice.edu>; Mon, 15 Jan 1996 18:20:25 -0600 (CST)
Received: from hplpp3.hpl.hp.com by hplms26.hpl.hp.com with ESMTP
	($Revision: 1.36.108.11 $/15.5+ECS 3.3+HPL1.1S) id AA187351623; Mon, 15 Jan 1996 16:20:24 -0800
Received: by hplpp3.hpl.hp.com
	(1.37.109.14/15.5+ECS 3.3+HPL1.1) id AA093161628; Mon, 15 Jan 1996 16:20:28 -0800
Date: Mon, 15 Jan 1996 16:20:28 -0800
From: Rob Schreiber <schreibr@hplpp3.hpl.hp.com>
Message-Id: <199601160020.AA093161628@hplpp3.hpl.hp.com>
To: hpff-core@cs.rice.edu
Subject: hpff-core: new address
Sender: owner-hpff-core
Precedence: bulk

Here is my new address, etc.   Sorry I didn't send this sooner:

___________________________________________________________________

    Rob Schreiber
    HP Labs 3L-5 			phone: 415 857-8156
    Hewlett-Packard Company 		fax:   415 857-8508
    1501 Page Mill Road 		email: schreiber@hpl.hp.com
    Palo Alto, CA 94304-1126
___________________________________________________________________
---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-core-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-core  Wed Jan 17 14:59:59 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id OAA17970 for hpff-core-out; Wed, 17 Jan 1996 14:59:59 -0600 (CST)
Received: from [128.42.1.213] (morpheus.cs.rice.edu [128.42.1.213]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id OAA17964 for <hpff-core>; Wed, 17 Jan 1996 14:59:54 -0600 (CST)
X-Sender: chk@titan.cs.rice.edu
Message-Id: <v01530506ad23103f2423@[128.42.1.213]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 17 Jan 1996 14:59:56 -0600
To: hpff-core
From: chk@cs.rice.edu (Chuck Koelbel)
Subject: hpff-core: New Proposed Chapter structure
Sender: owner-hpff-core
Precedence: bulk

This was discussed Friday, after many people had left.  Most of this is
still open to discussion, certainly until we pick the official Word
Processor of HPFF.  (Maybe we can support the meetings with corporate
sponsorships?  Must investigate...)

Feedback that would be especially welcome:

* After all the course corrections, should Fortran 95 be the base document
for HPF 2.0?  Or should we base on F90?

* Should we revise/expand the HPF Model section?  If so, who will write it?

* Should HPF_LOCAL and HPF_SERIAL be part of the EXTRINSIC chapter, or
stand-alone chapters?

* Do we want a "FORTRAN 77 and HPF" Chapter?  If so, who will write it?

* Are any features clearly in the wrong part?

* Is there a better way to separate Kernel HPF features from HPF 2.0
features?  (My intent is that they appear in separate subsections of the
same chapter.)

                                                Chuck


Proposed Outline for HPF 2.0 Language Specification


Outermost outline level is "Part"
Next is "Chapter"
Then "Section"/"Subsection"/etc.


ZERO. Front Matter
        [note that there is no Roman numeral for "0"]
        Copyright notice [still (c) Rice University]
        Table of Contents
        Acknowledgements
        * HPFF Acknowledgements
        * HPFF 94 Acknowledgements
        * HPFF 2.0 Acknowledgements

I. High Performance Fortran 2.0

   1. Overview
        o Goals and Scope of High Performance Fortran 2.0
             + Goals of HPF 2.0
             + Goals of Kernel HPF
             + Goals of Extended Features
        o Fortran 95
             [*very* brief description of features most important for HPF]
             [Any omitted features (POINTER) are spelled out here]
        o New Features in High Performance Fortran
             + Data Distribution Features
             + Data Parallel Execution Features
             + Intrinsic Functions and Standard Library
             + Extrinsic Procedures
        o Organization of this Document

   2. High Performance Fortran Terms and Concepts
        o Typographic Conventions
             [includes "Rationale", "Advice to XXX", code fragments, etc.]
        o BNF Notation
        o Syntax of Directives
        o Frequently-Used Terms
             + HPF-Conforming
             [Can/should we say "in the kernel" rather than
             "kernel-conforming"?  Might make the kernel sound less like a
             stand-alone language]
             + System Dependent
             [as opposed to the F90 "processor-dependent" term :-)]
             + Mapping Terms
             ["explicitly mapped" and the like appear in several places]
             + Execution Terms
             ["active processor subset" and the like will be needed for
             mapping under an ON clause]
        o The HPF Model
             + Simple Communication Examples
             + Aggregate Communication Examples
             + Interaction of Communication and Parallelism

   3. Data Alignment and Distribution Directives
        o Data Mapping Model
            [Maybe this should be part of the model in Chapter 2?]
        o Syntax of Data Alignment and Distribution Directives
             [This is the top-level syntax, i.e. "ALIGN and DISTRIBUTE are
             kinds of attributes"]
        o DISTRIBUTE Directive
             + General Syntax and Semantics
             + Distribution Patterns in Kernel HPF
             + Additional Distribution Patterns in HPF 2.0
        o ALIGN Directive
             + General Syntax and Semantics
             + Alignment Options in Kernel HPF
             + Additional Alignment Options in HPF 2.0
        o Mapping of Allocatable Arrays
        o PROCESSORS Directive
        o TEMPLATE Directive

   4. Alignment, Distribution, and Procedure Interfaces
        [If any of these features are not in Kernel HPF, then there should
        be subsections for "Kernel HPF" and "HPF 2.0" as above]
        o Implicit Interfaces
             [Are only allowed if the explicit mapping of caller and callee
             match.]
        o Explicit Interfaces
             + Prescriptive Mappings [which now subsume descriptive
             mappings]
             + Transcriptive Mappings
        o INHERIT Directive
        o Storage Association
             + Definition and Examples
             + Storage Association Rules
               [Now only one rule: "Thou shalt not map storage-associated
               arrays"]
             + Storage Association Discussion
             + Sequence Directives
        o Argument Passing and Sequence Association
             + Sequence Association Rules
             + Discussion of Sequence Association
             + Examples of Sequence Association

   5. Data Parallel Statements and Directives
        o The FORALL Statement and Construct
             [Only if there are differences from Fortran 95]
        o Pure Procedures
             [Only if there are differences from Fortran 95]
        o The INDEPENDENT Directive
             + General Syntax and Semantics
               [Including the REDUCTION "external" syntax]
             + INDEPENDENT Options in Kernel HPF
             + INDEPENDENT Options in HPF 2.0
             + Examples of INDEPENDENT
             + Visualization of INDEPENDENT Directives

   6. Intrinsic and Library Procedures
        [I believe that all HPF 1.1 library procedures are in Kernel HPF.]
        o Notation
        o System Inquiry Intrinsic Functions
        o Computational Intrinsic Functions
             [Were these adopted into F95?  If so, this section can be
             dropped]
        o Library Procedures
             + Mapping Inquiry Subroutines
             + Bit Manipulation Functions
             + Array Reduction Functions
             + Array Combining Scatter Functions
             + Array Prefix and Suffix Functions
             + Array Sorting Functions
        o Generic Intrinsic and Library Procedures
             + System inquiry intrinsic functions
             + Array location intrinsic functions
             + Mapping inquiry subroutines
             + Bit manipulation functions
             + Array reduction functions
             + Array combining scatter functions
             + Array prefix and suffix functions
             + Array sort functions
        o Specifications of Intrinsic Procedures
             [Several of them]
        o Specifications of Library Procedures
             [Gobs of them]

   7. Input and Output
        [I believe asynchronous I/O was adopted into HPF 2.0.  If not,
        this chapter should be moved into Part II.]
        o Asynchronous I/O Operations

   8. Extrinsic Procedures
        [I believe these are all in Kernel HPF]
        o Overview
        o Definition and Invocation of Extrinsic Procedures
        o Requirements on the Called Extrinsic Procedure

   9. HPF_LOCAL Interface
        [I believe all this is in Kernel HPF]
        [This could be made a section under Chapter 8.]
        o Conventions for Local Subprograms
           + Conventions for Calling Local Subprograms
           + Calling Sequence
           + Information Available to the Local Procedure
        o HPF_LOCAL Restrictions
        o HPF_LOCAL Argument Association
        o HPF_LOCAL Routine Library
        o Example HPF_LOCAL Procedures

   10. HPF_SERIAL Interface
        [I believe all this is in Kernel HPF]
        [This could be made a section under Chapter 8.]
        o Conventions for Uniprocessor Subprograms
           + Calling Sequence
        o HPF_SERIAL Restrictions
        o HPF_SERIAL Runtime Library
        o Example HPF_SERIAL Procedures

   11. Kernel HPF
        o Motivation for a Kernel Language
        o HPF 2.0 Features in Kernel HPF
        o HPF 2.0 Features *not* in Kernel HPF
        o Discussion of Kernel HPF Features
          [i.e. Justify the choices above.]

II. Extended Features for High Performance Fortran

   12. Overview of Extended Features
        o Motivation for Extended Features
        o Relationship to HPF 2.0
        o Extended Features in High Performance Fortran
             + Extended Data Distribution Features
             + Extended Data Parallel Execution Features
             + Extended Intrinsic Functions and Standard Library
             + Extended Extrinsic Procedures

   13. Extended Data Alignment and Distribution
        o Extended DISTRIBUTE Options
             + Indirect Mapping
             + Generalized BLOCK
             + Distribution Ranges
             + Distribution to Processor Arrangement Subsets
             + Shadow Widths
        o Dynamic Data Mapping
             + DYNAMIC Directive
             + REALIGN Directive
             + REDISTRIBUTE Directive
        o Mapping of POINTER Variables
        o Mapping of Derived Type Components

   14. Extended Data Parallel Statements and Directives
        o The ON Directive
             [I can't reconcile the requests for (a) ON in HPF 2.0 and (b)
             no complex distributions (in particular, no ONTO subset) in
             HPF 2.0.  I'm putting all of ON into the "Extended" category
             until this is resolved.]
             + General Syntax and Semantics
             + ON Options in Kernel HPF
             + ON Options in HPF 2.0
        o The TASK Directive
        o The REDUCTION Directive for User-Defined Operations

   15. Extended Intrinsic Functions and Standard Library
        o Generalized TRANSPOSE
        o Tasking Inquiry Functions
             [ON_NUMBER_OF_PROCESSORS(), etc.]

   16. Extended Extrinsic Procedures
        [These could each be a separate chapter.]
        o HPF_SPMD Interface
        o HPF to MPI Interface
        o C Interoperability

III. Appendices
        [Note switch to "lettering" rather than "numbering" chapters]
        A. Syntax Rules
        B. Semantic Restrictions
        C. Glossary
        D. Bibliography



From owner-hpff-core  Fri Jan 26 17:08:33 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id RAA07955 for hpff-core-out; Fri, 26 Jan 1996 17:08:33 -0600 (CST)
Received: from coral.llnl.gov (coral.llnl.gov [128.115.235.21]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id RAA07934 for <hpff-core@cs.rice.edu>; Fri, 26 Jan 1996 17:08:20 -0600 (CST)
Message-Id: <199601262308.RAA07934@cs.rice.edu>
Received: by coral.llnl.gov
	(1.38.110.45/16.2) id AA201297702; Fri, 26 Jan 1996 15:08:22 -0800
Date: Fri, 26 Jan 1996 15:08:22 -0800
From: Mary E Zosel <zosel@coral.llnl.gov>
To: hpff-core@cs.rice.edu
Subject: hpff-core: FYI - F95 Comment Response
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-core
Precedence: bulk

Here is the official response back from X3J3 about the HPFF
input to F95 Public Comment.
   -mz-

===========================

High Performance Fortran Forum
c/o Mary Zosel
Lawrence Livermore Natl. Lab
7000 East Ave.
P.O. Box 808
Livermore, CA   94550


To: High Performance Fortran Forum

Thank you for submitting comments on the public review draft of the Fortran 95 
international standard.  Such comments are extremely important in addressing 
errors of both omission and commission in the proposed standard.

The national and international Fortran standards bodies have considered the 
points in your comment letter and in comments from others worldwide.  The 
following responses address your specific comments:

Missing Features
----------------
Numerical exception handling and interfaces to non-Fortran code are the 
subjects of proposed technical reports scheduled for completion within the 
next two years.  These features would also be included in the next revision of 
the Fortran standard.  Asynchronous I/O  is being considered as a priority 
item for inclusion in the next revision of the standard.

Item #1. Allocatable Arrays in PURE Functions
---------------------------------------------
It was decided not to add a note about allocatable arrays in PURE functions 
having side-effects.  There is the potential for allocation in an instance of 
a PURE function to fail due to insufficient memory, depending on the order of 
invocation of the other instances.  This is a problem and a potential side-
effect, but it is more a case of program interaction with the underlying 
hardware/operating system than a language issue.  There is a similar problem 
with automatic arrays, or even automatic scalar variables, causing a memory 
overflow as the invocation order changes.  There is even the possibility of a 
"time-limit" abort if the invocation order causes "excessive" memory bank 
conflicts.

Item #2. Restrictions on Elemental Procedures
---------------------------------------------
Both an example and a note have been added to this section.

Item #3. Pointer arguments to PURE functions
--------------------------------------------
No change was made.  The intent of the notes was to give a high level summary 
of PURE procedures.  The use of POINTER arguments to PURE functions will be 
unlikely enough that summarizing their constraints would be more confusing 
than helpful.


Thank you again for your comments and your interest in Fortran 
standardization.


Sincerely,


Jerrold L. Wagener
X3J3 chair

cc: X3 Secretariat, X3sec@itic.nw.dc.us



From owner-hpff-core  Mon Jan 29 10:47:07 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id KAA15282 for hpff-core-out; Mon, 29 Jan 1996 10:47:07 -0600 (CST)
Received: from timbuk.cray.com (root@timbuk.cray.com [128.162.19.7]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id KAA15271 for <hpff-core@cs.rice.edu>; Mon, 29 Jan 1996 10:47:01 -0600 (CST)
Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.6.12/CRI-gate-8-2.11) with ESMTP id KAA03827 for <hpff-core@cs.rice.edu>; Mon, 29 Jan 1996 10:46:59 -0600
Received: from poplar414 (meltzer@poplar414 [128.162.149.14]) by ironwood.cray.com (8.6.12/CRI-ccm_serv-8-2.8) with SMTP id KAA03828 for <hpff-core@cs.rice.edu>; Mon, 29 Jan 1996 10:46:57 -0600
From: Andy Meltzer <meltzer@cray.com>
Received: by poplar414 (5.x/btd-b3)
          id AA14475; Mon, 29 Jan 1996 10:46:55 -0600
Message-Id: <9601291646.AA14475@poplar414>
Subject: hpff-core: HPF_SPMD doc (latex)
To: hpff-core@cs.rice.edu
Date: Mon, 29 Jan 1996 10:46:54 -0600 (CST)
X-Mailer: ELM [version 2.4 PL24-CRI-b]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-core
Precedence: bulk


Below is an updated copy of the HPF_SPMD document with
changes based on comments made during its first reading.   

The document is in latex.


					Andy Meltzer

					meltzer@cray.com

#####################

\documentstyle[twoside,11pt]{article}
%\input{psfig}
\pagestyle{myheadings}
%set dimensions of columns, gap between columns, and paragraph indent 
\setlength{\textheight}{8.75in}
\setlength{\textwidth}{5.5in}
\setlength{\footheight}{0.0in}
\setlength{\topmargin}{0.0in}
\setlength{\leftmargin}{0.0in}
\setlength{\headheight}{0.0in}
\setlength{\headsep}{0.25in}
\setlength{\oddsidemargin}{0.0in}
\setlength{\parindent}{1pc}
\setlength{\itemsep}{0.0in}
\setlength{\topsep}{0.0in}
\setlength{\parsep}{0.0in}
\setlength{\parskip}{.10in}

\markboth{Cray Research Inc.}{Cray Research Inc.}

% copied stuff out of art10.sty and modified them to conform to IEEE format.
\makeatletter
% as Latex considers descenders in its calculation of interline spacing,
%to get 12 point spacing for normalsize text, must set it to 10 points 
\def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt
\abovedisplayskip 11pt plus2pt minus5pt\belowdisplayskip 
\abovedisplayskip \abovedisplayshortskip \z@ 
plus3pt\belowdisplayshortskip 6pt plus3pt 
minus3pt\let\@listi\@listI}

% need a 12 pt font size for subsection and abstract headings 
\def\subsize{\@setsize\subsize{12pt}\xipt\@xipt}

%make section titles bold and 12 point, 2 blank lines before, 1 after 
\def\section{\@startsection {section}{1}{\z@}{1.0ex plus 
1ex minus .2ex}{.2ex plus .2ex}{\large\bf}}

%make subsection titles bold and 11 point, 1 blank line before, 1 after 
\def\subsection{\@startsection {subsection}{2}{\z@}{.2ex 
plus 1ex} {.2ex plus .2ex}{\subsize\bf}}

\makeatother
\begin{document}
% don't want date printed
\date{}
%make title bold and 14 pt font (Latex default is non-bold, 16pt) 

\title{\Large\bf HPF\_SPMD}

\author{
   Andy Meltzer \\
   Cray Research, Inc. \\
   655F Lone Oak Drive \\
   Eagan, MN 55121
}
\maketitle

% I don't know why I have to reset thispagestyle, but 
% otherwise get page numbers 
\thispagestyle{empty}

\subsection*{\centering Abstract}
% IEEE allows italicized abstract
{\em
HPF\_SPMD is a hybrid language that combines the multi-threaded SPMD
execution model of HPF\_LOCAL with the syntax and features of the HPF Kernel.
The goal of HPF\_SPMD is to attain the potential performance of an SPMD
programming model while retaining access to HPF features and a
well-defined extrinsic interface to HPF.
}

\section{Introduction\label{sec:intro}}

HPF\_SPMD is a hybrid language that combines the multi-threaded SPMD
execution model of HPF\_LOCAL with the syntax and features of the HPF Kernel.
The goal of HPF\_SPMD is to attain the potential performance of an SPMD
programming model while retaining access to HPF features and a
well-defined extrinsic interface to HPF.
It is built on top of the HPF\_LOCAL extrinsic environment.
The language is based on the current definition of the HPF Kernel and
should change as the HPF Kernel changes.

SPMD features and a multi-threaded model allow the user to exploit
the performance and low-level access offered by
a more general-purpose programming model.  Including the HPF Kernel data
distribution features gives the programmer access to the highest-performing
aspects of both models, at the cost of a somewhat
more complex model.
HPF\_SPMD is not appropriate for all platforms, but it is consistent with HPF
and is easily targeted to platforms that have HPF and can support
SPMD programming styles.

The syntax of HPF\_SPMD is a superset of the syntax of the HPF Kernel, and the
extrinsic language's semantics are very similar to those of HPF.  There are
some differences, however.  I/O is one example: in HPF\_SPMD, different
processors may read from different files at the same time, whereas in HPF
all processors must read from the same file.  The differences between the
models are principally caused by the multi-threaded execution model and
the introduction of HPF\_LOCAL data rules.

HPF\_SPMD introduces the notion of {\em private data}.  By default, data
items are allocated so that each processor has its own copy.
The values of individual
data items and the flow of control may therefore vary from processor to
processor within HPF\_SPMD. This behavior is consistent with the behavior
of HPF\_LOCAL.  In HPF\_SPMD a processor
may be named individually, and code may be executed conditionally
depending on the processor on which it runs.


\section{Execution Model\label{sec:exec-model}}

HPF\_SPMD is built upon the fundamental execution model of HPF\_LOCAL,
augmented with data mapping and work distribution features from the HPF Kernel.
It is also augmented with many explicit low-level control features,
some taken from Cray Research's CRAFT language.

In HPF\_SPMD there is a single task on each processor and 
all tasks begin executing in parallel, with data defaulting
to a private distribution, the same default distribution used in HPF\_LOCAL.
Each processor gets a copy of the data storage unless specified otherwise by 
the user.
Consequently, I/O works identically to I/O in HPF\_LOCAL, and
message-passing libraries are easily integrated.

In short, the execution model is that of HPF\_LOCAL.

To provide correct behavior when explicitly mapped data is involved, 
this model defines barrier points at which
conceptually all processors must stop and wait for the execution of all
other processors before they continue.  These barriers are an additional
constraint when compared to an HPF\_LOCAL program, but are only a small
subset of the implicit barriers in the comparable HPF 2.0 program.
An implementation may remove any of these barriers that it deems
unnecessary, but conceptually {\em every} processor must participate in
the barrier at each of these points.

In the following situations, the compiler automatically inserts implicit
barriers into a program:

\begin{itemize}
\item       At the end of many independent loops to ensure correctness
            when one processor may get ahead of others.
\item       When mapped stack variables are allocated; this includes
            calls to subroutines that remap data.
\item       At a {\tt SERIAL} or {\tt END SERIAL} directive.
\item       When the {\tt SYMMETRIC} directive is used (this directive
            is described in the ``Other Features'' section).
\item	    At some array syntax statements to ensure correctness.
\end{itemize}

{\em Question: Should an HPF\_SPMD subprogram have access to global HPF data?}

\section{Data Mapping Features}

The syntax and semantics of the data mapping directives are identical to
those in the HPF Kernel.

The only difference (as mentioned above) is that
the default distribution is private so that values of the 
same data item on different 
processors may vary.  This is consistent with the HPF\_LOCAL 
interpretation of the data declaration.

When data is explicitly mapped, only one copy of the data storage
is created unless the explicit mapping directs otherwise.  The
value of explicitly mapped replicated data items must be consistent
between processors as is the case in the HPF Kernel.  

A new directive, {\tt PE\_PRIVATE}, is suggested for completeness; it
specifies that the data should have the default (private) mapping.


\section{Subprogram Interfaces}

\subsection{Calling an HPF\_SPMD Subprogram from HPF\_SPMD}

The behavior and requirements of an HPF\_SPMD program at subprogram
interfaces are identical to those of the HPF Kernel for dummy arguments that
are explicitly mapped.

All processors must co-operate in a subprogram
invocation that remaps or explicitly maps data.  In other words, if
an explicit interface is required (by the HPF Kernel rules) or the 
subprogram declares explicitly mapped data, the subprogram must be called on
all processors.  Processors need not co-operate if there are only 
reads to non-local data.  The {\tt INHERIT} attribute can only be
applied to explicitly mapped data.

The behavior of an HPF\_SPMD subprogram at subprogram interfaces
is identical to that of HPF\_LOCAL for data that has the default private
mapping.  Data is passed individually on every processor and the
processors need not interact in any way.   All HPF arrays are logically
carved up into pieces; the HPF\_SPMD procedure executing on a 
particular physical processor sees an array containing just those
elements of the global array that are mapped to that physical 
processor.

When a subroutine is passed actual arguments that are a combination
of both explicitly mapped data and private data, the explicitly mapped
data follows the HPF rules and the private data follows the HPF\_LOCAL
rules.

The user may also pass explicitly mapped actual arguments to dummy
arguments that are not explicitly mapped.  The mapping rules for this data
are identical to the mapping rules when HPF calls an HPF\_LOCAL routine:
the data remains ``in place''.

Finally, the behavior is undefined when a private actual argument is
associated with an explicitly mapped dummy argument.  A definition could
be supplied for this case, but it would be the same one that might be
proposed for a calling sequence in which HPF\_LOCAL routines call
HPF routines.

\subsection{Calling an HPF\_SPMD Subprogram from HPF 2.0}

The calling convention and argument passing rules for HPF\_SPMD are
a hybrid of those for HPF 2.0 calling HPF\_LOCAL and HPF 2.0 calling
HPF 2.0.  Explicit interfaces are required.  Where dummy arguments
are private (default) storage, the HPF calling HPF\_LOCAL 
conventions are used.  Where dummy arguments are explicitly mapped,
the calling convention matches HPF calling HPF.  

The following restrictions are placed on HPF\_SPMD subprograms called
from HPF:
\begin{itemize}
\item Recursive subprograms cannot be called from HPF.
\item Subprograms containing alternate returns cannot be called from HPF.
\item An HPF\_SPMD routine may not be invoked directly or
      indirectly from within the body of a {\tt FORALL} construct or
      in the body of an {\tt INDEPENDENT DO} loop inside an HPF program.
\item Scalar dummy arguments in a routine called by HPF must be mapped 
      so that each processor has a copy of the argument.
\item The attributes (type, kind, rank, optional, intent) of the dummy
      arguments in a routine called by HPF
      must match the attributes of the corresponding dummy 
      arguments in the explicit interface.
\item A dummy argument of an HPF\_SPMD routine called by HPF may not be
      a procedure name.
\item A dummy argument of an HPF\_SPMD routine called by HPF may not have
      the {\tt POINTER} attribute.
\item A dummy argument of an HPF\_SPMD routine called by HPF must be
      non-sequential.
\item A dummy argument of an HPF\_SPMD routine called by HPF must have
      assumed shape even when it is explicit shape in the interface.
\item When an HPF program calls an HPF\_SPMD routine, scalar dummy
      arguments and scalar function results are, by default, replicated
      on each processor.
\end{itemize}

\subsubsection{Argument Association}

If a dummy argument in the {\tt EXTRINSIC(HPF\_SPMD)} interface is an
array and the corresponding dummy argument of the HPF\_SPMD routine has the
default private mapping, then that dummy argument in the
specification of the HPF\_SPMD procedure must be an array of the same
rank, type, and type parameters.  When the extrinsic procedure is invoked,
the dummy argument is associated with the local array that consists of the
subgrid of the global array that is stored locally.
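To make the subgrid concrete for the simplest case, the following sketch
computes which elements of a one-dimensional global array of extent $N$,
distributed {\tt BLOCK} over $P$ processors, are stored on each processor,
and hence which elements the local dummy argument would see.  It is written
in Python purely for illustration and is not part of HPF\_SPMD; it assumes
the HPF {\tt BLOCK} rule with block size $\lceil N/P \rceil$ and processors
numbered from 0, and the function name {\tt local\_block} is hypothetical.

\begin{verbatim}
```python
def local_block(n, p, proc):
    """Return the 1-based global indices of a BLOCK-distributed array of
    extent n that are stored on processor proc (0 <= proc < p)."""
    b = -(-n // p)                  # ceiling(n / p): the HPF BLOCK size
    lo = proc * b + 1               # first global index owned by proc
    hi = min((proc + 1) * b, n)     # last global index owned by proc
    return list(range(lo, hi + 1))  # empty when proc owns no elements


if __name__ == "__main__":
    # A(1:10) distributed BLOCK over 4 processors (block size 3).
    for proc in range(4):
        print(proc, local_block(10, 4, proc))
```
\end{verbatim}

With $N = 10$ and $P = 4$ the block size is 3, so processors 0 through 3
hold elements 1 to 3, 4 to 6, 7 to 9, and 10, respectively.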

If the dummy argument of the HPF\_SPMD routine
is explicitly mapped, it must have the same mapping as the dummy argument
of the {\tt EXTRINSIC(HPF\_SPMD)} routine.   Note that this restriction
does not require actual and dummy arguments to match and is no more stringent
than saying that mappings of dummy arguments in interface blocks must 
match those in the actual subprogram.


\subsubsection{Calling Sequence}

The actions detailed below must occur prior to the invocation of the
HPF\_SPMD procedure on each processor.  These actions are the responsibility
of the compiler and happen automatically.

At the call site the following events occur:
\begin{enumerate}
\item The processors are synchronized.
\item Each actual argument is remapped, if necessary, according to the
      directives in the interface block.  Actual arguments corresponding
      to unmapped scalar dummy arguments are replicated.
\end{enumerate}

On return from the subprogram, the following events occur:
\begin{enumerate}
\item All processors are synchronized.
\item The original distribution of arguments and results is restored if
      necessary.
\end{enumerate}

If the extrinsic procedure is a function, then the HPF\_SPMD procedure is
also a function.  If the function result is mapped in the caller, the
function result must either:
\begin{itemize}
\item be explicitly mapped with the same mapping in the extrinsic 
      procedure, or
\item return the local part of the extrinsic function return value.
\end{itemize}
If the extrinsic function is scalar-valued, then the implicit mapping
of the return value is replicated.  Thus, the HPF\_SPMD function must
return the same value on every processor.


\subsection{Calling an HPF\_SPMD Subprogram from HPF\_LOCAL}

This document does not define the interface for calling HPF\_SPMD
subprograms from HPF\_LOCAL, since the definition hinges on the
way HPF\_LOCAL data is combined to form explicitly mapped distributions.
Once a calling convention has been defined for HPF\_LOCAL calling HPF,
this interface should follow directly.

One can, however, define a calling convention when all dummy arguments
are private.  In this case, the private data
is passed as if to another HPF\_LOCAL routine (in other
words, as if private data were passed to private data within HPF\_SPMD),
and the behavior is well defined.

\subsection{Calling an HPF Subprogram from HPF\_SPMD}

This document does not define this interface.  However, when passing
explicitly mapped data with an explicit interface the rules could be
defined to be as if an HPF\_SPMD program were calling an HPF\_SPMD
subprogram.  Passing private data would either cause undefined behavior
or need further definition.


\section{Executable Statements}
\subsection{The {\tt INDEPENDENT} directive}

The {\tt INDEPENDENT} directive is part of HPF\_SPMD with the same semantics
as in HPF 2.0.  However, within {\tt INDEPENDENT} loops
the values of private data may vary from processor to processor.

{\tt INDEPENDENT} applied to {\tt FORALL} has the same syntax and
semantics as in HPF.

\subsection{The {\tt NEW} Clause}

An HPF independent loop may optionally have a {\tt NEW} clause.  HPF\_SPMD
does not require the {\tt NEW} clause for default (not explicitly mapped)
data: in HPF\_SPMD such data defaults to
private, so its values may already differ from processor to processor.

Private data has slightly different behavior than
data specified in the {\tt NEW} clause.  
The value of a private datum on each processor can be used beyond a single
iteration of the loop. 
Private data may be used to compute local sums, for example.  
The values of data items named in a {\tt NEW} clause
may not be used beyond a single iteration. The {\tt NEW} clause asserts that
the {\tt INDEPENDENT} directive would be valid if new objects were created for
the variables named in the clause for each iteration of the loop.
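The local-sum use of private data can be mimicked by the following sketch,
which is only an analogy written in Python: each thread stands for one
processor, the loop iterations are dealt out to processors in an assumed
cyclic order, and the private accumulator survives from one iteration to
the next on the same processor, which a variable named in a {\tt NEW}
clause may not do.  The names {\tt run\_spmd} and {\tt local\_sum} are
hypothetical.

\begin{verbatim}
```python
import threading


def run_spmd(nprocs, body):
    """Run body(proc) once per simulated processor, each in its own thread."""
    results = [None] * nprocs

    def task(proc):
        results[proc] = body(proc)

    threads = [threading.Thread(target=task, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def local_sum(data, nprocs):
    """Return one private partial sum per simulated processor."""
    def body(proc):
        s = 0  # private: one copy per processor, kept across iterations
        for i in range(proc, len(data), nprocs):  # assumed cyclic split
            s += data[i]  # each iteration updates only this processor's copy
        return s
    return run_spmd(nprocs, body)


if __name__ == "__main__":
    partial = local_sum(list(range(1, 11)), 4)
    print(partial, sum(partial))
```
\end{verbatim}

With the values 1 through 10 on four simulated processors, the private
partial sums are 15, 18, 10, and 12, which add to 55.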

The semantics of the {\tt NEW} clause are identical in
HPF\_SPMD and HPF 2.0. The variables named in a {\tt NEW} 
clause apply only to the immediately subsequent loop nest.

The meaning of {\tt INDEPENDENT} when applied to loops with private data
changes slightly with respect to that data.  Instead of asserting that no
iteration depends on any other, it asserts, with respect to the private
data, only that iterations executed on different processors do not depend
on one another.


\subsection{{\tt REDUCE}}

The {\tt REDUCE} directive has the same syntax and semantics as the
HPF Kernel {\tt REDUCE} directive.  Only explicitly mapped data may
be assigned to in a {\tt REDUCE} directive.


\subsection{Array Syntax}

Array syntax is treated the same in HPF\_SPMD as in HPF 2.0 for
explicitly mapped objects.
For private objects the behavior is 
identical to that of HPF\_LOCAL.   When private (default) objects and 
explicitly mapped objects are combined the rules are as follows:

Given:
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\em result} = {\em rhs$_1$} op$_1$ {\em rhs$_2$} op$_2$ ... op$_{n-1}$ 
{\em rhs$_n$}\\
\end{tabbing}
\begin{itemize}
\item If {\em result} is explicitly mapped and all {\em rhs} arrays are 
      explicitly mapped, the work is distributed as in HPF.

\item If {\em result} is private and all {\em rhs} arrays are private the
      computation is done on all processors as an HPF\_LOCAL program
      would do it.

\item If {\em result} is private and all {\em rhs} arrays are explicitly 
      mapped, the 
      work is distributed as in HPF and the values of the results are 
      broadcast to the {\em result} on each processor.

\item If {\em result} is explicitly mapped and NOT all {\em rhs} arrays are 
      explicitly mapped, the results of the operation are undefined.
     
\item If {\em result} is private and some, but not all, {\em rhs} arrays are
      explicitly mapped, the value is computed on each processor
      and saved to the local {\em result}.

\end{itemize}
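The five rules above can be restated as a small decision function.  The
Python sketch below only restates the list; its name and return strings
are hypothetical.  {\tt True} marks an explicitly mapped operand and
{\tt False} a private one, and the sketch assumes the statement has at
least one {\em rhs} array, since the rules do not cover other cases.

\begin{verbatim}
```python
def array_stmt_rule(result_mapped, rhs_mapped):
    """Classify an HPF_SPMD array assignment by how its operands are mapped.

    result_mapped: True if the result array is explicitly mapped.
    rhs_mapped:    one bool per rhs array, True if explicitly mapped.
    """
    if not rhs_mapped:
        raise ValueError("the rules assume at least one rhs array")
    all_mapped = all(rhs_mapped)
    none_mapped = not any(rhs_mapped)
    if result_mapped:
        # Rules 1 and 4: a mapped result needs every rhs array mapped.
        return "distributed as in HPF" if all_mapped else "undefined"
    if none_mapped:
        # Rule 2: everything private, so HPF_LOCAL behavior.
        return "computed on every processor as in HPF_LOCAL"
    if all_mapped:
        # Rule 3: HPF work distribution, then broadcast to private results.
        return "distributed as in HPF, then broadcast to each local result"
    # Rule 5: mixed rhs, private result.
    return "computed on each processor into the local result"
```
\end{verbatim}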

For consistency, all processors must participate in any array syntax 
statement in which the value of an explicitly mapped array is modified.


\subsection{{\tt FORALL} Statement and Construct}

The {\tt FORALL} statement is treated exactly as in HPF when data is
explicitly mapped.  When data is private, the {\tt FORALL} is executed
separately on each processor.  Finally, when data in a {\tt FORALL} is
mixed, the rules for array syntax apply.   If any explicitly mapped
data item is modified in a {\em forall-stmt} then arrays in the 
{\em forall-header} must be explicitly mapped.  In a {\tt FORALL}
construct, if any explicitly mapped array is modified, all modified
arrays must be explicitly mapped.


\subsection{{\tt WHERE} Statement}

The rules for the {\tt WHERE} statement are similar to those
for the {\tt FORALL} statement.
The {\tt WHERE} statement is treated exactly as in HPF when data is
explicitly mapped.  When data is private, the {\tt WHERE} is executed
separately on each processor.  Finally, when data in a {\tt WHERE} is
mixed, the rules for array syntax apply.
If any explicitly mapped
data item is modified in a {\em where-stmt}, then arrays in the 
{\em mask-expr} must be explicitly mapped.  In a {\tt WHERE} construct,
if any explicitly mapped array is modified, all modified
arrays must be explicitly mapped.


\section{Sequence and Storage Association}

Storage and sequence association rules are identical to the HPF Kernel for
explicitly mapped data.  Data that is private follows the rules
for ordinary Fortran 90 sequence and storage association.  This is
consistent with HPF\_LOCAL.

\section{Input and Output}

Private I/O has sequential semantics in HPF but parallel semantics in
HPF\_SPMD.  A private read in HPF\_SPMD requires each processor to read
each element of data in a given file, while a private read in HPF requires
a single read by one processor and a broadcast of that value (where
necessary) to all other processors.  If the same file is specified, both
languages generate the same results (with much greater I/O overhead in the
HPF\_SPMD case).  HPF\_SPMD also allows each processor to open and read
from a different file, a feature unavailable in HPF.  Private writes,
however, cause many more differences between the two languages: in
HPF\_SPMD the user must use some form of synchronization to ensure that
only one processor writes to a file.

\section{Serial Regions}

It is often useful to enter
a region where only one task is executing.  This is particularly 
useful for certain types of I/O.   To facilitate this, two directives
are provided:

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt SERIAL }\\
\> {\tt END SERIAL }\\
\end{tabbing}

In addition, one may optionally attach a {\tt COPY} clause to the 
{\tt END SERIAL} directive which specifies the private 
data items whose
values should be broadcast to all processors.  The syntax of this 
directive is:

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt !HPF\$     } \> {\tt SERIAL }\\
\>\>{\em sequential region} \\
\>\> ... \\
\> {\tt !HPF\$     } \> {\tt END SERIAL [, COPY (} {\em var$_1$}{\tt [} {\em, var$_2$, ... , var$_n$} {\tt ])]} \\
\end{tabbing}

where each {\em var$_i$} names a private data item whose value is copied
to the private data item of the same name on every other processor.
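The effect of {\tt END SERIAL, COPY(x)} can be modeled with threads
standing in for tasks.  In this Python sketch, which is an analogy rather
than an implementation, {\tt threading.Barrier} plays the role of the
implicit barriers at {\tt SERIAL} and {\tt END SERIAL}, and a shared
dictionary stands in for the broadcast.  This document does not say which
task executes a serial region; the sketch assumes task 0, and the function
name {\tt serial\_copy} is hypothetical.

\begin{verbatim}
```python
import threading


def serial_copy(nprocs, serial_body):
    """Simulate SERIAL ... END SERIAL, COPY(x) on nprocs tasks.

    serial_body() runs on one task only and returns the new value of the
    private variable x; the COPY clause then broadcasts it to every task.
    """
    barrier = threading.Barrier(nprocs)
    staging = {}  # stands in for the broadcast performed by COPY
    final_x = [None] * nprocs

    def task(proc):
        x = proc  # private x: a different value on every task
        barrier.wait()  # implicit barrier at SERIAL
        if proc == 0:  # assumption: task 0 executes the serial region
            x = serial_body()
            staging["x"] = x
        barrier.wait()  # implicit barrier at END SERIAL
        x = staging["x"]  # COPY(x): every task now holds task 0's value
        final_x[proc] = x

    threads = [threading.Thread(target=task, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return final_x


if __name__ == "__main__":
    print(serial_copy(4, lambda: 42))
```
\end{verbatim}

Every task starts with a different private value of {\tt x} and ends with
the value computed in the serial region.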

Serial regions provide implicit barrier synchronization points
at both the {\tt SERIAL} and {\tt END SERIAL} directives.  

Serial regions can be nested, but inner directives are ignored.  There
must be a matching, properly nested {\tt END SERIAL} directive for each
{\tt SERIAL} directive.

If a subroutine call occurs within a serial region, the subroutine
executes serially; there is no way to get back to parallel execution
within the subroutine.  All explicitly mapped data is accessible from 
within subroutines called in a serial region, but a subroutine called
from within a serial region cannot declare explicitly mapped data
or remap data.

All processors must participate in the invocation of the serial region.
No branches are allowed into or out of a {\tt SERIAL} region.

\section{{\tt STOP} and {\tt ABORT}}

Because of the SPMD nature of the HPF\_SPMD routines, the behavior
of these statements must be defined within the context of this extrinsic.
The {\tt STOP} statement stops execution of only the task executing the
statement.  The {\tt ABORT} statement stops execution of all tasks.
If a {\tt STOP} statement is executed within a serial region, however, all
tasks are stopped and execution is complete.  With respect to whether all
tasks or only one are stopped, {\tt EXIT} behaves like {\tt STOP}.


\section{Library and Intrinsic Routines}

\subsection{HPF Local Routine Library}

The HPF\_LOCAL extrinsic environment contains a number of libraries
that are useful for local SPMD programming and a number of libraries
that allow the user to determine global (rather than local) state
information.  These library procedures take as input the name of
a dummy argument and return information on the corresponding global
HPF actual argument.  They may only be invoked by an HPF\_SPMD
procedure that was directly invoked by global HPF code.  They may
be called only for private data.   The libraries reside in a module
called HPF\_LOCAL\_LIBRARY; an HPF\_SPMD routine that calls them should
include the statement
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt USE HPF\_LOCAL\_LIBRARY }\\
\end{tabbing}
or some functionally appropriate variant thereof.


\subsection{HPF Library}

The HPF Library is available to HPF\_SPMD when called with data that is
explicitly mapped and all processors are participating in the call.
In addition, as in HPF\_LOCAL, the entire HPF Library is available for
use with private data.  Mixing private and explicitly mapped data in
calls to the HPF library produces undefined behavior.

\subsection{Parallel Inquiry Intrinsics}

These intrinsic functions are provided as an extension to HPF.  They provide
information potentially useful to the programmer about the state of
execution in a program.

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt IN\_PARALLEL()} \\
\> {\tt IN\_INDEPENDENT()} \\
\end{tabbing}

\section{Task Identity}

{\tt PROCID} from HPF\_LOCAL is provided.  The physical processors are
identified by an integer in the range 0 to {\em n-1}, where {\em n}
is the value returned by the global HPF intrinsic function
{\tt NUMBER\_OF\_PROCESSORS}.  Processor identifiers are returned
by {\tt ABSTRACT\_TO\_PHYSICAL}, which establishes the one-to-one
correspondence between the abstract processors of an HPF processors
arrangement and the physical processors.  Also, the local library
function {\tt MY\_PROCESSOR} returns the identifier of the calling
processor.
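The one-to-one correspondence can be pictured with a small Fortran sketch.
It assumes, purely for illustration, that the abstract processors of an
arrangement are numbered in column-major order; the function
{\tt linear\_proc\_id} is hypothetical, and the mapping actually returned by
{\tt ABSTRACT\_TO\_PHYSICAL} is implementation dependent.

```fortran
module task_identity_sketch
  ! Hypothetical model of ABSTRACT_TO_PHYSICAL: maps the subscripts of an
  ! abstract processor in an arrangement of shape shp(:) to a physical
  ! identifier in 0..n-1, where n = product(shp). Column-major numbering
  ! is an assumption; the real correspondence is implementation dependent.
  implicit none
contains
  pure integer function linear_proc_id(idx, shp) result(id)
    integer, intent(in) :: idx(:)   ! 1-based subscripts of the abstract processor
    integer, intent(in) :: shp(:)   ! extents of the processors arrangement
    integer :: d, stride
    id = 0
    stride = 1
    do d = 1, size(shp)
      id = id + (idx(d) - 1) * stride
      stride = stride * shp(d)
    end do
  end function linear_proc_id
end module task_identity_sketch
```

For a $2 \times 3$ arrangement, abstract processor {\tt (2,3)} maps to
identifier 5, that is {\em n-1}, and every identifier from 0 to 5 is used
exactly once.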


\section{Synchronization Primitives}

Because this model can be programmed at a much lower level than HPF 2.0,
we suggest that a number of synchronization primitives be provided.
These primitives include:

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> Barriers (test, set, wait)\\
\> Locks (set, clear, test)\\
\> Critical Sections \\
\> Events (test, set, wait, clear)\\
\end{tabbing}

These primitives provide full SPMD programming model support to the 
HPF\_SPMD extrinsic environment.


\subsection{Barriers}

The following intrinsics are available to create and use a user-defined
program barrier:
\begin{itemize}
\item {\tt SET\_BARRIER()}
\item {\tt WAIT\_BARRIER()}
\item {\tt TEST\_BARRIER()}
\end{itemize}

The {\tt SET\_BARRIER()} intrinsic indicates that the calling task has 
arrived at the barrier.  The {\tt WAIT\_BARRIER()} intrinsic suspends 
execution of the calling task until all of the other tasks have arrived 
at the barrier by calling {\tt SET\_BARRIER()}.  The {\tt TEST\_BARRIER()}
intrinsic tests the state of the barrier without waiting, returning FALSE
while some task has not yet arrived and TRUE once all of the tasks have
arrived.

In the following example, a barrier ensures that no task enters
{\em block3} until all tasks have completed {\em block1}.  Because arrival
and waiting are separate calls, each task can execute {\em block2} without
waiting for the other tasks to arrive.

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\em block1} \\
\> {\tt CALL SET\_BARRIER()} \\
\> {\em block2} \\
\> {\tt CALL WAIT\_BARRIER()} \\
\> {\em block3} \\
\end{tabbing}
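The arrival-counting behavior of this split-phase barrier can be modeled
sequentially as follows.  This is a sketch under stated assumptions: the
names {\tt barrier\_arrive} and {\tt barrier\_complete} are hypothetical
stand-ins for {\tt SET\_BARRIER} and {\tt TEST\_BARRIER}, the state is a
simple counter, and a real implementation would keep that state in shared
memory and update it atomically.

```fortran
module barrier_model
  ! Sequential model of the split-phase barrier state. The names are
  ! hypothetical stand-ins for SET_BARRIER and TEST_BARRIER; a real
  ! implementation keeps this state in shared memory and updates it
  ! atomically, and WAIT_BARRIER blocks until barrier_complete is true.
  implicit none
  type :: barrier_state
    integer :: ntasks = 0     ! number of participating tasks
    integer :: arrived = 0    ! tasks that have called SET_BARRIER
  end type barrier_state
contains
  subroutine barrier_arrive(b)          ! models SET_BARRIER()
    type(barrier_state), intent(inout) :: b
    b%arrived = b%arrived + 1
  end subroutine barrier_arrive

  logical function barrier_complete(b)  ! models TEST_BARRIER()
    type(barrier_state), intent(in) :: b
    barrier_complete = (b%arrived == b%ntasks)
  end function barrier_complete
end module barrier_model
```

In this model, {\tt WAIT\_BARRIER} corresponds to blocking until
{\tt barrier\_complete} returns TRUE; keeping arrival separate from waiting
is what lets {\em block2} overlap with the arrival of slower tasks.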

\subsection{Locks}

Locks are used to prevent simultaneous access to data by multiple
tasks.  The following three intrinsics are used:

\begin{itemize}
\item {\tt SET\_LOCK(}{\em lock}{\tt)} 
\item {\tt CLEAR\_LOCK(}{\em lock}{\tt)} 
\item {\tt TEST\_LOCK(}{\em lock}{\tt)}
\end{itemize}

The {\tt SET\_LOCK(}{\em lock}{\tt)} intrinsic atomically sets the shared
variable {\em lock}.  If the lock is already set, the calling task is
suspended until another task clears the lock; the calling task then
sets it.
The {\tt CLEAR\_LOCK(}{\em lock}{\tt)} intrinsic clears {\em lock};
after the call, {\em lock} is clear regardless of its state before the call.
The {\tt TEST\_LOCK(}{\em lock}{\tt)} intrinsic tests the value of {\em lock},
returning TRUE if the lock is already set when {\tt TEST\_LOCK} is called
and FALSE if it is not.  In the latter case, {\tt TEST\_LOCK} also sets
the lock before returning, so a return value of FALSE means that the
calling task now holds the lock.
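As described, {\tt TEST\_LOCK} is a test-and-set operation, and
{\tt SET\_LOCK} can be understood as repeating it until it returns FALSE.
The single-task Fortran model below makes that relationship explicit.  It
assumes a logical lock variable (the text does not fix the type), and the
{\tt model\_} routine names are hypothetical; in a real implementation the
test and the set must happen as one atomic operation.

```fortran
module lock_model
  ! Single-task model of the lock semantics. A logical stands in for the
  ! shared lock variable (its type is an assumption). In a real
  ! implementation model_test_lock must be one atomic test-and-set.
  implicit none
contains
  logical function model_test_lock(lock)  ! models TEST_LOCK(lock)
    logical, intent(inout) :: lock
    model_test_lock = lock    ! TRUE if the lock was already set
    lock = .true.             ! the lock is set on return in either case
  end function model_test_lock

  subroutine model_set_lock(lock)         ! models SET_LOCK(lock)
    logical, intent(inout) :: lock
    ! Retry until TEST_LOCK reports that the lock was clear. With a single
    ! task this returns at once if the lock is clear and would spin
    ! forever if it is held, since no other task exists to clear it.
    do while (model_test_lock(lock))
    end do
  end subroutine model_set_lock

  subroutine model_clear_lock(lock)       ! models CLEAR_LOCK(lock)
    logical, intent(inout) :: lock
    lock = .false.            ! cleared regardless of the previous state
  end subroutine model_clear_lock
end module lock_model
```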

\subsection{Critical Sections}

A {\em critical section} restricts access to a region of code rather
than to a data object: only one task at a time may execute the region.
It behaves much like a lock, but is expressed with directives rather
than intrinsic calls.

\begin{itemize}
\item {\tt !HPF\$ CRITICAL} 
\item {\tt !HPF\$ END\_CRITICAL}
\end{itemize}

The {\tt CRITICAL} directive marks the beginning of a code region that
only one task at a time can enter.  The {\tt END\_CRITICAL} directive 
marks the end of the critical section.

Every {\tt CRITICAL} directive must have a matching {\tt END\_CRITICAL}
directive in the same program unit.  Critical sections may be nested as
long as the directives are matched in pairs; the inner pairs have no 
effect.  Branching into or out of a critical section is not permitted.

The following example shows how a critical section might be used.

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt ! }{\em Compute} {\tt LOCAL\_SUM} {\em on every task.} \\
\> {\tt !HPF\$} \> {\tt CRITICAL} \\
\>\> {\tt GLOBAL\_SUM = GLOBAL\_SUM + LOCAL\_SUM} \\
\> {\tt !HPF\$} \> {\tt END\_CRITICAL} \\
\end{tabbing}
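The nesting rule and the {\tt GLOBAL\_SUM} example can be modeled in a
single process, with each trip of a loop standing in for one task.  The
routine names are hypothetical, the block-wise split of the data among
tasks is an assumption, and the model only tracks whether the section is
held; it does not provide mutual exclusion between real tasks.

```fortran
module critical_model
  ! Single-process model of CRITICAL / END_CRITICAL nesting and of the
  ! GLOBAL_SUM example. Names are hypothetical. Only the outermost pair
  ! acquires and releases the section; inner pairs just track depth.
  implicit none
  logical :: section_held = .false.   ! is some task inside the section?
  integer :: depth = 0                ! nesting depth of the current task
contains
  subroutine enter_critical()         ! models !HPF$ CRITICAL
    if (depth == 0) then
      if (section_held) error stop "model: another task holds the section"
      section_held = .true.
    end if
    depth = depth + 1
  end subroutine enter_critical

  subroutine exit_critical()          ! models !HPF$ END_CRITICAL
    if (depth <= 0) error stop "model: END_CRITICAL without CRITICAL"
    depth = depth - 1
    if (depth == 0) section_held = .false.
  end subroutine exit_critical

  ! Each loop trip plays one task: it sums its block of x into local_sum,
  ! then adds that to total inside the critical section.
  subroutine simulated_global_sum(x, ntasks, total)
    real, intent(in) :: x(:)
    integer, intent(in) :: ntasks
    real, intent(out) :: total
    integer :: t, bs
    real :: local_sum
    total = 0.0
    bs = (size(x) + ntasks - 1) / ntasks
    do t = 0, ntasks - 1
      local_sum = sum(x(t*bs + 1 : min((t + 1)*bs, size(x))))
      call enter_critical()
      total = total + local_sum
      call exit_critical()
    end do
  end subroutine simulated_global_sum
end module critical_model
```

Only the outermost {\tt enter\_critical}/{\tt exit\_critical} pair changes
{\tt section\_held}, mirroring the rule that inner directives have no effect.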

\subsection{Events}

Events are typically used to record the state of a program's execution
and to communicate that state to other tasks.  Unlike the lock routines
described earlier, the event routines do not acquire a lock, so they
cannot easily be used to enforce serial access to data.  They are suited
to work such as signalling other tasks that a particular value has been
found during a search.  Four routines implement events:

\begin{itemize}
\item {\tt SET\_EVENT([}{\em event}{\tt ])} 
\item {\tt CLEAR\_EVENT([}{\em event}{\tt ])}
\item {\tt WAIT\_EVENT([}{\em event}{\tt ])}
\item {\tt TEST\_EVENT([}{\em event}{\tt ])}
\end{itemize}

The optional argument {\em event} is a shared integer variable.  If it is
present, the event routines operate on the named variable; if it is
omitted, they operate on a single compiler-generated default event.
Setting and clearing an event are NOT atomic operations, so code that
uses events must be written with that in mind.

The {\tt SET\_EVENT} routine
sets, or {\em posts}, an event; it declares that an action has been
completed or that a certain point in the program has been reached.  A
task can post an event at any time, whether the event is currently
clear or already posted.
The {\tt CLEAR\_EVENT} routine clears an event.  The {\tt WAIT\_EVENT} 
routine suspends execution of the calling task until the specified event
is posted.
The {\tt TEST\_EVENT} routine returns the state of an event: TRUE if it
is posted and FALSE if it is clear.
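The search use case can be sketched as follows.  In this sequential model
each outer loop trip plays one task; in a real program the tasks would run
concurrently and test the event periodically during their scans.  A
logical component stands in for the shared integer event variable, and
all names are hypothetical.

```fortran
module event_model
  ! Single-process model of an event used to end a search early. A logical
  ! component stands in for the shared integer event variable; all names
  ! here are hypothetical.
  implicit none
  type :: event_t
    logical :: posted = .false.
  end type event_t
contains
  subroutine model_set_event(e)         ! models SET_EVENT(e)
    type(event_t), intent(inout) :: e
    e%posted = .true.                   ! posting twice is harmless
  end subroutine model_set_event

  subroutine model_clear_event(e)       ! models CLEAR_EVENT(e)
    type(event_t), intent(inout) :: e
    e%posted = .false.
  end subroutine model_clear_event

  logical function model_test_event(e)  ! models TEST_EVENT(e)
    type(event_t), intent(in) :: e
    model_test_event = e%posted
  end function model_test_event

  ! Each outer trip plays one task searching its block of x. A task first
  ! tests the event and skips its block if the key was already found.
  subroutine simulated_search(x, key, ntasks, found_by, blocks_scanned)
    integer, intent(in) :: x(:), key, ntasks
    integer, intent(out) :: found_by, blocks_scanned
    type(event_t) :: found
    integer :: t, i, bs
    call model_clear_event(found)
    found_by = -1
    blocks_scanned = 0
    bs = (size(x) + ntasks - 1) / ntasks
    do t = 0, ntasks - 1
      if (model_test_event(found)) exit
      blocks_scanned = blocks_scanned + 1
      do i = t*bs + 1, min((t + 1)*bs, size(x))
        if (x(i) == key) then
          found_by = t
          call model_set_event(found)
          exit
        end if
      end do
    end do
  end subroutine simulated_search
end module event_model
```

Because posting an already-posted event is allowed, several tasks that find
the key at about the same time can all call {\tt SET\_EVENT} without
coordinating with one another.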

\section{Other Useful Features}

We have found a number of other directives to be extremely
useful.  Although the model does not require them, they should be
considered for inclusion.

\subsection{Barrier Removal}

It is occasionally useful for an advanced programmer to tell the
compilation system that a barrier is not needed at a given point, even
though the compiler, working from incomplete information, might
conclude that one is necessary.

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt NO BARRIER} \\
\end{tabbing}

\subsection{Parallelism Specification Directives}

These directives allow a user to assert that a routine will be called
only from within a parallel region, only from within a serial region, or
from within both kinds of region.  Without these directives, and
depending on its implementation strategy, an implementation might have
to generate two versions of each subroutine and choose between them with
a run-time test.  The directives let the compiler generate only the
version that is needed, which reduces code size and removes the test.

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt PARALLEL\_ONLY} \\
\> {\tt SERIAL\_ONLY} \\
\> {\tt PARALLEL\_AND\_SERIAL} \\
\end{tabbing}
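The run-time test that these directives remove might look like the
following.  This is only a hypothetical picture of one implementation
strategy: the routine names are invented, and the logical argument
{\tt in\_parallel} stands in for {\tt IN\_PARALLEL()}.

```fortran
module dispatch_sketch
  ! Hypothetical illustration of what a compiler might emit for a routine
  ! with no PARALLEL_ONLY / SERIAL_ONLY / PARALLEL_AND_SERIAL directive:
  ! two bodies plus a run-time test. All names here are invented.
  implicit none
contains
  subroutine scale_array_serial(a, s)    ! version for serial-region calls
    real, intent(inout) :: a(:)
    real, intent(in) :: s
    a = a * s
  end subroutine scale_array_serial

  subroutine scale_array_parallel(a, s)  ! version for parallel-region calls
    real, intent(inout) :: a(:)
    real, intent(in) :: s
    a = a * s   ! a real version would update only this task's share of a
  end subroutine scale_array_parallel

  ! Generic entry point: this test, and whichever body is never needed,
  ! are what the parallelism specification directives let the compiler drop.
  subroutine scale_array(a, s, in_parallel, version)
    real, intent(inout) :: a(:)
    real, intent(in) :: s
    logical, intent(in) :: in_parallel   ! stands in for IN_PARALLEL()
    character(len=*), intent(out) :: version
    if (in_parallel) then
      call scale_array_parallel(a, s)
      version = 'parallel'
    else
      call scale_array_serial(a, s)
      version = 'serial'
    end if
  end subroutine scale_array
end module dispatch_sketch
```

With {\tt SERIAL\_ONLY}, for example, a compiler could emit only
{\tt scale\_array\_serial} and call it directly.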

\subsection{{\tt SYMMETRIC}}

{\tt SYMMETRIC} data is private data that is guaranteed to be at the
same storage location on every processor.  The feature is clearly tied
to certain implementations, but it makes PUT and GET functionality much
easier to use, because the address of a {\tt SYMMETRIC} variable on one
processor is also valid on every other processor.

\subsection{{\tt ON} clause}

In addition to the {\tt INDEPENDENT} directive of HPF 2.0, we propose a
new form of {\tt INDEPENDENT} that incorporates the {\tt ON} clause and
differs from the HPF 2.0 form in several ways that make the {\tt ON}
clause easier to use.  If a restricted version of the current {\tt ON}
proposal is adopted for the HPF Kernel, that proposal should be adopted
instead of this one.

{\tt INDEPENDENT} without the {\tt ON} clause is identical 
to the current HPF definition of {\tt INDEPENDENT}.

In HPF\_SPMD, the new form of the {\tt INDEPENDENT} directive is applied
to the first (outermost) of a group of tightly nested loops and may apply
to more than one of them; the loops it covers are named in its list of
induction variables.  This makes it easier to use the {\tt ON} clause
with multidimensional iteration spaces.  By contrast, the current
{\tt INDEPENDENT} directive applies only to the single loop that
immediately follows it.

The general syntax for these independent loops is as follows:
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\>{\tt !HPF\$} \> {\tt INDEPENDENT} ($I_1,I_2,\ldots,I_n$) {\tt ON} {\em array-name}($h_1(I_1),h_2(I_2),\ldots,h_n(I_n)$) \\
\>       \> {\tt DO} $I_1 = L_1, U_1, S_1$       \\
\>       \>\> {\tt DO} $I_2 = L_2, U_2, S_2$     \\
\>       \>\>\> {\tt DO} $I_n = L_n, U_n, S_n$ \\
\>       \>\>\>\> $\ldots$                              \\
\>       \>\>\> {\tt END DO}                            \\
\>       \>\> {\tt END DO}                              \\
\>       \> {\tt END DO}
\end{tabbing}


The syntax and semantics of {\tt INDEPENDENT} with the {\tt ON} 
clause differ from its syntax and semantics without the {\tt ON} 
clause.  With the {\tt ON} clause, the directive asserts that there are
no dependences between iterations executed on different processors,
although there may be dependences between iterations executed on the
same processor.  Through its list of induction variables, it also
indicates which loops of the nest it refers to.

If the {\tt ON} clause is used, {\tt INDEPENDENT} must be used 
in the multi-line form.
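To make the role of the {\tt ON} clause concrete, the sketch below computes
which processor executes iteration {\tt I} of a one-dimensional nest with
{\tt ON A(a*I+b)}.  It assumes that {\tt A(1:n)} is distributed {\tt BLOCK}
over {\tt np} processors numbered from 0, with the usual block size
$\lceil n/np \rceil$, and that each iteration runs on the processor that
owns the element it names; the function names are hypothetical.

```fortran
module on_clause_sketch
  ! Sketch of iteration placement for
  !   !HPF$ INDEPENDENT (I) ON A(a*I + b)
  ! assuming A(1:n) is distributed BLOCK over np processors numbered
  ! 0..np-1 and that each iteration runs on the owner of the element
  ! it names. Names are hypothetical.
  implicit none
contains
  pure integer function block_owner(j, n, np)
    integer, intent(in) :: j, n, np   ! element index, extent of A, processors
    integer :: bs
    bs = (n + np - 1) / np            ! BLOCK size: ceiling(n/np)
    block_owner = (j - 1) / bs
  end function block_owner

  pure integer function iteration_owner(i, a, b, n, np)
    integer, intent(in) :: i, a, b, n, np
    iteration_owner = block_owner(a*i + b, n, np)
  end function iteration_owner
end module on_clause_sketch
```

With {\tt n = 100} and {\tt np = 4}, {\tt ON A(2*I)} places iterations 1
through 12 on processor 0 and iterations 13 through 25 on processor 1.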

The following rules apply to the {\tt ON} clause and the loop nest it
governs.  First, the iteration space of an {\tt INDEPENDENT} nest must be
rectangular.  That is, the lower loop bound, the upper loop bound, and
the step expression of each loop named in the {\tt INDEPENDENT} induction
list must be invariant with respect to the {\tt INDEPENDENT} nest.  
Triangular and trapezoidal nests, such as the following, are not allowed:

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\>{\tt !HPF\$}\>{\tt INDEPENDENT (I, J) ON A(I,J)  ! Erroneous code:} \\
\>\>      {\tt DO I = 1, N                ! the bound of J depends on I} \\
\>\>\>      {\tt DO J = 1, I} \\
\>\>\>\>      {\tt ...} \\
\>\>\>      {\tt END DO} \\
\>\>      {\tt END DO} \\
\end{tabbing}

Each index expression of {\em array-name} in the
{\tt ON} clause (the functions {\em h$_i$} above)
must be of the form 

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\>{\bf [ }{\em a }{\tt * }{\tt loop\_control\_variable }{\tt + }{\bf ] }{\em b}.
\end{tabbing}
where {\em a} and {\em b} are integer-valued; they can be expressions,
constants, or variables.  The value of {\em a} must not be 0, and the
values of {\em a} and {\em b} must be invariant with respect to the
{\tt INDEPENDENT} loop nest.

For example, specifying {\tt A(I,J,K)} is valid.  Specifying
{\tt A(3,I+J,K)} is not valid, because {\tt I+J} is not of the required
form.  Specifying {\tt A(I,I,K)} is not valid because {\tt I} appears twice.

Division is prohibited in any index expression of the {\tt ON} clause.  
For example, specifying {\tt A(I/2,J,K)} is not valid.
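Two of these rules, a nonzero coefficient and no loop control variable used
twice, can be checked subscript by subscript, as sketched below.  Each
subscript is assumed to be already reduced to a hypothetical descriptor
{\tt (var, a, b)}, with {\tt var = 0} for a subscript consisting of the
constant {\em b} alone (the bracketed part of the form is optional).  A
subscript such as {\tt I+J} or {\tt I/2} cannot be written in this form at
all, so it would be rejected before this check.

```fortran
module on_index_check
  ! Hypothetical validator for ON-clause subscripts of the form
  ! [a * I + ] b. Each subscript is described by the loop variable it
  ! uses (0 for a constant subscript b), its coefficient a and offset b.
  ! Forms such as I+J or I/2 have no such descriptor and are rejected
  ! before this check is reached.
  implicit none
  type :: on_subscript
    integer :: var = 0   ! loop-variable number, 0 = constant subscript
    integer :: a = 0     ! coefficient of the loop variable
    integer :: b = 0     ! constant offset
  end type on_subscript
contains
  pure logical function valid_on_subscripts(s) result(ok)
    type(on_subscript), intent(in) :: s(:)
    integer :: k, m
    ok = .true.
    do k = 1, size(s)
      ! the coefficient of a loop variable must be nonzero
      if (s(k)%var /= 0 .and. s(k)%a == 0) ok = .false.
      ! no loop variable may appear in two subscripts
      do m = 1, k - 1
        if (s(k)%var /= 0 .and. s(k)%var == s(m)%var) ok = .false.
      end do
    end do
  end function valid_on_subscripts
end module on_index_check
```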


\subsection{{\tt RESIDENT}}

The {\tt RESIDENT} directive can be applied to loops and at the
subroutine level.  It asserts that, within the subroutine (or loop),
every access to the named variables refers to data that is local to the
processor making the access.  For example:

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> \> {\tt REAL A(100), B(100)} \\
\> {\tt !HPF\$     } \> {\tt DISTRIBUTE A(BLOCK), B(BLOCK) }\\
\> {\tt !HPF\$     } \> {\tt RESIDENT A, B }\\
\end{tabbing}

indicates that only local elements of arrays {\tt A} and {\tt B} will
be accessed within the subroutine.  
Note that this is an assertion about the behavior of a program and
not a directive to make it so.
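For {\tt BLOCK}-distributed arrays, the assertion amounts to a range
condition on the indices each processor touches.  The sketch below spells
that condition out, assuming {\tt A(1:n)} is distributed {\tt BLOCK} over
{\tt np} processors numbered from 0 with block size $\lceil n/np \rceil$;
the routine names are hypothetical, and, as noted above, the directive
itself neither checks nor enforces the condition.

```fortran
module resident_sketch
  ! Sketch of what RESIDENT asserts for A(1:n) distributed BLOCK over np
  ! processors numbered 0..np-1: every index a processor touches lies in
  ! the range it owns. Names are hypothetical; the directive itself does
  ! not perform this check.
  implicit none
contains
  pure subroutine block_local_range(p, n, np, lo, hi)
    integer, intent(in) :: p, n, np
    integer, intent(out) :: lo, hi
    integer :: bs
    bs = (n + np - 1) / np        ! BLOCK size: ceiling(n/np)
    lo = p*bs + 1
    hi = min((p + 1)*bs, n)       ! hi < lo when processor p owns nothing
  end subroutine block_local_range

  pure logical function access_is_resident(first, last, p, n, np)
    integer, intent(in) :: first, last, p, n, np   ! accessed index range
    integer :: lo, hi
    call block_local_range(p, n, np, lo, hi)
    access_is_resident = (first >= lo .and. last <= hi)
  end function access_is_resident
end module resident_sketch
```

A stencil that reads {\tt A(I-1)} for the first local {\tt I} reaches into
the neighbouring block, so {\tt RESIDENT} could not truthfully be asserted
for it.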

\subsection{{\tt GEOMETRY}}

The {\tt GEOMETRY} directive is like a typedef for mappings: it lets the
user conveniently change the mappings of many arrays at the same
time.  It is similar in many ways to the {\tt TEMPLATE} directive, but
because it is not bound to any particular extent, it is easier to apply
generally.  Users of CRAFT tend to rely heavily on this feature to
distribute a set of arrays in the same way quickly.

The syntax of the {\tt GEOMETRY} directive is:

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt !HPF\$     } \> {\tt GEOMETRY} {\em geom}{\tt(}{\em$\delta_1$} {\tt[}{\em , $\delta_2$, ..., $\delta_n$}{\tt])} \\
\> {\tt !HPF\$     } \> {\tt DISTRIBUTE} {\em geom} [::] {\em var$_1${\tt[}, var$_2$, ... , var$_m$}{\tt]} \\
\end{tabbing}

where each $\delta_i$ is one of the allowable distribution formats (such
as {\tt BLOCK}, {\tt CYCLIC}, or {\tt *}).
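The following module shows how the syntax above might be used to give
three arrays the same mapping.  The geometry name {\tt G} and the array
names are hypothetical.  To a standard Fortran compiler the {\tt !HPF\$}
lines are comments, so the module also compiles and runs serially.

```fortran
module geometry_demo
  ! Illustrative use of the proposed GEOMETRY directive. To a plain
  ! Fortran compiler the !HPF$ lines are comments, so the module still
  ! builds and runs serially. G and the array names are hypothetical.
  implicit none
  integer, parameter :: n = 64
  real :: pressure(n, n), density(n, n), temperature(n, n)
  ! One mapping shared by all three arrays; editing G(BLOCK, *) to,
  ! say, G(*, BLOCK) would remap every array distributed with G.
!HPF$ GEOMETRY G(BLOCK, *)
!HPF$ DISTRIBUTE G :: pressure, density, temperature
contains
  subroutine init_fields()
    pressure = 1.0
    density = 2.0
    temperature = pressure * density
  end subroutine init_fields
end module geometry_demo
```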

\section{New Features of HPF\_SPMD}

HPF\_SPMD starts with the HPF\_LOCAL extrinsic environment and then adds
all of the HPF Kernel.  This section lists the new features of HPF\_SPMD.

\begin{itemize}
\item Suggested changes to {\tt INDEPENDENT} to better support {\tt ON}.
\item New rules defining the interaction of explicitly mapped and private
      data.
\item Parallel inquiry intrinsics:
  \begin{itemize}
  \item {\tt IN\_PARALLEL()}
  \item {\tt IN\_INDEPENDENT()}
  \end{itemize}
\item Serial regions ({\tt SERIAL / END SERIAL})
\item Explicit synchronization primitives:
  \begin{itemize}
  \item Locks (set, clear, test)
  \item Critical Sections 
  \item Events (test, set, wait, clear)
  \item Barriers (test, set, wait)
  \end{itemize}
\item {\tt PE\_PRIVATE} directive to specify default data mapping behavior
\item Other suggested features:
  \begin{itemize}
  \item {\tt PARALLEL\_ONLY} 
  \item {\tt SERIAL\_ONLY} 
  \item {\tt PARALLEL\_AND\_SERIAL} 
  \item {\tt RESIDENT} 
  \item {\tt SYMMETRIC} 
  \item {\tt GEOMETRY}
  \end{itemize}
\end{itemize}


\end{document}
---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-core-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------