From owner-hpff-doc  Mon Jul 15 16:04:34 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id QAA22802 for hpff-doc-out; Mon, 15 Jul 1996 16:04:34 -0500 (CDT)
Received: from coral.llnl.gov (coral.llnl.gov [134.9.1.2]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id QAA22793 for <hpff-doc@cs.rice.edu>; Mon, 15 Jul 1996 16:04:31 -0500 (CDT)
Received: (from zosel@localhost) by coral.llnl.gov (8.7.5/8.7.3/LLNL-Jun96) id OAA26289 for hpff-doc@cs.rice.edu; Mon, 15 Jul 1996 14:04:29 -0700 (PDT)
Date: Mon, 15 Jul 1996 14:04:29 -0700 (PDT)
From: Mary E Zosel <zosel@coral.llnl.gov>
Message-Id: <199607152104.OAA26289@coral.llnl.gov>
To: hpff-doc@cs.rice.edu
Subject: hpff-doc: phone/fax
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-doc
Precedence: bulk

---------------------------------------------------------------------------
hpff-doc@cs.rice.edu is a mailing list for HPF 2.0 language specification
authors and editors.  Instructions for adding or deleting yourself from this
list appear at the bottom of this message.
---------------------------------------------------------------------------
Here's the list with phone and fax and email for use in exchanging
document comments.  Have at it.
   -mary-

-------
Robert Babb	U. of Denver		babb@cs.du.edu
	303 871 2460  /  3010
Siegfried Benkner	Univ. of Vienna	sigi@par.univie.ac.at
	(+43) 1 3105608-76  /  (fax)
Bob Boland	LANL			wrb@lanl.gov
	505-667-1729   / 665-6333
Ken Kennedy	Rice U./CRPC		kwk@rice.edu
	713-794-0358   / 0374
Charles Koelbel	Rice U.			chk@cs.rice.edu
	713 - 285 - 5304  / 5136
David Loveman	Digital			loveman@msbcs.enet.dec.com
	508 - 493-3704  / 3608
Larry Meadows	The Portland Group	lfm@pgroup.com
	503-682-2806 / 2637
Piyush Mehrotra	ICASE			pm@icase.edu
	757-864-2188 / 6134
Andy Meltzer	CRI/SGI			meltzer@cray.com
	612-683-5266 / tbd
Carol Munroe	Thinking Machines	munroe@think.com
	617-276-0400 ext 5512 / tbd 
Carl Offner	Digital			offner@hpc.pko.dec.com
	508 493-3051  / tbd
P. Sadayappan	Ohio State University	saday@cis.ohio-state.edu
	614-292-0053 / 2911
Rob Schreiber	HP			schreiber@hplpp3.hpl.hp.com
	415-857-8156 / 8508
Jaspal Subhlok	Carnegie Mellon		jass@cs.cmu.edu
	412 268-7893 / 5576
Arun Venkatachar	Louisiana St. Univ.	arun@ee.lsu.edu
	504 - 383-0424 / 5200
Henry Zongaro	IBM Canada		zongaro@vnet.ibm.com
	416 448 - 6044 / 4414
Mary Zosel	LLNL			zosel@llnl.gov	
	510 - 422 - 4002 / 423-8704
---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-doc-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-doc  Wed Jul 17 11:43:17 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id LAA11841 for hpff-doc-out; Wed, 17 Jul 1996 11:43:17 -0500 (CDT)
Received: from mail.think.com (Mail1.Think.COM [131.239.33.245]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id LAA11831 for <hpff-doc@cs.rice.edu>; Wed, 17 Jul 1996 11:43:13 -0500 (CDT)
From: munroe@think.com
Received: from gandalf.think.com (Gandalf.Think.COM [131.239.146.104])
    by mail.think.com (8.7.5/m2) with SMTP id MAA24939;
    Wed, 17 Jul 1996 12:43:11 -0400 (EDT)
Received: by gandalf.think.com (4.1/Think-1.3)
	id AA17741; Wed, 17 Jul 96 12:43:10 EDT
Date: Wed, 17 Jul 96 12:43:10 EDT
Message-Id: <9607171643.AA17741@gandalf.think.com>
To: schreibr@hpl.hp.com
Cc: hpff-doc@cs.rice.edu
Subject: hpff-doc: Comments on Library
Sender: owner-hpff-doc
Precedence: bulk


Rob,

These are the suggested corrections or changes that I didn't get
to hand to you in hard copy before you left the meeting last week.
I'm using the same page numbers as the copy distributed there.

In what follows I suggest changes to GRADE_UP/DOWN to reflect the decision
that it should return one-based values. I also indicate where DYNAMIC
could be deleted from the minimal, standard version of the inquiry
functions, to reflect the changed status of the DYNAMIC attribute as an
approved extension. None of the other suggested changes are substantive
ones.

I have one general question on terminology that is used throughout the current
HPF specifications.  Why is PRODUCT(SHAPE(...)) used instead of SIZE(...)?
Should I avoid using SIZE too in the descriptions of SORT_UP/DOWN? It just
seems bulkier to me, but I'm perfectly willing to be stylistically
consistent; I just wondered what the technical reason for this might be. I
guess you can say PRODUCT(SHAPE(<scalar>)) is 1, since SHAPE(<scalar>) is a
rank 1, size 0 array, while SIZE(<scalar>) is undefined. That's the only
difference I can think of, and the contexts where I used SIZE were never
scalar.
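
The scalar case can be checked in plain Fortran 90. This is only a sketch:
the program and variable names are made up for illustration and do not
come from the draft.

```fortran
program shape_vs_size
  implicit none
  real :: s          ! a scalar (hypothetical example variable)
  real :: a(2, 3)    ! a rank-2 array with 6 elements

  s = 0.0
  a = 0.0

  ! SHAPE of a scalar is a rank-one array of size zero, and the
  ! PRODUCT of a zero-size array is 1.
  print *, size(shape(s))               ! prints 0
  print *, product(shape(s))            ! prints 1

  ! For an array argument the two forms give the same element count.
  print *, product(shape(a)), size(a)   ! prints 6 and 6

  ! SIZE(s) is not written here: SIZE requires an array argument,
  ! so a conforming compiler rejects it.
end program shape_vs_size
```

For any non-scalar argument the two spellings agree, so the choice between
them is stylistic wherever the argument is known to be an array.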

--Carol

**************
Sect. 5.2, p. 88
l. 4 from b.: 
    replace
	"of he system inquiry functions NUMBER_OF_PROCESORS and"
    with 
	"of the system inquiry functions NUMBER_OF_PROCESSORS or"

**************
Sect. 5.4.3 Array Reduction Functions, p. 90
ll. 18-19 refer to "binary operations IAND, IOR, IEOR, and .NEQV."
and then talk about "XXX reduction...where XXX is one of the binary
operators above." For consistency, then, I recommend
l. 4 from b.:
    replace
	"identity element for PARITY is .FALSE."
    with
	"identity element for .NEQV. is .FALSE."
(although the meaning is clear).

**************
Sect. 5.4.4 Array Combining Scatter Functions, p. 91

ll. 5-6 read
	"These functions all have the form 
	XXX_SCATTER(ARRAY, BASE, INDX1, ..., INDXn, MASK)"

Since the following paragraphs clarify that the actual argument lists
vary somewhat, depending on the operation XXX, it would be more accurate
to replace this with something like
	"These functions have the general form ... except in
	the special cases noted below"

**************
Sect. 5.4.5 Array Prefix and Suffix Functions, p. 92

ll. 19-21 read
	"The functions all have the form 
	XXX_PREFIX(ARRAY, DIM, MASK, SEGMENT, EXCLUSIVE)
	XXX_SUFFIX(ARRAY, DIM, MASK, SEGMENT, EXCLUSIVE)"

Just as above, it would be more accurate to replace this with something like
	"These functions have the general form ... except in
	the special cases noted below."

**************
Sect. 5.5.3 Mapping inquiry subroutines, p. 97
ll. 6-12 from b.
	I suggest reordering HPF_DISTRIBUTION and HPF_TEMPLATE
	alphabetically for consistency with rest of list.

For the same reason, I'd also reorder Section 5.7.16 and 5.7.17 on 
pp. 115-119, since HPF_DISTRIBUTION and HPF_TEMPLATE are the only two 
routines not in correct alphabetical order in pp. 100-141, and it makes 
them more awkward to look up.

**************
Sect. 5.5.6 Array combining scatter functions, p. 98
ll. 15-16 
    Replace
	"COUNT_SCATTER(ARRAY, BASE, INDX1, ..., INDXn, MASK)"
    with
	"COUNT_SCATTER(MASK, BASE, INDX1, ..., INDXn)"
 
ll. 23-24 Delete redundant spec. for IALL_SCATTER (following IPARITY...)

**************
Sect. 5.7.7 COPY_PREFIX, p. 107
l. 3 from b.
    Optionally, one could add a comment to the effect that
	"[by the rules stated in Section 5.4.5].  Note that this set of values
	is never empty."

**************
Sect. 5.7.9 COPY_SUFFIX, p. 109
l. 8
    Optionally, one could add a comment to the effect that
	"[by the rules stated in Section 5.4.5]. Note that this set of values
	is never empty."

**************
Sect. 5.7.13 GRADE_DOWN(ARRAY,DIM), p. 111
ll. 3-4 
    Suggested replacement for
	"Description. Produces a permutation of the indices of an array..."
    is
	"Description. Produces a permutation of the indices of an array,
	expressed as one-based coordinates,..."

l. 7
    Add to "must be of type integer, real, or character." the note
	"It must not be scalar."  (I think this is necessary for 
	consistency with the description "...the indices of an array.")

l. 16 (Case (i))
    Replace
	"S = GRADE_DOWN(ARRAY)"
    with
	"S = GRADE_DOWN(ARRAY)+SPREAD(LBOUND(ARRAY),DIM=2,NCOPIES=SIZE(ARRAY))-1"
    (Check this! It's longish but based on the way MAXLOC/MINLOC are described.)

    And, if there's some good reason for the lengthier form, "SIZE(ARRAY)"
    could be replaced throughout by the even longer "PRODUCT(SHAPE(ARRAY))" 
    (or vice versa, if there isn't).

l. 18 (Case (i), cont.)
    Question:  Why is "SIZE(B,1)" used instead of "SIZE(B)"? 
    (They are equivalent, so I guess it's another stylistic question,
     unless SIZE(B,1) is considered easier to compute or to understand?
     I just think it's clearer without the dimension argument)

l. 25 (Case (ii))
    Replace
	"S = GRADE_DOWN(ARRAY,DIM=K)"
    with
	"S = GRADE_DOWN(ARRAY,DIM=K) + LBOUND(ARRAY,DIM=K) - 1"

ll. 29-30 
    It is still correct, with the above indicated minimal change, to say that
	"R(i1,i2,...,:,...,in) is a permutation of all the integers in
	the range LBOUND(ARRAY,K):UBOUND(ARRAY,K)." 
    But you may want to introduce another variable above, say 
	G = GRADE_DOWN(ARRAY,DIM=K), with R = G + LBOUND(ARRAY,DIM=K) - 1, 
    and substitute, for greater clarity and directness, that
	"G(i1,i2,...,:,...,in) is a permutation of all the integers
	in the range 1:SIZE(ARRAY,K)."

**************
Sect. 5.7.14 GRADE_UP(ARRAY,DIM), p. 112
	IDENTICAL TO CHANGES FOR GRADE_DOWN (except for line and page numbers)
ll. 9-10
    Suggested replacement for
	"Description. Produces a permutation of the indices of an array..."
    is
	"Description. Produces a permutation of the indices of an array,
	expressed as one-based coordinates,..."

l. 13
    Add to "must be of type integer, real, or character." the note
	"It must not be scalar."  (I think this is necessary for 
	consistency with the description "...the indices of an array.")

l. 22 (Case (i))
    Replace
	"S = GRADE_UP(ARRAY)"
    with
	"S = GRADE_UP(ARRAY)+SPREAD(LBOUND(ARRAY),DIM=2,NCOPIES=SIZE(ARRAY))-1"
    (Check this! It's roundabout but based on the way MAXLOC/MINLOC are described.)

    And, if there's some good reason for the lengthier form, "SIZE(ARRAY)"
    could be replaced throughout by the wordier "PRODUCT(SHAPE(ARRAY))" 
    (or vice versa, if there isn't).

l. 24 (Case (i), cont.)
    Question:  Why is "SIZE(B,1)" used instead of "SIZE(B)"? 
    (They are equivalent, so I guess it's another stylistic question,
     unless SIZE(B,1) is considered easier to compute or to understand?
     I just think it's clearer without the dimension argument)

l. 31 (Case (ii))
    Replace
	"S = GRADE_UP(ARRAY,DIM=K)"
    with
	"S = GRADE_UP(ARRAY,DIM=K)+LBOUND(ARRAY,DIM=K)-1"

ll. 35-36 
    It is still correct, with the above indicated minimal change, to say that
	"R(i1,i2,...,:,...,in) is a permutation of all the integers in
	the range LBOUND(ARRAY,K):UBOUND(ARRAY,K)." 
    But you may want to introduce another variable above, say 
	G = GRADE_UP(ARRAY,DIM=K), with R = G + LBOUND(ARRAY,DIM=K) - 1, 
    and substitute, for greater clarity and directness, that
	"G(i1,i2,...,:,...,in) is a permutation of all the integers
	in the range 1:SIZE(ARRAY,K)."

**************
Sect. 5.7.15, HPF_ALIGNMENT, pp. 113-115
l.17 on p. 113, ll. 7-11 on p. 114, and table in middle of p. 115
    Remove "DYNAMIC" argument from this description, and add to
    "Approved extensions" version instead.

**************
Sect. 5.7.16, HPF_TEMPLATE, pp. 115-117
l. 3 from b.
    Close up space in "AXIS_ INFO" to "AXIS_INFO".

l. 3 from b. on p. 115, ll. 13-15 from b. plus bottom line (table) on p. 117
    Remove "DYNAMIC" argument from list of args, description of args, and
    table of results. Add to "Approved extensions" version instead.

**************
Sect. 5.7.17, HPF_DISTRIBUTION, p. 118
l. 2
    Close up space in "PROCESSORS_ RANK" to "PROCESSORS_RANK".

**************
Sect. 5.7.46 SORT_DOWN, pp. 137-138
l. 7 from b. on p. 137
    Add to "must be of type integer, real, or character." the note
	"It must not be scalar."  

ll. 1-12 on p. 138
    Using fewer blanks around arguments would be more consistent with
	the surrounding style.
    Also, if PRODUCT(SHAPE(ARRAY)) is really preferred to SIZE(ARRAY),
	that replacement could be made here.
	
************** 
Sect. 5.7.47 SORT_UP, pp. 138-139
bottom line of p. 138
    Add to "must be of type integer, real, or character." the note
	"It must not be scalar."  

ll. 8-18 on p. 139
    Using fewer blanks around arguments would be more consistent with
	the surrounding style.
    Also, if PRODUCT(SHAPE(ARRAY)) is really preferred to SIZE(ARRAY),
	that replacement could be made here.
			

From owner-hpff-doc  Thu Jul 18 18:28:23 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id SAA26451 for hpff-doc-out; Thu, 18 Jul 1996 18:28:23 -0500 (CDT)
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id SAA26446 for <hpff-doc@cs.rice.edu>; Thu, 18 Jul 1996 18:28:20 -0500 (CDT)
Received: from hplrss.hpl.hp.com by hplms26.hpl.hp.com with ESMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1S) id AA210792498; Thu, 18 Jul 1996 16:28:18 -0700
Received: by hplrss.hpl.hp.com
	(1.37.109.16/15.5+ECS 3.3+HPL1.1) id AA132162548; Thu, 18 Jul 1996 16:29:08 -0700
Date: Thu, 18 Jul 1996 16:29:08 -0700
From: Rob Schreiber <schreibr@hplrss.hpl.hp.com>
Message-Id: <199607182329.AA132162548@hplrss.hpl.hp.com>
To: hpff-doc@cs.rice.edu
Subject: hpff-doc: comments on acknowledgements
Sender: owner-hpff-doc
Precedence: bulk


For Mary and Chuck.

Do we need the whole history of HPF from day 1?   I.D.T.S.

How about dropping the sections on HPF1.0 and HPFF94?

From owner-hpff-doc  Thu Jul 18 18:38:34 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id SAA26559 for hpff-doc-out; Thu, 18 Jul 1996 18:38:34 -0500 (CDT)
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id SAA26553 for <hpff-doc@cs.rice.edu>; Thu, 18 Jul 1996 18:38:30 -0500 (CDT)
Received: from hplrss.hpl.hp.com by hplms26.hpl.hp.com with ESMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1S) id AA212883107; Thu, 18 Jul 1996 16:38:27 -0700
Received: by hplrss.hpl.hp.com
	(1.37.109.16/15.5+ECS 3.3+HPL1.1) id AA132223157; Thu, 18 Jul 1996 16:39:17 -0700
Date: Thu, 18 Jul 1996 16:39:17 -0700
From: Rob Schreiber <schreibr@hplrss.hpl.hp.com>
Message-Id: <199607182339.AA132223157@hplrss.hpl.hp.com>
To: hpff-doc@cs.rice.edu
Subject: hpff-doc: comments on overview
Sender: owner-hpff-doc
Precedence: bulk


section 1.4.2:  Rewrite as:

To enable programmers to indicate additional opportunities for parallel execution,
HPF provides an {\tt INDEPENDENT} directive.    When applied to a {\tt DO} loop or a {\tt
FORALL} construct, it asserts that the loop iterations do not exhibit any sequentializing
dependences.
Reductions are now allowed in an independent {\tt DO} loop.


section 1.4.3:   add "and array mapping inquiry subroutines" on line 5 of the section.


section 1.4.4:   Systolic computations are a bad example of the use of extrinsics, as
they are usually data parallel and can be expressed in HPF.

Sparse matrix computations are a better example.   They require genuinely asynchronous,
independent computation on the processors.


page 12:  second bullet:  "and the HPF library functions."





From owner-hpff-doc  Thu Jul 18 18:40:48 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id SAA26598 for hpff-doc-out; Thu, 18 Jul 1996 18:40:48 -0500 (CDT)
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id SAA26592 for <hpff-doc@cs.rice.edu>; Thu, 18 Jul 1996 18:40:43 -0500 (CDT)
Received: from hplrss.hpl.hp.com by hplms26.hpl.hp.com with ESMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1S) id AA213453242; Thu, 18 Jul 1996 16:40:43 -0700
Received: by hplrss.hpl.hp.com
	(1.37.109.16/15.5+ECS 3.3+HPL1.1) id AA132273293; Thu, 18 Jul 1996 16:41:33 -0700
Date: Thu, 18 Jul 1996 16:41:33 -0700
From: Rob Schreiber <schreibr@hplrss.hpl.hp.com>
Message-Id: <199607182341.AA132273293@hplrss.hpl.hp.com>
To: hpff-doc@cs.rice.edu
Subject: hpff-doc: Comments on Mapping
Sender: owner-hpff-doc
Precedence: bulk

For Carl.

In section 2.8, there are "hard" links to the OLD Fortran standard.   Let's
take all these out.


References to FORTRAN 77 should be to Fortran (without any kind type parameter.)



From owner-hpff-doc  Thu Jul 18 18:50:04 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id SAA26822 for hpff-doc-out; Thu, 18 Jul 1996 18:50:04 -0500 (CDT)
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id SAA26815 for <hpff-doc@cs.rice.edu>; Thu, 18 Jul 1996 18:50:02 -0500 (CDT)
Received: from hplrss.hpl.hp.com by hplms26.hpl.hp.com with ESMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1S) id AA214993801; Thu, 18 Jul 1996 16:50:01 -0700
Received: by hplrss.hpl.hp.com
	(1.37.109.16/15.5+ECS 3.3+HPL1.1) id AA132323851; Thu, 18 Jul 1996 16:50:51 -0700
Date: Thu, 18 Jul 1996 16:50:51 -0700
From: Rob Schreiber <schreibr@hplrss.hpl.hp.com>
Message-Id: <199607182350.AA132323851@hplrss.hpl.hp.com>
To: hpff-doc@cs.rice.edu
Subject: hpff-doc: Comments on Mapping in Subprogram Calls
Sender: owner-hpff-doc
Precedence: bulk


It is no longer possible to write "combinations of mapping directives
that are partly transcriptive and partly descriptive" (or is it?)

Suggested rewordings in Section 3.5 on explicit interfaces:

item 3.  For every pair of corresponding actual and dummy arguments, either:

   (a) They are both implicitly mapped, or
   (b) They are both .... (as it is now).

item 4.   Same reconstruction.

I think this is easier to parse, absorb, and understand.

Page 66, in advice to users:

Note that an explicit interface (e.i.) can be provided in three ways.

   1.   Module procedures have an e.i.
   2.   Contained subprograms have an e.i.
   3.   An e.i. may be provided in an interface block.



The parenthetical comment in point 1 at the top of page 67
"(Actually, in that example, .....)"
can be omitted -- it is distracting.

p.71    It is not necessary to put place names in quotes, no matter
HOW long or unpronounceable they are.   Looks funny to do so.




From owner-hpff-doc  Thu Jul 18 18:55:28 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id SAA26919 for hpff-doc-out; Thu, 18 Jul 1996 18:55:28 -0500 (CDT)
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id SAA26914 for <hpff-doc@cs.rice.edu>; Thu, 18 Jul 1996 18:55:26 -0500 (CDT)
Received: from hplrss.hpl.hp.com by hplms26.hpl.hp.com with ESMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1S) id AA215974119; Thu, 18 Jul 1996 16:55:20 -0700
Received: by hplrss.hpl.hp.com
	(1.37.109.16/15.5+ECS 3.3+HPL1.1) id AA132364164; Thu, 18 Jul 1996 16:56:04 -0700
Date: Thu, 18 Jul 1996 16:56:04 -0700
From: Rob Schreiber <schreibr@hplrss.hpl.hp.com>
Message-Id: <199607182356.AA132364164@hplrss.hpl.hp.com>
To: hpff-doc@cs.rice.edu
Subject: hpff-doc: Comments on Portable/Efficient
Sender: owner-hpff-doc
Precedence: bulk


Hi Andy!

Check for use of tt font for DYNAMIC and other HPF keywords.

The REDUCE directive no longer exists.   It's the REDUCTION clause
in the INDEPENDENT directive.


From owner-hpff-doc  Thu Jul 18 18:59:50 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id SAA27032 for hpff-doc-out; Thu, 18 Jul 1996 18:59:50 -0500 (CDT)
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id SAA27015 for <hpff-doc@cs.rice.edu>; Thu, 18 Jul 1996 18:59:47 -0500 (CDT)
Received: from hplrss.hpl.hp.com by hplms26.hpl.hp.com with ESMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1S) id AA216864374; Thu, 18 Jul 1996 16:59:34 -0700
Received: by hplrss.hpl.hp.com
	(1.37.109.16/15.5+ECS 3.3+HPL1.1) id AA132424424; Thu, 18 Jul 1996 17:00:24 -0700
Date: Thu, 18 Jul 1996 17:00:24 -0700
From: Rob Schreiber <schreibr@hplrss.hpl.hp.com>
Message-Id: <199607190000.AA132424424@hplrss.hpl.hp.com>
To: hpff-doc@cs.rice.edu
Subject: hpff-doc: Comments on Extended Parallelism
Sender: owner-hpff-doc
Precedence: bulk


Jass,

Just to be sure, I think we want the following declarations in
the fft example:

!hpf$ processors procs(8)
!hpf$ template, dimension(4), distribute(block) onto procs(1:4) :: t
!hpf$ align with t(*) :: done1

This is an HPF-conforming way to get replication onto a processor subset.


Rob



From owner-hpff-doc  Mon Jul 22 12:31:55 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id MAA03114 for hpff-doc-out; Mon, 22 Jul 1996 12:31:55 -0500 (CDT)
Received: from mail.think.com (Mail1.Think.COM [131.239.33.245]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id MAA03108 for <hpff-doc@cs.rice.edu>; Mon, 22 Jul 1996 12:31:49 -0500 (CDT)
From: munroe@think.com
Received: from gandalf.think.com (Gandalf.Think.COM [131.239.146.104])
    by mail.think.com (8.7.5/m3) with SMTP id NAA19806;
    Mon, 22 Jul 1996 13:31:42 -0400 (EDT)
Received: by gandalf.think.com (4.1/Think-1.3)
	id AA27530; Mon, 22 Jul 96 13:31:41 EDT
Date: Mon, 22 Jul 96 13:31:41 EDT
Message-Id: <9607221731.AA27530@gandalf.think.com>
To: schreibr@hplrss.hpl.hp.com
Cc: hpff-doc@cs.rice.edu
In-Reply-To: Rob Schreiber's message of Thu, 18 Jul 1996 16:50:51 -0700 <199607182350.AA132323851@hplrss.hpl.hp.com>
Subject: hpff-doc: Comments on Mapping in Subprogram Calls
Sender: owner-hpff-doc
Precedence: bulk

   Date: Thu, 18 Jul 1996 16:50:51 -0700
   From: Rob Schreiber <schreibr@hplrss.hpl.hp.com>


   It is no longer possible to write "combinations of mapping directives
   that are partly transcriptive and partly descriptive" (or is it?)

What about examples like this?

	!hpf$ distribute * onto *square :: d1
	!hpf$ distribute *(block,block) onto * :: d2

These look like still legal combinations of transcriptive and descriptive
to me (not that I'm championing their preservation), though within a 
single directive. 

(Now that you can't combine INHERIT and DISTRIBUTE, I don't think you
could have a multiple directive mixture as described above.)


From owner-hpff-doc  Tue Jul 23 15:48:16 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id PAA12636 for hpff-doc-out; Tue, 23 Jul 1996 15:48:16 -0500 (CDT)
Received: from N2.SP.CS.CMU.EDU (N2.SP.CS.CMU.EDU [128.2.250.82]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id PAA12627 for <hpff-doc@CS.RICE.EDU>; Tue, 23 Jul 1996 15:48:12 -0500 (CDT)
From: Jaspal.Subhlok@N2.SP.CS.CMU.EDU
Message-Id: <199607232048.PAA12627@cs.rice.edu>
Date: Tue, 23 Jul 96 16:33:30 EDT
To: hpff-doc@cs.rice.edu
Subject: hpff-doc: Expanded section/chapter on task parallelism
Sender: owner-hpff-doc
Precedence: bulk

Chuck, Rob and others:

I am appending the expanded section on task parallelism. It is formatted
as a chapter. Chuck - could you integrate it into the full chapter on
parallelism extensions, perhaps after you are done revising the ON. Also
I am counting on you or Rob to fill in the BNF (search for "CHUCK" for 
the location). 

jas

ps: It appears that there are a lot of macros described in the README
file (e.g. MAKE-HPF-CHAPTER) that I could not find at the ftp site.
This caused me a lot of confusion.

--------------------------------------------------------

% File: parallel-ext.tex

% Contents:
% Approved Extensions for data-parallel computation for HPF 2.0
% document, including
%	ON
%	TASK
%	RESIDENT

% Revision history:
% May-10-96	Created by Charles Koelbel, Rice University
%		(from HPF 2.0 proposals)


\chapter{Approved Extensions for Data and Task Parallelism}
\label{ch-parallel-ext}

{\em
Comments on this chapter should be directed to 
Chuck Koelbel ({\tt chk@cs.rice.edu}),
Jaspal Subhlok ({\tt jass@cs.cmu.edu}),
and {\tt hpff-doc@cs.rice.edu}.
Please use ``{\tt Comments on Extended Parallelism}'' as the {\tt Subject:}
line.
\par
}

\section{The TASK\_REGION Directive}
\label{sec-tasking}

Task parallelism is expressed implicitly in HPF by mapping data 
objects onto subsets of processors and adding assertions that
allow concurrent execution of different  code blocks on different
processor subsets. A data object is mapped to a processor subset
by distribution onto a 
subsection of a processor arrangement.
Execution on a subset of processors is specified by using an 
{\tt ON} directive. This section introduces a {\tt TASK\_REGION}
directive that allows the user to implicitly specify that
disjoint processor subsets can execute blocks of code concurrently.
  
A {\tt TASK\_REGION} construct is used to assert that a block of code 
satisfies the following set of constraints.
 All lexically outermost {\tt ON} blocks
inside a task region must have a {\tt RESIDENT} attribute
implying that all  data accessed inside them is mapped to the
corresponding processor subset. Further,
the code inside two such {\tt ON} blocks must not have interfering
I/O. Under these constraints, two such {\tt ON} blocks  
 can safely execute concurrently if they execute on disjoint
processor subsets.

\subsection{Syntax of the TASK\_REGION Directive}

[CHUCK : Could  you replace this with a BNF - I suspect
 it will take  you 5 mins and will take me much longer]


A task region is a single entry region delimited by:
\CODE
!HPF$ TASK_REGION
  ......
!HPF$ END TASK_REGION
\EDOC


\subsection{Semantics of the TASK\_REGION Directive}

The task region directive is a way for the programmer to assert
that a section of code satisfies a set of conditions. The compiler
is expected to use these assertions to generate task parallel code.
We will refer to a block of code enclosed by a 
{\tt TASK\_REGION ... END TASK\_REGION} pair as a ``task region''.

A task region can contain blocks of code that are directed to execute {\tt ON} 
processor subsets.  All other code executes on a subset that 
contains all executing processors.  Every {\tt ON} block at the outermost
nesting level (i.e., not inside another {\tt ON} block or another task 
region) inside a 
task region is defined as a ``lexical task''.  Every execution instance of 
a lexical task is defined as an ``execution task'' and will also be
referred to
as just ``task'' when the distinction is clear from the context.

The following restrictions must hold  inside a task region:  
\begin{itemize}
\item Every {\tt ON} block corresponding to a lexical task must
have the {\tt RESIDENT} attribute.
This means that, for  reading a variable inside an execution task,
the executing processor subset for the task 
(also referred to as {\em active processors}) 
must own at least one copy of the variable, and for writing, they must own 
all copies of that variable.

\item An I/O operation inside an
execution task may interfere with an I/O operation inside
another execution task if and only if the two tasks execute on
identical subsets of  processors. Note that  two execution tasks
can be  instances of the same or different lexical tasks.
In general, two I/O operations interfere if they access the same
file or unit. The conditions for interference of I/O operations are
detailed in Section~\ref{sub-independent} in the context of the
{\tt INDEPENDENT} directive.

\end{itemize}

\subsection{Execution Model and Usage}

A task region does not introduce a fundamentally new execution model.  
However, the assertions implicit in a task region imply that only
the specified executing processors of an execution task need to
participate in its execution, and that other processors can skip its
execution.  A processor executing a task 
region participates in the execution of all tasks
executing on a processor subset that it belongs to and
does not participate in the  execution of tasks executing 
on processor subsets that it does not belong to. 
Code outside lexical tasks is 
executed as normal data parallel code by all processors.  The access 
restrictions for a task region guarantee that the results obtained
by this execution paradigm will be consistent with pure data
parallel execution of a task region.

A task region presents a simple yet powerful model to write integrated
task and data parallel programs. We illustrate three basic computation
structures in which task parallelism can be effectively exploited
with this model.

\begin{enumerate}

\item {\em Parallel sections:} 
A task region can be used to divide the available processors into 
disjoint sets for performing independent computations, simulating
what is often referred to as {\em parallel sections}. This form of
task parallelism is relatively straightforward and useful in many
application scenarios, an example being multiblock applications.
The task region simply contains
a sequence of {\tt RESIDENT ON} blocks on disjoint processor subsets. 
Note that the division of processors among subsets can be dynamic,
that is, it can be in terms of other variables computed during execution.

\item {\em Nested parallelism:}
Task regions can be nested, and in particular, a subroutine call made
from an execution task can further subdivide the executing (or active)
processors using another task region directive. This allows the exploitation
of nested parallelism. An example is the implementation of
dynamic tree structured divide and conquer computations. As a 
specific example,
{\em quicksort} can be implemented by recursively 
partitioning the input array of keys 
around a pivot and assigning a proportionate number of processors
to each of the two new arrays obtained as a result of partitioning.

\item {\em  Data parallel pipelines:}
Task regions can be used to implement 
pipelined data parallel computations. We will
illustrate this with a 2 dimensional fast Fourier transform (2D FFT)
computation. The first stage of a 2D FFT reads a two dimensional matrix
and performs a 1 dimensional FFT on each row of the matrix. The second
stage performs a 1 dimensional FFT on each column of the matrix and generates
the final output. In a pipelined data parallel implementation of this
form of 2D FFT, the two stages are mapped onto disjoint subsets of processors.
Task and data parallel code for a 2D FFT, along with a brief
description, is included in Section~\ref{sec:2DFFT}.

\end{enumerate}
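
The parallel sections structure described in the first item can be sketched
as follows. This fragment is illustrative only and is not part of the
proposal text: the array names {\tt u} and {\tt v}, the routine {\tt relax},
and the arrangement of eight processors into two groups of four are
assumptions. The directives use the {\tt TASK\_REGION} spelling of the
subsection headings and the {\tt ON} syntax of the 2-D FFT example.

\CODE
      PROGRAM sections_sketch
      INTEGER, PARAMETER :: n = 1000
      REAL, DIMENSION(n) :: u, v
!HPF$ PROCESSORS procs(8)
!HPF$ DISTRIBUTE u(BLOCK) ONTO procs(1:4)
!HPF$ DISTRIBUTE v(BLOCK) ONTO procs(5:8)

      u = 1.0
      v = 2.0

!HPF$ TASK_REGION
!     Lexical task 1: accesses only u, which is mapped to procs(1:4)
!HPF$ ON HOME(procs(1:4)) BEGIN, RESIDENT
      CALL relax(u)
!HPF$ END ON
!     Lexical task 2: accesses only v, which is mapped to procs(5:8)
!HPF$ ON HOME(procs(5:8)) BEGIN, RESIDENT
      CALL relax(v)
!HPF$ END ON
!HPF$ END TASK_REGION

      CONTAINS
        SUBROUTINE relax(x)
        REAL, DIMENSION(:) :: x
        INTEGER m
!       Three-point average of the interior elements
        m = SIZE(x)
        x(2:m-1) = (x(1:m-2) + x(2:m-1) + x(3:m)) / 3.0
        END SUBROUTINE relax
      END PROGRAM sections_sketch
\EDOC

Because the two {\tt ON} blocks name disjoint processor subsets and each
accesses only data mapped to its own subset, the two calls to {\tt relax}
may proceed concurrently.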

\subsection{Implementation}
A task region is simply an assertion about a code block and the
exploitation of task parallelism is at least partially dependent
on the compilation scheme. While the specifics of how task parallelism
is exploited will be strongly dependent on the parallel system
architecture, the compiler, and the  underlying communication 
model, we will point out some  important considerations and
illustrate task parallel code generation with an example. We primarily
address distributed memory machines using a  message passing 
communication and synchronization model, but will point out
some of the important issues relating to shared memory implementations.


\subsubsection{Localized computation and communication}
It is of central importance that computation and communication
inside an executing task should not  involve any processors
other than those directed to execute the task in the
relevant {\tt ON} clause.

On entry to a lexical task, the compiler has to insert checks so
that the processors not in the relevant subset jump to the code
following the task. Since an execution task cannot access data
outside of the active executing processor subset, no communication
needs to be generated between the relevant executing processors
and other processors. In a message passing model, a communication
generation algorithm that only generates necessary messages will
naturally achieve the desired results. However, some communication
schemes can involve generation of empty messages between processors
that do not communicate and it is important to ensure that empty
messages are not generated between active processors of an executing
task and other processors.

A communication model that uses barriers for synchronization (in
shared or distributed memory machines) must ensure that all barriers
inside an executing task are subset barriers that only span the 
active processors. An implementation may also need to include a
subset barrier on entry to and exit from an executing task for consistency
of data accesses inside and outside an executing task. In general,
the compilation framework has to ensure the consistency of data
accesses inside and outside an executing task, and this can be done
in the context of virtually any synchronization scheme in a shared
or distributed memory environment.
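
One way to obtain subset barriers in a message passing runtime is to give
each processor subset its own communicator. The following sketch assumes an
MPI-based implementation, which this proposal does not prescribe. The split
of the ranks into ranks 0 to 3 and all remaining ranks is likewise an
assumption made for illustration.

\CODE
      PROGRAM subset_barrier_sketch
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER ierr, rank, color, subset

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

!     Assumed layout: ranks 0-3 execute one task, all others the second.
      IF (rank .LT. 4) THEN
         color = 1
      ELSE
         color = 2
      ENDIF
!     Each subset gets a communicator containing only its own ranks.
      CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, color, rank, subset, ierr)

!     Subset barrier on task entry, task body, subset barrier on exit.
      CALL MPI_BARRIER(subset, ierr)
!     ... task body would go here ...
      CALL MPI_BARRIER(subset, ierr)

      CALL MPI_COMM_FREE(subset, ierr)
      CALL MPI_FINALIZE(ierr)
      END PROGRAM subset_barrier_sketch
\EDOC

A barrier on {\tt subset} involves only the processors that share a color,
so the two groups never wait for each other inside their tasks.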



\subsubsection{Replicated computations}
All computations exclusively involving replicated variables should be 
replicated on all executing processors.
A simple alternative is that  one processor performs the
computation and broadcasts the results to all processors. While such
replication
is generally profitable in HPF anyway, it has additional importance
in a task region since the communication generated by a broadcast can
cause additional synchronization that may interfere with
task parallelism.

\subsubsection{Implications for I/O}
In some parallel system implementations, I/O is performed through
a single node of the system. Task parallelism in the presence of 
I/O assumes that all nodes can perform I/O independently and this
paradigm has to be supported, although it is not necessary that 
each node be able to physically perform all I/O operations independently.
One simple solution is to have a single
designated I/O processor that performs
all I/O but is not considered an executing processor and hence does 
not have any execution related dependences.

\subsubsection{SPMD or MIMD code generation}
Another issue for the compiler is whether or not the same code image
should execute on all processors. Since different processor groups 
may need different variables, a naive SPMD implementation is likely
to be wasteful of memory since it must allocate all variables on
all processors. This can be addressed by dynamic memory allocation,
but at the cost of added complexity. Using different code images for
different processor subsets is another solution which leads to
significant added complexity.
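
The dynamic allocation approach can be sketched in Fortran 90 as follows.
This is a minimal illustration under stated assumptions: the flags
{\tt inset1} and {\tt inset2} stand for a hypothetical runtime query that
tells a processor which subset it belongs to, and the local block shapes
assume four processors per subset with one array divided by rows and the
other by columns.

\CODE
      PROGRAM alloc_sketch
      IMPLICIT NONE
      INTEGER, PARAMETER :: n = 64
      REAL, ALLOCATABLE :: a1(:,:), a2(:,:)
      LOGICAL inset1, inset2

!     Assumption: a runtime query would set these flags per processor.
!     They are fixed here so that the sketch is self-contained.
      inset1 = .TRUE.
      inset2 = .FALSE.

!     Allocate only the local block of the array this subset owns.
      IF (inset1) ALLOCATE(a1(n/4, n))
      IF (inset2) ALLOCATE(a2(n, n/4))

      IF (ALLOCATED(a1)) a1 = 0.0
      IF (ALLOCATED(a2)) a2 = 0.0

      IF (ALLOCATED(a1)) DEALLOCATE(a1)
      IF (ALLOCATED(a2)) DEALLOCATE(a2)
      END PROGRAM alloc_sketch
\EDOC

Every processor still runs the same code image, but storage is created only
for the variables that its subset references.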

\subsection{Example: 2-D FFT}
\label{sec:2DFFT}

This section shows the use of task parallelism to build
a pipelined data-parallel 2-dimensional FFT and illustrates 
the compilation of task parallelism by showing
SPMD code generated from the HPF program.

The basic  sequential 2DFFT code is as follows:
\CODE
      REAL, DIMENSION(n,n) :: a1, a2

      DO WHILE(.true.)
          READ (unit = 1, end = 100) a1
          CALL rowffts(a1)
          a2 = a1
          CALL colffts(a2)
          WRITE (unit = 2) a2
          CYCLE
100       CONTINUE
          EXIT
      ENDDO
\EDOC


To write a pipelined task and data parallel 2D FFT
in HPF, the code is slightly modified and several HPF
directives are added.
First,  variables {\tt a1} and {\tt a2} are distributed onto disjoint
subsets of processors, and then a task region is used to create two
lexical tasks to do {\tt rowffts}  and {\tt colffts} on different
subsets of processors. The assignment {\tt a2 = a1} in the task region
specifies transfer of data between the tasks. A new variable {\tt done1}
is introduced to store the termination condition.
The modified code is as follows:

\CODE
        REAL, DIMENSION(n,n) :: a1, a2
        LOGICAL done1
!HPF$   PROCESSORS procs(8) 

!HPF$   DISTRIBUTE a1(block,*) ONTO procs(1:4)
!HPF$   DISTRIBUTE a2(*,block) ONTO procs(5:8)

!HPF$   TEMPLATE, DIMENSION(4), DISTRIBUTE(BLOCK) ONTO procs(1:4) :: td1
!HPF$   ALIGN WITH td1(*) :: done1
                 
!HPF$   TASK_REGION
        done1 = .false.
        DO WHILE (.true.)
!HPF$       ON HOME(procs(1:4)) BEGIN, RESIDENT
              READ (unit = 1,end=100) a1
              CALL rowffts(a1)
              GOTO 101
    100       done1 = .true.
    101       CONTINUE
!HPF$       END ON
            
            IF (done1) EXIT
            a2 = a1

!HPF$       ON HOME(procs(5:8)) BEGIN, RESIDENT
               CALL colffts(a2)
               WRITE(unit = 2) a2
!HPF$       END ON
        ENDDO
!HPF$   END TASK_REGION
\EDOC


Finally, we show simplified SPMD code generated
for each processor. We assume
a  message passing model where sends are asynchronous and nonblocking
and receives block until the data is available. We use a simple memory
model where variable declarations are identical across
all processors even though some variables will be referenced only on
subsets of the processors.
A shadow variable {\tt done1\_copy} is created by the compiler to
transfer information from processor subset 1 to processor subset 2
about termination of processing. The code is as follows:


\CODE
        REAL, DIMENSION(n/4,n) :: a1
        REAL, DIMENSION(n,n/4) :: a2
        LOGICAL done1

C       Following are compiler generated variables       
        LOGICAL done1_copy
        LOGICAL inset1, inset2

C     
C       Following magic compiler function call is to set the variables
C       inset1 and inset2 to .true. for subset 1 and subset 2 processors
C       respectively, and .false. otherwise.
C
        CALL initialize_tasksets(inset1,inset2)

C       Code for processor subset 1
        IF (inset1) THEN
           done1 = .false.
           DO WHILE (.true.)
              READ (unit = 1,end=100) a1
              CALL rowffts(a1)
              GOTO 101
    100       done1 = .true.
    101       CONTINUE
              _send(done1,procs(5:8))
              IF (done1) EXIT
              _send(a1,procs(5:8))
           ENDDO
        ENDIF

C       Code for processor subset 2
        IF (inset2) THEN
           DO WHILE(.true.)
              _receive(done1_copy,procs(1:4))
              IF (done1_copy) EXIT
              _receive(a2,procs(1:4))
              CALL colffts(a2)
              WRITE (unit = 2) a2
           ENDDO
        ENDIF
\EDOC

{\tt \_send} and {\tt \_receive} are  communication calls to
transfer variables between subsets of processors. Program execution
until the end of input is as follows.
Subset 1 processors repeatedly read input, compute
{\tt rowffts}, and send the {\tt done1} flag
(which is not set) followed by the computed output to subset 2 processors.
The subset 2 processors
receive the flag and the data set, compute
{\tt colffts}, and write the results to the output. When the end of input is
reached, subset 1 processors set the flag {\tt done1}, send it, and terminate
execution. Subset 2 processors receive the flag, recognise that the end of
input has been reached, and terminate execution.







---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-doc-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-doc  Thu Jul 25 12:50:14 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id MAA15406 for hpff-doc-out; Thu, 25 Jul 1996 12:50:14 -0500 (CDT)
Received: from timbuk.cray.com (root@timbuk.cray.com [128.162.19.7]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id MAA15394; Thu, 25 Jul 1996 12:49:50 -0500 (CDT)
Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.7.5/CRI-gate-8-2.11) with SMTP id MAA28535; Thu, 25 Jul 1996 12:49:12 -0500 (CDT)
Received: from poplar414.cray.com (meltzer@poplar414 [128.162.149.14]) by ironwood.cray.com (8.6.12/CRI-ccm_serv-8-2.8) with ESMTP id MAA21578; Thu, 25 Jul 1996 12:49:10 -0500
From: Andy Meltzer <meltzer@cray.com>
Received: by poplar414.cray.com (8.6.12/btd-b3)
          id MAA20609; Thu, 25 Jul 1996 12:49:08 -0500
Message-Id: <199607251749.MAA20609@poplar414.cray.com>
Subject: hpff-doc: hpf-craft document
To: zosel@coral.llnl.gov, chk@cs.rice.edu, hpff-doc@cs.rice.edu
Date: Thu, 25 Jul 1996 12:49:06 -0500 (CDT)
X-Mailer: ELM [version 2.4 PL24-CRI-b]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-doc
Precedence: bulk



\section{HPF\_CRAFT}
\subsection*{\centering Abstract}
% IEEE allows italicized abstract
{\em
HPF\_CRAFT is a hybrid language, combining an SPMD execution model with 
the highest performing of the HPF features.
The model combines the multi-threaded execution of HPF\_LOCAL and the
HPF syntax.  The goal of HPF\_CRAFT is to 
attain the potential performance of an SPMD programming model with 
access to HPF features and a well-defined extrinsic interface to HPF.
}

\subsection{Introduction\label{sec:intro}}

HPF\_CRAFT is a hybrid language, combining an SPMD execution 
model with the highest performing and most portable of the HPF features.
The model combines the multi-threaded execution of HPF\_LOCAL and the
HPF syntax and features.  The goal of HPF\_CRAFT is to 
attain the potential performance of an SPMD programming model with 
access to HPF features and a well-defined extrinsic interface to HPF.
It is built on top of the HPF\_LOCAL extrinsic environment.
The choice of HPF features in HPF\_CRAFT is derived from the chapter on
coding for portable performance in HPF.

SPMD features and a multi-threaded model allow the user to take
advantage of the performance and opportunity for low level access of 
a more general purpose programming model.  Including HPF data
distribution features gives the programmer access to the highest 
performing aspects of both models, but with the added responsibility of 
working with a more low-level execution model.
HPF\_CRAFT is not appropriate for all platforms, but is consistent with HPF
and easily targeted for platforms that have HPF and can support 
SPMD programming styles.

The HPF features included in HPF\_CRAFT are a subset of the full HPF 
language chosen for their performance and their broad portability and
ease of use.  HPF\_CRAFT contains additional features to support SPMD
programming styles.  There are some differences from HPF, however. 
For example, I/O behaves differently: in HPF\_CRAFT different
processors are allowed to read from different files at the same time, whereas in HPF
the processors must all read from the same file.   The differences in the
models are principally caused by the multi-threaded execution model and
the introduction of HPF\_LOCAL data rules.

HPF\_CRAFT allows for the notion of {\em private data}.  Data defaults
to a mapping in which data items are allocated so that each processor
has a unique copy.  The values of the individual
data items and the flow of control may vary from processor to
processor within HPF\_CRAFT. This behavior is consistent with the behavior
of HPF\_LOCAL.  In HPF\_CRAFT a processor
may be individually named and code executed based upon which processor
it is executing on.
HPF\_CRAFT also allows for the notion of {\em private loops}.  A private loop 
is executed in entirety by each processor.

The rules governing the interface to HPF\_CRAFT subroutines
are similar to those for the HPF\_LOCAL interface.  Dummy arguments
use a hybrid of the interfaces between HPF and
itself and that of HPF and HPF\_LOCAL.  Explicitly mapped dummy arguments
behave just as they do in HPF, while default (private) dummy arguments
use the HPF\_LOCAL calling convention.

HPF\_CRAFT will be initially made available on Cray MPP systems
and may also be available on Cray vector architectures.
Future versions of HPF\_CRAFT are possible on other vendors' 
architectures as well.   

HPF\_CRAFT is being implemented for Cray Research by The Portland Group, Inc.
For Cray systems, HPF\_CRAFT may be obtained through the Cray Research Inc.
Orderdesk,
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> Cray Research Inc. \\
\> orderdsk@cray.com \\
\> (612) 683-5907 \\
\end{tabbing}

Requests for additional formal documentation, and suggestions, can be directed to 
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> The Portland Group \\
\> 9150 SW Pioneer Ct., Suite H \\
\> Wilsonville, OR 97070 \\
\> (503) 682-2806 \\
\> trs@pgroup.com \\
\end{tabbing}
	
\subsection{Examples of Use}

HPF\_CRAFT is intended for use in circumstances where greater control
and performance are desired for MIMD style architectures.   Since data
may be declared to be private, local control is more readily available.
Since processor information is also available, message passing and direct
memory access programming styles can be seamlessly integrated with
explicitly mapped data.

The following examples show some of the capabilities of HPF\_CRAFT that
are different from those of HPF.  Others, such as integrated message
passing and synchronization primitives, are not shown.  Much of HPF can
also be used within HPF\_CRAFT.

{\bf example 1}

\begin{tabbing}
----------\=----\=----\=----\=---\=------------\=\kill
\> {\tt INTEGER PRIVATE\_A(100, 20), PRIVATE\_B(12, 256), PRIVATE\_C} \\
\> {\tt INTEGER MAPPED\_A(100, 20), MAPPED\_B(12, 256), MAPPED\_C} \\
{\tt !HPF\$} \> {\tt DISTRIBUTE MAPPED\_A(BLOCK, BLOCK), MAPPED\_B(BLOCK, *), MAPPED\_C} \\
\end{tabbing}

Example 1 illustrates the difference between the default distribution
for data and the distribution of mapped data.   In the above example, given
8 processors, there would be 8 * 100 * 20 (or 16,000) elements in the 
array {\tt PRIVATE\_A}.  Each processor contains an entire array named
{\tt PRIVATE\_A}. The elements of {\tt PRIVATE\_A} on processor 1
cannot be referenced using implicit syntax by any other processor.  There are
only 100 * 20 (or 2,000) elements of {\tt MAPPED\_A}, however, and these
elements are distributed about the machine in a {\tt (BLOCK, BLOCK)} fashion.

The difference between the {\tt PRIVATE\_A} declaration in HPF\_CRAFT and that
in HPF is the most instructive.  In HPF\_CRAFT there is a guarantee that
each processor contains one copy of the array and the values of the elements
of the array may vary between processors.   In HPF many, perhaps most, 
implementations will default to a copy of the array per processor, but
the values of these copies must remain in sync; in HPF there is no way 
to write a conforming program in which the value varies.


{\bf example 2}

\begin{tabbing}
----------\=----\=----\=----\=---\=------------\=\kill
\> {\tt PRIVATE\_C = 0} \\
{\tt !HPF\$} \> {\tt INDEPENDENT (I, J) ON MAPPED\_B(I, J)} \\
\>  {\tt DO J=1,256} \\
\>\>  {\tt DO I=1,12} \\
\>\>\>  {\tt MAPPED\_B(I, J) = MAPPED\_B(I, J) + 5} \\
\>\>\>  {\tt PRIVATE\_C = PRIVATE\_C + MAPPED\_B(I, J)} \\
\>\>  {\tt ENDDO} \\
\>  {\tt ENDDO} \\
\end{tabbing}

Although example 2 is contrived, it shows the usefulness of the {\tt ON} clause
for the {\tt INDEPENDENT} loop as well as giving an example of how private
data may be used.   In this example, the {\tt ON} clause lets the user specify
that each iteration {\tt (I, J)} is executed on the processor to which
{\tt MAPPED\_B(I, J)} is mapped.

In addition, the private data {\tt PRIVATE\_C} is used to compute a total
for each processor.  At the end of execution of the loop, the values on
each processor of {\tt PRIVATE\_C} may be different depending upon the
values in the elements of the array on each processor.  This data may 
be used as is, or it can be quickly summed using a barrier or an 
{\tt ATOMIC\_UPDATE}.


{\bf example 3}

\begin{tabbing}
----------\=----\=----\=----\=---\=------------\=\kill
\> {\tt MAPPED\_C = 0} \\
{\tt !HPF\$} \> {\tt ATOMIC\_UPDATE} \\
\> {\tt MAPPED\_C = MAPPED\_C + PRIVATE\_C} \\
\>
\end{tabbing}
Example 3 shows the final total being combined into the variable
{\tt MAPPED\_C} which is available to all processors.

{\bf example 4}

\begin{tabbing}
----------\=----\=----\=----\=---\=------------\=\kill
\> {\tt IF (MY\_PE() .EQ. 5) THEN } \\
\>\> {\tt PRIVATE\_C = } {\em some-big-expression} \\
\> {\tt ENDIF} \\
\end{tabbing}
Example 4 shows how the language allows private data to vary from
processor to processor.   In this example, {\tt PRIVATE\_C} on processor 
5 will have the result of {\em some-big-expression}.  Each processor can do
distinctly different work and communicate through shared data.

{\bf example 5}

\begin{tabbing}
----------\=----\=----\=----\=---\=------------\=\kill
{\tt !HPF\$} \> {\tt GEOMETRY G(*, CYCLIC) }\\
\> {\tt REAL FX(100,100), FY(100,100), FZ(100,100) } \\
{\tt !HPF\$} \> {\tt DISTRIBUTE (G) :: FX,FY,FZ } \\
\> {\tt REAL FXP(100,16,100), FYP(100,16,100) } \\
{\tt !HPF\$} \> {\tt DISTRIBUTE FXP(*,*, BLOCK), FYP(*,*, BLOCK) } \\
\> {\tt INTEGER CELL, ATOM, MAP(1000), NACELL(1000) } \\
\\
{\tt !HPF\$} \> {\tt INDEPENDENT (CELL) ON FX(1,CELL) } \\
\> {\tt DO CELL=1,100 } \\
\>\>  {\tt JCELL0 = 16*(CELL-1) } \\
\>\>  {\tt DO NABOR = 1, 13 } \\
\>\>\>  {\tt JCELL = MAP(JCELL0+NABOR) } \\
\>\>\>  {\tt DO ATOM=1, NACELL(CELL) } \\
\>\>\>\>    {\tt FX(ATOM, CELL) = FX(ATOM, CELL) + FXP(ATOM, NABOR, JCELL) } \\
\>\>\>\>    {\tt FY(ATOM, CELL) = FY(ATOM, CELL) + FYP(ATOM, NABOR, JCELL) } \\
\>\>\>  {\tt ENDDO } \\
\>\>  {\tt ENDDO } \\
\>  {\tt ENDDO } \\
\end{tabbing}

The code fragment in example 5 is from an application and shows a few
features of the language.  The {\tt GEOMETRY} directive allows the user
to generically specify a mapping and apply it to many arrays (they
need not have the same extents). 

Example 5 has a single {\tt INDEPENDENT} loop, which is the outer loop.
It executes 100 iterations in total.  Within this loop the 
private value of {\tt JCELL0}
is set on each processor (ensuring that it is a local computation everywhere).
Nested inside the {\tt INDEPENDENT} loop is a private loop; this loop executes
13 times {\em per} processor.  Inside this loop {\tt JCELL} is computed locally
on each processor, minimizing unnecessary communication.  Finally, the innermost
loop is also private.  


\subsection{External Interface\label{sec:extern-interf}}

This section describes what happens when an HPF\_CRAFT subprogram 
is called from HPF.

The calling convention and argument passing rules for HPF\_CRAFT are
a hybrid of those for HPF calling HPF\_LOCAL and HPF calling
HPF.  Explicit interfaces are required.  Where dummy arguments
are private (default) storage, the HPF calling HPF\_LOCAL 
conventions are used.  Where dummy arguments are explicitly mapped,
the calling convention matches HPF calling HPF.  

The following restrictions apply to HPF\_CRAFT subprograms
that are called from HPF:
\begin{itemize}
\item Recursive HPF\_CRAFT subprograms cannot be called from HPF.
\item HPF\_CRAFT subprograms called from HPF may only enter the routine at a
      single place (no alternate entries) and return from a single place.
\item An HPF\_CRAFT subprogram may not be invoked directly or
      indirectly from within the body of a {\tt FORALL} construct or
      within the body of an {\tt INDEPENDENT DO} loop that is 
      inside an HPF program.
\item The attributes (type, kind, rank, optional, intent) of the dummy
      arguments in a subprogram called by HPF
      must match the attributes of the corresponding dummy 
      arguments in the explicit interface.  
\item A dummy argument of an HPF\_CRAFT subprogram called by HPF 
\begin{itemize}
\item     may not be a procedure name.
\item     may not have the {\tt POINTER} attribute.
\item     may not be sequential.
\item     must have assumed shape even when it is explicit shape in the 
          interface.
\item     if scalar, must be mapped so that each processor has a copy of 
          the argument.
\end{itemize}
\item The default mapping of scalar dummy arguments and of scalar function
      results when an HPF program calls an HPF\_CRAFT subprogram is that they
      are replicated on each processor.
\end{itemize}

If a dummy argument of an {\tt EXTRINSIC("HPF\_CRAFT")} subprogram interface
block is an array and the dummy argument of 
the HPF\_CRAFT subprogram has the default
private mapping, then the corresponding dummy argument in the
specification of the HPF\_CRAFT procedure must be an array of the same
rank, type, and type parameters.  When the extrinsic procedure is invoked,
the dummy argument is associated with the local array that consists of the
subgrid of the global array that is stored locally.

If the dummy argument of the HPF\_CRAFT subprogram
is explicitly mapped, it must have the same mapping as the dummy argument
of the {\tt EXTRINSIC("HPF\_CRAFT")} subprogram.   Note that this restriction
does not require actual and dummy arguments to match and is no more stringent
than saying that mappings of dummy arguments in interface blocks must 
match those in the actual subprogram.
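
As an illustration of these two cases, consider a hypothetical routine
{\tt SMOOTH} with one explicitly mapped dummy argument {\tt A} and one
private dummy argument {\tt B}.  The routine name, the argument names, and
the chosen mapping are assumptions made for this sketch, and the
{\tt EXTRINSIC} prefix is written as it appears in the text above.  The HPF
caller might see the interface

\begin{tabbing}
----------\=----\=----\=----\=---\=------------\=\kill
\> {\tt INTERFACE} \\
\>\> {\tt EXTRINSIC("HPF\_CRAFT") SUBROUTINE SMOOTH(A, B)} \\
\>\>\> {\tt REAL, DIMENSION(:,:) :: A, B} \\
{\tt !HPF\$} \>\>\> {\tt DISTRIBUTE A(BLOCK, BLOCK)} \\
\>\> {\tt END SUBROUTINE SMOOTH} \\
\> {\tt END INTERFACE} \\
\end{tabbing}

and the HPF\_CRAFT subprogram itself might read

\begin{tabbing}
----------\=----\=----\=----\=---\=------------\=\kill
\> {\tt EXTRINSIC("HPF\_CRAFT") SUBROUTINE SMOOTH(A, B)} \\
\>\> {\tt REAL, DIMENSION(:,:) :: A, B} \\
{\tt !HPF\$} \>\> {\tt DISTRIBUTE A(BLOCK, BLOCK)} \\
\>\> {\tt B = 0.5 * B} \\
\> {\tt END SUBROUTINE SMOOTH} \\
\end{tabbing}

Here {\tt A} carries the same {\tt (BLOCK, BLOCK)} mapping in both places,
as required.  On each processor {\tt B} is associated with the locally
stored part of the actual argument, so the assignment to {\tt B} touches
local data only.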

\subsection{Execution Model\label{sec:exec-model}}

HPF\_CRAFT is built upon the fundamental execution model of HPF\_LOCAL,
augmented with data mapping and work distribution features from HPF.
It is also augmented with explicit low-level control features,
many taken from Cray Research's CRAFT language.

In HPF\_CRAFT there is a single task on each processor and 
all tasks begin executing in parallel, with data defaulting
to a private distribution, the same default distribution used in HPF\_LOCAL.
Each processor gets a copy of the data storage unless specified otherwise by 
the user.
Consequently I/O works identically to I/O in HPF\_LOCAL and 
message passing libraries are easily integrated.

Simply stated, the execution model is that of HPF\_LOCAL.

To provide correct behavior when explicitly mapped data is involved, 
this model defines implicit barrier points at which
the execution model requires that all processors must stop and wait 
for the execution of all
other processors before continuing.  These barriers add additional
semantics to the HPF\_LOCAL behavior, but are only a small
subset of the implicit barriers in a comparable HPF program.
An implementation may remove 
many of these barriers where they are deemed unnecessary, but {\em every} 
processor must participate in the barriers at each one of these points.

The points where there are implicit barriers are conceptually after
those instances
in which the processors in the HPF\_CRAFT program are executing 
cooperatively, as if in an HPF program.  An HPF\_CRAFT program treats
operations on explicitly mapped objects as if they were operations in
an HPF program and it treats operations on private data as if they
were executed within the HPF\_LOCAL framework.
It is occasionally useful for an advanced programmer to indicate 
to the compilation system where barriers are not needed; HPF\_CRAFT
has syntax to allow this capability.


\subsection{HPF\_CRAFT Functional Summary}

HPF\_CRAFT contains a number of features not available in HPF, and 
restricts the usage of many of the features currently available.  
The following is a concise list of the differences.

\begin{itemize}
\item {\tt INDEPENDENT} has been extended to better support {\tt ON}.
\item There are new rules defining the interaction of explicitly 
      mapped and private data.
\item Parallel inquiry intrinsics {\tt IN\_PARALLEL()} and 
      {\tt IN\_INDEPENDENT()} have been added.
\item Serial regions ({\tt MASTER / END MASTER}) have been added.
\item Explicit synchronization primitives are available.
\item The {\tt ATOMIC\_UPDATE}, {\tt SYMMETRIC}, and {\tt GEOMETRY} directives have been added.
\item Many other compiler information directives have been added to assist
      the compiler in producing good quality code.
\end{itemize}


\subsubsection{Data Mapping Features}

Data mapping features provided are those that have been found useful most 
often.  These are derived from the chapter on coding for portable
performance in the HPF specification.

When data is explicitly mapped, only one copy of the data storage
is created unless the explicit mapping directs otherwise.  The
value of explicitly mapped replicated data items must be consistent
between processors as is the case in HPF.  
Storage and sequence association for explicitly mapped arrays is
not guaranteed in HPF\_CRAFT.  For private data, storage and sequence
association follows the F90 rules.

A new directive is included for completeness: {\tt PE\_PRIVATE}, which
specifies that the data should conform to the default behavior.
Private values of the same data item on different processors may vary.  


\subsubsection{Subprogram Interfaces}

The behavior and requirements of an HPF\_CRAFT program at subprogram 
interfaces may be divided into three cases.  Each case is also
available using some combination of HPF and HPF\_LOCAL.
For dummy arguments that are explicitly mapped (case one), the behavior
is identical to that of HPF.  
All processors must co-operate in a subprogram
invocation that remaps or explicitly maps data.  In other words, if
an explicit interface is required (by the HPF rules) or the 
subprogram declares explicitly mapped data, the subprogram must be called on
all processors.  Processors need not co-operate if there are only 
reads to non-local data.  The {\tt INHERIT} attribute may only be
applied to explicitly mapped data.

For data that has the default private mapping (case two),
the behavior of an HPF\_CRAFT subprogram at subprogram interfaces
is identical to that of HPF\_LOCAL.
Data is passed individually on every processor and the
processors need not interact in any way.   

When a subprogram is passed actual arguments that are a combination
of both explicitly mapped data and private data, the explicitly mapped
data follows the HPF rules and the private data follows the HPF\_LOCAL
rules.

In case three, the user has the option of passing data with explicitly
mapped actual arguments to dummy arguments that are not explicitly 
mapped (i.e., private).  The mapping rules for this data 
are identical to the mapping
rules when HPF calls an HPF\_LOCAL subprogram.  The data remains ``in-place''.
All HPF arrays are logically
carved up into pieces; the HPF\_CRAFT procedure executing on a 
particular physical processor sees an array containing just those
elements of the global array that are mapped to that physical 
processor.

Finally, it is undefined for an actual argument to be private and
the dummy argument to be explicitly mapped.   A definition could 
be supplied for this interaction, but it is the same solution that 
one might propose for a calling sequence when HPF\_LOCAL subprograms call 
HPF subprograms.  

\subsubsection{The {\tt INDEPENDENT} directive}

The {\tt INDEPENDENT} directive is part of HPF\_CRAFT with the same semantics
as in HPF.  However, within {\tt INDEPENDENT} loops
the values of private data may vary from processor to processor.
{\tt INDEPENDENT} applied to {\tt FORALL} has the same syntax and 
semantics as in HPF.

An HPF independent loop optionally may have a {\tt NEW} clause. The {\tt NEW}
clause is not required by HPF\_CRAFT for default (not explicitly mapped)
data. In HPF\_CRAFT data defaults to
private so values may differ from processor to processor.

Private data behaves slightly differently from
data specified in the {\tt NEW} clause.  
The value of a private datum on each processor can be used beyond a single
iteration of the loop. 
Private data may be used to compute local sums, for example.  
The values of data items named in a {\tt NEW} clause
may not be used beyond a single iteration. The {\tt NEW} clause asserts that
the {\tt INDEPENDENT} directive would be valid if new objects were created for
the variables named in the clause for each iteration of the loop.
The semantics of the {\tt NEW} clause are identical in
HPF\_CRAFT and HPF. 
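
As a hedged illustration of the {\tt NEW} clause described above, the
following free-form Fortran sketch names a temporary that is defined and
used within a single iteration.  The program name, the variable names, and
the {\tt DISTRIBUTE} and {\tt ALIGN} directives are assumptions made for
this example only.  An ordinary Fortran compiler treats the directives as
comments and runs the loop serially.

\begin{verbatim}
program new_clause_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n)       ! explicitly mapped by the directives below
  real :: t                ! temporary, logically fresh in every iteration
  integer :: i
!HPF$ DISTRIBUTE a(BLOCK)
!HPF$ ALIGN b(:) WITH a(:)
  b = 2.0
!HPF$ INDEPENDENT, NEW(t)
  do i = 1, n
     t = b(i) * b(i)       ! t is defined before it is used
     a(i) = t + 1.0        ! and is not needed after this iteration
  end do
  print *, a
end program new_clause_sketch
\end{verbatim}

Because {\tt t} is not explicitly mapped, HPF\_CRAFT treats it as private,
so by the rule above the {\tt NEW} clause could be omitted here.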

When {\tt INDEPENDENT} is applied to a loop that references private data,
its meaning changes with respect to that data.  Instead of asserting
that iterations have no dependencies upon one another, the directive
asserts, with respect to the private data, only that iterations 
executing on different processors have no dependencies upon one another.

\subsubsection{{\tt ON} clause}

In addition to the version of {\tt INDEPENDENT} available from HPF,
a new version of {\tt INDEPENDENT} is included that incorporates
the {\tt ON} clause. There are a number of differences between the 
versions of {\tt INDEPENDENT} with and without the {\tt ON} clause.

The new version of the {\tt INDEPENDENT} directive may be applied to the 
first of a group of tightly nested loops and may apply to more than one 
of them. This more easily facilitates the use of the {\tt ON} clause. 
The current {\tt INDEPENDENT} directive applies only to a single loop nest.  
The {\tt INDEPENDENT} directive is extended so that multiple loop nests can
be named.

The general syntax for these new independent loops is as follows:
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\>{\tt !HPF\$} \> {\tt INDEPENDENT} $(I_1,I_2,\ldots,I_n)$ 
               {\tt ON} {\em array-name}$(h_1(I_1),h_2(I_2),\ldots,h_n(I_n))$ \\
\>       \> {\tt DO} $I_1 = L_1, U_1, S_1$       \\
\>       \>\> {\tt DO} $I_2 = L_2, U_2, S_2$     \\
\>       \>\>\> {\tt DO} $I_n = L_n, U_n, S_n$ \\
\>       \>\>\>\> . . .                              \\
\>       \>\>\> {\tt END DO}                            \\
\>       \>\> {\tt END DO}                              \\
\>       \> {\tt END DO}
\end{tabbing}


The syntax and semantics of {\tt INDEPENDENT} with the {\tt ON} 
clause are different from its syntax and semantics without the {\tt ON} 
clause. With the
{\tt ON} clause the directive states that there are no cross-processor
dependencies, but there may be dependencies between iterations on a
processor. 

The iteration space of an {\tt INDEPENDENT} nest must be rectangular.  
That is, the lower loop bound, the upper loop bound, and the step 
expression for each loop indicated by the {\tt INDEPENDENT} induction 
list must be invariant with regard to the {\tt INDEPENDENT} nest.  
Each index expression of {\em array-name} in the
{\tt ON} clause (the functions $h_i$ above)
must be of the form 
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\>{\bf [ }{\em a }{\tt * }{\tt loop\_control\_variable }{\tt + }{\bf ] }{\em b}
\end{tabbing}
where {\em a} and {\em b} must be integer values; they can be 
expressions, constants,
or variables. The values of {\em a} and {\em b} must be
invariant with regard to the {\tt INDEPENDENT} loop nest.  
For example, specifying {\tt A(I,J,K)} is valid.  Specifying {\tt A(3,I+J,K)} 
is not valid.  Specifying {\tt A(I,I,K)} is not valid because {\tt I} appears twice.
Division is prohibited in any index expression of the {\tt ON} clause.  
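
The restrictions above can be illustrated with a small sketch, offered
under stated assumptions rather than as a definitive example.  It shows a
tightly nested, rectangular loop nest whose {\tt ON} clause uses one loop
control variable per index expression, together with a private scalar that
carries a value between iterations.  The program name, the variable names,
and the {\tt DISTRIBUTE} and {\tt ALIGN} directives are assumptions.  An
ordinary Fortran compiler treats the directives as comments and runs the
nest serially.

\begin{verbatim}
program on_clause_sketch
  implicit none
  integer, parameter :: n = 4
  real :: a(n, n), b(n, n) ! explicitly mapped by the directives below
  real :: s                ! not explicitly mapped: private in HPF_CRAFT
  integer :: i, j
!HPF$ DISTRIBUTE a(BLOCK, BLOCK)
!HPF$ ALIGN b(:, :) WITH a(:, :)
  b = 1.0
  s = 0.0
!HPF$ INDEPENDENT (i, j) ON a(i, j)
  do i = 1, n              ! bounds and step are invariant in the nest
     do j = 1, n
        a(i, j) = 2.0 * b(i, j)
        s = s + a(i, j)    ! carried between iterations on one processor
     end do
  end do
  print *, s
end program on_clause_sketch
\end{verbatim}

Under the semantics described above, the accumulation into {\tt s} is a
dependence between iterations on the same processor only, so each
processor would be expected to end up with its own partial sum.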


\subsubsection{Array Syntax}

Array syntax is treated in HPF\_CRAFT exactly as in HPF for 
explicitly mapped objects.   
For private objects the behavior is 
identical to that of HPF\_LOCAL.   When private objects and 
explicitly mapped objects are combined the rules are as follows:
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\em result} = $\mathit{rhs}_1\ \mathit{op}_1\ \mathit{rhs}_2\ \mathit{op}_2\ \ldots\ \mathit{op}_m\ \mathit{rhs}_n$
\end{tabbing}
\begin{itemize}
\item If {\em result} is explicitly mapped and all {\em rhs} arrays are 
      explicitly mapped, the work is distributed as in HPF.

\item If {\em result} is private and all {\em rhs} arrays are private, the
      computation is done on all processors as an HPF\_LOCAL program
      would do it.

\item If {\em result} is private and all {\em rhs} arrays are explicitly 
      mapped, the 
      work is distributed as in HPF and the values of the results are 
      broadcast to the {\em result} on each processor.

\item If {\em result} is explicitly mapped and {\em not} all {\em rhs} 
      arrays are explicitly mapped, the results of the operation are undefined.
     
\item If {\em result} is private and some, but not all, {\em rhs} arrays are 
      explicitly mapped, the value is computed on each processor
      and saved to the local {\em result}.

\end{itemize}

All processors must participate in any array syntax 
statement in which the value of an explicitly mapped array is modified.
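
The rules listed above can be sketched with one assignment per case.
This is a minimal illustration under stated assumptions: the program name,
the array names, and the {\tt DISTRIBUTE} and {\tt ALIGN} directives are
chosen for the example, and an ordinary Fortran compiler treats the
directives as comments and evaluates every statement serially.

\begin{verbatim}
program array_syntax_sketch
  implicit none
  integer, parameter :: n = 8
  real :: g1(n), g2(n)     ! explicitly mapped by the directives below
  real :: p1(n), p2(n)     ! private (default mapping)
!HPF$ DISTRIBUTE g1(BLOCK)
!HPF$ ALIGN g2(:) WITH g1(:)
  g1 = 1.0
  g2 = 2.0
  p1 = 3.0
  p2 = 4.0
  g1 = g1 + g2             ! mapped  = all mapped:  distributed as in HPF
  p1 = p1 * p2             ! private = all private: done on every processor
  p2 = g1 + g2             ! private = all mapped:  distributed, then broadcast
  p1 = p2 + g1             ! private = mixed:       computed on each processor
  ! g1 = p1 + g2           ! mapped  = mixed:       undefined, left commented out
  print *, p1
end program array_syntax_sketch
\end{verbatim}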


\subsubsection{{\tt FORALL} and {\tt WHERE} }

The {\tt FORALL} and {\tt WHERE}
statements are treated exactly as in HPF when data is
explicitly mapped.  When data is private, the statement is executed
separately on each processor.  Finally, when data in a {\tt FORALL} is
mixed, the rules for array syntax apply.   If any explicitly mapped
data item is modified in a {\em forall-stmt} then arrays in the 
{\em forall-header} must be explicitly mapped.  In a {\tt FORALL}
construct, if any explicitly mapped array is modified, all modified
arrays must be explicitly mapped.


\subsubsection{Synchronization Primitives}

A number of synchronization primitives are provided.
These primitives include:
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> Barriers (test, set, wait)\\
\> Locks (set, wait, clear)\\
\> Critical Sections \\
\> Events (test, set, wait, clear)
\end{tabbing}

Barriers provide a mechanism for a task to indicate its arrival
at a program point.  Other tasks may test and optionally wait
at this barrier point.
In the following example, a barrier is used to make sure that {\em block3}
is not entered by any task until all tasks have completed execution of
{\em block1}.

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\em block1} \\
\> {\tt CALL SET\_BARRIER()} \\
\> {\em block2} \\
\> {\tt CALL WAIT\_BARRIER()} \\
\> {\em block3} 
\end{tabbing}

Locks are used to prevent the simultaneous access of data by multiple
tasks.  
The {\tt SET\_LOCK(}{\em lock}{\tt)} intrinsic sets the mapped
value {\em lock} atomically.  If the lock is already set, the task 
that called {\tt SET\_LOCK} is suspended until another task clears the
lock; the suspended task then sets it.  Individual locks may be tested or cleared.

A {\em critical section} prohibits access to a section of code rather
than to a data object.  
The {\tt CRITICAL} directive marks the beginning of a code region 
that only one task can enter at a time.  The {\tt END\_CRITICAL} directive 
marks the end of the critical section.

Events are typically used to record the state of a program's execution
and to communicate that state to another task.  Because they do not set
locks, as do the lock routines described earlier, they cannot easily be
used to enforce serial access of data.  They are suited to work such as
signalling other tasks when a certain value has been located in a search
procedure.  There are four routines needed to perform the event functions.

The {\tt SET\_EVENT} routine
sets or {\em posts} an event; it declares that an action has been
accomplished or a certain point in the program has been reached.  A
task can post an event at any time, whether the state of the event 
is cleared or already posted.


\subsubsection{Serial Regions}

It is often useful to enter
a region where only one task is executing.  This is particularly 
useful for certain types of I/O.   To facilitate this, two directives
are provided.
In addition, one may optionally attach a {\tt COPY} clause to the 
{\tt END MASTER} directive which specifies the private 
data items whose
values should be broadcast to all processors.  The syntax of this 
directive is:
\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt !HPF\$     } \> {\tt MASTER }\\
\>\>{\em sequential region} \\
\>\> ... \\
\> {\tt !HPF\$     } \> {\tt END MASTER [, COPY (} $\mathit{var}_1$ {\tt [,} $\mathit{var}_2${\tt, ... ,} $\mathit{var}_n$ {\tt ])]} 
\end{tabbing}
where {\em var} is private data to be copied to the same named
private data on other processors.

If a subroutine call occurs within a serial region, the subroutine
executes serially; there is no way to get back to parallel execution
within the subroutine.  All explicitly mapped data is accessible from 
within subroutines called in a serial region, but a subroutine called
from within a serial region cannot declare explicitly mapped data
or remap data.
All processors must participate in the invocation of the serial region.
No branches are allowed into or out of serial regions, {\tt INDEPENDENT}
loops, or critical sections.
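
A minimal sketch of a serial region follows, assuming the {\tt MASTER}
and {\tt END MASTER} syntax given above.  One task reads a value and the
{\tt COPY} clause names the private item whose value is to be broadcast.
The program name and the variable {\tt nsteps} are assumptions made for
the example.  An ordinary Fortran compiler treats the directives as
comments, so the program simply reads one integer from standard input and
prints it.

\begin{verbatim}
program master_sketch
  implicit none
  integer :: nsteps        ! private data item
!HPF$ MASTER
  read (*, *) nsteps       ! intended to be executed by one task only
!HPF$ END MASTER, COPY(nsteps)
  print *, nsteps          ! after COPY, each processor holds the value read
end program master_sketch
\end{verbatim}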

\subsubsection{Libraries}
The HPF Local Routine Library is available in HPF\_CRAFT.
The HPF\_LOCAL extrinsic environment contains a number of libraries
that are useful for local SPMD programming and a number of libraries
that allow the user to determine global (rather than local) state
information.  These library procedures take as input the name of
a dummy argument and return information on the corresponding global
HPF actual argument.  They may only be invoked by an HPF\_CRAFT
procedure that was directly invoked by global HPF code.  They may
be called only for private data.   The libraries reside in a module
called {\tt HPF\_LOCAL\_LIBRARY}. 

The HPF Library is available to HPF\_CRAFT when called with data that is
explicitly mapped and all processors are participating in the call.
In addition, as in HPF\_LOCAL, the entire HPF Library is available for
use with private data.  Mixing private and explicitly mapped data in
calls to the HPF library produces undefined behavior.

\subsubsection{Parallel Inquiry Intrinsics}

These intrinsics are provided as an extension to HPF.  They provide
information potentially useful to the programmer about the state of
execution in a program.

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt IN\_PARALLEL()} \\
\> {\tt IN\_INDEPENDENT()} 
\end{tabbing}

\subsubsection{Task Identity}

{\tt MY\_PE()} may be used to return the local processor number.  
The physical processors are
identified by an integer in the range of 0 to {\em n-1} where {\em n}
is the value returned by the global HPF\_LIBRARY function 
{\tt NUMBER\_OF\_PROCESSORS}.  Processor identifiers are returned
by {\tt ABSTRACT\_TO\_PHYSICAL}, which establishes the one-to-one
correspondence between the abstract processors of an HPF processors
arrangement and the physical processors.  Also, the local library
function {\tt MY\_PROCESSOR} returns the identifier of the task 
executing the call.


\subsubsection{Parallelism Specification Directives}

These directives allow a user to assert that a subroutine will only be
called from within a parallel region, a serial region, or from within
both regions.  Without these directives an implementation might be
required to generate two versions of code for each subroutine, depending
upon implementation strategies.  The directives simply make the 
generated code size smaller and remove a test.

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt !HPF\$ PARALLEL\_ONLY} \\
\> {\tt !HPF\$ SERIAL\_ONLY} \\
\> {\tt !HPF\$ PARALLEL\_AND\_SERIAL} 
\end{tabbing}

\subsubsection{{\tt SYMMETRIC}}

{\tt SYMMETRIC} data is private data that is guaranteed to be at the
same storage location on every processor.  The feature is obviously
tied to certain implementations, but it makes one-way communication
routines much easier to work with.


\subsubsection{{\tt RESIDENT}}

The {\tt RESIDENT} directive can be applied to loops and at the
subroutine level.  It is an assertion that the accesses to a particular
variable in the subroutine (or loop) are only accesses to data that is
local to the processor making the assertion.  


\subsubsection{{\tt ATOMIC\_UPDATE}}

In HPF\_CRAFT, the {\tt ATOMIC\_UPDATE} directive tells the compiler
that a particular data item or the elements of a particular array for
a specified operation must be updated atomically.  This can be used
within loops or in array syntax and applies to both the elements of
an array with an assignment of a permutation and the elements of 
an array within a loop.


\subsubsection{{\tt GEOMETRY}}

The {\tt GEOMETRY} directive is like a mapping typedef, allowing the
user to conveniently change the mappings of many arrays at the same
time.  It is similar in many ways to the {\tt TEMPLATE} directive, but
since it is bound to no particular extent it is easier to apply in a
general way.  

\begin{tabbing}
---------\=---------\=---\=---\=---\=------------\=\kill
\> {\tt !HPF\$     } \> {\tt GEOMETRY} {\em (geom)}{\tt(}$d_1$ {\tt[}$, d_2, \ldots, d_n${\tt])} \\
\> {\tt !HPF\$     } \> {\tt DISTRIBUTE} {\em geom} [::] $\mathit{var}_1$ {\tt[}$, \mathit{var}_2, \ldots, \mathit{var}_m${\tt]} 
\end{tabbing}

where $d_i$ indicates one of the allowable distribution formats.


