From owner-hpff-core  Thu May  2 00:04:08 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id AAA05385 for hpff-core-out; Thu, 2 May 1996 00:04:08 -0500 (CDT)
Date: Thu, 2 May 1996 00:04:08 -0500 (CDT)
Message-Id: <199605020504.AAA05385@cs.rice.edu>
From: Chuck Koelbel <chk@cs.rice.edu>
Subject: hpff-core: ON clause proposal
Sender: owner-hpff-core
Precedence: bulk

Here's the latest ON proposal.  Hopefully, we'll discuss this
in the HPFF meeting tomorrow and (really hoping now...) all this
will be incorporated in a comprehensive draft REAL SOON NOW.

Comments, as always, are welcome.

						Chuck

The ON Clause

Charles Koelbel and Rob Schreiber

May 1, 1996


Changes since the January draft

* More complete description of what ON means.

* ON sets of processors propagate through CALL statements.  An explicit 
interface is required to allow this to be compiled.

* Revised constraints on explicit mappings within the scope of an ON 
directive:
	Local variables can only be mapped to ON processor set members
	Global variables can be mapped on any processor
	ALLOCATE can only create storage within the ON processor set

* An ON directive can have a NEW clause.

* RESIDENT makes an assertion about a lexical expression rather than about a 
range of memory locations.

* RESIDENT allowed as a separate directive from ON.

* Clarified the meaning of RESIDENT in the following cases:
	A variable is assigned to
	A variable's distribution has overlap areas

* RESIDENT assertions propagate through calls, allowing more compiler 
optimization.


The purpose of the ON directive is to allow the programmer to control the 
distribution of computations among the processors of a parallel machine.  
In a sense, this is the dual of the DISTRIBUTE and ALIGN directives for 
data.  Modern parallel machines will achieve their best performance if all 
operations are performed by separate processors (as specified by ON) with 
each processor accessing its local data (as specified by DISTRIBUTE).  
If either explicit computation partitioning or explicit data mapping is not 
present, the system must choose one.

A secondary purpose of the ON directive is to identify data accessed by 
the computation as local to the executing processor.  The compiler can use 
this information to avoid generating communication or to simplify array address 
calculations.  Note that whether a given data element is local depends on 
two facts:
	Where the data is stored (i.e. its DISTRIBUTE and ALIGN attributes)
	Where the computation is executed (i.e. its ON directive)
For these reasons, the RESIDENT clause is added to the ON directive, which is 
effectively the earliest point in the program where the needed facts might 
be available.  Note that changing the ON directive may invalidate some 
RESIDENT clauses, or may make some new RESIDENT clauses true.


Syntax

There are three flavors of the ON directive: a single-statement form, a 
multi-statement form, and a declaration form.  The RESIDENT directive can 
be part of an ON directive, or it can stand alone in either 
single-statement or multi-statement form.  The BNF for these directives is

	simple-on-directive	IS	ON HOME ( home-expr ) [ , resident-clause ] 
		[, new-clause ]
	decl-on-directive IS ENVIRONMENT :: simple-on-directive
	block-on-directive	IS	simple-on-directive BEGIN
	on-block	IS	
		block-on-directive
		block
		end-directive
	
	
	resident-clause	IS	RESIDENT [ ( local-var-list ) ]
	simple-resident-directive IS resident-clause
	block-resident-directive IS resident-clause BEGIN
	resident-block IS
		block-resident-directive
		block
		end-directive

	
	end-directive	IS	END [ BEGIN ]
	
	home-expr	IS	variable
		OR	template-elmt
		OR	processors-elmt
	
	template-elmt	IS	template-name [ ( section-subscript-list ) ]
	
	processors-elmt	IS	processors-name [ ( section-subscript-list ) ]
	
	local-var	IS	variable

Notes:

home-expr, template-elmt, and processors-elmt are just auxiliary syntax 
categories.  Note that "variable" is an F90 syntax term that means 
(roughly) "a reference, including an array element, array section, or 
derived type field"; "variable" doesn't include template or processor 
elements since those aren't first-class language constructs.  Note also 
that "block" is an F90 syntax term for "a series of statements treated as 
a group" - for example, the body of a DO construct.

simple-on-directive will be an option under executable-directive (HPF 
syntax term), as redistribute-directive is now.  This means that 
simple-on-directive can appear where an executable statement could.

decl-on-directive will be an option under specification-directive, as 
distribute-directive is now.  This means that decl-on-directive can appear 
where a declaration can.  Any expressions in home-expr or resident-clause 
of decl-on-directive must be specification expressions; this is not a 
constraint on simple-on-directive.

on-block most naturally fits as an F90 executable-stmt, since that makes 
constraints about "properly nesting" easier to state.  

Similarly, simple-resident-directive will be an option under 
executable-directive, and resident-block fits as an F90 executable-stmt.

Any top-level variables in the RESIDENT clause must be explicitly 
mapped.  Otherwise, the assertion (see below) is a statement about how the 
compiler works, not about the program.


Rationale:
The keyword HOME is required, even when its argument is a processor.  It 
seems more natural to eliminate the HOME keyword there, but this leads to 
the following ambiguity:

	INTEGER X(4)	! X(I) will be on processor I
	!HPF$ PROCESSORS HOME(4)
	!HPF$ DISTRIBUTE X(BLOCK)
	X = (/ 4,3,2,1 /)
	!HPF$ ON HOME(X(2))
	X(2) = X(1)

If HOME were not required, where should the computation be done:
1. Processor HOME(2) (i.e. the owner of X(2))?
2. Processor HOME(3) (i.e. the value of X(2), before the assignment)?
3. Processor HOME(4) (i.e. the value of X(2), after the assignment)?
The definition of ON clearly indicates interpretation 1 is correct.  One 
can get the effect of interpretation 2 by the directive
	!HPF$ ON HOME(HOME(X(2)))
There is no way to get the effect of interpretation 3, short of building a 
clairvoyant computer.  Introducing reserved keywords into Fortran was 
suggested as a better solution to this problem, but was seen as too large a 
change to the underlying language.

Separate syntax is needed for simple-on-directive and decl-on-directive 
to avoid ambiguity.  If the same directive can be used as both an 
executable statement and a declaration, its role is unclear if it is the 
last statement in the declarations/first statement in the executable 
section.  This is particularly bad since the "scope" of the directive is 
rather different in the two uses.  That said, I'd be grateful if somebody 
could suggest a better declaration syntax.
End of rationale.

 
Semantics

The ON directive advises the compiler to perform a computation using the 
processor(s) named in the argument to HOME.  It has three forms:
	* simple-on-directive and on-block, which apply to the next executable 
	  statement or a block of executable statements, respectively.
	* decl-on-directive, which is a declaration for an entire scope.
Like the mapping directives ALIGN and DISTRIBUTE, this is advice rather 
than an absolute commandment; the compiler may override an ON directive if 
necessary.  Also like ALIGN and DISTRIBUTE, the ON directive may affect the 
efficiency of computation, but not the final results.

Advice to implementors:
If the compiler may override the user's advice in an ON directive, then 
the compiler should also offer the user an option to force all directives 
to be obeyed.
End advice to implementors.

The single-statement ON directive (i.e.  simple-on-directive) precedes the 
statement for which it is advising behavior, and is said to apply to that 
statement.  If the statement is a compound statement (e.g.  a DO loop or an 
IF-THEN-ELSE construct), then the ON directive also applies to all nested 
statements therein.  Similarly, the block ON directive (i.e.  on-block) 
applies the initial ON clause to all statements up to the matching END 
directive.  The declarative form of the ON directive (i.e.  
decl-on-directive) applies to all statements and declarations in a scope; 
its intended use is for functions and subroutines that may be called on a 
subset of the processors.

The HOME clause can name a program variable, a template, or a processors 
arrangement.  For each of these possibilities, it can further specify a 
single element or multiple elements.  This is translated into the 
processor executing the computation as follows:
*	If the HOME clause names a program variable (that is, an array element 
	or section), then every processor owning any part of that variable 
	should execute the computation.  For example, if a is a distributed array
		!HPF$ ON HOME ( a(2:4) )
	tells the compiler to perform the statement on the processors owning 
	a(2), a(3), and a(4).  If a were distributed BLOCK, this might be one 
	processor; if a were distributed CYCLIC, it would be three processors.
*	If the HOME clause names a template element or section, then every 
	processor owning any element of the template should execute the computation.
	The example above applies here as well, if a is a template rather than an 
	array.
*	If the HOME clause names a processors arrangement, then the processor(s) 
	referenced there should execute the computation.  For example, if p 
	is a processors arrangement,
		!HPF$ ON HOME ( p(2:4) )
	will execute the statement on the three processors p(2), p(3), and p(4).
For the executable forms of the directive, the value(s) of the HOME clause 
are determined as if the HOME argument was evaluated when control flow 
reached the ON directive.  For the declarative form, the values are 
determined on entry to the scope.  That is, if the value of a variable used 
in the HOME clause changes within the on-block, work is not migrated to a 
new processor.  If evaluation of the argument of the HOME clause would 
change the value of any variable in the program, then the value of that 
variable becomes undefined after the ON clause is reached.  
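
The translation from a HOME clause to a processor set can be modelled in a 
few lines.  The following is only a sketch in Python, not part of the 
proposal: it assumes a one-dimensional array of 16 elements mapped onto 4 
processors, uses the usual BLOCK and CYCLIC owner formulas, and the names 
block_owner, cyclic_owner, and home_set are invented for illustration.

```python
from math import ceil

def block_owner(i, n, nproc):
    """1-based processor owning element i of an n-element BLOCK array."""
    block_size = ceil(n / nproc)
    return (i - 1) // block_size + 1

def cyclic_owner(i, n, nproc):
    """1-based processor owning element i of an n-element CYCLIC array."""
    return (i - 1) % nproc + 1

def home_set(section, owner, n, nproc):
    """Processors owning any element of the section named in HOME."""
    return {owner(i, n, nproc) for i in section}

section = range(2, 5)  # models a(2:4)
print(sorted(home_set(section, block_owner, 16, 4)))   # [1]
print(sorted(home_set(section, cyclic_owner, 16, 4)))  # [2, 3, 4]
```

Under these assumed sizes, ON HOME ( a(2:4) ) names one processor for BLOCK 
and three for CYCLIC, matching the example in the text.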

In every case, the ON directive specifies the processor(s) which should 
perform a computation.  This means that these processors should execute all 
operations required to execute the statements _except_ (perhaps) for the 
initial access of data.  For example, consider the directive and statement
	!HPF$ ON HOME( Z(2) )
	X(1) = X(1) + Y(2) * Z(3)
Assume that X, Y, and Z are distributed in some way.  Following the ON 
directive, the statement would be executed as follows:
1.	The values of X(1), Y(2), and Z(3) are made available to the 
	processor owning Z(2).  Depending on the hardware running the program, 
	this might correspond to one processor loading registers from memory, or 
	it might mean three processors sending messages (and one processor 
	receiving them).
2.	Processor HOME(Z(2)) performs an addition and a multiplication, using the 
	values sent above.  The data movement in step 1 allows these operations 
	to be performed locally.
3.	The result is stored to X(1), which may be on a different processor 
	than Z(2).  Again, this may require synchronization or other 
	cross-processor operations.
Note that the HOME clause only specifies how computation is partitioned 
among processors; it does not indicate processors that may be involved in 
data transfer.  Also, the ON clause by itself does not guarantee that its 
body can be executed in parallel with any other operation.  However, 
placing the computation can have a significant effect on data locality; as 
later examples will show, the combination of ON and INDEPENDENT can also 
provide control over load balancing parallel computations.
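
The three steps above can be traced with a toy model.  This Python sketch 
is illustrative only: it assumes each array element has exactly one owner, 
represents each processor's memory as a dictionary, and counts a "transfer" 
whenever a value crosses a processor boundary.  The function name 
execute_on_home and the data layout are assumptions, not part of the 
proposal.

```python
def execute_on_home(memory, owner, home_var):
    """Model X(1) = X(1) + Y(2) * Z(3) under ON HOME(home_var).

    memory: processor id -> {(array, index): value} stored there.
    owner:  (array, index) -> processor id holding that element.
    Returns the number of values that crossed a processor boundary.
    """
    home = owner[home_var]
    transfers = 0

    # Step 1: make the operands available to the home processor.
    operands = {}
    for ref in [("X", 1), ("Y", 2), ("Z", 3)]:
        src = owner[ref]
        operands[ref] = memory[src][ref]
        if src != home:
            transfers += 1

    # Step 2: the home processor performs the arithmetic locally.
    result = operands[("X", 1)] + operands[("Y", 2)] * operands[("Z", 3)]

    # Step 3: store the result to X(1), wherever it lives.
    dest = owner[("X", 1)]
    memory[dest][("X", 1)] = result
    if dest != home:
        transfers += 1
    return transfers

owner = {("X", 1): 1, ("Y", 2): 2, ("Z", 2): 2, ("Z", 3): 3}
memory = {1: {("X", 1): 10.0},
          2: {("Y", 2): 2.0, ("Z", 2): 0.0},
          3: {("Z", 3): 3.0}}
print(execute_on_home(memory, owner, ("Z", 2)))  # 3
print(memory[1][("X", 1)])                       # 16.0
```

With this assumed layout only Y(2) is local to HOME(Z(2)), so two operands 
are fetched and the result is sent back to the owner of X(1).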

Rationale:
This is the heart of the ON clause.  It defines where computation is 
performed, but not (by itself) what data are accessed.

The "as if" wording and making side effects undefined avoids the following 
problem: ON is a directive.  Therefore, it cannot have side effects.  But 
implementing the ON clause may require evaluating some functions, therefore 
causing the side effects.
End of rationale.

Advice to implementors:
If the HPF program is compiled into Single-Program-Multiple-Data (SPMD) 
code, then the ON clause can always be implemented (albeit inefficiently) 
by having all processors compare their processor id to an id (or list of 
ids) generated from the HOME clause.  (Similar naive implementations can be 
constructed in other paradigms as well.) If the ON clause will be executed 
repeatedly, for example in a DO loop, it is probably worthwhile to invert 
this process.  That is, instead of all processors executing all the HOME 
clause tests, the compiler should determine the range of loop iterations 
that will give a true test.  (See the "Advice to implementors" in the 
Examples section below for more details.) For example, consider the 
following complex case:
	DO I = 1, N
		!HPF$ ON HOME( A(MY_FCN(I)) ) BEGIN
		...
		!HPF$ END ON
	END DO
Here, the generated code can perform an "inspector" (i.e.  a skeleton loop 
that only evaluates the HOME clause of each iteration) to produce a list of 
iterations assigned to each processor.  This list can be produced in 
parallel, since MY_FCN must be side-effect free; however, distributing it 
to all processors may require unstructured communications patterns, 
possibly negating the advantage of parallelism.  In general, more advanced 
compilers will be able to efficiently invert more complex HOME clauses.  It 
is recommended that the abilities (and limitations) of a particular 
compiler be documented clearly for users.
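
The difference between the naive test and the inspector can be sketched as 
follows.  This is a sequential Python model under stated assumptions: A is 
taken to be distributed CYCLIC over 4 processors, my_fcn is a made-up 
side-effect-free index function, and the names naive_iterations and 
inspector are hypothetical.  A real inspector would build its lists in 
parallel and then distribute them.

```python
def cyclic_owner(i, nproc):
    """1-based processor owning element i of a CYCLIC array."""
    return (i - 1) % nproc + 1

def my_fcn(i):
    """Hypothetical side-effect-free index function."""
    return (3 * i) % 8 + 1

def naive_iterations(me, n, nproc):
    """Naive SPMD: processor `me` tests the HOME clause on every iteration."""
    return [i for i in range(1, n + 1)
            if cyclic_owner(my_fcn(i), nproc) == me]

def inspector(n, nproc):
    """Skeleton loop: evaluate HOME once per iteration, build all lists."""
    schedule = {p: [] for p in range(1, nproc + 1)}
    for i in range(1, n + 1):
        schedule[cyclic_owner(my_fcn(i), nproc)].append(i)
    return schedule

schedule = inspector(8, 4)
print(schedule)  # {1: [4, 8], 2: [3, 7], 3: [2, 6], 4: [1, 5]}
```

Both routes assign the same iterations to each processor; the inspector 
evaluates the HOME clause N times in total, whereas the naive scheme 
evaluates it N times on every processor.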

Note that processors "screened out" by the naive implementation may still 
be required to participate in data transfer.  (The RESIDENT clause can 
eliminate this, but it is optional.)  If the underlying architecture 
allows one-sided communication, this is not a problem.  On traditional 
message-passing machines, a request-reply protocol may be used.  This 
requires the inactive processors to enter a wait loop until the ON block 
completes, or requires the runtime system to handle requests 
asynchronously.  Again, it is recommended that the documentation tell 
programmers which cases are likely to be efficient and which inefficient 
on a particular system.
End advice to implementors.

Advice to programmers:
The argument to HOME in an ON clause can be arbitrarily complex.  This is a 
two-edged sword; it can express very complicated computation partitioning, 
but the implementation of these partitions may not be efficient.  More 
concretely, it may express a perfectly load-balanced computation, but force 
the compiler to serialize the computation of the HOME clauses.  Although 
the amount of overhead for an ON clause will vary based on the HPF code, 
the compiler, and the hardware, one can expect that compilers will generate 
very good code based solely on array mappings or a named processor 
arrangement, and progressively worse code as the complexity of the HOME 
clause increases.  A rough measure of the complexity of an ON directive is 
the amount of run-time data used to compute it; for example, a constant 
offset is fairly simple, while a permutation array is very complex.  See 
the Examples section below for more concrete examples of this phenomenon.

It is worth noting that the ON clause alone does not address data movement.  
(The RESIDENT clause does this.) Therefore, on some machines additional 
processors will have to enter the ON block to take part in communication.

It should also be noted that the ON clause does not change the semantics 
of a program, in the same sense that DISTRIBUTE does not change 
semantics.  In particular, an ON clause _by itself_ does not change 
sequential code into parallel code, because the code in the ON clause can 
still interact with code outside the ON clause.  (To put it another way, ON 
does not spawn processes.)  
End of advice to programmers.

It is legal to nest ON directives, provided that the processors named by 
the inner ON directive are also named by the outer directive.  The syntax 
of on-block automatically ensures that it is properly nested inside other 
compound statements, and that compound statements properly nest inside of 
it.  As with other Fortran 90 compound statements, transfer of control to 
the interior of an on-block from outside the block is prohibited, while 
transfers within a block may occur.  However, HPF also prohibits transfers 
of control from the interior of an on-block to outside the on-block.  Note 
that this is stricter than Fortran 90.  If ON clauses are nested, then the 
innermost HOME clause effectively controls execution of the statement(s).  
A programmer can think of this as successively restricting the set of 
processors at each level of nesting; clearly, the last restriction must be 
the strongest.  Alternatively, the programmer can think of this as a 
fork-join approach to nested parallelism.
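
The "successive restriction" view can be modelled as a stack of processor 
sets.  The Python sketch below is an illustration only; the class name 
OnStack and its methods are invented here.  Entering an ON whose processors 
are not all named by the enclosing ON is rejected, and the innermost set is 
the one that controls execution.

```python
class OnStack:
    """Tracks the active processor set through nested ON directives."""

    def __init__(self, all_processors):
        self.stack = [frozenset(all_processors)]

    def enter(self, home_set):
        """Enter an ON directive naming home_set."""
        home_set = frozenset(home_set)
        if not home_set <= self.stack[-1]:
            raise ValueError("inner ON names processors outside the enclosing ON")
        self.stack.append(home_set)

    def leave(self):
        """Leave the innermost ON directive."""
        if len(self.stack) == 1:
            raise ValueError("no ON directive to leave")
        self.stack.pop()

    @property
    def active(self):
        """Processor set that controls execution right now."""
        return self.stack[-1]

on = OnStack(range(1, 9))
on.enter({1, 2, 3, 4})
on.enter({2, 3})
print(sorted(on.active))  # [2, 3]
```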

Rationale:
The restrictions about control flow into and out of an ON block essentially 
make it a single-entry single-exit region, thus simplifying the semantics 
considerably.
End Rationale.


It is legal for an ON directive to apply to a CALL statement or a statement 
containing a function invocation.  The ON directive tells the compiler 
which processors should execute the statements in the procedure body.  Any 
procedure which is called from a statement controlled by an ON directive 
must have an explicit interface in the caller.  Furthermore, the interface 
must include a valid declarative ON directive which applies to the entire 
called procedure.  (I.e.  If the interface is through an INTERFACE block, 
then the INTERFACE specification must include the ENVIRONMENT :: ON 
clause.)  The treatment of ON clauses in the interface is similar to that 
for a compound statement.  Any processor named by an ON directive inside 
the subroutine must be included by the ON directive in the caller.  In 
particular, the declarative ON directive in the interface must name a 
subset of the processors in the caller's ON clause, and any ON clauses in 
the subroutine itself can only restrict the declarative ON directive's 
processor set further.  If ON clauses appear in a subroutine call chain, 
then the deepest ON clause (i.e.  the one in the last called routine that 
is still active) effectively controls the computation.

Note that EXTRINSIC procedures always have an explicit interface, and thus 
will require an ON declaration if they are called from an ON block.  In 
this case, the effect of the call is still as if the procedure were called 
on all processors.
 
If the procedure uses alternate return, then the target of the return must 
be in the same ON block as the CALL statement.  This implies that labels 
passed as arguments must refer to statements in the same ON block as the 
CALL statement.

Rationale:
It is natural to treat a procedure call within an ON clause as if it were 
an in-lined program block.  This leads to "propagating" the ON directive 
into the procedure.  Making the ON directive explicit in the callee allows 
separate compilation (i.e.  the compiler need not perform interprocedural 
analysis in order to select the calling convention).

The constraint on alternate return is similar to the prohibition against 
jumping out of an ON block, and has the same justification.
End of Rationale.

Note to language designers:
I just had an epiphany:
	* The declarative ON constrains the declarations to only map to the ON 
	  processor set.
	* Global data must be declared.
	* Therefore, only global data stored on the ON processor set can be 
	  accessed.
	* Therefore, we don't need one-way communication for global data inside 
	  CALLS from ON clauses.
	* Similar reasoning applies to dummy arguments.
Somebody check this.  If this really finesses the one-way communications 
problem, then we need to point it out!  (It's also a pretty significant 
restriction on HPF expressibility - can't get to global data sometimes.)
	
Is this what we want for EXTRINSIC procedures?  On one hand, ON and 
EXTRINSIC ought to be orthogonal.  On the other, it seems more natural that 
an EXTRINSIC called within an ON would synchronize only the processors 
from the ON, not all the processors.  But that would mean that ON would 
change the semantics of EXTRINSIC (maybe...)

We could disallow alternate returns from procedures called from ON 
clauses.  I'd sort of prefer that, but in the spirit of backward 
compatibility I've just specified the minimum (I think) constraints.
End of note to language designers.

Advice to implementors:
Notice that nested ON clauses do not present additional problems for the 
naive implementation.  If a processor failed the outer HOME comparison, 
then it would fail any tests inside the ON as well.  Thus, it need not even 
make the tests.

The difficulties of implementing one-way communication on message-passing 
machines remain, however.
End of advice to implementors.

Advice to users:
A CALL statement from within an ON block effectively executes the 
subroutine on a subset of processors.  In conjunction with other 
declarations, this allows a fork-join style of parallelism.

Note that the earlier advice regarding data movement still applies.  
Message-passing machines may need to perform the call statement on many 
(even all) processors in order to properly exchange data between 
processors.
End of advice to users.

Operations controlled by an ON clause must follow certain restrictions:

* If an ON directive applies to a DISTRIBUTE directive whose distributee is 
not a global variable and does not have the SAVE or ALLOCATABLE attribute, 
then the set of processors named in the HOME clause must include
	- All processors named in the DISTRIBUTE's ONTO clause if one is present
	- All processors in the default processors arrangement, if there is no 
	  ONTO clause
* If an ON directive applies to an ALIGN directive whose alignee is not a 
global variable and does not have the SAVE or ALLOCATABLE attribute, then 
the set of processors named in the HOME clause must include
	- All processors that store any element of the align-with-clause
* If an ON directive applies to a REDISTRIBUTE directive, then the set of 
processors named in the HOME clause must include 
	- All processors that stored any element of the distributee before 
	  the REDISTRIBUTE was encountered
	- The processors required by an equivalent DISTRIBUTE directive
* If an ON directive applies to a REALIGN directive, then the set of 
processors named in the HOME clause must include  
	- All processors that stored any element of the alignee before 
	  the REALIGN was encountered
	- The processors required by an equivalent ALIGN clause
* If an ON directive applies to an ALLOCATE statement which creates an 
explicitly mapped variable, then the set of processors named in the HOME clause
must include the processors required by the mapping directive for the 
allocated variable.  Similarly, if the ON directive applies to a 
DEALLOCATE statement that destroys an explicitly mapped variable, then the 
HOME clause must include all processors that owned any element of that 
variable.

If operations within an ON block do not follow these constraints, then the 
program is not HPF-conforming.
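
All of the restrictions above have the same shape: the HOME set must cover 
the processors that held the data before the operation together with those 
that will hold it afterwards.  A minimal Python sketch of that check 
follows; the function name on_covers and its parameters are assumptions 
made for illustration, and working out the owner sets themselves is left 
to the caller.

```python
def on_covers(home_set, old_owners=(), new_owners=()):
    """True if the HOME set includes every processor the operation needs.

    old_owners: processors storing the data before the operation
                (REDISTRIBUTE, REALIGN, DEALLOCATE).
    new_owners: processors storing the data afterwards
                (DISTRIBUTE, ALIGN, REDISTRIBUTE, REALIGN, ALLOCATE).
    """
    required = set(old_owners) | set(new_owners)
    return required <= set(home_set)

# REDISTRIBUTE from processors {1, 2} to processors {3, 4}:
print(on_covers({1, 2, 3, 4}, old_owners={1, 2}, new_owners={3, 4}))  # True
print(on_covers({3, 4}, old_owners={1, 2}, new_owners={3, 4}))        # False
```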

Note to language designers:
An alternative for the DISTRIBUTE would be that a missing ONTO clause 
would mean "distribute on all processors in the nearest enclosing ON 
directive" rather than "distribute on all processors".  In some sense, this 
makes the nested fork-joins more modular (declarations inside a forked 
region only need to know about the region they are part of).  It is 
especially appropriate now that we have the declarative ON directive.  I 
have not investigated the difficulty of writing things this way, but it 
probably requires fine-tooth combing through the data mapping chapter and 
inquiry intrinsics.
End of note to language designers.

Rationale:
Operations which allocate memory require the cooperation of all processors 
that will own that memory.  Therefore, ON clauses must schedule those 
operations to execute on the cooperating processors.  This leads to 
constraints on the mappings of variables allowed in ON clauses.  The most 
common case where these apply is in subroutines called from ON clauses; 
such subroutines must have a declarative ON clause that applies to all 
DISTRIBUTE and ALIGN directives for local variables, dummy arguments, and 
global data.

An alternate description of these constraints is:
* Local variables that are not SAVEd can only be mapped to the processors 
  executing the surrounding ON clause.  This ensures that all local 
  variables are allocated within the ON processor set, and that all references 
  to local variables will be RESIDENT as defined below.
* Dummy arguments and result variables can only be mapped to the processors 
  executing the ON clause.  Like local variables, this ensures that all 
  references to dummies and result variables are to local memory.  Note 
  that, although the dummy arguments must be RESIDENT, the actual argument 
  may not be (due to remapping).
* Global variables may be explicitly mapped onto all processors (or onto any 
  subset of processors that could normally be used).  This is required so 
  that the mappings of the globals can be consistent in different scopes.  
  Memory for the global variables will be allocated either at load time, 
  when all processors are active, or in explicit ALLOCATE statements, 
  which will be dealt with below.
* Local variables with the SAVE attribute can be mapped onto any set of 
  processors.  In this sense, they are much like global variables.  Like 
  global variables, it is possible to ensure that an appropriate set of 
  processors is active when memory is allocated for them.
* ALLOCATABLE variables have their mapping determined when they are 
  allocated.  Therefore, the restrictions on those mappings are applied to 
  the ALLOCATE statement.
End of rationale.

Advice to implementors:
These restrictions ensure that HPF data distribution directives inside 
an ON block (either lexically or in a dynamic call chain) can be 
implemented without relying on one-way communication outside of the 
"current" processing group.
End of advice to implementors.

If an ON directive includes a NEW clause, the meaning is the same as a NEW 
clause in an INDEPENDENT directive.  The operation of the program would be 
identical if the NEW variables were allocated anew on every entry to the 
ON directive's scope, and deallocated on exit from the ON clause.  That 
is, the NEW variables are dead on entry (i.e. assigned before use in the 
ON block) and dead on exit (i.e. not used after the ON block, unless first 
reassigned).  In addition, NEW variables cannot be remapped in the ON 
clause's scope, whether by REALIGN, REDISTRIBUTE, or by subroutine calls.  
Finally, NEW variables are not considered by any nested RESIDENT 
directives, as detailed below.

Rationale:
NEW clauses provide a simple way to create temporary variables.  This 
ability is particularly important when RESIDENT clauses come into play, as 
will be clear below.
End of rationale.

Advice to implementors:
Because they are not used outside of the ON blocks, NEW variables need not
be kept consistent before and after ON clauses.  Therefore, no 
communication outside of the ON processor set is required to implement 
them.  Scalar NEW variables should be replicated over the ON processor set, 
or allocated to memory areas shared by the ON processor set.  Note that 
memory must be dynamically allocated if there is a possibility that 
multiple instances of the ON block could be active concurrently.  This is 
similar to the requirements for implementing NEW variables in INDEPENDENT 
loops.
End of advice to implementors.


The RESIDENT directive is an assertion to the compiler that certain array 
references made within the ON are stored in local memory if the computation 
is performed by the processor(s) named in the nearest enclosing ON directive.  
The scope of the assertion is the next Fortran 90 statement if the 
simple-resident-directive form is used and the enclosed block of code if 
the resident-block form is used.  If RESIDENT appears as a clause in an ON 
directive, then the ON and RESIDENT apply to the same statements.  In this 
case, the RESIDENT assertion refers to the ON directive in which it 
appears.

RESIDENT( var ) means the _lexical expression_ var, when encountered in the 
execution of statements in the scope of the RESIDENT directive, accesses 
only data local to the set of executing processors.  (That is, the set of 
processors named by the ON directive, henceforth referred to as the ON 
processors set.)  If var is accessed by the statement (e.g.  it appears on 
the right-hand side of an assignment statement, or in the evaluation of a 
conditional expression), then at least one copy of the variable and any 
subobject of the variable must be mapped to the ON processors set.  If var 
is assigned to by the statement (e.g.  it appears on the left hand side of 
an assignment statement, or in the variable list of a READ statement), then 
all copies of the variable and all subobjects of the variable must reside 
in the ON processors set.  
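
The asymmetry between reads and writes can be stated as a small predicate.  
This Python sketch is a simplification: it treats a reference as a single 
object with a known set of processors holding copies, and ignores 
subobjects.  The name is_resident and its arguments are assumptions for 
illustration.

```python
def is_resident(copies, on_set, assigned):
    """Would RESIDENT hold for one reference?

    copies:   processors holding a copy of the referenced data.
    on_set:   the ON processors set.
    assigned: True if the statement assigns to the reference.
    """
    copies, on_set = set(copies), set(on_set)
    if assigned:
        return copies <= on_set   # every copy must be in the ON set
    return bool(copies & on_set)  # one copy in the ON set is enough

replicated = {1, 2, 3, 4}
print(is_resident(replicated, {2}, assigned=False))  # True
print(is_resident(replicated, {2}, assigned=True))   # False
```

A fully replicated variable is therefore RESIDENT for reads on any ON set, 
but RESIDENT for writes only when the ON set contains every processor that 
holds a copy.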

Note that RESIDENT is always an assertion relative to the surrounding ON 
directive.  Therefore, if the compiler does not implement the ON directive 
then it must be careful in interpreting RESIDENT. Similarly, if the 
compiler overrules the programmer-specified ALIGN and DISTRIBUTE 
directives, the RESIDENT clause may give less information.

Rationale:
The different treatment of variable reads and writes is due to the 
implementation requirements. If a variable's value is read (but not 
written), then it can be taken from any consistent copy.  Therefore, 
RESIDENT only asserts that one of those copies is available.  Conversely, 
all copies of a replicated variable must be consistent, so RESIDENT 
asserts that all copies are available if it is updated.

The RESIDENT assertion is always relative to the declared data mappings and 
ON clauses because both pieces of information are necessary to determine 
the locality of data references.  Data mapping determines where the data is 
stored, while ON clauses determine where it is used; in essence they 
determine the endpoints of a data path.  RESIDENT itself says that the path 
length is very short; obviously, one cannot measure a path without knowing 
both endpoints.
End Rationale.

For example, consider the following:

	!HPF$ ON HOME(Z(I)), RESIDENT(X,Y,RECORD(I))
	X(I) = Y(I+1) + RECORD(I)%FIELD1 + RECORD(I+1)%FIELD2

The following facts are asserted by the directive:
	* Z(I) would be local if it appeared, due to its use in the HOME clause.
	* All copies of X(I) are stored on the same processor as Z(I), due to the 
	  RESIDENT clause.  This may be true because X and Z have the same 
	  mapping, or because Z is replicated on all processors, or because the 
	  single copy of X(I) is the only element of X mapped to the same processor 
	  as the single copy of Z(I).  (Other situations are also possible.)
	* At least one copy of Y(I+1) is on the same processor as Z(I), due to the 
	  RESIDENT clause.  This may be true because Y is replicated on all 
	  processors, because Z(I) and Y(I+1) are the only elements of their arrays 
	  that are mapped to the same processor, or because the directive
	  	!HPF$ ALIGN Y(J) WITH Z(J-1)
	  appears elsewhere in the program.  (Other situations also make the 
	  RESIDENT assertion true.)
	* At least one copy of all subobjects of RECORD(I) is mapped on the same 
	  processor as Z(I).  In particular, the reference RECORD(I)%FIELD1 
	  (i.e. a subobject consisting of one component) can be accessed locally.  
	  The situations in which this is true are similar to those for X(I).
	  No information is available regarding RECORD(I+1)%FIELD2.
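
The read/write asymmetry in the facts above can be sketched as a small 
model.  This is only an illustration, not part of the proposal: the function 
name resident_ok and the representation of a mapping as a set of processor 
numbers are assumptions made for the example.

```python
# Hypothetical model: a reference is described by the set of processors
# that hold a copy of the referenced element.
def resident_ok(copies, on_set, written):
    """Return True if the RESIDENT assertion holds for one reference.

    copies  -- set of processors storing a copy of the element
    on_set  -- set of processors named by the enclosing ON directive
    written -- True if the element is assigned to, False if only read
    """
    if written:
        # An update must reach every copy, so all copies must be in the ON set.
        return copies <= on_set
    # A read may use any consistent copy, so one copy in the ON set suffices.
    return bool(copies & on_set)

on_set = {3}                  # the processor owning Z(I)
x_copies = {3}                # single copy of X(I); X(I) is written
y_copies = {0, 1, 2, 3}       # Y replicated on all processors; Y(I+1) is read
w_copies = {0, 1, 2, 3}       # a replicated variable that is written

x_written_ok = resident_ok(x_copies, on_set, written=True)
y_read_ok = resident_ok(y_copies, on_set, written=False)
w_written_ok = resident_ok(w_copies, on_set, written=True)
```

In this model a replicated variable satisfies RESIDENT when it is read, but 
not when it is written from a single-processor ON set.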

If there is no local-var-list, then _all_ references to _all_ variables 
referenced during execution of the RESIDENT directive's body except those 
declared NEW in a surrounding ON directive are local in the sense described 
above.  That is, for every usage of any variable's value, at least one copy 
of the variable will be mapped to the ON processor set.  Likewise, for 
every operation that assigns to a variable, all copies of that variable are 
mapped to the ON processor set.  References and assignments to NEW 
variables are always considered local.  If there are no function or 
subroutine invocations, this is syntactic sugar for listing all variable 
references within the directive's scope.  It might well have been named the 
ALL_RESIDENT clause; the present form, however, does not add yet another 
keyword to the directive sublanguage.

If a RESIDENT directive applies to a CALL statement or function invocation, then 
the assertion is more subtle.  

	* If a local-var-list appears in the RESIDENT directive, then no 
	  assertion is made about behavior within the called procedure.  For 
	  example, consider the statements:
	  
	  	!HPF$ RESIDENT( A(I), B )
	  	A(I) = F( A(I), B(LO:HI) )

	  The directive declares all variable references in the statement (including 
	  the actual parameters) to be local to the current ON processor set.  
	  However, the execution of F itself could access elements of arrays named 
	  A and B stored on arbitrary processors. 

Rationale:
Propagating assertions about the behavior of lexical entities is difficult 
to define consistently and usefully.  For example, consider the following 
function called from the code fragment above:

	  	REAL FUNCTION F( X, Y )
	  	USE MODULE_DEFINING_A
	  	REAL X, Y(:), B(SIZE(Y))
	  	!HPF$ ENVIRONMENT :: ON HOME(PROCS(1:10))
	  	!HPF$ INHERIT Y
	  	!HPF$ ALIGN B(:) WITH Y(:)
	  	INTEGER I
	  	Z = 0.0
	  	DO I = 1, SIZE(Y)
	  		Z = Z + A(I)*X + B(I)*Y(I)
	  	END DO
	  	F = Z
	  	END

Assume A is defined as a distributed, global array in module 
MODULE_DEFINING_A.  What should the RESIDENT clause mean regarding 
operations in F?  The expression A(I) in the RESIDENT directive might 
reasonably mean references only to the array A that is visible in the 
caller, or it might mean references to any array named A.  
Note that the A in the caller may be local, the same global array as the A 
in F (if the caller used MODULE_DEFINING_A), or a different global array 
(if the caller uses a different module).  Perhaps a limiting case is array 
B.  The array B in function F is local, and thus different from the B in the 
caller; however, because of the restrictions on ON clauses it is certain 
that the local B will be mapped to the ON processor set.  Thus, the 
RESIDENT assertion is trivially true.  To further confuse matters, RESIDENT 
variables might seem to apply to dummy arguments that might become 
associated with those variables. Unfortunately, this implies that the 
lexical expression B in the caller refers to the lexical expression Y in F.
End rationale.	  

	* If the RESIDENT directive does not contain a local-var-list, then the 
	  directive asserts that all references in the caller _and the called 
	  procedures_ are local as defined above.  For example, consider the 
	  statements:
	  
	  	!HPF$ RESIDENT
	  	A(I) = F( A(I), B(LO:HI) )

	  The directive declares all variable references in the statement (including 
	  the actual parameters) to be local to the current ON processor set, and 
	  that F itself does not reference or update any nonlocal variables.

Rationale:
The RESIDENT assertion is always true for data local to the called 
procedure.  This is true because the called procedure must use a 
declarative ON clause, which in turn limits the set of processors that can 
store any local explicitly mapped variables.  The above definition extends 
this assertion to all global explicitly mapped data, producing a very 
powerful directive.  This is similar to the meaning of INDEPENDENT, in that 
it also makes an assertion about variable accesses in any called procedure 
in the loop.  An alternative semantics for RESIDENT would have been to 
avoid propagating the assertion interprocedurally (i.e.  treat both the 
variable-list version and the no-list version the same).  However, this 
would not provide enough information to optimize code on certain machines.  
In particular, it would have made task parallelism quite difficult on 
message-passing machines.
End of rationale.
	  
Advice to implementors:
RESIDENT without a variable list guarantees that no one-sided 
communication outside of the ON processor set will be generated by the 
callee.  Such a procedure can be called only on the "active" processors, 
unless the runtime system has additional constraints (for example, if the 
runtime system requires all processors to participate in collective 
communications).

The other forms of RESIDENT provide information that could be propagated 
interprocedurally.  If the information is not propagated, the only result 
will be less optimization.
End of advice to implementors.

Advice to programmers:
Although the RESIDENT assertion applies interprocedurally, it is by no 
means certain that all compilers will make use of this information.  In 
particular, separate compilation limits the propagation that can take 
place.  It is therefore good practice to include a RESIDENT clause 
both in the caller's ON directive and in the callee's ON declaration.  
This ensures that the compiler has the RESIDENT information available when 
it is compiling both ends of the procedure call.  This is especially 
useful for RESIDENT clauses without a variable list; knowing that all data 
accessed is local allows many optimizations that are not otherwise 
possible.
End of advice to programmers.

Note that if the HOME clause specifies more than one processor, then RESIDENT 
only asserts that the variables are stored on one of the processors.  For 
example, if a statement is executed on a section of the processors 
arrangement, then communication within that section may be needed for some 
variables in the RESIDENT clause.  Communication with processors outside of 
the section will not be needed for those variables, however.

Rationale:
The alternative to this interpretation would be that any variable named in 
the RESIDENT clause would be local to all processors, i.e. replicated.   
While that certainly allows more extensive optimizations, it is a less 
common case.  In addition, it does not seem to capture the intent of ON 
directives applied to CALL statements or compound statements.  For example,
	!HPF$ PROCESSORS PROCS(MP,MP)
	!HPF$ DISTRIBUTE X(BLOCK,BLOCK) ONTO PROCS
	!HPF$ ON HOME(PROCS(1,1:MP)), RESIDENT( X(K,1:N) )
	CALL FOO( X(K,1:N) )
would presumably call FOO on a row of the processors arrangement, passing 
elements of X in place.  This is what the current definition does; if 
RESIDENT meant "resident on every processor", the call would force X to be 
replicated.
End Rationale.
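
The "resident on one of the processors" reading can be checked with a small 
model of the FOO example.  This is a sketch under stated assumptions: X is 
taken to be N by N (the fragment does not declare its shape), blocks have 
size ceiling(N/MP), and the helper names are invented for the illustration.

```python
def block_index(i, n, nprocs):
    """1-based processor coordinate owning index i (1-based) of an
    n-element dimension distributed BLOCK over nprocs processors."""
    bs = -(-n // nprocs)              # ceil(n / nprocs)
    return (i - 1) // bs + 1

def owners_of_row(k, n, mp):
    """Processors (row, col) of PROCS(MP,MP) that store X(K,1:N),
    assuming X(N,N) is distributed (BLOCK,BLOCK) onto PROCS."""
    return {(block_index(k, n, mp), block_index(j, n, mp))
            for j in range(1, n + 1)}

N, MP = 8, 2
on_set = {(1, c) for c in range(1, MP + 1)}      # HOME(PROCS(1,1:MP))

# Each element has a single copy, so RESIDENT holds when every owner
# of the section lies inside the ON processor set.
row2_resident = owners_of_row(2, N, MP) <= on_set
row7_resident = owners_of_row(7, N, MP) <= on_set
row2_owner_count = len(owners_of_row(2, N, MP))
```

With these sizes, row 2 is spread over both processors of the first row of 
PROCS, so the assertion holds even though no single processor stores the 
whole section; communication within the row may still be needed.  Row 7 
lives on the second row of PROCS, so the same assertion would be false 
for K=7.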

The RESIDENT directive is similar to the INDEPENDENT directive, in that if it 
is correct it does not change the meaning of the program.  If the RESIDENT 
clause is incorrect, the program is not standard-conforming (and is thus 
undefined).  Like the INDEPENDENT directive, the compiler may use the 
information in the RESIDENT clause, or ignore it if it is insufficient for the 
compiler's purposes.  If the compiler can detect that the RESIDENT clause is 
incorrect (i.e.  that a RESIDENT variable is definitely nonlocal), it is 
justified in producing a warning.  Unlike the INDEPENDENT directive, 
however, the truth of the RESIDENT clause depends on the mapping of 
computations (specified with the ON clause) and the mapping of data 
(specified with DISTRIBUTE and ALIGN clauses); if the compiler overrides 
either of these, then it may not be able to use information in the RESIDENT 
directive.

Rationale:
Knowing that a reference is local is valuable information for the 
optimizer.  It is in keeping with the spirit of HPF to phrase this as an 
assertion of fact, which the compiler can use as it pleases.  Expressing 
it as advice to the compiler seems to have disadvantages.  Some possible ways 
this advice could be phrased, and the counter-arguments, are 
*	"Don't generate communication for this reference" has great potential 
	for changing the meaning of the program.  Some programmers want this 
	capability, but it violates the "correct directives should not change the 
	meaning of a program" principle of HPF.  Also, once communication is 
	"turned off" for a reference, it's not clear how to turn it back on.
*	"Generate communication for this reference" is not a useful directive, 
	since the compiler has to do this anyway.
*	"Generate communication for this reference, and place it here" is useful, 
	since it can override the default placement by the compiler.  It still 
	has potential for changing program meaning.  It also has the potential to 
	create programs as complex as message-passing, as programmers try to move 
	communication out of loops.
End of rationale. 


Examples

*** To be done: 
*** Create "portably efficient" section of ON clause
*** Create "portably efficient" section for RESIDENT
*** Create examples of inferences from lexical information

The following are valid examples of ON directives.  Most of them are 
"reasonable" in the sense that they illustrate idioms that programmers 
might want to use, rather than contrived situations.  For simplicity, the 
first several examples assume the following array declarations:

	REAL A(N), B(N), C(N), D(N)
	!HPF$ DISTRIBUTE A(BLOCK), B(BLOCK), C(BLOCK), D(BLOCK)

One of the most commonly requested capabilities for HPF is to control how 
loop iterations are assigned to processors.  (Historically, the ON clause 
first appeared to perform exactly this role in the Kali FORALL construct.) 
This can be done by the ON directive, as shown in the following examples:

	!HPF$ INDEPENDENT
	DO I = 2, N-1
	  !HPF$ ON HOME(A(I))
	  A(I) = (B(I) + B(I-1) + B(I+1))/3
	END DO
	
	!HPF$ INDEPENDENT
	DO J = 2, N-1
	  !HPF$ ON HOME(A(J+1)) BEGIN
	    A(J) = B(J+1) + C(J+1) + D(J+1)
	  !HPF$ END ON
	END DO

The ON directive in the I loop advises the compiler to have each processor 
run over its local section of the A array (and therefore B as well).  The 
references to B(I-1) and B(I+1) must be fetched from off-processor for the 
first and last iterations on each processor (except for the boundary 
processors); note that those processors are not mentioned in the HOME 
clause.  The ON directive in the J loop advises the compiler to "shift" 
computations so that each processor does a vector sum of its local sections 
of B, C, and D, stores the first element of the result on the processor to 
its left, and stores the rest of the result (shifted by one) in A.  It is 
worth noting that the directives would still be valid (and minimize 
nonlocal data accesses) if the arrays were distributed CYCLIC, although the 
number of nonlocal references would be much higher.
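
To make the BLOCK versus CYCLIC comparison concrete, the following sketch 
counts the nonlocal B references in the I loop for both distributions.  It 
is only an illustration: the owner functions assume 0-based processor 
numbers, a block size of ceiling(N/P), and round-robin dealing for CYCLIC, 
and the sizes N=16 and P=4 are arbitrary.

```python
def block_owner(i, n, p):
    """Owner (0-based) of element i (1-based) of an n-element array
    distributed BLOCK over p processors; block size is ceil(n/p)."""
    bs = -(-n // p)
    return (i - 1) // bs

def cyclic_owner(i, n, p):
    """Owner of element i under CYCLIC: elements are dealt round-robin."""
    return (i - 1) % p

def nonlocal_refs(owner, n, p):
    """Count references to B in
           A(I) = (B(I) + B(I-1) + B(I+1))/3,   I = 2 .. N-1
    that are stored on a processor other than HOME(A(I))."""
    count = 0
    for i in range(2, n):                 # I = 2 .. N-1
        home = owner(i, n, p)
        for j in (i - 1, i, i + 1):
            if owner(j, n, p) != home:
                count += 1
    return count

N, P = 16, 4
block_count = nonlocal_refs(block_owner, N, P)
cyclic_count = nonlocal_refs(cyclic_owner, N, P)
```

For these sizes the BLOCK mapping leaves 6 nonlocal references (one on each 
side of each interior block boundary), while the CYCLIC mapping leaves 28 
(both neighbors of every iteration).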

Advice to implementors:
It is highly recommended that compilers concentrate on optimizing DO loops 
with a single ON clause including the entire loop body.  Schematically, 
the code will be:

	DO i = lb, ub, stride
		!HPF$ ON HOME(array(f(i))) BEGIN
		body
		!HPF$ END ON
	END DO

where array has some data mapping.  Assume the mapping gives processor p 
the elements my_set(p).  (In a BLOCK distribution, for example, my_set(p) 
is a contiguous range of integers.)  Then the generated code on processor 
p should be

	DO i in [lb:ub:stride] intersect f^-1(my_set(p))
		body
	END DO

(This schematic does not show where communication or synchronization must 
be placed; that must be derived from analysis of the body.) Moreover, f is 
most likely to be the identity function or a linear function with integer 
coefficients, both of which can be inverted easily.  Given this, techniques 
for iterating through the set can be found in several recent conferences.
End of advice to implementors.
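
As an illustration of the schematic above, the following sketch computes the 
set [lb:ub:stride] intersect f^-1(my_set(p)) for a BLOCK-distributed array 
and f(i) = i + c.  It is a minimal model, not a prescribed implementation: 
the function names, 0-based processor numbers, a block size of 
ceiling(N/P), and a positive stride are all assumptions of the example.

```python
def my_set_block(p, n, nprocs):
    """Indices (1-based) of an n-element BLOCK array owned by processor p
    (0-based).  The result is a contiguous range, possibly empty."""
    bs = -(-n // nprocs)                  # ceil(n / nprocs)
    lo = p * bs + 1
    hi = min((p + 1) * bs, n)
    return range(lo, hi + 1)

def local_iterations(p, lb, ub, stride, c, n, nprocs):
    """Iterations i in [lb:ub:stride] whose HOME(array(i + c)) is p.

    f(i) = i + c is inverted directly: f^-1(k) = k - c.  Assumes stride > 0.
    """
    owned = my_set_block(p, n, nprocs)
    if len(owned) == 0:
        return []
    # The preimage of a contiguous block is the same block shifted by -c.
    lo = max(lb, owned[0] - c)
    hi = min(ub, owned[-1] - c)
    if lo > hi:
        return []
    # First iteration >= lo that lies on the lattice lb, lb+stride, ...
    first = lo + (lb - lo) % stride
    return list(range(first, hi + 1, stride))

# The J loop above: DO J = 2, N-1 with ON HOME(A(J+1)), N = 16, 4 processors.
j_sets = [local_iterations(p, 2, 15, 1, 1, 16, 4) for p in range(4)]
# A strided variant: DO I = 2, 15, 3 with ON HOME(A(I)), on processor 1.
strided = local_iterations(1, 2, 15, 3, 0, 16, 4)
```

Each processor's bounds are found in constant time, so no processor needs to 
scan the full iteration range.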

Advice to users:
One can expect the I loop above to generate efficient code for the 
computation partitioning.  In effect, the compiler will arrange for each 
processor to iterate over its own section of array A.  The J loop is 
slightly more complex, since the compiler must find the inverse of the HOME 
clause's subscripting function.  That is, the compiler must solve K=J+1 for 
J, where K ranges over the local elements of A.  Of course, in this case 
J=K-1; in general, linear functions can be inverted by the compiler.  (It 
should be pointed out, however, that complex combinations of ALIGN and 
DISTRIBUTE may make the description of K unwieldy, and this may add 
overhead to the inversion process.)
End of advice to users.
 

Sometimes it is advantageous to "split" an iteration between processors. 
The following case shows one example of this:
	
	!HPF$ INDEPENDENT
	DO I = 2, N-1
	  !HPF$ ON HOME(A(I))
	  A(I) = (B(I) + B(I-1) + B(I+1))/3
	  !HPF$ ON HOME(C(I+1))
	  C(I+1) = A(I) * D(I+1)
	END DO

Due to the first ON clause, the reference to A(I) is local in the first 
statement. The second ON clause makes A(I) nonlocal (for some values of I) 
there.  This maximizes the data locality in both statements, but does 
require data movement between the two.  

Advice to implementors:
If there are several non-nested ON clauses in a loop, then the schematic 
above needs to be generalized.  In essence, the iteration range for each 
individual ON clause must be generated.  A processor will then iterate over 
the union of these ranges. Statements guarded by an ON directive must now 
be guarded by an explicit test.  In summary, the code for

	DO i = lb, ub, stride
		!HPF$ ON HOME(array1(f1(i)))
		stmt1
		!HPF$ ON HOME(array2(f2(i)))
		stmt2
	END DO

on processor p becomes 

	set1 = [lb:ub:stride] intersect f1^-1(my_set1(p))
	set2 = [lb:ub:stride] intersect f2^-1(my_set2(p))
	DO i in set1 union set2
	  IF (i in set1) THEN
	    stmt1
	  ENDIF
	  IF (i in set2) THEN
	    stmt2
	  ENDIF	  
	END DO

where my_set1(p) is the local set for array1, and my_set2(p) is the local 
set for array2.  (Again, synchronization and communication must be handled 
by other means.)  Code transformations such as loop distribution and loop 
peeling can be used to eliminate the tests in many cases.  They will be 
particularly profitable if there are data dependences between the ON 
blocks.
End of advice to implementors.
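
The guarded schedule above can be simulated to check that every statement 
instance is executed exactly once.  The sketch below models the split loop 
with ON HOME(A(I)) and ON HOME(C(I+1)); for brevity it builds set1 and set2 
by filtering the iteration range rather than by inverting f1 and f2, and 
the names and the BLOCK owner function are assumptions of the example.

```python
def owner(i, n, nprocs):
    """Owner (0-based) of element i (1-based) under BLOCK, block size ceil(n/nprocs)."""
    bs = -(-n // nprocs)
    return (i - 1) // bs

def run_split_loop(n, nprocs):
    """Simulate, processor by processor, DO I = 2, N-1 containing
         stmt1 under ON HOME(A(I))   and   stmt2 under ON HOME(C(I+1)).
    Returns a log of (processor, iteration, statement) triples."""
    log = []
    iters = range(2, n)                                            # I = 2 .. N-1
    for p in range(nprocs):
        set1 = {i for i in iters if owner(i, n, nprocs) == p}      # f1(i) = i
        set2 = {i for i in iters if owner(i + 1, n, nprocs) == p}  # f2(i) = i + 1
        for i in sorted(set1 | set2):
            if i in set1:
                log.append((p, i, "stmt1"))
            if i in set2:
                log.append((p, i, "stmt2"))
    return log

log = run_split_loop(16, 4)
```

With N=16 and 4 processors, iteration I=4 is split: processor 0 executes 
stmt1 and processor 1 executes stmt2, which is where A(4) must move between 
processors.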

Advice to users:
Splitting an iteration like this is likely to require either additional 
tests at runtime or additional analysis by the compiler.  Even if the 
compiler can generate low-overhead scheduling for the individual ON 
clauses, combining them is not necessarily low-overhead.  The locality 
benefits must be rather substantial for this to pay off, but there are 
cases where multiple ON clauses are valuable.  (All these statements are 
particularly true if one ON block uses data computed in another one.)
End of advice to users.


Because ON clauses nest naturally, they can be useful for expressing 
parallelism along different dimensions.  Consider the following examples:

	REAL X(M,M)
	!HPF$ DISTRIBUTE X(BLOCK,BLOCK)
	
	!HPF$ INDEPENDENT, NEW(I)
	DO J = 1, M
	  !HPF$ ON HOME(X(:,J)) BEGIN
	    DO I = 2, M
	      !HPF$ ON HOME(X(I,J))
	      X(I,J) = (X(I-1,J) + X(I,J)) / 2
	    END DO
	  !HPF$ END ON
	END DO

Each iteration of the J loop is executed by a column of the processors 
arrangement.  The I loop further subdivides the computation, giving each 
processor responsibility for computing the elements it owns.  Many 
compilers would have chosen this computation partitioning automatically for 
such a simple example.  However, the compiler might have attempted to fully 
parallelize the outer loop, executing each inner loop sequentially on one 
processor.  (This might be attractive on a machine with very fast 
communications.) By inserting the ON clauses, the user has advised against 
this strategy, thus trading additional locality for restricted parallelism.  
Notice that the ON directive neither requires nor implies the INDEPENDENT 
assertion.  In both nests, each iteration of the I loop depends on the 
preceding iteration, but the ON directive can still partition the 
computation among processors.  The ON directive does not automatically make 
a loop parallel.

Advice to implementors:
"Dimension-based" nesting, as above, will probably be a common case.  The 
HOME clauses can be inverted at each level, treating indices from outer 
loops as run-time invariants.
End of advice to implementors.

Advice to programmers:
Nested ON directives will tend to have efficient implementations if their 
HOME clauses refer to different dimensions of the processors arrangement, 
as in the above example.  This minimizes the interaction between the levels 
of the loops, simplifying the implementation.
End of advice to programmers.

Consider the following variation on the above example:

	!HPF$ DISTRIBUTE Y(BLOCK,*)

	!HPF$ INDEPENDENT, NEW(I)
	DO J = 1, M
	  !HPF$ ON HOME(Y(:,J)) BEGIN
	    DO I = 2, M
	      !HPF$ ON HOME(Y(I,J))
	      Y(I,J) = (Y(I-1,J) + Y(I,J)) / 2
	    END DO
	  !HPF$ END ON
	END DO

Note that the ON clauses have not changed, except for the name of the 
array.  The interpretation is similar to the above, except that the outer 
ON directive assigns each iteration of the J loop to all of the processors.  
The inner ON directive again implements a simple owner-computes rule.  The 
programmer has directed the compiler to distribute a serial computation 
across all the processors.  There are a few scenarios where this is more 
efficient than parallelizing the outer loop:

1. Parallelizing the outer loop will generate many non-local references, 
   since only a part of each column is on any processor. If nonlocal references 
   are very expensive (or if M is relatively small), this overhead 
   may outweigh any gain from parallel execution.
2. The compiler may take advantage of the INDEPENDENT directive to avoid 
   inserting any synchronization.  This allows a natural pipelined execution.  
   A processor will execute its part of the I loop for one value of J, then 
   immediately go on to the next J iteration.  Thus, the first processor 
   will start on J=2 while the second receives the data it needs (from 
   processor one) for J=1.  (A similar pipeline would develop in the X 
   example above.)

Clearly, the suitability of these ON clauses will depend on the underlying 
parallel architecture.  

Advice to programmers:
This example points out how ON may improve software engineering.  While the 
"value" of HOME(X(I)) will change if X's mapping changes, its intent will 
usually stay the same - run the loop "aligned with" the array X.  Moreover, 
the form of the clauses is portable, and they simplify experimenting with 
alternative computation partitioning.  Both qualities are similar to the 
advantages of DISTRIBUTE and ALIGN over low-level data layout mechanisms.
End advice to programmers.

ON directives are particularly useful when the compiler cannot accurately 
estimate data locality, for example when the computation uses indirection 
arrays.  Consider three variations of the same loop:

	REAL X(N), Y(N)
	INTEGER IX(M), IY(M)
	!HPF$ DISTRIBUTE X(BLOCK), Y(BLOCK)
	!HPF$ DISTRIBUTE IX(BLOCK), IY(BLOCK)
	
	!HPF$ INDEPENDENT
	DO I = 1, N
	  !HPF$ ON HOME( X(I) )
	  X(I) = Y(IX(I)) - Y(IY(I))
	END DO
	
	!HPF$ INDEPENDENT
	DO J = 1, N
	  !HPF$ ON HOME( IX(J) )
	  X(J) = Y(IX(J)) - Y(IY(J))
	END DO

	!HPF$ INDEPENDENT
	DO K = 1, N
	  !HPF$ ON HOME( X(IX(K)) )
	  X(K) = Y(IX(K)) - Y(IY(K))
	END DO

In the I loop, each processor runs over its section of the X array.  Only 
the reference X(I) is guaranteed to be local.  (If M<>N, then IX and IY 
have a different block size than X, and thus a different mapping.)  
However, if it is _usually_ the case that X(I), Y(IX(I)), and Y(IY(I)) are 
located on the same processor, then this mapping may be the best one 
available.  If X(I) and Y(IX(I)) are _always_ on the same processor, then 
the RESIDENT clause should be added:
	!HPF$ ON HOME( X(I) ), RESIDENT( Y(IX(I)) )
This will avoid communication setup overhead on most systems, and there is 
little chance that the compiler would deduce this automatically.  If both 
references to Y are _always_ on the same processor as X(I), then further 
improvement is possible and desirable:
	!HPF$ ON HOME( X(I) ), RESIDENT( EVERY=Y )
In the J loop, references IX(J) and IY(J) are always local.  This is the 
most common array reference class in the loop, so it minimizes the 
number of nonlocal data references in the absence of any special 
properties of IX and IY.  It may not evenly balance the load among 
processors; for example, if N=M/2 then half the processors will be idle.  
As before, if the values in IX or IY ensure that one of the Y references is 
always local, a RESIDENT assertion should be added.  In the K loop, only 
reference Y(IX(K)) is guaranteed to be local (because Y and X have the same 
distribution).  However, the values stored in IX and IY may ensure that 
Y(IY(K)) and X(K) are always local, a fact that should be noted if true:
	!HPF$ ON HOME( X(IX(K)) ), RESIDENT( Y(IY(K)), X(K) )
Even if the three REAL values are not always, but merely "usually" on the 
same processor, this may be a good computation partitioning for both 
locality and parallelism.  However, these advantages must be weighed 
against the cost of computing this partitioning.  Since the HOME clause 
depends on a (presumably large) array of runtime values, substantial time 
may be required to determine which iterations are assigned to each 
processor.  It should be clear from this discussion that there is no magic 
solution for handling complex computation partitionings; the best answer 
is usually a combination of application knowledge, careful data structure 
design (including ordering of the elements), and efficient compilation 
methodology and runtime support.

Advice to implementors:
The K loop is the situation that the inspector strategy described above is 
designed for.  If there is an outer loop around any of these examples, and 
that loop does not modify the distribution of X or the values of IX, then 
a record of each processor's iterations can be saved for reuse.  The cost 
is at worst linear in the sizes of the arrays.
End of advice to implementors.
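
A minimal sketch of such a precomputed record for the K loop follows.  It is 
an illustration only: the function names, the modelling of IX as a Python 
list, and the BLOCK owner function with block size ceiling(N/P) are 
assumptions, and a real implementation would build the lists in parallel 
rather than in one sequential pass.

```python
def block_owner(i, n, nprocs):
    """Owner (0-based) of element i (1-based) under BLOCK, block size ceil(n/nprocs)."""
    bs = -(-n // nprocs)
    return (i - 1) // bs

def inspector(ix, n_x, nprocs, n_iters):
    """Build each processor's iteration list for
           ON HOME( X(IX(K)) ),   K = 1 .. n_iters
    in one pass over IX.  The Fortran array IX is modelled as a Python
    list, so ix[k - 1] holds IX(K)."""
    schedule = [[] for _ in range(nprocs)]
    for k in range(1, n_iters + 1):
        schedule[block_owner(ix[k - 1], n_x, nprocs)].append(k)
    return schedule

IX = [8, 1, 5, 2, 7, 3, 6, 4]
schedule = inspector(IX, n_x=8, nprocs=2, n_iters=8)
```

The pass touches each element of IX once, which matches the linear cost 
noted above; the resulting schedule stays valid until IX or the 
distribution of X changes.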

Advice to users:
It is unlikely that any production compiler will generate low-overhead code 
for the K loop above in the near term.  The difference from previous examples 
is that the HOME clause is not a function that can be easily inverted by 
the compiler.  Some compilers may choose to execute every iteration on all 
processors, testing the HOME clause at run-time; others may pre-compute a 
list of iterations for every processor.  Of course, the cost of computing 
the list will be substantial.

In practice, one would make all the arrays the same size to avoid some of 
the alignment problems above; the example was written this way for 
pedagogical reasons, not as an example of good data structure design.
End of advice to users.

Explicit use of processors arrangements in ON directives is usually 
associated with task parallelism.  Many examples can be found there (I 
assume...) The following example illustrates how processors can be used 
for a one-dimensional domain decomposition algorithm:

	!HPF$ PROCESSORS PROCS(NP)
	!HPF$ DISTRIBUTE X(BLOCK) ONTO PROCS

	! Compute ILO(IP) = lower bound on PROCS(IP)
	! Compute IHI(IP) = upper bound on PROCS(IP)
	DONE = .FALSE.
	DO WHILE (.NOT. DONE)
	  !HPF$ INDEPENDENT, NEW( ILO, IHI )
	  DO IP = 1, NP
	    !HPF$ ON HOME(PROCS(IP)), RESIDENT( X(ILO(IP):IHI(IP)) )
	    CALL SOLVE_SUBDOMAIN( IP, X(ILO(IP):IHI(IP)) )
	  END DO
	  !HPF$ ON HOME(X) BEGIN
	    CALL SOLVE_BOUNDARIES( X, ILO(1:NP), IHI(1:NP) )
	    DONE = CONVERGENCE_TEST( X, ILO(1:NP), IHI(1:NP) )
	  !HPF$ END ON
	END DO

The algorithm divides the entire computational domain (array X) into NP 
subdomains, one for each processor.  The INDEPENDENT IP loop performs a 
computation on each subdomain's interior.  The processors then collaborate 
to update the boundaries of the subdomains and test for convergence.  The 
subroutine SOLVE_SUBDOMAIN can use a transcriptive or descriptive
mapping for its array argument, placing it on a single processor.  An ON 
block encompassing the entire procedure could then ensure the computation 
proceeded on a single processor.  Subroutines SOLVE_BOUNDARIES and 
CONVERGENCE_TEST may well have their own loops similar to the IP loop, with 
similar RESIDENT clauses.  Note that only the lower and upper bound of each 
subdomain is recorded; this allows different processors to process 
different-sized subdomains.  However, each subdomain must "fit" into one 
processor's section of the X array.

Advice to implementors:
The IP loop above is likely to be a common idiom among programmers doing 
block-structured codes.  In general, it can be implemented by inverting the 
HOME clause as was done above.  In the one-to-one case shown here (probably 
very popular with programmers), it can be implemented by assigning the 
processor id to the loop index variable and testing the range of the loop 
(once).
End of advice to implementors.

Advice to programmers:
Some compilers will propagate the ON information from the caller to the 
callee at compile time, and some at run time.  Repeating the ON clause in 
the caller and callee will tend to give the compiler better information, 
resulting in better generated code.  

Again, note the usefulness of RESIDENT clauses in giving the compiler 
information.  Few compilers would be able to unravel nontrivial assignments 
to ILO and IHI, and no current compiler would even attempt to understand 
the comments in the above code fragment.
End of advice to programmers.

*** Check examples from here on down

Because it is an assertion of fact, the compiler can draw many inferences 
from a single RESIDENT clause.  For example, consider the following case:

	!HPF$ ALIGN Y(I) WITH X(I)
	!HPF$ ALIGN Z(J) WITH X(J+1)
	
	!HPF$ ON HOME( X(K) ), RESIDENT( X(INDX(K)) )
	X(K) = X(INDX(K)) + Y(INDX(K)) + Z(INDX(K))

The compiler is justified in making the following assumptions in compiling 
the assignment statement (assuming it honors both the ALIGN directives and 
the ON directive):

* X(K) requires no communication (because of the HOME clause)
* X(INDX(K)) requires no communication (because of the RESIDENT clause)
* Y(INDX(K)) requires no communication (because Y has the same mapping as 
  X, and INDX(K) clearly cannot change values between its use in the two 
  references X(INDX(K)) and Y(INDX(K)))

The compiler cannot make any assumption about INDX(K) or Z(INDX(K)) from 
the above code.  There is no indication how INDX is mapped relative to X, 
so the ON directive gives no guidance.  Note that the fact that an 
expression (here, X(INDX(K))) is local does not imply that its 
subexpressions (here, INDX(K)) are also local.  Similarly, Z's mapping does 
not determine if Z(INDX(K)) would be local; it indicates that Z(INDX(K)-1) 
is local, but that isn't a great help.  If the compiler has additional 
information (for example, X is distributed by BLOCK and INDX(K) is not near 
a block boundary), it might be able to make additional deductions.
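
These deductions can be checked on a small model.  The sketch assumes X is 
an 8-element array distributed BLOCK over 2 processors and picks K=3 with 
INDX(K)=4; the sizes, the owner functions, and the variable names are all 
invented for the illustration.

```python
def owner_x(i, n, nprocs):
    """Owner (0-based) of X(i) under BLOCK, block size ceil(n/nprocs)."""
    bs = -(-n // nprocs)
    return (i - 1) // bs

def owner_y(i, n, nprocs):
    """ALIGN Y(I) WITH X(I): Y(i) is stored with X(i)."""
    return owner_x(i, n, nprocs)

def owner_z(j, n, nprocs):
    """ALIGN Z(J) WITH X(J+1): Z(j) is stored with X(j+1)."""
    return owner_x(j + 1, n, nprocs)

N, P = 8, 2            # X(1:4) on processor 0, X(5:8) on processor 1
K, INDX_K = 3, 4       # one iteration, with INDX(K) = 4

home = owner_x(K, N, P)                              # ON HOME( X(K) )
resident_holds = owner_x(INDX_K, N, P) == home       # RESIDENT( X(INDX(K)) )
y_local = owner_y(INDX_K, N, P) == home              # follows from the ALIGN
z_local = owner_z(INDX_K, N, P) == home              # not implied
z_prev_local = owner_z(INDX_K - 1, N, P) == home     # Z(INDX(K)-1) is implied
```

Here the RESIDENT assertion is true and Y(INDX(K)) is local, but Z(INDX(K)) 
is stored with X(5) on the other processor; only Z(INDX(K)-1) shares a 
processor with X(INDX(K)).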

Advice to implementors:
One mark of a good compiler will be that it aggressively propagates RESIDENT 
assertions.  This is likely to significantly reduce communication costs.  
Note the cases under "Advice to users" below.
End advice to implementors.

Advice to users:
One can expect compilers to differ in how aggressive they are in drawing 
these deductions.  Higher-quality compilers will be able to identify more 
references as local, and use this information to eliminate data movement.  
All compilers should recognize that if an element of one array is local, 
then the same element of any other arrays with the same static mapping 
(i.e.  arrays ALIGNed together, or with the same DISTRIBUTE pattern and 
array size) will also be local.  That is, any compiler should recognize 
Y(INDX(K)) in the above example as local.  Dynamically changing array 
mappings (i.e. REALIGN and REDISTRIBUTE) will tend to limit such 
information and information propagation.  Also, assignments that might 
change subexpressions (for example, an assignment to K or any element of 
INDX in the above example) will force the compiler to be conservative in 
its deductions.
End advice to users.
---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-core-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-core  Fri May  3 22:02:37 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id WAA13007 for hpff-core-out; Fri, 3 May 1996 22:02:37 -0500 (CDT)
Received: from coral.llnl.gov. (coral.llnl.gov [134.9.1.2]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id WAA13002 for <hpff-core@cs.rice.edu>; Fri, 3 May 1996 22:02:33 -0500 (CDT)
Message-Id: <199605040302.WAA13002@cs.rice.edu>
Received: by coral.llnl.gov
	(1.40.112.4/16.2) id AA218278951; Fri, 3 May 1996 20:02:31 -0700
Date: Fri, 3 May 1996 20:02:31 -0700
From: Mary E Zosel <zosel@coral.llnl.gov>
To: hpff-core@cs.rice.edu
Subject: hpff-core: HPF 2.0 Document Plan
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-core
Precedence: bulk

HPF 2.0 DOCUMENT GENERATION PLAN:
PLEASE NOTE DATES AND WHERE YOUR NAME APPEARS

In the interest of time - we will abandon the idea of producing a 1.2 document, 
but we will use the text prepared for this document as the basis for generating 
our new 2.0 document.  The time schedule for producing the 2.0 document follows.
---------

May 3:  Chuck sends out a file with the 1.2 base files (minus the library)
May 3:  Chuck initializes an HPFF_DOC email list with instructions for how
		to subscribe.  
Before May 10: Raw text pasted together in the order of the chapters given below
is due to Chuck.  This text may or may not be in latex. Chuck will merge and 
distribute a 'raw' document as our starting point.

May 13-May 22 - Chapter writers produce latex in roughly hpf-ese for each 
chapter.

Before May 22:  Latex in roughly HPF-ese for each chapter is due to Chuck. 
Things like cross-references will be missing.

May 23 - Chuck will post the first 'latex' draft of the document. 

May 24-June 7  back-up readers, volunteer readers, sweepers go over text and 
send in comments to hpff_doc.   Each comment should reference the chapter in 
the subject line.

June 7-June 21 Chapter editors iterate to improve and incorporate comments.  
Comments can also continue - but may not make it into the revised text.

By June 21 - Send updates to Chuck.

June 24 -   Chuck reposts document.  An 'alpha' version of the document also 
sent to wider audience to request early help in fixing problems in the document
before it goes for formal public comment.

June 27 -   If some serious problems are noted with the June 24 version - fixes 
due by this time.  Draft to Theresa for printing by June 28.  (This timing 
recognizes that between the US holiday the following week and the Vienna 
workshop, very few people will be around, that Chuck will not be available to 
issue a revised document, and that a few days are required for duplication 
and shipping.)

July 10-12 - Final technical changes at meeting


Names of the primary writing team and back-up readers for each section are 
limited.  These are people who commit to providing input by the dates given, 
and whom Mary will contact directly to nag about their contributions.

------

For each chapter a primary writing team has been assigned, along with other 
names designated as readers.  It should be understood that anyone is welcome to 
read and comment on any section.  In addition, a 'sweep team' has been named: 
a group who will do a pass across the entire document, looking for more 
global issues that might have 'slipped through the cracks': Did CCI's get in?  
Did all the proposals get in?  Did the appropriate material from 1.1 get 
included or removed?  Is the division between 2.0 and Extended features correct?

DOCUMENT OUTLINE and NAMES
Section I   Introduction
Chapter 0:  Front, ack, etc
	writer: Mary
	reader: Chuck
Chapter 1: Overview - terms and concepts - new document/language structure, 
Fortran language base, F95 features, HPF 1.1 deletions, etc.
	writer: David and Carl
	reader:  Bob, Rob, Chuck, Jerry
Section II  HPF 2.0
Chapter 2: Mappings  - distribute, align, sequence, some of pointer.  
Material from most of v1.1 chapter 3 and some of chapter 7.
	writer: Piyush, Carl, GLS
	reader: Saday, Guy R.
Chapter 3: Mapping across Subprogram Interface. 
Material from v1.1 chapters 3 and 7.
	writer/reader - same as chapter 2.
Chapter 4:  Independent and Reduce
	writer: Chuck
	reader: Rob and Jay
Chapter 5: HPF Library (plus sort-up and sort-down)
	writer: Rob
	reader: Carol
Chapter 6:  Extrinsics
	writer: David
	reader: Mary
Chapter 7: Portability and Efficiency Issues (forall examples)
	writer: Andy, Chuck
	reader: Larry, Carl, Henry, Guy R.

Section III Extended Features
Chapter 8: More mappings (and related subprogram interface issues) 
{should this be two chapters as in Section II?} 
 GenBlock, indirect, range, shadow, subsets, derived type, more on pointers.
	writer:  Piyush, Carl
	reader: Saday, Guy R.
Chapter 9: ON, TASK, RESIDENT
	writer: Chuck, Jaspal
	reader: Jay
Chapter 10: New Library - generalized transpose, extended inquiries, 
new mappings.
	writer: Rob
	reader: Henry
Chapter 11: Async I/O
	writer: Larry
	reader: Alok
Chapter 12: HPF_LOCAL
	writer: David
	reader: Carl, Carol, Saday
Chapter 13: HPF_SERIAL
	writer: David
	reader: Carl, Carol, Saday
Chapter 14: PROVISIONAL TEXT - F77_LOCAL
 (don't really know where and if this goes yet).
	writer: Carol
	reader: ?
Chapter 15: C Interoperability
	writer: Henry and Andy
	reader: Scott, Jerry

Section IV  Appendixes
A - BNF  (GLS)
B - Extrinsic Extrinsics - (policy, mechanism, HPF_CRAFT)
	writer: Mary, Andy
	reader: Bob
C - Subset
	writer: Mary
	reader: Carol
Glossary and Bibliography later.  PLEASE - Chapter writers - note those features
that should be defined in the glossary and put the appropriate latex at the key 
places where they are used.

Sweep Team:   Piyush, Carol, Henry, Rob, Mary

From owner-hpff-core  Wed May  8 13:25:03 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id NAA06658 for hpff-core-out; Wed, 8 May 1996 13:25:03 -0500 (CDT)
Received: from [128.42.1.213] (morpheus.cs.rice.edu [128.42.1.213]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id NAA06649 for <hpff-core>; Wed, 8 May 1996 13:24:52 -0500 (CDT)
X-Sender: chk@titan.cs.rice.edu
Message-Id: <v01530511adb6a1b553a0@[128.42.1.213]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 8 May 1996 13:25:37 -0600
To: hpff-core
From: chk@cs.rice.edu (Chuck Koelbel)
Subject: hpff-core: HPF 1.2 draft now available
Sender: owner-hpff-core
Precedence: bulk

Such as it is, anyway.

URL is ftp://titan.cs.rice.edu/public/HPFF/hpf2.0-draft/

Lots of .tex files there.

LaTeX runs with some errors, but does produce syntactically correct output.
See the README file for more info.

Now off to collect that raw text...

                                                Chuck

PS I don't think I got all the chapter editors' assignments.  If you should
be on the hpff-doc list (still not set up), please send me your address.



From owner-hpff-core  Wed May  8 15:21:50 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id PAA12456 for hpff-core-out; Wed, 8 May 1996 15:21:50 -0500 (CDT)
Received: from mail13.digital.com (mail13.digital.com [192.208.46.30]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id PAA12445 for <hpff-core@cs.rice.edu>; Wed, 8 May 1996 15:21:44 -0500 (CDT)
Received: from us2rmc.zko.dec.com by mail13.digital.com (8.7.5/UNX 1.2/1.0/WV)
	id QAA04647; Wed, 8 May 1996 16:07:37 -0400 (EDT)
Received: from tle.enet by us2rmc.zko.dec.com (5.65/rmc-22feb94)
	id AA28450; Wed, 8 May 96 15:47:11 -0400
Message-Id: <9605081947.AA28450@us2rmc.zko.dec.com>
Received: from tle.enet; by us2rmc.enet; Wed, 8 May 96 16:03:31 EDT
Date: Wed, 8 May 96 16:03:31 EDT
From: Stan Whitlock DTN 381-2011 ZKO2-3/N30 <whitlock@tle.ENET.dec.com>
To: zosel@llnl.gov, hpff-core@cs.rice.edu, whitlock@tle.ENET.dec.com
Apparently-To: hpff-core@cs.rice.edu, zosel@llnl.gov
Subject: hpff-core: ANSI X3J3 work in progress on asynch I/O
Sender: owner-hpff-core
Precedence: bulk

Mary,

My name is Stan Whitlock.  I represent DEC on X3J3 and am responsible for the
handling of Fortran 2000 requirements.  J3 discussed Asynch I/O at its Feb-96
meeting and produced the appended paper as work in progress.  I believe the
plan is to do more work on this topic next week at the May-96 meeting.

J3 is aware of HPFF's interest in this topic and its ultimate standardization.
As our work progresses, we will keep you informed.

Thanks				/Stan

--------------------------------------------------------------------------------

To: X3J3					X3J3/96-040r1
From: /jor (bleikamp)				page 1 of 2
Subject: ASYNC I/O : Requirements for F2000

Background:

  We hope that WG5 and the Fortran community will have an
  opportunity to review and comment on these requirements before
  the "standardese" is developed for Fortran 2000.

  At meeting 136, X3J3 discussed async i/o, and took a number of
  straw votes, trying to establish a consensus about the various
  tradeoffs between user flexibility, implementation cost, and
  expected efficiency of the resulting implementations.
  These requirements are the outcome of that discussion.

  In general, X3J3 decided that more functionality was
  desirable, at the expense of extra memory allocations and
  memory copies. X3J3 assumes (and hopes) that appropriate cases will
  be substantially optimized, eliminating unnecessary temporary allocations
  and memory to memory copies when possible.  The scenario supported by
  existing practice,
    1) Contiguous memory block (such as ONE named variable), using
    2) Unformatted sequential I/O
  will hopefully be optimized by most vendors, and other scenarios
  will likely incur additional overhead.

  There were other high level requirements which were generally
  accepted:

    - must be possible to be standard conforming on a
      processor/OS which does not support async i/o

    - must be readily implementable on a wide variety of OS's
      while achieving some level of asynchronous i/o

    - should use existing OPEN, READ, and WRITE statements, may add
      a new statement for WAITing for I/O completion

    - the user is prohibited from referencing the I/O list items until
      the I/O is complete (i.e. the WAIT statement was executed).

Requirements:

  - The user will request ASYNC I/O via the OPEN statement.
    All I/O to that unit may be performed asynchronously.

  - The READ and WRITE statements will be used, rather than some
    new intrinsic routine or BUFFERIN/OUT.

  - After executing a READ/WRITE on a unit OPENed for async i/o,
    the user shall WAIT for completion of the I/O before referencing
    the storage units referenced by the list items in the READ/WRITE.
    If the I/O request was a WRITE, the storage units can be referenced,
    but not defined.  If the I/O request is a READ, the storage units
    may not be referenced or defined.

  - The mechanism used to WAIT for I/O completion will be a new statement
    or a subprogram call.  The syntax to either WAIT for completion, or
    to inquire about the completion status (without waiting), will be
    provided.  Note that an implementation is free to always WAIT for
    completion even when the user inquires about the status.  This allows
    OS's which can only wait for completion to easily support async i/o.

  - The READ/WRITE statements will optionally return some sort of
    "handle", which uniquely identifies a particular I/O request.
    The WAIT mechanism will allow WAITing for a particular request
    via the "handle", AND, alternatively, allow WAITing for all requests
    for a user specified unit to complete, without specifying any "handles".

  - The user may issue multiple I/O requests on a single or many units,
    without waiting for any of the previously issued requests to
    finish.  An implementation might wait for a previously issued
    request on a unit to finish before proceeding.

  - An OPEN statement which specifies async i/o may open a file
    for either FORMATTED or UNFORMATTED I/O, and for either
    SEQUENTIAL or DIRECT ACCESS I/O.

    Note that formatted I/O potentially reads/writes several records,
    and is less likely to be optimized by all implementations.  This is
    particularly true for list directed and namelist i/o.

    Note that for DIRECT ACCESS I/O, the "seek" will probably be done
    synchronously, and only the read/write will likely be done
    asynchronously.

  - Non-advancing I/O is prohibited on a unit opened for async i/o.

  - The READ/WRITE statements may have any valid list items.
    We considered restricting the I/O list items to ONE contiguous
    object, to ensure that the library would not have to allocate
    a temporary, and would not have to copy the I/O list to/from
    the temporary; however, X3J3 decided the enhanced functionality
    was worth the extra overhead.

end of requirements
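
[Editorial sketch, not part of the X3J3 paper.]  The paper above deliberately
fixes no syntax.  As one illustration of the requirements, the sketch below
uses the form later standardized in Fortran 2003 (ASYNCHRONOUS= on OPEN and
WRITE, ID= as the "handle", the WAIT statement, and INQUIRE with PENDING=).
The unit number, file name, and array sizes are assumptions made only for
the example, and a processor is free to perform every transfer synchronously.

```fortran
! Sketch only: Fortran 2003 syntax, which postdates the paper above.
! The unit number, file name, and array sizes are illustrative assumptions.
program async_io_sketch
  implicit none
  integer, parameter :: n = 100000
  integer, parameter :: u = 10
  real, asynchronous :: a(n), b(n)   ! list items of pending transfers
  integer :: id_a, id_b              ! "handles" returned by WRITE
  logical :: still_pending

  a = 1.0
  b = 2.0

  ! Async I/O is requested on the OPEN statement.  This is the
  ! contiguous-block, unformatted sequential case that the paper
  ! expects most vendors to optimize.
  open(unit=u, file='async_sketch.dat', form='unformatted', &
       access='sequential', status='replace', action='write', &
       asynchronous='yes')

  ! Ordinary WRITE statements; each returns a handle through ID=.
  ! The second request is issued without waiting for the first.
  write(unit=u, asynchronous='yes', id=id_a) a
  write(unit=u, asynchronous='yes', id=id_b) b

  ! While a WRITE is pending, a and b may be referenced but not defined.
  print *, 'first element is still readable: ', a(1)

  ! Inquire about one request.  A processor may simply complete the
  ! transfer here and report pending = .false.
  inquire(unit=u, id=id_a, pending=still_pending)

  ! Wait on a particular handle, but only if it is still pending:
  ! once a request has completed, its handle must not be used in WAIT.
  if (still_pending) wait(unit=u, id=id_a)
  a = 0.0            ! safe: the transfer of a is complete

  ! Wait for every outstanding request on the unit; no handles needed.
  wait(unit=u)
  b = 0.0            ! safe: nothing is pending on the unit

  close(unit=u)
end program async_io_sketch
```

Guarding the WAIT with the PENDING= result reflects the rule that a handle
names only a pending transfer; the handle-free WAIT on the unit covers the
"all requests for a unit" case from the requirements.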

From owner-hpff-core  Thu May  9 09:58:22 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id JAA17374 for hpff-core-out; Thu, 9 May 1996 09:58:22 -0500 (CDT)
Received: from [128.42.1.213] (morpheus.cs.rice.edu [128.42.1.213]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id JAA17354 for <hpff-core>; Thu, 9 May 1996 09:58:16 -0500 (CDT)
X-Sender: chk@titan.cs.rice.edu
Message-Id: <v01530503adb7bd495778@[128.42.1.213]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 9 May 1996 09:59:02 -0600
To: hpff-core
From: chk@cs.rice.edu (Chuck Koelbel)
Subject: hpff-core: New mailing list - hpff-doc
Sender: owner-hpff-core
Precedence: bulk

We are pleased to announce that hpff-doc@cs.rice.edu is now on the air.

This is the mailing list discussed at the last HPFF meeting, for people
actively working on pulling together the HPF 2.0 draft.  If you are one of
the initial members of the list, you just got a "Welcome" message.
Otherwise, you didn't.

The list is managed by Majordomo as an open list, meaning that you can add
yourself to it.

To add yourself to the list:
        mail majordomo@cs.rice.edu << EOF
        subscribe hpff-doc
        EOF

To remove yourself from the list:
        mail majordomo@cs.rice.edu << EOF
        unsubscribe hpff-doc
        EOF

We now return you to your regularly scheduled e-mail.

                                                Chuck



From owner-hpff-core  Wed May 15 16:56:59 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id QAA25794 for hpff-core-out; Wed, 15 May 1996 16:56:59 -0500 (CDT)
Received: from coral.llnl.gov. (coral.llnl.gov [134.9.1.2]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id QAA25772 for <hpff-core@cs.rice.edu>; Wed, 15 May 1996 16:56:51 -0500 (CDT)
Message-Id: <199605152156.QAA25772@cs.rice.edu>
Received: by coral.llnl.gov
	(1.40.112.4/16.2) id AA152007409; Wed, 15 May 1996 14:56:49 -0700
Date: Wed, 15 May 1996 14:56:49 -0700
From: Mary E Zosel <zosel@coral.llnl.gov>
To: hpff-core@cs.rice.edu
Subject: hpff-core: hpff meeting dates
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-core
Precedence: bulk

To: HPFF core meeting attendees ...

We agreed to July 10-12 ...

It turns out that one person has reported a conflict on
   Sept. 18-20 --- the other date we agreed on.

What about September 25-27 ???

Can you check your schedules?

Please send me a note indicating which of the two dates (or both) work
for you.

   -mary-    zosel@llnl.gov


From owner-hpff-core  Thu May 16 17:29:31 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id RAA08750 for hpff-core-out; Thu, 16 May 1996 17:29:31 -0500 (CDT)
Received: from coral.llnl.gov. (coral.llnl.gov [134.9.1.2]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id RAA08743 for <hpff-core@cs.rice.edu>; Thu, 16 May 1996 17:29:23 -0500 (CDT)
Message-Id: <199605162229.RAA08743@cs.rice.edu>
Received: by coral.llnl.gov
	(1.40.112.4/16.2) id AA129525761; Thu, 16 May 1996 15:29:21 -0700
Date: Thu, 16 May 1996 15:29:21 -0700
From: Mary E Zosel <zosel@coral.llnl.gov>
To: hpff-core@cs.rice.edu
Subject: hpff-core: Original September Dates
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-core
Precedence: bulk

Ok ... I have enough feedback ... we will stick with the original
September date for the HPFF meeting   (18-20).

    -mary-

From owner-hpff-core  Fri May 24 07:58:38 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id HAA04661 for hpff-core-out; Fri, 24 May 1996 07:58:38 -0500 (CDT)
Received: from aloisius.vcpc.univie.ac.at ([193.171.58.11]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id HAA04649; Fri, 24 May 1996 07:57:57 -0500 (CDT)
Received: (from doc@localhost) by aloisius.vcpc.univie.ac.at (8.7.5/8.7.3) id OAA25175; Fri, 24 May 1996 14:45:35 +0200 (MET DST)
Date: Fri, 24 May 1996 14:45:35 +0200 (MET DST)
Message-Id: <199605241245.OAA25175@aloisius.vcpc.univie.ac.at>
To: myh@icase.edu, jvr@icase.edu, m.d.salas@larc.nasa.gov, marco_a@crl.dec.com,
        hpff-core@cs.rice.edu, baden@cs.ucsd.edu, baetke@conmuc.de.convex.com,
        choudhar@cat.syr.edu, jcownie@bbn.com, jkd@cray.com,
        gannon@cs.indiana.edu, Thomas.Gross@cs.cmu.edu, halstead@crl.dec.com,
        pm@icase.edu, billo@vnet.ibm.com, snir@watson.ibm.com,
        joelw@convex.com, chk@cs.rice.edu, jhm@ecs.soton.ac.uk,
        gcf@nova.npac.syr.edu, wylie@cscs.ch, culler@allspice.berkeley.edu,
        vivek_sarkar@vnet.ibm.com, c.jesshope@ee.surrey.ac.uk,
        irigoin@chailly.ensmp.fr, tam@cray.com, karp@hpl.hp.com
From: course@vcpc.univie.ac.at
Subject: hpff-core: ANNOUNCE: Summer of HPF Meeting in Vienna, Austria
X-Safemail-Version: 1.2
Sender: owner-hpff-core
Precedence: bulk

Announcement - Summer of HPF in Vienna

July 1-4, 1996, Vienna, Austria

High Performance Fortran (HPF) is a data parallel language extension to
Fortran90 which provides a portable programming interface for a wide variety
of target platforms. The original HPF language specification was produced by
the High Performance Fortran Forum, a broad consortium of industry and
academia, which met regularly throughout 1992 and early 1993. HPF compilers
are now available on most commonly-used computing systems, and users are
beginning to gain first hand experience with this language.

The HPF Forum began a second round of meetings in January 1995 with the aim of
working on the features which could not be dealt with in the original
specification, as well as support to the vendors implementing HPF. The results
of this round are due for public review in a draft form during the summer of
1996.

To increase public knowledge of HPF a workshop and a tutorial with a hands-on
session will be held in Vienna during the first week of July, (1st-4th July).
The complete agenda of events is described below. This workshop is organised
by the Vienna Centre for Parallel Computing (VCPC) as part of the ESPRIT
project HPC-Standards.

	July 1: 	Tutorial: HPF in practice
	July 2: 	Tutorial: Hands-on session
	July 3/4:	Workshop: HPF for real applications

Participants may register for one or both of the above events using the form
at the end of this document.

The workshop will be held at hotel Austrotel, Felberstr. 4, A-1150 Vienna. 

Tutorial

The tutorial is divided into two parts. 

	Day One:	HPF in Practice, Charles Koelbel, Rice University. 

This tutorial will introduce programmers to the most important features of HPF
and illustrate how they can be used in practice for scientific computation.
Further details can be found at http://www.cs.rice.edu/~chk/hpf-tutorial.html

	Day Two:	HPF Tutorial, NA Software. 

Attendees will have hands on access to the NA Software HPF mapper and tools on
the Meiko CS-2 at VCPC. Please note that there are a limited number of places
for the `hands-on' sessions.

Workshop

In this workshop, we give an overview of the work of the HPF Forum, including
its recent activities. A number of major compiler vendors will present their
views on HPF and give an update on their efforts. There will also be
contributions from several leading software houses who are beginning to port
applications to HPF. Tools for HPF, and the use of HPF together with MPI will
also be considered. The program will give ample time for both formal and
informal discussion.

Speakers for the workshop include the following: 

	Presentations from researchers in HPF and members of the HPF Forum 
	including Chuck Koelbel (Rice University/CRPC), Piyush Mehrotra 
	(ICASE), Barbara Chapman (VCPC), Thomas Brandes (GMD), John Merlin 
	(University of Southampton),  Sigi Benkner (University of Vienna) 
	and others.

	Descriptions of current and future compilers from a number of 
	vendors including Larry Meadows (PGI), Manish Gupta (IBM), 
	Harvey Richardson (TMC), Cliff Addison (NAS) and others.

	Summaries of experiences by `real' users from a range of industrial 
	and research organisations, including Henk Sips (Amsterdam), 
	Scott Baden (UCSD) and Guy Robinson (VCPC). Reports from the 
	PHAROS ESPRIT project of the experiences of CISE, debis, MATRA and 
	SEMCAP with current compilers and the conversion of real applications
	to HPF-1, and the HPF+ ESPRIT project's work in developing extensions 
	to HPF-1 to reflect the complexity of advanced applications from 
	AVL, ESI and ECMWF.

Contact

To register for the above events or to request further information please
contact course@vcpc.univie.ac.at or http://www.vcpc.univie.ac.at or complete
the form attached. 

______________________
 
European Centre for Parallel Computing at Vienna, (VCPC)
Liechtensteinstr. 22, A-1090 Vienna, Austria, Europe
Tel: +43-1-3109396-10, Fax: +43-1-3109396-13, E-mail: info@vcpc.univie.ac.at

==============================================================================


Registration form


European Centre for Parallel Computing at Vienna (VCPC)


Summer of HPF

July 1-4, 1996, Vienna, Austria


Name: ________________________________________________________________________


Affiliation: _________________________________________________________________


Address: _____________________________________________________________________


_________________________________  Telephone: ________________________________


Fax: ____________________________  E-mail: ___________________________________


I wish to attend (please cross): 

o the HPF Tutorial on day one only (July 1)

o the HPF Tutorial on both days (July 1/2)

o the HPF Workshop on July 3/4  


Please send me further information on:

o the HPF Tutorial on day one only (July 1)		

o the HPF Tutorial on both days (July 1/2)

o the HPF Workshop on July 3/4  


o Please send me information on future workshops and tutorials  


o Please send me hotel information 

o Please book a room for me at hotel Austrotel, or a nearby hotel 
    (please indicate preference).

    Type of room:  o Single             o Double 


    Arrival date: ___________________ Departure date: ________________________


Date: ____________________________ Signature: ________________________________


Fees


Payment should be enclosed if you register for the tutorial or the workshop.
Please make cheques payable to VCPC. All payments must be in Austrian
Schillings.  Fees include refreshments and lunch on each day of the events for
which you register. 


Day one only (July 1):                  Until May 20 		After May 20

	Academic, ESPRIT/ACTS projects: 1700 ATS         	2100 ATS
	Industry:                      	2300 ATS         	2800 ATS  


Day one and two (July 1/2):             Until May 20   		After May 20

	Academic, ESPRIT/ACTS projects: 2300 ATS         	2700 ATS
	Industry:                       2850 ATS         	3500 ATS    


Workshop (July 3/4):                   	Until May 20   	After May 20

	Academic, ESPRIT/ACTS projects: 1800 ATS         	2200 ATS
	Industry:                       2400 ATS         	2800 ATS    


I qualify for the	o Academic, project fee     	o Industry fee 


Name of project: _____________________________________________________________


Method of payment:	o Enclosed Cheque 	o American Express

		o Eurocard/Mastercard 	o Visa	

				o Diners Club 


Total amount of payment: _____________________________________________________


Credit Card Number: __________________________ Exp. date: ____________________


Cardholder Name: _____________________________________________________________


Date: ______________________________ Signature: ______________________________



From owner-hpff-core  Fri May 24 10:20:53 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id KAA07444 for hpff-core-out; Fri, 24 May 1996 10:20:53 -0500 (CDT)
Received: from aloisius.vcpc.univie.ac.at ([193.171.58.11]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id KAA07412; Fri, 24 May 1996 10:20:11 -0500 (CDT)
Received: (from doc@localhost) by aloisius.vcpc.univie.ac.at (8.7.5/8.7.3) id RAA29170; Fri, 24 May 1996 17:09:06 +0200 (MET DST)
Date: Fri, 24 May 1996 17:09:06 +0200 (MET DST)
Message-Id: <199605241509.RAA29170@aloisius.vcpc.univie.ac.at>
To: myh@icase.edu, jvr@icase.edu, m.d.salas@larc.nasa.gov, marco_a@crl.dec.com,
        hpff-core@cs.rice.edu, baden@cs.ucsd.edu, baetke@conmuc.de.convex.com,
        choudhar@cat.syr.edu, jcownie@bbn.com, jkd@cray.com,
        gannon@cs.indiana.edu, Thomas.Gross@cs.cmu.edu, halstead@crl.dec.com,
        pm@icase.edu, billo@vnet.ibm.com, snir@watson.ibm.com,
        joelw@convex.com, chk@cs.rice.edu, jhm@ecs.soton.ac.uk,
        gcf@nova.npac.syr.edu, wylie@cscs.ch, culler@allspice.berkeley.edu,
        vivek_sarkar@vnet.ibm.com, c.jesshope@ee.surrey.ac.uk,
        irigoin@chailly.ensmp.fr, tam@cray.com, karp@hpl.hp.com
From: course@vcpc.univie.ac.at
Subject: hpff-core: Summer HPF meeting, Vienna, Austria
X-Safemail-Version: 1.2
Sender: owner-hpff-core
Precedence: bulk

Please find attached the updated programme and application form for the
"Summer of HPF" workshop and tutorial in Vienna, Austria.

Please disregard the previous message, which did not contain the
correct information regarding registration.

=============================================================================

Announcement


Summer of HPF in Vienna

HPF Tutorial, July 1-2, 1996, Vienna, Austria
Workshop on HPF for Real Applications, July 3-4, 1996, Vienna


High Performance Fortran (HPF) is a data parallel language extension to
Fortran90 which provides a portable programming interface for a wide
variety of target platforms. The original HPF language specification
was produced by the High Performance Fortran Forum, a broad consortium
of industry and academia, which met regularly throughout 1992 and early
1993. HPF compilers are now available on most commonly-used computing
systems, and users are beginning to gain first hand experience with
this language. The Forum has continued to meet in order to address
advanced topics.

To increase public knowledge of HPF, a workshop and a tutorial with a
hands-on session will be held in Vienna during the first week of July.
The workshop is organised by the VCPC as part of the ESPRIT project
HPC-Standards.

Participants may register for one or more of the above events using the
form attached. For further information, please contact
course@vcpc.univie.ac.at or http://www.vcpc.univie.ac.at/ or complete
the form attached.

The workshop will be held at the Austrotel hotel, Vienna. A number of
rooms have been reserved there at a special rate for participants.
Please refer to the VCPC on your reservation in order to qualify.

Book rooms directly at: Austrotel, Felberstr. 4, A-1150 Vienna, Austria. 
                   Tel: +43-1-981110, Fax: +43-1-98111930.


Tutorial

The tutorial is divided into two parts. Participants may register for
the first day only, or for both days. It is especially suitable for
those who do not have access to an HPF compiler.

        Day One:        HPF in Practice, Charles Koelbel, Rice University. 

This tutorial will introduce programmers to the most important features
of HPF and illustrate how they can be used in practice for scientific
computation.  Further details can be found at
http://www.cs.rice.edu/~chk/hpf-tutorial.html

        Day Two:        HPF Tutorial, NA Software. 

Attendees will have hands on access to the NA Software HPF mapper and
tools on the Meiko CS-2 at VCPC. Please note that there are a limited
number of places for the `hands-on' sessions.


Workshop

This workshop gives an overview of the achievements of the HPF Forum,
including its recent activities, and provides up-to-date information on
HPF compilers.  Major compiler vendors will describe their efforts and
share their views on HPF.  Contributions from end users include
descriptions of completed and on-going code development efforts.

One of the aims of this event is to enable compiler writers and potential
and actual users of High Performance Fortran to come together to
discuss their problems and needs. Compiler writers need guidance from
users in order to understand how best to improve their products;
application developers need to find out how to write their codes in
ways that help the compiler generate fast object code.  Thus we include
both kinds of presentation and leave time for discussion in the
program.


Exhibit

An exhibit room will be available to enable vendors of HPF compilers
and related tools to display their products and to disseminate
information during the workshop.  There is limited space only. If you
wish to participate, please contact Tony.Curtis@vcpc.univie.ac.at with
a list of your proposed requirements.  Note that we will not be able to
process any requests after June 20.

===============================================================================

Workshop on HPF for Real Applications

Preliminary Program


        Welcome on Tuesday, 2nd July, 19:00 - 21:00


Wednesday, 3rd July

        09:00 - 10:00   Making HPF Work: Past Success and Future Challenges 
                Charles Koelbel, CRPC/Rice University

        10:00 - 11:00   Migrating to HPF 

                Re-engineering Tools for HPF
                Bernard Dion, Simulog

                Programming Tools for HPF: User Requirements
                Fritz Wollenweber, German Military Geophysical Office

                Tools for High Performance Program: A Survey
                Jean-Louis Pazat, IRISA

        11:00 - 11:30   Coffee Break

        11:30 - 13:00   Commercial Compilers I

                Thinking Machines' High Performance Fortran
                Harvey Richardson, Thinking Machines, Inc.

                An Overview of the IBM XLHPF Compiler 
                Manish Gupta, IBM Watson Research Center 

                The PREPARE HPF Compiler
                Martijn de Lange, ACE

        13:00 - 14:30   Lunch 

        14:30 - 15:30   Applications I

                Porting of Ocean Simulation Code to HPF
                Tor Sorevik, Parallab

                HPF Porting Strategy for an Industrial CFD Code 
                Christian Borel, MATRA

        15:30 - 16:00   Coffee Break

        16:00 - 17:00   Applications II

                HPF Port of an Irregular Application
                Philippe Devillers, VCPC

                Experience with Porting Two CFD Applications to HPF 
                Henk Sips, University of Amsterdam 

        17:00 - 18:00   Free time for exhibit/demonstrations 

        19:00   Social Event


Thursday, 4th July

        09:00 - 10:30   Compilers II

                The PGI HPF Compiler
                Larry Meadows, The Portland Group, Inc.

                The HPFPlus Compiler Toolset
                Mike Delves, N. A. Software

                APR's HPF Compiler: Status and Results 
                John Levesque, Applied Parallel Research

        10:30 - 11:00   Coffee Break

        11:00 - 12:00   Benchmarking 

                Experience with HPF Compilers at ICASE
                Piyush Mehrotra, ICASE

                Benchmarking experiences at the VCPC
                Guy Robinson, VCPC

        12:00 - 13:30   Lunch

        13:30 - 15:00   Applications III

                HPF+ Pam-Crash Kernels and Requirements
                Guy Lonsdale, NEC Europe

                Application of HPF to Financial Modelling
                Carlos Falco-Korn, LPAC

        15:00 - 15:30   Coffee Break 

        15:30 - 17:00   Research Compilers

                sHPF: A Subset HPF Compilation System
                John Merlin, University of Southampton

                Optimizing HPF for Advanced Applications 
                Siegfried Benkner, University of Vienna 

                Run Time Support for Structured Adaptive Mesh Methods
                Scott Baden, University of California, San Diego

        17:00 - 18:00   Panel Discussion and Closing Remarks

===============================================================================

Registration form

European Centre for Parallel Computing at Vienna (VCPC)

Summer of HPF
HPF Tutorial, July 1-2, 1996, Vienna, Austria
Workshop on HPF for Real Applications, July 3-4, 1996, Vienna, Austria


Name: _______________________________________________________________________

Affiliation: ________________________________________________________________

Address: ____________________________________________________________________

____________________________________  Telephone: ____________________________

Fax: ________________________________  E-mail: ______________________________


I wish to attend (please cross): 
o the HPF Tutorial on day one only (July 1)
o the HPF Tutorial on both days (July 1/2)
o the HPF Workshop on July 3/4  

Please send me further information on:
o the HPF Tutorial on day one only (July 1)             
o the HPF Tutorial on both days (July 1/2)
o the HPF Workshop on July 3/4  

o Please send me information on future workshops and tutorials  

o Please send me hotel information 


     Arrival date: ___________________   Departure date: ____________________


Date: ________________________________   Signature: _________________________


Fees

Payment should be enclosed if you register for the tutorial or the
workshop. Please make cheques payable to VCPC. All payments must be in
Austrian Schillings.  Fees include refreshments and lunch on each day
of the events for which you register.

Day one only (July 1):                      Until June 20       After June 20
        Academic, ESPRIT/ACTS projects:     1700 ATS            2100 ATS
        Industry:                           2300 ATS            2800 ATS  

Days one and two (July 1/2):                Until June 20       After June 20
        Academic, ESPRIT/ACTS projects:     2300 ATS            2700 ATS
        Industry:                           2850 ATS            3500 ATS    

Workshop (July 3/4):                        Until June 20       After June 20
        Academic, ESPRIT/ACTS projects:     1800 ATS            2200 ATS
        Industry:                           2400 ATS            2800 ATS    


I qualify for the       o Academic, project fee         o Industry fee 

Name of project: ____________________________________________________________


Method of payment:      o Enclosed Cheque       o American Express
                        o Eurocard/Mastercard   o Visa
                        o Diners Club

Total amount of payment: ____________________________________________________

Credit Card Number: _____________________________   Exp. date: ______________

Cardholder Name: ____________________________________________________________

Date: _________________________________   Signature: ________________________


European Centre for Parallel Computing at Vienna (VCPC)
Liechtensteinstr. 22, A-1090 Vienna, Austria
Tel: +43-1-3109396-10, Fax: +43-1-3109396-13, E-mail: info@vcpc.univie.ac.at
WWW: http://www.vcpc.univie.ac.at


---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-core-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-core  Fri May 24 14:21:31 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id OAA14604 for hpff-core-out; Fri, 24 May 1996 14:21:31 -0500 (CDT)
Received: from squid.icase.edu (squid.icase.edu [128.155.142.43]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id OAA14599 for <hpff-core@cs.rice.edu>; Fri, 24 May 1996 14:21:28 -0500 (CDT)
Received: from localhost by squid.icase.edu with SMTP 
	(8.6.11/lanleaf8.6.4) id PAA16066; Fri, 24 May 1996 15:21:21 -0400
Date: Fri, 24 May 1996 15:21:21 -0400 (EDT)
From: Piyush Mehrotra <pm@icase.edu>
To: hpff-core@cs.rice.edu
Subject: hpff-core: JAVA Workshop at ICASE
Message-ID: <Pine.SUN.3.93.960524152055.15391Q-100000@squid.icase.edu>
Phone: (804)864-2188
Fax: (804)864-6134
WWW: http://www.icase.edu/~pm
Address: ICASE MS 132C NASA Langley Research Center Hampton VA 23681
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hpff-core
Precedence: bulk


ICASE is organizing a tutorial and workshop on Java and Web Technologies
at NASA Langley Research Center, Hampton, VA, during the week of June 10-14.
The first four days consist of a tutorial given by Geoffrey Fox.

The last day (Friday) is a workshop on "Java and Web Technologies for
Scientific Computing". The speakers include

    Dennis Gannon, Indiana
    Jim Browne, Univ. of Texas
    Wojtek Furmanski, NPAC
    Mani Chandy, Caltech 
    Micah Beck, Tennessee

More information about the short course and the workshop can be found at
http://www.icase.edu/workshops/java. 

Information about ICASE can be found at http://www.icase.edu


---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-core-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------