From owner-hpff-doc  Tue Nov  5 15:40:15 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id PAA16411 for hpff-doc-out; Tue, 5 Nov 1996 15:40:15 -0600 (CST)
Received: from mail12.digital.com (mail12.digital.com [192.208.46.20]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id PAA16405 for <hpff-doc@cs.rice.edu>; Tue, 5 Nov 1996 15:40:10 -0600 (CST)
Received: from mpsg.hpc.pko.dec.com by mail12.digital.com (8.7.5/UNX 1.5/1.0/WV)
	id QAA29259; Tue, 5 Nov 1996 16:30:08 -0500 (EST)
Received: by mpsg.hpc.pko.dec.com; id AA11791; Tue, 5 Nov 1996 16:33:27 -0500
From: offner@hpc.pko.dec.com (Carl Offner)
Received: by hardy.hpc.pko.dec.com; (5.65v3.2/1.1.8.2/01Nov94-0839AM)
	id AA19779; Tue, 5 Nov 1996 16:30:03 -0500
Date: Tue, 5 Nov 1996 16:30:03 -0500
Message-Id: <9611052130.AA19779@hardy.hpc.pko.dec.com>
To: hpff-doc@cs.rice.edu
Subject: hpff-doc: mapping chapters
Sender: owner-hpff-doc
Precedence: bulk

---------------------------------------------------------------------------
hpff-doc@cs.rice.edu is a mailing list for HPF 2.0 language specification
authors and editors.  Instructions for adding or deleting yourself from this
list appear at the bottom of this message.
---------------------------------------------------------------------------

I am following this with new versions of the three mapping chapters:

	mapping-base.tex
	mapping-subr.tex
	mapping-ext.tex

There are some small and inessential wording changes in mapping-subr.tex,
made for reasons of clarification.  The only real change is this:

The section on pointers was moved from mapping-subr.tex to mapping-ext.tex.
The reason for this is that mapped pointers are not part of HPF 2.0, but
are an approved extension.  In this connection, I added (after consultation
with a few others) two constraints in mapping-base.tex to
the effect that objects with the TARGET attribute cannot be explicitly
mapped.  (This is already true for pointers in that chapter.)  I don't
think this is controversial, but we will bring it up at the December
meeting for a formal vote.

		--Carl Offner
---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-doc-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-doc  Tue Nov  5 15:43:47 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id PAA16537 for hpff-doc-out; Tue, 5 Nov 1996 15:43:47 -0600 (CST)
Date: Tue, 5 Nov 1996 15:43:47 -0600 (CST)
Message-Id: <199611052143.PAA16537@cs.rice.edu>
From: offner@hpc.pko.dec.com (Carl Offner)
Subject: hpff-doc: mapping-subr.tex
Sender: owner-hpff-doc
Precedence: bulk

% File: mapping-subr.tex

% Contents:
% Mapping constructs for dummy arguments for HPF 2.0 document,
% including
%       interface rules
%       INHERIT
%
% If you don't have LaTeX2e available, uncomment the next three lines:
%\def\emph#1{{\em #1}}
%\def\texttt#1{{\tt #1}}
%\def\textit#1{{\it #1}}

% Revision history:
% Nov-05-96     Edit by Carl Offner, Digital.  A few simple
%               clarifications based on suggestions from Rob
%               Schreiber.  I also moved the section on pointers to
%               the mapping-ext.tex chapter, since mapped pointers are
%               now an approved extension, not in HPF 2.0.
%
% Oct-18-96     Edit by Carl Offner, Digital.  Made some
%               clarifications based on suggestions from Carol
%               Munroe.  The only substantive change was a change in
%               the beginning of item 2 in the definition of
%               ``specialization'':  In this case, S cannot now have
%               the INHERIT attribute.  This was probably the
%               understood intent in any case; I don't believe it
%               restricts the definition in any essential way.
%
% Sep-26-96     Edit by Carl Offner, Digital.  Added section defining
%               the partial order on mappings.  Specified that
%               descriptive syntax is the same as prescriptive syntax
%               together with a (weak) assertion by the programmer
%               that no remapping is necessary.
%
% Sep-04-96     Edit by Carl Offner, Digital.  Added forward reference
%               to some approved extensions that allow subroutines to
%               permanently modify data.  Also added new subsection
%               making it clear that explicit mapping directives are
%               characteristics of dummy arguments and function
%               results.

% Aug-13-96     Edit by Carl Offner, Digital.  Some clarifications and
%               corrections of typos, based on suggestions from Rob
%               Schreiber.
%
% Aug-01-96     Edit by Carl Offner, Digital.  Based on
%               decisions taken at the July HPFF meeting,
%               i)  DISTRIBUTE is no longer allowed on INHERITed
%                   arguments.
%               ii) The section describing when an explicit interface
%                   is not needed when arguments are explicitly mapped
%                   has been vastly tightened up.
%               In addition, descriptive syntax has been more clearly
%               identified as being left in purely for backward
%               compatibility; and the TERPSICHORE-FRUG example has
%               been reworked.
%
% Jun-03-96     Edit by Carl Offner, Digital.
%               Rearrangements, many edits, and two new sections (an
%               introduction and the section on when an explicit
%               interface is necessary).
%
% May-10-96     Created by Charles Koelbel, Rice University
%               (from HPF 1.2 document and HPF 2.0 proposals)

\chapter{Data Mapping in Subprogram Interfaces}
\label{ch-mapping-subr}




\emph{In this section, phrases such as ``the caller must pass\dots''
  are constraints on the implementation (i.e., on the generated code
  produced by the compiler), not on the source code produced by the
  programmer.}



\section{Introduction}
\label{mapsub:introduction}

\emph{This introduction gives an overview of the ways in which mapping
directives interact with argument passing to subprograms.  The
language used here, however, is not definitive; the subsequent
subsections of this section contain the authoritative rules.}

In addition to the data mapping features described in
Section~\ref{ch-mapping-base}, HPF allows a number of options for
describing the mapping of dummy arguments.

The mapping of each such dummy argument may be related to the mapping
of its associated actual argument in the calling main program or
procedure (the ``caller'') in several different ways.  To allow for
this, mapping directives applied to dummy arguments can have three
different syntactic forms: \emph{prescriptive}, \emph{descriptive},
and \emph{transcriptive}.

HPF provides these three forms to allow the programmer either to
specify that the data is to be left in place, or to specify that
during the execution of the call the data must be automatically
remapped into a new and presumably more efficient mapping for the
duration of the execution of the called subprogram.

The meaning of these forms is as follows:

\begin{description}

\item[prescriptive] The directive describes the mapping of the dummy
  argument.  However, the actual argument need not have this mapping.
  \emph{If it does not}, it is the responsibility of the compiler to
  generate code to remap the argument as specified, and to restore the
  original mapping on exit.  This code may be generated either in the
  caller or in the called subprogram; the requirements for explicit
  interfaces in section~\ref{mapsub:ExplicitInterfaces} ensure that
  the necessary information will be available at compile time to
  perform the remapping in either place.

  Prescriptive directives are syntactically identical to directives
  occurring elsewhere in the program.  For instance, if \texttt{A} is
  a dummy argument,

                                                                \CODE
!HPF$ DISTRIBUTE A (BLOCK, CYCLIC)
                                                                \EDOC
  is a prescriptive directive.

\item[descriptive] Descriptive syntax has exactly the same meaning as
  prescriptive syntax, except that in addition it amounts to a weak
  assertion by the programmer that the actual argument requires no
  remapping.

  The assertion is characterized as ``weak'' because if it is false,
  the program is still standard-conforming.  In such a case, the
  compiler must generate the appropriate remapping.

  If the compiler can prove that the assertion is false, or if the
  compiler cannot verify that it is true, it may issue a warning or
  informational diagnostic message.

  \begin{users}
    The purpose of descriptive, as opposed to prescriptive, directives
    is simply to provide a possible way for the compiler to report
    information to the programmer that may be useful in program
    development and debugging.

    Note that any diagnostic message that may be produced as a result
    of the use of descriptive directives is not a portable feature of
    this language.  In particular, there are instances in which no
    remapping is needed but where this fact would be impossible or
    highly non-trivial for a compiler to ascertain.  Different
    compilers may well emit messages in different circumstances; and
    there is no requirement that any such messages be emitted at all.
  \end{users}

  Descriptive directives look like prescriptive directives, except
  that an asterisk precedes the description.  For instance,
                                                                \CODE
!HPF$ DISTRIBUTE A *(BLOCK, CYCLIC)
                                                                \EDOC
  is a descriptive directive.

\item[transcriptive] The mapping is unspecified.  The called
  subprogram must accept the mapping of the argument as it is passed.
  (Of course this means that the caller must pass this mapping
  information at run-time.)

  Transcriptive directives are written with a single asterisk for
  distributions and processor arrays; for instance
                                                                \CODE
!HPF$ DISTRIBUTE A *
                                                                \EDOC
  is a transcriptive directive.  The \texttt{INHERIT} directive
  (see section~\ref{INHERIT-SECTION}) is used to specify a transcriptive
  alignment.

\end{description}

Both distribution formats and processor arrangements can be specified
prescriptively, descriptively, or transcriptively.  Alignment is more
complicated, because of the need to specify the template with which
the dummy is aligned.  If no template is specified (the dummy appears
in no \texttt{ALIGN} directive and does not have the \texttt{INHERIT}
attribute), the template is the \emph{natural template} of the dummy.
(``Natural template'' is defined in section~\ref{mapsub:templates}
below.)  Otherwise, one of
the following disjoint possibilities must be true:

\begin{itemize}

\item The template is explicitly specified by a prescriptive
  \texttt{ALIGN} directive.

\item The template is explicitly specified by a descriptive
  \texttt{ALIGN} directive.

\item The template is \emph{inherited}.  This is specified by giving
  the dummy the \texttt{INHERIT} attribute (described in
  section~\ref{INHERIT-SECTION} below).  This implicitly specifies the
  template to be a copy of the template with which the corresponding
  actual argument is ultimately aligned; further, the alignment of the
  dummy with that template is the same as that of the corresponding
  actual.  This is in effect a transcriptive form of alignment.

\end{itemize}

This is restated more precisely in section~\ref{mapsub:templates}
below.

\begin{users}
  Although it is possible to write combinations of mapping directives
  that mix the three forms (partially prescriptive and partially
  transcriptive, for instance), there is probably no virtue in doing
  so.  The point of these directives is to enable the compiler to
  handle any necessary remapping correctly and efficiently.
  Remapping can happen for one or more of the following reasons:

  \begin{itemize}
  \item  to make the alignment of the actual and the dummy agree;
  \item  to make the distribution of the actual and the dummy agree;
  \item  to make the processor array of the actual and the dummy agree.
  \end{itemize}

  For most machines, there is no real difference in the cost of
  remapping for any of these reasons.  It is therefore a better
  practice (for readability, at least) to make a mapping either purely
  transcriptive, purely prescriptive, or purely descriptive.

  While transcriptive mappings can be useful in writing libraries,
  they impose a run-time cost on the subprogram.  They should
  therefore be avoided in normal user code.
\end{users}

\section{What Remapping is Required, and Who Does It}

If there is an explicit interface for the called subprogram and that
interface contains prescriptive or descriptive mapping directives for
a dummy argument, and if a remapping of the corresponding actual
argument is necessary, the call should proceed as if the data were
copied to a temporary variable to match the mapping of the dummy
argument as expressed by the directives in the explicit interface.
The template of the dummy will then be as declared in the interface.

If there is no explicit interface, then no remapping will be
necessary; this is a consequence of the requirements in
section~\ref{mapsub:ExplicitInterfaces}.

An overriding principle is that \emph{any remapping of arguments is
not visible to the caller}.  That is, when the subprogram returns and
the caller resumes execution, all objects accessible to the caller
after the call are mapped exactly as they were before the call.  It is
not possible for a procedure to change the mapping of any object in a
manner visible to its caller.

\begin{users}
Some Approved Extensions relax this restriction; see for instance
sections~\ref{DYNAMIC-DUMMY-SECTION} and \ref{POINTERS-SECTION}.
\end{users}
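
The rules above can be illustrated with a small sketch.  The names
\texttt{DRIVER}, \texttt{SMOOTH}, \texttt{X}, and \texttt{A} are
hypothetical, chosen only for this illustration.  The caller maps
\texttt{X} cyclically, while the explicit interface prescribes a
\texttt{BLOCK} distribution for the dummy \texttt{A}:
                                                                        \CODE
      SUBROUTINE DRIVER
      REAL X(1000)
!HPF$ DISTRIBUTE X (CYCLIC)
      INTERFACE
        SUBROUTINE SMOOTH(A)
        REAL A(1000)
!HPF$   DISTRIBUTE A (BLOCK)       ! prescriptive
        END SUBROUTINE SMOOTH
      END INTERFACE
      CALL SMOOTH(X)               ! proceeds as if X were copied to a
                                   ! BLOCK-distributed temporary
      END SUBROUTINE DRIVER
                                                                        \EDOC
Within \texttt{SMOOTH}, the dummy \texttt{A} has the \texttt{BLOCK}
distribution declared in the interface.  When the call returns,
\texttt{X} is distributed \texttt{CYCLIC} exactly as it was before the
call, so the remapping is not visible to \texttt{DRIVER}.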

\section{Distributions and Processor Arrangements}
\label{mapsub:DistProcArr}

In a \texttt{DISTRIBUTE} directive where every \textit{distributee} is
a dummy argument, either the \textit{dist-format-clause} or the
\textit{dist-target}, or both, may begin with, or consist of, an
asterisk.

\begin{itemize}

\item Without an asterisk, a \textit{dist-format-clause} or
  \textit{dist-target} is prescriptive; the clause describes a
  distribution and constitutes a request of the language processor to
  make it so.  This might entail either the caller or the called
  subprogram remapping or copying the actual argument on entry at run
  time in order to satisfy the requested distribution for the dummy.

\item Starting with an asterisk, a \textit{dist-format-clause} or
  \textit{dist-target} is descriptive.  Such a directive is equivalent
  in every respect to a prescriptive directive, except that if the
  compiler cannot verify that no remapping of the actual is required,
  it may issue a diagnostic message to that effect.  See
  section~\ref{mapsub:introduction} for further information on this
  point.

\item Consisting of only an asterisk, a \textit{dist-format-clause} or
  \textit{dist-target} is transcriptive; the clause says nothing about
  the distribution but constitutes a request of the language processor
  to copy that aspect of the distribution from that of the actual
  argument.  (The intent is that if the argument is passed by
  reference, no movement of the data will be necessary at run time.)
\end{itemize}

It is possible that, in a single \texttt{DISTRIBUTE} directive, the
\textit{dist-format-clause} might have an asterisk but not the
\textit{dist-target}, or vice versa.

\subsection{Examples}

These examples of \texttt{DISTRIBUTE} directives for dummy arguments
illustrate the various combinations:

                                                                        \CODE
!HPF$ DISTRIBUTE URANIA (CYCLIC) ONTO GALILEO
                                                                        \EDOC
The language processor should do whatever it takes to cause
\texttt{URANIA} to have a \texttt{CYCLIC} distribution on the
processor arrangement \texttt{GALILEO}.
                                                                        \CODE
!HPF$ DISTRIBUTE POLYHYMNIA * ONTO ELVIS
                                                                        \EDOC
The language processor should do whatever it takes to cause
\texttt{POLYHYMNIA} to be distributed onto the processor arrangement
\texttt{ELVIS}, using whatever distribution format it currently has
(which might be on some other processor arrangement).
                                                                        \CODE
!HPF$ DISTRIBUTE THALIA *(CYCLIC) ONTO *FLIP
                                                                        \EDOC
The language processor should do whatever it takes to cause
\texttt{THALIA} to have a \texttt{CYCLIC} distribution on the
processor arrangement \texttt{FLIP}; the programmer believes that the
actual is already distributed in this fashion and that no remapping is
required.
                                                                        \CODE
!HPF$ DISTRIBUTE EUTERPE (CYCLIC) ONTO *
                                                                        \EDOC
The language processor should do whatever it takes to cause
\texttt{EUTERPE} to have a \texttt{CYCLIC} distribution onto whatever
processor arrangement the actual was distributed onto.
                                                                        \CODE
!HPF$ DISTRIBUTE ERATO * ONTO *
                                                                        \EDOC
The mapping of \texttt{ERATO} should not be changed from that of the
actual argument.

Also note that \texttt{DISTRIBUTE ERATO * ONTO *} does not mean the
same thing as
                                                                        \CODE
!HPF$ DISTRIBUTE ERATO (*) ONTO *
                                                                        \EDOC
This latter means: distribute \texttt{ERATO} \texttt{*} (that is,
on-processor) onto whatever processor arrangement the actual was
distributed onto.  The processor arrangement is necessarily scalar in
this case.

\subsection{What Happens When a Clause is Omitted}

One may omit either the \textit{dist-format-clause} or the
\textit{dist-onto-clause} for a dummy argument.  This is understood as
follows:

If the dummy argument has the \texttt{INHERIT} attribute (see
section~\ref{INHERIT-SECTION}), then no distribution directive is
allowed in any case:  the distribution as well as the alignment is
inherited from the actual argument.

In any other case in which distribution information is omitted, the
compiler may choose the distribution format or a target processor
arrangement arbitrarily.

Here are two examples:
                                                                         \CODE
!HPF$ DISTRIBUTE WHEEL_OF_FORTUNE *(CYCLIC)
                                                                         \EDOC
The programmer believes that the actual argument corresponding to the
dummy argument \texttt{WHEEL_OF_FORTUNE} is already distributed
\texttt{CYCLIC}.  The compiler should ensure that the mapping of the
passed data is in fact \texttt{CYCLIC}, remapping it if it is not.
It may in addition be remapped onto some other processor
arrangement, but there is no reason to; most likely the programmer
would be surprised if such a remapping occurred.
                                                                          \CODE
!HPF$ DISTRIBUTE ONTO *TV :: DAVID_LETTERMAN
                                                                          \EDOC
The programmer believes that the actual argument corresponding to the
dummy argument \texttt{DAVID_LETTERMAN} is already distributed onto
\texttt{TV} in some fashion.  The compiler should ensure that this is
so, and make it so if it is not.  The distribution format may be
changed as long as \texttt{DAVID_LETTERMAN} is kept on \texttt{TV}.
(Note that this declaration must be made in attributed form; the
statement form
                                                                        \CODE
!HPF$ DISTRIBUTE DAVID_LETTERMAN ONTO *TV         !Nonconforming
                                                                        \EDOC
does not conform to the syntax for a \texttt{DISTRIBUTE} directive.)




\section{Alignment}

\subsection{The Template of the Dummy Argument}
\label{mapsub:templates}

Here we describe precisely how to determine the template with which the dummy
argument is ultimately aligned:

First, templates are not passed through the subprogram argument
interface.  A dummy argument and its corresponding actual argument may
be aligned to the same template only if that template is accessible in
both the caller and the called subprogram either through host
association or use association.

In any other case, the template with which a dummy argument is aligned is
always distinct from the template with which the actual argument is
aligned, though it may be a copy (see section~\ref{INHERIT-SECTION}).
On exit from a procedure, an HPF implementation arranges that the
actual argument is aligned with the same template with which it was
aligned before the call.

The template of the dummy argument is arrived at in one of three ways:

\begin{itemize}
\item If the dummy argument appears explicitly as an \textit{alignee}
  in an \texttt{ALIGN} directive, its template is the
  \textit{align-target} if the \textit{align-target} is a template;
  otherwise its template is the template with which the
  \textit{align-target} is ultimately aligned.

\item If the dummy argument is not explicitly aligned and does not
  have the \texttt{INHERIT} attribute (described in
  section~\ref{INHERIT-SECTION} below), then the template has the same
  shape and bounds as the dummy argument; this is called the
  \textit{natural template} for the dummy.

  (Thus, all the examples in section~\ref{mapsub:DistProcArr} use
  the natural template.)

\item If the dummy argument is not explicitly aligned and does have
  the \texttt{INHERIT} attribute, then the template is ``inherited'' from
  the actual argument according to the following rules:

  \begin{itemize}

  \item If the actual argument is a whole array, the template of the
    dummy is a copy of the template with which the actual argument is
    ultimately aligned.

  \item If the actual argument is an array section of array \(A\)
    where no subscript is a vector subscript, then the template of the
    dummy is a copy of the template with which \(A\) is ultimately
    aligned.

  \item If the actual argument is any other expression, the shape
    and distribution of the template may be chosen arbitrarily by
    the language processor (and therefore the programmer cannot know
    anything \textit{a priori} about its distribution).

  \end{itemize}

  In all of these cases, we say that the dummy has an \textit{inherited
  template}.

\end{itemize}


\subsection{The INHERIT Directive}
\label{INHERIT-SECTION}

The \texttt{INHERIT} directive specifies that a dummy argument should be
aligned to a copy of the template of the corresponding actual argument
in the same way that the actual argument is aligned.

                                                                        \BNF
inherit-directive      \IS  INHERIT inheritee-list

inheritee               \IS object-name
                                                                        \FNB

\begin{constraints}

\item An \textit{inheritee} must be a dummy argument.
\item An \textit{inheritee} must not be an \textit{alignee}.
\item An \textit{inheritee} must not be a \textit{distributee}.

\end{constraints}

\begin{users}
The first of these three constraints is relaxed for pointers under the
approved extensions (see section~\ref{POINTERS-SECTION}).
\end{users}

The \texttt{INHERIT} directive causes the named subprogram dummy
arguments to have the \texttt{INHERIT} attribute.  Only dummy
arguments may have the \texttt{INHERIT} attribute.  An object must not
have both the \texttt{INHERIT} attribute and the \texttt{ALIGN}
attribute.  The \texttt{INHERIT} directive may appear only in a
\textit{specification-part} of a scoping unit.

If a dummy argument has the \texttt{TARGET} attribute and no explicit
mapping attributes, then the \texttt{INHERIT} attribute is implicitly
assumed.

The \texttt{INHERIT} attribute specifies that the template for a dummy
argument should be inherited, by making a copy of the template of the
actual argument.  Moreover, no other explicit mapping directive may
appear for an argument with the \texttt{INHERIT} attribute: the
\texttt{INHERIT} attribute implies a distribution of
\texttt{DISTRIBUTE~*~ONTO~*} for the inherited template.  Thus, the
net effect is to tell the compiler to leave the data exactly where it
is, and not attempt to remap the actual argument.  The dummy argument
will be mapped in exactly the same manner as the actual argument; the
subprogram must be compiled in such a way as to work correctly no
matter how the actual argument may be mapped onto abstract processors.

Note that if \texttt{A} is an array dummy argument, the directive
                                                                        \CODE
!HPF$ INHERIT A
                                                                        \EDOC
is more general than
                                                                        \CODE
!HPF$ DISTRIBUTE A * ONTO *
                                                                        \EDOC
for the following reason: The \texttt{INHERIT} directive states that
the (inherited) template with which \texttt{A} is aligned is
distributed \texttt{* ONTO *}, but that \texttt{A} may be aligned in
some non-trivial manner with that template. On the other hand, the
\texttt{DISTRIBUTE} directive states that \texttt{A} is aligned
trivially with its natural template, which in turn is distributed
\texttt{* ONTO *}.

For example, the following code is not permitted:
                                                                        \CODE
!HPF$ PROCESSORS P(2)
      REAL, DIMENSION(100) :: A
!HPF$ DISTRIBUTE (BLOCK) ONTO P :: A

      CALL FOO(A(1:50))

      ...

      SUBROUTINE FOO(D)
      REAL, DIMENSION(50) :: D
!HPF$ DISTRIBUTE D *            ! Illegal
                                                                        \EDOC
The transcriptive distribution for \texttt{D} is illegal because the
natural template for \texttt{D} is not distributed \texttt{BLOCK}.  On
the other hand, it would be legitimate to replace the illegal
directive by
                                                                        \CODE
!HPF$ INHERIT D
                                                                        \EDOC
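
One way to read this example (a worked illustration, not an additional
rule): with two processors, the \texttt{BLOCK} distribution of the
100-element array \texttt{A} uses a block size of
$\lceil 100/2 \rceil = 50$, so the section \texttt{A(1:50)} resides
entirely on \texttt{P(1)}.  The natural template of \texttt{D} has
only 50 elements; distributing it \texttt{BLOCK} onto \texttt{P} would
use a block size of $\lceil 50/2 \rceil = 25$ and split \texttt{D}
across both processors, which is not how the actual argument is
mapped.  With \texttt{INHERIT}, the template of \texttt{D} is instead
a copy of the 100-element template of \texttt{A}, and
\[
  \texttt{D(I)} \;\longleftrightarrow\; \mbox{template element } I,
  \qquad
  \mbox{block number } \lceil I/50 \rceil = 1
  \quad \mbox{for } I = 1, \ldots, 50 .
\]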


\subsubsection{Examples}

Here is a straightforward example of the use of \texttt{INHERIT}:

                                            \CODE

      REAL DOUGH(100)
!HPF$ DISTRIBUTE DOUGH(BLOCK(10))
      CALL PROBATE( DOUGH(7:23:2) )
      ...
      SUBROUTINE PROBATE(BREAD)
      REAL BREAD(9)
!HPF$ INHERIT BREAD
                                             \EDOC

The inherited template of \texttt{BREAD} has shape [100]; element
\texttt{BREAD(I)} is aligned with element 5 + 2*I of the inherited
template, and that template has a \texttt{BLOCK(10)} distribution.
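
As a worked check of this arithmetic (an illustration only, assuming
that under \texttt{BLOCK(10)} template element $k$ falls in block
number $\lceil k/10 \rceil$): the section \texttt{DOUGH(7:23:2)}
selects elements $7, 9, \ldots, 23$, so \texttt{BREAD(I)} corresponds
to template element $5 + 2I$ for $I = 1, \ldots, 9$, and the block
holding it is $\lceil (5 + 2I)/10 \rceil$:
\begin{center}
\begin{tabular}{l|ccccccccc}
\texttt{I}       & 1 & 2 & 3  & 4  & 5  & 6  & 7  & 8  & 9  \\
\hline
template element & 7 & 9 & 11 & 13 & 15 & 17 & 19 & 21 & 23 \\
block number     & 1 & 1 & 2  & 2  & 2  & 2  & 2  & 3  & 3
\end{tabular}
\end{center}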


More complicated examples can easily be constructed.  It is important
to bear in mind that the inherited template may have a different rank
than the rank of the dummy, and it may even have a different rank than
the rank of the actual.  For instance, one might have a program
containing the following:

                                                \CODE
      REAL A(100,100)
!HPF$ TEMPLATE T(100,100,100)
!HPF$ DISTRIBUTE T(BLOCK,CYCLIC,*)
!HPF$ ALIGN A(I,J) WITH T(J,3,I)
      CALL SUBR(A(:,7))
      ...
      SUBROUTINE SUBR(D)
      REAL D(100)
!HPF$ INHERIT D
                                                 \EDOC

In this case, the dummy \texttt{D} has rank 1.  It corresponds to a
1-dimensional section of a 2-dimensional actual \texttt{A} which in
turn is aligned with a 2-dimensional section of a 3-dimensional
template \texttt{T}.  The template of \texttt{D} is a copy of this
three-dimensional template.  \texttt{D} is aligned with the section
\texttt{(7, 3, :)} of this inherited template.  Thus, the ``visible''
dimension of the dummy \texttt{D} is distributed \texttt{*}, although
if the call statement had been

                                                \CODE
      CALL SUBR(A(7,:))
                                                 \EDOC

\noindent
for instance, the ``visible'' dimension of the dummy would be
distributed \texttt{BLOCK}.
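
The index correspondence behind these two cases can be spelled out
directly from the \texttt{ALIGN} directive above (here $K$ is a
hypothetical name for the subscript of \texttt{D}):
\[
\begin{array}{lll}
\texttt{CALL SUBR(A(:,7))}: &
  \texttt{D(K)} = \texttt{A(K,7)} & \longmapsto\ \texttt{T(7,3,K)} \\
\texttt{CALL SUBR(A(7,:))}: &
  \texttt{D(K)} = \texttt{A(7,K)} & \longmapsto\ \texttt{T(K,3,7)}
\end{array}
\]
In the first case $K$ runs along the third dimension of the inherited
template, which is distributed \texttt{*}; in the second it runs along
the first dimension, which is distributed \texttt{BLOCK}.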


\subsection{ALIGN Directive}

The presence or absence of an asterisk at the start of an
\textit{align-spec} has the same meaning as in a
\textit{dist-format-clause}: it specifies whether the \texttt{ALIGN}
directive is descriptive or prescriptive, respectively.

If an \textit{align-spec} that does not begin with \texttt{*} is
applied to a dummy argument, the meaning is that the dummy argument
will be forced to have the specified alignment on entry to the
subprogram.  This may require either the caller or the subprogram to
temporarily remap the data of the actual argument or a copy thereof.

Note that a dummy argument may also be used as an \textit{align-target}.
                                                                        \CODE
      SUBROUTINE NICHOLAS(TSAR,CZAR)
      REAL, DIMENSION(1918) :: TSAR,CZAR
!HPF$ INHERIT :: TSAR
!HPF$ ALIGN WITH TSAR :: CZAR
                                                                        \EDOC

In this example the first dummy argument, \texttt{TSAR}, remains
aligned with the corresponding actual argument, while the second dummy
argument, \texttt{CZAR}, is forced to be aligned with the first dummy
argument.  If the two actual arguments are already aligned, no
remapping of the data will be required at run time.  If they are not,
some remapping will take place.

If the \textit{align-spec} begins with ``\texttt{*}'', then the
\textit{alignee} must be a dummy argument.  The ``\texttt{*}''
indicates that the programmer believes that the actual argument
already has the specified alignment, and that no action to remap it is
required at run time.  (As before, there is no requirement that the
programmer's belief is correct, and the compiler must generate a
remapping if one appears to be necessary, just as in the case of a
prescriptive alignment.)  For example, if in the above example the
alignment directive were changed to

                                                                        \CODE
!HPF$ ALIGN WITH *TSAR :: CZAR
                                                                        \EDOC
then the programmer is expressing a belief that no remapping of the
actual argument corresponding to \texttt{CZAR} will be necessary.

It is not permitted to say simply ``\texttt{ALIGN WITH *}''; an
\textit{align-target} must follow the asterisk.  (The proper way to
say ``accept any alignment'' is \texttt{INHERIT}.)

If a dummy argument has no explicit \texttt{ALIGN} or
\texttt{DISTRIBUTE} attribute, then the compiler provides an implicit
alignment and distribution specification, one that could have been
described explicitly without any ``assertion asterisks''.

\subsubsection{Example}

Without using \texttt{INHERIT}, explicit alignment of a dummy argument
may be necessary to ensure that no remapping takes place at the
subprogram boundary.  Here is an example:

                                                                       \CODE
      LOGICAL FRUG(128)
!HPF$ PROCESSORS DANCE_FLOOR(16)
!HPF$ DISTRIBUTE (BLOCK) ONTO DANCE_FLOOR::FRUG
      CALL TERPSICHORE(FRUG(1:40:3))
                                                                        \EDOC

The array section \texttt{FRUG(1:40:3)} is mapped onto abstract
processors in the following manner:
\begin{center}
\setlength{\unitlength}{0.01in}
\begin{picture}(560,225)(0,0)
\put(0,200){\makebox(35,25){\small\rm 1}}
\put(35,200){\makebox(35,25){\small\rm 2}}
\put(70,200){\makebox(35,25){\small\rm 3}}
\put(105,200){\makebox(35,25){\small\rm 4}}
\put(140,200){\makebox(35,25){\small\rm 5}}
\put(175,200){\makebox(35,25){\small\rm 6}}
\put(210,200){\makebox(35,25){\small\rm 7}}
\put(245,200){\makebox(35,25){\small\rm 8}}
\put(280,200){\makebox(35,25){\small\rm 9}}
\put(315,200){\makebox(35,25){\small\rm 10}}
\put(350,200){\makebox(35,25){\small\rm 11}}
\put(385,200){\makebox(35,25){\small\rm 12}}
\put(420,200){\makebox(35,25){\small\rm 13}}
\put(455,200){\makebox(35,25){\small\rm 14}}
\put(490,200){\makebox(35,25){\small\rm 15}}
\put(525,200){\makebox(35,25){\small\rm 16}}
\thinlines
\multiput(0,25)(0,25){7}{\line(1,0){560}}
\thicklines
\multiput(0,0)(0,200){2}{\line(1,0){560}}
\multiput(0,0)(35,0){17}{\line(0,1){200}}
\put(0,175){\makebox(35,25){\tt 1}}
\put(0,100){\makebox(35,25){\tt 4}}
\put(0,25){\makebox(35,25){\tt 7}}
\put(35,150){\makebox(35,25){\tt 10}}
\put(35,75){\makebox(35,25){\tt 13}}
\put(35,0){\makebox(35,25){\tt 16}}
\put(70,125){\makebox(35,25){\tt 19}}
\put(70,50){\makebox(35,25){\tt 22}}
\put(105,175){\makebox(35,25){\tt 25}}
\put(105,100){\makebox(35,25){\tt 28}}
\put(105,25){\makebox(35,25){\tt 31}}
\put(140,150){\makebox(35,25){\tt 34}}
\put(140,75){\makebox(35,25){\tt 37}}
\put(140,0){\makebox(35,25){\tt 40}}
\end{picture}
\end{center}

Suppose first that the interface to the subroutine
\texttt{TERPSICHORE} looks like this:

                                                                        \CODE
      SUBROUTINE TERPSICHORE(FOXTROT)
      LOGICAL FOXTROT(:)
!HPF$ INHERIT FOXTROT
                                                                        \EDOC
The template of \texttt{FOXTROT} is a copy of the 128-element
template of the whole array \texttt{FRUG}.  The template is mapped like this:

\begin{center}
\setlength{\unitlength}{0.01in}
\begin{picture}(560,225)(0,0)
\put(0,200){\makebox(35,25){\small\rm 1}}
\put(35,200){\makebox(35,25){\small\rm 2}}
\put(70,200){\makebox(35,25){\small\rm 3}}
\put(105,200){\makebox(35,25){\small\rm 4}}
\put(140,200){\makebox(35,25){\small\rm 5}}
\put(175,200){\makebox(35,25){\small\rm 6}}
\put(210,200){\makebox(35,25){\small\rm 7}}
\put(245,200){\makebox(35,25){\small\rm 8}}
\put(280,200){\makebox(35,25){\small\rm 9}}
\put(315,200){\makebox(35,25){\small\rm 10}}
\put(350,200){\makebox(35,25){\small\rm 11}}
\put(385,200){\makebox(35,25){\small\rm 12}}
\put(420,200){\makebox(35,25){\small\rm 13}}
\put(455,200){\makebox(35,25){\small\rm 14}}
\put(490,200){\makebox(35,25){\small\rm 15}}
\put(525,200){\makebox(35,25){\small\rm 16}}
\thinlines
\multiput(0,25)(0,25){7}{\line(1,0){560}}
\thicklines
\multiput(0,0)(0,200){2}{\line(1,0){560}}
\multiput(0,0)(35,0){17}{\line(0,1){200}}
\put(0,175){\makebox(35,25){\tt 1}}
\put(0,150){\makebox(35,25){\tt 2}}
\put(0,125){\makebox(35,25){\tt 3}}
\put(0,100){\makebox(35,25){\tt 4}}
\put(0,75){\makebox(35,25){\tt 5}}
\put(0,50){\makebox(35,25){\tt 6}}
\put(0,25){\makebox(35,25){\tt 7}}
\put(0,0){\makebox(35,25){\tt 8}}
\put(35,175){\makebox(35,25){\tt 9}}
\put(35,150){\makebox(35,25){\tt 10}}
\put(35,125){\makebox(35,25){\tt 11}}
\put(35,100){\makebox(35,25){\tt 12}}
\put(35,75){\makebox(35,25){\tt 13}}
\put(35,50){\makebox(35,25){\tt 14}}
\put(35,25){\makebox(35,25){\tt 15}}
\put(35,0){\makebox(35,25){\tt 16}}
\put(70,175){\makebox(35,25){\tt 17}}
\put(70,150){\makebox(35,25){\tt 18}}
\put(70,125){\makebox(35,25){\tt 19}}
\put(70,100){\makebox(35,25){\tt 20}}
\put(70,75){\makebox(35,25){\tt 21}}
\put(70,50){\makebox(35,25){\tt 22}}
\put(70,25){\makebox(35,25){\tt 23}}
\put(70,0){\makebox(35,25){\tt 24}}
\put(105,175){\makebox(35,25){\tt 25}}
\put(105,150){\makebox(35,25){\tt 26}}
\put(105,125){\makebox(35,25){\tt 27}}
\put(105,100){\makebox(35,25){\tt 28}}
\put(105,75){\makebox(35,25){\tt 29}}
\put(105,50){\makebox(35,25){\tt 30}}
\put(105,25){\makebox(35,25){\tt 31}}
\put(105,0){\makebox(35,25){\tt 32}}
\put(140,175){\makebox(35,25){\tt 33}}
\put(140,150){\makebox(35,25){\tt 34}}
\put(140,125){\makebox(35,25){\tt 35}}
\put(140,100){\makebox(35,25){\tt 36}}
\put(140,75){\makebox(35,25){\tt 37}}
\put(140,50){\makebox(35,25){\tt 38}}
\put(140,25){\makebox(35,25){\tt 39}}
\put(140,0){\makebox(35,25){\tt 40}}
\put(175,175){\makebox(35,25){\tt 41}}
\put(175,150){\makebox(35,25){\tt 42}}
\put(175,125){\makebox(35,25){\tt 43}}
\put(175,100){\makebox(35,25){\tt 44}}
\put(175,75){\makebox(35,25){\tt 45}}
\put(175,50){\makebox(35,25){\tt 46}}
\put(175,25){\makebox(35,25){\tt 47}}
\put(175,0){\makebox(35,25){\tt 48}}
\put(210,175){\makebox(35,25){\tt 49}}
\put(210,150){\makebox(35,25){\tt 50}}
\put(210,125){\makebox(35,25){\tt 51}}
\put(210,100){\makebox(35,25){\tt 52}}
\put(210,75){\makebox(35,25){\tt 53}}
\put(210,50){\makebox(35,25){\tt 54}}
\put(210,25){\makebox(35,25){\tt 55}}
\put(210,0){\makebox(35,25){\tt 56}}
\put(245,175){\makebox(35,25){\tt 57}}
\put(245,150){\makebox(35,25){\tt 58}}
\put(245,125){\makebox(35,25){\tt 59}}
\put(245,100){\makebox(35,25){\tt 60}}
\put(245,75){\makebox(35,25){\tt 61}}
\put(245,50){\makebox(35,25){\tt 62}}
\put(245,25){\makebox(35,25){\tt 63}}
\put(245,0){\makebox(35,25){\tt 64}}
\put(280,175){\makebox(35,25){\tt 65}}
\put(280,150){\makebox(35,25){\tt 66}}
\put(280,125){\makebox(35,25){\tt 67}}
\put(280,100){\makebox(35,25){\tt 68}}
\put(280,75){\makebox(35,25){\tt 69}}
\put(280,50){\makebox(35,25){\tt 70}}
\put(280,25){\makebox(35,25){\tt 71}}
\put(280,0){\makebox(35,25){\tt 72}}
\put(315,175){\makebox(35,25){\tt 73}}
\put(315,150){\makebox(35,25){\tt 74}}
\put(315,125){\makebox(35,25){\tt 75}}
\put(315,100){\makebox(35,25){\tt 76}}
\put(315,75){\makebox(35,25){\tt 77}}
\put(315,50){\makebox(35,25){\tt 78}}
\put(315,25){\makebox(35,25){\tt 79}}
\put(315,0){\makebox(35,25){\tt 80}}
\put(350,175){\makebox(35,25){\tt 81}}
\put(350,150){\makebox(35,25){\tt 82}}
\put(350,125){\makebox(35,25){\tt 83}}
\put(350,100){\makebox(35,25){\tt 84}}
\put(350,75){\makebox(35,25){\tt 85}}
\put(350,50){\makebox(35,25){\tt 86}}
\put(350,25){\makebox(35,25){\tt 87}}
\put(350,0){\makebox(35,25){\tt 88}}
\put(385,175){\makebox(35,25){\tt 89}}
\put(385,150){\makebox(35,25){\tt 90}}
\put(385,125){\makebox(35,25){\tt 91}}
\put(385,100){\makebox(35,25){\tt 92}}
\put(385,75){\makebox(35,25){\tt 93}}
\put(385,50){\makebox(35,25){\tt 94}}
\put(385,25){\makebox(35,25){\tt 95}}
\put(385,0){\makebox(35,25){\tt 96}}
\put(420,175){\makebox(35,25){\tt 97}}
\put(420,150){\makebox(35,25){\tt 98}}
\put(420,125){\makebox(35,25){\tt 99}}
\put(420,100){\makebox(35,25){\tt 100}}
\put(420,75){\makebox(35,25){\tt 101}}
\put(420,50){\makebox(35,25){\tt 102}}
\put(420,25){\makebox(35,25){\tt 103}}
\put(420,0){\makebox(35,25){\tt 104}}
\put(455,175){\makebox(35,25){\tt 105}}
\put(455,150){\makebox(35,25){\tt 106}}
\put(455,125){\makebox(35,25){\tt 107}}
\put(455,100){\makebox(35,25){\tt 108}}
\put(455,75){\makebox(35,25){\tt 109}}
\put(455,50){\makebox(35,25){\tt 110}}
\put(455,25){\makebox(35,25){\tt 111}}
\put(455,0){\makebox(35,25){\tt 112}}
\put(490,175){\makebox(35,25){\tt 113}}
\put(490,150){\makebox(35,25){\tt 114}}
\put(490,125){\makebox(35,25){\tt 115}}
\put(490,100){\makebox(35,25){\tt 116}}
\put(490,75){\makebox(35,25){\tt 117}}
\put(490,50){\makebox(35,25){\tt 118}}
\put(490,25){\makebox(35,25){\tt 119}}
\put(490,0){\makebox(35,25){\tt 120}}
\put(525,175){\makebox(35,25){\tt 121}}
\put(525,150){\makebox(35,25){\tt 122}}
\put(525,125){\makebox(35,25){\tt 123}}
\put(525,100){\makebox(35,25){\tt 124}}
\put(525,75){\makebox(35,25){\tt 125}}
\put(525,50){\makebox(35,25){\tt 126}}
\put(525,25){\makebox(35,25){\tt 127}}
\put(525,0){\makebox(35,25){\tt 128}}
\end{picture}
\end{center}

\noindent
\texttt{FOXTROT(I)} is aligned with element \texttt{3*I-2} of the template.
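
The correspondence between the two pictures can be checked by a short
calculation (offered here only as an illustration).  With 128 template
elements distributed \texttt{BLOCK} onto 16 abstract processors, each
processor holds \(\lceil 128/16 \rceil = 8\) consecutive template
elements, so template element \(k\) resides on abstract processor
\(\lceil k/8 \rceil\).  Since \texttt{FOXTROT(I)} is aligned with
template element \(3I-2\),
\[
  \mbox{processor holding \texttt{FOXTROT(I)}} \;=\;
  \left\lceil \frac{3I-2}{8} \right\rceil ,
  \qquad 1 \le I \le 14 .
\]
For instance, \texttt{FOXTROT(3)} is aligned with template element 7
on processor 1, \texttt{FOXTROT(4)} with element 10 on processor 2,
and \texttt{FOXTROT(14)} with element 40 on processor 5.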

Suppose on the other hand that the interface to \texttt{TERPSICHORE} looks
like this:

                                                                        \CODE
      SUBROUTINE TERPSICHORE(FOXTROT)
      LOGICAL FOXTROT(:)
!HPF$ DISTRIBUTE FOXTROT(BLOCK)
                                                                        \EDOC


In this case, the template of \texttt{FOXTROT} is its natural
template; it has the same size 14 as \texttt{FOXTROT} itself.  The
actual argument, \texttt{FRUG(1:40:3)}, is mapped to the 16 processors
in this manner:

\begin{center}
\begin{tabular}{cc}
Abstract  &  Elements of \\
processor & \texttt{FRUG(1:40:3)} \\
1 & 1, 2, 3 \\
2 & 4, 5, 6 \\
3 & 7, 8 \\
4 & 9, 10, 11 \\
5 & 12, 13, 14 \\
6--16   &  none
\end{tabular}
\end{center}

That is, the original positions (in the template of the actual
argument) of the elements of the dummy are as follows:

\begin{center}
\setlength{\unitlength}{0.01in}
\begin{picture}(560,225)(0,0)
\put(0,200){\makebox(35,25){\small\rm 1}}
\put(35,200){\makebox(35,25){\small\rm 2}}
\put(70,200){\makebox(35,25){\small\rm 3}}
\put(105,200){\makebox(35,25){\small\rm 4}}
\put(140,200){\makebox(35,25){\small\rm 5}}
\put(175,200){\makebox(35,25){\small\rm 6}}
\put(210,200){\makebox(35,25){\small\rm 7}}
\put(245,200){\makebox(35,25){\small\rm 8}}
\put(280,200){\makebox(35,25){\small\rm 9}}
\put(315,200){\makebox(35,25){\small\rm 10}}
\put(350,200){\makebox(35,25){\small\rm 11}}
\put(385,200){\makebox(35,25){\small\rm 12}}
\put(420,200){\makebox(35,25){\small\rm 13}}
\put(455,200){\makebox(35,25){\small\rm 14}}
\put(490,200){\makebox(35,25){\small\rm 15}}
\put(525,200){\makebox(35,25){\small\rm 16}}
\thinlines
\multiput(0,25)(0,25){7}{\line(1,0){560}}
\thicklines
\multiput(0,0)(0,200){2}{\line(1,0){560}}
\multiput(0,0)(35,0){17}{\line(0,1){200}}
\put(0,175){\makebox(35,25){\tt 1}}
\put(0,100){\makebox(35,25){\tt 2}}
\put(0,25){\makebox(35,25){\tt 3}}
\put(35,150){\makebox(35,25){\tt 4}}
\put(35,75){\makebox(35,25){\tt 5}}
\put(35,0){\makebox(35,25){\tt 6}}
\put(70,125){\makebox(35,25){\tt 7}}
\put(70,50){\makebox(35,25){\tt 8}}
\put(105,175){\makebox(35,25){\tt 9}}
\put(105,100){\makebox(35,25){\tt 10}}
\put(105,25){\makebox(35,25){\tt 11}}
\put(140,150){\makebox(35,25){\tt 12}}
\put(140,75){\makebox(35,25){\tt 13}}
\put(140,0){\makebox(35,25){\tt 14}}
\end{picture}
\end{center}

This layout (3 elements on the first processor, 3 on the second, 2 on
the third, 3 on the fourth, \dots) cannot properly be described as a
\texttt{BLOCK} distribution.  Therefore, remapping will take place at
the call.
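
The reason can be made explicit with a small calculation, given here
as an illustration only.  A distribution \texttt{BLOCK(}\(m\)\texttt{)}
of a one-dimensional array with lower bound 1 places element \(j\) on
abstract processor \(\lceil j/m \rceil\), so every processor that
holds any elements holds exactly \(m\) of them, except possibly the
last such processor.  In the layout above, elements 3 and 4 lie on
processors 1 and 2, which would require
\[
  \lceil 3/m \rceil = 1 \quad\mbox{and}\quad \lceil 4/m \rceil = 2 ,
  \qquad\mbox{that is,}\quad m = 3 .
\]
But with \(m = 3\), element 9 would lie on processor
\(\lceil 9/3 \rceil = 3\), whereas in the layout above it lies on
processor 4.  Hence no block size \(m\) reproduces this layout.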

Remapping can be avoided without using \texttt{INHERIT} by explicitly
aligning the dummy to a declared template of size 128 distributed
\texttt{BLOCK}:

                                                                        \CODE
      SUBROUTINE TERPSICHORE(FOXTROT)
      LOGICAL FOXTROT(:)
!HPF$ PROCESSORS DANCE_FLOOR(16)
!HPF$ TEMPLATE, DISTRIBUTE(BLOCK) ONTO DANCE_FLOOR::GURF(128)
!HPF$ ALIGN FOXTROT(J) WITH GURF(3*J-2)
                                                                        \EDOC

\begin{users}
  The advantage of this technique is that, where it can be used, it
  gives the compiler more information; this information can often be
  used to generate more efficient code.
\end{users}


\section{Equivalence and Partial Order on the Set of Mappings}
\label{mapsub:PartialOrderOnMaps}

The set of mappings of named objects is endowed with a partial order
modulo a certain equivalence.  Roughly speaking, if \(P\) and \(Q\)
are two mappings, then to say that \(Q\) is a \emph{specialization} of
\(P\) (i.e., ``\(P\) is above \(Q\)'' in this ordering) is to say that
\(P\) is partially specified, and that \(Q\) is one of the mappings
that is consistent with \(P\).  This notion is used below in
section~\ref{mapsub:ExplicitInterfaces}, and also in
section~\ref{POINTERS-SECTION}.

\begin{users}
  Since these conditions are complex to state, it is worth noting that
  if you always provide explicit interfaces (which, as explained
  below, is quite easy and generally happens automatically), and if
  you don't use mapped pointers (an Approved Extension, explained
  below in section~\ref{POINTERS-SECTION}), then you can completely
  omit reading this section.
\end{users}

The precise definition is as follows.

First, we define a notion of equivalence for \textit{dist-format}
specifications:

\begin{enumerate}
\item Each \textit{dist-format} is equivalent to itself.

\item  Using the notation \(\equiv\) for the phrase ``is equivalent to'',

  \[\begin{array}{rclcl}
    \texttt{BLOCK}(n)  &\equiv& \texttt{BLOCK}(m) & \mbox{iff } m
    \mbox{ and } n \mbox{ have the same value} \\
    \texttt{CYCLIC}(n) &\equiv& \texttt{CYCLIC}(m) & \mbox{iff } m
    \mbox{ and } n \mbox{ have the same value} \\
    \texttt{CYCLIC}    &\equiv& \texttt{CYCLIC}(1) \\
  \end{array}\]

\item Other than this, no two lexically distinct \textit{dist-format}
  specifications are equivalent.
\end{enumerate}

Equivalence, thus defined, is an equivalence relation in the usual sense.
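
A few instances, which are illustrations and not part of the
definition, may help.  They follow from the three rules above under
the assumption that the expressions shown are evaluated in the usual
way:
\[\begin{array}{rcll}
  \texttt{BLOCK(4)}  &\equiv&     \texttt{BLOCK(2*2)} &
    \mbox{(both arguments have the value 4)} \\
  \texttt{CYCLIC}    &\equiv&     \texttt{CYCLIC(1)} \\
  \texttt{CYCLIC(2)} &\not\equiv& \texttt{CYCLIC} &
    \mbox{(\texttt{CYCLIC} is equivalent only to \texttt{CYCLIC(1)})} \\
  \texttt{BLOCK}     &\not\equiv& \texttt{BLOCK(8)} &
    \mbox{(lexically distinct; not covered by rule 2)} \\
\end{array}\]
Note in particular the last line: \texttt{BLOCK} without an argument
is not equivalent to any \texttt{BLOCK(}\(n\)\texttt{)}, even when the
resulting block size happens to be \(n\).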

Now the partial order on mappings is defined: Let \texttt{S} (``special'')
and \texttt{G} (``general'') be two data objects.

The mapping of \texttt{S} is a \emph{specialization} of the mapping of
\texttt{G} if and only if either:

\begin{enumerate}

\item \texttt{G} has the \texttt{INHERIT} attribute, or

\item \texttt{S} does not have the \texttt{INHERIT} attribute, and the
  following constraints all hold:

  \begin{enumerate}
  \item \texttt{S} is a named object, and
  \item The shapes of the ultimate align targets of \texttt{S} and \texttt{G}
    are the same, and
  \item The dimensions of \texttt{S} and \texttt{G} each correspond to the same
    dimension of their respective ultimate align targets, and
    corresponding elements of \texttt{S} and \texttt{G} are aligned with the
    same corresponding elements of their respective ultimate align
    targets, and
  \item Either

    \begin{enumerate}
    \item The ultimate align targets of \texttt{S} and \texttt{G} are not
      explicitly distributed, or
    \item Both ultimate align targets of \texttt{S} and \texttt{G} are
      explicitly distributed.  In this case, the distribution
      directive specified for the ultimate align target of \texttt{G} must
      satisfy one of the following conditions:

      \begin{enumerate}
      \item It must have no \textit{dist-onto-clause}, or
      \item It must have a \textit{dist-onto-clause} of
        ``\texttt{ONTO *}'', or
      \item It must have a \textit{dist-onto-clause} specifying a
        processor arrangement having the same shape as that
        explicitly specified in a distribution directive for the
        ultimate align target of \texttt{S}.
      \end{enumerate}

      and the distribution directive for the ultimate align target of
      \texttt{G} must also satisfy one of the following conditions:

      \begin{enumerate}
      \item It must have no \textit{dist-format-clause}, or
      \item It must have a \textit{dist-format-clause} of ``\texttt{*}'', or
      \item Each \textit{dist-format} must be equivalent (in the sense
        defined above) to the \textit{dist-format} in the
        corresponding position of the \textit{dist-format-clause} in
        an explicit distribution directive for the ultimate align
        target of \texttt{S}.
      \end{enumerate}

    \end{enumerate}

  \end{enumerate}

\end{enumerate}

With this definition,

\begin{itemize}
\item Any mapping of a named object is a specialization of itself.
\item If \(A\), \(B\), and \(C\) are named objects, and if the mapping
  of \(A\) is a specialization of the mapping of \(B\) and the mapping
  of \(B\) is a specialization of the mapping of \(C\), then the
  mapping of \(A\) is a specialization of the mapping of \(C\).
\end{itemize}

That is, the specialization relation, as applied to mappings of named
objects, is reflexive and transitive, and it can therefore be applied
to produce an equivalence relation on the set of mappings of named
objects: two such mappings can be said to be equivalent iff each is a
specialization of the other.  With this definition, the specialization
relation yields a partial ordering on the set of mappings of named
objects, modulo equivalence.  The \texttt{INHERIT} mapping is the
unique maximal element in this partial order.
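
As an illustration of the definition of specialization, consider the
following sketch.  The names \texttt{S}, \texttt{G}, and \texttt{P}
are chosen only for this example, and the two arrays are declared in a
single scoping unit purely for brevity; in practice they would
typically be an actual argument and a dummy argument.

                                                                        \CODE
      REAL S(100), G(100)
!HPF$ PROCESSORS P(4)
!HPF$ DISTRIBUTE S(CYCLIC(1)) ONTO P
!HPF$ DISTRIBUTE G(CYCLIC)
                                                                        \EDOC

On our reading of the definition, the mapping of \texttt{S} is a
specialization of the mapping of \texttt{G}: neither object has the
\texttt{INHERIT} attribute, each is its own ultimate align target, the
two targets have the same shape, both are explicitly distributed, the
directive for \texttt{G} has no \textit{dist-onto-clause}, and
\texttt{CYCLIC} is equivalent to \texttt{CYCLIC(1)}.  The converse
does not appear to hold: with the roles exchanged, the directive for
the more general object would be the one containing \texttt{ONTO P},
while the other directive specifies no processor arrangement
explicitly, so none of the three conditions on the
\textit{dist-onto-clause} is satisfied.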


\section{Explicit Interfaces}
\label{mapsub:ExplicitInterfaces}

Under certain conditions, an explicit interface for a subprogram is
not required.  The conditions in Fortran under which this is allowable
are tightened considerably for HPF programs that use mapping
directives.

\begin{users}
  These conditions are complex.  The important thing to realize is
  that you don't have to read any of this if you have an explicit
  interface.  So if there is any doubt in your mind, just make sure
  you have an explicit interface.
\end{users}

An explicit interface is required \emph{except} when all of the
following conditions hold:

\begin{enumerate}

\item  Fortran does not require one, \emph{and}

\item No dummy argument is distributed transcriptively or has the
  \texttt{INHERIT} attribute, \emph{and}

\item For each pair of corresponding actual and dummy arguments, either:

  \begin{enumerate}

  \item They are both implicitly mapped, or
    
  \item They are both explicitly mapped and the mapping of the actual
    argument is a specialization of the mapping of the dummy argument,

  \end{enumerate}

  \emph{and}

\item For each pair of corresponding actual and dummy arguments, either:

  \begin{enumerate}

  \item Both are sequential, or

  \item Both are nonsequential.

  \end{enumerate}

\end{enumerate}

\begin{rationale}
  This has the following consequences:

  \begin{itemize}

  \item A plain Fortran program (i.e., with no HPF directives) will
    continue to be legal without the need to add additional
    interfaces, at least in a compilation environment in which all
    variables are sequential by default.  This is ensured by items 1,
    2, 3a, and 4a.

  \item If remapping is necessary, this fact will be visible to the
    caller.  Thus the implementation may choose to have all remapping
    performed by the caller.

  \end{itemize}
\end{rationale}

\begin{users}
  This requirement pushes the user strongly in the direction of always
  providing explicit interfaces.  This is a good thing---explicit
  interfaces allow many errors to be caught at compile-time and greatly
  speed up the process of robust software development.

  Note that an explicit interface can be provided in three ways:

  \begin{enumerate}

  \item A module subprogram has an explicit interface.

  \item An internal subprogram has an explicit interface.

  \item An explicit interface may be provided by an interface block.

  \end{enumerate}

  In addition, an intrinsic procedure always has an explicit interface
  by definition.

  The idiomatic Fortran way of programming makes extensive use of
  modules; every subprogram, for instance, can be in a module.  This
  provides explicit interfaces automatically, with no extra effort on
  the part of the programmer.  It should very seldom be necessary to
  write an interface block.
\end{users}
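
The following sketch shows how placing a subprogram in a module yields
an explicit interface with no interface block.  It reuses the
\texttt{TERPSICHORE} example from earlier in this chapter; the module
name \texttt{DANCES} and the program name \texttt{BALLROOM} are
hypothetical, and the executable statements are placeholders.

                                                                        \CODE
      MODULE DANCES
      CONTAINS
      SUBROUTINE TERPSICHORE(FOXTROT)
      LOGICAL FOXTROT(:)
!HPF$ INHERIT FOXTROT
      FOXTROT = .TRUE.
      END SUBROUTINE TERPSICHORE
      END MODULE DANCES

      PROGRAM BALLROOM
      USE DANCES
      LOGICAL FRUG(128)
!HPF$ PROCESSORS DANCE_FLOOR(16)
!HPF$ DISTRIBUTE (BLOCK) ONTO DANCE_FLOOR::FRUG
      FRUG = .FALSE.
      CALL TERPSICHORE(FRUG(1:40:3))
      END PROGRAM BALLROOM
                                                                        \EDOC

Because \texttt{TERPSICHORE} is accessed by use association, its
interface, including the \texttt{INHERIT} directive, is visible at the
call, so the conditions listed above need not be examined.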

\subsection{Characteristics of Procedures}
\label{mapsub:ProcChars}

The characteristics of dummy data objects and function results as
given in section 12.2 of the Fortran standard are extended to
also include the \emph{hpf-characteristics} of such objects, which are defined
recursively as follows:

\begin{itemize}
\item A processor arrangement has one hpf-characteristic: its shape.

\item A template has up to three hpf-characteristics:

\begin{enumerate}
\item its shape;
\item its distribution, if explicitly stated;
\item the hpf-characteristic (i.e., the shape) of the processor
  arrangement onto which it is distributed, if explicitly stated.
\end{enumerate}

\item A dummy data object has the following hpf-characteristics:

\begin{enumerate}
\item its alignment, if explicitly stated, as well as all
  hpf-characteristics of its align target;
\item its distribution, if explicitly stated, as well as the
  hpf-characteristic (i.e., the shape) of the processor arrangement
  onto which it is distributed, if explicitly stated.
\end{enumerate}

\item A function result has the same hpf-characteristics as a dummy
  data object.  Specifically, it has the following
  hpf-characteristics:

\begin{enumerate}
\item its alignment, if explicitly stated, as well as all
  hpf-characteristics of its align target;
\item its distribution, if explicitly stated, as well as the
  hpf-characteristic (i.e., the shape) of the processor arrangement
  onto which it is distributed, if explicitly stated.
\end{enumerate}

\end{itemize}

\begin{rationale}
  In case an explicit interface is given by an interface block, the
  Fortran standard specifies what information must be specified in
  that interface block; it does this using the concept of a Fortran
  \emph{characteristic}.  Characteristics of dummy data objects, for
  instance, include their types.  Characteristics must be specified in
  interface blocks; section 12.3.2.1 of the Fortran standard states

  \begin{quote}
    An interface body specifies all of the procedure's characteristics and
    these shall be consistent with those specified in the procedure
    definition\dots
  \end{quote}

  Normally, an interface block for a procedure is a textual copy of the
  appropriate declarations of that procedure.  This section simply says
  that such a textual copy must include any explicit mapping directives
  relevant to dummy arguments of the procedure.
\end{rationale}
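
For example, an interface block for the last version of
\texttt{TERPSICHORE} shown earlier in this chapter might look as
follows.  This is a sketch only; it assumes that the directives in the
procedure definition are exactly those repeated here.

                                                                        \CODE
      INTERFACE
      SUBROUTINE TERPSICHORE(FOXTROT)
      LOGICAL FOXTROT(:)
!HPF$ PROCESSORS DANCE_FLOOR(16)
!HPF$ TEMPLATE, DISTRIBUTE(BLOCK) ONTO DANCE_FLOOR::GURF(128)
!HPF$ ALIGN FOXTROT(J) WITH GURF(3*J-2)
      END SUBROUTINE TERPSICHORE
      END INTERFACE
                                                                        \EDOC

The alignment of \texttt{FOXTROT} is an hpf-characteristic of the
dummy argument, as are the shape and distribution of its align target
\texttt{GURF} and the shape of \texttt{DANCE\_FLOOR}; all three
directives therefore belong in the interface body.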

\section{Argument Passing and Sequence Association}

For actual arguments in a procedure call, Fortran allows an array
element (scalar) to be associated with a dummy argument that is an
array.  It furthermore allows the shape of a dummy argument to differ
from the shape of the corresponding actual array argument, in effect
reshaping the actual argument via the procedure call.  Storage
sequence properties of Fortran are used to identify the values of the
dummy argument.  This feature, carried over from FORTRAN 77, has been
widely used to pass starting addresses of subarrays, rows or columns
of a larger array, to procedures.  For HPF arrays that are potentially
mapped across processors, this feature is not fully supported.


\subsection{Sequence Association Rules}

\begin{enumerate}

\item When an array element or the name of an assumed-size array is
  used as an actual argument, the associated dummy argument must be a
  scalar or specified to be a sequential array.

  An array-element designator of a nonsequential array must not be
  associated with a dummy array argument.

\item When an actual argument is an array or array section and the
  corresponding dummy argument differs from the actual argument in
  shape, then the dummy argument must be declared sequential and the
  actual array argument must be sequential.

\item An object of type character (scalar or array) is nonsequential
  if it conforms to the requirements of Definition~\ref{seq-var} of
  section~\ref{sequence-defs}.  If the length of an explicit-length
  character dummy argument differs from the length of the actual
  argument, then both the actual and dummy arguments must be
  sequential.


\item Without an explicit interface, a sequential actual may not be
  associated with a nonsequential dummy and a nonsequential actual may
  not be associated with a sequential dummy.  (This item merely
  repeats part of section~\ref{mapsub:ExplicitInterfaces}.)


\end{enumerate}


\subsection{Discussion of Sequence Association}

When the shape of the dummy array argument and its associated actual
array argument differ, the actual argument must not be an expression.
There is no HPF mechanism for declaring that the value of an
array-valued expression is sequential.  In order to associate such an
expression as an actual argument with a dummy argument of different
rank, the actual argument must first be assigned to a named array
variable that is forced to be sequential according to
Definition~\ref{seq-var} of section~\ref{sequence-defs}.
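
A minimal sketch of this workaround follows.  The subroutine
\texttt{FLATTEN} and the variable names are hypothetical, and the
sketch assumes that the \texttt{SEQUENCE} directive may be written
with a name list in the form shown.

                                                                \CODE
      REAL A(20,10), B(20,10), TEMP(20,10)
!HPF$ SEQUENCE :: TEMP
      TEMP = A + B
      CALL FLATTEN(TEMP)

      SUBROUTINE FLATTEN(V)
      REAL V(200)
!HPF$ SEQUENCE :: V
                                                                \EDOC

The expression \texttt{A + B} cannot itself be passed to
\texttt{FLATTEN}, because the dummy argument has a different shape;
assigning it first to the sequential variable \texttt{TEMP} makes both
the actual and the dummy argument sequential.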

\subsection{Examples of Sequence Association}

Given the following subroutine fragment:
                                                                \CODE
      SUBROUTINE HOME (X)
      DIMENSION X (20,10)
                                                                \EDOC
By rule 1
                                                                \CODE
      CALL HOME (ET (2,1))
                                                                \EDOC
is legal only if \texttt{X} is declared sequential in \texttt{HOME}
and \texttt{ET} is sequential in the calling procedure.

Likewise, by rules 2 and 4
                                                                \CODE
      CALL HOME (ET)
                                                                \EDOC
requires either that \texttt{ET} and \texttt{X} are both sequential
arrays or that \texttt{ET} and \texttt{X} have the same shape and
(in the absence of an explicit interface) have the same sequence
attribute.


Rule 3 addresses a special consideration for objects of type
character.  Changing the length of character objects across
a call, as in

                                                                \CODE
      CHARACTER (LEN=44) one_long_word
      one_long_word = 'Chargoggagoggmanchaugagoggchaubunagungamaugg'
      CALL webster(one_long_word)

      SUBROUTINE webster(short_dictionary)
      CHARACTER (LEN=4) short_dictionary (11)
          !Note that short_dictionary(3) is 'agog', for example
                                                                \EDOC

\noindent
is conceptually legal in Fortran.  In HPF, it is legal only if both
the actual argument and the dummy argument are sequential.
(Chargoggagoggmanchaugagoggchaubunagungamaugg is the original Nipmuc
name for what is now called Lake Webster in Massachusetts.)

---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-doc-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-doc  Tue Nov  5 15:44:18 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id PAA16559 for hpff-doc-out; Tue, 5 Nov 1996 15:44:18 -0600 (CST)
Date: Tue, 5 Nov 1996 15:44:18 -0600 (CST)
Message-Id: <199611052144.PAA16559@cs.rice.edu>
From: offner@hpc.pko.dec.com (Carl Offner)
Subject: hpff-doc: mapping-ext.tex
Sender: owner-hpff-doc
Precedence: bulk

% File: mapping-ext.tex

% Contents:
% Approved Extensions for data mapping for HPF 2.0 document, including
%        GenBlock
%        indirect
%        range
%        shadow
%        subsets
%        derived types
%        more on pointers.

% Revision history:
% May-10-96        Created by Charles Koelbel, Rice University
%                (From HPF 2.0 proposals)
% May-22-96        Modified by Piyush Mehrotra to include all of
%               the above topics
% Aug-8-96         Modified by Piyush Mehrotra:
%               modified subsection on remapping across procedure boundaries
%               modified the section on pointers based on proposals passed
%                       in the July meeting
%               added clarifications and examples to the range section
% Oct-18-96        Modified by Carl Offner:
%               added a few words to the beginning of item 2 in the
%               definition of ``specialization'' in conformance with a
%               similar change in the mapping-subr.tex section.
% Nov-05-96        Modified by Carl Offner:
%               Moved the section on pointers from mapping-subr.tex
%               to here.  Noted that now targets can be mapped.


\chapter{Approved Extensions for Data Mapping}
\label{ch-mapping-ext}



This section describes a set of data mapping features that extend
the capabilities provided by the base set described in
Section~\ref{ch-mapping-base}.  These extensions can be
divided into two categories.  The first set of extensions
gives the user greater control over the mapping of data.
It includes directives for dynamic remapping, which
allow the user to redistribute and realign, at run time, data that
has been declared {\tt DYNAMIC}.
The {\tt ONTO} clause used in the {\tt DISTRIBUTE} directive is extended
to allow direct distribution to subsets of processors.
Explicit mapping of pointers and of components of derived
types is also introduced.
Two new distributions are included: the {\tt GEN\_BLOCK}
distribution, which generalizes the block distribution, and
the {\tt INDIRECT} distribution, which allows the mapping of individual
array elements to be specified through a mapping array.

The programmer can use the second set of extensions to provide the
compiler with information useful for generating efficient
code. This category includes the \texttt{RANGE} directive, which
allows the user to specify the range of distributions that
a dynamically distributed array, a pointer, or a formal
argument may have. The \texttt{SHADOW} directive allows the user
to specify the amount of additional space required on a
processor to accommodate non-local elements in a
nearest-neighbor computation.

Since this section deals with extensions, we repeat some of the
material of Sections~\ref{ch-mapping-base}
and~\ref{ch-mapping-subr}, providing new rules and extending old ones
where necessary.
In particular, subsections~\ref{ext:PartialOrderOnMaps},
\ref{ext:ExplicitInterfaces} and
\ref{ext:ProcChars} extend the corresponding subsections
in Section~\ref{ch-mapping-base} based on the approved
extensions described here.

%When extending a rule we use \textit{extended-xx} as the new rule name
%with the understanding that it will replace the old rule name,
%\textit{xx}, everywhere.



\section{Extended Model}

The fundamental model for allocation of data to abstract
processors still remains a two-level mapping as described in
Section~\ref{ch-mapping-base}. However, it is extended to
allow the dynamic remapping of the data objects as
illustrated by the following diagram:

\begin{center}

\setlength{\unitlength}{0.01in}
\begin{picture}(600,240)(0,30)
\thicklines
\put(100,150){\circle{50}}
\put(242,150){\circle{50}}
\put(383,150){\circle{50}}
\put(525,150){\circle{50}}
\put(125,150){\vector(1,0){92}}
\put(267,150){\vector(1,0){91}}
\put(408,150){\vector(1,0){92}}
\put(56,190){\shortstack{Arrays or\strut\\other
objects\strut}}
\put(192,190){\shortstack{Group of\strut\\aligned
objects\strut}}
\put(328,190){\shortstack{Abstract\strut\\processors as
a\strut
                \\user-declared\strut\\Cartesian
mesh\strut}}
\put(494,190){\shortstack{Physical\strut\\processors\strut}}
\put(135,50){\shortstack{{\tt ALIGN}\strut\\(static) or\strut
                \\{\tt REALIGN}\strut\\(dynamic)\strut}}
\put(265,50){\shortstack{{\tt DISTRIBUTE}\strut\\(static) or\strut
                \\{\tt REDISTRIBUTE}\strut\\(dynamic)\strut}}
\put(400,50){\shortstack{Optional\strut\\implementation-\strut
                \\dependent\strut\\directive\strut}}
\end{picture}

\end{center}

Thus, objects can now be remapped at execution time using
the executable directives {\tt REALIGN} and {\tt REDISTRIBUTE}.
Any object that is the root of
an alignment tree, i.e., is not explicitly aligned to
another object, can be explicitly redistributed.
Redistributing such an object causes all objects ultimately
aligned with it to be also redistributed so as to maintain the
alignment relationships.

Any object that is not a root
of an alignment tree can be explicitly realigned but not explicitly
redistributed. Such a realignment does not change the
mapping of any other object.
Note that such remapping of data may require communication
among the processors.


By analogy with the Fortran 90 {\tt ALLOCATABLE} attribute, HPF includes
the attribute {\tt DYNAMIC}.  It is not permitted
to {\tt REALIGN} an array that has not been declared {\tt DYNAMIC}.
Similarly, it is not permitted to {\tt REDISTRIBUTE} an array or template
that has not been declared {\tt DYNAMIC}.
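
The remapping rules above can be sketched as follows.  This is only an
illustrative fragment, not normative text; the names {\tt T}, {\tt A},
and {\tt B} are hypothetical and are chosen purely for the example.
                                                                        \CODE
      REAL A(100), B(100)
!HPF$ TEMPLATE T(100)                 ! hypothetical names, for illustration
!HPF$ DYNAMIC T, A, B
!HPF$ ALIGN A(I) WITH T(I)
!HPF$ ALIGN B(I) WITH T(I)
!HPF$ DISTRIBUTE T(BLOCK)
      ...
!HPF$ REDISTRIBUTE T(CYCLIC)          ! T is a root: A and B are remapped too
!HPF$ REALIGN B(I) WITH A(101-I)      ! B is not a root: only B is remapped
                                                                        \EDOC
Here {\tt T} is the root of an alignment tree, so it may be
redistributed, and {\tt A} and {\tt B} follow it.  {\tt B} is not a
root, so it may be realigned but could not appear as a {\it
distributee} in a {\tt REDISTRIBUTE} directive.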

Saved local variables, variables in common, and variables accessed by
use association must not be implicitly remapped, e.g., by having
variable distribution formats or being aligned with entities having
variable distribution formats.  Of these three categories of
variables, only variables accessed by use association may have the
{\tt DYNAMIC} attribute.

\section{Syntax of Attributed Forms of the New Directives}


As in the case of other mapping directives, the executable directives
{\tt REALIGN} and {\tt REDISTRIBUTE} also come in two forms (statement
form and attribute form) but may not be combined with other attributes
in a single directive.  The \texttt{RANGE} and \texttt{SHADOW}
attributes may be combined with other attributes in a single
directive.

                                                                        \BNF
combined-attribute-extended \IS  ALIGN align-attribute-stuff
                           \OR  DISTRIBUTE dist-attribute-stuff
                           \OR  DYNAMIC
                           \OR  INHERIT
                           \OR  TEMPLATE
                           \OR  PROCESSORS
                           \OR  DIMENSION ( explicit-shape-spec-list )
                           \OR  RANGE range-attr-stuff-list
                           \OR  SHADOW shadow-attr-stuff
                           \OR  ACTIVE

\FNB
\begin{constraints}
\item The \texttt{ACTIVE} attribute may only be applied to a
processors arrangement.
\end{constraints}

The \texttt{ACTIVE} directive is discussed in Section~\ref{ch-parallel-ext}
while the rest are discussed below.



\section{REDISTRIBUTE Directive}
\label{REDISTRIBUTE-SECTION}

The {\tt REDISTRIBUTE} directive is similar to the {\tt DISTRIBUTE}
directive but is considered executable.  An array (or template) may be
redistributed at any time, provided it has been declared {\tt DYNAMIC}
(see Section \ref{DYNAMIC-SECTION}).  Any other arrays currently
ultimately aligned with an array (or template) when it is redistributed
are also remapped to reflect the new distribution, in such a way as to
preserve alignment relationships (see Section \ref{ALIGN-SECTION}).
(This can require a lot of computational and communication effort at
run time; the programmer must take care when using this feature.)

The {\tt DISTRIBUTE} directive may appear only in the {\it specification-part}
of a scoping unit.  The {\tt REDISTRIBUTE} directive may appear
only in the {\it execution-part} of a scoping unit.  The principal
difference between {\tt DISTRIBUTE} and {\tt REDISTRIBUTE} is
that {\tt DISTRIBUTE} must contain only a {\it specification-expr} as the
argument to a
distribution format, e.g.,
{\tt BLOCK} or {\tt CYCLIC}, whereas in {\tt REDISTRIBUTE}
such an argument may be any integer expression.  Another difference is that
{\tt DISTRIBUTE} is an attribute, and so can be combined with other attributes
as part of a {\it combined-directive}, whereas {\tt REDISTRIBUTE} is not an
attribute
(although a {\tt REDISTRIBUTE} statement may be written in the
style of attributed syntax, using ``{\tt ::}'' punctuation).

Formally, the syntax of the {\tt REDISTRIBUTE} directive is:

                                                                        \BNF
redistribute-directive     \IS  REDISTRIBUTE distributee dist-directive-stuff
                           \OR  REDISTRIBUTE dist-attribute-stuff :: distributee-list
                                                                        \FNB
\begin{constraints}

\item  A {\it distributee} that appears in a {\tt REDISTRIBUTE} directive
must have the {\tt DYNAMIC} attribute (see Section \ref{DYNAMIC-SECTION}).

\item Neither the {\it dist-format-clause} nor the {\it dist-target}
in a {\tt REDISTRIBUTE} may begin with ``{\tt *}''.

\end{constraints}

If a range directive (see Section~\ref{range}) has been used to restrict
the set of distribution formats allowed for a {\it distributee}, then
the new mapping should match one of the formats specified in the range
directive.

A {\tt REDISTRIBUTE} directive must not cause any data object
associated with the {\it distributee} via storage association ({\tt
COMMON} or {\tt EQUIVALENCE}) to be mapped such that storage units of
a scalar data object are split across more than one abstract
processor.  See Section \ref{sequence} for further discussion of
storage association.


The statement form of a {\tt REDISTRIBUTE} directive
may be considered an abbreviation for an attributed form that
happens to mention only one {\it distributee}; for example,
                                                                        \CODE
!HPF$ REDISTRIBUTE distributee ( dist-format-list ) ONTO dist-target
                                                                        \EDOC
is equivalent to
                                                                        \CODE
!HPF$ REDISTRIBUTE ( dist-format-list ) ONTO dist-target :: distributee
                                                                        \EDOC
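
As a concrete and purely illustrative instance of the two forms, the
following sketch redistributes an array with a block size that is known
only at run time.  The names {\tt A}, {\tt P}, and {\tt K} are
hypothetical; the two {\tt REDISTRIBUTE} directives are alternative
spellings of the same remapping and would not normally appear together.
                                                                        \CODE
      REAL A(1000)
!HPF$ PROCESSORS P(4)                 ! hypothetical arrangement
!HPF$ DISTRIBUTE A(BLOCK) ONTO P
!HPF$ DYNAMIC A
      ...
      READ *, K                       ! K is assumed to be positive
!HPF$ REDISTRIBUTE A(CYCLIC(K)) ONTO P        ! statement form
!HPF$ REDISTRIBUTE (CYCLIC(K)) ONTO P :: A    ! equivalent attributed form
                                                                        \EDOC
Because {\tt K} is not a {\it specification-expr}, the format {\tt
CYCLIC(K)} could not be written in a {\tt DISTRIBUTE} directive.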

\section{REALIGN Directive}
\label{REALIGN-SECTION}

The {\tt REALIGN} directive is similar to the {\tt ALIGN} directive but
is considered executable.  An array (or template) may be realigned at
any time, provided it has been declared {\tt DYNAMIC} (see Section
\ref{DYNAMIC-SECTION}).  Unlike redistribution (see Section
\ref{DISTRIBUTE-SECTION}), realigning a data object does not cause any
other object to be remapped.  (However, realignment of even a single
object, if it is large, could require a lot of computational and
communication effort at run time; the programmer must take care when
using this feature.)

The {\tt ALIGN} directive may appear only in the {\it
specification-part} of a scoping unit.  The {\tt REALIGN} directive is
similar but may appear only in the {\it execution-part} of a scoping
unit.  The principal difference between {\tt ALIGN} and {\tt REALIGN}
is that {\tt ALIGN} must contain only a {\it specification-expr} as a
{\it subscript} or in a {\it subscript-triplet}, whereas in {\tt
REALIGN} such subscripts may be any integer expressions.  Another
difference is that {\tt ALIGN} is an attribute, and so can be combined
with other attributes as part of a {\it combined-directive}, whereas
{\tt REALIGN} is not an attribute (although a {\tt REALIGN} statement
may be written in the style of attributed syntax, using ``{\tt ::}''
punctuation).

Formally, the syntax of {\tt REALIGN} is as follows:

                                                                        \BNF
realign-directive          \IS  REALIGN alignee align-directive-stuff
                           \OR  REALIGN align-attribute-stuff :: alignee-list
                                                                        \FNB
\begin{constraints}

\item  Any {\it alignee} that appears in a {\tt REALIGN} directive
must have the {\tt DYNAMIC} attribute (see Section \ref{DYNAMIC-SECTION}).


\item If the {\it align-target} specified in the {\it align-with-clause}
has the {\tt DYNAMIC} attribute, then each {\it alignee} must also have the
{\tt DYNAMIC} attribute.

\item An object may not have both the {\tt INHERIT} attribute and
the {\tt ALIGN} attribute.  (However, an object with the {\tt INHERIT}
attribute may appear as an {\it alignee} in a {\tt REALIGN} directive,
provided that it does not appear as a {\it distributee} in a {\tt DISTRIBUTE}
or {\tt REDISTRIBUTE} directive.)
\end{constraints}

If a range directive (see Section~\ref{range}) has been used to restrict
the set of distribution formats allowed for an {\it alignee}, then
the new mapping should match one of the formats specified in the range
directive.
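
The difference between {\tt ALIGN} and {\tt REALIGN} subscripts can be
sketched as follows.  This is an illustrative fragment only; the names
{\tt A}, {\tt B}, and {\tt J} are hypothetical, and the two {\tt
REALIGN} directives are alternative spellings of the same remapping.
                                                                        \CODE
      REAL A(100), B(200)
!HPF$ DISTRIBUTE B(BLOCK)
!HPF$ ALIGN A(I) WITH B(I)
!HPF$ DYNAMIC A
      ...
      READ *, J                       ! assumed to satisfy 0 <= J <= 100
!HPF$ REALIGN A(I) WITH B(I+J)        ! statement form
!HPF$ REALIGN (I) WITH B(I+J) :: A    ! equivalent attributed form
                                                                        \EDOC
The offset {\tt J} is an ordinary integer variable, which is acceptable
in {\tt REALIGN} but would not be acceptable in {\tt ALIGN}.  Only
{\tt A} is remapped; the mapping of {\tt B} is unchanged.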

\section{DYNAMIC Directive}
\label{DYNAMIC-SECTION}

The {\tt DYNAMIC} attribute specifies that an object may be dynamically
realigned or redistributed.
                                                                        \BNF
dynamic-directive      \IS  DYNAMIC  alignee-or-distributee-list

alignee-or-distributee \IS  alignee
                       \OR  distributee
                                                                        \FNB

\begin{constraints}

\item An object in {\tt COMMON} may not be declared {\tt DYNAMIC} and
may not be aligned to an object (or template) that is {\tt DYNAMIC}.
(To get this kind of effect,  Fortran 90 modules must be used instead
of {\tt COMMON} blocks.)

\item An object with the {\tt SAVE} attribute may not be declared {\tt DYNAMIC}
and may not be aligned to an object (or template) that is {\tt DYNAMIC}.

\end{constraints}

A {\tt REALIGN} directive may not be applied to an {\it alignee} that does not
have the {\tt DYNAMIC} attribute.  A {\tt REDISTRIBUTE} directive may not be
applied to a {\it distributee} that does not have the {\tt DYNAMIC} attribute.

A {\tt DYNAMIC} directive may be combined with other directives, with the
attributes stated in any order, consistent with the Fortran
90 attribute syntax.

Examples:
                                                                        \CODE
!HPF$ DYNAMIC A,B,C,D,E
!HPF$ DYNAMIC:: A,B,C,D,E
!HPF$ DYNAMIC, ALIGN WITH SNEEZY:: X,Y,Z
!HPF$ ALIGN WITH SNEEZY, DYNAMIC:: X,Y,Z
!HPF$ DYNAMIC, DISTRIBUTE(BLOCK, BLOCK) :: X,Y
!HPF$ DISTRIBUTE(BLOCK, BLOCK), DYNAMIC :: X,Y
                                                                        \EDOC
The first two examples are equivalent to each other, as are the
third and fourth, and the fifth and sixth.

The three directives
                                                                        \CODE
!HPF$ TEMPLATE A(64,64),B(64,64),C(64,64),D(64,64)
!HPF$ DISTRIBUTE(BLOCK, BLOCK) ONTO P:: A,B,C,D
!HPF$ DYNAMIC A,B,C,D
                                                                        \EDOC
may be combined into a single directive as follows:
                                                                        \CODE
!HPF$ TEMPLATE, DISTRIBUTE(BLOCK, BLOCK) ONTO P,   &
!HPF$   DIMENSION(64,64),DYNAMIC :: A,B,C,D
                                                                        \EDOC


An {\tt ALLOCATABLE} object may also be given the {\tt DYNAMIC} attribute.
If an {\tt ALLOCATE} statement is immediately followed by {\tt REDISTRIBUTE}
and/or {\tt REALIGN} directives,
the meaning in principle is that the array is first created
with the statically declared mapping, if any, then immediately remapped.
In practice there is an obvious optimization: create the array
in the processors to which it is about to be remapped, in a single
step.  HPF implementors are strongly encouraged to implement this
optimization and HPF programmers are encouraged to rely upon it.
Here is an example:
                                                                       \CODE
      REAL, ALLOCATABLE, DIMENSION(:,:) :: TINKER, EVERS
!HPF$ DYNAMIC :: TINKER, EVERS
      REAL, ALLOCATABLE :: CHANCE(:)
!HPF$ DISTRIBUTE(BLOCK),DYNAMIC :: CHANCE
      ...
      READ 6,M,N
      ALLOCATE(TINKER(N*M,N*M))
!HPF$ REDISTRIBUTE TINKER(CYCLIC, BLOCK)
      ALLOCATE(EVERS(N,N))
!HPF$ REALIGN EVERS(:,:) WITH TINKER(M::M,1::M)
      ALLOCATE(CHANCE(10000))
!HPF$ REDISTRIBUTE CHANCE(CYCLIC)
                                                                       \EDOC
While {\tt CHANCE} is by default always allocated with a {\tt BLOCK}
distribution, it should be possible for a compiler to notice that it will
immediately be remapped to a {\tt CYCLIC} distribution.  Similar remarks
apply to {\tt TINKER} and {\tt EVERS}.  (Note that {\tt EVERS}
is mapped in a thinly-spread-out manner onto {\tt TINKER};
adjacent elements of {\tt EVERS} are mapped to elements of
{\tt TINKER} separated by a stride {\tt M}.  This thinly-spread-out
mapping is put in the lower left corner of {\tt TINKER},
because {\tt EVERS(1,1)} is mapped to {\tt TINKER(M,1)}.)



\section{Remapping and Subprogram Interfaces}
\label{DYNAMIC-DUMMY-SECTION}

If the dummy argument of any subprogram has the {\tt DYNAMIC} attribute,
then an explicit interface is required for the subprogram
(see subsection~\ref{ext:ExplicitInterfaces}).
The rules on the interaction of the {\tt REALIGN} and {\tt
REDISTRIBUTE} directives with a subprogram argument interface are:
\begin{enumerate}

\item A dummy argument may be declared {\tt DYNAMIC}.  However, it is subject
to the general restrictions concerning the use of the name of an
array to stand for its associated template.

Whether a redistribution of the dummy remains in effect after the procedure
returns to the caller depends on the attributes of the actual argument.
If the actual argument associated with the dummy has also been declared
{\tt DYNAMIC}, then any explicit remapping of the dummy is visible in the
caller after the procedure returns.
If a range directive (see Section~\ref{range}) has been used to restrict
the set of distribution formats allowed for
the actual argument, then
the new mapping should match one of the formats specified in the range
directive.

If the actual argument associated with the dummy has not been declared
{\tt DYNAMIC} then the original mapping of the actual has to be restored
on return.
When the subprogram returns
and the caller resumes execution, all objects accessible to the caller
after the call that are not declared
{\tt DYNAMIC} are mapped exactly as they were before the call.


\item If an array or any section thereof is accessible by two or more
paths, it is not HPF-conforming to remap it through any of those paths.  For
example, if an array is passed as an actual argument, it is forbidden
to realign that array, or to redistribute an array or template to which
it was aligned at the time of the call, until the subprogram has
returned from the call.  This prevents nasty aliasing problems.  An
example:

                                                                        \CODE
      MODULE FOO
      REAL A(10,10)
!HPF$ DYNAMIC ::  A
      END

      PROGRAM MAIN
      USE FOO
      CALL SUB(A(1:5,3:9))
      END

      SUBROUTINE SUB(B)
      USE FOO
      REAL B(:,:)
!HPF$ DYNAMIC ::  B
      ...
!HPF$ REDISTRIBUTE A            !Nonconforming
      ...
      END
                                                                        \EDOC
Situations such as this are forbidden, for the same reasons that an assignment
to {\tt A} at the statement marked ``Nonconforming'' would also be forbidden.
In general, in {\it any} situation where assignment to a variable
would be nonconforming by reason of aliasing, remapping of that variable
by an explicit {\tt REALIGN} or {\tt REDISTRIBUTE} directive is also forbidden.

Note that it is legal to remap a host-associated or use-associated variable
in a subprogram if it has been declared {\tt DYNAMIC} and is accessible
only through a single path. Such remappings stay in effect even after
the subprogram has returned to its caller.

\end{enumerate}
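
The first rule above can be sketched as follows.  This fragment is an
illustration under stated assumptions rather than normative text: the
names {\tt DRIVER}, {\tt REMAP}, {\tt A}, and {\tt X} are hypothetical,
and the explicit interface is supplied by an interface block.
                                                                        \CODE
      PROGRAM DRIVER
      INTERFACE                       ! explicit interface is required
        SUBROUTINE REMAP(X)
        REAL X(:,:)
!HPF$   DISTRIBUTE X(BLOCK,*)
!HPF$   DYNAMIC X
        END SUBROUTINE REMAP
      END INTERFACE
      REAL A(64,64)
!HPF$ DISTRIBUTE A(BLOCK,*)
!HPF$ DYNAMIC A
      CALL REMAP(A)
!     A is DYNAMIC, so the remapping done in REMAP is visible here
      END

      SUBROUTINE REMAP(X)
      REAL X(:,:)
!HPF$ DISTRIBUTE X(BLOCK,*)
!HPF$ DYNAMIC X
!HPF$ REDISTRIBUTE X(*,BLOCK)
      END
                                                                        \EDOC
If {\tt A} had not been declared {\tt DYNAMIC}, its original
{\tt (BLOCK,*)} mapping would have to be restored on return.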



%----------------------------------------------------------------------------
% Mapping to subsets proposal
%--------------------------------------------------------------------------
\section{Mapping to Processor Subsets}
\label{mapping-proc-subset}

This extension allows objects to be directly distributed to processor
subsets by allowing a processor subset to be specified
wherever a processors arrangement could be named, e.g., in a {\tt DISTRIBUTE}
directive.  The specified subset must be a proper subset of the
named processors arrangement.

Formally the syntax of the extended {\it dist-target} is as
follows:

                                                                        \BNF
extended-dist-target    \IS processor-name [( section-subscript-list )]
                        \OR * processor-name [( section-subscript-list )]
                        \OR *
                                                                        \FNB
\begin{constraints}

\item The {\it section-subscript}s in the
{\it section-subscript-list} may not be
{\it vector-subscript}s and are restricted to be
either {\it subscript}s or {\it subscript-triplet}s.

\item In the {\it section-subscript-list},
the number of {\it section-subscript}s
must equal the rank of the {\it processor-name}.

\end{constraints}



                                                                        \CODE
!Example 1
!HPF$ PROCESSORS P(10)
        REAL A(100)
!HPF$ DISTRIBUTE A(BLOCK) ONTO P(2:5)


!Example 2
!HPF$ PROCESSORS Q(10,10)
        REAL B(100,100)
!HPF$ DISTRIBUTE B(BLOCK,BLOCK) ONTO Q(5:10,5:10)
                                                                        \EDOC

In Example 1, the array \texttt{A} is distributed by block across the
processors \texttt{P(2)} to \texttt{P(5)}, while in Example 2 the array
\texttt{B} is distributed across the $6 \times 6$ section
\texttt{Q(5:10,5:10)} in the lower right corner of the processors
arrangement \texttt{Q}.

\begin{users}
This extension is most useful in conjunction with the
tasking construct (see Section~\ref{sec-tasking}), which allows
multiple independent phases of a computation to execute simultaneously
on different subsets of processors. A similar situation arises when
the code uses multiple data structures which can be computed in
parallel where the computation on each individual object also
exhibits parallelism, e.g., the multiple blocks in
a multi-block grid used in some fluid dynamics calculation. Here, the
individual blocks have to be distributed over subsets of processors
to exploit both levels of parallelism.
\end{users}
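
As an illustrative sketch of the situation just described (the names
{\tt P}, {\tt U}, and {\tt V} are hypothetical), two independent
arrays can be placed on disjoint halves of a processors arrangement so
that computations on them do not compete for the same processors:
                                                                        \CODE
!HPF$ PROCESSORS P(8)                 ! hypothetical arrangement
      REAL U(1000), V(1000)
!HPF$ DISTRIBUTE U(BLOCK) ONTO P(1:4)
!HPF$ DISTRIBUTE V(BLOCK) ONTO P(5:8)
                                                                        \EDOC
Each array is distributed by block over four processors, and the two
subsets \texttt{P(1:4)} and \texttt{P(5:8)} do not overlap.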


\section{Pointers}
\label{POINTERS-SECTION}


As an approved extension to HPF, pointers and targets can be
explicitly mapped.  Formally, this implies that the constraints that a
{\it distributee} and an {\it alignee} may not have the {\tt POINTER}
or {\tt TARGET} attribute as stated in
Sections~\ref{DISTRIBUTE-SECTION} and~\ref{ALIGN-SECTION}
respectively, have to be removed.

As in the case of an allocatable object, the mapping specification
for a pointer does not take effect immediately but plays a role when
the pointer becomes pointer associated with a target either through allocation
or through pointer assignment.

When a pointer with an explicit mapping is used in an {\tt ALLOCATE}
statement, the data is allocated with the specified mapping.


For example:
                                                                        \CODE
      REAL, POINTER, DIMENSION(:) :: A, B
!HPF$ ALIGN B(I) WITH A(I)
!HPF$ DISTRIBUTE A(BLOCK)
      ...
      ALLOCATE(A(100))
      ALLOCATE(B(50))
      ...
      ALLOCATE(B(200))                  ! Nonconforming
                                                                        \EDOC


Pointer {\tt A} is declared to have a {\tt BLOCK} distribution while
pointer {\tt B} is declared to be identically aligned with {\tt A}.
When {\tt A} is allocated, it is created with a block distribution.
When {\tt B} is allocated, it is aligned with the first 50 elements of
{\tt A}.  Note that the allocation statements may not occur in the
opposite order, since an object may be aligned to another only if the
latter has already been created or allocated.  Also, the second allocation
for {\tt B} is nonconforming, since a larger object ({\tt B} here)
cannot be aligned with a smaller object ({\tt A} in this case).

A pointer \texttt{P} with an explicit mapping can be pointer
associated with a target \texttt{T} through a pointer assignment
statement under the following conditions:

\begin{enumerate}

\item The mapping of \texttt{T} is a specialization of the mapping of
  \texttt{P}; and

\item If \texttt{P} is explicitly aligned, its ultimate align target
  has a fully-specified non-transcriptive distribution; and

\item \label{PandTdynamic} \texttt{P} and \texttt{T} are either both
  \texttt{DYNAMIC} or neither is.

\end{enumerate}


Here are some examples:
                                                                        \CODE
      REAL, POINTER, DIMENSION(:,:) :: P
!HPF$ DISTRIBUTE P(BLOCK,BLOCK)
      REAL, TARGET, DIMENSION (100, 100) ::  B, C, D
!HPF$ DISTRIBUTE B(BLOCK, BLOCK)
!HPF$ DISTRIBUTE C(BLOCK, CYCLIC)
      ...
      P => B                 ! Conforming
      P => B(1:50, 1:50)     ! Nonconforming: target must be a whole array
      P => C                 ! Nonconforming: the distribution in the
                             ! second dimension does not match
      P => D                 ! Nonconforming: D is not explicitly mapped
      ...
                                                                        \EDOC
The following pointer assignment is valid even though no processor
arrangement is specified for the pointer; in this case, the mapping of
\texttt{B} is a specialization of the mapping of \texttt{P}:
                                                                        \CODE
      REAL, POINTER, DIMENSION(:) :: P
      REAL, TARGET, DIMENSION(100) ::  B
!HPF$ DISTRIBUTE P(BLOCK)
!HPF$ DISTRIBUTE (BLOCK) ONTO Q :: B      ! Q is a processors arrangement
      ...
      P => B                 ! Conforming
      ...
                                                                        \EDOC

                                                                        \CODE
      REAL, POINTER, DIMENSION(:) :: P
!HPF$ DISTRIBUTE * :: P
      REAL, TARGET, DIMENSION(100) ::  B, C
!HPF$ DISTRIBUTE B(BLOCK), C(CYCLIC)
      ...
      P => B                 ! Conforming
      P => C                 ! Conforming
      P => C(1:50)           ! Nonconforming: target must be a whole array
      ...
                                                                        \EDOC

Here, the {\tt *} is used to indicate a transcriptive distribution for the
pointer {\tt P}, and thus it can be pointer associated with both
targets {\tt B} and {\tt C}, distributed by {\tt BLOCK}
and {\tt CYCLIC} respectively.  However, it still cannot be used to
point to an array section such as \texttt{C(1:50)}.  To do that, the
pointer must have the \texttt{INHERIT} attribute:

                                                                        \CODE
      REAL, POINTER, DIMENSION(:) :: P
!HPF$ INHERIT :: P
      REAL, TARGET, DIMENSION(100) ::  B, C
!HPF$ DISTRIBUTE B(BLOCK), C(CYCLIC)
      ...
      P => B                 ! Conforming
      P => C                 ! Conforming
      P => C(1:50)           ! Conforming
      ...
                                                                        \EDOC

To allow pointers to have transcriptive distributions, we have
to change the constraint for {\it dist-format-clause} as specified
in Section~\ref{DISTRIBUTE-SECTION}, to read as follows:

\begin{constraints}
\item If either the {\it dist-format-clause} or the {\it dist-target}
in a {\tt DISTRIBUTE} directive begins with ``{\tt *}''
then every {\it distributee} must be a dummy argument,
{\em
except if the {\em distributee}
has the {\tt POINTER} attribute.}
\end{constraints}

The constraint for {\it align-spec} as specified
in Section~\ref{ALIGN-SECTION}, should be changed to read as follows:

\begin{constraints}
\item If the {\it align-spec}
in an {\tt ALIGN} directive begins with ``{\tt *}''
then every {\it alignee} must be a dummy argument,
{\em
except if the {\em alignee}
has the {\tt POINTER} attribute.}
\end{constraints}

The constraint for {\it inheritee} as specified
in Section~\ref{INHERIT-SECTION}, should be changed to read as follows:

\begin{constraints}
\item An {\it inheritee} must be a dummy argument,
{\em
except if the {\em inheritee}
has the {\tt POINTER} attribute.}
\end{constraints}

When pointers with such transcriptive mappings
are used in an {\tt ALLOCATE} statement,
the compiler may choose any arbitrary mapping for the allocated data.
A range declaration (see Section~\ref{range}) can be used to restrict
the set of distribution formats.

If a pointer has the \texttt{DYNAMIC} attribute, then any target
associated with the pointer (which must therefore also have the
\texttt{DYNAMIC} attribute) may be remapped using a
\texttt{REDISTRIBUTE} or \texttt{REALIGN} statement under the
following restriction:

A pointer may be used in {\tt REALIGN} and {\tt REDISTRIBUTE} as an
{\it alignee}, {\it align-target}, or {\it distributee} if and only if
it is currently associated with a whole array, not an array section.

Note that when an object is remapped, the new mapping is
visible through any pointer that may be associated with the object.


\subsection{Pointers and Subprograms}


If a dummy pointer argument is not explicitly mapped, then the
actual pointer argument must also not be explicitly mapped.

If a dummy pointer argument has an explicit mapping, then the actual
argument must follow the rules for pointer assignment as stated above,
with one exception:  If the actual argument has the \texttt{DYNAMIC}
attribute, it is not necessary that the corresponding dummy argument
have the \texttt{DYNAMIC} attribute.  That is, item~\ref{PandTdynamic}
on page~\pageref{PandTdynamic} is weakened to

\begin{enumerate}
\item[3.] If a dummy pointer argument has the \texttt{DYNAMIC}
  attribute, then the corresponding actual argument must also have the
  \texttt{DYNAMIC} attribute.
\end{enumerate}

Again, a range declaration (see Section~\ref{range}) can be used to
restrict the set of distribution formats of the actual.

A dummy pointer argument may have the \texttt{DYNAMIC} attribute.  In
this case, the actual pointer argument must also have the
\texttt{DYNAMIC} attribute. The target associated with the dummy
argument may be redistributed under the restrictions stated in the
last subsection.  Following Fortran rules, if the actual is also
visible (through host- or use-association), the target may be
redistributed only through the dummy argument.  If the dummy argument
is redistributed, then the actual argument has the new mapping on
return from the procedure.  In such a case, the new mapping must match
the range restrictions (if any) of the actual.

\subsection{Restrictions on Pointers and Targets}
\label{mapsubext:pointers}

If, on invocation of a procedure P: (a)~a dummy argument has the
\texttt{TARGET} attribute, and (b)~the corresponding actual argument
has the \texttt{TARGET} attribute and is not an array section with a
vector subscript (and therefore is an object A or a section of an
array A), then the program is not HPF-conforming unless:
\begin{enumerate}
\item No remapping of the actual argument occurs during the call; or
\item the remainder of program execution would be unaffected if
  \begin{enumerate}
  \item\label{first-item} each pointer associated with any portion of
    A before the call were to acquire undefined pointer association
    status on entry to P and, if not reassigned during execution of P,
    were to be restored on exit to the pointer association status it
    had before entry; and
  \item\label{second-item} each pointer associated with any portion of
    the dummy argument or with any portion of A during execution of P
    were to acquire undefined pointer association status on exit from
    P.
  \end{enumerate}
\end{enumerate}

Note that if a dummy argument has the \texttt{TARGET} attribute and no
explicit mapping attributes, then the \texttt{INHERIT} attribute is
implicitly assumed (see Section~\ref{INHERIT-SECTION}); therefore no
remapping occurs for such a dummy argument and there is no problem.

\begin{rationale}
  These restrictions are made in order to support the following part
  of the Fortran standard (in section 12.4.1.1 of that document) in
  the face of implicit remapping across the subprogram interface:
  \begin{quote}
    If the dummy argument does not have the \texttt{TARGET} or
    \texttt{POINTER} attribute, any pointers associated with the actual
    argument do not become associated with the corresponding dummy
    argument on invocation of the procedure.

    If the dummy argument has the \texttt{TARGET} attribute and the
    corresponding actual argument has the \texttt{TARGET} attribute but
    is not an array section with a vector subscript:
    \begin{enumerate}
    \item Any pointers associated with the actual argument become
      associated with the corresponding dummy argument on invocation of
      the procedure.

    \item When execution of the procedure completes, any pointers
      associated with the dummy argument remain associated with the actual
      argument.
    \end{enumerate}

    If the dummy argument has the \texttt{TARGET} attribute and the
    corresponding actual argument does not have the \texttt{TARGET}
    attribute or is an array section with a vector subscript, any pointers
    associated with the dummy argument become undefined when execution of
    the procedure completes.
  \end{quote}
\end{rationale}

\subsubsection{Example}

Here is an example that illustrates the restrictions of this section:

\CODE

      INTEGER, TARGET, DIMENSION (10) :: ACT
      INTEGER, POINTER, DIMENSION (:) :: POINTS_TO_ACT, POINTS_TO_DUM
!HPF$ DISTRIBUTE ACT(BLOCK)

      POINTS_TO_ACT => ACT
      CALL F(ACT)
      POINTS_TO_DUM(1) = 1             ! ILLEGAL

      CONTAINS
        SUBROUTINE F(DUM)
          INTEGER, TARGET, DIMENSION(10) :: DUM
        !HPF$ DISTRIBUTE DUM(CYCLIC)

          POINTS_TO_DUM => DUM
          POINTS_TO_ACT(1) = 1         ! ILLEGAL
        END SUBROUTINE
      END

\EDOC

The assignment to \texttt{POINTS\_TO\_DUM(1)} is illegal because it
violates item~\ref{second-item}; the assignment to
\texttt{POINTS\_TO\_ACT(1)} is illegal because it violates
item~\ref{first-item}.




\section{Mapping of Derived Type Components}
\label{derived-type}

An {\tt ALIGN}, {\tt DISTRIBUTE}, or {\tt DYNAMIC} directive may appear
within a
{\it derived-type-def} wherever a {\it component-def-stmt} may appear.  Every
{\it alignee} or {\it distributee} within such a directive must be the name of
a component defined within that {\it derived-type-def}.
To allow mapping of the structure components, the rules have to be extended
as follows:


                                                                        \BNF

distributee-extended    \IS  object-name
                        \OR  template-name
                        \OR  component-name
                        \OR  structure-component
                                                                        \FNB

A derived type is said to be an {\it explicitly mapped type}
if any of its components is explicitly mapped or if any of its
components is of an explicitly mapped type.

\begin{constraints}
\item A component of a derived type may be explicitly distributed
only if the type of the component is not an explicitly mapped type.

\item A variable or array of a derived type may be explicitly
distributed only if the derived type is not an explicitly mapped type.

\item A {\it distributee} in a {\tt DISTRIBUTE} directive may not be a
{\it structure-component}.

\item A {\it distributee}
in a {\tt DISTRIBUTE} directive which occurs in a {\it derived-type-def}
must be the {\it component-name} of a component of the derived type.

\item A {\it component-name} may occur as a {\it distributee}
only in a {\tt DISTRIBUTE} directive occurring within the derived type
definition.


\item A {\it distributee} that is a {\it structure-component} may occur
only in a {\tt REDISTRIBUTE} directive
and every {\it part-ref}
except the rightmost must be scalar (rank zero).
The rightmost {\it part-name}
in the {\it structure-component} must have the {\tt DYNAMIC} attribute.
\end{constraints}



                                                                        \BNF
alignee-extended        \IS  object-name
                        \OR  component-name
                        \OR  structure-component
                                                                        \FNB



\begin{constraints}
\item A component of a derived type may be explicitly aligned
only if the type of the component is not an explicitly mapped type.

\item A variable or array of a derived type may be explicitly
aligned only if the derived type is not an explicitly mapped type.

\item An {\it alignee} in an {\tt ALIGN} directive may not be a
{\it structure-component}.

\item An {\it alignee}
in an {\tt ALIGN} directive which occurs in a {\it derived-type-def}
must be the {\it component-name} of a component of the derived type.

\item A {\it component-name} may occur as an {\it alignee}
only in an {\tt ALIGN} directive occurring within the derived type
definition.

\item An {\it alignee} that is a {\it structure-component} may occur
only in a {\tt REALIGN} directive
and every {\it part-ref}
except the rightmost must be scalar (rank zero).
The rightmost {\it part-name} in the {\it structure-component} must
have the {\tt DYNAMIC} attribute.

\end{constraints}


                                                                        \BNF
align-target-extended   \IS  object-name
                        \OR  template-name
                        \OR  component-name
                        \OR  structure-component
                                                                        \FNB

\begin{constraints}

\item A {\it component-name} may occur as an {\it align-target}
only in an {\tt ALIGN} directive occurring within the derived type
definition.

\item In an {\it align-target} that is a {\it structure-component}
every {\it part-ref}
except the rightmost must be scalar (rank zero).
\end{constraints}



The above constraints imply that components of a derived type can be
mapped within the derived type definition itself, so that
whenever an object of that type is created its components
are created with the specified mapping.

An {\tt ALIGN} directive inside a derived type definition may
align a component of the derived type
with another component of the same derived
type or with another object.
A structure component can be used as a target with which to align other
objects, including components of derived types.

If a component of a derived type has been declared {\tt DYNAMIC},
then the component of an object of the type may be
remapped using the {\tt REDISTRIBUTE} or {\tt REALIGN} directive.


Examples:

                                                                        \CODE
      TYPE SIMPLE
        REAL S(100)
!HPF$   DISTRIBUTE S(BLOCK)
      END TYPE SIMPLE

!HPF$ TEMPLATE, DISTRIBUTE(BLOCK, *) :: HAIRY_TEMPLATE(47,73)

      TYPE COMPLICATED
        INTEGER SIZE
        REAL RV(100,100), KV(100,100), QV(47,73)
! Arrays RV, KV, and QV may be mapped
!HPF$   DISTRIBUTE (BLOCK, BLOCK):: RV, KV
!HPF$   ALIGN WITH HAIRY_TEMPLATE :: QV
        TYPE(SIMPLE) SV(100)
! The following directive is not valid because SIMPLE
!  is an explicitly mapped type.
!HPF$   DISTRIBUTE SV(BLOCK)
      END TYPE COMPLICATED

      TYPE(COMPLICATED) LOTSOF(20)

! The following directive is not valid because COMPLICATED
! is an explicitly mapped type.
!HPF$ DISTRIBUTE LOTSOF(BLOCK)
                                                                        \EDOC

Here, a component of the derived type {\tt SIMPLE} has been mapped;
thus objects of this type, e.g., {\tt SV} in type {\tt COMPLICATED},
cannot be distributed. The array {\tt LOTSOF} cannot be distributed for the
same reason.
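
As an informal illustration (not part of the specification), the
definition of an explicitly mapped type can be read as a recursive
predicate.  The following Python sketch models the types of the example
above; the table \texttt{TYPES}, the tuple layout, the extra type
\texttt{PLAIN}, and both function names are hypothetical, and the sketch
assumes that the type definitions are not recursive.

```python
# Hypothetical model: each derived type maps to a list of components.
# A component is (name, derived_type_name_or_None, explicitly_mapped).
TYPES = {
    "SIMPLE": [("S", None, True)],
    "COMPLICATED": [
        ("SIZE", None, False),
        ("RV", None, True),
        ("KV", None, True),
        ("QV", None, True),
        ("SV", "SIMPLE", False),
    ],
    "PLAIN": [("X", None, False)],  # not in the example; added for contrast
}


def is_explicitly_mapped_type(type_name, types=TYPES):
    """True if any component is explicitly mapped, or is itself of an
    explicitly mapped type (assumes no recursive type definitions)."""
    for _name, comp_type, mapped in types[type_name]:
        if mapped:
            return True
        if comp_type is not None and is_explicitly_mapped_type(comp_type, types):
            return True
    return False


def may_be_explicitly_mapped(type_name, types=TYPES):
    """An object or component may be distributed or aligned only if its
    type is not an explicitly mapped type; None stands for an intrinsic type."""
    return type_name is None or not is_explicitly_mapped_type(type_name, types)
```

Under this model, \texttt{may\_be\_explicitly\_mapped("SIMPLE")} is false,
which corresponds to the invalid directive for {\tt SV}, and likewise for
{\tt LOTSOF} of type {\tt COMPLICATED}.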


Components of structures can be remapped using the {\tt REDISTRIBUTE}
or {\tt REALIGN} directive if they have been declared {\tt DYNAMIC}.
For example, the following code fragment can be used to allocate and map
multiple blocks (called {\tt SUBGRID} here) of a multi-block grid:

                                                                        \CODE
!HPF$ PROCESSORS P( number_of_processors() )

      TYPE SUBGRID
        INTEGER SIZE
        INTEGER LO, HI          ! target subset of processors
        REAL, POINTER :: BL(:)
!HPF$   DYNAMIC BL
      END TYPE SUBGRID

      TYPE (SUBGRID), ALLOCATABLE :: GRID(:)

      READ (*,*) SUBGRID_COUNT
      ALLOCATE( GRID(SUBGRID_COUNT) )
      DO I = 1, SUBGRID_COUNT
        READ(*,*) GRID(I)%SIZE
      END DO

! Compute processor subsets for each subgrid, setting the LO and HI values
      CALL FIGURE_THE_PROCS ( GRID, number_of_processors())
! Allocate each subgrid and distribute to the computed processors subset
      DO I = 1, SUBGRID_COUNT
        ALLOCATE( GRID(I)%BL( GRID(I)%SIZE ) )
!HPF$   REDISTRIBUTE GRID(I)%BL(BLOCK) ONTO P( GRID(I)%LO : GRID(I)%HI )
      END DO
                                                                        \EDOC



%-----------------------------------------------------------------------------
% Indirect mapping proposal, apparent date Jan 18, 1996
%----------------------------------------------------------------------------

\section{New Distribution Formats}
\label{dist-formats}

This section describes two new
distribution formats. Formally, we can extend the syntax as follows:

                                                                \BNF
extended-dist-format    \IS  BLOCK  [ ( int-expr ) ]
                        \OR  CYCLIC [ ( int-expr ) ]
                        \OR GEN\_BLOCK (int-array)
                        \OR INDIRECT (int-array)
                        \OR  *
                                                                \FNB

\begin{constraints}
\item
    An {\it int-array} appearing in an {\it extended-dist-format} of a
    {\tt DISTRIBUTE} or {\tt REDISTRIBUTE} directive must be an
    integer array of rank 1.

\item
    An {\it int-array} appearing in an {\it extended-dist-format} of a
    {\tt DISTRIBUTE} directive must be a {\it restricted-expr}.

\item
    The size of any
 {\it int-array} appearing with a {\tt GEN\_BLOCK} distribution must be
equal to the extent of the corresponding dimension of the target processor
arrangement.

\item
    The size of any
 {\it int-array} appearing with an {\tt INDIRECT} distribution must be
equal to the extent of the corresponding dimension of the {\it distributee}
to which the distribution is to be applied.

\end{constraints}

The ``generalized'' block distribution, {\tt GEN\_BLOCK}, allows
contiguous segments of an array, of possibly unequal sizes, to be mapped
onto processors. The sizes of the segments
are specified by values of a user-defined integer mapping array,
one value per target processor of the mapping.
That is, the {\it i}th element of the mapping array specifies the
size of the block to be stored on the {\it i}th processor
of the target processor arrangement.
Thus, the values of the mapping array must be
non-negative and their sum must be greater than or equal to
the extent of the corresponding dimension of the array being distributed.

The mapping array has to be a restricted expression when used in the
{\tt DISTRIBUTE} directive, but can be an array variable in a
{\tt REDISTRIBUTE} directive. In the latter case, changing the value of
the mapping array after the directive has been executed will not
change the mapping of the distributed array.


Let {\it l} and {\it u} be the lower and upper bounds of the dimension
of the {\it distributee}, {\it MAP} be the mapping array,
and let {\it BS(i):BE(i)} be the resultant
elements mapped to the {\it i}th processor in the corresponding
dimension of the target processor arrangement.
Then,
\begin{eqnarray*}
BS(1) & = & l,\\
BE(i) & = & \min( BS(i)+MAP(i)-1, u),\\
BS(i) & = & BE(i-1) + 1 \quad \mbox{for } i > 1.
\end{eqnarray*}

Example:
                                                                        \CODE
        INTEGER, PARAMETER :: S(6) = (/2,25,20,0,8,65/)
!HPF$ PROCESSORS P(6)
        REAL A(100), B(200)
        INTEGER new(6)
!HPF$ DISTRIBUTE A( GEN_BLOCK(S) ) ONTO P
!HPF$ DYNAMIC  B
        ...
        new = ...
!HPF$ REDISTRIBUTE B( GEN_BLOCK(new) )
                                                                        \EDOC
Given the above specification, array elements A(1:2) are mapped on P(1),
A(3:27) are mapped on P(2), A(28:47) are mapped on P(3),
no elements are mapped on P(4), A(48:55) are mapped on P(5), and
A(56:100) are mapped on P(6).
The array {\it B} is distributed based on the array {\it new} whose
values are computed at runtime.
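
The block bounds of a {\tt GEN\_BLOCK} distribution can be computed
mechanically.  The following Python sketch is illustrative only: the
function name \texttt{gen\_block\_bounds} is hypothetical, and the sketch
assumes that the upper bound of each block is clipped at {\it u}, so that
a block starting beyond {\it u} is empty.

```python
def gen_block_bounds(l, u, mapping):
    """Return a list of (BS, BE) pairs, one per processor, for an array
    dimension with bounds l:u distributed GEN_BLOCK(mapping).
    An empty block is reported with BE == BS - 1."""
    bounds = []
    bs = l
    for size in mapping:
        if size < 0:
            raise ValueError("mapping array values must be non-negative")
        be = min(bs + size - 1, u)  # clip at the upper bound
        bounds.append((bs, be))
        bs = be + 1
    if bs <= u:
        raise ValueError("sum of mapping array is less than the extent")
    return bounds


# Mapping array S of the example, applied to A(1:100) on six processors.
print(gen_block_bounds(1, 100, [2, 25, 20, 0, 8, 65]))
# [(1, 2), (3, 27), (28, 47), (48, 47), (48, 55), (56, 100)]
```

The pair (48, 47) denotes the empty block on P(4); the last block is
clipped at 100 because the values of the mapping array sum to 120.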

\begin{implementors}
Accessing elements of an array distributed using the generalized
block distribution may require accessing the values of the mapping
array at runtime. However, since the size of such an array
is equal to that of the processor arrangement, it can in most
cases be replicated over all processors.

For dynamic arrays, an independent copy of the mapping array
will have to be maintained internally so that a change
in the values of the mapping array does not affect the access
of the distributed array.
\end{implementors}



There are many scientific applications in which the structure  of
the  underlying domain is such that it does not map directly onto
Fortran data structures. For example, in many CFD applications  an
unstructured mesh (consisting of triangles in 2D or tetrahedra in
3D) is used to represent the underlying domain. The nodes of such
a mesh are generally represented by a one-dimensional array while
another is used to  represent  their  interconnections.   Mapping
such  arrays  using the structured distribution mechanisms, {\tt BLOCK}
and {\tt CYCLIC}, results in mappings in which unrelated  elements  are
mapped  onto  the  same  processor. This in turn leads to massive
amounts of unnecessary communication.   What  is  required  is  a
mechanism  to  map a related set of arbitrary array elements onto
the same processor.  The  {\tt INDIRECT}  distribution
provides  such  a  mechanism.


The  {\tt INDIRECT}  distribution  allows  a  many-to-one  mapping   of
elements  of  a  dimension  of a data array to a dimension of the
target processor arrangement. An integer array is used to specify
the  target  processor  of  each  individual element of the array
dimension being distributed.  That is, the {\it ith} element of the mapping
array  provides  the  processor  number  onto which the {\it ith} array
element is to be mapped.  Since the mapping array maps array elements
onto  processor  elements, the extent of the mapping array must match
the extent of the dimension of the array it is distributing. Also,
the  values of the mapping array must lie between the lower and upper
bound of the target dimension of the processor arrangement.

The mapping array has to be a restricted expression when used in  the
{\tt DISTRIBUTE}   directive,  but  can  be  an  array  variable  in  a
{\tt REDISTRIBUTE} directive. In the latter case, changing the value of
the  mapping  array  after  the  directive has been executed will not
change the mapping of the distributed array.



Example:

                                                                        \CODE
!HPF$ PROCESSORS P(4)
        REAL A(100), B(50)
        INTEGER map1(100), map2(50)
        PARAMETER (map1 = (/1,3,4,3,3,2,1,4, ..../))
!HPF$ DYNAMIC B
!HPF$ DISTRIBUTE A( INDIRECT(map1) ) ONTO P
!HPF$ DISTRIBUTE map2(BLOCK) ONTO P

        map2 = ...
!HPF$ REDISTRIBUTE B( INDIRECT(map2) ) ONTO P
        ....
                                                                        \EDOC

Here, the array {\it A} is distributed statically using the constant
array {\it map1}.  Thus:

\hspace{0.5in}
\parbox{3in}{
A(1) is mapped onto P(1),\\
A(2) is mapped onto P(3),\\
A(3) is mapped onto P(4),\\
A(4) is mapped onto P(3),\\
A(5) is mapped onto P(3),\\
A(6) is mapped onto P(2),\\
A(7) is mapped onto P(1),\\
A(8) is mapped onto P(4), and so on.\\
}

The array {\it B} is declared  dynamic and is redistributed using
the mapping array {\it map2}.
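
To make the many-to-one nature of {\tt INDIRECT} concrete, the following
Python sketch inverts a mapping array into per-processor index lists.
It is an illustration only: the function name
\texttt{indirect\_owners} and its parameters are hypothetical, and only
the eight values of {\it map1} that are actually shown in the example are
used.

```python
def indirect_owners(mapping, proc_lower, proc_upper, array_lower=1):
    """Return {processor index: [array indices]} for a dimension
    distributed INDIRECT(mapping) onto processors proc_lower:proc_upper."""
    owners = {p: [] for p in range(proc_lower, proc_upper + 1)}
    for offset, p in enumerate(mapping):
        if not proc_lower <= p <= proc_upper:
            raise ValueError("mapping value outside the processor bounds")
        owners[p].append(array_lower + offset)
    return owners


# First eight values of map1 from the example, with PROCESSORS P(4).
print(indirect_owners([1, 3, 4, 3, 3, 2, 1, 4], 1, 4))
# {1: [1, 7], 2: [6], 3: [2, 4, 5], 4: [3, 8]}
```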



\begin{implementors}
In general, the {\tt INDIRECT} distribution is going to be used in the
{\tt REDISTRIBUTE} directive with an array variable as the mapping array.
Also, since the size of the mapping array must be the same as that of the
array being distributed, it will most likely itself be distributed
using the {\tt BLOCK} distribution.  This raises several issues.
To correctly implement this distribution, the runtime system
should maintain a (distributed) copy of the mapping array so that if
the program modifies the mapping array, the distribution does not
change.  Using an array variable as a mapping array implies that the
location of each element of the array will not be known until
runtime. Thus, communication may be required to determine the
location of a specific array element.
\end{implementors}


%-----------------------------------------------------------------------------
% Shadow regions proposal, apparent date Feb 7, 1996
%-----------------------------------------------------------------------------

\section{Range Directive}
\label{range}

The {\tt RANGE} attribute is used to restrict the possible distribution
formats for an object or template that has the {\tt DYNAMIC} attribute or
a transcriptive distribution format (including pointers).


                                                                \BNF
range-directive         \IS RANGE ranger range-attr-stuff-list

ranger                  \IS object-name
                        \OR template-name

range-attr-stuff        \IS ( range-attr-list )

range-attr              \IS range-dist-format
                        \OR ALL

range-dist-format       \IS BLOCK [ ()]
                        \OR  CYCLIC [ () ]
                        \OR GEN\_BLOCK
                        \OR INDIRECT
                        \OR  *

                                                                \FNB

\begin{constraints}

\item
The {\it ranger} must have the {\tt DYNAMIC} attribute,
or have the {\tt INHERIT} attribute and no explicit {\tt DISTRIBUTE},
or have been specified with a {\it dist-format-clause}
of {\tt *} in a {\tt DISTRIBUTE} or combined directive.

\item
The length of any {\it range-attr-list} must be equal to the rank of
the {\it ranger}.

\item
The {\it ranger} must not appear as an alignee in an {\tt ALIGN}
or {\tt REALIGN} directive.
\end{constraints}


Since the length of each {\it range-attr-list} is the same as the rank
of the {\it ranger}, each {\it range-attr}, {\it R}, in each {\it
range-attr-stuff} corresponds positionally to a dimension {\it D} of
the ranger.  This dimension {\it D} in turn either corresponds (though
not necessarily positionally) to an axis {\it A} of the template with
which the ranger is ultimately aligned, or there is no corresponding
axis in that template.

With this notation, a {\tt RANGE} attribute on a {\it ranger} is equivalent to
the following restriction:

For at least one {\it range-attr-stuff} in the {\it
    range-attr-stuff-list}, every {\it range-attr}, {\it R}, must either
  \begin{itemize}
  \item be compatible with the distribution
    format of the corresponding axis {\it A}, if such a corresponding axis
    exists, or
  \item be either {\tt *} or {\tt ALL}, if no such corresponding axis
    exists.
  \end{itemize}


This compatibility must be maintained by any {\tt DISTRIBUTE}
or {\tt REDISTRIBUTE} directive in which the {\it ranger} appears as
a distributee, or if the ranger has the {\tt POINTER}
attribute and is transcriptively distributed,
for any target with which the {\it ranger}
becomes associated.

A distribution format of

\begin{enumerate}

\item
{\tt BLOCK} is compatible with a {\it range-dist-format} of {\tt BLOCK},
{\tt BLOCK()} or {\tt CYCLIC()};

\item
{\tt BLOCK(n)} is compatible with a {\it range-dist-format} of
{\tt BLOCK()} or {\tt CYCLIC()};

\item
{\tt CYCLIC} is compatible with a {\it range-dist-format} of {\tt CYCLIC}
or {\tt CYCLIC()};

\item
{\tt CYCLIC(n)} is compatible with a {\it range-dist-format} of {\tt CYCLIC()};

\item
{\tt GEN\_BLOCK} is compatible with a {\it range-dist-format} of
{\tt GEN\_BLOCK};

\item
{\tt INDIRECT} is compatible with a {\it range-dist-format} of {\tt INDIRECT};

\item
{\tt *} is compatible with a {\it range-dist-format} of {\tt *}.


\end{enumerate}

All distribution formats are compatible with a {\it range-dist-format} of
{\tt ALL}.
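
The compatibility rules can be transcribed into a small lookup table.
The Python sketch below is a simplified illustration, not a definition:
the names \texttt{COMPATIBLE}, \texttt{compatible}, and
\texttt{satisfies\_range} are hypothetical, formats are written as plain
strings, and the sketch assumes that the distribution format of the
template axis corresponding to each dimension of the {\it ranger} is
already known and listed positionally (with {\tt *} standing for a
dimension that has no corresponding axis).

```python
# Distribution format -> set of range-dist-formats it is compatible with,
# transcribed from the enumerated list.
COMPATIBLE = {
    "BLOCK": {"BLOCK", "BLOCK()", "CYCLIC()"},
    "BLOCK(n)": {"BLOCK()", "CYCLIC()"},
    "CYCLIC": {"CYCLIC", "CYCLIC()"},
    "CYCLIC(n)": {"CYCLIC()"},
    "GEN_BLOCK": {"GEN_BLOCK"},
    "INDIRECT": {"INDIRECT"},
    "*": {"*"},
}


def compatible(dist_format, range_attr):
    """ALL accepts every distribution format; otherwise consult the table."""
    return range_attr == "ALL" or range_attr in COMPATIBLE[dist_format]


def satisfies_range(dist_formats, range_attr_stuff_list):
    """True if at least one range-attr-stuff matches in every dimension."""
    return any(
        len(stuff) == len(dist_formats)
        and all(compatible(d, r) for d, r in zip(dist_formats, stuff))
        for stuff in range_attr_stuff_list
    )


print(satisfies_range(("BLOCK", "*"), [("BLOCK", "*"), ("CYCLIC", "*")]))  # True
print(satisfies_range(("BLOCK", "CYCLIC"), [("BLOCK", "*")]))              # False
```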

Note that the possibility of a {\tt RANGE} directive of the form
                                                                        \ICODE
!HPF$ RANGE range-attr-stuff-list :: ranger-list
                                                                        \EDOC
is covered by syntax rule \ref{combined-directive-rule} for a
{\it combined-directive} using {\it combined-attribute-extended}
as defined in rule \ref{combined-attribute-extended-rule}.


Examples:

                                                                \CODE
!HPF$    DISTRIBUTE T(BLOCK)
!HPF$    ALIGN A(I,J) WITH T(I)

         CALL SUB(A)
            ....

         SUBROUTINE SUB(X)
!HPF$    INHERIT X
!HPF$    RANGE X (BLOCK, *),  (CYCLIC, *)
                                                                \EDOC

Since the ultimate align target of {\tt X}, the inherited template {\tt T}
in this case, does not have a second dimension, only a {\tt *} or {\tt ALL}
can be used in the second position of each {\it range-attr-stuff} for {\tt X}.


                                                                \CODE
         REAL A(100, 100, 100)
!HPF$    DISTRIBUTE A(BLOCK, *, CYCLIC)

         CALL SUB( A(:,:,1) )           ! Conforming
         CALL SUB( A(:,1,:) )           ! Nonconforming
         CALL SUB( A(1,:,:) )           ! Nonconforming
            ....

         SUBROUTINE SUB(X)
         REAL X(:, :)
!HPF$    INHERIT X
!HPF$    RANGE X (BLOCK, *)
                                                                \EDOC

Given the range directive in the subroutine SUB, only the first call
to SUB is conforming. However, all three calls can be made conforming if
the range directive above is replaced by the following directive:

                                                                \CODE
!HPF$    RANGE (BLOCK, *), (BLOCK, CYCLIC), (*, CYCLIC) :: X
                                                                \EDOC


\section{Shadow Width Declarations}
\label{mapext:shadow}

In compiling nearest-neighbor code, for example, in
discretizing partial differential equations or implementing
convolutions, a standard technique is to allocate storage on each
processor for the local array section that includes additional space
for the elements that have to be moved in from its neighboring
processors.  This additional storage is referred to as ``shadow
edges.''  There are conceptually two shadow edges for each array
dimension:  one at the low end of the local array section, and the
other at the high end.

In a single routine, the compiler can tell which arrays require shadow
edges, and allocate this additional space accordingly.  However, since
the width of the shadow area is dependent on the size of the
computational stencil being used, an array may require different
shadow widths in different routines. Thus, without interprocedural
analysis, an array argument may need to be copied into a space with
the appropriate shadow width on each procedure call. A similar data
motion would be required to copy the data back to its original
location on exit from the subroutine.  This unnecessary data motion
can be avoided by allowing the user to specify the required shadow
width when the array is declared.

Formally, the syntax for declaring shadow widths is as follows:

                                                                \BNF
shadow-directive        \IS SHADOW shadow-target shadow-attr-stuff

shadow-target               \IS object-name
                        \OR component-name

shadow-attr-stuff       \IS ( shadow-spec-list )

shadow-spec             \IS width
                        \OR low-width : high-width

width                   \IS int-expr
low-width               \IS int-expr
high-width              \IS int-expr
                                                                \FNB

\begin{constraints}
\item The \textit{int-expr} representing a \textit{width}, \textit{low-width},
  or \textit{high-width} must be a constant \textit{specification-expr} with
  value greater than or equal to 0.

\item
The absence of a \textit{shadow-directive} is equivalent to a {\it
  shadow-directive} having each \textit{shadow-spec} equal to 0.

\item A \textit{shadow-spec} of {\it width} is equivalent to a
  \textit{shadow-spec} of {\it width}:{\it width}.
\end{constraints}


Thus, the directive

                                                                \CODE
!HPF$    DISTRIBUTE (BLOCK) :: A
!HPF$    SHADOW (w) :: A
                                                                \EDOC

\noindent
specifies that the array \textit{A} is distributed \texttt{BLOCK}
and is to have a shadow width of \textit{w} on both sides.  If
\textit{A} is a dummy argument, this gives the compiler enough
information to inhibit unnecessary data motion at procedure calls.

Alternatively, different
shadow widths can be specified for the low end and high end
of a dimension. For example:
                                                                \CODE
         REAL, DIMENSION (1000) :: A
!HPF$    DISTRIBUTE(BLOCK), SHADOW(1:2) ::  A
         ....
         FORALL (i = 2:998)
            A(i) = 0.25* (A(i) + A(i-1) + A(i+1) + A(i+2))
         END FORALL
                                                                \EDOC
\noindent
specifies that only one non-local element is needed at the lower end
while two are needed at the high end.
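
The effect of a shadow declaration on local storage can be sketched as
follows.  This Python fragment is illustrative only: the function names
and the choice of four processors are assumptions (the example does not
name a processor arrangement), the block size is taken to be the ceiling
of the extent divided by the number of processors, and clipping the
shadow region at the array bounds is merely one possible implementation
choice.

```python
def block_owned(n, nprocs, k):
    """Bounds (lo, hi) of the section of A(1:n) owned by processor k
    (1-based) under a BLOCK distribution onto nprocs processors."""
    b = -(-n // nprocs)  # block size: ceiling of n / nprocs
    return (k - 1) * b + 1, min(k * b, n)


def local_with_shadow(n, nprocs, k, low_width, high_width):
    """Owned section extended by the shadow edges, clipped to 1:n.
    Assumes processor k owns at least one element."""
    lo, hi = block_owned(n, nprocs, k)
    return max(1, lo - low_width), min(n, hi + high_width)


# A(1000) with SHADOW(1:2), assuming four processors.
print(block_owned(1000, 4, 2))              # (251, 500)
print(local_with_shadow(1000, 4, 2, 1, 2))  # (250, 502)
```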

\section{Equivalence and Partial Order on the Set of Mappings}
\label{ext:PartialOrderOnMaps}

Section~\ref{mapsub:PartialOrderOnMaps} has to be changed to
accommodate the new distributions, the \texttt{SHADOW} attribute, and
mapping of components of derived types, all introduced as approved
extensions.  The relevant text now reads as follows; additions are in
\textbf{bold-face} type:

First, we define a notion of equivalence for \textit{dist-format}
specifications:

\begin{enumerate}
\item Each \textit{dist-format} is equivalent to itself.

\item  Using the notation \(\equiv\) for the phrase ``is equivalent to'',

  \[\begin{array}{rclcl}
    \texttt{BLOCK}(n)  &\equiv& \texttt{BLOCK}(m) & \mbox{iff } m
      \mbox{ and } n \mbox{ have the same value} \\
    \texttt{CYCLIC}(n) &\equiv& \texttt{CYCLIC}(m) & \mbox{iff } m
      \mbox{ and } n \mbox{ have the same value} \\
    \texttt{CYCLIC}    &\equiv& \texttt{CYCLIC}(1) \\
    \texttt{GEN\_BLOCK(V)} &\equiv& \texttt{GEN\_BLOCK(W)} &
      \mbox{\textbf{iff the values of corresponding}} \\
       &&& \mbox{\textbf{  elements of
           \texttt{V} and \texttt{W} are equal}} \\
    \texttt{INDIRECT(V)} &\equiv&\texttt{INDIRECT(W)} &
      \mbox{\textbf{iff the values of corresponding}} \\
      &&& \mbox{\textbf{  elements of
          \texttt{V} and \texttt{W} are equal}}
  \end{array}\]

\item Other than this, no two lexically distinct \textit{dist-format}
  specifications are equivalent.
\end{enumerate}

Equivalence, thus defined, is an equivalence relation in the usual
sense.
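
The equivalence rules can be expressed as a comparison of normalized
forms.  The Python sketch below is illustrative only: representing a
\textit{dist-format} as a pair (keyword, argument), with \texttt{None}
for an absent argument, is an assumption of the sketch, as are the
function names.

```python
def normalize(fmt):
    """Map a (keyword, argument) pair to a canonical form."""
    kind, arg = fmt
    if kind == "CYCLIC" and arg is None:
        return ("CYCLIC", 1)            # CYCLIC is equivalent to CYCLIC(1)
    if kind in ("GEN_BLOCK", "INDIRECT"):
        return (kind, tuple(arg))       # compare mapping arrays by value
    return (kind, arg)


def equivalent(f, g):
    """Two dist-formats are equivalent iff their canonical forms agree."""
    return normalize(f) == normalize(g)


print(equivalent(("CYCLIC", None), ("CYCLIC", 1)))  # True
print(equivalent(("BLOCK", None), ("BLOCK", 25)))   # False
```

Note that, under these rules, {\tt BLOCK} without an argument is
equivalent only to itself, even when the default block size happens to
coincide with an explicit one.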

\textbf{Similarly, we say that two \texttt{SHADOW} attributes (see
  Section~\ref{mapext:shadow} for the syntax) are equivalent iff the
  \textit{shadow-spec-list} of one is elementwise equivalent to the
  \textit{shadow-spec-list} of the other.}

Now the partial order on mappings is defined: Let \texttt{S} (``special'')
and \texttt{G} (``general'') be two data objects.

The mapping of \texttt{S} is a \emph{specialization} of the mapping of
\texttt{G} if and only if either

\begin{enumerate}

\item \texttt{G} has the \texttt{INHERIT} attribute, or

\item \texttt{S} does not have the \texttt{INHERIT} attribute, and the
  following constraints all hold:

  \begin{enumerate}
  \item \texttt{S} is a named object \textbf{or structure component}, and
  \item The shapes of the ultimate align targets of \texttt{S} and \texttt{G}
    are the same, and
  \item The dimensions of \texttt{S} and \texttt{G} each correspond to the same
    dimension of their respective ultimate align targets, and
    corresponding elements of \texttt{S} and \texttt{G} are aligned with the
    same corresponding elements of their respective ultimate align
    targets, and
  \item Either

    \begin{enumerate}
    \item The ultimate align targets of \texttt{S} and \texttt{G} are not
      explicitly distributed, or
    \item Both ultimate align targets of \texttt{S} and \texttt{G} are
      explicitly distributed.  In this case, the distribution
      directive specified for the ultimate align target of \texttt{G} must
      satisfy one of the following conditions:

      \begin{enumerate}
      \item It must have no \textit{dist-onto-clause}, or
      \item It must have a \textit{dist-onto-clause} of
        ``\texttt{ONTO} *'', or
      \item It must have a \textit{dist-onto-clause} specifying a
        processor arrangement having the same shape as that
        explicitly specified in a distribution directive for the
        ultimate align target of \texttt{S}.
      \end{enumerate}

      and the distribution directive for the ultimate align target of
      \texttt{G} must also satisfy one of the following conditions:

      \begin{enumerate}
      \item It must have no \textit{dist-format-clause}, or
      \item It must have a \textit{dist-format-clause} of ``*'', or
      \item Each \textit{dist-format} must be equivalent (in the sense
        defined above) to the \textit{dist-format} in the
        corresponding position of the \textit{dist-format-clause} in
        an explicit distribution directive for the ultimate align
        target of \texttt{S}.
      \end{enumerate}

    \end{enumerate}

  \item \textbf{Either \texttt{S} and \texttt{G} have no
      \texttt{SHADOW} attribute or they have equivalent
      \texttt{SHADOW} attributes.}

  \end{enumerate}

\end{enumerate}

\section{Explicit Interfaces}
\label{ext:ExplicitInterfaces}

The requirements in Section~\ref{mapsub:ExplicitInterfaces} are
extended as follows to account for the possible presence of the
\texttt{DYNAMIC} attribute; the addition is in \textbf{bold-face}
type:

An explicit interface is required \emph{except} when all of the
following conditions hold:

\begin{enumerate}

\item  Fortran does not require one, \emph{and}

\item No dummy argument is distributed transcriptively or with the
  \texttt{INHERIT} attribute, \emph{and}

\item \textbf{No dummy argument has the
  \texttt{DYNAMIC} attribute, \emph{and}}

\item For each pair of corresponding actual and dummy arguments, either:

  \begin{enumerate}

  \item They are both implicitly mapped, or

  \item They are both explicitly mapped and

    \begin{enumerate}
    \item The mapping of the actual argument is a specialization of
      the mapping of the dummy argument, and
    \item If the ultimate align targets of the actual and dummy
      arguments are both explicitly distributed, then the
      \textit{dist-onto-clause} of each must specify processor
      arrangements with the same shape.
    \end{enumerate}

  \end{enumerate}

  \emph{and}

\item For each pair of corresponding actual and dummy arguments, either:

  \begin{enumerate}

  \item Both are sequential, or

  \item Both are nonsequential.

  \end{enumerate}

\end{enumerate}

\section{Characteristics of Procedures}
\label{ext:ProcChars}

The \texttt{SHADOW} and \texttt{DYNAMIC} attributes, if present, are
hpf-characteristics of dummy arguments and procedure return values.
To be precise, the definitions in Section~\ref{mapsub:ProcChars} are
rewritten as follows; additions are in \textbf{bold-face} type:

\begin{itemize}
\item A processor arrangement has one hpf-characteristic: its shape.

\item A template has up to three hpf-characteristics:

\begin{enumerate}
\item its shape;
\item its distribution, if explicitly stated;
\item the hpf-characteristic (i.e., the shape) of the processor
  arrangement onto which it is distributed, if explicitly stated.
\end{enumerate}

\item A dummy data object has the following hpf-characteristics:

\begin{enumerate}
\item its alignment, if explicitly stated, as well as all
  hpf-characteristics of its align target;
\item its distribution, if explicitly stated, as well as the
  hpf-characteristic (i.e., the shape) of the processor arrangement
  onto which it is distributed, if explicitly stated;
\item \textbf{its \texttt{SHADOW} attribute, if explicitly stated;}
\item \textbf{its \texttt{DYNAMIC} attribute, if explicitly stated.}
\end{enumerate}

\item A function result has the same hpf-characteristics as a dummy
  data object.  Specifically, it has the following
  hpf-characteristics:

\begin{enumerate}
\item its alignment, if explicitly stated, as well as all
  hpf-characteristics of its align target;
\item its distribution, if explicitly stated, as well as the
  hpf-characteristic (i.e., the shape) of the processor arrangement
  onto which it is distributed, if explicitly stated;
\item \textbf{its \texttt{SHADOW} attribute, if explicitly stated;}
\item \textbf{its \texttt{DYNAMIC} attribute, if explicitly stated.}
\end{enumerate}

\end{itemize}


From owner-hpff-doc  Tue Nov  5 15:44:37 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id PAA16577 for hpff-doc-out; Tue, 5 Nov 1996 15:44:37 -0600 (CST)
Date: Tue, 5 Nov 1996 15:44:37 -0600 (CST)
Message-Id: <199611052144.PAA16577@cs.rice.edu>
From: offner@hpc.pko.dec.com (Carl Offner)
Subject: hpff-doc: mapping-base.tex
Sender: owner-hpff-doc
Precedence: bulk

% File: mapping-base.tex

% Contents:
% Mapping constructs for local variables for HPF 2.0 document,
% including
%       DISTRIBUTE
%       ALIGN
%       SEQUENCE
%       some of pointer mappings
%       equivalence of mappings


% Revision history:
% May-10-96     Created by Charles Koelbel, Rice University
%               (from HPF 1.1 document and HPF 2.0 proposals)
% May-22-96     Chapter updated by Piyush Mehrotra
%
% Nov-05-96     Edited by Carl Offner:  added constraints that
%               explicitly distributed or aligned objects may not have
%               the TARGET attribute.


\chapter{Data Mapping}
\label{ch-mapping-base}


HPF data alignment and distribution directives allow the
programmer to advise the compiler
how to assign array elements to processor memories.
This chapter discusses the basic data mapping features,
particularly those that are meaningful within a single scoping unit.
Section~\ref{ch-mapping-subr} will discuss mapping features that apply
when mapped variables appear as procedure arguments.

\section{Model}

HPF adds directives to Fortran to allow the user to advise the
compiler on the allocation of data objects to processor memories.  The
model is that there is a two-level mapping of data objects to memory
regions, referred to as ``abstract processors.''  Data objects
(typically array elements) are first {\it aligned} relative to one
another; this group of arrays is then {\it distributed} onto a
rectilinear arrangement of abstract processors.  (The implementation
then uses the same number, or perhaps some smaller number, of physical
processors to implement these abstract processors.  This mapping of
abstract processors to physical processors is
implementation-dependent.)

The following diagram illustrates the model:

\begin{center}

\setlength{\unitlength}{0.01in}
\begin{picture}(600,240)(0,30)
\thicklines
\put(100,150){\circle{50}}
\put(242,150){\circle{50}}
\put(383,150){\circle{50}}
\put(525,150){\circle{50}}
\put(125,150){\vector(1,0){92}}
\put(267,150){\vector(1,0){91}}
\put(408,150){\vector(1,0){92}}
\put(56,190){\shortstack{Arrays or\strut\\other objects\strut}}
\put(192,190){\shortstack{Group of\strut\\aligned objects\strut}}
\put(328,190){\shortstack{Abstract\strut\\processors as a\strut
                \\user-declared\strut\\Cartesian mesh\strut}}
\put(494,190){\shortstack{Physical\strut\\processors\strut}}
\put(150,100){\tt ALIGN}
\put(270,100){\tt DISTRIBUTE}
\put(400,50){\shortstack{Optional\strut\\implementation-\strut
                \\dependent\strut\\directive\strut}}
\end{picture}

\end{center}

The underlying assumptions are that an operation on two or more data
objects is likely to be carried out much faster if they all reside in
the same processor, and that it may be possible to carry out many
such operations concurrently if they can be performed on different
processors.

Fortran provides a number of features, notably array syntax, that
make it easy for a compiler to determine that many operations may be
carried out concurrently.  The HPF directives provide a way to inform
the compiler of the recommendation that certain data objects should
reside in the same processor: if two data objects are mapped (via the
two-level mapping of alignment and distribution) to the same abstract
processor, it is a strong recommendation to the implementation that
they ought to reside in the same physical processor.  There is also a
provision for recommending that a data object be stored in multiple
locations, which may complicate any updating of the object but makes
it faster for multiple processors to read the object.

There is a clear separation between directives that serve as
specification statements and directives that serve as executable
statements (in the sense of the Fortran standards).  Specification
statements are carried out on entry to a program unit, as
if all at once; only then are executable statements carried out.
(While it is often convenient to think of specification statements as
being handled at compile time, some of them contain specification
expressions, which are permitted to depend on run-time quantities
such as dummy arguments, and so the values of these expressions may
not be available until run time, specifically the very moment that
program control enters the scoping unit.)

The basic concept is that every array (indeed, every object) is created
with {\em some\/} alignment to an entity, which in turn has {\em some\/}
distribution onto {\em some\/} arrangement of abstract processors.
If the specification statements contain
explicit specification directives specifying the alignment of an
array {\tt A} with respect to another array {\tt B}, then the
distribution of {\tt A} will be dictated by the distribution of {\tt
B}; otherwise, the distribution of {\tt A} itself may be specified
explicitly.  In either case, any such explicit declarative
information is used when the array is created.

\begin{implementors}
This model gives a better picture of the actual amount of work that
needs to be done than a model that says ``the array is created in some
default location, and then realigned and/or redistributed if there is
an explicit directive.''  Using {\tt ALIGN} and {\tt DISTRIBUTE}
specification directives doesn't have to cause any more work at run
time than using the implementation defaults.
\end{implementors}

In the case of an allocatable object, we say that the object is
created whenever it is allocated.  Specification directives for
allocatable objects
may appear in the
{\it specification-part} of a program unit, but take effect each time the
array is created, rather than on entry to the scoping unit.
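
As a minimal sketch of this rule (the array name {\tt WORK} and its
size are hypothetical), the directive below appears in the
{\it specification-part}, but the mapping it describes is applied only
when the {\tt ALLOCATE} statement creates the array:
                                                                        \CODE
      REAL, ALLOCATABLE :: WORK(:)
!HPF$ DISTRIBUTE WORK(BLOCK)
!     executable part: the array is created, and mapped, here
      ALLOCATE (WORK(1000))
                                                                        \EDOC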

Alignment is considered an {\em attribute\/} (in the Fortran sense)
of a data object.  If an object {\tt A} is aligned
with an object {\tt B}, which in turn is already aligned
to an object {\tt C}, this is regarded as an alignment of {\tt A} with
{\tt C} directly, with {\tt B} serving only as an intermediary at the
time of specification.
We say that {\tt A} is {\it immediately aligned} with {\tt
B} but {\it ultimately aligned} with {\tt C}.  If an object is not
explicitly aligned with another object, we say that it is ultimately
aligned with itself.  The alignment relationships form a tree with
everything ultimately aligned to the object at the root of the tree;
however, the tree is always immediately ``collapsed'' so that every
object is related directly to the root.
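
The following fragment is an illustrative sketch of these terms; the
array sizes are hypothetical, and the {\tt ALIGN} syntax is the one
defined in section~\ref{ALIGN-SECTION}:
                                                                        \CODE
      REAL A(100), B(100), C(100)
!HPF$ DISTRIBUTE C(BLOCK)
!HPF$ ALIGN B(I) WITH C(I)
!HPF$ ALIGN A(I) WITH B(I)
                                                                        \EDOC
Here {\tt A} is immediately aligned with {\tt B} but ultimately aligned
with {\tt C}, so {\tt A(I)}, {\tt B(I)}, and {\tt C(I)} are all mapped to
the same abstract processor.  {\tt C} is not explicitly aligned with
anything and is therefore ultimately aligned with itself.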

Every object which is the root of an alignment tree has an associated
{\em template\/} or index space.  Typically, this template has the same
rank and size in each dimension as the object associated with it.  (The
most important exception to this rule is dummy arguments with the {\tt
INHERIT} attribute, described in section~\ref{INHERIT-SECTION}.)
We often refer to ``the template for an array,'' which means the
template of the object to which the array is ultimately aligned.  (When
an explicit {\tt TEMPLATE} (see section~\ref{TEMPLATE-SECTION}) is
used, this may be simply the template to which the array is explicitly
aligned.)

The {\em distribution\/} step of the HPF model technically applies to
the template of an array, although because of the close relationship
noted above we often speak loosely of the distribution of an array.
Distribution partitions the template among a set of abstract processors
according to a given pattern.  The combination of alignment (from arrays
to templates) and distribution (from templates to processors) thus
determines the relationship of an array to the processors; we refer to
this relationship as the {\em mapping\/} of the array.  (These remarks
also apply to a scalar, which may be regarded as having an index space
whose sole position is indicated by an empty list of subscripts.)

Every object is created as if according to some complete set of
specification directives; if the program does not include complete
specifications for the mapping of some object, the compiler provides
defaults.  By default an object is not aligned with any other object;
it is ultimately aligned with itself.  The default distribution is
implementation-dependent, but must be expressible as explicit
directives for that implementation.
Identically declared
objects need not be provided with identical default distribution
specifications; the compiler may, for example, take into account the
contexts in which objects are used in executable code.  The programmer
may force identically declared objects to have identical distributions
by specifying such distributions explicitly.  (On the other hand,
identically declared processor arrangements {\it are} guaranteed to
represent ``the same processors arranged the same way.''  This is
discussed in more detail in section~\ref{PROCESSORS-SECTION}.)


Sometimes it is desirable to consider a large index space with
which several smaller arrays are to be aligned, but not to
declare any array that spans the entire index space.
HPF allows one to declare a {\tt TEMPLATE}, which is like an array
whose elements have no content and therefore occupy no storage;
it is merely an abstract index space that can be distributed and
with which arrays may be aligned.
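
A small sketch of this situation follows; the names {\tt T}, {\tt NW},
and {\tt SE} and all extents are hypothetical, and the {\tt ALIGN}
syntax is the one defined in section~\ref{ALIGN-SECTION}.  Two
\( 8 \times 8 \) arrays are placed at opposite corners of a
\( 9 \times 9 \) index space, and no array spanning the whole space is
declared:
                                                                        \CODE
      REAL NW(8,8), SE(8,8)
!HPF$ TEMPLATE T(9,9)
!HPF$ ALIGN NW(I,J) WITH T(I,J)
!HPF$ ALIGN SE(I,J) WITH T(I+1,J+1)
!HPF$ DISTRIBUTE T(BLOCK,BLOCK)
                                                                        \EDOC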




An object is considered to be {\it explicitly mapped}
if it appears in an HPF mapping directive within the scoping
unit in which it is declared;  otherwise it is
{\it implicitly mapped}. A mapping directive is an {\tt ALIGN},
or {\tt DISTRIBUTE},
or {\tt INHERIT}
directive, or any directive that confers an alignment, a
distribution, or the {\tt INHERIT}
attribute.


Note that we extend this model in Section~\ref{ch-mapping-ext} to allow
dynamic redistribution and remapping of objects.



\section{Syntax of Data Alignment and Distribution Directives}

Specification directives in HPF have two forms: specification
statements, analogous to the {\tt DIMENSION} and {\tt ALLOCATABLE}
statements of Fortran;
and an attribute form analogous to type declaration statements
in Fortran using the ``{\tt ::}'' punctuation.

The attribute form allows more than one attribute to be described
in a single directive.  HPF goes beyond Fortran in not requiring
that the first attribute, or indeed any of them, be a type specifier.
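
As an illustrative sketch (the names {\tt A} and {\tt T} are
hypothetical), the first directive below uses the statement form, and
the second uses the attribute form to give two attributes at once,
neither of which is a Fortran type specifier:
                                                                        \CODE
      REAL A(64,64)
!HPF$ DISTRIBUTE A(BLOCK,BLOCK)
!HPF$ TEMPLATE, DISTRIBUTE(BLOCK,*) :: T(64,64)
                                                                        \EDOC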


                                                                        \BNF
combined-directive         \IS  combined-attribute-list :: combined-decl-list

combined-attribute         \IS  ALIGN align-attribute-stuff
                           \OR  DISTRIBUTE dist-attribute-stuff
                           \OR  INHERIT
                           \OR  TEMPLATE
                           \OR  PROCESSORS
                           \OR  DIMENSION ( explicit-shape-spec-list )
combined-decl              \IS  hpf-entity [ ( explicit-shape-spec-list ) ]
                           \OR  object-name
hpf-entity                 \IS  processors-name
                           \OR  template-name
                                                                        \FNB

The {\tt INHERIT} attribute is related to subroutine call conventions
and will be discussed in Section~\ref{ch-mapping-subr}.
\begin{constraints}

\item The same {\it combined-attribute} must not appear more
than once in a given {\it combined-directive}.

\item If the {\tt DIMENSION} attribute appears in a {\it
combined-directive}, any entity to which it applies must be declared
with the HPF {\tt TEMPLATE} or {\tt PROCESSORS} type specifier.

\end{constraints}

The following rules constrain the declaration of various attributes,
whether in separate directives or in a {\it combined-directive}.

If the {\tt DISTRIBUTE} attribute is present, then every name declared in the
{\it combined-decl-list} is considered to be a {\it distributee} and is subject
to the constraints listed in section~\ref{DISTRIBUTE-SECTION}.

If the {\tt ALIGN} attribute is present, then every name declared in the
{\it combined-decl-list} is considered to be an {\it alignee} and is subject
to the constraints listed in section~\ref{ALIGN-SECTION}.

The HPF keywords {\tt PROCESSORS} and {\tt TEMPLATE} play the role of
type specifiers in declaring processor arrangements and templates.
The HPF keywords {\tt ALIGN}, {\tt DISTRIBUTE},
and {\tt INHERIT} play the role of attributes.  Attributes referring to
processor arrangements, to templates, or to entities with other types
(such as {\tt REAL}) may be combined in an HPF directive without
having the type specifier appear.

No entity may be given a particular attribute more than once.

Dimension information may be specified after an {\it hpf-entity} or in a
{\tt DIMENSION} attribute.  If both are present, the one after the
{\it hpf-entity} overrides the {\tt DIMENSION} attribute (this is consistent
with the Fortran standard).  For example, in:
                                                                        \CODE
!HPF$ TEMPLATE,DIMENSION(64,64) :: A,B,C(32,32),D
                                                                        \EDOC
{\tt A}, {\tt B}, and {\tt D} are \( 64 \times 64 \) templates; {\tt C}
is \( 32 \times 32 \).

Directives mapping a variable must be in the same scoping unit
where the variable is declared.




If a specification expression includes a reference to the value of
an element of an array specified in the same specification-part,
any explicit mapping or {\tt INHERIT} attribute for the array
must be completely specified in prior specification-directives.
(This restriction is inspired by and extends
section 7.1.6.2 of the Fortran standard, which states in part:
If a specification expression includes a reference to the value of
an element of an array specified in the same specification-part,
the array bounds must be specified in a prior declaration.)


A comment on asterisks: The asterisk character ``{\tt *}'' appears
in the syntax rules for HPF alignment and distribution directives
in four distinct roles:
\begin{itemize}
\item  When a lone asterisk appears as a member of a parenthesized
list, it indicates either a collapsed mapping, wherein many
elements of an array may be mapped to the same abstract processor,
or a replicated mapping, wherein each element of an array may be mapped
to many abstract processors.   See the syntax rules for {\it align-source}
and {\it align-subscript} (see section \ref{ALIGN-SECTION})
and for {\it dist-format} (see section \ref{DISTRIBUTE-SECTION}).

\item An asterisk appearing in an {\it align-subscript-use}
expression represents the usual integer multiplication operator.

\item When an asterisk appears before a left parenthesis ``{\tt (}''
or after the keyword {\tt WITH} or {\tt ONTO}, it indicates
a descriptive or transcriptive mapping for dummy arguments of subprograms
(see section~\ref{ch-mapping-subr}) and for mapping of pointers under
the approved extensions (see section~\ref{POINTERS-SECTION}).

\item An asterisk can also be used in the {\tt PASS_BY} attribute
in an interface block to describe dummy arguments passed by reference
to an extrinsic routine written in C
(see Section~\ref{ext-langs}).

\end{itemize}
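
The first three roles can be seen together in the following fragment.
This is only an illustrative sketch: the names {\tt S}, {\tt X},
{\tt A}, and {\tt B} are hypothetical, and the {\tt ALIGN} syntax is the
one defined in section~\ref{ALIGN-SECTION}.
                                                                        \CODE
      SUBROUTINE S(X)
      REAL X(100)
      REAL A(50,10), B(100)
!HPF$ DISTRIBUTE B(BLOCK)
!HPF$ ALIGN A(I,*) WITH B(2*I)
!HPF$ DISTRIBUTE X *(BLOCK)
                                                                        \EDOC
The lone {\tt *} in {\tt A(I,*)} collapses the second dimension of
{\tt A}; the {\tt *} in {\tt 2*I} is ordinary multiplication; and the
{\tt *} before {\tt (BLOCK)} marks a descriptive mapping of the dummy
argument {\tt X}.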

\section{DISTRIBUTE Directive}
\label{DISTRIBUTE-SECTION}

The {\tt DISTRIBUTE} directive specifies a mapping of data objects
to abstract processors in a processor arrangement.
For example,
                                                                        \CODE
      REAL SALAMI(10000)
!HPF$ DISTRIBUTE SALAMI(BLOCK)
                                                                        \EDOC
specifies that the array {\tt SALAMI} should be distributed across
some set of abstract processors by slicing it uniformly into blocks of
contiguous elements.  If there are 50 processors, the directive
implies that the array should be divided into groups of 200 elements,
with {\tt SALAMI(1:200)} mapped to the first processor,
{\tt SALAMI(201:400)} mapped to the second processor, and so on.
If there is only one processor, the entire array is mapped to that processor
as a single block of 10000 elements.

The block size may be specified explicitly:
                                                                        \CODE
      REAL WEISSWURST(10000)
!HPF$ DISTRIBUTE WEISSWURST(BLOCK(256))
                                                                        \EDOC
This specifies that groups of exactly 256 elements should be
mapped to successive abstract processors.
(There must be at least \( \lceil 10000/256 \rceil = 40 \) abstract processors
if the directive is to be satisfied.  The fortieth processor
will contain a partial block of only 16 elements, namely
{\tt WEISSWURST(9985:10000)}.)
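
The processor count and the size of the partial block follow from
simple arithmetic:
\[ 39 \times 256 = 9984 < 10000 \leq 40 \times 256 = 10240, \qquad
   10000 - 9984 = 16 . \]
Thirty-nine processors can hold at most 9984 elements, so a fortieth is
needed, and it receives the remaining 16.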

HPF also provides a cyclic distribution format:
                                                                        \CODE
      REAL DECK_OF_CARDS(52)
!HPF$ DISTRIBUTE DECK_OF_CARDS(CYCLIC)
                                                                        \EDOC
If there are 4 abstract processors,
the first processor will contain {\tt DECK_OF_CARDS(1:49:4)},
the second processor will contain {\tt DECK_OF_CARDS(2:50:4)},
the third processor will contain {\tt DECK_OF_CARDS(3:51:4)},
and the fourth processor will contain {\tt DECK_OF_CARDS(4:52:4)}.
Successive array elements are dealt out to successive abstract processors
in round-robin fashion.

Distributions may be specified independently for each dimension of a
multidimensional array:
                                                                        \CODE
      INTEGER CHESS_BOARD(8,8), GO_BOARD(19,19)
!HPF$ DISTRIBUTE CHESS_BOARD(BLOCK, BLOCK)
!HPF$ DISTRIBUTE GO_BOARD(CYCLIC,*)
                                                                        \EDOC
The {\tt CHESS_BOARD} array will be carved up into contiguous
rectangular patches, which will be distributed onto a two-dimensional
arrangement of abstract processors.  The {\tt GO_BOARD} array will have its
rows distributed cyclically over a one-dimensional arrangement of
abstract processors.  (The ``{\tt *}'' specifies that {\tt GO_BOARD} is not to
be distributed along its second axis; thus an entire row is to be
distributed as one object.  This is sometimes called ``on-processor''
distribution.)
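
As a sketch of what the ``rectangular patches'' look like, suppose
(purely for illustration) that the chess board is distributed onto a
hypothetical \( 2 \times 2 \) processor arrangement named {\tt QUAD}:
                                                                        \CODE
!HPF$ PROCESSORS QUAD(2,2)
      INTEGER CHESS_BOARD(8,8)
!HPF$ DISTRIBUTE CHESS_BOARD(BLOCK,BLOCK) ONTO QUAD
                                                                        \EDOC
Each dimension of extent 8 is then split into two blocks of 4, so
{\tt QUAD(1,1)} holds {\tt CHESS_BOARD(1:4,1:4)}, {\tt QUAD(2,1)} holds
{\tt CHESS_BOARD(5:8,1:4)}, {\tt QUAD(1,2)} holds
{\tt CHESS_BOARD(1:4,5:8)}, and {\tt QUAD(2,2)} holds
{\tt CHESS_BOARD(5:8,5:8)}.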

The {\tt DISTRIBUTE} directive may appear only in the {\it specification-part}
of a scoping unit
and can contain only a {\it specification-expr} as the
argument to a {\tt BLOCK} or {\tt CYCLIC} option.

Formally, the syntax of the {\tt DISTRIBUTE}
directive is:

                                                                        \BNF
distribute-directive       \IS  DISTRIBUTE distributee dist-directive-stuff

dist-directive-stuff       \IS  dist-format-clause [ dist-onto-clause ]
dist-attribute-stuff       \IS  dist-directive-stuff
                           \OR  dist-onto-clause

distributee                \IS  object-name
                           \OR  template-name

dist-format-clause         \IS  ( dist-format-list )
                           \OR  * ( dist-format-list )
                           \OR  *

dist-format                \IS  BLOCK  [ ( int-expr ) ]
                           \OR  CYCLIC [ ( int-expr ) ]
                           \OR  *

dist-onto-clause           \IS  ONTO dist-target

dist-target                \IS  processors-name
                           \OR  * processors-name
                           \OR  *
                                                                        \FNB

The full syntax is given here for completeness; however, some of the
forms are discussed only in Section~\ref{ch-mapping-subr}.  These
``interprocedural'' forms are:
\begin{itemize}
\item The last two options of rule~\ref{dist-format-clause-rule} (containing
the {\tt *} form)
\item The last two options of rule~\ref{dist-target-rule} (containing
the {\tt *} form)
\end{itemize}

\begin{constraints}
%Do not change order of constraints without changing reference to them below.

\item An {\it object-name} mentioned as a {\it distributee}
must be a simple name and not a subobject designator or a {\it component-name}.

\item An {\it object-name} mentioned as a {\it distributee} may not
appear as an {\it alignee}.

\item An {\it object-name} mentioned as a {\it distributee} may not
have the {\tt POINTER} attribute.

\item An {\it object-name} mentioned as a {\it distributee} may not
have the {\tt TARGET} attribute.


\item If a {\it dist-format-list} is specified, its length must
equal the rank of each {\it distributee}.

\item If both a {\it dist-format-list} and a {\it processors-name} appear, the
number of elements of the {\it dist-format-list} that are not ``{\tt *}''
must equal the rank of the named processor arrangement.

\item If a {\it processors-name} appears but not a {\it dist-format-list}, the
rank of each {\it distributee}
must equal the rank of the named processor arrangement.

\item If either the {\it dist-format-clause} or the {\it dist-target}
in a {\tt DISTRIBUTE} directive begins with ``{\tt *}'' then every {\it
distributee}
must be a dummy argument.

\item Any {\it int-expr} appearing in a {\it dist-format} of a
{\tt DISTRIBUTE} directive must be a {\it specification-expr}.

\end{constraints}
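
The rank constraints above can be illustrated with a short sketch; the
names {\tt P}, {\tt Q}, {\tt A}, {\tt B}, and {\tt C} are hypothetical.
                                                                        \CODE
!HPF$ PROCESSORS P(4), Q(2,2)
      REAL A(100,100), B(100,100), C(100,100)
!HPF$ DISTRIBUTE A(BLOCK,*) ONTO P
!HPF$ DISTRIBUTE B(BLOCK,CYCLIC) ONTO Q
!HPF$ DISTRIBUTE ONTO Q :: C
                                                                        \EDOC
Each {\it dist-format-list} has two entries, matching the rank of its
{\it distributee}.  The list for {\tt A} has one entry that is not
``{\tt *}'', matching the rank of {\tt P}; the list for {\tt B} has two,
matching the rank of {\tt Q}.  {\tt C} is given no
{\it dist-format-list}, so its rank must equal that of {\tt Q}, which it
does.  By contrast, {\tt A(BLOCK,BLOCK) ONTO P} would not conform, since
two entries that are not ``{\tt *}'' cannot correspond to a
one-dimensional arrangement.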

\begin{users}
Some of the above constraints are relaxed under the approved extensions
(see Section~\ref{ch-mapping-ext}):
mapping of derived type components (relaxes constraint 1),
mapping of pointers (relaxes constraints 3 and 8),
and remapping of data objects (relaxes constraint 9).
\end{users}

Note that the possibility of a {\tt DISTRIBUTE} directive of the form
                                                                        \ICODE
!HPF$ DISTRIBUTE dist-attribute-stuff :: distributee-list
                                                                        \EDOC
is covered by syntax rule \ref{combined-directive-rule} for a
{\it combined-directive}.


Examples:
                                                                        \CODE
!HPF$ DISTRIBUTE D1(BLOCK)
!HPF$ DISTRIBUTE (BLOCK,*,BLOCK) ONTO SQUARE :: D2,D3,D4
                                                                        \EDOC

The meanings of the alternatives for {\it dist-format} are given below.

Define the ceiling division function {\tt CD(J,K)~=~(J+K-1)/K} (using
Fortran integer arithmetic with truncation toward zero).

Define the ceiling remainder function {\tt CR(J,K)~=~J-K*CD(J,K)}.
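
As a worked illustration (the argument values are chosen arbitrarily,
and the divisions are integer divisions):
\[ \mbox{\tt CD(10,4)} = (10+4-1)/4 = 13/4 = 3, \qquad
   \mbox{\tt CR(10,4)} = 10 - 4 \times 3 = -2 . \]
When {\tt K} divides {\tt J} exactly, as in {\tt CD(12,4)}
\(= 15/4 = 3\), the remainder {\tt CR(12,4)} is
\(12 - 4 \times 3 = 0\).  For positive {\tt J} and {\tt K},
{\tt CR(J,K)} therefore always lies between \(-(\mbox{\tt K}-1)\) and 0.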

The dimensions of a processor arrangement appearing as a {\it
dist-target} are said to {\it correspond} in left-to-right order with
those dimensions of a {\it distributee} for which the corresponding
{\it dist-format} is not {\tt *}.  In the example above, processor
arrangement {\tt SQUARE} must be two-dimensional; its first dimension
corresponds to the first dimensions of {\tt D2}, {\tt D3}, and {\tt D4}
and its second dimension corresponds to the third dimensions of {\tt
D2}, {\tt D3}, and {\tt D4}.

Let \(d\) be the size of a {\it distributee} in a certain dimension and
let \(p\) be the size of the processor arrangement in the corresponding
dimension.  For simplicity, assume all dimensions have a lower bound of
1.  Then {\tt BLOCK(\(m\))} means that a {\it distributee} position
whose index along that dimension is \(j\) is mapped to an abstract
processor whose index along the corresponding dimension of the
processor arrangement is {\tt CD(\(j\),\(m\))} (note that \(m \times p
\geq d\) must be true), and is position number {\tt
\(m\)+CR(\(j\),\(m\))} among positions mapped to that abstract
processor.  The first {\it distributee} position in abstract processor
\(k\) along that axis is position number {\tt 1+\(m\)*(\(k\)-1)}.
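
For instance, take \(d = 100\), \(p = 16\), and \(m = 7\) (so that
\(m \times p = 112 \geq d\)).  Position \(j = 50\) is mapped to abstract
processor
\[ \mbox{\tt CD(50,7)} = (50+7-1)/7 = 56/7 = 8 \]
and is position number
\[ 7 + \mbox{\tt CR(50,7)} = 7 + (50 - 7 \times 8) = 1 \]
among the positions mapped to that processor.  This agrees with the
last formula: the first position in abstract processor 8 is
\(1 + 7 \times (8-1) = 50\).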

The block size \(m\) must be a positive integer.


{\tt BLOCK} by definition means the same as {\tt
BLOCK(CD(\(d\),\(p\)))}.

{\tt CYCLIC(\(m\))} means that a {\it distributee} position whose index
along that dimension is \(j\) is mapped to an abstract processor whose
index along the corresponding dimension of the processor arrangement is
{\tt 1+MODULO(CD(\(j\),\(m\))-1,\(p\))}.  The first {\it distributee}
position in abstract processor \(k\) along that axis is position number
{\tt 1+\(m\)*(\(k\)-1)}.
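
As a check against the {\tt DECK_OF_CARDS} example given earlier, take
\(m = 1\) and \(p = 4\).  Position \(j = 50\) is mapped to abstract
processor
\[ 1 + \mbox{\tt MODULO(CD(50,1)-1,4)} = 1 + \mbox{\tt MODULO(49,4)}
   = 1 + 1 = 2 , \]
which agrees with the second processor containing
{\tt DECK_OF_CARDS(2:50:4)}.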

The block size \(m\) must be a positive integer.


{\tt CYCLIC} by definition means the same as {\tt CYCLIC(1)}.

{\tt CYCLIC(\(m\))} and {\tt BLOCK(\(m\))} imply the same distribution
when \( m \times p \geq d \), but {\tt BLOCK(\(m\))} additionally
asserts that the distribution will not wrap around in a cyclic manner,
which a compiler cannot determine at compile time if \(m\) is not
constant.  Note that {\tt CYCLIC} and {\tt BLOCK} (without argument
expressions) do not imply the same distribution unless \( p \geq d \),
a degenerate case in which the block size is 1 and the distribution
does not wrap around.
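
The first claim can be verified directly from the definitions.  When
\( m \times p \geq d \), every index \(j \leq d\) satisfies
\[ \mbox{\tt CD(\(j\),\(m\))} \leq \mbox{\tt CD(\(d\),\(m\))} \leq p , \]
so \( 0 \leq \mbox{\tt CD(\(j\),\(m\))} - 1 < p \) and the {\tt MODULO}
operation leaves its argument unchanged:
\[ 1 + \mbox{\tt MODULO(CD(\(j\),\(m\))-1,\(p\))}
   = \mbox{\tt CD(\(j\),\(m\))} . \]
This is exactly the processor index given by {\tt BLOCK(\(m\))}.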

Suppose that we have 16 abstract processors and an array of length 100:
                                                                        \CODE
!HPF$ PROCESSORS SEDECIM(16)
      REAL CENTURY(100)
                                                                        \EDOC
Distributing the array {\tt BLOCK} (which in this case would mean
the same as {\tt BLOCK(7)}):
                                                                        \CODE
!HPF$ DISTRIBUTE CENTURY(BLOCK) ONTO SEDECIM
                                                                        \EDOC
results in this mapping of array elements onto abstract processors:
\begin{center}
\setlength{\unitlength}{0.01in}
\begin{picture}(560,200)(0,0)
\put(0,175){\makebox(35,25){\small\rm 1}}
\put(35,175){\makebox(35,25){\small\rm 2}}
\put(70,175){\makebox(35,25){\small\rm 3}}
\put(105,175){\makebox(35,25){\small\rm 4}}
\put(140,175){\makebox(35,25){\small\rm 5}}
\put(175,175){\makebox(35,25){\small\rm 6}}
\put(210,175){\makebox(35,25){\small\rm 7}}
\put(245,175){\makebox(35,25){\small\rm 8}}
\put(280,175){\makebox(35,25){\small\rm 9}}
\put(315,175){\makebox(35,25){\small\rm 10}}
\put(350,175){\makebox(35,25){\small\rm 11}}
\put(385,175){\makebox(35,25){\small\rm 12}}
\put(420,175){\makebox(35,25){\small\rm 13}}
\put(455,175){\makebox(35,25){\small\rm 14}}
\put(490,175){\makebox(35,25){\small\rm 15}}
\put(525,175){\makebox(35,25){\small\rm 16}}
\thinlines
\multiput(0,25)(0,25){6}{\line(1,0){560}}
\thicklines
\multiput(0,0)(0,175){2}{\line(1,0){560}}
\multiput(0,0)(35,0){17}{\line(0,1){175}}
\put(0,150){\makebox(35,25){\tt 1}}
\put(0,125){\makebox(35,25){\tt 2}}
\put(0,100){\makebox(35,25){\tt 3}}
\put(0,75){\makebox(35,25){\tt 4}}
\put(0,50){\makebox(35,25){\tt 5}}
\put(0,25){\makebox(35,25){\tt 6}}
\put(0,0){\makebox(35,25){\tt 7}}
\put(35,150){\makebox(35,25){\tt 8}}
\put(35,125){\makebox(35,25){\tt 9}}
\put(35,100){\makebox(35,25){\tt 10}}
\put(35,75){\makebox(35,25){\tt 11}}
\put(35,50){\makebox(35,25){\tt 12}}
\put(35,25){\makebox(35,25){\tt 13}}
\put(35,0){\makebox(35,25){\tt 14}}
\put(70,150){\makebox(35,25){\tt 15}}
\put(70,125){\makebox(35,25){\tt 16}}
\put(70,100){\makebox(35,25){\tt 17}}
\put(70,75){\makebox(35,25){\tt 18}}
\put(70,50){\makebox(35,25){\tt 19}}
\put(70,25){\makebox(35,25){\tt 20}}
\put(70,0){\makebox(35,25){\tt 21}}
\put(105,150){\makebox(35,25){\tt 22}}
\put(105,125){\makebox(35,25){\tt 23}}
\put(105,100){\makebox(35,25){\tt 24}}
\put(105,75){\makebox(35,25){\tt 25}}
\put(105,50){\makebox(35,25){\tt 26}}
\put(105,25){\makebox(35,25){\tt 27}}
\put(105,0){\makebox(35,25){\tt 28}}
\put(140,150){\makebox(35,25){\tt 29}}
\put(140,125){\makebox(35,25){\tt 30}}
\put(140,100){\makebox(35,25){\tt 31}}
\put(140,75){\makebox(35,25){\tt 32}}
\put(140,50){\makebox(35,25){\tt 33}}
\put(140,25){\makebox(35,25){\tt 34}}
\put(140,0){\makebox(35,25){\tt 35}}
\put(175,150){\makebox(35,25){\tt 36}}
\put(175,125){\makebox(35,25){\tt 37}}
\put(175,100){\makebox(35,25){\tt 38}}
\put(175,75){\makebox(35,25){\tt 39}}
\put(175,50){\makebox(35,25){\tt 40}}
\put(175,25){\makebox(35,25){\tt 41}}
\put(175,0){\makebox(35,25){\tt 42}}
\put(210,150){\makebox(35,25){\tt 43}}
\put(210,125){\makebox(35,25){\tt 44}}
\put(210,100){\makebox(35,25){\tt 45}}
\put(210,75){\makebox(35,25){\tt 46}}
\put(210,50){\makebox(35,25){\tt 47}}
\put(210,25){\makebox(35,25){\tt 48}}
\put(210,0){\makebox(35,25){\tt 49}}
\put(245,150){\makebox(35,25){\tt 50}}
\put(245,125){\makebox(35,25){\tt 51}}
\put(245,100){\makebox(35,25){\tt 52}}
\put(245,75){\makebox(35,25){\tt 53}}
\put(245,50){\makebox(35,25){\tt 54}}
\put(245,25){\makebox(35,25){\tt 55}}
\put(245,0){\makebox(35,25){\tt 56}}
\put(280,150){\makebox(35,25){\tt 57}}
\put(280,125){\makebox(35,25){\tt 58}}
\put(280,100){\makebox(35,25){\tt 59}}
\put(280,75){\makebox(35,25){\tt 60}}
\put(280,50){\makebox(35,25){\tt 61}}
\put(280,25){\makebox(35,25){\tt 62}}
\put(280,0){\makebox(35,25){\tt 63}}
\put(315,150){\makebox(35,25){\tt 64}}
\put(315,125){\makebox(35,25){\tt 65}}
\put(315,100){\makebox(35,25){\tt 66}}
\put(315,75){\makebox(35,25){\tt 67}}
\put(315,50){\makebox(35,25){\tt 68}}
\put(315,25){\makebox(35,25){\tt 69}}
\put(315,0){\makebox(35,25){\tt 70}}
\put(350,150){\makebox(35,25){\tt 71}}
\put(350,125){\makebox(35,25){\tt 72}}
\put(350,100){\makebox(35,25){\tt 73}}
\put(350,75){\makebox(35,25){\tt 74}}
\put(350,50){\makebox(35,25){\tt 75}}
\put(350,25){\makebox(35,25){\tt 76}}
\put(350,0){\makebox(35,25){\tt 77}}
\put(385,150){\makebox(35,25){\tt 78}}
\put(385,125){\makebox(35,25){\tt 79}}
\put(385,100){\makebox(35,25){\tt 80}}
\put(385,75){\makebox(35,25){\tt 81}}
\put(385,50){\makebox(35,25){\tt 82}}
\put(385,25){\makebox(35,25){\tt 83}}
\put(385,0){\makebox(35,25){\tt 84}}
\put(420,150){\makebox(35,25){\tt 85}}
\put(420,125){\makebox(35,25){\tt 86}}
\put(420,100){\makebox(35,25){\tt 87}}
\put(420,75){\makebox(35,25){\tt 88}}
\put(420,50){\makebox(35,25){\tt 89}}
\put(420,25){\makebox(35,25){\tt 90}}
\put(420,0){\makebox(35,25){\tt 91}}
\put(455,150){\makebox(35,25){\tt 92}}
\put(455,125){\makebox(35,25){\tt 93}}
\put(455,100){\makebox(35,25){\tt 94}}
\put(455,75){\makebox(35,25){\tt 95}}
\put(455,50){\makebox(35,25){\tt 96}}
\put(455,25){\makebox(35,25){\tt 97}}
\put(455,0){\makebox(35,25){\tt 98}}
\put(490,150){\makebox(35,25){\tt 99}}
\put(490,125){\makebox(35,25){\tt 100}}
\end{picture}
\end{center}
Distributing the array {\tt BLOCK(8)}:
                                                                        \CODE
!HPF$ DISTRIBUTE CENTURY(BLOCK(8)) ONTO SEDECIM
                                                                        \EDOC
results in this mapping of array elements onto abstract processors:
\begin{center}
\setlength{\unitlength}{0.01in}
\begin{picture}(560,225)(0,0)
\put(0,200){\makebox(35,25){\small\rm 1}}
\put(35,200){\makebox(35,25){\small\rm 2}}
\put(70,200){\makebox(35,25){\small\rm 3}}
\put(105,200){\makebox(35,25){\small\rm 4}}
\put(140,200){\makebox(35,25){\small\rm 5}}
\put(175,200){\makebox(35,25){\small\rm 6}}
\put(210,200){\makebox(35,25){\small\rm 7}}
\put(245,200){\makebox(35,25){\small\rm 8}}
\put(280,200){\makebox(35,25){\small\rm 9}}
\put(315,200){\makebox(35,25){\small\rm 10}}
\put(350,200){\makebox(35,25){\small\rm 11}}
\put(385,200){\makebox(35,25){\small\rm 12}}
\put(420,200){\makebox(35,25){\small\rm 13}}
\put(455,200){\makebox(35,25){\small\rm 14}}
\put(490,200){\makebox(35,25){\small\rm 15}}
\put(525,200){\makebox(35,25){\small\rm 16}}
\thinlines
\multiput(0,25)(0,25){7}{\line(1,0){560}}
\thicklines
\multiput(0,0)(0,200){2}{\line(1,0){560}}
\multiput(0,0)(35,0){17}{\line(0,1){200}}
\put(0,175){\makebox(35,25){\tt 1}}
\put(0,150){\makebox(35,25){\tt 2}}
\put(0,125){\makebox(35,25){\tt 3}}
\put(0,100){\makebox(35,25){\tt 4}}
\put(0,75){\makebox(35,25){\tt 5}}
\put(0,50){\makebox(35,25){\tt 6}}
\put(0,25){\makebox(35,25){\tt 7}}
\put(0,0){\makebox(35,25){\tt 8}}
\put(35,175){\makebox(35,25){\tt 9}}
\put(35,150){\makebox(35,25){\tt 10}}
\put(35,125){\makebox(35,25){\tt 11}}
\put(35,100){\makebox(35,25){\tt 12}}
\put(35,75){\makebox(35,25){\tt 13}}
\put(35,50){\makebox(35,25){\tt 14}}
\put(35,25){\makebox(35,25){\tt 15}}
\put(35,0){\makebox(35,25){\tt 16}}
\put(70,175){\makebox(35,25){\tt 17}}
\put(70,150){\makebox(35,25){\tt 18}}
\put(70,125){\makebox(35,25){\tt 19}}
\put(70,100){\makebox(35,25){\tt 20}}
\put(70,75){\makebox(35,25){\tt 21}}
\put(70,50){\makebox(35,25){\tt 22}}
\put(70,25){\makebox(35,25){\tt 23}}
\put(70,0){\makebox(35,25){\tt 24}}
\put(105,175){\makebox(35,25){\tt 25}}
\put(105,150){\makebox(35,25){\tt 26}}
\put(105,125){\makebox(35,25){\tt 27}}
\put(105,100){\makebox(35,25){\tt 28}}
\put(105,75){\makebox(35,25){\tt 29}}
\put(105,50){\makebox(35,25){\tt 30}}
\put(105,25){\makebox(35,25){\tt 31}}
\put(105,0){\makebox(35,25){\tt 32}}
\put(140,175){\makebox(35,25){\tt 33}}
\put(140,150){\makebox(35,25){\tt 34}}
\put(140,125){\makebox(35,25){\tt 35}}
\put(140,100){\makebox(35,25){\tt 36}}
\put(140,75){\makebox(35,25){\tt 37}}
\put(140,50){\makebox(35,25){\tt 38}}
\put(140,25){\makebox(35,25){\tt 39}}
\put(140,0){\makebox(35,25){\tt 40}}
\put(175,175){\makebox(35,25){\tt 41}}
\put(175,150){\makebox(35,25){\tt 42}}
\put(175,125){\makebox(35,25){\tt 43}}
\put(175,100){\makebox(35,25){\tt 44}}
\put(175,75){\makebox(35,25){\tt 45}}
\put(175,50){\makebox(35,25){\tt 46}}
\put(175,25){\makebox(35,25){\tt 47}}
\put(175,0){\makebox(35,25){\tt 48}}
\put(210,175){\makebox(35,25){\tt 49}}
\put(210,150){\makebox(35,25){\tt 50}}
\put(210,125){\makebox(35,25){\tt 51}}
\put(210,100){\makebox(35,25){\tt 52}}
\put(210,75){\makebox(35,25){\tt 53}}
\put(210,50){\makebox(35,25){\tt 54}}
\put(210,25){\makebox(35,25){\tt 55}}
\put(210,0){\makebox(35,25){\tt 56}}
\put(245,175){\makebox(35,25){\tt 57}}
\put(245,150){\makebox(35,25){\tt 58}}
\put(245,125){\makebox(35,25){\tt 59}}
\put(245,100){\makebox(35,25){\tt 60}}
\put(245,75){\makebox(35,25){\tt 61}}
\put(245,50){\makebox(35,25){\tt 62}}
\put(245,25){\makebox(35,25){\tt 63}}
\put(245,0){\makebox(35,25){\tt 64}}
\put(280,175){\makebox(35,25){\tt 65}}
\put(280,150){\makebox(35,25){\tt 66}}
\put(280,125){\makebox(35,25){\tt 67}}
\put(280,100){\makebox(35,25){\tt 68}}
\put(280,75){\makebox(35,25){\tt 69}}
\put(280,50){\makebox(35,25){\tt 70}}
\put(280,25){\makebox(35,25){\tt 71}}
\put(280,0){\makebox(35,25){\tt 72}}
\put(315,175){\makebox(35,25){\tt 73}}
\put(315,150){\makebox(35,25){\tt 74}}
\put(315,125){\makebox(35,25){\tt 75}}
\put(315,100){\makebox(35,25){\tt 76}}
\put(315,75){\makebox(35,25){\tt 77}}
\put(315,50){\makebox(35,25){\tt 78}}
\put(315,25){\makebox(35,25){\tt 79}}
\put(315,0){\makebox(35,25){\tt 80}}
\put(350,175){\makebox(35,25){\tt 81}}
\put(350,150){\makebox(35,25){\tt 82}}
\put(350,125){\makebox(35,25){\tt 83}}
\put(350,100){\makebox(35,25){\tt 84}}
\put(350,75){\makebox(35,25){\tt 85}}
\put(350,50){\makebox(35,25){\tt 86}}
\put(350,25){\makebox(35,25){\tt 87}}
\put(350,0){\makebox(35,25){\tt 88}}
\put(385,175){\makebox(35,25){\tt 89}}
\put(385,150){\makebox(35,25){\tt 90}}
\put(385,125){\makebox(35,25){\tt 91}}
\put(385,100){\makebox(35,25){\tt 92}}
\put(385,75){\makebox(35,25){\tt 93}}
\put(385,50){\makebox(35,25){\tt 94}}
\put(385,25){\makebox(35,25){\tt 95}}
\put(385,0){\makebox(35,25){\tt 96}}
\put(420,175){\makebox(35,25){\tt 97}}
\put(420,150){\makebox(35,25){\tt 98}}
\put(420,125){\makebox(35,25){\tt 99}}
\put(420,100){\makebox(35,25){\tt 100}}
\end{picture}
\end{center}
Distributing the array {\tt BLOCK(6)} is not HPF-conforming because
\(6 \times 16 < 100\).
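
As a worked check (a sketch, assuming the 100-element array {\tt CENTURY}
and the 16-processor arrangement {\tt SEDECIM} used in these examples),
a block size $m$ is conforming only if
\[ m \times 16 \ \geq \ 100, \quad \mbox{that is,} \quad
   m \ \geq \ \lceil 100/16 \rceil \ = \ 7 . \]
Thus {\tt BLOCK(7)} is the smallest conforming block size here, while
{\tt BLOCK(6)} would cover only \(6 \times 16 = 96\) elements.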

Distributing the array {\tt CYCLIC} (which means exactly
the same as {\tt CYCLIC(1)}):
                                                                        \CODE
!HPF$ DISTRIBUTE CENTURY(CYCLIC) ONTO SEDECIM
                                                                        \EDOC
results in this mapping of array elements onto abstract processors:
\begin{center}
\setlength{\unitlength}{0.01in}
\begin{picture}(560,200)(0,0)
\put(0,175){\makebox(35,25){\small\rm 1}}
\put(35,175){\makebox(35,25){\small\rm 2}}
\put(70,175){\makebox(35,25){\small\rm 3}}
\put(105,175){\makebox(35,25){\small\rm 4}}
\put(140,175){\makebox(35,25){\small\rm 5}}
\put(175,175){\makebox(35,25){\small\rm 6}}
\put(210,175){\makebox(35,25){\small\rm 7}}
\put(245,175){\makebox(35,25){\small\rm 8}}
\put(280,175){\makebox(35,25){\small\rm 9}}
\put(315,175){\makebox(35,25){\small\rm 10}}
\put(350,175){\makebox(35,25){\small\rm 11}}
\put(385,175){\makebox(35,25){\small\rm 12}}
\put(420,175){\makebox(35,25){\small\rm 13}}
\put(455,175){\makebox(35,25){\small\rm 14}}
\put(490,175){\makebox(35,25){\small\rm 15}}
\put(525,175){\makebox(35,25){\small\rm 16}}
\thinlines
\multiput(0,25)(0,25){6}{\line(1,0){560}}
\thicklines
\multiput(0,0)(0,175){2}{\line(1,0){560}}
\multiput(0,0)(35,0){17}{\line(0,1){175}}
\put(0,150){\makebox(35,25){\tt 1}}
\put(35,150){\makebox(35,25){\tt 2}}
\put(70,150){\makebox(35,25){\tt 3}}
\put(105,150){\makebox(35,25){\tt 4}}
\put(140,150){\makebox(35,25){\tt 5}}
\put(175,150){\makebox(35,25){\tt 6}}
\put(210,150){\makebox(35,25){\tt 7}}
\put(245,150){\makebox(35,25){\tt 8}}
\put(280,150){\makebox(35,25){\tt 9}}
\put(315,150){\makebox(35,25){\tt 10}}
\put(350,150){\makebox(35,25){\tt 11}}
\put(385,150){\makebox(35,25){\tt 12}}
\put(420,150){\makebox(35,25){\tt 13}}
\put(455,150){\makebox(35,25){\tt 14}}
\put(490,150){\makebox(35,25){\tt 15}}
\put(525,150){\makebox(35,25){\tt 16}}
\put(0,125){\makebox(35,25){\tt 17}}
\put(35,125){\makebox(35,25){\tt 18}}
\put(70,125){\makebox(35,25){\tt 19}}
\put(105,125){\makebox(35,25){\tt 20}}
\put(140,125){\makebox(35,25){\tt 21}}
\put(175,125){\makebox(35,25){\tt 22}}
\put(210,125){\makebox(35,25){\tt 23}}
\put(245,125){\makebox(35,25){\tt 24}}
\put(280,125){\makebox(35,25){\tt 25}}
\put(315,125){\makebox(35,25){\tt 26}}
\put(350,125){\makebox(35,25){\tt 27}}
\put(385,125){\makebox(35,25){\tt 28}}
\put(420,125){\makebox(35,25){\tt 29}}
\put(455,125){\makebox(35,25){\tt 30}}
\put(490,125){\makebox(35,25){\tt 31}}
\put(525,125){\makebox(35,25){\tt 32}}
\put(0,100){\makebox(35,25){\tt 33}}
\put(35,100){\makebox(35,25){\tt 34}}
\put(70,100){\makebox(35,25){\tt 35}}
\put(105,100){\makebox(35,25){\tt 36}}
\put(140,100){\makebox(35,25){\tt 37}}
\put(175,100){\makebox(35,25){\tt 38}}
\put(210,100){\makebox(35,25){\tt 39}}
\put(245,100){\makebox(35,25){\tt 40}}
\put(280,100){\makebox(35,25){\tt 41}}
\put(315,100){\makebox(35,25){\tt 42}}
\put(350,100){\makebox(35,25){\tt 43}}
\put(385,100){\makebox(35,25){\tt 44}}
\put(420,100){\makebox(35,25){\tt 45}}
\put(455,100){\makebox(35,25){\tt 46}}
\put(490,100){\makebox(35,25){\tt 47}}
\put(525,100){\makebox(35,25){\tt 48}}
\put(0,75){\makebox(35,25){\tt 49}}
\put(35,75){\makebox(35,25){\tt 50}}
\put(70,75){\makebox(35,25){\tt 51}}
\put(105,75){\makebox(35,25){\tt 52}}
\put(140,75){\makebox(35,25){\tt 53}}
\put(175,75){\makebox(35,25){\tt 54}}
\put(210,75){\makebox(35,25){\tt 55}}
\put(245,75){\makebox(35,25){\tt 56}}
\put(280,75){\makebox(35,25){\tt 57}}
\put(315,75){\makebox(35,25){\tt 58}}
\put(350,75){\makebox(35,25){\tt 59}}
\put(385,75){\makebox(35,25){\tt 60}}
\put(420,75){\makebox(35,25){\tt 61}}
\put(455,75){\makebox(35,25){\tt 62}}
\put(490,75){\makebox(35,25){\tt 63}}
\put(525,75){\makebox(35,25){\tt 64}}
\put(0,50){\makebox(35,25){\tt 65}}
\put(35,50){\makebox(35,25){\tt 66}}
\put(70,50){\makebox(35,25){\tt 67}}
\put(105,50){\makebox(35,25){\tt 68}}
\put(140,50){\makebox(35,25){\tt 69}}
\put(175,50){\makebox(35,25){\tt 70}}
\put(210,50){\makebox(35,25){\tt 71}}
\put(245,50){\makebox(35,25){\tt 72}}
\put(280,50){\makebox(35,25){\tt 73}}
\put(315,50){\makebox(35,25){\tt 74}}
\put(350,50){\makebox(35,25){\tt 75}}
\put(385,50){\makebox(35,25){\tt 76}}
\put(420,50){\makebox(35,25){\tt 77}}
\put(455,50){\makebox(35,25){\tt 78}}
\put(490,50){\makebox(35,25){\tt 79}}
\put(525,50){\makebox(35,25){\tt 80}}
\put(0,25){\makebox(35,25){\tt 81}}
\put(35,25){\makebox(35,25){\tt 82}}
\put(70,25){\makebox(35,25){\tt 83}}
\put(105,25){\makebox(35,25){\tt 84}}
\put(140,25){\makebox(35,25){\tt 85}}
\put(175,25){\makebox(35,25){\tt 86}}
\put(210,25){\makebox(35,25){\tt 87}}
\put(245,25){\makebox(35,25){\tt 88}}
\put(280,25){\makebox(35,25){\tt 89}}
\put(315,25){\makebox(35,25){\tt 90}}
\put(350,25){\makebox(35,25){\tt 91}}
\put(385,25){\makebox(35,25){\tt 92}}
\put(420,25){\makebox(35,25){\tt 93}}
\put(455,25){\makebox(35,25){\tt 94}}
\put(490,25){\makebox(35,25){\tt 95}}
\put(525,25){\makebox(35,25){\tt 96}}
\put(0,0){\makebox(35,25){\tt 97}}
\put(35,0){\makebox(35,25){\tt 98}}
\put(70,0){\makebox(35,25){\tt 99}}
\put(105,0){\makebox(35,25){\tt 100}}
\end{picture}
\end{center}
Distributing the array {\tt CYCLIC(3)}:
                                                                        \CODE
!HPF$ DISTRIBUTE CENTURY(CYCLIC(3)) ONTO SEDECIM
                                                                        \EDOC
results in this mapping of array elements onto abstract processors:
\begin{center}
\setlength{\unitlength}{0.01in}
\begin{picture}(560,250)(0,0)
\put(0,225){\makebox(35,25){\small\rm 1}}
\put(35,225){\makebox(35,25){\small\rm 2}}
\put(70,225){\makebox(35,25){\small\rm 3}}
\put(105,225){\makebox(35,25){\small\rm 4}}
\put(140,225){\makebox(35,25){\small\rm 5}}
\put(175,225){\makebox(35,25){\small\rm 6}}
\put(210,225){\makebox(35,25){\small\rm 7}}
\put(245,225){\makebox(35,25){\small\rm 8}}
\put(280,225){\makebox(35,25){\small\rm 9}}
\put(315,225){\makebox(35,25){\small\rm 10}}
\put(350,225){\makebox(35,25){\small\rm 11}}
\put(385,225){\makebox(35,25){\small\rm 12}}
\put(420,225){\makebox(35,25){\small\rm 13}}
\put(455,225){\makebox(35,25){\small\rm 14}}
\put(490,225){\makebox(35,25){\small\rm 15}}
\put(525,225){\makebox(35,25){\small\rm 16}}
\thinlines
\multiput(0,25)(0,25){8}{\line(1,0){560}}
\thicklines
\multiput(0,0)(0,225){2}{\line(1,0){560}}
\multiput(0,0)(35,0){17}{\line(0,1){225}}
\put(0,200){\makebox(35,25){\tt 1}}
\put(0,175){\makebox(35,25){\tt 2}}
\put(0,150){\makebox(35,25){\tt 3}}
\put(35,200){\makebox(35,25){\tt 4}}
\put(35,175){\makebox(35,25){\tt 5}}
\put(35,150){\makebox(35,25){\tt 6}}
\put(70,200){\makebox(35,25){\tt 7}}
\put(70,175){\makebox(35,25){\tt 8}}
\put(70,150){\makebox(35,25){\tt 9}}
\put(105,200){\makebox(35,25){\tt 10}}
\put(105,175){\makebox(35,25){\tt 11}}
\put(105,150){\makebox(35,25){\tt 12}}
\put(140,200){\makebox(35,25){\tt 13}}
\put(140,175){\makebox(35,25){\tt 14}}
\put(140,150){\makebox(35,25){\tt 15}}
\put(175,200){\makebox(35,25){\tt 16}}
\put(175,175){\makebox(35,25){\tt 17}}
\put(175,150){\makebox(35,25){\tt 18}}
\put(210,200){\makebox(35,25){\tt 19}}
\put(210,175){\makebox(35,25){\tt 20}}
\put(210,150){\makebox(35,25){\tt 21}}
\put(245,200){\makebox(35,25){\tt 22}}
\put(245,175){\makebox(35,25){\tt 23}}
\put(245,150){\makebox(35,25){\tt 24}}
\put(280,200){\makebox(35,25){\tt 25}}
\put(280,175){\makebox(35,25){\tt 26}}
\put(280,150){\makebox(35,25){\tt 27}}
\put(315,200){\makebox(35,25){\tt 28}}
\put(315,175){\makebox(35,25){\tt 29}}
\put(315,150){\makebox(35,25){\tt 30}}
\put(350,200){\makebox(35,25){\tt 31}}
\put(350,175){\makebox(35,25){\tt 32}}
\put(350,150){\makebox(35,25){\tt 33}}
\put(385,200){\makebox(35,25){\tt 34}}
\put(385,175){\makebox(35,25){\tt 35}}
\put(385,150){\makebox(35,25){\tt 36}}
\put(420,200){\makebox(35,25){\tt 37}}
\put(420,175){\makebox(35,25){\tt 38}}
\put(420,150){\makebox(35,25){\tt 39}}
\put(455,200){\makebox(35,25){\tt 40}}
\put(455,175){\makebox(35,25){\tt 41}}
\put(455,150){\makebox(35,25){\tt 42}}
\put(490,200){\makebox(35,25){\tt 43}}
\put(490,175){\makebox(35,25){\tt 44}}
\put(490,150){\makebox(35,25){\tt 45}}
\put(525,200){\makebox(35,25){\tt 46}}
\put(525,175){\makebox(35,25){\tt 47}}
\put(525,150){\makebox(35,25){\tt 48}}
\put(0,125){\makebox(35,25){\tt 49}}
\put(0,100){\makebox(35,25){\tt 50}}
\put(0,75){\makebox(35,25){\tt 51}}
\put(35,125){\makebox(35,25){\tt 52}}
\put(35,100){\makebox(35,25){\tt 53}}
\put(35,75){\makebox(35,25){\tt 54}}
\put(70,125){\makebox(35,25){\tt 55}}
\put(70,100){\makebox(35,25){\tt 56}}
\put(70,75){\makebox(35,25){\tt 57}}
\put(105,125){\makebox(35,25){\tt 58}}
\put(105,100){\makebox(35,25){\tt 59}}
\put(105,75){\makebox(35,25){\tt 60}}
\put(140,125){\makebox(35,25){\tt 61}}
\put(140,100){\makebox(35,25){\tt 62}}
\put(140,75){\makebox(35,25){\tt 63}}
\put(175,125){\makebox(35,25){\tt 64}}
\put(175,100){\makebox(35,25){\tt 65}}
\put(175,75){\makebox(35,25){\tt 66}}
\put(210,125){\makebox(35,25){\tt 67}}
\put(210,100){\makebox(35,25){\tt 68}}
\put(210,75){\makebox(35,25){\tt 69}}
\put(245,125){\makebox(35,25){\tt 70}}
\put(245,100){\makebox(35,25){\tt 71}}
\put(245,75){\makebox(35,25){\tt 72}}
\put(280,125){\makebox(35,25){\tt 73}}
\put(280,100){\makebox(35,25){\tt 74}}
\put(280,75){\makebox(35,25){\tt 75}}
\put(315,125){\makebox(35,25){\tt 76}}
\put(315,100){\makebox(35,25){\tt 77}}
\put(315,75){\makebox(35,25){\tt 78}}
\put(350,125){\makebox(35,25){\tt 79}}
\put(350,100){\makebox(35,25){\tt 80}}
\put(350,75){\makebox(35,25){\tt 81}}
\put(385,125){\makebox(35,25){\tt 82}}
\put(385,100){\makebox(35,25){\tt 83}}
\put(385,75){\makebox(35,25){\tt 84}}
\put(420,125){\makebox(35,25){\tt 85}}
\put(420,100){\makebox(35,25){\tt 86}}
\put(420,75){\makebox(35,25){\tt 87}}
\put(455,125){\makebox(35,25){\tt 88}}
\put(455,100){\makebox(35,25){\tt 89}}
\put(455,75){\makebox(35,25){\tt 90}}
\put(490,125){\makebox(35,25){\tt 91}}
\put(490,100){\makebox(35,25){\tt 92}}
\put(490,75){\makebox(35,25){\tt 93}}
\put(525,125){\makebox(35,25){\tt 94}}
\put(525,100){\makebox(35,25){\tt 95}}
\put(525,75){\makebox(35,25){\tt 96}}
\put(0,50){\makebox(35,25){\tt 97}}
\put(0,25){\makebox(35,25){\tt 98}}
\put(0,0){\makebox(35,25){\tt 99}}
\put(35,50){\makebox(35,25){\tt 100}}
\end{picture}
\end{center}
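
The two cyclic mappings pictured above can be summarized by a formula.
As a sketch, assuming that the lower bound of the array is 1 and that the
processors are numbered from 1, as in these examples, element $j$ of an
array distributed {\tt CYCLIC(}$m${\tt )} onto $P$ processors resides on
processor
\[ 1 + (\lfloor (j-1)/m \rfloor \bmod P) . \]
For {\tt CYCLIC(3)} onto 16 processors, element 49 gives
\(1 + (16 \bmod 16) = 1\) and element 100 gives \(1 + (33 \bmod 16) = 2\),
in agreement with the picture.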



Note that it is perfectly permissible for an array to be distributed so
that some processors have no elements.  Indeed, an array may be ``distributed''
so that all elements reside on one processor.  For example,
                                                                        \CODE
!HPF$ DISTRIBUTE CENTURY(BLOCK(256)) ONTO SEDECIM
                                                                        \EDOC
results in having only one non-empty block---a partially-filled one at that,
having only 100 elements---on processor 1, with processors 2 through 16
having no elements of the array.
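
In terms of element positions (a sketch, assuming that the lower bound of
the array is 1 and that the processors are numbered from 1), element $j$
of an array distributed {\tt BLOCK(}$m${\tt )} resides on processor
\(\lceil j/m \rceil\).  With \(m = 256\) and \(1 \leq j \leq 100\), this gives
\[ \lceil j/256 \rceil \ = \ 1 \]
for every element, which is why only processor 1 holds any part of
{\tt CENTURY}.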



A {\tt DISTRIBUTE} directive must not cause
any data object associated with the {\it distributee} via storage association
({\tt COMMON} or {\tt EQUIVALENCE}) to be mapped such that storage
units of a scalar data object are split across more than one abstract
processor.
See section \ref{sequence} for further discussion of storage association.


The statement form of a {\tt DISTRIBUTE}
directive
may be considered an abbreviation for an attributed form that
happens to mention only one {\it distributee}; for example,
                                                                        \ICODE
!HPF$ DISTRIBUTE distributee ( dist-format-list ) ONTO dist-target
                                                                        \EDOC
is equivalent to
                                                                        \ICODE
!HPF$ DISTRIBUTE ( dist-format-list ) ONTO dist-target :: distributee
                                                                        \EDOC
Note that, to prevent syntactic ambiguity,
the {\it dist-format-clause} must be present in the statement
form, so in general the statement form of the directive may not be
used to specify the mapping of scalars.

If the {\it dist-format-clause} is omitted from the attributed form,
then the language processor may make an arbitrary choice of distribution
formats for each template or array.  So the directive
                                                                        \CODE
!HPF$ DISTRIBUTE ONTO P :: D1,D2,D3
                                                                        \EDOC
means the same as
                                                                        \CODE
!HPF$ DISTRIBUTE ONTO P :: D1
!HPF$ DISTRIBUTE ONTO P :: D2
!HPF$ DISTRIBUTE ONTO P :: D3
                                                                        \EDOC
to which a compiler, perhaps taking into account patterns of use of
{\tt D1}, {\tt D2}, and {\tt D3} within the code, might choose
to supply three distinct distributions such as, for example,
                                                                        \CODE
!HPF$ DISTRIBUTE D1(BLOCK, BLOCK) ONTO P
!HPF$ DISTRIBUTE D2(CYCLIC, BLOCK) ONTO P
!HPF$ DISTRIBUTE D3(BLOCK(43),CYCLIC) ONTO P
                                                                        \EDOC
Then again, the compiler might happen to choose the same
distribution for all three arrays.

In either the statement form or the attributed form,
if the {\tt ONTO} clause is present, it specifies the processor arrangement
that is the target of the distribution.
If the {\tt ONTO} clause is omitted, then an implementation-dependent
processor arrangement is chosen arbitrarily for each {\it distributee}.
So, for example,
                                                                        \CODE
      REAL, DIMENSION(1000) :: ARTHUR, ARNOLD, LINUS, LUCY
!HPF$ PROCESSORS EXCALIBUR(32)
!HPF$ DISTRIBUTE (BLOCK) ONTO EXCALIBUR :: ARTHUR, ARNOLD
!HPF$ DISTRIBUTE (BLOCK) :: LINUS, LUCY
                                                                        \EDOC
causes the arrays {\tt ARTHUR} and {\tt ARNOLD} to have the same mapping,
so that corresponding elements reside in the same abstract processor,
because they are the same size and distributed in the same way ({\tt BLOCK})
onto the same processor arrangement ({\tt EXCALIBUR}).
However, {\tt LUCY} and {\tt LINUS} do not necessarily have the same
mapping because they might, depending on the implementation,
be distributed onto differently chosen processor arrangements;
so corresponding elements of {\tt LUCY} and {\tt LINUS} might not
reside on the same abstract processor.  (The {\tt ALIGN} directive provides
a way to ensure that two arrays have the same mapping without having
to specify an explicit processor arrangement.)
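
For instance, the following sketch (a hypothetical variation on the
example above, not part of that example) aligns {\tt LUCY} with
{\tt LINUS} and distributes only {\tt LINUS}, so that corresponding
elements of the two arrays reside on the same abstract processor
whichever processor arrangement the implementation chooses:
                                                                        \CODE
      REAL, DIMENSION(1000) :: LINUS, LUCY
!HPF$ DISTRIBUTE (BLOCK) :: LINUS
!HPF$ ALIGN LUCY(:) WITH LINUS(:)
                                                                        \EDOC
Note that {\tt LUCY} does not itself appear as a {\it distributee};
its mapping is determined entirely by the alignment.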

In a given environment, for some distributions, there may be no
appropriate processor arrangement.

\section{ALIGN Directive}
\label{ALIGN-SECTION}

The {\tt ALIGN} directive is used to specify that certain data objects
are to be mapped in the same way as certain other data objects.
Operations between aligned data objects are likely to be more efficient
than operations between data objects that are not known to be aligned
(because two objects that are aligned are intended to be mapped to the
same abstract processor).  The {\tt ALIGN} directive is designed to
make it particularly easy to specify explicit mappings for all the
elements of an array at once.  While objects can be aligned in some
cases through careful use of matching {\tt DISTRIBUTE} directives, {\tt
ALIGN} is more general and frequently more convenient.


The {\tt ALIGN} directive may appear only in the {\it
specification-part} of a scoping unit
and can contain only a {\it specification-expr} as a
{\it subscript} or in a {\it subscript-triplet}.

Formally, the syntax of {\tt ALIGN}
is as follows:

                                                                        \BNF
align-directive            \IS  ALIGN alignee align-directive-stuff

align-directive-stuff      \IS  ( align-source-list ) align-with-clause

align-attribute-stuff      \IS  [ ( align-source-list ) ] align-with-clause

alignee                    \IS  object-name

align-source               \IS  :
                           \OR  *
                           \OR  align-dummy

align-dummy                \IS  scalar-int-variable
                                                                        \FNB
\begin{constraints}
%Do not change order of constraints without changing reference to them below.

\item An {\it object-name} mentioned as an {\it alignee}
must be a simple name and not a subobject designator or a {\it component-name}.

\item An {\it object-name} mentioned as an {\it alignee} may not
appear as a {\it distributee}.

\item An {\it object-name} mentioned as an {\it alignee} may not
have the {\tt POINTER} attribute.

\item An {\it object-name} mentioned as an {\it alignee} may not
have the {\tt TARGET} attribute.

\item  If the {\it alignee} is scalar,
the {\it align-source-list} (and its surrounding parentheses)
must not appear.  In this case
the statement form of the directive is not allowed.



\item If the {\it align-source-list} is present, its length must equal the
rank of the alignee.

\item An {\it align-dummy} must be a named variable.

\item An object may not have both the {\tt INHERIT} attribute and
the {\tt ALIGN} attribute.
\end{constraints}

\begin{users}
Some of the above constraints are relaxed under the approved extensions
(see Section~\ref{ch-mapping-ext}):
mapping of derived type components (relaxes constraint 1) and
mapping of pointers (relaxes constraint 3).
\end{users}

Note that the possibility of an {\tt ALIGN} directive of the form
                                                                        \ICODE
!HPF$ ALIGN align-attribute-stuff :: alignee-list
                                                                        \EDOC
is covered by syntax rule \ref{combined-directive-rule} for a
{\it combined-directive}.

The statement form of an {\tt ALIGN}
directive
may be considered an abbreviation of an attributed form that
happens to mention only one {\it alignee}:
                                                                        \ICODE
!HPF$ ALIGN alignee ( align-source-list ) WITH align-spec
                                                                        \EDOC
is equivalent to
                                                                        \ICODE
!HPF$ ALIGN ( align-source-list ) WITH align-spec :: alignee
                                                                        \EDOC

If the {\it align-source-list} is omitted from the attributed form and
the {\it  alignee}s are not scalar, the {\it align-source-list} is
assumed to consist of a parenthesized list of ``{\tt :}'' entries,
equal in number to the rank of the {\it alignees}.  Similarly, if the
{\it align-subscript-list} is omitted from the {\it align-spec} in
either form, it is assumed to consist of a parenthesized list of ``{\tt
:}'' entries, equal in number to the rank of the {\it align-target}.
So the directive

                                                                        \CODE
!HPF$ ALIGN WITH B :: A1, A2, A3
                                                                        \EDOC
means
                                                                        \CODE
!HPF$ ALIGN (:,:) WITH B(:,:) :: A1, A2, A3
                                                                        \EDOC
which in turn means the same as
                                                                        \CODE
!HPF$ ALIGN A1(:,:) WITH B(:,:)
!HPF$ ALIGN A2(:,:) WITH B(:,:)
!HPF$ ALIGN A3(:,:) WITH B(:,:)
                                                                        \EDOC

\noindent
because an attributed-form directive that mentions more than one {\it alignee}
is equivalent to a series of identical directives, one for each {\it alignee};
all {\it  alignee}s must have the
same rank.  With this understanding, we will assume below, for the
sake of simplifying the description, that an {\tt ALIGN}
directive has a single {\it alignee}.

Each {\it align-source} corresponds to one axis of the {\it alignee}, and is
specified as either ``{\tt :}'' or ``{\tt *}'' or a dummy variable:

\begin{itemize}

\item
If it is ``{\tt :}'', then positions along that axis will be spread out
across the matching axis of the {\it align-spec} (see below).

\item
If it is ``{\tt *}'', then that axis is {\it collapsed}: positions along
that axis make no difference in determining the corresponding
position within the {\it align-target}.  (Replacing the ``{\tt *}'' with
a dummy variable name not used anywhere else in the directive would
have the same effect; ``{\tt *}'' is merely a convenience that saves the
trouble of inventing a variable name and makes it clear that no
dependence on that dimension is intended.)

\item
A dummy variable is considered to range over all valid
index values for that dimension of the {\it alignee}.

\end{itemize}

The {\tt WITH} clause of an {\tt ALIGN} has the following syntax:
                                                                        \BNF
align-with-clause          \IS  WITH align-spec

align-spec                 \IS  align-target [ ( align-subscript-list ) ]
                           \OR  * align-target [ ( align-subscript-list ) ]

align-target               \IS  object-name
                           \OR  template-name

align-subscript            \IS  int-expr
                           \OR  align-subscript-use
                           \OR  subscript-triplet
                           \OR  *

align-subscript-use        \IS  [ [ int-level-two-expr ] add-op ] align-add-operand
                           \OR  align-subscript-use add-op int-add-operand

align-add-operand          \IS  [ int-add-operand * ] align-primary
                           \OR  align-add-operand * int-mult-operand

align-primary              \IS  align-dummy
                           \OR  ( align-subscript-use )

int-add-operand            \IS  add-operand

int-mult-operand           \IS  mult-operand

int-level-two-expr           \IS  level-2-expr
                                                                        \FNB

The full syntax is given here for completeness; however, some of the
forms are discussed only in Section~\ref{ch-mapping-subr}.  These
``interprocedural'' forms are:
\begin{itemize}
\item The second option of  rule~\ref{align-spec-rule} (containing
the {\tt *} form)
\end{itemize}

\begin{constraints}
%Do not change order of constraints without changing reference to them below.

\item An {\it object-name} mentioned as an {\it align-target}
must be a simple name and not a subobject designator or a {\it component-name}.

\item An {\it align-target} may not have the {\tt OPTIONAL} attribute.

\item If the {\it align-spec} in an {\tt ALIGN} directive begins with
``{\tt *}'' then every {\it alignee} must be a dummy argument.

\item In an {\em align-directive} any {\em int-expr}, {\em int-level-two-expr},
{\em int-add-operand} or {\em int-mult-operand} must be a specification
expression.

\item Any {\em subscript} or {\em stride} in a {\em subscript-triplet} that is
an {\em align-subscript} in an {\em align-directive} must be a specification
expression.


\item  Each {\it align-dummy} may appear at most once in an
{\it align-subscript-list}.

\item  An {\it align-subscript-use} expression may contain at most one
occurrence of an {\it align-dummy}.

\item  An {\it align-dummy} may not appear anywhere in the {\it
align-spec} except where explicitly permitted to appear by virtue of
the grammar shown above.  Paraphrased, one may construct an {\it
align-subscript-use} by starting with an {\it align-dummy} and then
doing additive and multiplicative things to it with any integer
expressions that contain no {\it align-dummy}.

\item A {\it subscript} in an {\it align-subscript} may not contain
occurrences of any {\it align-dummy}.

\item An {\it int-add-operand}, {\it int-mult-operand}, or {\it
int-level-two-expr} must be of type integer.

\end{constraints}

\begin{users}
Some of the above constraints are relaxed under the approved extensions
(see Section~\ref{ch-mapping-ext}):
mapping of derived type components (relaxes constraint 1),
mapping of pointers (relaxes constraint 3) and
remapping of data objects (relaxes constraints 4 and 5).
\end{users}

The syntax rules for an {\it align-subscript-use} take account of
operator precedence issues, but the basic idea is simple: an {\it
align-subscript-use} is intended to be a linear function of a single
occurrence of an {\it align-dummy}.

For example, the following {\it align-subscript-use} expressions
are valid, assuming that {\tt J}, {\tt K}, and {\tt M}
are {\it align-dummy}s and {\tt N} is not an {\it align-dummy}:

\begin{flushleft}\tt
\begin{tabular}{l@{\quad}l@{\quad}l@{\quad}l@{\quad}l@{\quad}l}
J   &  J+1      &  3-K    &  2*M    &  N*M        &  100-3*M    \\
-J  &  +J       &  -K+3   &  M+2**3 &  M+N    &  -(4*7+IOR(6,9))*K-(13-5/3) \\
M*2 &  N*(M-N)  & 2*(J+1) &  5-K+3  &  10000-M*3  &  2*(3*(K-1)+13)-100
\end{tabular}
\end{flushleft}

The following expressions are not valid {\it align-subscript-use} expressions:
\begin{flushleft}\tt
\begin{tabular}{l@{\quad}l@{\quad}l@{\quad}l@{\quad}l@{\quad}l}
J+J  &  J-J      &  3*K-2*K   &  M*(N-M)  &  2*J-3*J+J & 2*(3*(K-1)+13)-K \\
J*J  &  J+K      &  3/K       &  2**M     &  M*K       &  K-3*M    \\
K-J  &  IOR(J,1) &  -K/3      &  M*(2+M)  &  M*(M-N)   &  2**(2*J-3*J+J)
\end{tabular}
\end{flushleft}
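
To see why the valid forms qualify, it may help to reduce a few of them
to the form \(a \times d + b\), where $d$ is the single {\it align-dummy}.
The following is a worked sketch using ordinary Fortran integer
arithmetic, in which {\tt IOR(6,9)} is 15 and {\tt 5/3} is 1:
\begin{eqnarray*}
\mbox{\tt 2*(3*(K-1)+13)-100}          & = & 6K - 80 \\
\mbox{\tt -(4*7+IOR(6,9))*K-(13-5/3)}  & = & -43K - 12 \\
\mbox{\tt N*(M-N)}                     & = & N \times M - N^2
\end{eqnarray*}
By contrast, {\tt 2*J-3*J+J} is not valid even though it is algebraically
zero, because the rule is syntactic: the {\it align-dummy} {\tt J} occurs
three times.  Similarly, {\tt K-3*M} involves two different
{\it align-dummy}s, and {\tt 3/K}, {\tt 2**M}, and {\tt IOR(J,1)} are not
linear in the {\it align-dummy}.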

The {\it align-spec} must contain exactly as many {\it subscript-triplets}
as the number of colons (``{\tt :}'') appearing in the {\it align-source-list}.
These are matched up in corresponding left-to-right order, ignoring,
for this purpose, any {\it align-source} that is not a colon and
any {\it align-subscript} that is not a {\it subscript-triplet}.
Consider a dimension of the {\it alignee} for which a colon appears as
an {\it align-source} and
let the lower and upper bounds of that dimension be {\it LA} and {\it UA}.
Let the corresponding subscript triplet be
{\it LT\/}:{\it UT\/}:{\it ST} or its equivalent.
Then the colon could be replaced by a new, as-yet-unused
dummy variable, say {\tt J}, and the subscript triplet by the expression
{\tt (J-{\it LA})*{\it ST}+{\it LT}}
without affecting the meaning of the directive.  Moreover, the axes
must conform, which means that
\[ \max(0,UA-LA+1) \ = \  \max(0,\lceil (UT-LT+1) / ST \rceil) \]
must be true. (This is entirely analogous to the treatment of array
assignment.)
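
As a worked instance (the bounds here are chosen purely for
illustration), suppose the {\it alignee} dimension has bounds \(LA = 0\)
and \(UA = 9\) and the corresponding subscript triplet is {\tt 2:20:2},
so that \(LT = 2\), \(UT = 20\), and \(ST = 2\).  The conformance
requirement holds because
\[ \max(0,9-0+1) \ = \ 10 \ = \ \max(0,\lceil (20-2+1) / 2 \rceil) , \]
and the replacement expression {\tt (J-0)*2+2} takes the values
\(2, 4, \ldots, 20\) as {\tt J} ranges over \(0, 1, \ldots, 9\).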

To simplify the remainder of the discussion,
we assume that every colon in the {\it align-source-list}
has been replaced by a new dummy variable
in exactly the fashion just described,
and that every ``{\tt *}'' in the {\it align-source-list}
has likewise been replaced by an otherwise unused dummy variable.
For example,
                                                                        \CODE
!HPF$ ALIGN A(:,*,K,:,:,*) WITH B(31:,:,K+3,20:100:3)
                                                                        \EDOC
may be transformed into its equivalent
                                                                        \CODE
!HPF$ ALIGN A(I,J,K,L,M,N) WITH B(I-LBOUND(A,1)+31,       &
!HPF$            L-LBOUND(A,4)+LBOUND(B,2),K+3,(M-LBOUND(A,5))*3+20)
                                                                        \EDOC
with the attached requirements
\begin{center}
{\tt SIZE(A,1)~.EQ.~UBOUND(B,1)-30}     \\
{\tt SIZE(A,4)~.EQ.~SIZE(B,2)}          \\
{\tt SIZE(A,5)~.EQ.~(100-20+3)/3}
\end{center}
Thus we need consider further only the case where every {\it align-source}
is a dummy variable and no {\it align-subscript} is a {\it subscript-triplet}.
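
For concreteness, here is a hypothetical pair of declarations (the
extents are chosen for illustration and are not part of the example
above) for which the {\tt ALIGN} directive with subscript triplets
shown above satisfies all three attached requirements:
                                                                        \CODE
      REAL A(10,4,6,5,27,3), B(40,5,9,100)
!HPF$ ALIGN A(:,*,K,:,:,*) WITH B(31:,:,K+3,20:100:3)
                                                                        \EDOC
Here {\tt SIZE(A,1)} is 10, which equals {\tt UBOUND(B,1)-30};
{\tt SIZE(A,4)} and {\tt SIZE(B,2)} are both 5; and {\tt SIZE(A,5)} is 27,
which equals {\tt (100-20+3)/3} under Fortran integer division.
In addition, {\tt K+3} ranges over 4 through 9 as {\tt K} ranges over
1 through 6, so it stays within the third dimension of {\tt B}.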

Each dummy variable is considered to range over all valid
index values for the corresponding dimension of the {\it alignee}.
Every combination of possible values for the index variables selects
an element of the {\it alignee}.
The {\it align-spec} indicates a corresponding element (or section) of
the {\it align-target} with which that element of the {\it alignee} should be
aligned; this indication may be a function of the index values, but
the nature of this function is syntactically restricted (as discussed above)
to linear functions
in order to limit the complexity of the implementation.
Each {\it align-dummy} variable may appear at
most once in the {\it align-spec} and only in certain rigidly prescribed
contexts.  The result is that each
{\it align-subscript} expression may contain at most one {\it align-dummy}
variable and the expression is constrained to be a linear function of that
variable.  (Therefore skew alignments are not possible.)

An asterisk ``{\tt *}'' as an {\it align-subscript}
indicates a replicated representation.  Each element of the {\it alignee}
is aligned with every position along that axis of the {\it align-target}.

\begin{rationale}
It may seem strange to use ``{\tt *}'' to mean both collapsing and
replication; the rationale is that ``{\tt *}'' always stands conceptually
for a dummy variable that appears
nowhere else in the statement and ranges over the set of indices for
the indicated dimension. Thus, for example,
                                                                        \CODE
!HPF$ ALIGN A(:) WITH D(:,*)
                                                                        \EDOC
means that a copy of {\tt A} is aligned with every column of {\tt D}, because
it is conceptually equivalent to
                                                                        \ICODE
      for every legitimate index j, align A(:) with D(:,j)
                                                                        \EDOC
just as
                                                                        \CODE
!HPF$ ALIGN A(:,*) WITH D(:)
                                                                        \EDOC
is conceptually equivalent to
                                                                        \ICODE
      for every legitimate index j, align A(:,j) with D(:)
                                                                        \EDOC
Note, however, that while HPF syntax allows
                                                                        \CODE
!HPF$ ALIGN A(:,*) WITH D(:)
                                                                        \EDOC
to be written in the alternate form
                                                                        \CODE
!HPF$ ALIGN A(:,J) WITH D(:)
                                                                        \EDOC
it does {\it not} allow
                                                                        \CODE
!HPF$ ALIGN A(:) WITH D(:,*)
                                                                        \EDOC
to be written in the alternate form
                                                                        \CODE
!HPF$ ALIGN A(:) WITH D(:,J)
                                                                        \EDOC
because that has another meaning (only a variable appearing
in the {\it align-source-list}
following the {\it alignee} is understood to be an {\it align-dummy},
so the current value of the variable {\tt J} is used, thus aligning
{\tt A} with a single column of {\tt D}).

Replication allows an optimizing compiler to arrange to read
whichever copy is closest.  (Of course, when a replicated data object
is written, all copies must be updated, not just one copy.)  Replicated
representations are very useful as small lookup
tables, where it is much faster to have a copy in each physical
processor, without giving the table an extra
dimension that is logically unnecessary to the algorithm.
\end{rationale}
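
A minimal sketch of the lookup-table case mentioned in the rationale,
using the hypothetical names {\tt TABLE} and {\tt SAMPLES} and an
arbitrarily chosen table size:
                                                                        \CODE
      REAL TABLE(256), SAMPLES(N)
!HPF$ DISTRIBUTE SAMPLES(BLOCK)
!HPF$ ALIGN TABLE(*) WITH SAMPLES(*)
                                                                        \EDOC
The ``{\tt *}'' following {\tt TABLE} collapses the only axis of
{\tt TABLE}, and the ``{\tt *}'' following {\tt SAMPLES} replicates along
the only axis of {\tt SAMPLES}.  A copy of the whole table is therefore
aligned with every element of {\tt SAMPLES}, and so is specified as
residing in every processor that holds part of {\tt SAMPLES}.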

By applying the transformations given above, all cases of an {\it
align-subscript} may be conceptually reduced to either an {\it
int-expr} (not involving an {\it align-dummy}) or an {\it
align-subscript-use} and the {\it align-source-list} may be reduced to
a list of index variables with no ``{\tt *}'' or ``{\tt:}''.  An {\it
align-subscript-list} may then be evaluated for any specific
combination of values for the {\it align-dummy} variables simply by
evaluating each {\it align-subscript} as an expression.  The resulting
subscript values must be legitimate subscripts for the {\it
align-target}.  (This implies that the {\it alignee} is not allowed to
``wrap around'' or ``extend past the edges'' of an {\it align-target}.)
The selected element of the {\it alignee} is then considered to be
aligned with the indicated element of the {\it align-target};  more
precisely, the selected element of the {\it alignee} is considered to
be ultimately aligned with the same object with which the indicated
element of the {\it align-target} is currently ultimately aligned
(possibly itself).

More examples of {\tt ALIGN} directives:
                                                                        \CODE
      INTEGER D1(N)
      LOGICAL D2(N,N)
      REAL, DIMENSION(N,N):: X,A,B,C,AR1,AR2A,P,Q,R,S
!HPF$ ALIGN X(:,*) WITH D1(:)
!HPF$ ALIGN (:,*) WITH D1:: A,B,C,AR1,AR2A
!HPF$ ALIGN WITH D2:: P,Q,R,S
                                                                        \EDOC
Note that, in an {\it alignee-list}, the alignees must all have the same
rank but need not all have the same shape; the extents need match only
for dimensions that correspond to colons in the {\it align-source-list}.
This turns out to be an extremely important convenience;
one of the most common cases in current practice is aligning arrays that
match in distributed (``parallel'') dimensions but may differ in
collapsed (``on-processor'') dimensions:
                                                                        \CODE

      REAL A(3,N), B(4,N), C(43,N), Q(N)
!HPF$ DISTRIBUTE Q(BLOCK)
!HPF$ ALIGN (*,:) WITH Q:: A,B,C
                                                                        \EDOC
Here there are several processors (perhaps {\tt N} of
them), and arrays of different sizes (3, 4, and 43) are required within
each processor.  As far as HPF is concerned,
the extents 3, 4, and 43 may differ
because those axes are collapsed.  Thus array elements with
indices differing only along that axis will all be aligned with the
same element of {\tt Q} (and thus be specified as residing in the same
processor).

In the following examples, each directive in the group means
the same thing, assuming that corresponding axis upper and lower bounds match:
                                                                        \CODE
!Second axis of X is collapsed
!HPF$ ALIGN X(:,*) WITH D1(:)
!HPF$ ALIGN X(J,*) WITH D1(J)
!HPF$ ALIGN X(J,K) WITH D1(J)

!Replicated representation along second axis of D3
!HPF$ ALIGN X(:,:) WITH D3(:,*,:)
!HPF$ ALIGN X(J,K) WITH D3(J,*,K)

!Transposing two axes
!HPF$ ALIGN X(J,K) WITH D2(K,J)
!HPF$ ALIGN X(J,:) WITH D2(:,J)
!HPF$ ALIGN X(:,K) WITH D2(K,:)
!But there isn't any way to get rid of *both* index variables;
! the subscript-triplet syntax alone cannot express transposition.

!Reversing both axes
!HPF$ ALIGN X(J,K) WITH D2(M-J+1,N-K+1)
!HPF$ ALIGN X(:,:) WITH D2(M:1:-1,N:1:-1)

!Simple case
!HPF$ ALIGN X(J,K) WITH D2(J,K)
!HPF$ ALIGN X(:,:) WITH D2(:,:)
!HPF$ ALIGN (J,K) WITH D2(J,K):: X
!HPF$ ALIGN (:,:) WITH D2(:,:):: X
!HPF$ ALIGN WITH D2:: X
                                                                        \EDOC


\section{Allocatable Arrays and Pointers}
\label{ALLOCATABLE-SECTION}

A variable with the {\tt ALLOCATABLE} attribute may appear
as an {\it alignee}
in an {\tt ALIGN} directive or as a {\it distributee} in a {\tt DISTRIBUTE}
directive.  Such directives do not take effect immediately, however; they
take effect each time the array is allocated by an {\tt ALLOCATE} statement,
rather than on entry to the scoping unit.
The values of all specification expressions in such a directive
are determined once on entry to the scoping unit
and may be used multiple times (or not at all).
For example:
                                                                        \CODE
      SUBROUTINE MILLARD_FILLMORE(N,M)
      REAL, ALLOCATABLE, DIMENSION(:) :: A, B
!HPF$ ALIGN B(I) WITH A(I+N)
!HPF$ DISTRIBUTE A(BLOCK(M*2))
      N = 43
      M = 91
      ALLOCATE(A(27))
      ALLOCATE(B(13))
      ...
                                                                        \EDOC
The values of the expressions {\tt N} and {\tt M*2}
on entry to the subprogram are conceptually retained by the
{\tt ALIGN} and {\tt DISTRIBUTE} directives for later use at allocation time.
When the array {\tt A} is allocated, it is distributed with a block size
equal to the retained value of {\tt M*2}, not the value 182.
When the array {\tt B} is allocated, it is aligned relative to {\tt A}
according to the retained value of {\tt N}, not its new value 43.
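
To make the retained values concrete, suppose (purely for illustration;
the caller below and the values 10 and 4 are assumptions, not part of
the example above) that the subroutine is invoked as follows:
                                                                        \CODE
      INTEGER NN, MM
      NN = 10
      MM = 4
      CALL MILLARD_FILLMORE(NN,MM)
                                                                        \EDOC
On entry, {\tt N} is 10 and {\tt M*2} is 8.  Array {\tt A(1:27)} is
therefore distributed {\tt BLOCK(8)}, and {\tt B(I)} is aligned with
{\tt A(I+10)}, so that {\tt B(1:13)} is aligned with {\tt A(11:23)}.
Had the new value 43 of {\tt N} been used instead, {\tt B(1)} would
have to be aligned with {\tt A(44)}, which lies outside the bounds of
{\tt A}.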

Note that it would have been incorrect in the {\tt MILLARD_FILLMORE} example
to perform the two {\tt ALLOCATE} statements in the opposite order.
In general, when an object {\tt X} is created it may be aligned to another
object {\tt Y} only if {\tt Y} has already been created or allocated.
The following example illustrates several related cases.
                                                                        \CODE
      SUBROUTINE WARREN_HARDING(P,Q)
      REAL P(:)
      REAL Q(:)
      REAL R(SIZE(Q))
      REAL, ALLOCATABLE :: S(:),T(:)
!HPF$ ALIGN P(I) WITH T(I)                        !Nonconforming
!HPF$ ALIGN Q(I) WITH *T(I)                       !Nonconforming
!HPF$ ALIGN R(I) WITH T(I)                        !Nonconforming
!HPF$ ALIGN S(I) WITH T(I)
      ALLOCATE(S(SIZE(Q)))                        !Nonconforming
      ALLOCATE(T(SIZE(Q)))
                                                                        \EDOC
The first three {\tt ALIGN} directives and the first {\tt ALLOCATE}
statement are not HPF-conforming because the array {\tt T}
has not yet been allocated at the time that the corresponding alignments
must take place.  The four cases differ slightly in their details.
The arrays {\tt P} and {\tt Q} already exist on entry to the
subroutine, but because {\tt T} is not yet allocated, one cannot
correctly prescribe the alignment of {\tt P} or describe the
alignment of {\tt Q} relative to {\tt T}.  (See Section~\ref{ch-mapping-subr}
for a discussion of prescriptive and descriptive directives.)
The array {\tt R} is created on subroutine entry and its size
can correctly depend on the {\tt SIZE} of {\tt Q}, but the alignment
of {\tt R} cannot be specified in terms of the alignment of {\tt T}
any more than its size can be specified in terms of the size of {\tt T}.
It {\it is} permitted to have an alignment directive for {\tt S}
in terms of {\tt T}, because the alignment action does not take place
until {\tt S} is allocated; however, the first {\tt ALLOCATE}
statement is nonconforming because {\tt S} needs to be aligned
but at that point in time {\tt T} is still unallocated.
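
For contrast, a sketch of a conforming treatment of {\tt S} and {\tt T}
(keeping the declarations of the example above) simply allocates the
{\it align-target} first:
                                                                        \CODE
      REAL, ALLOCATABLE :: S(:),T(:)
!HPF$ ALIGN S(I) WITH T(I)
      ALLOCATE(T(SIZE(Q)))
      ALLOCATE(S(SIZE(Q)))
                                                                        \EDOC
When {\tt S} is allocated, {\tt T} already exists, so the alignment of
{\tt S} can take effect.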


When an array is allocated, it will be aligned to an existing template
if there is an explicit {\tt ALIGN} directive for the allocatable
variable.  If there is no explicit {\tt ALIGN} directive, then the array
will be ultimately aligned with itself.  It is forbidden for any
other object to be ultimately aligned to an array at the time the
array becomes undefined by reason of deallocation.  All this applies
regardless of whether the name originally used in the {\tt ALLOCATE}
statement when the array was created had the {\tt ALLOCATABLE} attribute or
the {\tt POINTER} attribute.
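
The deallocation rule can be sketched as follows, assuming hypothetical
allocatable arrays {\tt S} and {\tt T} with {\tt S} aligned to {\tt T}:
                                                                        \CODE
      REAL, ALLOCATABLE :: S(:),T(:)
!HPF$ ALIGN S(I) WITH T(I)
      ALLOCATE(T(100))
      ALLOCATE(S(100))
      ...
      DEALLOCATE(S)
      DEALLOCATE(T)
                                                                        \EDOC
Deallocating {\tt T} first would be nonconforming, because {\tt S} would
still be ultimately aligned with {\tt T} at that moment.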

Pointers cannot be explicitly mapped in HPF and thus
can be associated only with objects that are not explicitly mapped.
When a pointer is used for allocation, the compiler may
choose any mapping for the data allocated through the pointer.
Explicit mapping of pointers is allowed under the approved extensions; see
section~\ref{POINTERS-SECTION} for details.
Also, the relationship of pointers and sequence attributes is
described in section~\ref{sequence}.


\section{PROCESSORS Directive}
\label{PROCESSORS-SECTION}

The {\tt PROCESSORS} directive declares one or more rectilinear
processor arrangements, specifying for each one its name, its rank
(number of dimensions), and the extent in each dimension.  It may
appear only in the {\it specification-part} of a scoping unit.  Every
dimension of a processor arrangement must have nonzero extent;
therefore a processor arrangement cannot be empty.

In the language of section 14.1.2 of the Fortran standard, processor
arrangements are local entities of class (1); therefore a processor
arrangement may not have the same name as a variable, named constant,
internal procedure, etc., in the same scoping unit.  Names of processor
arrangements obey the same rules for host and use association as other
names in the long list in section 12.1.2.2.1 of the Fortran
standard.


A processor arrangement declared in a module has the default accessibility
of the module.

\begin{rationale}
Because the name of a processor arrangement is not a first-class entity
in HPF, but must appear only in directives, it cannot appear in an
{\it access-stmt} ({\tt PRIVATE} or {\tt PUBLIC}).
If directives ever become full-fledged Fortran statements rather
than structured comments, then it would be appropriate to allow
the accessibility of a processor arrangement to be controlled
by listing its name in an {\it access-stmt}.
\end{rationale}


If two processor arrangements have the same shape, then corresponding
elements of the two arrangements are understood to refer to the same
abstract processor.  (It is anticipated that
implementation-dependent directives provided by some HPF
implementations could overrule the default correspondence of processor
arrangements that have the same shape.)

If directives collectively specify that two objects be mapped to the
same abstract processor at a given instant during the program
execution, the intent is that the two objects be mapped to the same
physical processor at that instant.

The intrinsic functions {\tt NUMBER_OF_PROCESSORS} and {\tt
PROCESSORS_SHAPE} may be used to inquire about the total number of
actual physical processors used to execute the program and about the
shape of their arrangement.  This
information may then be used to calculate appropriate sizes for the
declared abstract processor arrangements.

                                                                        \BNF
processors-directive       \IS  PROCESSORS processors-decl-list

processors-decl            \IS  processors-name [ ( explicit-shape-spec-list ) ]
                                                                        \FNB


Examples:

                                                                        \CODE
!HPF$ PROCESSORS P(N)
!HPF$ PROCESSORS Q(NUMBER_OF_PROCESSORS()),       &
!HPF$            R(8,NUMBER_OF_PROCESSORS()/8)
!HPF$ PROCESSORS BIZARRO(1972:1997,-20:17)
!HPF$ PROCESSORS SCALARPROC
                                                                        \EDOC

\noindent
If no shape is specified, then the declared processor arrangement is
conceptually scalar.

\begin{rationale}
A scalar processor arrangement may be useful as a way of
indicating that certain scalar data should be kept together but need
not interact strongly with distributed data.  Depending on the
implementation architecture, data distributed onto such a processor
arrangement may reside in a single ``control'' or ``host'' processor
(if the machine has one), or may reside in an arbitrarily chosen
processor, or may be replicated over all processors.  For target
architectures that have a set of computational processors and a
separate scalar host computer, a natural implementation is to map
every scalar processor arrangement onto the host processor.  For
target architectures that have a set of computational processors but
no separate scalar ``host'' computer, data mapped to a scalar
processor arrangement might be mapped to some arbitrarily chosen
computational processor or replicated onto all computational
processors.
\end{rationale}

An HPF compiler is required to accept any
{\tt PROCESSORS} declaration in which the product of the extents of each
declared processor arrangement is equal to the number of physical
processors that would be returned by the call {\tt NUMBER_OF_PROCESSORS()}.
It must also accept all declarations of scalar processor
arrangements.  Other cases may be handled as well, depending on the
implementation.
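
As an illustration, assume a hypothetical machine on which
{\tt NUMBER_OF_PROCESSORS()} returns 16 (the arrangement names below are
likewise hypothetical):
                                                                        \CODE
!Must be accepted: the product of the extents is 16
!HPF$ PROCESSORS LINE(16)
!HPF$ PROCESSORS SQUARE(4,4), SLAB(2,8)
!Must be accepted: the arrangement is scalar
!HPF$ PROCESSORS HOST
!Acceptance depends on the implementation: the product is 15
!HPF$ PROCESSORS ODD(5,3)
                                                                        \EDOC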

For compatibility with the Fortran attribute syntax, an optional
``{\tt ::}'' may be inserted.  The shape may also be specified with the
{\tt DIMENSION} attribute:
                                                                        \CODE
!HPF$ PROCESSORS :: RUBIK(3,3,3)
!HPF$ PROCESSORS, DIMENSION(3,3,3) :: RUBIK
                                                                        \EDOC
As in Fortran, an {\it explicit-shape-spec-list} in a {\it processors-decl}
will override an explicit {\tt DIMENSION} attribute:
                                                                        \CODE
!HPF$ PROCESSORS, DIMENSION(3,3,3) ::      &
!HPF$            RUBIK, RUBIKS_REVENGE(4,4,4), SOMA
                                                                        \EDOC
Here {\tt RUBIKS_REVENGE} is \( 4 \times 4 \times 4 \) while
{\tt RUBIK} and {\tt SOMA} are each \( 3 \times 3 \times 3 \).
(By the rules enunciated above, however, such a statement
may not be completely portable because no HPF language processor is
required to handle shapes of total sizes 27 and 64 simultaneously.)

Returning from a subprogram causes all processor arrangements
declared local to that subprogram to become undefined.  It is
not HPF-conforming for any array or template to be distributed onto a processor
arrangement at the time the processor arrangement becomes undefined
unless at least one of two conditions holds:

\begin{itemize}

\item The array or template itself becomes undefined at the same time
by virtue of returning from the subprogram.

\item Whenever the subprogram is called, the processor arrangement
is always locally defined in the same way, with identical lower bounds
and identical upper bounds.

\begin{rationale}
Note that the second condition is slightly less stringent than requiring
all expressions to be constant.  This allows calls to
{\tt NUMBER_OF_PROCESSORS} or {\tt PROCESSORS_SHAPE} to appear
without violating the condition.
\end{rationale}

\end{itemize}

Variables in {\tt COMMON} or having the {\tt SAVE} attribute may be
mapped to a locally declared processor arrangement, but because the
first condition cannot hold for such variables (they don't become
undefined), the second condition must be observed.
This allows {\tt COMMON} variables to work properly through the customary
strategy of putting identical declarations in each scoping unit that
needs to use them, while allowing the processor arrangements to which
they may be mapped to depend on the value returned
by {\tt NUMBER_OF_PROCESSORS}. (See section~\ref{sequence} for
further information on mapping common variables.)
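
A sketch of this strategy, with the hypothetical names {\tt ONE},
{\tt TWO}, {\tt /SHARED/}, {\tt U}, and {\tt PROCS}:
                                                                        \CODE
      SUBROUTINE ONE
      REAL U(1000)
      COMMON /SHARED/ U
!HPF$ PROCESSORS PROCS(NUMBER_OF_PROCESSORS())
!HPF$ DISTRIBUTE U(BLOCK) ONTO PROCS
      ...
      END SUBROUTINE ONE

      SUBROUTINE TWO
      REAL U(1000)
      COMMON /SHARED/ U
!HPF$ PROCESSORS PROCS(NUMBER_OF_PROCESSORS())
!HPF$ DISTRIBUTE U(BLOCK) ONTO PROCS
      ...
      END SUBROUTINE TWO
                                                                        \EDOC
Each arrangement {\tt PROCS} is local to its own subroutine, but both are
always defined with the same bounds, so the second condition holds even
though the extent is not a constant expression.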

\begin{implementors}
It may be desirable to have a way for the user to specify at compile
time the number of physical processors on which the program is to be
executed.  This might be specified either by an implementation-dependent
directive, for example, or through the programming environment (for
example, as a UNIX command-line argument).  Such facilities are
beyond the scope of the HPF specification, but as food for thought we
offer the following illustrative hypothetical examples:
                                                                        \CODE
!Declaration for multiprocessor by ABC Corporation
!ABC$ PHYSICAL PROCESSORS(8)
!Declaration for MPP by XYZ Incorporated
!XYZ$ PHYSICAL PROCESSORS(65536)
!Declaration for hypercube machine by PDQ Limited
!PDQ$ PHYSICAL PROCESSORS(2,2,2,2,2,2,2,2,2,2)
!Declaration for two-dimensional grid machine by TLA GmbH
!TLA$ PHYSICAL PROCESSORS(128,64)
!One of the preceding might affect the following
!HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS())
                                                                        \EDOC

It may furthermore be desirable to have a way for the user to specify
the precise mapping of the processor arrangement declared in a
{\tt PROCESSORS} statement to the physical processors of the executing
hardware.  Again, this might be specified either by an
implementation-dependent directive or through the programming
environment (for example, as a UNIX command-line argument);
such facilities are beyond the scope of the HPF specification, but as
food for thought we offer the following illustrative hypothetical
example:
                                                                        \CODE
!PDQ$ PHYSICAL PROCESSORS(2,2,2,2,2,2,2,2,2,2,2,2,2)
!HPF$ PROCESSORS G(8,64,16)
!PDQ$ MACHINE LAYOUT G(:GRAY(0:2),:GRAY(6:11),:BINARY(3:5,12))
                                                                        \EDOC
This might specify that the first dimension of {\tt G} should use hypercube
axes 0, 1, 2 with a Gray-code ordering; the second dimension should
use hypercube axes 6 through 11 with a Gray-code ordering; and the
third dimension should use hypercube axes 3, 4, 5, and 12 with a
binary ordering.
\end{implementors}




\section{TEMPLATE Directive}
\label{TEMPLATE-SECTION}

The {\tt TEMPLATE} directive declares one or more templates, specifying for
each the name, the rank (number of dimensions), and the extent in each
dimension.  It must appear in the {\it specification-part} of a scoping unit.

In the language of section 14.1.2 of the Fortran standard,
templates are local entities of class (1); therefore a template may
not have the same name as a variable, named constant, internal
procedure, etc., in the same scoping unit.  Template names obey the
same rules for host and use association as other names in the
list in section 12.1.2.2.1 of the Fortran standard.


A template declared in a module has the default accessibility
of the module.

\begin{rationale}
Because the name of a template is not a first-class entity
in HPF, but must appear only in directives, it cannot appear in an
{\it access-stmt} ({\tt PRIVATE} or {\tt PUBLIC}).
If directives ever become full-fledged Fortran statements rather
than structured comments, then it would be appropriate to allow
the accessibility of a template to be controlled
by listing its name in an {\it access-stmt}.
\end{rationale}



A template is simply an abstract space of indexed positions; it can
be considered as an ``array of nothings'' (as compared to an ``array of
integers,'' say).
%%%If an array is a cat, then a template is a Cheshire
%%%cat, and the index space is the grin.
A template may be used as an abstract {\it align-target} that may
then be distributed.
                                                                        \BNF
template-directive             \IS  TEMPLATE template-decl-list

template-decl              \IS  template-name [ ( explicit-shape-spec-list ) ]

                                                                        \FNB

Examples:
                                                                        \CODE
!HPF$ TEMPLATE A(N)
!HPF$ TEMPLATE B(N,N), C(N,2*N)
!HPF$ TEMPLATE DOPEY(100,100),SNEEZY(24),GRUMPY(17,3,5)
                                                                        \EDOC

If the ``{\tt ::}'' syntax is used, then the declared templates may optionally
be distributed in the same {\it combined-directive}.  In this case
all templates declared by the
directive must have the same rank so that the {\tt DISTRIBUTE} attribute
will be meaningful.  The {\tt DIMENSION} attribute may also be used.
                                                                        \CODE
!HPF$ TEMPLATE, DISTRIBUTE(BLOCK,*) ::    &
!HPF$                              WHINEY(64,64),MOPEY(128,128)
!HPF$ TEMPLATE, DIMENSION(91,91) :: BORED,WHEEZY,PERKY
                                                                        \EDOC

Templates are useful in the particular situation where one must align
several arrays relative to one another but there is no need to
declare a single array that spans the entire index space of interest.
For example, one might want four \( N \times N \) arrays aligned to the four
corners of a template of size \( (N+1) \times (N+1) \):
                                                                        \CODE
!HPF$ TEMPLATE, DISTRIBUTE(BLOCK, BLOCK) :: EARTH(N+1,N+1)
      REAL, DIMENSION(N,N) :: NW, NE, SW, SE
!HPF$ ALIGN NW(I,J) WITH EARTH( I , J )
!HPF$ ALIGN NE(I,J) WITH EARTH( I ,J+1)
!HPF$ ALIGN SW(I,J) WITH EARTH(I+1, J )
!HPF$ ALIGN SE(I,J) WITH EARTH(I+1,J+1)
                                                                        \EDOC
Templates may also be useful in making assertions about the mapping of
dummy arguments (see Section~\ref{ch-mapping-subr}).

Unlike arrays, templates cannot be in {\tt COMMON}.  So two templates
declared in different scoping units will always be distinct, even if
they are given the same name.  The only way for two program units to
refer to the same template is to declare the template in a module that
is then used by the two program units.
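
A minimal sketch of this arrangement, with the hypothetical names
{\tt GRID_MAP}, {\tt GRID}, {\tt RELAX}, and {\tt RESIDUAL}:
                                                                        \CODE
      MODULE GRID_MAP
!HPF$ TEMPLATE GRID(100,100)
!HPF$ DISTRIBUTE GRID(BLOCK,BLOCK)
      END MODULE GRID_MAP

      SUBROUTINE RELAX
      USE GRID_MAP
      REAL U(100,100)
!HPF$ ALIGN U(I,J) WITH GRID(I,J)
      ...
      END SUBROUTINE RELAX

      SUBROUTINE RESIDUAL
      USE GRID_MAP
      REAL V(100,100)
!HPF$ ALIGN V(I,J) WITH GRID(I,J)
      ...
      END SUBROUTINE RESIDUAL
                                                                        \EDOC
Both {\tt U} and {\tt V} are aligned with the one template {\tt GRID}.
Had each subroutine declared its own template named {\tt GRID}, the two
templates would have been distinct.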

Templates are not passed through the subprogram argument interface.
The template to which a dummy argument is aligned is always distinct
from the template to which the actual argument is aligned, though it
may be a copy (see section~\ref{INHERIT-SECTION}).  On exit from a
subprogram, an HPF implementation arranges that the actual argument is
aligned with the same template with which it was aligned before the
call.

Returning from a subprogram causes all templates declared local to that
subprogram to become undefined.  It is not HPF-conforming for any
variable to be aligned to a template at the time the template becomes
undefined unless at least one of two conditions holds:

\begin{itemize}

\item The variable itself becomes undefined at the same time by virtue
of returning from the subprogram.

\item Whenever the subprogram is called, the template is always locally
defined in the same way, with identical lower bounds, identical upper
bounds, and identical distribution information (if any) onto
identically defined processor arrangements (see
section~\ref{PROCESSORS-SECTION}).

\begin{rationale}
Note that this second condition is slightly less stringent than
requiring all expressions to be constant.  This allows calls to {\tt
NUMBER_OF_PROCESSORS} or {\tt PROCESSORS_SHAPE} to appear without
violating the condition.
\end{rationale}

\end{itemize}

\noindent
Variables in {\tt COMMON} or having the {\tt SAVE} attribute may be
mapped to a locally declared template, but because the first condition
cannot hold for such variables (they don't become undefined), the second
condition must be observed.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Material from the old chapter 7
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Storage and Sequence Association}
\label{sequence}

HPF allows the mapping of
data objects across multiple processors in order to improve
parallel performance.  Fortran specifies
relationships between the storage for data objects associated
through {\tt COMMON} and {\tt EQUIVALENCE} statements, and the order of
array elements during association at procedure boundaries
between actual arguments and dummy arguments.
Otherwise, the location of data is not constrained by the
language.

{\tt COMMON} and {\tt EQUIVALENCE} statements constrain the alignment
of different data items based on the underlying model
of storage units and storage sequences:

\begin{quotation}
{\em Storage association is the association of two or more
data objects that occurs when two or more storage sequences
share or are aligned with one or more storage units.}
--- Fortran Standard (14.6.3.1)

\end{quotation}
The model of storage association is a single linearly
addressed memory, based on the traditional single address
space, single memory unit architecture. This model can cause
severe inefficiencies on architectures where storage for
data objects is mapped.

Sequence association refers to the order of array elements
that Fortran requires when an array expression or array
element is associated with a dummy array argument:

\begin{quotation}
{\em The rank and shape of the actual argument need not agree
with the rank and shape of the dummy argument, \ldots}

--- Fortran Standard (12.4.1.4)

\end{quotation}
As with storage association, sequence association
is a natural concept only in systems with
a linearly addressed memory.

As an aid to porting FORTRAN 77 codes, HPF allows codes that
rely on sequence and storage association to be valid in HPF.
Some modification to existing FORTRAN 77 codes may
nevertheless be necessary.
This section explains the relationship between HPF data
mapping and sequence and storage association.

\subsection{Storage Association}

\subsubsection{Definitions}

\label{sequence-defs}

\begin{enumerate}

\item {\tt COMMON} blocks are either {\it sequential}
or {\it nonsequential}, as determined by either explicit
directive or compiler default.  A sequential {\tt COMMON} block
has a single common  block storage sequence (Fortran Standard 5.5.2.1).

\item An {\it aggregate variable group}
is a collection of variables whose individual storage
sequences are parts of a single storage sequence.

Variables associated by {\tt EQUIVALENCE} statements
or by a combination of {\tt EQUIVALENCE} and {\tt COMMON} statements
form an aggregate variable group. The variables of a
sequential {\tt COMMON} block form a single aggregate variable
group.


\item The {\it size} of an aggregate variable group is the
number of storage units in the group's storage sequence
(Fortran Standard 14.6.3.1).

\item  \label{seq-var} Data objects are either
{\it sequential} or {\it nonsequential}.
A data object is {\it sequential} if and only if any of the
following holds:


\begin{enumerate}

\item it appears in a sequential {\tt COMMON} block;

\item it is a member of an aggregate variable group;

\item it is an assumed-size array;

\item it is a structure component of a derived type with
the Fortran {\tt SEQUENCE} attribute; or

\item it is declared to be sequential in an HPF {\tt SEQUENCE}
directive.

\end{enumerate}

A sequential object can be storage associated or
sequence associated;  nonsequential objects cannot.

\item A {\tt COMMON} block contains a sequence of {\it components}.
Each component is either an aggregate
variable group, or a variable that is not a member of any
aggregate variable group.
Sequential {\tt COMMON} blocks contain a single component.
Nonsequential {\tt COMMON} blocks may
contain several components, each of which may be a
nonsequential variable, a sequential variable, or
an aggregate variable group.




\end{enumerate}


\subsubsection{Examples of Definitions}

                                                                \CODE
      IMPLICIT REAL (A-Z)
      COMMON /FOO/ A(100), B(100), C(100), D(100), E(100)
      DIMENSION X(100), Y(150), Z(200)


!Example 1:
      EQUIVALENCE ( A(1), Z(1) )
!Four components: (A, B), C, D, E
!Sizes are: 200, 100, 100, 100

!Example 2:
      EQUIVALENCE ( A(51), X(1) ), ( B(100), Y(1) )
!Two components: (A, B, C, D), E
!Sizes are: 400, 100

!Example 3:
!HPF$ SEQUENCE /FOO/
!The COMMON has one component, (A, B, C, D, E)
!Size is 500
                                                                \EDOC
\noindent
The {\tt COMMON} block {\tt /FOO/} is nonsequential
in Examples 1 and 2.
Aggregate variable groups are shown as components in
parentheses.
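
The sizes in Examples 1 and 2 can be checked by numbering the storage
units of {\tt /FOO/} consecutively, on the assumption (true for default
{\tt REAL} in Fortran) that each array element occupies one numeric
storage unit.  Then {\tt A} occupies units 1 to 100, {\tt B} units 101
to 200, {\tt C} units 201 to 300, {\tt D} units 301 to 400, and {\tt E}
units 401 to 500.

\begin{itemize}

\item In Example 1, {\tt Z(1)} coincides with {\tt A(1)}, so {\tt Z}
occupies units 1 to 200 and ties {\tt A} and {\tt B} into one group of
size 200.

\item In Example 2, {\tt X(1)} coincides with {\tt A(51)}, so {\tt X}
occupies units 51 to 150 and ties {\tt A} to {\tt B}; {\tt Y(1)}
coincides with {\tt B(100)}, so {\tt Y} occupies units 200 to 349 and
ties {\tt B} to {\tt C} and {\tt D}.  The group therefore spans units 1
to 400, and {\tt E} remains a separate component of size 100.

\end{itemize}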

\subsection {Sequence Directive}

A {\tt SEQUENCE} directive is defined to allow a user to declare
explicitly that data objects or {\tt COMMON}  blocks are to be treated
by the compiler as sequential.  ({\tt COMMON} blocks are by default
nonsequential.  Data objects are
nonsequential unless Definition~\ref{seq-var} applies.)
Some implementations may supply an  optional compilation
environment where the {\tt SEQUENCE} directive is applied by
default.  For completeness in such an environment, HPF
defines a {\tt NO SEQUENCE} directive to allow a user to
establish that the usual
nonsequential default should apply to a scoping unit, or selected
data objects and {\tt COMMON} blocks within the scoping unit.

                                        \BNF
sequence-directive  \IS    SEQUENCE [ [ :: ] association-name-list ]
                    \OR NO SEQUENCE [ [ :: ] association-name-list ]

association-name        \IS object-name
                        \OR / [ common-block-name ]  /
                                        \FNB

\begin{constraints}

\item An object name or {\tt COMMON} block name may appear at most once
in a {\it sequence-directive} within any scoping unit.


\item Only one sequence directive with no {\it association-name-list}
is permitted in the same scoping unit.


\end{constraints}
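
For illustration, the following sketch (with the hypothetical names
{\tt /BUF/}, {\tt P}, {\tt Q}, {\tt W}, and {\tt V}) shows the two
directives used with an {\it association-name-list}:
                                                                        \CODE
      COMMON /BUF/ P(100), Q(100)
      REAL W(50), V(50)
!The COMMON block /BUF/ and the array W are to be treated as sequential
!HPF$ SEQUENCE :: /BUF/, W
!The array V is to be treated as nonsequential
!HPF$ NO SEQUENCE :: V
                                                                        \EDOC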

A pointer declared with the {\tt SEQUENCE} attribute can be
associated only with sequential objects, and conversely.

\subsubsection {Storage Association Rules}

\begin{enumerate}

\item A {\it sequence-directive} with an empty {\it
association-name-list}  is treated as if it contained the names of all
implicitly mapped objects and {\tt COMMON} blocks in the scoping unit
that cannot otherwise be determined to be sequential or nonsequential
by their language context.

\item A sequential object may not be explicitly mapped.

\item No explicit mapping may be given for a component of a derived
type having the Fortran {\tt SEQUENCE} attribute.
Note
that this rule is applicable only under the approved extensions
since components of derived types cannot be explicitly mapped in HPF.


\item If a {\tt COMMON} block is nonsequential, then all of the
following must hold:

\begin{enumerate}

\item Every occurrence of the {\tt COMMON}  block has exactly the
same number of components with each corresponding component
having a storage sequence of exactly the same size;

\item If a component is a nonsequential variable in {\it any}
occurrence of the {\tt COMMON} block, then it must be
nonsequential with identical  type, shape, and mapping
attributes in {\it every}  occurrence of the {\tt COMMON} block; and

\item Every occurrence of the {\tt COMMON}  block must be
nonsequential.

\end{enumerate}

\end{enumerate}
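
The last rule can be sketched as follows, assuming a nonsequential
{\tt COMMON} block whose variables are explicitly mapped.  The names
{\tt STEP1}, {\tt STEP2}, {\tt /FIELD/}, {\tt U}, and {\tt V} are
hypothetical.  Both occurrences of the block declare the same
components with the same type, shape, and mapping, which is what the
rule appears to require.

\begin{verbatim}
! Hypothetical names, for illustration only.
      SUBROUTINE STEP1
      REAL U(1000), V(1000)
      COMMON /FIELD/ U, V
!HPF$ DISTRIBUTE U(BLOCK)
!HPF$ DISTRIBUTE V(BLOCK)
      U = 0.0
      END

      SUBROUTINE STEP2
! Same components, types, shapes, and mappings as in STEP1.
      REAL U(1000), V(1000)
      COMMON /FIELD/ U, V
!HPF$ DISTRIBUTE U(BLOCK)
!HPF$ DISTRIBUTE V(BLOCK)
      V = U
      END
\end{verbatim}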


\subsubsection{Storage Association Discussion}
\begin{users}
Under these rules, variables in a {\tt COMMON} block can be mapped
as long as the components of the {\tt COMMON} block are the same in
every scoping unit that declares the {\tt COMMON} block.

Correct Fortran programs will not necessarily
be correct without modification in HPF.
The use of
{\tt EQUIVALENCE} with {\tt COMMON} blocks can impact the
mappability of data
objects in subtle ways. To allow maximum
optimization for performance, the HPF default for data objects is to
consider them mappable.
In order to get correct separate compilation for
subprograms that use {\tt COMMON} blocks with different aggregate
variable groups in different scoping units, it will be
necessary to insert the HPF {\tt SEQUENCE} directive.

As a check-list
for a user to determine the status of a data object or {\tt COMMON} block,
the following questions can be applied, in order:


\begin{itemize}

\item Does the object appear in some explicit language context
which dictates that the object be
sequential (e.g. {\tt EQUIVALENCE}) or nonsequential?

\item If not, does the object appear in an explicit mapping
directive?

\item If not, does the object or {\tt COMMON} block name appear in the
list of names on a {\tt SEQUENCE} or {\tt NO SEQUENCE} directive?

\item If not, does the scoping unit contain a nameless {\tt SEQUENCE}
 or {\tt NO SEQUENCE}?

\item If not, is the compilation affected by some special
implementation-dependent environment which dictates that names
default to {\tt SEQUENCE}?

\item If not, then the compiler will consider the object or {\tt COMMON}
block name nonsequential and is free to apply data mapping
optimizations disregarding Fortran sequence and storage association.

\end{itemize}
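
As a sketch of the case described earlier in this discussion, suppose
two scoping units declare the same {\tt COMMON} block with different
aggregate variable groups.  The names {\tt INIT}, {\tt SOLVE},
{\tt /WORK/}, {\tt A}, {\tt B}, and {\tt W} are hypothetical.  Because
the components differ between the two occurrences, the block cannot be
nonsequential, so each occurrence is declared sequential:

\begin{verbatim}
! Hypothetical names, for illustration only.
      SUBROUTINE INIT
      REAL A(100), B(100)
      COMMON /WORK/ A, B
!HPF$ SEQUENCE /WORK/
      A = 0.0
      B = 1.0
      END

      SUBROUTINE SOLVE
! The same storage is viewed here as one 200-element array.
      REAL W(200)
      COMMON /WORK/ W
!HPF$ SEQUENCE /WORK/
      W = 2.0 * W
      END
\end{verbatim}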

\end{users}

\begin{implementors}
In order to protect the user and to facilitate portability of
older codes, two implementation options are strongly
recommended. First, every implementation should supply some
mechanism to verify that the type and shape of every mappable
array and the sizes of aggregate variable groups in  {\tt COMMON} blocks
are the same in every scoping unit unless the {\tt COMMON}  blocks are
declared to be sequential. This same check should
also verify that identical mappings have been selected for
the variables in {\tt COMMON}  blocks.  Implementations without
interprocedural information can use a link-time check. The
second implementation option recommended is a
mechanism to declare that data objects and {\tt COMMON} blocks for
a given compilation should be considered sequential unless
declared otherwise. The purpose of this feature is to permit
compilation of large old libraries or  subprograms where
storage association is known to exist without requiring that
the code be modified to apply the HPF
{\tt SEQUENCE} directive to every {\tt COMMON} block.
\end{implementors}

From owner-hpff-doc  Wed Nov  6 22:13:42 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id WAA04002 for hpff-doc-out; Wed, 6 Nov 1996 22:13:42 -0600 (CST)
Received: from [128.42.5.176] (pasyn-59.rice.edu [128.42.5.187]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id WAA03986; Wed, 6 Nov 1996 22:13:33 -0600 (CST)
X-Sender: chk@titan.cs.rice.edu (Unverified)
Message-Id: <v01540b13aea6c1c36648@[128.42.5.176]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 6 Nov 1996 22:13:19 -0600
To: offner@hpc.pko.dec.com (Carl Offner)
From: chk@cs.rice.edu (Chuck Koelbel)
Subject: Re: hpff-doc: mapping chapters
Cc: hpff-doc@cs.rice.edu
Sender: owner-hpff-doc
Precedence: bulk

>I am following this with new versions of the three mapping chapters:
>
>        mapping-base.tex
>        mapping-subr.tex
>        mapping-ext.tex
>
>...

Thanks, Carl.

About making new drafts available:
I will put all new chapters that I receive into
ftp://titan.cs.rice.edu/public/HPFF/work-in-progress/.  Everything will be
handled under RCS, so we can recover old files if my FTP connection fails.
I won't repeat the announcement to hpff-doc unless there are problems with
the new file.

Carl's chapters will take a little longer to bring up than most.   Reason:
I'm on the road, using a borrowed PowerBook with a possibly flaky modem.
This is also why you may see the response to this message kind of late...


************************************************************
Chuck Koelbel                             Research Scientist
Center for Research on Parallel Computation  Rice University
                 "History is made at night.
           Character is what you are in the dark."
************************************************************



From owner-hpff-doc  Fri Nov  8 17:52:12 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id RAA24193 for hpff-doc-out; Fri, 8 Nov 1996 17:52:12 -0600 (CST)
Received: from timbuk.cray.com (root@timbuk.cray.com [128.162.19.7]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id RAA24184 for <hpff-doc@cs.rice.edu>; Fri, 8 Nov 1996 17:52:06 -0600 (CST)
Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.7.5/CRI-gate-8-2.11) with SMTP id RAA00149 for <hpff-doc@cs.rice.edu>; Fri, 8 Nov 1996 17:51:58 -0600 (CST)
Received: from hickory101.cray.com (tam@hickory101 [128.162.143.1]) by ironwood.cray.com (8.6.12/CRI-ccm_serv-8-2.8) with ESMTP id RAA03694; Fri, 8 Nov 1996 17:51:53 -0600
From: Thomas MacDonald <tam@cray.com>
Received: by hickory101.cray.com (8.6.12/btd-b3)
          id RAA02394; Fri, 8 Nov 1996 17:51:51 -0600
Message-Id: <199611082351.RAA02394@hickory101.cray.com>
Subject: hpff-doc: HPF Comments
To: hpff-doc@cs.rice.edu
Date: Fri, 8 Nov 1996 17:51:51 -0600 (CST)
Cc: tam@cray.com (Thomas MacDonald)
X-Mailer: ELM [version 2.4 PL24-CRI-b]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-doc
Precedence: bulk

I recently sent Email to Mary Zosel explaining that I'll be taking over
Andy Meltzer's HPF responsibilities here at Cray Research (an SGI
company).  Larry Meadows sent me the following comments and I agreed
to respond.  I hope they are not too late to be of use.

I'm a little new to this so please let me know when I'm missing the
point or heading down the wrong path.

I suppose I need to be on some Email reflectors, but I'm not sure
which ones.

I printed a copy of the HPF 2.0.delta, dated October 19.  My comments 
below are based on that version.

Tom MacDonald
tam@cray.com


> >From owner-hpff-doc@cs.rice.edu  Tue Sep 17 05:49:06 1996
> From: <zongaro@vnet.ibm.com> (Henry Zongaro)
> Subject: hpff-doc: Comments on Portable/Efficient Constructs
> To: meltzer@cray.com, chk@cs.rice.edu, hpff-doc@cs.rice.edu
> Date: Tue, 17 Sep 1996 08:41:02 -0400 (EDT)
> X-Mailer: ELM [version 2.4 PL24alpha3]
> 
> Hello,
> 
>      I have the following comments on the "Portable and Efficient Constructs"
> chapter.  My page references are with respect to the August 17 draft.  My
> apologies if any of these have already been corrected:
> 
>  - In section 7.1.1, I think "(em i.e. *)" is a typo.

I couldn't find a section 7.1.1 anywhere.  I suspect things changed since
your comments were made.  It appears that your comments are now against
section 8.

Anyway, I couldn't find the typo - so I hope it's already been fixed.


>  - In section 7.3, there are certain requirements being placed on HPF
>    compilers, whereas this chapter is supposed to describe the features which
>    a user should use.

Usually specifications are a treaty between the users and the implementors.
If the users follow the rules then the implementors will too (ideally).
So, specifications always place constraints on both sides (seems to me).
Section 8.3 is all about the RESIDENT directive.  This has been quite
useful here at Cray because it allows the users to increase the number
of local references.  


>                       In addition, I'm not sure I entirely agree with the
>   defaults being described.  In particular, I know of an implementation in
>   which the default processors arrangement selected will not always be the
>   same for identical shape distributees with identical mappings.  For example,
> 
>            program prog
>              integer a(100, 100)
>      !hpf$   processors p(number_of_processors()/2, 2)
>      !hpf$   distribute a(block, block)
>            end program prog
> 
>            subroutine sub
>              integer a(100, 100)
>      !hpf$   distribute a(block, block)
>            end subroutine sub
> 
>    In prog, the array will be distributed onto p, so on 16 processors this
>    would be an 8x2 processors arrangement, but in sub, a 4x4 arrangement will
>    be selected.

You state that "I'm not sure I entirely agree with the defaults being
described" but you do not say which default you want changed or what
you think the defaults should be.

The example you describe above is non-intuitive (IMHO).  I believe more
details are needed before we can come to a conclusion here.  The defaults
are based upon our best understanding, an existing implementation, and
a fair amount of testing.  It would be helpful to get some more details
here.  It isn't at all clear why, in the example above, different
processor arrangements are desired.  Nor is it clear what the performance
advantage is, or how much performance will suffer if the same processor
arrangement is used.


> - In section 7.4, the precise meaning of "Alignments may not contain offsets
>   or strides" needs to be spelled out.  Also, the way I read it, it means that
>   the following is not allowed

Sorry, I'm having trouble finding this reference.  Perhaps you could provide
the title of the section.



>           integer a(10), b(0:9)
>     !hpf$ align a(i) with b(i-1)

It seems that there is no need for this case.  The following directive
does nicely:

    !hpf$ align a with b

The presence of offsets and strides greatly increases the complexity of the
analysis when the compiler is trying to determine locality.  If we're
looking for a performance subset, then compiler implementation costs and
compile time are a consideration.  Seems we need to ask ourselves how much
compile time is reasonable to provide good performance, and one of the
keys to good performance is increased locality.


>   although the following equivalent alignment is allowed, since I've specified
>   no explicit "offsets" or "strides".
> 
>           integer a(10), b(0:9)
>     !hpf$ align a(:) with b(:)

I'm not sure what the point is here.  The above directive is all that's
needed here and accomplishes my goal of not having offsets and strides.
So perhaps you're just pointing this out to make sure everyone
understands the intent.


>  - In 7.4, the first paragraph states that "In alignment expressions the
>    dimensions may not be permuted."  This is repeated as the third bullet in
>    the list that follows.
>
>  - In 7.4, the sentence that follows the list, the word "is" should be dropped.
> 
>  - In 7.8, further to the comment that begins "What does a naive programmer do
>    with this advice?", I believe the intent of this chapter was to steer the
>    user away from features which might perform poorly.  A DO loop that's
>    specified to be INDEPENDENT, but can't be parallelized will perform about
>    the same as the same DO loop without INDEPENDENT.  Telling them be careful
>    with INDEPENDENT doesn't really give them any benefit.
> 
>  - In 7.10, it is stated that HPF_LOCAL and HPF_SERIAL "allow a program to get
>    at the highest performing features of a particular architecture."  This
>    might be true of HPF_LOCAL, but is probably not true of HPF_SERIAL.
> 
>  - In 7.11, the second sentence begins "The intrinsics. . . ."  I believe this
>    should be "The HPF library procedures. . . ."  However, this brings up a
>    second point - is use of all intrinsics encouraged?  What about the
>    transformational intrinsics with non-constant DIM arguments?
> 
>  - 7.13.  Given the restrictions being put in place for pointers in HPF 2.0, I
>    think this section can probably go away.
> 
>  - In the second paragraph of 7.14, it's stated that assumed shape mapped
>    dummies may be used, that they may use any mapping syntax and that array
>    valued function results may be explicitly mapped.  However, there is no
>    indication as to whether these are restricted, recommended or not
>    recommended.
> 
>  - In 7.15, the second sentence, "not suggested" should be "not recommended".
> 
> Thanks,
> 
> Henry

I'm going to stop with my comments here.  I'm having trouble relating
the comments back to the HPF Spec.


> >From: "Scott B. Baden" <baden@cs.ucsd.edu>
> To: meltzer@cray.com, zosel@llnl.gov
> Subject: Coments on Extrinsic Interfaces
> Cc: baden@cs.ucsd.edu
> 
> Hi, I have a few comments on Appendix A.
> 
> page 242, line 4.  SPELL OUT the words for CCI
> Line 19 "A short paepr with the formal definition" 
> 
> How many pages?
> 
> line 47
> 
> should the "web mark-up language" be html?
> In any case, if we are asking for html, then we should provide a
> web submission procedure, i.e. a url to be accessible from
> the HPFF home page.
> 
> 
> Page 243  HPF_CRAFT.  
> Insert a reference to Chapter 7 at the end of the sentence on line 19.

This seems reasonable.


> The second paragraph (lines 20-26) is a little hard on the reader.
> I suggest beginning it as
> 
> "The combination of SPMD features with multi-threaded execution
> is a powerful one.  It provides the flexibility necessary
> to take advantage of low-level access, offering the highest performance
> of a more general programming model."

No problem.  I'm also looking at the sentence that refers to "chapter on
coding for portable performance in HPF."  Where is this chapter?


> Sentence on lines 22-23 "including HPF.."
> How does "including HPF data distribution"
> give access to the "highest performing aspects of both models"?

The sentence could be clearer.  The intent is that the HPF data
distribution directives are convenient, and HPF-Craft provides some
low-level access that when combined with the convenient distribution gives
good performance on the right platforms.  I'll look at it some more.


> Line 24.
> "HPF is not appropriate for all platorms..."
> What are the restrictions.

Well, a pure SIMD platform comes to mind.  We don't want to get into
the Biz of creating an exhaustive list cause we'll leave something out.
The sentence could be softened to something like:

    An implementation may decide that HPF-Craft is not
    appropriate for a particular platform.


> It may be easier to follow section A.2. if you simply fold it into the
> appropriate subsections of A.2.5 (you may have to reorder
> the subsections of A.2.5)  Many parts of A.2.5 are very sparse.

I'll read it again to see if it can be made clearer.  I think it's
a nice lead in.  The only time I think it'd be difficult to follow
is if you had no knowledge of the previous sections.  I'm open
to the opinion of others though.


> page 245, line 14.
> What is the purpose of the sentence
> "The user was allowed to specify this."

I agree this sentence could be improved.   The point is that the
user has control over where iterations are executed.  I'll look
at this some more.


> example 3 (discussion)
> make it clear that !HPF$ ATOMIC_UPDATE is part of HPF_CRAFT

OK.


> example 5
> Overall I felt that example 5 was too complicated at least as part of
> the overview section. Can you come up with something simpler?

I suppose, but it's kind of nice to have something a little more
complicated as the last example.  It ties things together some.


> The geometry directive bit doesn't illustrate the utility of the
> construct.  You only use it in one place.  I felt that this simply
> complicated the example, which was really about something else
> entirely.

But the point is that you change the GEOMETRY statement and the
distributions of FX, FY, and FZ all change too.  Note the difference with
FXP and FYP.  You have to change both distributions.


> I would take out the geometry part and move it to A.2.5.16
> (and choose an example that REALLY illustrates the utility of the
> construct)

I'll consider this.  Thanks for the suggestion.


> Page 246, section A.2.3.
> line 22 "S" in "Subprogrtam" should be in lower case.
> 
> Lines 29-30.
> You can cut the second sentence, it merely reiterates what you
> said in the first one.

Seems like it makes it clearer though.  I'll think about this too.


> Page 247.
> last paragraph of section A.2.3
> I found this hard to understand.
> Isn't the "dummy argument of the HPF_CRAFT subprogram"
> the same as the dummy argument of the "EXTRINSIC("HPF_CRAFT") subprogram"?

Yes.  Good point.


> section A.2.4.  I would put this before A.2.3  The model should come
> first.

I think I agree with you here.  I want to study it some more.


> lines 40 and 41.
> You use "these" to refer to "these barriers" and "these points"

I'm having trouble finding this one.  Maybe it was fixed already?


> Paragraph on lines 42-48.
> You should provide an example which illustrates the concepts.

Unfortunately my line numbers do not correspond with the cited line numbers.
I need more information.


> Line 42.
> Simplify this to:
> "Implicit barriers are placed wherever"
> (cut "which" from line 43)

Seems to have been fixed already?


> Page 248
> Line 21.  Be sure to insert a reference to Chapter 7 here.

Line number problems again.


> Line 39
> "Processors need not co-operate if there are only reads to non-local
> data."
> How do you define "cooperate?" on a machine that does not provide single
> sided communication, all processors in theory might need to
> communicate to carry out remote reads.

I'll work on a better specification.  I agree this needs to be better.


> Page 249 section A.2.5.3.
> Paragraph 3 (lines 19-25)  Where would this be useful?

Line number problems again.


> Last para. lines 26-29.  Can you give an example?

Sorry, I need new line numbers again.


> Page 25, lines 38-39
> Don't all processors have to participate when 
> an explicitly mapped array appears on the right hand side?
> This is similar to discussion above regarding line 39, pg. 248.

Sometimes you only want the elements that the participating
processors access.  For example, in a MASTER region:

     PRIVATE_ARRAY = MAPPED_ARRAY;

You don't need all processors to participate to make a local copy.


> Page 251.
> lines 26. Show syntax for CRITICAL.

Ok.


> Page 25, line 3.
> What is the "identifier for thetask executing the call"
> Is this the same as the physical processor, virtual processor?

I'm sorry, I can't find this reference.


> That's all folks!
> 
> Scott

Thanks for your comments.


Tom MacDonald
tam@cray.com

From owner-hpff-doc  Mon Nov 11 22:59:40 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id WAA16658 for hpff-doc-out; Mon, 11 Nov 1996 22:59:40 -0600 (CST)
Received: from [128.42.5.170] (pasyn-45.rice.edu [128.42.5.173]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id WAA16647 for <hpff-doc>; Mon, 11 Nov 1996 22:59:33 -0600 (CST)
X-Sender: chk@titan.cs.rice.edu
Message-Id: <v03007826aeadb31f8202@[128.42.5.170]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 11 Nov 1996 22:59:27 -0600
To: hpff-doc
From: Chuck Koelbel <chk@cs.rice.edu>
Subject: hpff-doc: New version on-line
Sender: owner-hpff-doc
Precedence: bulk

New files available in
ftp://titan.cs.rice.edu/public/HPFF/work-in-progress/:
	syntax-macs.tex
	hpf-report.tex
	hpf-local-ext.tex
	mapping-base.tex
	mapping-ext.tex
	mapping-subr.tex
	parallel.tex

Everyone who LaTeXs the document themselves should pick up the new
syntax-macs.tex (includes bug fixes - thank you, John May!) and
hpf-report.tex (increments the version number).

The new versions of the other files have been posted here previously, I
think.  Chapter authors should probably retrieve their own files to check
that Chuck's FTP didn't screw up again.


************************************************************
Chuck Koelbel                             Research Scientist
Center for Research on Parallel Computation  Rice University
                 "History is made at night.
           Character is what you are in the dark."
************************************************************



From owner-hpff-doc  Tue Nov 12 04:11:21 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id EAA19602 for hpff-doc-out; Tue, 12 Nov 1996 04:11:21 -0600 (CST)
Received: from unidhp1.uni-c.dk (unidhp1.uni-c.dk [130.228.3.41]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id EAA19597 for <hpff-doc@cs.rice.edu>; Tue, 12 Nov 1996 04:11:16 -0600 (CST)
Message-Id: <199611121011.EAA19597@cs.rice.edu>
Received: by unidhp1.uni-c.dk
	(1.39.111.2/16.2) id AA182003797; Tue, 12 Nov 1996 11:16:37 +0100
From: Joergen Moth <Jorgen.Moth@uni-c.dk>
Subject: hpff-doc: Typos on page 154 of Version 2.0.delta
To: hpff-doc@cs.rice.edu
Date: Tue, 12 Nov 1996 11:16:37 +0100 (MET)
Cc: unijm@unidhp1.uni-c.dk (Joergen Moth)
X-Mailer: ELM [version 2.4 PL25]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-doc
Precedence: bulk

On page 154 of the HPF Language Specification version 2.0.delta:

Line 26: "A=>B" should read "P=>B"
Line 40: "pointer A" should read "pointer P"

-- 
     Jorgen Moth                  e-mail: jorgen.moth@uni-c.dk
      U N I - C                   phone : +45 / 3587 8963
The Danish Computing Centre       fax   : +45 / 3587 8990
 for Research and Education       mail  : DTU, bldg. 304, DK-2800 Lyngby

From owner-hpff-doc  Sat Nov 16 11:23:34 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id LAA16772 for hpff-doc-out; Sat, 16 Nov 1996 11:23:34 -0600 (CST)
Received: from nz11.rz.uni-karlsruhe.de (nz11.rz.uni-karlsruhe.de [129.13.64.7]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id LAA16767 for <hpff-doc@cs.rice.edu>; Sat, 16 Nov 1996 11:23:13 -0600 (CST)
Message-Id: <199611161723.LAA16767@cs.rice.edu>
Received: from ry73.rz.uni-karlsruhe.de by nz11.rz.uni-karlsruhe.de with SMTP (PP); Sat, 16 Nov 1996 18:22:52 +0100
Received: by ry73.rz.uni-karlsruhe.de
	(1.38.193.4/16.2) id AA25732; Sat, 16 Nov 1996 18:22:51 +0100
Subject: hpff-doc: HPF v2.0.delta section 9
To: hpff-doc@cs.rice.edu
Date: Sat, 16 Nov 1996 18:22:50 +0100 (CET)
From: hennecke@rz.uni-karlsruhe.de (Michael Hennecke)
Reply-To: hennecke@rz.uni-karlsruhe.de (Michael Hennecke)
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-hpff-doc
Precedence: bulk

-----------------------------------------------
     Comments on HPF v2.0.delta, section 9
Michael Hennecke (hennecke@rz.uni-karlsruhe.de)
-----------------------------------------------

204:17   "Fortran 90" --> "Fortran 95"  (edits seem to be ok against F95)
204:31+  Insert:
	 "To the numbered list of 9.4.1, add:
	   (8) Optional return of an asynchronous data transfer ID"
204:34   "<<is>> ID" -->  "<<or>> ID"  (there can be only one <<is>>)
205:6+   Does this cover the case of absent ASYNCH= specifier?
206:18   "IOSTAT=<label>" --> "IOSTAT=<scalar-default-int-expr>"
206:30+  This paragraph says that WAIT always waits (that is, blocks).
	 The paragraph on 206:36+ implies that WAIT does not block
	 if DONE= is present. This should be made a bit more explicit.

Best regards,
Michael Hennecke

 ======================================================================
  Michael Hennecke      http://www.uni-karlsruhe.de/~Michael.Hennecke/ 
 ----------------------------------------------------------------------
  University of Karlsruhe         RFC822: hennecke@rz.uni-karlsruhe.de 
  Computing Center (G20.21 R210)               No longer on BITNET :-(
  Zirkel 2  *  P.O. Box 69 80                 Phone: +49 721  608-4862 
  D-76128  Karlsruhe                               Fax: +49 721  32550 
 ======================================================================

From owner-hpff-doc  Mon Nov 25 10:30:21 1996
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id KAA26982 for hpff-doc-out; Mon, 25 Nov 1996 10:30:21 -0600 (CST)
Received: from [128.42.5.161] (pasyn-33.rice.edu [128.42.5.161]) by cs.rice.edu (8.7.1/8.7.1) with ESMTP id KAA26923 for <hpff-doc>; Mon, 25 Nov 1996 10:30:02 -0600 (CST)
X-Sender: chk@titan.cs.rice.edu
Message-Id: <v03007804aebf75b565d2@[128.42.5.147]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 25 Nov 1996 10:12:08 -0600
To: hpff-doc
From: "Jerrold L. Wagener" <jwagener@ionet.net> (by way of Chuck Koelbel)
Subject: hpff-doc: async I/O
Sender: owner-hpff-doc
Precedence: bulk

To:       HPFF
From:     Jerry Wagener
Subject:  Asynchronous I/O
Date:     21 Nov 96

At its meeting last week X3J3 (unanimously) approved the attached
specification (ascii version of paper X3J3/96-158r2) for Asynchronous I/O in
Fortran 2000.  Though a number of details have yet to be fleshed out for this
specification, the committee has asked that I call to your attention that
158r2 appears to differ significantly from the asynchronous I/O facility
planned for HPF 2.

The 158r2 specification is intended for the entire Fortran audience, not just
the high performance community; therefore it is more "permissive", and indeed
need not be implemented asynchronously for the "hard" cases.  We hope,
however, that the HPF 2 specification can be modeled as closely as possible
upon the appropriate subset of 158r2.  As best we can tell, a subset of 158r2
maps upon the HPF 2 requirements exactly, and we will attempt to provide you
with the details of this subset as soon as possible.

Sincerely yours,

Jerry

----------------------------------------------------------------

                                               X3J3/96-158r2

To: X3J3
From: Rich Bleikamp
Subject: Syntax and Edits for Async I/O (tentative)
Date: Nov. 19, 1996

See paper 96-147r1 for the semantics previously approved by
X3J3 for this feature.

Major Changes since 96-158r1
     - Added an ASYNCHRONOUS statement  (per the straw
       vote).
     - changed ASYNC to ASYNCHRONOUS (attribute name,
       specifier name)
     - fixed 9.6.1.14: sort of, the phrase is still a
       little awkward.
     - added a conceptual model after the rationale.
     - fixed NAMELIST stuff, so only variables affected by
       the namelist are restricted like other list items.
     - I added a paragraph about implied-do-variables
       becoming undefined, in output data transfer
       statements.  This will prohibit the user from
       examining said variables, until the corresponding
       WAIT operation is performed.
     - functions referenced in item lists in async data
       transfer statements shall be PURE.
     - I addressed the issue of PRIVATE components needing
       the ASYNCHRONOUS attribute by adding the text
       ", or shall be a subobject of an object with the
       ASYNCHRONOUS attribute" wherever we required a
       variable to have that attribute.  I considered
       interp 140 (not approved yet) in this context.  The
       alternative is to change 6.1.2, where some
       attributes are inherited by components from their
       parent object.

Unresolved Issues
     -  We have not decided if ID= variables should be a new
       datatype, other than default integer (either a new
       intrinsic type, or an implicitly defined derived
       type).  This request came from Robert Corbett of
       Sun.
     - I've looked at the restrictions on namelist-group-
       objects not being pointers, and it is similar to
       other restrictions, namely, only arrays with
       constant bounds are permissible.  So I've decided
       not to do anything about this.  It is unrelated to
       asynchronous I/O.


"Notes to the reader"  are not notes to be included in the
standard.  Text to be included in the standard is either
"quoted" or indented.

Edits to 96-007R1:

In rule R426 (component-attr-spec), add:
     or  ASYNCHRONOUS

In rule R503 (attr-spec), add:
     or  ASYNCHRONOUS

and add a new section (page 57):
     5.1.2.12    ASYNCHRONOUS attribute
     The ASYNCHRONOUS attribute is required for some
     variables used in asynchronous input / output statements
     in some scoping units, as described in section 9.  This
     attribute may be specified for those variables, or any
     other variables, in other scoping units.

Note to reader: we allow any variable to have the
asynchronous attribute so users can remove ASYNCHRONOUS
specifiers from input / output statements without having to
delete the ASYNCHRONOUS attribute.

     Note: The ASYNCHRONOUS attribute is similar to the
     VOLATILE attribute provided by some processors, and is
     designed to safely permit traditional code motion
     optimizations in the presence of asynchronous input /
     output.  Variables in asynchronous input / output lists
     implicitly have the ASYNCHRONOUS attribute in the
     scoping unit of that asynchronous READ or WRITE
     statement, but shall have the ASYNCHRONOUS attribute in
     the scope of the corresponding wait operation, if the
     wait operation is in a different scoping unit.   -- End
     Note

Add a new section, 9.2.10 (and renumber 9.2.10 and later
sections):

     9.2.10  ASYNCHRONOUS  statement

     R5xx      is  ASYNCHRONOUS  [::]  object-name-list

     The ASYNCHRONOUS statement specifies the ASYNCHRONOUS
     attribute for a list of objects.

In rule R905 (OPEN statement connect-spec), add, after PAD=
(on its own line)(pg. 140):
     or  ASYNCHRONOUS

Add section 9.3.4.11 (page 142/143):

     9.3.4.11  ASYNCHRONOUS specifier in the OPEN statement

     If the ASYNCHRONOUS specifier is specified for a unit
     in an OPEN statement, then READ and WRITE statements
     for  that unit may include the ASYNCHRONOUS specifier
     in the control information list.

     The presence of an ASYNCHRONOUS specifier in a READ or
     WRITE statement permits (but does not require) a
     processor to perform the data transfer asynchronously.
     The WAIT, CLOSE, and file positioning statements may be
     used to wait for asynchronous input / output operations
     to complete, and the INQUIRE statement may be used to
     inquire whether or not asynchronous input / output
     operations have completed.

Note to the reader: the above rules imply only external unit
input / output may specify an ASYNCHRONOUS specifier for
READs and WRITEs, since internal files are not OPENed.

In section 9.3.5 (CLOSE statement), page 143, add the
following paragraph and
notes after line 5:

     Execution of a CLOSE statement causes the processor to
     wait for all pending data transfer operations for the
     specified unit to complete.

     Note:  A pending data transfer operation exists when a
     READ or WRITE statement with  the ASYNCHRONOUS
     specifier is executed, but the corresponding wait
     operation has not yet been executed.

     If a CLOSE statement is executed for a unit with
     pending data transfer operations, that CLOSE statement
     is considered to be the corresponding wait operation
     for the READ or WRITE statements that initiated those
     pending data transfer operations, and the CLOSE
     statement is considered to be a data transfer statement
     for purposes of end of file, end of record, and error
     processing.


     If the CLOSE statement corresponding to one or more
     asynchronous READ or WRITE statements is in a different
     scoping unit than the READ or WRITE statements, then
     every accessible variable in the input / output lists
     (including implied-do-variables) or namelists, and the
     variable specified in the SIZE= specifier, if any,
     shall have the ASYNCHRONOUS attribute, or shall be a
     subobject  of an object with the ASYNCHRONOUS
     attribute, in the scoping unit of that CLOSE statement.
     When a namelist group name is specified in an
     input/output statement,  variables in the namelist
     group which are not actually read or written by the
     input/output statement do not need to have the
     ASYNCHRONOUS attribute in the scope of the CLOSE
     statement.

In rule 912 (io-control-spec) (page 144), add:

     or  ASYNCHRONOUS
     or  ID = scalar-default-int-variable

Add the following constraint after the constraint on line
19, page 145:

     Constraint: An ASYNCHRONOUS specifier shall be present
     if an ID= specifier is  present.

     Constraint: An ASYNCHRONOUS specifier shall not be
     specified if the io-unit is an internal-file-unit.

Note to the reader: the first constraint implies that an ID=
specifier, typically used in a corresponding WAIT statement,
is NOT required in an asynchronous READ or WRITE statement.
Without one, the user has to CLOSE the unit (or execute
another wait operation) before referencing any storage
locations in an input list or namelist, and must NOT define
any storage locations referenced by an output list or
namelist until then.  This allows a knowledgeable user to
READ or WRITE massive amounts of data to a file, without
ever waiting for completion, as long as they close the file
or perform some other wait operation before
modifying/referencing any storage locations
referenced by an input / output list or namelist.
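
Illustration (not part of the proposed edits): the usage
pattern in the note above can be sketched with a toy model.
The sketch is in Python rather than Fortran, since the syntax
in this paper is still tentative.  The names AsyncUnit, write
and close are hypothetical, and a single worker thread is an
assumption made here so that records keep their program order.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncUnit:
    """Toy stand-in for a unit opened with the ASYNCHRONOUS specifier."""

    def __init__(self):
        self.records = []      # stands in for the external file
        self._pending = []     # transfers with no wait operation yet
        # One worker, so records are written in program order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _transfer(self, buffer):
        # The data are only looked at here, some time after write()
        # returned, which is why the caller must not redefine them.
        self.records.append(list(buffer))

    def write(self, buffer):
        """Start a transfer without an ID= and return immediately."""
        self._pending.append(self._pool.submit(self._transfer, buffer))

    def close(self):
        """CLOSE acts as the wait operation for every pending transfer."""
        for future in self._pending:
            future.result()    # re-raises an error from the transfer
        self._pending.clear()
        self._pool.shutdown()


unit = AsyncUnit()
buffers = [[i, i + 1] for i in range(5)]
for b in buffers:
    unit.write(b)          # no wait between the writes
unit.close()               # the only wait operation
buffers[0][0] = 99         # redefining the data is safe only now
```

After close() returns, unit.records holds the five records in
the order they were written, and the later change to
buffers[0] does not affect them.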

In section 9.4.1.9 (page 147), first sentence, insert

     without an ASYNCHRONOUS specifier

before "terminates", and add the following as the last
sentence of that paragraph:

     If an ASYNCHRONOUS specifier is present, the variable
     specified in the SIZE= specifier, if any, will become
     defined, with the value described above, when the wait
     operation corresponding to the non-advancing input
     statement is executed.  If an asynchronous data
     transfer statement appears in a different scoping unit
     than its corresponding wait operation, then the
     variable specified in the SIZE= specifier, if
     accessible, shall have the ASYNCHRONOUS attribute, or
     shall be a subobject  of an object with the
     ASYNCHRONOUS attribute, in the scoping unit of the
     corresponding wait operation.

     Note: A CLOSE, INQUIRE or a file positioning statement,
     as well as a WAIT  statement, can be a wait operation
     (9.3.5).


Insert a new section:

     9.4.1.10  Asynchronous specifier

     The ASYNCHRONOUS specifier indicates that this data
     transfer operation can be performed asynchronously.
     Records read or written by asynchronous data  transfer
     statements will be read, written, and processed in the
     same order as they would have been if the data transfer
     statement did not contain the ASYNCHRONOUS specifier.

     The ASYNCHRONOUS specifier shall not be present in a
     READ or WRITE statement unless the OPEN statement for
     the unit referenced in the READ or WRITE statement
     contained an ASYNCHRONOUS specifier.

     When a data transfer statement with the ASYNCHRONOUS
     specifier  is executed, the program shall not execute
     any statements that would cause any variable in the
     input / output list, namelist, any do-variable in the
     item list, or the variable specified in a SIZE=
     specifier  to become undefined  as described in 14.7.6,
     until the corresponding wait operation is performed.
     When a namelist group name is specified in an
     input/output statement with the ASYNCHRONOUS specifier, any
     variables in the namelist group which are not actually
     read or written by the input/output statement are not
     subject to the restrictions described in this
     paragraph.

     When a data transfer statement with the ASYNCHRONOUS
     specifier  is executed, the program shall not execute
     any statements which would cause the pointer
     association status of any variable in the input /
     output list, namelist, any do-variable in the item
     list, or a variable specified in the SIZE= specifier to
     change, or would cause any such variable to become
     associated with a different target, as described in
     14.6.2, until the corresponding wait operation is
     performed.  When a namelist group name is specified in
     an input/output statement, variables in the namelist
     group not actually read or written by the input/output
     statement are not subject to the  restrictions
     described in this paragraph.

     Note: These last two restrictions ensure that certain
     variables referenced in asynchronous data transfer
     statements must still exist and reference the same
     storage locations when the corresponding wait operation
     is performed, including the implicit CLOSE for open
     units when a program is exiting.

     When an input data transfer statement with the
     ASYNCHRONOUS specifier is executed, the input list or
     namelist items, and any implied-do-variables, and the
     variables specified in the SIZE= specifier, if any,
     become undefined until the corresponding wait operation
     is executed (9.3.5, 9.5).  When a namelist group name
     is specified in an input/output statement,  variables
     in the namelist group not actually read or written by
     the input/output statement do not become undefined.

     When an output data transfer statement with the
     ASYNCHRONOUS specifier is executed, the output list or
     namelist items, and any implied-do-variables in the
     item list,  shall not be redefined until the
     corresponding wait operation is executed (9.3.5, 9.5).
     When a namelist group name is specified in such an
     input/output statement,  variables in the namelist
     group not actually read or written by the input/output
     statement may be redefined before the corresponding
     wait operation.

     When an output data transfer statement with the
     ASYNCHRONOUS specifier is executed, any
     implied-do-variables in the item list become undefined
     until the corresponding wait operation is performed, at
     which time they become defined with the values they would
     have at the end of execution of the original READ or WRITE
     statement if that statement had not specified the
     ASYNCHRONOUS specifier.

     When an input/output data transfer statement with the
     ASYNCHRONOUS specifier is executed, all functions
     referenced in the item list shall be pure functions.

     Note:  This restriction on functions appearing in item
     lists for asynchronous data transfer statements applies
     to all function references, including those used in
     subscript, substring, and implied do loop calculations.
     End Note

Insert a new section 9.4.1.11:

     9.4.1.11  ID= specifier

     The ID= specifier identifies a variable which is
     assigned a processor dependent  value during the
     execution of an asynchronous data transfer statement.
     This  value can be used in a WAIT statement to force
     the processor to wait for a particular data transfer
     operation to complete.

In section 9.4.4, list item (5), change "namelist" to

     namelist,  except that if the ASYNCHRONOUS specifier
     was also present, the entities specified in the
     input/output list or namelist become undefined

In section 9.4.4, list item (8), change "defined" to

     defined, except that a variable specified in a SIZE=
     specifier becomes undefined if an ASYNCHRONOUS
     specifier was also specified

In section 9.4.4.4, page 152, before the paragraph that
starts "On output ...", insert the following paragraphs:

     If an ASYNCHRONOUS specifier is specified on a data
     transfer statement, the actual list processing and data
     transfers may occur during execution of the input
     statement, during execution of the corresponding wait
     operation, or anywhere in-between.   The data transfer
     operation is considered to be a pending data transfer
     operation until a corresponding wait operation is
     performed.

     If an ASYNCHRONOUS specifier is specified on an input
     statement, the list items or namelist variables, any do-
     variable in the item list, and the variable specified
     in the SIZE= specifier, if any, become undefined until
     the corresponding wait operation is executed (9.3.5,
     9.5).  When a namelist group name is specified in an
     input/output statement,  variables in the namelist
     group not actually read by the input statement do not
     become undefined.

     If an ASYNCHRONOUS specifier is specified on an output
     statement, the list items or namelist variables, and
     any do-variable in the item list shall not be redefined
     until the corresponding wait operation is executed
     (9.3.5, 9.5).  When a namelist group name is specified
     in an output statement,  variables in the namelist
     group not actually written by the input/output
     statement are not subject to the restrictions described
     in this paragraph.

     When a data transfer operation is performed
     asynchronously, any errors which  would have caused the
     ERR= branch on a non-asynchronous READ or WRITE to be
     taken, and the IOSTAT variable to be defined with a non-
     zero value, may instead occur during execution of the
     corresponding wait operation (a WAIT, CLOSE,  INQUIRE
     or file positioning statement) and take the ERR= branch
     of that wait operation instead.  If an ID= specifier is
     not present in the initiating READ or WRITE statement,
     the errors may occur during the execution of any
     subsequent input / output statement on that same unit,
     and not just during the corresponding wait operation.

Insert a new section 9.5, and renumber every section
thereafter appropriately:

     9.5  WAIT statement

     Execution of a WAIT statement causes the processor to
     wait for some previously initiated asynchronous data
     transfers to complete.

         R919  wait-statement is  WAIT (wait-spec-list)

         R920  wait-spec      is  [UNIT = ] external-file-unit
                              or  IOSTAT = scalar-default-int-variable
                              or  ERR = label
                              or  ID = scalar-default-int-variable
                              or  END = label
                         or  END = label

     Constraint: A wait-spec-list shall contain exactly one
     external-file-unit specifier, and may contain at most
     one of each of the other specifiers.

     Constraint: If the optional characters UNIT= are
     omitted from the unit specifier, the unit specifier
     shall be the first item in the wait-spec-list.


    (note to Richard Maine: insert other appropriate
    constraints, similar to the position-spec constraints,
    and one for the END=label branch target)

     The IOSTAT=, ERR=, and END= specifiers are described in
     x, x, and x respectively.

     If an ID= specifier is not present, the processor waits
     for all pending data transfers on the specified unit to
     complete, if any.  If an ID= specifier is present, the
     processor waits for the corresponding READ or WRITE
     operation to complete.  The corresponding READ or WRITE
     operation is that READ or WRITE which returned the same
     value for the ID= specifier for the specified unit.
     The value specified for the ID= specifier shall be a
     value returned by a READ or WRITE statement for the
     specified unit, for which a corresponding wait
     operation has not been executed.

     The data transfer operation specified in the
     corresponding READ or WRITE statement may happen when
     the WAIT statement is executed, when the corresponding
     READ or WRITE statement was previously executed, or
     anytime in-between.  The WAIT statement is considered
     to be a data transfer statement for purposes of end of
     file, end of record, and error processing.

     When the WAIT statement corresponding to a particular
     READ or WRITE statement is in a different scoping unit
     than the READ or WRITE statement, then every variable
     in the input / output list, namelist, any do-variable
     in the item list, or any variable specified in the
     SIZE= specifier, that is accessible, shall have the
     ASYNCHRONOUS attribute, or shall be a subobject  of an
     object with the ASYNCHRONOUS attribute, in the scoping
     unit of that WAIT statement.
     When a namelist group name is specified in an
     input/output statement,  variables in the namelist
     group not actually read or written by the input/output
     statement do not need to have the ASYNCHRONOUS
     attribute in the scope of that WAIT statement.

     Note: The CLOSE, INQUIRE, and file positioning
     statements, as well as the WAIT statement, can be a
     "wait" operation.

     Note: If an asynchronous READ attempts to read beyond
     the end of a file, then the end of file condition may
     occur either during execution of the READ statement or
     during execution of the  corresponding wait operation.
     If the end of file condition occurs during the wait
     operation, and there is not an END= or IOSTAT specifier
     in the statement which is the corresponding wait
     operation, then program execution terminates.  Error
     conditions are handled in a similar manner.

and renumber all subsequent rules.  (or insert the rule with
an unused rule number?)

In the old section 9.5 (File Positioning statements), add
the following after the last sentence in that section:

     Execution of a file positioning statement causes the
     processor to wait for all pending data transfer
     operations for the specified unit to complete.

     If a file positioning statement is executed for a unit
     with pending data transfer operations, that  statement
     is considered to be the corresponding wait operation
     for the READ or WRITE statements which initiated the
     pending data transfers, and is also considered to be a
     data transfer statement for purposes of end of file,
     error, and end of record processing.

     When a file positioning statement corresponding to one
     or more asynchronous  READ or WRITE statements is in a
     different scoping unit than the READ or WRITE
     statements, then every variable in those input / output
     lists, namelists, any do-variable in the item list,
     and the variable specified in a SIZE= specifier, that
     is accessible, shall have the ASYNCHRONOUS attribute,
     or shall be a subobject  of an object with the
     ASYNCHRONOUS attribute, in the scoping unit of that
     file positioning statement.  When a namelist group name
     is specified in an input/output statement,  variables
     in the namelist group not actually read or written by
     the input/output statement do not need the ASYNCHRONOUS
     attribute in the scoping unit of that file positioning
     statement.

In section 9.6.1, add the following to rule 924:
     or  ID = scalar-default-int-variable
     or  PENDING = scalar-default-logical-variable

and add these constraints around line 40 on page 156:
     Constraint: The ID=  and PENDING= specifiers shall not
     appear in an INQUIRE statement if the FILE = specifier
     is present.

     Constraint: If an ID= specifier is present, a PENDING=
     specifier shall also be present.

On page 159, add section 9.6.1.23
     9.6.1.23    ID= and PENDING= specifiers in the INQUIRE
       statement
     If an ID= specifier is not present in an INQUIRE
     statement, the variable specified in the PENDING=
     specifier is assigned the value true if there are any
     pending asynchronous data transfers for the specified
     unit which have not completed.  If an ID= specifier is
     present, the variable specified in the PENDING=
     specifier is assigned the value true if the data
     transfer identified by the ID= specifier for the
     specified unit has not yet completed.  In all other
     cases, the variable specified in the PENDING= specifier
     is set to false.

      If the variable specified in the PENDING= specifier is
     set to false, then the pending data transfer operations
     for this unit are considered to have completed, and
     this INQUIRE is the corresponding wait operation for
     the corresponding READ or WRITE statements.  When an
     ID= specifier is present, the corresponding operation
     is the READ or WRITE statement identified by the unit
     and ID= specifier value.  When an ID= specifier was not
     present, then this INQUIRE statement is the
     corresponding wait operation for all pending data
     transfer operations for the specified unit.  When the
     INQUIRE statement sets the variable specified in a
     PENDING= specifier to false and there are pending data
     transfer operations, that INQUIRE statement is
     considered to be a data transfer statement for purposes
     of end of file, end of record, and error processing.
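
Illustration (not part of the proposed edits): one reading of
the PENDING= rules above, written as a small Python model.
The class Unit and its methods are hypothetical names, and
transfers are "completed" by hand so the behavior is easy to
follow.

```python
class Unit:
    """Toy bookkeeping for the pending transfers of one unit."""

    def __init__(self):
        self._next_id = 1
        self._in_flight = set()   # started, data not yet transferred
        self._finished = set()    # transferred, no wait operation yet

    def start(self):
        """Model an asynchronous READ or WRITE with an ID= specifier."""
        ident = self._next_id
        self._next_id += 1
        self._in_flight.add(ident)
        return ident

    def complete(self, ident):
        """Model the transfer finishing in the background."""
        self._in_flight.remove(ident)
        self._finished.add(ident)

    def inquire_pending(self, ident=None):
        """Return the value the PENDING= variable would receive."""
        if ident is None:
            pending = bool(self._in_flight)
            if not pending:
                # INQUIRE is the wait operation for every transfer.
                self._finished.clear()
        else:
            pending = ident in self._in_flight
            if not pending:
                # INQUIRE is the wait operation for this one transfer.
                self._finished.discard(ident)
        return pending


u = Unit()
a, b = u.start(), u.start()
u.complete(a)
r1 = u.inquire_pending(a)   # transfer a has completed
r2 = u.inquire_pending(b)   # transfer b is still in flight
r3 = u.inquire_pending()    # something on the unit is in flight
u.complete(b)
r4 = u.inquire_pending()    # nothing is in flight any more
```

In this model r1 and r4 are false, and r2 and r3 are true.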

In section 9.6.1.14,  add the following sentence as the last
sentence of the paragraph.
     If there are pending data transfer operations for the
     specified unit, the value assigned to the NEXTREC=
     specifier is computed as if all the pending data
     transfers had already completed.


Note to the reader:  the POSITION= specifier does not appear
to need any edits.

Note to the reader.  In section 14, we discuss events
causing definition and undefinition of variables.  In item
(3) of 14.7.5, we discuss when input causes an item to be
defined, in terms of when the data is transferred, so no
edit is needed in (3).  Note that the second part of (3)
applies to internal units, which cannot be written to
asynchronously.

In section 14.7.5, item (5), change "an input/output
statement" to "an input/output statement without the
ASYNCHRONOUS specifier".

In section 14.7.5, item (8), change "statement" to
"statement without an ASYNCHRONOUS specifier".

In section 14.7.5, insert this new item (9), and renumber
the remaining items:
     (9)  Execution of a READ statement containing both an
     ASYNCHRONOUS and a SIZE= specifier may cause the
     variable specified in the SIZE= specifier to become
     defined, or the corresponding wait operation may cause
     that variable to become defined.   Either the READ
     statement or the corresponding wait operation will
     cause that variable to become defined.

In section 14.7.6, item (4), change "input/output statement"
to "input/output statement or its corresponding wait
operation".

In section 14.7.6, item (5), change "input/output statement"
to "input/output statement or its corresponding wait
operation".

In section 14.7.6, item (7), change "input statement" to
"input statement or its corresponding wait operation".

In section 14.7.6, add a new item (16) (the editor may
relocate to another part of the list if desired):
     Execution of a READ or WRITE statement with the
     ASYNCHRONOUS specifier causes all variables in the item
     list or namelist, all implied-do-variables in the item
     list, and the variable specified in the SIZE=
     specifier, if any, to become undefined.  Variables in a
     namelist group specified in such a READ or WRITE
     statement which are not actually read or written by the
     input/output statement do not become undefined.

Rationale: may be inserted in the appropriate annex if
desired.

Rationale for Asynchronous I/O

Rather than limit support for asynchronous I/O to what has
been traditionally provided by facilities such as BUFFERIN-
BUFFEROUT, this standard provides a more consistent, Fortran-
like syntax.  This permits the maximum possible level of
support for asynchronous I/O, as well as simplifying the
task of adapting existing standard conforming programs to
utilize asynchronous I/O.

Not all processors will actually provide support for
asynchronous I/O, nor will every processor which does
support asynchronous I/O be able to handle data transfer
statements with complicated I/O item lists in an
asynchronous manner.  Such processors can still be standard
conforming.   Hopefully, the documentation for each Fortran
processor will describe when, if ever, asynchronous I/O will
be performed.

Conceptual Model

This proposal accommodates two different conceptual models
for asynchronous I/O.

Model 1:
     the processor will only perform asynchronous I/O when
     the I/O list is sufficiently simple (perhaps one named
     array, contiguous (not assumed shape)), and possibly
     only when doing unformatted I/O.

     The primary impact of this model is that no processor
     is ever required to do asynchronous I/O.  Hence, the
     either/or requirement on ERR and EOF processing (either
     during the initial read/write, or at the wait
     operation).  Also, none of the complicated issues with
     I/O list interactions need to be dealt with (like  READ
     (...) N, (a(i),i=1,N) )

Model 2: The processor is free to do any of the following:
    on output, create a buffer inside the I/O library,
     completely formatted, and then start an async write of
     the buffer, and immediately return to the next
     statement in the program.  The processor is free to wait
     for previously issued WRITEs,  or not.

        OR

    pass off the I/O list to another processor/process,
     which will process the list items independently of the
     processor which executes the user's code.  There is
     still an ordering requirement on list item processing,
     to handle things like READ (...) N,(a(i),i=1,N).  But
     there are restrictions on the user to ensure that
     function calls in the i/o list, and implied-do-
     variables, are free to be called/defined
     asynchronously.  Hence the requirement that an implied-
     do-variable not be referenced or redefined by any other
     statement, including  another I/O statement, until the
     matching wait operation is executed, and that functions
     called as part of evaluating the I/O list be PURE.
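
Illustration (not part of the proposed edits): the ordering
requirement for a list such as READ (...) N,(a(i),i=1,N) can
be shown with a few lines of Python.  The function read_list
and the token list are made up for this sketch.  The point is
only that N has to be transferred before the bound of the
implied-do can be evaluated, whichever processor or process
walks the list.

```python
def read_list(tokens):
    """Process the item list  N, (a(i), i=1,N)  strictly left to right."""
    stream = iter(tokens)
    # First list item: N becomes defined from the input record.
    n = int(next(stream))
    # Second list item: the implied-do bound uses the N just read.
    a = [float(next(stream)) for _ in range(n)]
    return n, a


# The trailing value is left unread because N is 3.
n, a = read_list(["3", "1.5", "2.5", "3.5", "9.9"])
```

Processing the second item before the first would leave the
loop bound undefined, which is why list items keep their order
even when the transfer itself is asynchronous.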

One source of confusion is the role of the ID= values and
wait operations.
The standard allows a user to issue an infinite number of
async I/O requests, without waiting for any of them to
complete, and to then wait for any or all of them.  This
might seem to place a requirement on the runtime library to
keep track of all these outstanding requests, which is an
impossible and undesirable task.  The overhead is immense.
So, the model I used to guide the edits is:
    not all requests need be tracked by the runtime.   When
     the user does NOT specify an ID= specifier on a READ or
     WRITE, the runtime is free to forget about this
     particular request once it has successfully completed.
     If it gets an ERR or END condition, the processor is
     free  to report this during any I/O operation to that
     unit.

    When an ID= specifier is present, the runtime is
     required to keep track of  any END or ERR conditions
     for that specific I/O request.  However, if the I/O
     request succeeds without any exceptional conditions
     occurring, then the runtime can forget about that ID=
     value if it wishes.  Typically, I expect a runtime to
     only keep track of the last request made, or perhaps a
     very few.  Then, when a user WAITs for a particular
     request, either the library knows about it (and does
     the right thing w.r.t. error handling, etc.), or will
     assume it is one of those requests which successfully
     completed and was forgotten about (and will just return
     without signaling any end/err conditions).  It is
     incumbent on the user to only pass in valid ID= values.
     There is no requirement on the processor to detect
     invalid ID= values.
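
Illustration (not part of the proposed edits): the bookkeeping
model described in the two paragraphs above, as a minimal
Python sketch.  All names (Runtime, finish, wait, next_io) are
hypothetical, and finish() collapses "request started" and
"request completed" into one step to keep the sketch short.

```python
class Runtime:
    """Toy bookkeeping for one unit of a runtime library."""

    def __init__(self):
        self._next_id = 1
        self._conditions = {}        # ID -> "END" or "ERR", failures only
        self._unit_condition = None  # failure of a request without ID=

    def finish(self, outcome, with_id):
        """Record the outcome ("OK", "END" or "ERR") of one request."""
        if not with_id:
            if outcome != "OK":
                self._unit_condition = outcome
            return None
        ident = self._next_id
        self._next_id += 1
        if outcome != "OK":
            self._conditions[ident] = outcome   # has to be remembered
        return ident                            # successes are forgotten

    def wait(self, ident):
        """WAIT with ID=: an unknown ID is assumed to have succeeded."""
        return self._conditions.pop(ident, "OK")

    def next_io(self):
        """Any later I/O on the unit may report a failure without ID=."""
        outcome, self._unit_condition = self._unit_condition, None
        return outcome or "OK"


rt = Runtime()
good = rt.finish("OK", with_id=True)
bad = rt.finish("END", with_id=True)
rt.finish("ERR", with_id=False)
results = (rt.wait(good), rt.wait(bad), rt.wait(12345),
           rt.next_io(), rt.next_io())
```

Here results is ("OK", "END", "OK", "ERR", "OK").  The made-up
ID 12345 is treated like a forgotten success, which matches
the statement that the processor need not detect invalid ID=
values.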

    There is, of course, a processor dependent limit on how
     many outstanding I/O requests which generate an END or
     ERROR condition can be handled before the processor
     runs out of memory to keep track of such stuff.

    The restrictions on the SIZE= variables are designed to
     allow the processor to update such variables at any
     time (after the request has been processed, but before
     the WAIT operation), and to then forget about them.
     That's why there is no SIZE= specifier allowed in the
     various WAIT operations.  Only exceptional conditions
     (errors or EOFs) are expected to be tracked by
     individual request by the runtime, and then only if an
     ID= specifier was present.

    The EOR= specifier has not been added to the WAIT
     operations.  Instead, the IOSTAT variable will have to
     be queried after a WAIT operation to handle this
     situation.  This choice was made because an EOR
     condition is not perceived to be an exceptional
     condition, like those that trigger an END= or ERR=
     branch.   This particular choice is philosophical, and
     was not based on significant technical difficulties.
     Note that the requirement to set the IOSTAT variable
     correctly  requires an implementation to remember which
     I/O requests got an EOR condition, so that a subsequent
     wait operation will return the correct IOSTAT value.
     This means there is a processor defined limit on the
     number of outstanding I/O requests (non-advancing)
     which got an EOR condition (constrained by available
     memory to keep track of this info, similar to END/ERR
     conditions).



---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to hpff-doc-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------