From owner-hpff-distribute  Wed Dec 13 10:04:47 1995
Date: Wed, 13 Dec 95 17:02:58 +0100
Message-Id: <9512131602.AA11056@cri.ensmp.fr>
From: Fabien COELHO <coelho@chailly.ensmp.fr>
To: meltzer@cray.com, schreiber@hpl.hp.com, loveman@hpc.pko.dec.com
Subject: hpff-distribute: Discussion about HPF kernel/HPF2 and so
Cc: hpff-distribute@cs.rice.edu
Sender: owner-hpff-distribute
Precedence: bulk

---------------------------------------------------------------------------
hpff-distribute@cs.rice.edu is a mailing list for discussion of data
distribution in High Performance Fortran.  Instructions for adding or
deleting yourself from this list appear at the bottom of this message.
---------------------------------------------------------------------------

Hi,

It appears from the last HPF Forum meetings that HPF2 might become some
kind of kernel, in order to ease the HPF labelling of compilers and to
help compilers achieve good performance on a smaller part of the
language. Other features are postponed to "HPF 2000" (HPF never?).

The current design of this kernel is (from the SC'95 HPF BOF Session)
HPF1 MINUS:

 1. dynamic, realign, redistribute.
 2. alignment strides.
 3. alignment axis interchange.
 4. general cyclic distributions.
 5. inherit.
 6. sequence and storage association.

I would like to argue for reintroducing some of these features into the
kernel and to comment on some of the issues involved:

--
A. Replication (possibly partial) still seems to be there! Great!

--
B. I cannot see any technical reason for restriction 3, nor for
restriction 2 when the stride is -1, as far as the state of the art in
compilation for distributed memory is concerned. Thus axis interchange and
-1 alignment strides should be put back into the kernel.
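
As a rough illustration of point B, here is a minimal Python sketch (not
HPF, and not taken from any existing compiler) of how the owner of an
element could be computed for a BLOCK-distributed template. The helper
names block_owner, owner_reversed and owner_interchanged are hypothetical,
and the sketch assumes 1-based indices and the usual block size
ceil(n/p). Under these assumptions a -1 stride or an axis interchange only
changes the index expression fed to the same ownership formula:

```python
def block_owner(i, n, p):
    """Owner (0-based) of template cell i (1-based), BLOCK over p processors."""
    block = -(-n // p)          # ceil(n / p), assumed block size
    return (i - 1) // block


def owner_reversed(i, n, p):
    """A(i) aligned with T(n + 1 - i): an alignment stride of -1."""
    return block_owner(n + 1 - i, n, p)


def owner_interchanged(i, j, n, p):
    """A(i, j) aligned with T(j, i), with T distributed (BLOCK, *).

    Only the first template axis is distributed, so the owner of A(i, j)
    is decided by j alone.
    """
    return block_owner(j, n, p)
```

With n = 8 and p = 4, owner_reversed(1, 8, 4) is the last processor, since
A(1) sits on T(8).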

--
C. Such restrictions (especially 1 to 4) mean that applications requiring
these features will not be ported to HPF. It excludes multigrid methods,
signal processing...:-( Restriction 6 excludes dusty deck Fortran 77
applications...:-(

--
D. Also, the mere idea of a kernel seems to suggest that the state of the
art in compiler technology for high performance is frozen. I don't like
the removal of features either. Merely compiling full HPF, even
inefficiently, is not a big issue; the big issue is compiling for high
performance. Thus I would prefer seeing HPF as it is, with warnings about
features not yet efficiently managed, rather than unimplemented features.

--
E. However, one of the big problems of HPF is Fortran 90. Although this
was/is a good choice for the future, Fortran 90 is quite a big language,
and making the user feel that every Fortran 90 feature is orthogonal to
every HPF feature makes implementors spend their time (and money) on issues
that have nothing to do with high performance and optimizations.

--
F. I understood that Rob Schreiber got inherit back into the kernel for
switching purposes between subroutines compiled with more information
available to the compiler... thus allowing "subroutine cloning management
by hand". I don't think that I would like my newly coded Fortran 90
applications to be tainted by such ugly features that have nothing to do
with the computation, just as I do not like to see a loop unrolled x
times by hand for some architecture because the compiler cannot do it
automatically. Moreover, any explicit call to HPF inquiry functions makes a
program non-portable (to non-HPF compilers), and a different version of
the code will be needed to run on my workstation.

The problem here is the knowledge the compiler can assume in order to
improve the generated code when differently distributed arrays may be
passed as an argument and a prescriptive mapping is not desired. Instead
of cloning procedures by hand and using a descriptive mapping in each
clone, I would rather let the compiler perform the cloning and compile the
subroutine with several descriptive mappings allowed. It would make sense
from both the software engineering and the language [compiler] points of
view. Something like:

  ! may be called with (block,*), (block,block) and non-distributed args...
  subroutine sub(A,B)
  real A(n,n), B(n,n)
  !hpf$ assume
  !hpf$ distribute (block,*) :: A, B
  !hpf$ or
  !hpf$ distribute (block,block) :: A, B
  !hpf$ or
      ! no distribution at all... local data...
  !hpf$ end assume
  ...
  end subroutine sub

It is then up to the compiler to perform the cloning, to compile and
optimize each case, and to ensure the switching at run time or maybe at
compile/link time. I don't see this as a major problem for a compiler.
Allowing several descriptive mappings to be specified, instead of just one
or none (inherit), seems a nice alternative for both high performance and
code quality. This suggests a static mapping polymorphism instead of the
dynamic one provided by inherit.
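
To illustrate what the compiler-side switching might amount to, here is a
minimal Python sketch under stated assumptions: the three clone functions,
the CLONES table and call_sub are all hypothetical names, the clone bodies
are placeholders, and a real compiler could just as well resolve the choice
at compile or link time. The point is only that the set of admissible
mappings is closed and known statically:

```python
# Hypothetical clones a compiler might generate for sub(A, B), one per
# mapping listed in the "assume" block; the bodies are placeholders.
def sub_block_star(a, b):
    return "code optimized for (block,*)"


def sub_block_block(a, b):
    return "code optimized for (block,block)"


def sub_local(a, b):
    return "code for local, non-distributed data"


CLONES = {
    ("block", "*"): sub_block_star,
    ("block", "block"): sub_block_block,
    None: sub_local,            # no distribution at all
}


def call_sub(a, b, mapping):
    """Select the clone matching the actual mapping of the arguments.

    An undeclared mapping is an error: there is no generic, inherit-like
    fallback version in this sketch.
    """
    try:
        clone = CLONES[mapping]
    except KeyError:
        raise ValueError(f"mapping {mapping!r} not declared in assume block")
    return clone(a, b)
```

A call such as call_sub(a, b, ("block", "*")) runs the (block,*) clone,
while a mapping absent from the assume block is rejected.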

--
G. The dynamic stuff is still there implicitly for prescriptive mappings
at procedure interfaces. Thus it must be handled by the compiler anyway,
and a possibly limited version of dynamic/realign/redistribute can be kept
as part of the kernel with no extra advanced support needed from the
compiler...

Remappings, even if not very efficiently managed, are useful to some
applications. They are also simpler to handle if alignment strides and
general cyclic distributions are removed from the core language. Thus I
would like to have the remappings back in the kernel.

One of the arguments I heard is "you can simulate them with copies". Sure!
But if it is just a matter of renaming an array within a procedure, I guess
the compiler can do it for me quite easily, so that I don't have to handle
it in my nice Fortran 90 source code when it only has to do with HPF!

Here again, the problem is one of compiler knowledge: the compiler may not
know the actual mapping of an array reference, which prevents any
compile-time optimization... Some remappings are difficult for the compiler
to handle and prevent high performance. The very fact that some instances
of a feature lead to bad performance does not mean that this feature must
be thrown out of the language. The point is to disallow ambiguity. I would
suggest the following restriction on their use:

  "an array reference should have only one (static) reaching mapping".

This rules out:

  if (some condition)
  then remap A another way ;
  use A ; // the mapping of A is not known statically: NO!

Such a restriction need not apply to the remapping statements themselves,
thus allowing:

  if (...) 
  then remap A and use A ; 
  else remap A and use A ;
  remap back A;

It is not a big deal for a compiler to express and check the associated
dataflow problem. Since no ambiguity is allowed, the compiler can
implement the feature by some renaming/copying and generate optimized code
for each access...
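
As an illustration of the dataflow problem mentioned above, here is a
minimal Python sketch of a "reaching mappings" analysis over a tiny
control-flow graph. It is only a sketch under stated assumptions: the node
encoding ("remap", "use", "nop"), the function names and the mapping
labels are all hypothetical, and only a single array A is tracked. A remap
kills every incoming mapping and generates its own; a use is rejected when
more than one mapping reaches it:

```python
def reaching_mappings(nodes, edges, entry, initial):
    """Forward dataflow: mappings of A that may reach the entry of each node.

    nodes: dict name -> ("remap", mapping) | ("use", None) | ("nop", None)
    edges: list of (src, dst) control-flow edges
    """
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)

    reach_in = {n: set() for n in nodes}
    reach_in[entry] = {initial}

    def out(n):
        kind, mapping = nodes[n]
        # A remap kills every incoming mapping and generates its own.
        return {mapping} if kind == "remap" else reach_in[n]

    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for n in nodes:
            new = set(reach_in[n])
            for p in preds[n]:
                new |= out(p)
            if new != reach_in[n]:
                reach_in[n], changed = new, True
    return reach_in


def ambiguous_uses(nodes, edges, entry, initial):
    """Names of the uses of A reached by more than one mapping."""
    reach = reaching_mappings(nodes, edges, entry, initial)
    return sorted(n for n, (kind, _) in nodes.items()
                  if kind == "use" and len(reach[n]) > 1)
```

On the first example above (a conditional remap followed by a use) the use
is flagged; on the second (remap and use in each branch, then remap back)
no use is flagged, although two mappings reach the final remap statement.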

--
H. The static knowledge available to the compiler often seems to be the
point. For instance, care was taken while defining HPF that the number of
processors is not necessarily known at compile time, and compiler writers
spend time and energy optimizing code without this knowledge... However,
some compilers (such as the DEC one) allow this knowledge to be provided by
the user at compile time and take advantage of this extra information to
improve the generated code. I guess that compiling for 8 "HPF
processors" does not forbid you from running the generated code on 4
processors with 2 processes per processor? Thus some warnings about the use
of "HPF_NUMBER_OF_PROCESSORS()" could also be issued for high
performance...

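To make the "compile for 8, run on 4" remark concrete, here is a minimal
Python sketch. Everything in it is an assumption made for illustration:
the constant NP_COMPILED, the helper names block_bounds and placement, the
ceil(n/p) block size and the contiguous grouping of processes. Block
bounds become compile-time constants once the processor count is fixed,
and placing the resulting processes on fewer physical processors is a
separate run-time decision:

```python
NP_COMPILED = 8   # number of "HPF processors" fixed at compile time (assumed)


def block_bounds(vp, n, np_=NP_COMPILED):
    """1-based inclusive bounds of the BLOCK owned by HPF processor vp."""
    block = -(-n // np_)                 # ceil(n / np_)
    lo = vp * block + 1
    hi = min((vp + 1) * block, n)
    return lo, hi                        # the block is empty when lo > hi


def placement(physical, np_=NP_COMPILED):
    """Map each HPF processor to a physical one, in contiguous groups."""
    per_node = -(-np_ // physical)       # processes per physical processor
    return {vp: vp // per_node for vp in range(np_)}
```

With 4 physical processors, placement(4) puts HPF processors 0 and 1 on
physical processor 0, 2 and 3 on physical processor 1, and so on, while
the generated code still sees 8 processors.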

In short, I would suggest:

1/ keep HPF as it is. It is up to the vendors to issue warnings about
   inefficient (unimplemented) features. This is the current status.

2/ If however you do want a kernel, and name it HPF:
   . F77 + array syntax + (simple) reductions
   . HPF1 
     - non |unit| alignment strides.
     - general cyclic distributions.
     - inherit but...
     + "static mapping polymorphism".
     - sequence and storage association...
     + some carefully chosen functions (transpose? matmul?...)


Thanks for reading these comments and maybe considering some of them.
Comments are welcome...

Have a nice day,

Fabien.

Fabien COELHO __ http://www.cri.ensmp.fr/~coelho __ coelho@cri.ensmp.fr
  CRI, ENSMP, 35, rue Saint Honoré, 77305 Fontainebleau Cedex, France
     phone: 33 1 64 69 {voice: 48 52, fax: 47 09, standard: 47 08}
       ________  All opinions expressed here are mine  ________
---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to
hpff-distribute-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------