From owner-hpff-distribute  Fri Oct 27 13:12:00 1995
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id NAA11279 for hpff-distribute-out; Fri, 27 Oct 1995 13:12:00 -0500 (CDT)
Date: Fri, 27 Oct 1995 13:12:00 -0500 (CDT)
Message-Id: <199510271812.NAA11279@cs.rice.edu>
From: choudhar@cat.syr.edu (Alok Choudhary)
Subject: hpff-distribute: Attempt on OOC arrays for HPF
Sender: owner-hpff-distribute
Precedence: bulk

---------------------------------------------------------------------------
hpff-distribute@cs.rice.edu is a mailing list for discussion of data
distribution in High Performance Fortran.  Instructions for adding or
deleting yourself from this list appear at the bottom of this message.
---------------------------------------------------------------------------


[Alok sent this message 10/23/95, when Majordomo was experiencing a nervous
breakdown.  I'm forwarding it now that things appear to be working again. -
chk]

The following proposal presents a way to support out-of-core arrays in HPF.
The main objectives are consistency with the "normal" HPF data mapping
directives, simplicity, and minimal extensions. Note these are objectives!


The following example illustrates how out-of-core arrays are declared.
The only addition here is the OUT-OF-CORE directive. (Other directives
and information follow later in this write-up.)

!HPF$ TEMPLATE TEMP(100,100)
!HPF$ DISTRIBUTE TEMP(CYCLIC(K),CYCLIC(K))
!HPF$ ALIGN WITH TEMP :: A,B,C
!HPF$ OUT-OF-CORE: A (<fn>)
!HPF$ OUT-OF-CORE: B

This directive simply says that if nodes had infinite memory, the
elements of the array would reside in the memory of the processors specified
by the distribution directives. In other words, the directives describe
which processor's memory an element will be brought into when it is in core.
This is a logical and simple extension of the HPF data mapping directives.
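To make "which processor's memory" concrete, here is a minimal Python sketch of the owner rule these directives imply for TEMP. It is only an illustration. The example above gives neither the block size of the CYCLIC distribution nor the processor arrangement, so the sketch assumes CYCLIC(4) onto a 2x2 grid. Global indices are 1-based as in Fortran, and processor coordinates are 0-based. Under this proposal, an element of an OOC array aligned with TEMP would map to the same processor whether or not it is currently in core.

```python
from typing import Tuple


def cyclic_owner(i: int, k: int, nprocs: int) -> int:
    """0-based processor coordinate that owns 1-based global index i under CYCLIC(k)."""
    if i < 1 or k < 1 or nprocs < 1:
        raise ValueError("index, block size and processor count must be >= 1")
    # Blocks of k consecutive indices are dealt round-robin to the processors.
    return ((i - 1) // k) % nprocs


def owner_2d(i: int, j: int, k: int, grid: Tuple[int, int]) -> Tuple[int, int]:
    """Owner of TEMP(i, j) when both dimensions are CYCLIC(k) on a P x Q grid."""
    p, q = grid
    return cyclic_owner(i, k, p), cyclic_owner(j, k, q)


if __name__ == "__main__":
    print(owner_2d(1, 1, 4, (2, 2)))  # (0, 0)
    print(owner_2d(5, 9, 4, (2, 2)))  # (1, 0)
```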

Note: Only arrays are declared out-of-core, not templates, since
templates only represent an abstract index space.

A file name is associated with array A in the OOC directive above.
This is the user's way of specifying the file to be associated with
an array. The properties of the file are declared through
the OPEN statement (described later).

For example,
!HPF$ OUT-OF-CORE: A ("HUGE3D.dat")
means that the user will open a file with the above name before
any access to A is permitted.

On the other hand, the compiler is free to choose any file name
for array B, and the compiler is responsible for opening that file (or files).
Also, array B CANNOT BE PERSISTENT; in other words, the file
associated with B is a scratch file.

SPECIFICATION of Properties of an OOC array. This is proposed to be done
using the OPEN statement.

The STATUS specifier of the OPEN statement can be used to specify the
persistence property of the associated array.
A file associated with an OOC array is a special type of file,
with which certain properties are associated (described later).
The following new values are added for the STATUS specifier.

OLDPERSISTENT = File already exists.
NEWPERSISTENT = File needs to be created and does not already exist.
OLDSCRATCH    = File exists, but can be deleted after the program finishes.
NEWSCRATCH    = File needs to be created for this program, but may be deleted
                after the program finishes.

The word "TEMPORARY" may be used instead of "SCRATCH"; this can be decided
after discussion.
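One way to read the four proposed STATUS values is as two independent properties: whether the file must already exist, and whether it is kept after the program finishes. The Python sketch below encodes that reading. The class and function names are hypothetical. The sketch treats "can/may be deleted" as permission rather than an obligation to delete. The proposal does not say whether NEWSCRATCH fails on an existing file, so the sketch makes no check for it. It also encodes the rule above that a compiler-named array such as B may only be backed by a scratch file.

```python
from enum import Enum


class OOCStatus(Enum):
    # Hypothetical encoding: (file must already exist, file is persistent)
    OLDPERSISTENT = (True, True)
    NEWPERSISTENT = (False, True)
    OLDSCRATCH = (True, False)
    NEWSCRATCH = (False, False)

    @property
    def must_exist(self) -> bool:
        return self.value[0]

    @property
    def persistent(self) -> bool:
        return self.value[1]


def check_status(status: OOCStatus, exists: bool, user_named: bool = True) -> None:
    """Raise ValueError if opening with this status would violate the proposal."""
    if status.must_exist and not exists:
        raise ValueError(f"{status.name}: file must already exist")
    if status is OOCStatus.NEWPERSISTENT and exists:
        raise ValueError("NEWPERSISTENT: file must not already exist")
    if not user_named and status.persistent:
        # Compiler-chosen files (array B in the example) are scratch files only.
        raise ValueError("compiler-named OOC arrays cannot be persistent")


if __name__ == "__main__":
    check_status(OOCStatus.OLDSCRATCH, exists=True)                      # accepted
    check_status(OOCStatus.NEWSCRATCH, exists=False, user_named=False)   # accepted
    try:
        check_status(OOCStatus.NEWPERSISTENT, exists=False, user_named=False)
    except ValueError as err:
        print(err)  # compiler-named OOC arrays cannot be persistent
```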

RESTRICTIONS: 1) Data files associated with OOC arrays are unformatted only,
                 for performance reasons.

              2) Only ACTION = READWRITE is permitted (because an array
                 can be both read and written). This raises an interesting
                 possibility: if the user knows that an array is only read,
                 or only written, throughout the program, the other two
                 options, ACTION = READ or ACTION = WRITE, could be permitted.
                 But my initial proposal only allows READWRITE.

              3) If the file is OLDPERSISTENT or OLDSCRATCH, there must be
                 another file called <fn>.metadata.
                 This file contains information about the data distributions,
                 the number of data files, the logical processor arrangement
                 on which the data files were created, etc. Essentially, all
                 the information required to describe and manage distributed
                 arrays (and some more) is kept in this metadata file. Hence,
                 files containing persistent arrays are NOT standard Fortran
                 files. Associated with the metadata are a number of inquiry
                 functions that allow a program or compiler to inquire about
                 the contents and organization of data within the data
                 file(s). THE TYPE OF INFORMATION REQUIRED TO BE IN THE
                 METADATA FILE IS DESCRIBED BELOW.
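The three restrictions are mechanical enough to check when the OPEN is executed. Here is a minimal Python sketch. It assumes a hypothetical `check_ooc_open` routine that receives the OPEN specifiers as strings and returns a list of violations. The `exists` parameter lets tests substitute the metadata-file lookup.

```python
import os
from typing import Callable, List


def check_ooc_open(fn: str, status: str, form: str = "UNFORMATTED",
                   action: str = "READWRITE",
                   exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return the restrictions violated by an OPEN of OOC file fn (empty list = OK)."""
    problems = []
    if form.upper() != "UNFORMATTED":                      # restriction 1
        problems.append("OOC data files must be UNFORMATTED")
    if action.upper() != "READWRITE":                      # restriction 2
        problems.append("only ACTION = READWRITE is permitted")
    if status.upper() in ("OLDPERSISTENT", "OLDSCRATCH"):  # restriction 3
        meta = fn + ".metadata"
        if not exists(meta):
            problems.append(f"missing metadata file {meta}")
    return problems


if __name__ == "__main__":
    # Pretend only HUGE3D.dat.metadata is present on disk.
    present = {"HUGE3D.dat.metadata"}
    print(check_ooc_open("HUGE3D.dat", "OLDPERSISTENT", exists=present.__contains__))
    # []
    print(check_ooc_open("OTHER.dat", "OLDSCRATCH", form="FORMATTED",
                         exists=present.__contains__))
    # ['OOC data files must be UNFORMATTED', 'missing metadata file OTHER.dat.metadata']
```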

Assumptions and background for metadata information and number of files.

There could be a) one file containing an OOC array, or b) a number of files
containing an OOC array.

a) In this case, there are two possibilities.
   i) The data is organized in global (undistributed) form, in canonical
      form or in an easily describable form. That is, one can describe the
      order of dimensions in which data is stored in the global name space,
      just like in-memory orders, e.g., row-major, column-major, etc.
   ii) There is one data file, but the data is stored separately (appended
       one after another) for each processor of the creating (logical)
       processor array over which the "contained" persistent array was
       distributed. For example, if the array was created using a 2x2
       processor grid, then the single data file will have four distinct
       sections, one for each processor (like a map of each processor's
       local memory for the array), appended one after the other. The
       metadata file will contain a description of the bounds of each
       section, its size, etc.

b) This is like case a) ii) above, except that there is one separate file
   for each processor participating in the creation of the array (that is,
   the corresponding persistent files). The proposed convention for the
   file names is as follows (note that the processor grid description will
   come from the metadata file).

E.g., for a two-dimensional processor array (say 2x2) that created the
persistent array,

   There will be five files.
   <fn>.metadata
   <fn>.1.1
   <fn>.1.2
   <fn>.2.1
   <fn>.2.2

So, if the original file name specified by the user in the OOC directive
was HUGEDATA, and the status is OLDPERSISTENT, then the system will expect

HUGEDATA.metadata
HUGEDATA.1.1
HUGEDATA.1.2
HUGEDATA.2.1
HUGEDATA.2.2
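The layouts above reduce to simple arithmetic. The Python sketch below (all function names hypothetical) does three things. It generates the file names expected for case b). It computes where each processor's section starts in the single file of case a) ii). It computes the byte offset of a global element for case a) i) when the file is column-major. The proposal leaves two points open, and the sketch makes assumptions for both. First, the sections in case a) ii) are appended in the same processor order as the file names. Second, the data files are raw element streams with no Fortran record markers.

```python
from itertools import product
from math import prod
from typing import List, Sequence


def ooc_file_names(fn: str, grid: Sequence[int]) -> List[str]:
    """Case b): <fn>.metadata plus one data file per processor, 1-based coordinates."""
    names = [f"{fn}.metadata"]
    # product() varies the last coordinate fastest: 1.1, 1.2, 2.1, 2.2 for a 2x2 grid
    for coords in product(*(range(1, p + 1) for p in grid)):
        names.append(fn + "." + ".".join(str(c) for c in coords))
    return names


def section_offsets(local_shapes: Sequence[Sequence[int]], elem_size: int) -> List[int]:
    """Case a) ii): byte offset at which each processor's section starts."""
    offsets, pos = [], 0
    for shape in local_shapes:
        offsets.append(pos)
        pos += prod(shape) * elem_size
    return offsets


def colmajor_offset(index: Sequence[int], shape: Sequence[int], elem_size: int) -> int:
    """Case a) i): byte offset of a 1-based global index in a column-major file."""
    offset, stride = 0, 1
    for i, n in zip(index, shape):
        offset += (i - 1) * stride
        stride *= n
    return offset * elem_size


if __name__ == "__main__":
    print(ooc_file_names("HUGEDATA", (2, 2)))
    # 8-byte reals, 100x100 array split evenly over 2x2 processors
    print(section_offsets([(50, 50)] * 4, 8))      # [0, 20000, 40000, 60000]
    print(colmajor_offset((1, 2), (100, 100), 8))  # 800
```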

Note that each individual file could carry its metadata at the beginning,
but in my opinion that may hinder optimizations: if one has the flexibility
to stripe, distribute, or otherwise organize the data files, embedded
metadata may get in the way because of different data types within the
same file, etc. Keeping the metadata separate makes much more sense.
Also, since it is a small file, it can be replicated (cached) on all
the nodes.



The metadata should contain the following information.

1) Size of metadata file : int
2) Single file in global space, single file in individual proc space, or
   multiple files : int
3) creating processor arrangement : int[7]
4) distribution information for each dimension
5) local bounds for each dimension, and global bounds for each dimension,
   for each processor
6) data type (this could be a record description of each element of the array)
7) order of storage (column-major, row-major, ...)
8).....
9) ....
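Here is an illustration of how the fixed part of the metadata (items 1 to 3) might be laid out, as a Python sketch using the standard `struct` module. Several details are assumptions: the layout (nine little-endian 4-byte integers), the numeric codes for item 2, and reading "int[7]" as the extent of each processor dimension padded with zeros. Items 4 to 7 are variable-length and are left out.

```python
import struct
from typing import Sequence, Tuple

# Hypothetical layout: item 1 (metadata size), item 2 (organization code),
# item 3 (processor extents, up to rank 7, zero-padded); little-endian 4-byte ints.
HEADER = struct.Struct("<9i")

# Hypothetical codes for item 2.
SINGLE_GLOBAL, SINGLE_PER_PROC, MULTI_FILE = 0, 1, 2


def pack_header(meta_size: int, organization: int, proc_shape: Sequence[int]) -> bytes:
    if not 1 <= len(proc_shape) <= 7 or any(d < 1 for d in proc_shape):
        raise ValueError("processor arrangement must have rank 1..7 and positive extents")
    dims = list(proc_shape) + [0] * (7 - len(proc_shape))
    return HEADER.pack(meta_size, organization, *dims)


def unpack_header(buf: bytes) -> Tuple[int, int, Tuple[int, ...]]:
    meta_size, organization, *dims = HEADER.unpack_from(buf)
    # Zero entries are padding, not real processor dimensions.
    return meta_size, organization, tuple(d for d in dims if d != 0)


if __name__ == "__main__":
    raw = pack_header(HEADER.size, MULTI_FILE, (2, 2))
    print(len(raw), unpack_header(raw))  # 36 (36, 2, (2, 2))
```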

Using this metadata, one should be able to inquire about per-processor
information as well as overall information about a persistent array.


The particular names of the inquiry functions need to be developed.
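Since the names are still open, the sketch below uses a hypothetical name, `local_shape`, for one such inquiry. It returns the local extent of each dimension on a given processor, computed from the global bounds, a CYCLIC(k) distribution, and the processor arrangement recorded in the metadata. It assumes every dimension uses the same CYCLIC(k) distribution, as in the TEMP example. It also assumes processor coordinates are 0-based.

```python
from typing import Sequence, Tuple


def cyclic_local_extent(n: int, k: int, nprocs: int, p: int) -> int:
    """Count of the global indices 1..n owned by 0-based processor p under CYCLIC(k)."""
    full_blocks, rem = divmod(n, k)
    # Every processor receives full_blocks // nprocs complete rounds of blocks.
    count = (full_blocks // nprocs) * k
    leftover = full_blocks % nprocs  # processors 0..leftover-1 get one extra full block
    if p < leftover:
        count += k
    elif p == leftover:
        count += rem                 # the trailing partial block, if any
    return count


def local_shape(global_shape: Sequence[int], k: int,
                grid: Sequence[int], coords: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical inquiry: local extents on the processor at 0-based coords."""
    return tuple(cyclic_local_extent(n, k, nprocs, p)
                 for n, nprocs, p in zip(global_shape, grid, coords))


if __name__ == "__main__":
    # TEMP(100,100), CYCLIC(4) on an assumed 2x2 grid
    for coords in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(coords, local_shape((100, 100), 4, (2, 2), coords))
```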


****************** Questions of particular interest and my opinions on
them. Thanks, Rob **

Q. Is OOC a type parameter?
OOC is just a directive describing a potentially very large array. Things
should work in exactly the same fashion whether or not the data fits in
memory.
**

Q. Are OOC arrays arbitrarily mappable?

The OOC directive does not change the meaning of any mapping directive, so I
guess yes. However, it makes little sense to replicate an OOC array, etc. One
may want to put restrictions on those types of things. However, a user who
uses OOC probably knows the tradeoffs.
**

Q. Must all the intrinsics and library routines accept OOC arguments??

Hmmmmmmm...

They should in principle, but that is a big burden!! I don't know.
But I should be able to compute the minimum, maximum, etc. for 3D CFD
calculations...
Maybe one can allow intrinsics that are simple (e.g., sum, min, max ...).
**
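Simple reductions such as sum, min and max are the easy case. They can stream through the file one chunk at a time, so memory use is bounded by the chunk size rather than the array size. Here is a minimal Python sketch. It assumes the array is stored as a raw stream of 8-byte reals with no record markers (a single file in global form, or each per-processor file in turn). The function name is hypothetical.

```python
from array import array
from typing import Optional, Tuple


def ooc_reduce(path: str, chunk_elems: int = 1 << 16,
               typecode: str = "d") -> Tuple[float, Optional[float], Optional[float]]:
    """Return (sum, min, max) of a raw binary file of `typecode` values.

    Only chunk_elems values are held in memory at a time; min and max are
    None for an empty file.
    """
    itemsize = array(typecode).itemsize
    total, lo, hi = 0.0, None, None
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_elems * itemsize)
            if not raw:
                break
            chunk = array(typecode)
            chunk.frombytes(raw)  # ValueError if the file is not a whole number of values
            total += sum(chunk)
            cmin, cmax = min(chunk), max(chunk)
            lo = cmin if lo is None else min(lo, cmin)
            hi = cmax if hi is None else max(hi, cmax)
    return total, lo, hi
```

For example, `ooc_reduce("HUGE3D.dat", chunk_elems=1_000_000)` would hold at most about 8 MB of the array in memory at once, whatever the size of the file.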

Q. May an OOC array have the POINTER or the TARGET attribute?

BIG Hmmmmmmmmmmmm........   We are getting into muddy waters here! This could
be very complex. Would anyone like to use such a facility with an OOC
array???? Pointers on disk (in files) and pointers in memory are different
things. Something called POINTER SWIZZLING is needed to implement this
(people do this for persistent objects in C++...), but performance is not
the primary concern there!!!!
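To make the term concrete: pointer swizzling replaces the persistent form of a reference (here, a record index in a file) with a direct in-memory reference when data is loaded. Unswizzling reverses the translation when the data is written back. The toy Python sketch below shows only that translation, for a linked list. It is not a proposal for how OOC pointers would work, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    value: float
    next: Optional["Node"] = None


def swizzle(records: List[Tuple[float, int]]) -> List[Node]:
    """records: (value, next_index) as stored on disk; -1 means no successor."""
    nodes = [Node(v) for v, _ in records]
    for node, (_, nxt) in zip(nodes, records):
        # Replace the on-disk index with a direct reference to the loaded node.
        node.next = nodes[nxt] if nxt >= 0 else None
    return nodes


def unswizzle(nodes: List[Node]) -> List[Tuple[float, int]]:
    """Turn in-memory references back into record indices for writing."""
    index = {id(n): i for i, n in enumerate(nodes)}
    return [(n.value, index[id(n.next)] if n.next is not None else -1) for n in nodes]
```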

Q. May an OOC array be allocatable?

Yes, I believe so. There should not be any problem with that, as long as
everything else (OPEN, etc.) is taken care of before allocation.

Q. May an OOC array have the DYNAMIC attribute?

Needs discussion. Is it necessary?
**

Q. If you tie a file to an OOC array, can it be a sequential file?
  Any file tied to an OOC array should be a file of the kind described above.
  It is not a regular Fortran file, and it is required to be UNFORMATTED.


---------------------------------------------------------------------------
To (un)subscribe to this list, send mail to
hpff-distribute-request@cs.rice.edu.
Leave the subject line blank, and in the body put the line
(un)subscribe <email-address>
---------------------------------------------------------------------------

From owner-hpff-distribute  Mon Oct 30 11:10:08 1995
Received: (from daemon@localhost) by cs.rice.edu (8.7.1/8.7.1) id LAA06109 for hpff-distribute-out; Mon, 30 Oct 1995 11:10:08 -0600 (CST)
Received: from cat.syr.edu (pineapple.ece.syr.EDU [128.230.41.2]) by cs.rice.edu (8.7.1/8.7.1) with SMTP id LAA06085; Mon, 30 Oct 1995 11:09:23 -0600 (CST)
Date: Mon, 30 Oct 95 12:10:50 EST
From: choudhar@cat.syr.edu (Alok Choudhary)
Received: by cat.syr.edu (4.1/1.0-6/5/90)
	id AA16095; Mon, 30 Oct 95 12:10:50 EST
Message-Id: <9510301710.AA16095@cat.syr.edu>
To: hpff-distribute@cs.rice.edu
Subject: hpff-distribute: OOC proposal
Cc: chk@cs.rice.edu, choudhar@cat.syr.edu, ken@cs.rice.edu, schreibr@riacs.edu
Sender: owner-hpff-distribute
Precedence: bulk



I tried sending this proposal two weeks ago. It seems it did not
reach anyone due to email alias problems with hpff-distribute.

Here is another attempt. In case you received it in the past,
please bear with me.

Alok



