Re: Defining EOF

(no name) (Jean-Pierre Prost/Watson/IBM Research@nas.nasa.gov)
29 Aug 96 13:21:29

John,

I like your approach, and I think it integrates pretty well with the eventual
bounded version of views, which I am in favor of.
We should add this topic for discussion at the I/O subcommittee next week.
Jean-Pierre

----------Forwarding Original Note --------
To: mpi-io @ mcs.anl.gov @ GW2
cc:
From: johnmay @ coral.llnl.gov (John M May) @ GW2
Date: 08/27/96 05:00:40 PM
Subject: Defining EOF
Security:

I think the trouble in defining EOF in MPI-IO comes from its dual use.
An application might want to know where the last physical byte
in the file lies, and it might also want to seek from the logical end
of the file. The solution is to define separate objects. We can define
EOF, as Jean-Pierre suggests, to be the next physical byte location
after the last byte written in the file. It's the same for all nodes.
An application might use this value to position a new view right after
the last existing data in the file without leaving any holes. A second
object, which I'll call EOV, for "end of view," is useful for seeks.
Seeks apply to file offsets (in etypes) and not byte displacements, so
EOV marks just past the logical last etype accessible from a given view
(on a given node). Formally, I would define EOV as the offset of the
next etype after the last etype that falls completely below EOF, where
"next" and "last" are defined by the order of access given by the file
type. For example, suppose a node has a filetype whose etype is two
bytes wide and which defines access as follows: --aa--bb--cc (access in
alphabetical order, "--" means a two-byte hole, + marks EOV, ^ marks EOF,
... means repetition). (Remember, we're looking at filetypes here, not
the actual data in the file.)

Consider the following placements:

              +EOV
--aa--bb--cc--aa--bb--cc...
            ^EOF

EOV is at aa, the first accessible etype after the last etype written (cc).

  +EOV
--aa--bb--cc...
^EOF

The file is empty; the first access will be at aa.

          +EOV
--aa--bb--cc...
           ^EOF

EOF falls in the middle of an etype; since cc doesn't fall completely
below EOF, EOV is at cc, one etype after the last complete etype (bb).
Note that a program in this situation is possibly erroneous, since it
is accessing data with a view that doesn't align to the data in the
file. Nevertheless, the behavior here is well-defined.

Now consider a file type that accesses data in non-increasing order:

          +EOV
--cc--bb--aa...
^EOF

The file is empty. The first access will be at aa.

                                              +EOV
--cc--bb--aa--cc--bb--aa--cc--bb--aa--cc--bb--aa...
                                    ^EOF

Data filling 3 complete repetitions of the file type has been written.
The last logical etype is the cc to the left of EOF. The next
accessible etype is the aa marked by EOV. Seeking to EOV starts
you on the next filetype, just as you'd expect.

I realize these examples are complicated; they're showing how this
idea works in oddball cases. The easy cases don't cause any problems.
In particular, when the file is opened and the view is just bytes,
EOV = EOF.
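
To make the definition concrete, here is a minimal Python sketch of the
EOV rule as stated above. It is an illustration only, not part of any
MPI-IO interface. The names eov_offset and byte_position are hypothetical,
and so is the representation: a filetype is assumed to be a list of etype
byte displacements in access order plus an extent in bytes, with every
etype lying inside one instance of the filetype. The EOF byte values used
in the checks (12, 0, 11, 0, 36) are one reading of the five placements
above, not values given in the text.

```python
def eov_offset(displs, etype_size, extent, eof):
    """Return EOV as an offset in etypes (hypothetical helper).

    displs     -- byte displacements of the etypes inside one filetype
                  instance, listed in access order (assumed >= 0)
    etype_size -- width of one etype in bytes
    extent     -- bytes spanned by one filetype instance (assumed > 0)
    eof        -- byte position just past the last byte written
    """
    per_instance = len(displs)
    last = -1  # offset of the last etype completely below EOF, if any
    # Only instances that start below EOF can hold such an etype.
    instances = -(-eof // extent)  # ceiling division
    for rep in range(instances):
        base = rep * extent
        for i, d in enumerate(displs):
            if base + d + etype_size <= eof:  # etype ends at or before EOF
                last = rep * per_instance + i
    return last + 1  # the next etype in access order


def byte_position(displs, extent, offset):
    """Map an etype offset back to a byte position in the file."""
    rep, i = divmod(offset, len(displs))
    return rep * extent + displs[i]


increasing = [2, 6, 10]   # --aa--bb--cc, accessed aa, bb, cc
decreasing = [10, 6, 2]   # --cc--bb--aa, accessed aa, bb, cc

print(eov_offset(increasing, 2, 12, 12))  # one full instance written
print(eov_offset(increasing, 2, 12, 0))   # empty file
print(eov_offset(increasing, 2, 12, 11))  # EOF in the middle of cc
print(eov_offset(decreasing, 2, 12, 0))   # empty file, decreasing order
print(eov_offset(decreasing, 2, 12, 36))  # three full instances written
```

Under these assumptions the five calls give etype offsets 3, 0, 2, 0 and 9,
and byte_position maps offsets 3 and 9 to bytes 14 and 46, the aa that
begins the next instance in each case. For a plain byte view
(displs = [0], etype_size = 1, extent = 1) the function returns eof
itself, which matches EOV = EOF.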

Also, if views end up being bounded by N repetitions of the file
type, then EOV is easy to define. Rather than being related to
EOF, it's just the offset of the first (logical) etype of the
(N+1)st instance of the filetype.
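
As a small sketch of the bounded case, assuming etype offsets count from
zero and every instance of the filetype holds the same number of etypes
(the function name bounded_eov is hypothetical):

```python
def bounded_eov(n_instances, etypes_per_filetype):
    """EOV, in etypes, for a view bounded by n_instances of the filetype.

    Instances 1..N hold offsets 0 .. N*k - 1, so the first etype of
    instance N+1 sits at offset N*k, independent of EOF.
    """
    if n_instances < 0 or etypes_per_filetype <= 0:
        raise ValueError("need n_instances >= 0 and etypes_per_filetype > 0")
    return n_instances * etypes_per_filetype


print(bounded_eov(3, 3))
```

With three etypes per filetype and N = 3 this gives offset 9, the first
etype of the fourth instance.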

John