File Size Consistency Semantics

Bill Nitzberg (nitzberg@nas.nasa.gov)
Thu, 13 Mar 1997 20:33:53 -0800

We may need to re-examine file size consistency semantics.

First, we need to agree on whether the following two assumptions
are true (we may want to add text to READ_EXPLICIT):

Given: READ_EXPLICIT(fh, offset, buf, 1, BYTE, &status) ;
GET_ELEMENTS(status, BYTE, &count) ;

A1. if (offset < file size) then count is 1
A2. if (offset >= file size) then count is 0

Both of these assumptions *depend* on the value of the file size,
which is currently ill-defined in the draft.

Consider the following example:

Assume we have two processes which open an empty file:
Process 0 using filetype "--XX" writes 2 bytes at offset 0
Process 1 using filetype "XX--" reads 4 bytes at offset 0

Sample pseudocode:

process0()
{
    File fh ;
    Status status ;
    char buf[10] = "cd" ;
    Datatype ftype = { (2, BYTE), (3, BYTE) } ; /* "--XX" */

    fh = OPEN("foo", ...) ;
    if (want_atomicity) {
        SET_ATOMICITY(fh, 1) ;
    }
    FILE_SET_VIEW(fh, 0, BYTE, ftype, "native", INFO_NULL) ;
    WRITE_EXPLICIT(fh, 0, &buf[0], 2, BYTE, &status) ;

    ...
}

process1()
{
    File fh ;
    Status status ;
    int count ;
    char buf[10] ;
    Datatype ftype = { (0, BYTE), (1, BYTE) } ; /* "XX--" */

    fh = OPEN("foo", ...) ;
    if (want_atomicity) {
        SET_ATOMICITY(fh, 1) ;
    }
    FILE_SET_VIEW(fh, 0, BYTE, ftype, "native", INFO_NULL) ;
    READ_EXPLICIT(fh, 0, &buf[0], 4, BYTE, &status) ;

    GET_ELEMENTS(status, BYTE, &count) ;

    /* What should the value of "count" be? */
}

The current draft says almost *nothing* regarding the returned value
for "count" on process 1.

Legal values according to the draft are: 0, 1, and 2.

The problem is that the standard specifies neither *when* nor *how*
the file size is updated.
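
To see where the three values come from, here is a rough sketch in C
(not draft text). It assumes the "XX--" filetype has an extent of 4 bytes
and that a read stops at the first visible byte at or beyond whatever file
size the reader happens to observe. The helper name visible_count and its
parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the draft: number of bytes a read
 * returns through a periodic filetype when the reader observes the
 * file size to be `file_size`.
 *
 * The filetype tiles the file with period `extent`; within each tile
 * the bytes at [first, first + len) are visible. The read asks for
 * `want` visible bytes starting at view offset 0 and stops at the
 * first visible byte whose absolute position is >= file_size.
 */
static size_t visible_count(size_t file_size, size_t extent,
                            size_t first, size_t len, size_t want)
{
    assert(len > 0 && first + len <= extent);

    size_t count = 0;
    for (size_t i = 0; i < want; i++) {
        /* Absolute file position of the i-th visible byte. */
        size_t pos = (i / len) * extent + first + (i % len);
        if (pos >= file_size)
            break;          /* end of file as seen by this reader */
        count++;
    }
    return count;
}
```

For process 1 (extent 4, first 0, len 2, want 4), an observed file size of
0 gives count 0, a size of 1 gives count 1, and any size of 2 or more gives
count 2, since the next visible byte sits at position 4. Whether a reader
can ever observe an intermediate size such as 1 is exactly what the draft
leaves open.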

Note that even setting "want_atomicity = TRUE" doesn't help.
SET_ATOMICITY only guarantees that a data location which is both
read and written behaves atomically; it says nothing about the
file size. In fact, calling FILE_SYNC doesn't help either:
FILE_SYNC also says nothing about the file size, only the file
data.

One way to define tighter semantics is to add something like the
following to the "10.6.1 File Consistency" section (feel free to
propose text implementing alternate solutions):

The file size is maintained as if it is a separate storage location
in the file, governed by the consistency semantics for data
accesses. For the purpose of consistency semantics, the following
are considered writes to this storage location: calls to
MPI_FILE_SET_SIZE and MPI_FILE_PREALLOCATE, and writes which store
data beyond the current file size.

Advice to implementors: An implementation is not required to use
an actual storage location to store the file size as long as the
semantics are correctly implemented.

The above paragraph allows a user to use FILE_SYNC and SET_ATOMICITY
to enforce desired semantics. In the above example, if the user set
"want_atomicity = TRUE", then the only valid values for "count" would
be 0 and 2.
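
As a sanity check on the 0-or-2 claim, here is a toy model in C of the
proposed wording. It is only a sketch under assumptions: the file size is
modelled as a plain struct field, atomic mode is modelled as the write and
the read being fully serialized, and every name (file_model, write_p0,
read_p1, and so on) is invented for this note.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not draft text: the file size is treated as one
 * more storage location that a size-extending write updates. */
struct file_model {
    size_t size;            /* the "file size" storage location */
    char   data[8];
};

/* Process 0: write "cd" through the "--XX" view, i.e. to bytes 2..3.
 * Storing past the current size also writes the size location. */
static void write_p0(struct file_model *f)
{
    f->data[2] = 'c';
    f->data[3] = 'd';
    if (f->size < 4)
        f->size = 4;
}

/* Process 1: read up to 4 bytes through the "XX--" view (bytes 0, 1,
 * 4, 5), stopping at the observed end of file. Returns the count. */
static size_t read_p1(const struct file_model *f, char *buf)
{
    static const size_t pos[4] = { 0, 1, 4, 5 };
    size_t count = 0;

    assert(f->size <= sizeof f->data);
    while (count < 4 && pos[count] < f->size) {
        buf[count] = f->data[pos[count]];
        count++;
    }
    return count;
}

/* With the accesses serialized, only two orders exist. Each function
 * returns the count process 1 would see for one of them. */
static size_t count_if_read_first(void)
{
    struct file_model f = { 0, { 0 } };
    char buf[4];
    size_t count = read_p1(&f, buf);    /* size is still 0 */
    write_p0(&f);
    return count;
}

static size_t count_if_write_first(void)
{
    struct file_model f = { 0, { 0 } };
    char buf[4];
    write_p0(&f);                       /* size becomes 4 */
    return read_p1(&f, buf);
}
```

In this model the read either sees the size before the write (count 0) or
after it (count 2); a count of 1 would require observing a partially
updated size, which the proposed text rules out in atomic mode.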

Sorry for the long note,

- bill