The new text is included below. In case you don't run this
through latex or try to read this text, I want to point out
a couple of problems here:
1) The original text says that "Etype is equivalent to a
dup of the etype used in MPI_FILE_SET_VIEW." I added some
text that says buftype is equivalent to a dup of the
buftype passed to the read or write function, or else
a type derived from that buftype. This allows an implementation
to create a derived type to span the user buf if it wants to,
instead of having the filter do the tiling. The filter
shouldn't care which it does.
However, there is also text later on that says: "The filter
functions will only be passed basic datatypes employed by
the user and complex datatypes that the user has passed to
one of the functions above." I think this is intended to
say that if the program never deals with an MPI_DOUBLE, the
filter doesn't have to worry about translating one.
Unfortunately, this sentence could also be interpreted as
meaning that MPI only passes in the exact datatypes that
the user uses -- no dups, no derived types. Is this
interpretation intended? If so, we need to drop the wording
about dups and derived types. If not, we should change
this sentence.
2) I put a discussion item in the text about error codes.
I haven't heard any feedback on it, so I'd like to raise
the question again. Here it is:
What value should a filter return on error? Should it
be predefined, or should the filter allocate error codes as
needed? It would be nice if the filter knew enough not to
allocate a new error code each time it encountered an error,
but where can it cache existing error codes? In extra_state?
Then presumably it needs to do locking. Alternatively, the
implementor of a filter could allocate all the necessary error
codes in advance and store them in extra_state for all the filters
to use. The user would then have to call the filter initialization
code. We should offer some guidance here.
Comments?
John
\subsection{User Defined Data Representations}
\begchangeapr
\status{one vote}
\endchangeapr
There are two situations that cannot be handled by the required
representations:
\begin{enumerate}
\item a user wants to write a file in a representation unknown
to the implementation, and
\item a user wants to read a file written in a representation unknown
to the implementation.
\end{enumerate}
User defined data representations allow the user to insert a third
party filter into the I/O stream to do the data representation conversion.
\begchangeapr
% added extra_state argument
\begin{funcdef}{MPI\_REGISTER\_FILTER(datarep, read\_filter\_fn, write\_filter\_fn, etype\_file\_extent\_fn, extra\_state)}
\funcarg{\IN}{datarep}{data representation identifier (string)}
\funcarg{\IN}{read\_filter\_fn}{function invoked to convert from file representation to native representation (function)}
\funcarg{\IN}{write\_filter\_fn}{function invoked to convert from native representation to file representation (function)}
\funcarg{\IN}{etype\_file\_extent\_fn}{function invoked to get the extent of a datatype as represented in the file (function)}
\funcarg{\IN}{extra\_state}{extra state}
\end{funcdef}
\mpibind{MPI\_Register\_filter(char~*datarep, MPI\_Filter\_function~*read\_filter\_fn, MPI\_Filter\_function~*write\_filter\_fn, MPI\_Filter\_extent\_function~*etype\_file\_extent\_fn, void~*extra\_state)}
\mpifbind{MPI\_REGISTER\_FILTER(DATAREP, READ\_FILTER\_FN, WRITE\_FILTER\_FN, ETYPE\_FILE\_EXTENT\_FN, EXTRA\_STATE, IERROR) \fargs CHARACTER*(*) DATAREP \\ EXTERNAL READ\_FILTER\_FN, WRITE\_FILTER\_FN, ETYPE\_FILE\_EXTENT\_FN \\ INTEGER(KIND=MPI\_ADDRESS\_KIND) EXTRA\_STATE \\ INTEGER IERROR}
\mpicppbind{MPI::Register\_filter(const~char*~datarep, MPI::Filter\_function\&~read\_filter\_fn, MPI::Filter\_function\&~write\_filter\_fn, MPI::Filter\_extent\_function\&~etype\_file\_extent\_fn, void*~extra\_state)}
\endchangeapr
The call associates \mpiarg{read\_filter\_fn}, \mpiarg{write\_filter\_fn}, and
\mpiarg{etype\_file\_extent\_fn}
with the data representation identifier \mpiarg{datarep}.
\mpiarg{Datarep} can then be used as an argument
to \func{MPI\_FILE\_SET\_VIEW}, causing
subsequent data access operations to call the filter functions
to convert all data items accessed between file data representation
and native representation.
If \mpiarg{datarep} is already defined,
an error in the error class \error{MPI\_ERR\_DUP\_DATAREP}
\begchangeapr
is raised on \const{MPI\_COMM\_WORLD}.
\endchangeapr
\begchangeapr
% added extra_state
\mpitypedefbind{etype\_file\_extent\_fn(MPI\_Datatype~datatype, MPI\_Aint~*file\_extent, void~*extra\_state)}
\endchangeapr
The function \mpifunc{etype\_file\_extent\_fn} returns in
\mpiarg{file\_extent} the number of bytes required to
store \mpiarg{datatype} in the file representation.
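As an illustration only (the representation described here is hypothetical
and is not one defined by \MPI/), suppose a file representation stores every
\const{MPI\_INT} in 8 bytes and every \const{MPI\_DOUBLE} in 8 bytes, with no
padding. For a \mpiarg{datatype} whose typemap consists of two
\const{MPI\_INT}s followed by one \const{MPI\_DOUBLE}, the function would
then return
\[
\mbox{\mpiarg{file\_extent}} = 2 \times 8 + 1 \times 8 = 24 \mbox{ bytes},
\]
regardless of how many bytes the same datatype occupies in native
representation.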
\begchangeapr
% added extra_state
\mpitypedefbind{read\_filter\_fn(void~*userbuf, MPI\_Datatype~datatype, MPI\_Datatype~etype, int~count, void~*filebuf, int~position, void~*extra\_state)}
\endchangeapr
The function \mpifunc{read\_filter\_fn} must convert from
file data representation to native representation.
Before calling this routine,
\MPI/ allocates and fills \mpiarg{filebuf} with \mpiarg{count}
contiguous data items of type \mpiarg{etype}.
\begchangeapr
The function is passed, in \mpiarg{extra\_state},
the argument that was passed to the \mpifunc{MPI\_REGISTER\_FILTER} call.
\endchangeapr
The function must copy all \mpiarg{count} data items from \mpiarg{filebuf}
to \mpiarg{userbuf} in the distribution described by \mpiarg{datatype},
converting each data item
from file representation to native representation.
\begchangeapr
If the size of \mpiarg{datatype} is less than the size
of \mpiarg{count} etypes, the filter must treat
\mpiarg{datatype} as being contiguously tiled over the
\mpiarg{userbuf}.
The filter must
begin storing converted data at the location in \mpiarg{userbuf} specified by
\mpiarg{position} etypes into the (tiled) \mpiarg{datatype}.
Specifically, if the
\mpiarg{etype} contains $n$ basic types, then the filter should store
the first converted datum at the location in \mpiarg{userbuf} given by element
$(n \times \mbox{\mpiarg{position}})$ of the typemap of $m$ contiguous
copies of \mpiarg{datatype}, where $m \times \mbox{size(\mpiarg{datatype})}
\geq \mbox{\mpiarg{count}} \times \mbox{size(\mpiarg{etype})}$.
In order to accomplish the conversion,
\endchangeapr
the function is allowed to write to \mpiarg{filebuf}, if desired.
\mpiarg{Etype} is equivalent to a dup of the etype used
in \func{MPI\_FILE\_SET\_VIEW}.
\begchangeapr
\mpiarg{Datatype} may be equivalent to a dup of the buftype
that the user passed to the read or write function, or it may be a type that
\MPI/ has derived from the buftype for its internal use.
\endchangeapr
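As a worked illustration of the placement rule above (the numbers are chosen
for the example and are not part of the interface), suppose \mpiarg{etype}
contains $n = 2$ basic types, \mpiarg{datatype} contains 4 basic types, and
the filter is called with \mpiarg{position} $= 3$. Counting typemap elements
from zero, so that \mpiarg{position} $= 0$ selects the first element, the
first converted datum belongs at element
\[
n \times \mbox{\mpiarg{position}} = 2 \times 3 = 6
\]
of the tiled \mpiarg{datatype}. Since each copy of \mpiarg{datatype} holds
4 basic types, element 6 is element $6 \bmod 4 = 2$ of copy
$\lfloor 6 / 4 \rfloor = 1$, that is, the third basic type in the second
copy of \mpiarg{datatype}.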
\begin{implementors}
For example,
a filtered read operation could be implemented as follows:
\begin{enumerate}
\item Get etype file extent
\item Allocate a filebuf large enough to hold count etypes
\item Read data from file into filebuf
\item Call \mpifunc{read\_filter\_fn} to convert data and place it into userbuf
\item Deallocate filebuf
\end{enumerate}
\end{implementors}
\begchangeapr
If \MPI/ cannot allocate a buffer large enough to hold
all the data to be converted from a read operation, it may
call the filter function repeatedly using the same \mpiarg{datatype}
and \mpiarg{userbuf}, and storing successive chunks of data to be
converted in \mpiarg{filebuf}. For the first call (and in
the case when all the data to be converted fits into
\mpiarg{filebuf}), \MPI/ will call the function with
\mpiarg{position} set to zero. Data converted during this
call will be stored in the \mpiarg{userbuf} according to
the first \mpiarg{count} etypes in \mpiarg{datatype}. Then
in subsequent calls to the filter function, \MPI/ will
increment the value in \mpiarg{position} by the \mpiarg{count} of
items converted in the previous call.
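For example, under the assumption that a read accesses 10 etypes and that
the \mpiarg{filebuf} allocated by \MPI/ holds at most 4 etypes (both numbers
are illustrative, and how an implementation sizes its chunks is its own
choice), the filter would be called three times:
\[
\begin{array}{ccc}
\mbox{call} & \mbox{\mpiarg{position}} & \mbox{\mpiarg{count}} \\
1 & 0 & 4 \\
2 & 4 & 4 \\
3 & 8 & 2
\end{array}
\]
Each \mpiarg{position} is the sum of the \mpiarg{count} values of the
earlier calls ($0$, $0 + 4 = 4$, $4 + 4 = 8$), and the counts add up to the
10 etypes accessed.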
\begin{rationale}
Passing the filter a position and the datatype for the transfer
allows the filter to decode the datatype only once and
cache an internal representation of it on the datatype.
Then on subsequent calls, the filter can use the \mpiarg{position}
to quickly find its place in the datatype and continue
storing converted data where it left off at the end of the previous call.
\end{rationale}
\begin{users}
Although the filter function may usefully cache an
internal representation on the datatype, it should not cache
any state information specific to an ongoing conversion
operation, since it is possible for the same datatype to
be used concurrently in multiple conversion operations.
\end{users}
\endchangeapr
\begchangeapr
% added extra_state
\mpitypedefbind{write\_filter\_fn(void~*userbuf, MPI\_Datatype~datatype, MPI\_Datatype~etype, int~count, void~*filebuf, int~position, void~*extra\_state)}
\endchangeapr
The function \mpifunc{write\_filter\_fn} must convert from
native representation to file data representation.
Before calling this routine,
\MPI/ allocates \mpiarg{filebuf} of a size large enough to hold
\mpiarg{count} contiguous data items of type \mpiarg{etype}.
\begchangeapr
The function must copy \mpiarg{count} data items from
\endchangeapr
\mpiarg{userbuf} in the distribution described by \mpiarg{datatype},
to a contiguous distribution in \mpiarg{filebuf}, converting each data item
from native representation to file representation.
\begchangeapr
If the size of \mpiarg{datatype} is less than the size
of \mpiarg{count} etypes, the filter must treat
\mpiarg{datatype} as being contiguously tiled over the
\mpiarg{userbuf}.
The function must
begin copying at the location in \mpiarg{userbuf} specified by
\mpiarg{position} etypes into the (tiled) \mpiarg{datatype}.
\endchangeapr
\mpiarg{Etype} is equivalent to a dup of the etype used
in \func{MPI\_FILE\_SET\_VIEW}.
\begchangeapr
\mpiarg{Datatype} may be equivalent to a dup of the buftype
that the user passed to the read or write function, or it may be a type that
\MPI/ has derived from the buftype for its internal use.
The function is passed, in \mpiarg{extra\_state},
the argument that was passed to the \mpifunc{MPI\_REGISTER\_FILTER} call.
The predefined constant \const{MPI\_FILTER\_FN\_NULL} may be used
as either
% fixed typo in next line: change If to In. jmm
\mpifunc{write\_filter\_fn} or \mpifunc{read\_filter\_fn}. In that
case, \MPI/ will
not attempt to invoke \mpifunc{write\_filter\_fn} or
\mpifunc{read\_filter\_fn}, respectively, but will perform
the requested data access using ``native'' mode.
\endchangeapr
An \MPI/ implementation must ensure that all data accessed is converted,
\begchangeapr
either by using a filebuf large enough to hold all the requested
etypes or else by making repeated calls to the filter function
with the same \mpiarg{datatype} argument and appropriate values for
\mpiarg{position}.
An implementation will only invoke the filter functions when
one of the read or write routines in section \ref{sec:io-access},
\func{MPI\_PACK}, or \func{MPI\_UNPACK} is called by the user.
The filter functions will only be passed basic datatypes employed
by the user
and complex datatypes that the user has passed to one of the
functions noted above.
\endchangeapr
The filter functions must be re-entrant (thread-safe).
Further, it is erroneous for the filter functions to
call any collective routines, or free \mpiarg{datatype}
or \mpiarg{etype}.
\begchangeapr
The filter functions should return an error code. If the
returned error
code has a value other than \const{MPI\_SUCCESS}, the
implementation will raise an error.
% Discussion added by jmm
\discuss{What value should a filter return on error? Should it
be predefined, or should the filter allocate error codes as
needed? It would be nice if the filter knew enough not to
allocate a new error code each time it encountered an error,
but where can it cache existing error codes? In \mpiarg{extra\_state}?
Then presumably it needs to do locking. Alternatively, the
implementor of a filter could allocate all the necessary error
codes in advance and store them in \mpiarg{extra\_state} for all the filters
to use. The user would then have to call the filter initialization
code. We should offer some guidance here.}
\endchangeapr
It is expected that a fully functional set of data filters will be difficult
to implement from scratch. However, a high quality implementation will
supply template source for at least one such filter
which the user could then easily
tailor to the native and file representations they desire.
\func{MPI\_REGISTER\_FILTER} is a local operation and only registers the
data representation filters for the calling \MPI/ process.
\begchangeapr
\missing{Modify GET\_FILE\_TYPE\_EXTENT}
\endchangeapr
\begchangeapr
\begin{funcdef}{MPI\_GET\_REGISTERED\_FILTER(datarep, read\_filter\_fn, write\_filter\_fn, etype\_file\_extent\_fn, extra\_state, flag)}
\funcarg{\IN}{datarep}{data representation identifier (string)}
\funcarg{\OUT}{read\_filter\_fn}{function invoked to convert from file representation to native representation (function)}
\funcarg{\OUT}{write\_filter\_fn}{function invoked to convert from native representation to file representation (function)}
\funcarg{\OUT}{etype\_file\_extent\_fn}{function invoked to get the extent of a datatype as represented in the file (function)}
\funcarg{\OUT}{extra\_state}{extra state}
\funcarg{\OUT}{flag}{true if a filter was previously registered for \mpiarg{datarep}, false otherwise (boolean)}
\end{funcdef}
\mpibind{MPI\_Get\_registered\_filter(char~*datarep, MPI\_Filter\_function~*read\_filter\_fn, MPI\_Filter\_function~*write\_filter\_fn, MPI\_Filter\_extent\_function~*etype\_file\_extent\_fn, void~*extra\_state, int~*flag)}
\mpifbind{MPI\_GET\_REGISTERED\_FILTER(DATAREP, READ\_FILTER\_FN, WRITE\_FILTER\_FN, ETYPE\_FILE\_EXTENT\_FN, EXTRA\_STATE, FLAG, IERROR) \fargs CHARACTER*(*) DATAREP \\ EXTERNAL READ\_FILTER\_FN, WRITE\_FILTER\_FN, ETYPE\_FILE\_EXTENT\_FN \\ INTEGER(KIND=MPI\_ADDRESS\_KIND) EXTRA\_STATE \\ LOGICAL FLAG \\ INTEGER IERROR}
\mpicppbind{MPI::Get\_registered\_filter(const~char*~datarep, MPI::Filter\_function\&~read\_filter\_fn, MPI::Filter\_function\&~write\_filter\_fn, MPI::Filter\_extent\_function\&~etype\_file\_extent\_fn, void*~extra\_state, int*~flag)}
\func{MPI\_GET\_REGISTERED\_FILTER} returns the filter functions for
a user-defined data representation filter.
If a matching \func{MPI\_REGISTER\_FILTER} call was previously made
(a call with the same \mpiarg{datarep}),
then \mpiarg{flag} is set to true,
and \mpiarg{read\_filter\_fn}, \mpiarg{write\_filter\_fn},
\mpiarg{etype\_file\_extent\_fn}, and \mpiarg{extra\_state}
are set to the values passed in to the matching \func{MPI\_REGISTER\_FILTER}.
If no previous matching call was made to \func{MPI\_REGISTER\_FILTER},
then \mpiarg{flag} is set to false, and the other arguments are unchanged.
Implementation-defined data representation filters,
such as ``internal'',
may not be accessible via \func{MPI\_GET\_REGISTERED\_FILTER}.
\subsection{Matching Data Representations}
%-----------------------------------------
It is the user's responsibility to ensure that the
data representation used to read data from a file is {\em compatible} with
the data representation which was used to write that data to the file.
In general, using the same data representation name when writing