thoughts on info/hints

William C. Saphir (wcs@nas.nasa.gov)
Thu, 20 Jun 1996 22:38:57 -0700

Here are some thoughts on the info/hints proposals that are floating
around, for the dynamic and I/O chapters. Some of this is the result
of some conversations with MPI I/O folks, and some of it is my own
opinion. The dynamic chapter I'll send out later tonight will have
what we talked about for the dynamic chapter at the last meeting. For
the next one, due in a couple of weeks, we can put in some
alternatives, if some clear choices start to emerge.

Seems that people generally agree on the following:
1. info in the dynamic chapter and hints in the i/o chapter should be merged
2. the content of info/hints should be key/value pairs.

There are at least two related issues:
1. can a similar mechanism be applied to the command line as well?
2. what is the best way to deal with long strings in Fortran?

Some comments on merging info and hints:
1. for the "key", info uses a string, hints uses an integer
- Integer may be a good choice if there are places where extremely
fast parsing of info is necessary. I am not aware of any cases where
info is used in a performance-critical routine.
- A string seems to be a better choice for writing portable
programs. For instance, if the IBM implementation had a #defined
integer key IBM_KEY_ADAPTER_MODE, you'd get a compile-time error on an
SGI. On the other hand, "ibm_key_adapter_mode" can be ignored by an
SGI implementation. There are second order problems even with a
string value, but they're not as bad.

2. for the "value", info uses a string, I/O basically uses
a void* (with no copying) but is set up with a union in
C so that you can pass a (non-pointer) integer value directly.
In Fortran the value is an integer*8. I think what was
meant was actually a choice argument in Fortran.
The idea is that the implementation knows from the key
what type the value is. This might even allow
character strings in Fortran.
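
The portability point about string keys in item 1 can be sketched
in C. This is only an illustration: the key names, the table, and
the functions key_is_known() and apply_hint() are hypothetical and
not part of any proposal. An unknown string key is a runtime
condition that can be warned about and ignored, whereas an unknown
#defined integer key would not compile at all.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical keys understood by one implementation (names invented). */
static const char *known_keys[] = { "sgi_key_numa_mode", "host" };

/* Returns 1 if this implementation recognizes the key, 0 otherwise. */
int key_is_known(const char *key)
{
    size_t n = sizeof known_keys / sizeof known_keys[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(known_keys[i], key) == 0)
            return 1;
    }
    return 0;
}

/* Applies one key/value pair. Returns 1 if applied, 0 if ignored. */
int apply_hint(const char *key, const char *value)
{
    if (!key_is_known(key)) {
        /* A foreign key is skipped with a warning instead of failing. */
        fprintf(stderr, "warning: ignoring unknown key \"%s\"\n", key);
        return 0;
    }
    printf("applying %s=%s\n", key, value);
    return 1;
}
```

With this table, apply_hint("ibm_key_adapter_mode", "fast") is
ignored with a warning, while apply_hint("host", "node1") is applied.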

There are a couple of other proposals (described below). For the
moment I'll try to argue that the hints int-void*/integer*8 idea
won't work.

- consider keys that are not reserved by MPI. For instance, IBM MPI
might define a key "foo" for which it expects an integer value. SGI
MPI might define a key "foo" for which it expects a pointer value. If
someone originally wrote his/her program on an IBM, you'd like the SGI
implementation to be able to detect a bogus value and give a warning,
but what it will probably do is first dereference the int that was
passed for the IBM implementation and dump core instead. Could be
classified as user error, though info/hints should probably not be
able to easily cause a program to dump core. This particular example
is one of many similar ones, from which I (and Bill, I think)
concluded that with a void*-based interface you would want to
segregate the key name space, e.g. SGI_XXX for keys understood by an
SGI implementation and IBM_XXX for keys understood by an IBM
implementation, with this segregation required and enforced by
MPI. Not a good solution, in my view. The alternative is to have
types that can be determined by an implementation.
- as currently specified, the routine does not copy data - just
pointers. Cleaning up when the structure is deallocated is
then the user's responsibility and could be messy. Also,
it invites problems when data is allocated on the stack:

void make_hints(MPI_Hints *hints)
{
    struct mystruct foo;            /* allocated on the stack */
    int hintname = HINT_MYSTRUCT;
    /* fill in structure */
    MPI_HINTS_CREATE(1, &hintname, &foo, hints);
    /* foo goes out of scope on return, but *hints still points to it */
}
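
For contrast, here is a minimal sketch of an interface that copies
its arguments, so that a value built on the stack can safely go out
of scope. Everything here (toy_info, toy_info_set(), and so on) is a
hypothetical stand-in with string values only, not the interface of
any of the proposals. A repeated key is simply appended, where a real
interface would presumably replace the old value.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_PAIRS 16

/* Hypothetical info object that owns copies of its keys and values. */
typedef struct {
    char *keys[TOY_MAX_PAIRS];
    char *values[TOY_MAX_PAIRS];
    int count;
} toy_info;

/* Heap-allocated copy of a string, or NULL if allocation fails. */
static char *toy_copy(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

toy_info *toy_info_create(void)
{
    return calloc(1, sizeof(toy_info));
}

/* Stores private copies of key and value. Returns 0 on success, -1 on failure. */
int toy_info_set(toy_info *info, const char *key, const char *value)
{
    if (info->count == TOY_MAX_PAIRS)
        return -1;
    char *k = toy_copy(key);
    char *v = toy_copy(value);
    if (k == NULL || v == NULL) {
        free(k);
        free(v);
        return -1;
    }
    info->keys[info->count] = k;
    info->values[info->count] = v;
    info->count++;
    return 0;
}

/* Returns the stored value for key, or NULL if the key is absent. */
const char *toy_info_get(const toy_info *info, const char *key)
{
    for (int i = 0; i < info->count; i++) {
        if (strcmp(info->keys[i], key) == 0)
            return info->values[i];
    }
    return NULL;
}

/* Frees all copies; the caller never has to track the original data. */
void toy_info_free(toy_info *info)
{
    for (int i = 0; i < info->count; i++) {
        free(info->keys[i]);
        free(info->values[i]);
    }
    free(info);
}

/* The value is built in a stack buffer, but the stored copy outlives it. */
toy_info *make_info(void)
{
    char buf[32] = "node1,node2";
    toy_info *info = toy_info_create();
    if (info != NULL)
        toy_info_set(info, "hostlist", buf);
    return info;
}
```

Because the object owns its copies, cleanup is a single call to
toy_info_free() and nothing dangles after make_info() returns.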
=====================================
If I have understood the mail correctly, there are
three proposals for info in the dynamic chapter. These
are:

1. The main proposal (discussed at last meeting) based on
MPI_INFO_CREATE(OUT info)
MPI_INFO_SET(INOUT info, IN key, IN value)
MPI_INFO_GET(IN info, IN key, IN valuelen, OUT value, OUT flag)
MPI_INFO_DUP(IN info, OUT newinfo)
MPI_INFO_FREE(INOUT info)

"value" is always a string

2. A proposal from Rolf
MPI_INFO_CREATE(OUT info)
MPI_INFO_KEY_STRING(INOUT info, IN key, IN string)
MPI_INFO_KEY_INT(INOUT info, IN key, IN integer_value)
MPI_INFO_KEY_FLOAT(INOUT info, IN key, IN float_value)
MPI_INFO_DUP(IN info, OUT newinfo)
MPI_INFO_FREE(INOUT info)

"value" may be a string, int, or float depending on the call

3. A proposal from Eric/Raja
MPI_INFO_CREATE(OUT info)
MPI_INFO_APPEND(INOUT info, IN string)
MPI_INFO_DUP(IN info, OUT newinfo)
MPI_INFO_FREE(INOUT info)

"string" is a string of the form "key=value". Parsing
does not require "%" quoting of the type in the command-line
proposal, though some work needs to be done to sort out
"key=value" from "key = value".
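
The "key=value" versus "key = value" question can be illustrated
with a small C sketch. The function split_pair() and its rule
(split at the first '=' and trim whitespace on both sides) are
assumptions made for this example; the proposal itself does not
specify a parsing rule.

```c
#include <ctype.h>
#include <string.h>

/* Trims whitespace at both ends in place; returns the trimmed start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Splits buf at the first '=' into key and value, trimming both.
 * Modifies buf. Returns 0 on success, -1 if there is no '=' or the
 * key is empty.
 */
int split_pair(char *buf, char **key, char **value)
{
    char *eq = strchr(buf, '=');
    if (eq == NULL)
        return -1;
    *eq = '\0';
    *key = trim(buf);
    *value = trim(eq + 1);
    return (**key == '\0') ? -1 : 0;
}
```

Under this rule "key=value" and "  key = value  " yield the same
pair, and a string without '=' is rejected. Values that need leading
or trailing blanks would still require some quoting convention.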

---

Rolf and Raja have suggested that their proposals could be used for
the command line as well. I'm not sure what is proposed. For Rolf's,
it would seem to require special keys "arg1", "arg2", "arg3", etc. For
Raja's perhaps the idea is that for command lines, you don't pass
key=value pairs, but pass regular strings instead.

While I think it would be nice to be able to handle command lines with
the same mechanism, both of these seem contrived and not too elegant
to me. I may have misunderstood something so Raja and Rolf, please
clarify if necessary.

My current thinking is that if command_line needs to be changed, we
should go back to argv, which was originally rejected by the
committee. The new twist is that I do not think we considered doing
what is done in PVM, which is to separate the command itself (a single
string) from the argument list. In most cases the arg list would be
NULL or MPI_FNULL. I've presented in an earlier note an example that
shows that even the 2-d array of strings that would be required for
SPAWN_MULTIPLE may not be too bad in Fortran.

----

Back to the three info proposals. While it is not perfect, I currently
lean towards #1 (the current proposal). Unless the third one has a
benefit for command_line that I haven't yet seen, the first is more
straightforward than the third. As for the second, I do not see how
to draw the line at string/int/float. The most important type (based
on keys/hints that are currently defined in the document) is probably
an "array of strings" (e.g., a hostlist) - this is not included. There
seems to be little call for int and float. What do we do in Fortran?
(2 more functions?) Why not include double/double_precision, and all
the other MPI basic types? In fact, the logical extension is that we
should use an arbitrary MPI datatype and a choice argument. To me this
is way overkill, very difficult in the simple cases (for instance,
there is no basic MPI "string" datatype, so users would have to
construct it), and it uses a mechanism that was designed for transfer of
data on a heterogeneous computer. Crawling back up the slippery slope,
we again reach int/float/string, and I am again unsatisfied.

In the end, it is not clear to me that there are any strong advantages
over pure string values. The main argument would be that in Fortran it
might be difficult to create those values (e.g. representing an
integer). Does character I/O solve most of that problem? In C,
sprintf() makes it pretty simple.
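
As a rough illustration of the C case, an integer can be turned into
a string value and read back as follows. The helper names
encode_int() and decode_int() are hypothetical, and the sketch uses
snprintf() (the bounded C99 variant of sprintf()) and strtol().

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes n into buf as decimal text. Returns 0 on success, -1 if buf is too small. */
int encode_int(int n, char *buf, size_t buflen)
{
    int written = snprintf(buf, buflen, "%d", n);
    return (written < 0 || (size_t)written >= buflen) ? -1 : 0;
}

/* Parses a decimal string value. Returns 0 on success, -1 if it is not a valid int. */
int decode_int(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE ||
        v < INT_MIN || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}
```

The receiving side can then reject a malformed value such as "4x"
instead of misinterpreting it, which is the kind of checking a
void*-based value does not allow.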

The deletion of "value_get" from the list of routines acknowledges
the fact that the interface would be hairy. I think this tries
to sweep under the rug a real problem. An implementation will
internally have to have such functionality in any case - it
will just be hidden from the user. Also, I disagree that
it is not necessary. We've found that users often have reasons
for wanting to deconstruct opaque objects. When the structure
is so clear (really being part of the definition of the object),
why hide it? The reason to hide it is that it would be ugly
to get the information. This tension is a red flag for me.

That's it for tonight's ravings. Comments welcome.

Bill

ps. I'll be on vacation for the next 10 days, so if I don't
respond, I'm not ignoring you.