This seems like a relatively harmless addition to the standard, since we can
just throw a loop around current MPI-1 calls. But I don't yet understand the
need to optimize MPI_COMM_DUP: in my experience, rare is the code that uses
anything more sophisticated than MPI_COMM_WORLD. Could you provide a concrete
example of a code (or class of codes) for which creating all of the
communicators up front is unacceptable? Even in the (highly theoretical) case
where you're creating and destroying lots of threads on the fly, shouldn't a
high-quality application just manage a small pool of communicators for the
best performance?
-- 
Eric Salo       Silicon Graphics Inc.            "Do you know what the
(415)933-2998   2011 N. Shoreline Blvd, 8U-802    last Xon said, just
salo@sgi.com    Mountain View, CA 94043-1389      before he died?"