Arkady Kanevsky email: arkady@...
Network Appliance Inc. phone: 781-768-5395
Waltham, MA 02451 central phone: 781-768-5300
From: Caitlin Bestler [mailto:caitlinb@...]
Sent: Thursday, April 05, 2007 5:17 PM
To: dat-discussions@yahoogroups.com
Subject: RE: [dat-discussions] Multicast RDMA proposal

DAT is probably not the correct forum to discuss this, since I believe
the implications of multicast RDMA would be neutral to the API. A
reliable multicast session with one producer and multiple consumers
looks amazingly like a reliable point-to-point connection.
The protocol implications would have to be discussed in the IETF
and IBTA. But there are some tricky ones to be considered:
1) When will Send/RDMA Write operations complete? If the consumers
are fully enumerated and unchanging, which is acceptable for MPI,
then the sender merely has to merge all of the ACKs. Tricky, but
doable, and inherent to any reliable multicast lower layer protocol.
2) But trickier than that: how would the producer get more send credits?
RDMA Send Credits are ULP activities that are granted by posting a
receive buffer (InfiniBand) or sending a ULP message (iWARP).
Determining that you have message-level send credits would be very
tricky. Because
RDMA does not rely on transport layer buffering, it is very important
that buffer availability be explicitly advertised. Any proposal for
multicast RDMA might have to come up with a mechanism to pace the
consumption of anonymous buffers. Receivers that have fallen behind
would have to explicitly NACK. Collecting explicit acks for each
DDP Segment (or IB packet) from each multicast receiver does not
sound very feasible.
3) Theoretically a tagged buffer could have multicast meaning, if the
RKey is given the same meaning on each recipient. This is very easy
to mandate, but difficult to implement using existing RDMA interfaces.
It probably requires an interface to request a specific RKey that is
only valid on a specific multicast session/connection.
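
To make point 1 concrete, here is a minimal sketch of ACK merging for a
fixed, fully enumerated receiver set. It assumes cumulative ACKs carrying a
sequence number; the AckMerger class and its method names are hypothetical
and are not taken from any RDMA interface or specification.

```python
class AckMerger:
    """Hypothetical helper: merges cumulative ACKs from a fixed receiver set."""

    def __init__(self, receivers):
        receivers = list(receivers)
        if not receivers:
            raise ValueError("at least one receiver is required")
        # Highest sequence number acknowledged by each receiver; -1 means none yet.
        self.acked = {r: -1 for r in receivers}

    def on_ack(self, receiver, seq):
        """Record a cumulative ACK and return the highest fully completed sequence."""
        if receiver not in self.acked:
            raise KeyError(f"unknown receiver: {receiver}")
        # Cumulative ACKs never move backwards, so stale ACKs are ignored.
        self.acked[receiver] = max(self.acked[receiver], seq)
        return self.completed_through()

    def completed_through(self):
        # A Send/RDMA Write is complete only once every receiver has acknowledged it,
        # so the completion point is the minimum over all receivers.
        return min(self.acked.values())
```

The sender would report a completion for every operation up to
completed_through(). This only works because the membership is unchanging;
a receiver that joins or leaves mid-stream would need handling that this
sketch does not attempt.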
Have you evaluated using RDMA Read to have each receiver fetch the
data as needed and using multicast Unreliable Datagrams / UDP to
notify the receivers of the availability of new data?
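
A rough simulation of that alternative, under stated assumptions: the
notification carries only a sequence number, notifications may be lost or
arrive late, and each receiver pulls whatever it is missing. The Producer
and Receiver classes and the rdma_read method are hypothetical stand-ins;
no real verbs or socket calls are made here.

```python
class Producer:
    """Hypothetical producer: keeps published data where receivers can read it."""

    def __init__(self):
        self.buffers = {}   # seq -> bytes; stands in for registered memory
        self.next_seq = 0

    def publish(self, payload):
        """Store the payload and return the sequence number to announce."""
        seq = self.next_seq
        self.buffers[seq] = payload
        self.next_seq += 1
        return seq  # would be sent in an unreliable multicast datagram

    def rdma_read(self, seq):
        # Stands in for an RDMA Read over a reliable point-to-point connection.
        return self.buffers[seq]


class Receiver:
    """Hypothetical receiver: fetches data on notification, filling any gaps."""

    def __init__(self, producer):
        self.producer = producer
        self.next_needed = 0
        self.received = []

    def on_notify(self, seq):
        # A gap in sequence numbers means an earlier notification was lost;
        # a stale or duplicate notification makes this loop a no-op.
        while self.next_needed <= seq:
            self.received.append(self.producer.rdma_read(self.next_needed))
            self.next_needed += 1
```

Because each receiver paces its own reads, the producer needs no per-receiver
send credits. The open question this sketch ignores is how long the producer
must keep old buffers readable.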