dat-discussions · DAT Collaborative


Group Information

  • Members: 181
  • Category: Protocols
  • Founded: Jul 20, 2001
  • Language: English
#2814 From: "Kanevsky, Arkady" <arkady@...>
Date: Thu Apr 1, 2004 12:30 pm
Subject: iWARP CM progress
 
Just bringing to your attention that MPA draft spec - iWARP CM is
leaning toward a bit in protocol to support accept/reject separation.
Having this in MPA spec will greatly help in having a common API to
support IB and iWARP.

Feel free to support it on IETF RDDP email reflector.
Thanks,
Arkady

http://www.ietf.org/mail-archive/working-groups/rddp/current/msg01461.html



Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300

#2815 From: dat-discussions@yahoogroups.com
Date: Fri Apr 2, 2004 1:00 pm
Subject: New file uploaded to dat-discussions
 
Hello,

This email message is a notification to let you know that
a file has been uploaded to the Files area of the dat-discussions
group.

   File        : /Minutes/DAT-MIN-03_31_04.pdf
   Uploaded by : arkadynetappcom <arkady@...>
   Description : minutes of 3/31/04 conf call

You can access this file at the URL

http://groups.yahoo.com/group/dat-discussions/files/Minutes/DAT-MIN-03_31_04.pdf

To learn more about file sharing for your group, please visit

http://help.yahoo.com/help/us/groups/files

Regards,

arkadynetappcom <arkady@...>

#2816 From: "Talpey, Thomas" <Thomas.Talpey@...>
Date: Fri Apr 2, 2004 1:29 pm
Subject: RE: Proposed change for dat_lmr_sync_rdma_read|write descriptions
 

I think we all agree that "correct" text would be dangerous and
probably more confusing than what we have. Adding general
discussion to the Model Implication is goodness, though.

My comments on the text:

At 08:39 AM 3/30/2004, Kanevsky, Arkady wrote:
>Based on this discussion I suggest we do not change the description of
>the two sync operations at all.
>Instead we should add a paragraph to the Model Implication subsection of
>each sync operation.
>
>Do we have an agreement on the texts?
>
>For dat_lmr_sync_rdma_read:
>Without this call the Provider may have no method of knowing that the
>buffer has been updated and that any non-coherent cache of it may have
>out of date contents without getting Provider involved per HCA handling
>of remote RDMA Read operation.

I suggest the following tweak (btw, don't say "HCA"):

This call ensures that the Provider receives a coherent view of the buffer
contents upon a subsequent remote RDMA Read* operation. After the call
completes, the Consumer may be assured that all platform-specific buffer
and cache updates have been performed, and that the LMR range is
consistent with the Provider hardware. Any subsequent write* by the
Consumer will void this consistency; the Provider is not required to detect
such access.

>
>For platforms that have:
>a. I/O noncoherent cache will invalidate the I/O cache before RDMA read.
>b. CPU noncoherent cache will flush the CPU cache before RDMA read.

Ok. But just stop here and drop c), it's redundant.

>c. I/O AND CPU noncoherent caches will have to do both, i.e. invalidate
>the I/O cache and flush the CPU cache before RDMA read.

Similar change for RDMA Write, change polarity at the two *s above.

Tom.

>
>For dat_lmr_sync_rdma_write:
>Typically the ULP has dictated that RDMA Writes are targeted to a
>specific portion of an LMR, for example matching an RMR bound for a
>single transaction. Without this information the Provider would have to
>flush all remotely accessible memory, not just the memory intended for
>this single transaction, or having a much closer relationship between
>HCA driver and DAT Provider.
>
>For platforms that have:
>a. I/O noncoherent cache will flush the I/O cache after RDMA write.
>b. CPU noncoherent cache will invalidate the CPU cache after RDMA write.
>c. I/O AND CPU noncoherent caches will have to do both, i.e. flush  the
>I/O cache and invalidate the CPU cache after RDMA write.
>
>
>Arkady Kanevsky                                email: arkady@...
>Network Appliance                              phone: 781-768-5395
>375 Totten Pond Rd.                            Fax: 781-895-1195
>3rd Floor                                      http: www.netapp.com
>Waltham, MA 02451-2010                         general phone:
>781-768-5300
>
>
>> -----Original Message-----
>> From: Sherman Pun [mailto:sherman.pun@...]
>> Sent: Friday, March 26, 2004 10:34 PM
>> To: Talpey, Thomas
>> Cc: dat-discussions@yahoogroups.com
>> Subject: Re: [dat-discussions] Proposed change for
>> dat_lmr_sync_rdma_read|write descriptions
>>
>>
>> > At 10:43 PM 3/25/2004, Sherman Pun wrote:
>> > >Direction is reverse. Think I/O cache and the remote side
>> has to come
>> > >through the I/O cache before seeing the local memory.
>> >
>> > Huh? What if the noncoherent cache is the one attached to
>> the CPU? If
>> > the peer is doing a read, we need to push from cache, not discard.
>>
>> Looks like the actual implementation will be platform
>> specific. Platforms that have: a. I/O noncoherent cache will
>> invalidate the I/O cache before RDMA read. b. CPU noncoherent
>> cache will flush the CPU cache before RDMA read. c. I/O AND
>> CPU noncoherent caches will have to do both, i.e. invalidate
>>    the I/O cache and flush the CPU cache.
>>
>> ---- Sherman Pun
>>      Sun Microsystems
>>
>> >
>> > Maybe this flush/invalidate terminology is dangerous to
>> attempt, after
>> > all.
>> >
>> > Tom.
>> >
>> > >The actual text will be:
>> > >
>> > >For dat_lmr_sync_rdma_read:
>> > >This operation guarantees consistency by locally invalidating the
>> > >non-coherent cache prior to buffers being retrieved by remote peer
>> > >RDMA read operation(s).
>> > >
>> > >For dat_lmr_sync_rdma_write:
>> > >This operation guarantees consistency by locally flushing the
>> > >non-coherent cache back to buffers which have been populated by
>> > >remote peer RDMA write operation(s).
>> > >
>> > >---- Sherman Pun
>> > >     Sun Microsystems
>> > >
>> > >> Please, respond with your position on it.
>> > >>
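The platform cases discussed in this thread can be sketched in C as below. This is only an illustration, assuming a made-up `platform_caches` description and made-up action flags; none of these names come from the DAT specification, and a real Provider would perform platform-specific cache operations rather than return flags. It also shows why case c) adds nothing beyond a) and b).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform description; not part of the DAT API. */
typedef struct {
    bool io_cache_noncoherent;   /* I/O-side cache is not kept coherent */
    bool cpu_cache_noncoherent;  /* CPU cache is not kept coherent with I/O */
} platform_caches;

/* Hypothetical action flags standing in for real cache operations. */
enum {
    SYNC_INVALIDATE_IO  = 1 << 0,
    SYNC_FLUSH_CPU      = 1 << 1,
    SYNC_FLUSH_IO       = 1 << 2,
    SYNC_INVALIDATE_CPU = 1 << 3
};

/* Work implied before a remote RDMA Read of the buffer (cases a and b). */
unsigned sync_actions_before_rdma_read(platform_caches p)
{
    unsigned actions = 0;
    if (p.io_cache_noncoherent)
        actions |= SYNC_INVALIDATE_IO;
    if (p.cpu_cache_noncoherent)
        actions |= SYNC_FLUSH_CPU;
    return actions;
}

/* Work implied after a remote RDMA Write into the buffer (polarity reversed). */
unsigned sync_actions_after_rdma_write(platform_caches p)
{
    unsigned actions = 0;
    if (p.io_cache_noncoherent)
        actions |= SYNC_FLUSH_IO;
    if (p.cpu_cache_noncoherent)
        actions |= SYNC_INVALIDATE_CPU;
    return actions;
}

/* Case c is just cases a and b combined, so it needs no separate rule. */
void check_case_c_is_redundant(void)
{
    platform_caches io = { true, false };
    platform_caches cpu = { false, true };
    platform_caches both = { true, true };

    assert(sync_actions_before_rdma_read(both) ==
           (sync_actions_before_rdma_read(io) |
            sync_actions_before_rdma_read(cpu)));
    assert(sync_actions_after_rdma_write(both) ==
           (sync_actions_after_rdma_write(io) |
            sync_actions_after_rdma_write(cpu)));
}
```

A fully coherent platform yields no actions in either direction, which matches the idea that the sync calls may be no-ops there.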


#2817 From: Caitlin Bestler <cait@...>
Date: Tue Apr 6, 2004 8:53 am
Subject: Transport neutral send pacing
 
The following is proposed as informative text to aid consumers
in developing transport neutral applications. It deals with the
difference in send pacing between iWARP and InfiniBand.



DAT only provides minimal guarantees as to what completion of
a send operation means to the Consumer:

- The Consumer's buffers for that send request are no longer
     required. They may be released, re-used, altered, etc.

- Barring a connection failure the message/data will be delivered
    to the peer already or at some time in the future. This delivery will
    follow all the normal ordering rules.

There are several things that are *not* guaranteed, which an InfiniBand
developer may have presumed:

- That the payload has been delivered to peer memory.

- That the destination address has been validated by the remote peer.

Additionally, a transport neutral application cannot assume that an
excess post will simply be held until enough credits are granted for it
to be sent. DAT allows a transport, such as iWARP, to simply send the
message once there are sufficient LLP credits (TCP or SCTP). If there is
no waiting recv buffer, the connection will be torn down.

There are at least two methods that a Consumer can use to properly
pace sends in a transport neutral manner: ULP credits and RDMA Read
Round Tripping.


The ULP Credit strategy calls for each ULP to define a number of
credits that each peer has. One credit represents the right to send
a single message (with post_send). Credits are restored by a reply
message sent back by the peer  (with post_send). Typically each
reply restores a single message, but the ULP may choose to
consolidate replies or otherwise explicitly vary this policy.
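
A minimal sketch of the ULP credit bookkeeping described above, in C. The `ulp_credits` type and its functions are hypothetical names for illustration only; they are not DAT calls, and the actual post_send and reply handling are left to the application.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection credit state kept by the ULP. */
typedef struct {
    unsigned credits;      /* messages we may still send right now */
    unsigned max_credits;  /* limit agreed on by the ULP */
} ulp_credits;

void ulp_credits_init(ulp_credits *c, unsigned max_credits)
{
    c->credits = max_credits;
    c->max_credits = max_credits;
}

/* Call before post_send. Returns false if the send must wait for a reply. */
bool ulp_credits_try_consume(ulp_credits *c)
{
    if (c->credits == 0)
        return false;
    c->credits--;
    return true;
}

/* Call when a reply arrives. n is 1 for a plain reply, or larger if the
 * ULP consolidates several replies into one message. */
void ulp_credits_restore(ulp_credits *c, unsigned n)
{
    /* A peer can never hand back more credits than are outstanding. */
    assert(n <= c->max_credits - c->credits);
    c->credits += n;
}
```

The sender calls `ulp_credits_try_consume` before each post_send and `ulp_credits_restore` for each reply it receives, so an excess send is held by the application rather than by the transport.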

An application using ULP Credits frequently will have no need to
check for send completion, and may choose to suppress all successful
completions. Receiving the reply to a message certainly guarantees that
it was sent properly. Checking for send completions may still be of
value if the round-trip time is relatively large and large buffers are
in use, as it enables earlier reclaiming of the output buffers.

When simple receive queues are used, the output buffers can be
reclaimed even more quickly if they are simply posted as the receive
buffers for the reply message. The peer cannot reply to a message before
it is delivered to it, and it cannot be delivered until the entire
payload has been properly placed.

It may look strange to issue a post_recv for a buffer, and *then* send
it, but it is indeed safe provided that enough buffers are in use for the
number of ULP credits issued. Of course if there are not enough buffers
for the number of ULP credits issued then the application is not safe --
no matter which buffers are used for the post_recv().


RDMA Round Tripping is a useful technique when the peer application
does not currently, or does not want to, generate a reply message on
its own. RDMA Round Tripping essentially duplicates the guarantee
provided by InfiniBand: that a message has reached the remote host,
but not necessarily the remote peer.

The sender simply posts an RDMA Read on remote memory. There
is no real need to fetch any meaningful content. The sole purpose
is to generate an RDMA Read command on the wire.

The RDMA Read will be sent in order, and it will only be processed
by the remote peer after all prior messages have been completed
to the user.

Therefore waiting for the RDMA Read Completion will automatically
pace the sender to the actual speed at which the remote host
is receiving and acking the messages.  Each RDMA Read Completion
can be used to restore N credits, just as a Reply message would
have. If the sender does three post_sends, followed by a post_rdma_read,
then the completion of the RDMA Read will restore three credits.
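
The round-trip accounting can be sketched separately from the reply-based scheme. The C below is an illustration under assumptions: `read_pacer` and its functions are hypothetical, and the value returned when the RDMA Read is posted is assumed to be carried with that read (for example in its completion cookie) so the completion handler knows how many sends it covers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical sender-side state for RDMA Read round tripping. */
typedef struct {
    unsigned credits;           /* sends that may still be posted */
    unsigned sends_since_read;  /* sends posted after the last RDMA Read */
} read_pacer;

void read_pacer_init(read_pacer *p, unsigned credits)
{
    p->credits = credits;
    p->sends_since_read = 0;
}

/* Call before post_send. Returns false if no credit is available. */
bool read_pacer_on_post_send(read_pacer *p)
{
    if (p->credits == 0)
        return false;
    p->credits--;
    p->sends_since_read++;
    return true;
}

/* Call when posting the RDMA Read. Returns the number of earlier sends
 * this read covers; keep that number with the read until it completes. */
unsigned read_pacer_on_post_read(read_pacer *p)
{
    unsigned covered = p->sends_since_read;
    p->sends_since_read = 0;
    return covered;
}

/* Call on RDMA Read completion with the number saved at post time. */
void read_pacer_on_read_completion(read_pacer *p, unsigned covered)
{
    assert(covered <= UINT_MAX - p->credits);
    p->credits += covered;
}
```

With three credits, three calls to `read_pacer_on_post_send` succeed and a fourth is refused; posting the read returns 3, and its completion restores those three credits, as in the example in the text.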

#2819 From: "Kanevsky, Arkady" <arkady@...>
Date: Tue Apr 6, 2004 11:46 am
Subject: RE: Transport neutral send pacing
 
Caitlin,
Can you separate RDMA Read example from credit flow control one?
Arkady

Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300


> -----Original Message-----
> From: Caitlin Bestler [mailto:cait@...]
> Sent: Tuesday, April 06, 2004 4:54 AM
> To: dat-discussions@yahoogroups.com
> Subject: [dat-discussions] Transport neutral send pacing
>
>
> The following is proposed as informative text to aid
> consumers in developing transport neutral applications. It
> deals with the
> difference
> in send pacing between iWARP and InfiniBand.
>
>
>
> DAT only provides minimal guarantees as to what completion of
> a send operation means to the Consumer:
>
> - The Consumer's buffers for that send request are no longer
>     required. They may be released, re-used, altered, etc.
>
> - Barring a connection failure the message/data will be delivered
>    to the peer already or at some time in the future. This
> delivery will
>    follow all the normal ordering rules.
>
> There are several things that are *not* guaranteed, which an
> InfiniBand developer may have presumed:
>
> - The payload has not been delivered to peer memory.
>
> - The destination address has not been validated by the remote peer.
>
> Additionally a transport neutral application cannot assume that an
> excess
> post will simply be held until enough credits are granted for
> it to be
> sent.
> DAT allows a transport, such as iWARP, to simply send the
> message once there are sufficient LLP credits (TCP or SCTP).
> If there was no waiting recv buffer the connection will be torn down.
>
> There are at least two methods that a Consumer can use to
> properly pace sends in a transport neutral pacing: ULP
> credits and RDMA Read Round Tripping.
>
>
> The ULP Credit strategy calls for each ULP to define a number
> of credits that each peer has. One credit represents the
> right to send a single message (with post_send). Credits are
> restored by a reply message sent back by the peer  (with
> post_send). Typically each reply restores a single message,
> but the ULP may choose to consolidate replies or otherwise
> explicitly vary this policy.
>
> An application using ULP Credits frequently will have no need
> to check for send completion, and may choose to suppress all
> successful completions. Receiving the reply to a message
> certainly guarantees that it was sent properly. Checking for
> receive completions may still be of value if the
> round-trip-time is relatively large and large buffers are
> in
> use, as that it enables earlier reclaiming of the output buffers.
>
> When simple receive queues are used,  the output buffers can be
> reclaimed
> even more quickly if they are simply posted as the receive
> buffers for
> the
> reply message. The peer cannot reply to a message before it
> is delivered to it, and it cannot be delivered until the
> entire payload has been properly placed.
>
> It may look strange to issue a post_recv for a buffer, and
> *then* post
> it,
> but it is indeed safe provided that enough buffers are in use
> for the number of ULP credits issued. Of course if there are
> not enough buffers for the number of ULP credits issued then
> the application is not safe -- no matter which buffers are
> used for the post_recv().
>
>
> RDMA Round Tripping is a useful technique when the peer
> application does not currently, or does not want to, generate
> a reply message on its own. RDMA Round Tripping essentially
> duplicates the guarantees that a message has reached the
> remote host, but not necessarily the remote peer, provided by
> InfiniBand.
>
> The sender simply posts an RDMA Read on remote memory. There
> is no real need to fetch any meaningful content. The sole
> purpose is to generate an RDMA Read command on the wire.
>
> The RDMA Read will be sent in order, and it will only be
> processed by the remote peer after all prior messages have
> been completed to the user.
>
> Therefore waiting for the RDMA Read Completion will
> automatically pace the sender to the actual speed at which
> the remote host is receiving and acking the messages.  Each
> RDMA Read Completion can be used to restore N credits, just
> as a Reply message would have. If the sender does three
> post_sends, followed by a post_rdma_read, then the completion
> of the RDMA Read will restore three credits.

#2820 From: Sherman Pun <sherman.pun@...>
Date: Tue Apr 6, 2004 8:26 pm
Subject: errata 108
 
As part of errata 108, an enum dat_return_class data type was introduced
to replace the corresponding #defines.

The new

typedef enum dat_return_class {
	 DAT_CLASS_ERROR       = 0x80000000,
	 DAT_CLASS_WARNING     = 0x40000000,
	 DAT_CLASS_SUCCESS     = 0x00000000
} DAT_RETURN_CLASS;

replaces the old

#define DAT_CLASS_ERROR       0x80000000
#define DAT_CLASS_WARNING     0x40000000
#define DAT_CLASS_SUCCESS     0x00000000
However the DAT_CLASS_ERROR as an enum type creates a compilation
warning with the SunPro compiler that is used by Solaris.

warning: enumerator value overflows INT_MAX (2147483647) [or 0x7fff.ffff]

[gcc does not flag an enum value of 0x80000000 as a warning].
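
For context, ISO C before C23 requires the value of an enumeration constant to be representable as an int, which is what the SunPro warning reflects on platforms with a 32-bit int. The C sketch below shows the #define form with an explicit unsigned suffix; the `u` suffixes, `EXAMPLE_CLASS_MASK`, and the helper functions are assumptions added for illustration and are not taken from the DAT headers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the errata text; the u suffix is an addition for clarity. */
#define DAT_CLASS_ERROR    0x80000000u
#define DAT_CLASS_WARNING  0x40000000u
#define DAT_CLASS_SUCCESS  0x00000000u

/* Hypothetical mask selecting the two class bits of a return value. */
#define EXAMPLE_CLASS_MASK 0xC0000000u

/* True if the value could legally be an enumeration constant (fits int). */
bool fits_in_int(uint32_t value)
{
    return value <= (uint32_t)INT_MAX;
}

/* Extract the class bits from a 32-bit return value. */
uint32_t example_return_class(uint32_t ret)
{
    return ret & EXAMPLE_CLASS_MASK;
}

/* The class survives whatever is stored in the low-order bits. */
void check_class_extraction(void)
{
    assert(example_return_class(DAT_CLASS_ERROR | 0x1234u) == DAT_CLASS_ERROR);
    assert(example_return_class(DAT_CLASS_WARNING | 0x1u) == DAT_CLASS_WARNING);
    assert(example_return_class(0x1234u) == DAT_CLASS_SUCCESS);
}
```

On a platform with a 32-bit int, `fits_in_int(DAT_CLASS_ERROR)` is false while the other two classes fit, so only the error class is affected.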

Since this is a timely matter, I would like to ask for a discussion in
this forum and in tomorrow's meeting. Sun would like to propose a vote to
roll back to the original #defines.

Additional information for impact assessments:
Even though errata 108 was voted in on 8/6/2003, the latest
reference implementation on sourceforge.net (dapl_beta_2_02, 1/7/2004)
did not have the enum dat_return_class definition. This change
showed up in the 3/24/2004 "New file uploaded" notification
to dat-discussions. There should be no impact on applications.
This rollback only affects providers that have adopted the
enum dat_return_class.

--- Sherman Pun
     Sun Microsystems

#2821 From: "Kanevsky, Arkady" <arkady@...>
Date: Wed Apr 7, 2004 1:48 am
Subject: Agenda and Logistics for 4/7/04 conf. call
 
We will have a DAT Collaborative conference call
this Wednesday April 7, 1:00-2:00pm EDT
(10:00am-11:00am PDT).

Moderator: Arkady Kanevsky
Phone: 888-827-8686
International: 303-928-2620
Conf ID: 1068642

Tentative Detailed Agenda for 4/7/04
(all times are EDT)
* 1:00 - 1:05 - Review Minutes, AIs, Logistics (Arkady)
* 1:05 - 1:25 - errata 171 - function prototypes (Steve)
* 1:25 - 1:35 - old errata 108 (SunPro compiler issues) (Sherman)
* 1:35 - 1:55 - list of addition/changes to DAT proposals
(Caitlin/Arkady)
	 async successful completion meaning
	 single IOV for RDMA Read
	 local RDMA write permission for RDMA read
	 iWARP sync return for posts in disconnect state
	 send with invalidate (non-handling in DAPL-1.2)
* 1:55 - 2:00 - errata 56 (Peter)
	 dying thread requirement
* 2:00 - 2:00 - 1.2 errata list (Arkady)
	 time permitting




Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300

#2822 From: "Kanevsky, Arkady" <arkady@...>
Date: Wed Apr 7, 2004 3:45 pm
Subject: Small addition issue #7 - EVD overflow detection
 
7. iWARP does not support CQ overflow detection (kDAPL and uDAPL)
(warning galore, Consumer advice for over-provisioning the EVD queue,
Consumer DTO posting/reaping flow control)

Here is what we currently state in DAT spec:
API reqs, chapter 5:
5.7 2
If the queue of an Event Dispatcher is full then uDAPL-1.0 shall
generate an Event Dispatcher overflow error that is delivered to the
Asynchronous Error EVD of the Interface Adapter.

Also 5.7. 7.d
Posting of a Software event cannot cause the Event Dispatcher queue
overflow
An attempt to post a Software event that causes an overflow is reported
to a Consumer synchronously and the Software event is not being posted
to the Event Dispatcher.
An attempt to post a Software event that causes an overflow for an Event
Dispatcher does not generate the EVD overflow error and hence, is not
reported on the Asynchronous Error Event Dispatcher.

The description of the Event model has the following relevant text:
"It is then up to the Consumer to ensure that the event queue does not
overflow.

An event queue overflow generates an asynchronous error on the IA Event
Dispatcher. Overflow of the Asynchronous Error Event Dispatcher is a
catastrophic error; behavior of the Provider after that is undefined.
The behavior of the Provider after it posts a catastrophic error is
undefined. The Provider can consolidate multiple overflows of the same
event queue into a single notification. In general, the Provider is free
to consolidate multiple error notifications of the same type.
Connections are not broken when an associated Event Dispatcher for the
connection local Endpoint has the queue overflow condition. All cleanup
of a queue overflow is left to the Consumer.

Events that are posted to the overflown queue (by the Provider or
Consumer) are dropped by a Provider; they do not effect other events on
the event queue of the Event Dispatcher.

Note to Consumer: A properly configured Event Dispatcher with
appropriate UpCall routine handling should not overflow. It is up to the
Consumer to configure the Event Dispatcher and dequeue events fast
enough to avoid an overflow condition."

This text needs to be cleaned up regardless of how we handle this issue.
For starters, the connection that caused the overflow will be broken. The
Consumer may not have any way to clean up the queue. The queue may no
longer be usable at all, and access to events on it may no longer be
supported. The only thing that Consumers can do is to destroy the EVD, but
first all connections that use it must be disconnected.



This semantic is optional for iWARP verbs:
"The RI is NOT REQUIRED to perform CQ overflow detection or
protection. Therefore, the CQ overflow error codes in this document
are OPTIONAL. When an overflow occurs, the results are
indeterminate. Overflow of a CQ MUST NOT affect QPs which do not
report Work Completions to that CQ and MUST NOT affect other CQs.
Consequently, when creating the CQ, the Consumer should request
enough outstanding Work Requests so that if every possible
outstanding WR were to complete (such as may happen in an error
case), there would be room for the CQE on the CQ. The RI MUST NOT
enforce that every WQE on every Work Queue associated with the CQ
must have a CQE available for the WQE's Work Completion information."

In contrast, the IBTA 1.1 spec states:
"C10-17: Overflow of the CQ shall be detected and reported by the CI
before the next WC is retrieved from that CQ. This error must be reported
as an affiliated asynchronous error -- see 11.6.3.2 Affiliated
Asynchronous Errors on page 565."

We have multiple ways of dealing with the issue:
1. Require DAT implementations on iWARP to support overflow detection.
2. Make EVD overflow detection optional.

I am not sure the first route will be workable for iWARP RNICs.
I believe that the Consumer should avoid overflow conditions and
not rely on the overflow error. If overflow happens it is really a
catastrophic error, and all connections associated with that EVD
should be disconnected and the EVD destroyed.

If we go the second route we can either define a Provider attribute for
EVD overflow detection or not.
If we go with 2 we would also need to add guidance on how the Consumer
should choose the EVD size to the Usage section of EVD create, and
reference it from the Usage sections of the EP/SP/IA create calls.
We should also reference the same advice from the EVD/EP/SRQ resize calls.
We can also describe a simple flow control scheme that includes
accounting for available space on the EVD queue and the EP/SRQ queues.

For example, the following advice from iWARP verbs spec can be used
as starting point:
"If the Consumer wishes to have deterministic error behavior, at
Create/Modify QP, the sum of the maximum number of WQEs associated
with a single CQ should be less than or equal to the number of
entries in the CQ. A Consumer can size the CQ smaller, in which case
the error semantics of a CQ overflow are not deterministic, but
possible RNIC behavior includes overwriting previous CQEs in whole
or in part and thus may result in a data integrity issue.

An additional consideration for sizing the CQ is QP Destruction. Any
outstanding WRs which were on a Work Queue when it is destroyed may
occupy entries on the associated CQ. For more information, see
Section 6.1.4 - Destroying a Queue Pair."
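
The sizing rule in the quoted advice can be sketched as a simple sum. The C below is an illustration under assumptions: `work_queue_depths` and both functions are hypothetical names, and the sketch assumes that both the send and the receive queue of every listed endpoint report completions to the same CQ/EVD.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-endpoint queue depths, all feeding one CQ/EVD. */
typedef struct {
    unsigned max_send_wrs;  /* maximum outstanding send-side work requests */
    unsigned max_recv_wrs;  /* maximum outstanding receive work requests */
} work_queue_depths;

/* Smallest CQ size that can hold a completion for every possible
 * outstanding work request on the listed queues. */
unsigned min_cq_entries(const work_queue_depths *queues, size_t count)
{
    unsigned total = 0;
    for (size_t i = 0; i < count; i++) {
        /* Guard against unsigned wrap-around while summing. */
        assert(queues[i].max_send_wrs <= UINT_MAX - total);
        total += queues[i].max_send_wrs;
        assert(queues[i].max_recv_wrs <= UINT_MAX - total);
        total += queues[i].max_recv_wrs;
    }
    return total;
}

/* True if a CQ of cq_entries cannot overflow for these queues. */
bool cq_size_is_deterministic(unsigned cq_entries,
                              const work_queue_depths *queues, size_t count)
{
    return min_cq_entries(queues, count) <= cq_entries;
}
```

For two endpoints with depths 16/32 and 8/8, the sum is 64 entries; a smaller CQ falls into the non-deterministic case the quoted text warns about.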

Assuming that no application depends on this error for normal
operation, I suggest we follow route 2 and do not define a Provider
attribute for it.
It will not cause any changes to current Providers or Consumers.

Arkady

Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300

#2823 From: "Kanevsky, Arkady" <arkady@...>
Date: Wed Apr 7, 2004 4:11 pm
Subject: List of small issues #8 - RDMA Write ordering non-guarantee
 
Neither IB nor iWARP provides any guarantee about what happens
to a remote buffer when multiple RDMA Write operations, on the same
connection or on different ones, access it simultaneously.

We have the following requirement:
The result of the RDMA DTO accessing remote memory that is being
accessed by its local Consumer is not defined and the content of any
remote memory accessed by the RDMA DTO is also undefined:
Coherency between operations on local memory and RDMA DTO operations on
the same memory is defined by the local host system architecture.

We can add one more req:
The result of multiple RDMA DTOs accessing remote memory
simultaneously is not defined.

To Usage section of RDMA Write add:
Pipelined RDMA Write operations over a single connection can proceed
simultaneously. Thus, if they access the same remote memory, the resulting
content of the remote buffer is indeterminate. It can be the data of any
one of the operations, or a mixture of data from multiple RDMA Writes. If
the Consumer desires a deterministic result, it should use a ULP protocol
to ensure that only one RDMA operation accesses the remote buffer at a
time. For example, it can use a 0-size RDMA Read between a pair of RDMA
Writes that access the same remote location.
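
As a rough sketch of how a Consumer might decide where such a separating operation is needed, the C below checks whether a new RDMA Write overlaps any write posted since the last separator. The `remote_range` type and both functions are hypothetical, addresses are treated as plain integers within one remote region, and the sketch only counts separators; it does not post any DAT operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the remote bytes touched by one RDMA Write. */
typedef struct {
    uint64_t addr;  /* starting remote address */
    uint64_t len;   /* length in bytes */
} remote_range;

/* True if the two ranges share at least one byte. */
bool ranges_overlap(remote_range a, remote_range b)
{
    if (a.len == 0 || b.len == 0)
        return false;
    /* Compare by distance to avoid overflow in addr + len. */
    if (a.addr <= b.addr)
        return b.addr - a.addr < a.len;
    return a.addr - b.addr < b.len;
}

/* Number of separators (for example 0-size RDMA Reads) needed so that no
 * two overlapping writes are posted without a separator between them. */
size_t count_separators(const remote_range *writes, size_t count)
{
    assert(writes != NULL || count == 0);
    size_t separators = 0;
    size_t window_start = 0;  /* first write posted after the last separator */
    for (size_t i = 0; i < count; i++) {
        for (size_t j = window_start; j < i; j++) {
            if (ranges_overlap(writes[j], writes[i])) {
                separators++;      /* separator goes just before write i */
                window_start = i;
                break;
            }
        }
    }
    return separators;
}
```

Writes to disjoint ranges need no separator at all, so the pipeline stays full except where two writes actually touch the same remote bytes.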

I also think that we should state that each of multiple RDMA Writes
accessing the same location generates a return code just as if it were
the only RDMA Write accessing that memory location.
In other words, no error will be generated because multiple RDMA Writes
access the same memory location.

************************************************************************
I am not sure whether, if multiple RDMA Writes write the same value
into remote location(s), DAT can guarantee that the resulting value
will be deterministic.
I think we can, but I want to hear other people's thoughts on it.


Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300

#2824 From: "Talpey, Thomas" <Thomas.Talpey@...>
Date: Wed Apr 7, 2004 4:47 pm
Subject: Re: List of small issues #8 - RDMA Write ordering non-guarantee
tmtymailu
 

Probably worth including local reads and writes in this list.
But, I suspect that trying to document (or not document)
the ordering of operations is Pandora's Box.

By the way, if it helps: the thing that is needed before
ensuring any consistency, is receipt of a completion on
the data *sink*. Completions at the data *source* mean
a very different thing.

Tom.



#2825 From: "Kanevsky, Arkady" <arkady@...>
Date: Wed Apr 7, 2004 4:55 pm
Subject: List of small issues #9 - RDMA Read flow control
arkadynetappcom
 
Both IB and iWARP guarantee RDMA Read flow control.
Proposal:

Add to RDMA Read Usage section:
The number of RDMA Reads posted on the Send WQ can exceed the
max_rdma_read_out attribute of the EP. The DAT Provider ensures
that the number of RDMA Reads outstanding on the transport does
not exceed the EP attribute.

The Consumer can rely on its own RDMA Read flow control to ensure
that the number of RDMA Reads for which completions have not
been generated does not exceed the EP max_rdma_read_out
attribute value.


We can generate a new req for Provider support for
RDMA Read flow control.
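
A rough sketch of what such Provider-side flow control could look like, written as a small Python model. The class `ReadThrottle` and its methods are hypothetical names for illustration only; they are not part of the DAT API, and only `max_rdma_read_out` comes from the text above.

```python
from collections import deque

class ReadThrottle:
    """Hypothetical model of a Provider limiting outstanding RDMA Reads."""

    def __init__(self, max_rdma_read_out):
        self.limit = max_rdma_read_out
        self.pending = deque()    # posted, not yet on the transport
        self.outstanding = set()  # on the transport, not yet completed

    def post_read(self, read_id):
        # Posting always succeeds, even beyond the limit.
        self.pending.append(read_id)
        self._issue()

    def complete_read(self, read_id):
        self.outstanding.remove(read_id)
        self._issue()

    def _issue(self):
        # Move reads to the transport only while below the EP attribute.
        while self.pending and len(self.outstanding) < self.limit:
            self.outstanding.add(self.pending.popleft())

throttle = ReadThrottle(max_rdma_read_out=2)
for read_id in range(5):
    throttle.post_read(read_id)
```

After the five posts, only reads 0 and 1 are outstanding; each completion lets the next pending read go out.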
Arkady


Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300

#2826 From: "Kanevsky, Arkady" <arkady@...>
Date: Wed Apr 7, 2004 5:01 pm
Subject: RE: List of small issues #8 - RDMA Write ordering non-guarantee
arkadynetappcom
 
The proposed requirement is for all RDMA DTOs (read and write).

Are you proposing to add some text on what happens
on simultaneous RDMA Read and Write to the same memory
from different connections?

What completion on the data sink are you referring to?
There is none for RDMA Write. That is the reason for the second
half of the Consumer advice about a ULP protocol.
We can be more explicit and describe the protocol example
that Caitlin outlined for RDMA Write successful completion.
But then there are no simultaneous RDMA Writes since there is
a Send between them.
RDMA Read serves the same purpose in the example.
Arkady
 
 

Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300




#2827 From: "Talpey, Thomas" <Thomas.Talpey@...>
Date: Wed Apr 7, 2004 5:18 pm
Subject: RE: List of small issues #8 - RDMA Write ordering non-guarantee
tmtymailu
 

At 01:01 PM 4/7/2004, Kanevsky, Arkady wrote:
>The proposed requirement is for all RDMA DTOs (read and write).
>
>Are you proposing to add some text on what happens
>on simultaneous RDMA Read and Write to the same memory
>from different connections?

Actually I was saying read and write from the target host bus.
But other connections are also subject to the non-guarantee.

>
>What completion on the data sink you refer to?
>There is none for RDMA Write? That is the reason for second
>half of the Consumer advice for ULP protocol.

Right! The point is, since there is no completion at the data sink,
there is no guarantee that any data has even been placed. Much
less, which possibly colliding operation will eventually "win".

>We can be more explicit and describe the protocol example
>that Caitlin outlined for RDMA Write successful completion.
>But then there are no simultaneous RDMA Write since there is
>a Send between them.

Agreed. In this case, the second write is guaranteed to win.

>RDMA Read serves the same purpose in the example.

Only with respect to writes on the same connection, and if the
RDMA Read targets the same region as the last write. If both are
true, then agreed.

This stuff is subtle.

Tom.



#2828 From: Caitlin Bestler <cait@...>
Date: Wed Apr 7, 2004 7:55 pm
Subject: Re: List of small issues #8 - RDMA Write ordering non-guarantee
caitlinbestler
 
On Apr 7, 2004, at 8:18 PM, Talpey, Thomas wrote:

>
> >We can be more explicit and describe the protocol example
> >that Caitlin outlined for RDMA Write successful completion.
> >But then there are no simultaneous RDMA Write since there is
> >a Send between them.
>
> Agreed. In this case, the second write is guaranteed to win.

Actually there is no guarantee.

Suppose the sender posts in order a Write A,
Send B, Write C and Send D. Where A and
C target the same memory.

If those four packets are received in reverse
order the following will happen:

	 D will be placed, but not delivered.
	 C will be placed, but not delivered.
	 B will be placed, but not delivered.
	 A will be placed (overwriting C),
		 resulting in B and D being delivered.

By the time the receiving consumer sees the
completion for D, the buffer will contain the
contents written by A (the data that goes with
Send B), not the data written by C.

Even more simply, suppose the packets arrive in order,
but C is placed *before* the receiving application
reaps the completion for B. The contents seen
by the application will be those of the *second*
Write (C).

In general, having the same buffer in play
for multiple in-flight messages will get you
in trouble.
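
The reverse-arrival scenario above can be traced with a small Python model. This is only a sketch under stated assumptions: placement happens as each message arrives, delivery (completion) happens strictly in posted order once there are no gaps, and the names `POSTED` and `simulate` are invented for the example.

```python
# Posted order on the sender: Write A, Send B, Write C, Send D.
# A and C target the same remote buffer.
POSTED = [("A", "write", "data-A"), ("B", "send", None),
          ("C", "write", "data-C"), ("D", "send", None)]

def simulate(arrival_order):
    """Return the buffer content visible when each Send completes."""
    by_name = {name: (kind, data) for name, kind, data in POSTED}
    order = [name for name, _, _ in POSTED]
    buffer = None
    arrived = set()
    next_to_deliver = 0
    seen_at_completion = {}
    for name in arrival_order:
        kind, data = by_name[name]
        arrived.add(name)
        if kind == "write":
            buffer = data  # placed immediately, regardless of order
        # Deliver in posted order as far as everything has arrived.
        while next_to_deliver < len(order) and order[next_to_deliver] in arrived:
            done = order[next_to_deliver]
            if by_name[done][0] == "send":
                seen_at_completion[done] = buffer
            next_to_deliver += 1
    return seen_at_completion

reverse = simulate(["D", "C", "B", "A"])
in_order = simulate(["A", "B", "C", "D"])
```

With reverse arrival, the completion for D sees the data from Write A, because A was placed last and overwrote C.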

#2829 From: dat-discussions@yahoogroups.com
Date: Wed Apr 7, 2004 7:13 pm
Subject: New file uploaded to dat-discussions
dat-discussions@yahoogroups.com
 
Hello,

This email message is a notification to let you know that
a file has been uploaded to the Files area of the dat-discussions
group.

   File        : /dat_headers_1_1.tgz
   Uploaded by : arkadynetappcom <arkady@...>
   Description : final updated DAPL-1.1 header files

You can access this file at the URL

http://groups.yahoo.com/group/dat-discussions/files/dat_headers_1_1.tgz

Regards,

arkadynetappcom <arkady@...>

#2830 From: "Kanevsky, Arkady" <arkady@...>
Date: Wed Apr 7, 2004 11:03 pm
Subject: Updated DAPL-1.1 header files are on DAT web site!
arkadynetappcom
 
They include the errata changes we voted on today!

Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300

#2831 From: "Talpey, Thomas" <Thomas.Talpey@...>
Date: Thu Apr 8, 2004 1:16 am
Subject: Re: List of small issues #8 - RDMA Write ordering non-guarantee
tmtymailu
 

At 03:55 PM 4/7/2004, Caitlin Bestler wrote:
>Actually there is no guarantee.
>
>Suppose the sender posts in order a Write A,
>Send B, Write C and Send D. Where A and
>C target the same memory.

Right you are. Oops.

I refer to your closing statement with vehement agreement:

>In general, having the same buffer in play
>for multiple in-flight messages will get you
>in trouble.

To which I will add, trying to document the trouble is good
for still more! I much prefer to remain silent on the issue
in the doc.

Tom.


#2832 From: Steffen Persvold <sp@...>
Date: Thu Apr 8, 2004 6:42 am
Subject: Re: List of small issues #8 - RDMA Write ordering non-guarantee
spersvol
 
Caitlin Bestler wrote:
>
>
> In general, having the same buffer in play
> for multiple in-flight messages will get you
> in trouble.
>

By having the same buffer, do you mean the same LMR or the same virtual
address? I'm thinking:

WriteA, LMR 1, address 0x4000000, length 4096 bytes
WriteB, LMR 1, address 0x4001000, length 128 bytes

Any chance WriteB will be delivered before WriteA?

Regards,
--
Steffen Persvold
Senior Software Engineer
mob. +47 92 48 45 11
tel. +47 22 62 89 50
fax. +47 22 62 89 51

Scali - http://www.scali.com
High Performance Clustering

#2833 From: "Kanevsky, Arkady" <arkady@...>
Date: Thu Apr 8, 2004 12:30 pm
Subject: Issue #10 plus more - RE: Transport neutral send pacing
arkadynetappcom
 
Looking into our reqs for the DAPL reliability model (5.2 8),
we have the following statement:
G) DTO Completion means that the Consumer can reclaim local resources
associated with the DTO, including a local buffer that was specified for
the DTO.

This is consistent with the proposed clarification, and I do not see any
reason to change it.
All the error handling is stated in the previous reqs and remains valid.

The delivery rules state:
H) Delivery Ordering Rules:
- The data payload for the send DTO matching a receive DTO is delivered
  into the receive-indicated buffer memory prior to the receive DTO
  completion.
- Receive DTOs on a connection are completed in the order of posting of
  their corresponding sends.
- Each RDMA write DTO posted on a connection prior to a send DTO posted to
  the same connection has its data payload delivered to the memory
  specified by RMR Context and RMR Target Address of the RDMA Write DTO
  prior to the completion of the Receive DTO matching that send.

The first rule covers issue #10 from the list.
But one of the completion ordering rules needs to be fixed:
I) Completion Ordering rules:
All Recv DTOs posted to a connection are completed in the order posted.

This is not guaranteed for iWARP. I am not aware of any ULP that depends
on it. Moreover, with SRQ it cannot be guaranteed,
so we need to change it for SRQ support anyhow.
I propose that we drop this requirement.

Note that the iWARP verbs document does provide this guarantee for the non-SRQ case:
"8.2.2 Work Request Processing
...
2. Work Requests submitted to a single Send Queue or Receive Queue
MUST be Completed by the RI in the same order as the Work
Requests were submitted. Note that this does not apply to WRs
posted to S-RQs.
..."

But given that it does not work for SRQ, it is simpler to have a unified
local model for Recv and not depend on Recvs completing in post order
under some scenarios.

The last Delivery rule already covers what issue #5 was trying to state.

Here is the full list of current DAPL Completion ordering rules:
- The data payload of a DTO is delivered into the receive- or
  RDMA-indicated buffer prior to the DTO completion.
- All Send and RDMA Write DTOs posted to a connection are completed in the
  order posted.
- RDMA Read DTOs posted to a connection are completed in the order posted.
- RDMA Read DTOs can be completed out of order with respect to Send and
  RDMA Write DTOs posted to the same connection.
- All Recv DTOs posted to a connection are completed in the order posted.
- No order relationship between completions of Recv DTOs and all other
  DTOs on the same connection.
- All Send, RDMA Read and RDMA Write DTO completions on a connection
  generate DTO completion Events into the same "Event Stream."
- All Recv DTO completions on a connection generate DTO completion Events
  into the same "Event Stream."

The first one needs to be tweaked to be specific to local buffers only:
The data payload of a DTO is delivered into the receive- or
RDMA-indicated local buffer prior to the DTO completion.

The rule "All Recv DTOs posted to a connection are completed in the
order posted" should be dropped.

No other req. changes are needed.


Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300


> -----Original Message-----
> From: Caitlin Bestler [mailto:cait@...]
> Sent: Tuesday, April 06, 2004 4:54 AM
> To: dat-discussions@yahoogroups.com
> Subject: [dat-discussions] Transport neutral send pacing
>
>
> The following is proposed as informative text to aid
> consumers in developing transport neutral applications. It
> deals with the difference in send pacing between iWARP and
> InfiniBand.
>
>
>
> DAT only provides minimal guarantees as to what completion of
> a send operation means to the Consumer:
>
> - The Consumer's buffers for that send request are no longer
>     required. They may be released, re-used, altered, etc.
>
> - Barring a connection failure, the message/data has already been
>    delivered to the peer or will be at some time in the future.
>    This delivery will follow all the normal ordering rules.
>
> There are several things that are *not* guaranteed, which an
> InfiniBand developer may have presumed:
>
> - That the payload has been delivered to peer memory.
>
> - That the destination address has been validated by the remote peer.
>
> Additionally a transport neutral application cannot assume that an
> excess post will simply be held until enough credits are granted for
> it to be sent.
> DAT allows a transport, such as iWARP, to simply send the
> message once there are sufficient LLP credits (TCP or SCTP).
> If there was no waiting recv buffer the connection will be torn down.
>
> There are at least two methods that a Consumer can use to
> properly pace sends in a transport neutral manner: ULP
> credits and RDMA Read Round Tripping.
>
>
> The ULP Credit strategy calls for each ULP to define a number
> of credits that each peer has. One credit represents the
> right to send a single message (with post_send). Credits are
> restored by a reply message sent back by the peer  (with
> post_send). Typically each reply restores a single message,
> but the ULP may choose to consolidate replies or otherwise
> explicitly vary this policy.
>
> An application using ULP Credits frequently will have no need
> to check for send completion, and may choose to suppress all
> successful completions. Receiving the reply to a message
> certainly guarantees that it was sent properly. Checking for
> send completions may still be of value if the
> round-trip-time is relatively large and large buffers are
> in use, as it enables earlier reclaiming of the output buffers.
>
> When simple receive queues are used, the output buffers can be
> reclaimed even more quickly if they are simply posted as the
> receive buffers for the reply message. The peer cannot reply to
> a message before it is delivered to it, and it cannot be delivered
> until the entire payload has been properly placed.
>
> It may look strange to issue a post_recv for a buffer, and
> *then* post it, but it is indeed safe provided that enough buffers
> are in use for the number of ULP credits issued. Of course if there
> are not enough buffers for the number of ULP credits issued then
> the application is not safe -- no matter which buffers are
> used for the post_recv().
>
>
> RDMA Round Tripping is a useful technique when the peer
> application does not currently, or does not want to, generate
> a reply message on its own. RDMA Round Tripping essentially
> duplicates the guarantees that a message has reached the
> remote host, but not necessarily the remote peer, provided by
> InfiniBand.
>
> The sender simply posts an RDMA Read on remote memory. There
> is no real need to fetch any meaningful content. The sole
> purpose is to generate an RDMA Read command on the wire.
>
> The RDMA Read will be sent in order, and it will only be
> processed by the remote peer after all prior messages have
> been completed to the user.
>
> Therefore waiting for the RDMA Read Completion will
> automatically pace the sender to the actual speed at which
> the remote host is receiving and acking the messages.  Each
> RDMA Read Completion can be used to restore N credits, just
> as a Reply message would have. If the sender does three
> post_sends, followed by a post_rdma_read, then the completion
> of the RDMA Read will restore three credits.
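
The credit accounting described in the quoted proposal can be sketched as a small Python model. This is an illustration only: `CreditedSender` and its method names are hypothetical, loosely echoing the informal post_send / post_rdma_read names in the text, and are not DAT API calls.

```python
from collections import deque

class CreditedSender:
    """Hypothetical sender-side credit accounting for RDMA Read Round Tripping."""

    def __init__(self, credits):
        self.credits = credits
        self.sends_since_read = 0
        self.reads_in_flight = deque()  # credits each pending read will restore

    def post_send(self):
        # One credit is the right to send one message.
        if self.credits == 0:
            return False  # caller must wait for credits to be restored
        self.credits -= 1
        self.sends_since_read += 1
        return True

    def post_rdma_read(self):
        # The read follows the earlier sends on the wire, so its completion
        # implies the remote host has received them.
        self.reads_in_flight.append(self.sends_since_read)
        self.sends_since_read = 0

    def on_rdma_read_completion(self):
        self.credits += self.reads_in_flight.popleft()

sender = CreditedSender(credits=3)
results = [sender.post_send() for _ in range(4)]  # fourth post has no credit
sender.post_rdma_read()
credits_before = sender.credits
sender.on_rdma_read_completion()
credits_after = sender.credits
```

Three sends followed by one RDMA Read restore three credits when the read completes, matching the example in the quoted text.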

#2834 From: "Kanevsky, Arkady" <arkady@...>
Date: Thu Apr 8, 2004 12:35 pm
Subject: RE: List of small issues #8 - RDMA Write ordering non-guarantee
arkadynetappcom
 
There is no completion for an RDMA Write on the remote side,
so the remote Consumer cannot distinguish between the "completions"
of the two RDMA Writes.
Both RDMA Writes complete remotely when the Recv completes
for the matching Send that was posted after the two RDMA Writes were
posted.

So talking about order between two RDMA Writes is useless and misleading.


Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300



#2835 From: "Kanevsky, Arkady" <arkady@...>
Date: Thu Apr 8, 2004 1:22 pm
Subject: RE: List of small issues #9 - RDMA Read flow control
arkadynetappcom
 
One thing needs to be made clear for the Consumer in the Usage statement:

While the Provider does guarantee flow control for RDMA Read DTOs, the
Consumer should avoid posting more than max_rdma_read_out RDMA Reads to a
connection. Since all DTOs posted to the Send WQ of the EP are processed
in order, the inability to process an RDMA Read that exceeds
max_rdma_read_out may stall processing of all other DTOs on the Send WQ
of the EP.

I am not clear whether we should state MAY, WILL or CAN.
For iWARP the statement is clear.
In iWARP verbs:
" 18. If more RDMA Read Type Work Requests are posted to the Send Queue
than are indicated by the ORD QP Attribute, the RI MUST pause the
processing of the Send Queue until at least one prior RDMA Read Type WR
Completes. If zero outbound RDMA Read Request Messages are supported on
the QP, and the Consumer posts an RDMA Read Type Work Request, the RI
MUST Complete the Work Request in error."

Strange, considering that there is an RDMA Read fence bit. But the bit will
not be considered if the ORD QP Attribute is exceeded.

For IB less so:
InfiniBand(TM) Architecture Release 1.1, Volume 1 - General Specifications
(Final, November 6, 2002), Software Transport Interface:

Ordering guarantees for processing and completion notifications exist only
between Work Requests submitted to the same queue. The ordering across
multiple Work Queues is undefined.

C10-101: The CI shall provide the guarantees for processing and
completion notifications between Work Requests submitted to the same
Send Queue as specified by the ordering rules in Table 66.

Ordering Rules:
* Receive Queues are FIFO queues with the exception of the
  reliable datagram issue described above.
* Send Queues are FIFO queues, according to the rules in Table 66
  Work Request Operation Ordering. The Fence Indicator can be used to
  require strict ordering.

Table 66 Work Request Operation Ordering

The processing order between an RDMA Read and the operations that follow
it on the Send WQ is controlled by the Fence bit (the barrier_fence
dto_completion value).

The issue is the definition of processing.
Is checking whether max_rdma_read_out is exceeded itself processing?
Then the act of checking starts the processing, and the Send queue
pipeline does not need to be stalled. If processing means ready to go on
the wire, then the pipeline can be stalled until the RDMA Read can
proceed.

In any case, if the Consumer ensures that the number of RDMA Reads in
the pipeline does not exceed max_rdma_read_out, everything is safe and
the pipeline is not stalled. But maybe the Consumer can control pipeline
stalling via the Fence_barrier RDMA Read bit.
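
To illustrate the "Consumer keeps the count under max_rdma_read_out"
case, here is a minimal sketch of a Consumer-side throttle. It is not
DAT API code: ReadThrottle, post_fn and on_read_completion are
hypothetical names, and post_fn stands in for whatever call the Consumer
uses to post one RDMA Read. The only assumption taken from the
discussion is that the Consumer knows the EP's max_rdma_read_out value
and sees one completion per RDMA Read.

```python
from collections import deque


class ReadThrottle:
    """Consumer-side cap on outstanding RDMA Reads (illustrative only)."""

    def __init__(self, max_rdma_read_out, post_fn):
        self.max_out = max_rdma_read_out  # EP attribute value known to the Consumer
        self.post_fn = post_fn            # hypothetical callable that posts one RDMA Read
        self.outstanding = 0              # posted, completion not yet seen
        self.deferred = deque()           # reads the Consumer is holding back

    def submit(self, read):
        """Post now if a slot is free, otherwise hold the read locally."""
        if self.outstanding < self.max_out:
            self.outstanding += 1
            self.post_fn(read)
        else:
            self.deferred.append(read)

    def on_read_completion(self):
        """Call once per RDMA Read completion; releases one deferred read."""
        self.outstanding -= 1
        if self.deferred:
            self.outstanding += 1
            self.post_fn(self.deferred.popleft())


posted = []
throttle = ReadThrottle(max_rdma_read_out=2, post_fn=posted.append)
for name in ["r1", "r2", "r3", "r4"]:
    throttle.submit(name)
first_batch = list(posted)      # only r1 and r2 reach the Send WQ so far
throttle.on_read_completion()   # one read completes, so r3 is released
```

Because the excess reads wait in the Consumer's own queue rather than on
the Send WQ, Sends and RDMA Writes posted in the meantime are never
queued behind a read that cannot start.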

Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300


> -----Original Message-----
> From: Kanevsky, Arkady
> Sent: Wednesday, April 07, 2004 12:55 PM
> To: dat-discussions@yahoogroups.com
> Subject: [dat-discussions] List of small issues #9 - RDMA
> Read flow control
>
>
> Both IB and iWARP do provide guarantee for RDMA Read flow control.
> Proposal:
>
> Add to RDMA Read Usage section:
> The number of posted RDMA Reads on Send
> WQ can exceed max_rdma_read_out attribute of the EP.
> DAT Provider ensures that the number of outstanding RDMA
> Read on the transport does not exceed the EP attribute.
>
> Consumer can rely on its own RDMA Read flow control to ensure
> that the number of RDMA Reads for which completions have not
> been generated does not exceed the EP max_rdma_read_out
> Attribute value.
>
>
> We can generate a new req for Provider support for
> RDMA Read flow control.
> Arkady
>
>
> Arkady Kanevsky                                email:
> arkady@...
> Network Appliance                              phone: 781-768-5395
> 375 Totten Pond Rd.                            Fax: 781-895-1195
> 3rd Floor                                      http: www.netapp.com
> Waltham, MA 02451-2010                         general phone:
> 781-768-5300
>
>

#2836 From: "Talpey, Thomas" <Thomas.Talpey@...>
Date: Thu Apr 8, 2004 1:58 pm
Subject: RE: List of small issues #9 - RDMA Read flow control
tmtymailu
 

I agree that the head-of-line blocking behavior of overposting
RDMA Read requests should be mentioned. It is serious and may
be avoided by the Consumer simply by checking the Send Q ORD
value.

I am not certain what you mean by "RDMA Read fence bit". There
are three types of fence defined by iWARP:
        Local Fence
        Read Fence
        End-to-end Fence
The first two are verb input modifiers, the third is implemented via
a sequence of RDMA Write and RDMA Read (iWARP verbs section
8.2.2.2).

One reason you may find the text strange is that the Read Fence
bit is only valid on Send and RDMA Write type requests. Since the
text is referring to the posting of RDMA Read requests, it is a necessary
discussion. Also note the "at least one" inclusion - in the case where
a Send or RDMA Write posting asserts the Read Fence, then "all" the
prior RDMA Reads must complete. Does this help?

So, I don't think any of the iWARP fencing methods can protect the
consumer from pipeline stalling. The consumer should obey ORD
directly, if it chooses.
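
The head-of-line blocking can be sketched with a toy model of an
in-order Send Queue that follows the quoted iWARP rule 18: when an RDMA
Read would exceed ORD, the whole queue pauses until at least one prior
RDMA Read completes. The function name, the tick-based timing and the
fixed read_latency are assumptions made for illustration; real RNIC
timing is not modeled.

```python
def start_times(work_requests, ord_limit, read_latency=3):
    """Return {name: tick at which the WR starts} for an in-order Send Queue.

    Assumptions: an RDMA Read holds one ORD slot for `read_latency` ticks;
    Sends and RDMA Writes start as soon as they reach the queue head.
    """
    started = {}
    read_done_at = []  # completion ticks of RDMA Reads still in flight
    tick = 0
    for kind, name in work_requests:
        if kind == "read":
            if ord_limit == 0:
                # Rule 18: with zero outbound reads supported, the WR completes in error.
                raise ValueError("RDMA Read posted with ORD of zero")
            # Forget reads that have already completed by this tick.
            read_done_at = [t for t in read_done_at if t > tick]
            if len(read_done_at) >= ord_limit:
                # Pause the whole queue until at least one prior read completes.
                tick = min(read_done_at)
                read_done_at = [t for t in read_done_at if t > tick]
            read_done_at.append(tick + read_latency)
        started[name] = tick
    return started


overposted = start_times(
    [("read", "r1"), ("read", "r2"), ("read", "r3"), ("send", "s1")],
    ord_limit=2,
)
# overposted == {'r1': 0, 'r2': 0, 'r3': 3, 's1': 3}

within_ord = start_times(
    [("read", "r1"), ("read", "r2"), ("send", "s1"), ("read", "r3")],
    ord_limit=2,
)
# within_ord == {'r1': 0, 'r2': 0, 's1': 0, 'r3': 3}
```

In the first run the Send s1 is delayed to tick 3 only because it sits
behind the over-posted read r3. In the second run the same Send, posted
ahead of r3, starts at tick 0.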

Tom.



#2837 From: "Kanevsky, Arkady" <arkady@...>
Date: Thu Apr 8, 2004 2:31 pm
Subject: RE: List of small issues #9 - RDMA Read flow control
arkadynetappcom
 
If the RDMA Read barrier fence bit is set for an operation (on the Send
WQ), then the operation shall not start until all preceding RDMA Reads
complete.
This bit is part of the DAT APIs and exists on all RDMA transports.
 
The iWARP spec states that the Send queue is stalled when ORD is
exceeded, regardless of the RDMA Read fence bit setting. This is strange,
because operations on the Send queue after the stalled RDMA Read could
otherwise proceed if their RDMA Read fence bit is not set.
 
Arkady
 
 

Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone: 781-768-5300



#2838 From: "Talpey, Thomas" <Thomas.Talpey@...>
Date: Thu Apr 8, 2004 2:44 pm
Subject: RE: List of small issues #9 - RDMA Read flow control
tmtymailu
 

Oh I get it - you mean, if a fence bit is NOT set, then other
transports do NOT fence. And iWARP DOES fence in the ORD
overposting case.

Okay, we can and should state that unfenced requests may still
be deferred, depending on the provider. And make a note that
iWARP *will* behave in this fashion in the ORD case, which the
consumer can avoid.
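
The two rules can be put side by side in a small decision function. This
is a sketch under stated assumptions, not provider code: head_decision
and all of its parameters are hypothetical, and provider_stalls_on_ord
is an illustrative flag (True for the iWARP behavior quoted earlier in
the thread), not a real DAT or verbs attribute.

```python
def head_decision(kind, fenced, reads_in_flight, ord_limit,
                  blocked_read_ahead, provider_stalls_on_ord):
    """Decide whether the WR at the head of the Send Queue may start.

    kind: "read", "send" or "write".
    fenced: the WR carries the RDMA Read fence bit.
    blocked_read_ahead: an earlier RDMA Read is still waiting for an ORD slot.
    provider_stalls_on_ord: hypothetical flag; True models the iWARP rule.
    """
    if blocked_read_ahead and provider_stalls_on_ord:
        # Applies even to unfenced WRs: the whole queue is paused.
        return "wait: queue paused by over-posted RDMA Read"
    if fenced and (reads_in_flight > 0 or blocked_read_ahead):
        # Fence semantics: ALL prior RDMA Reads must complete first.
        return "wait: fence requires all prior RDMA Reads to complete"
    if kind == "read" and reads_in_flight >= ord_limit:
        # ORD semantics: at least one prior RDMA Read must complete first.
        return "wait: no free outbound RDMA Read slot"
    return "start"


# Unfenced Send behind an over-posted read, on a provider that stalls.
stalling = head_decision("send", False, 2, 2, True, True)
# Same Send on a provider that lets unfenced WRs pass.
passing = head_decision("send", False, 2, 2, True, False)
# Fenced Send with one read still in flight and ORD not exceeded.
fenced_send = head_decision("send", True, 1, 2, False, False)
```

Only the first branch depends on the provider, which is why the Usage
text can say "may" in general and "will" for iWARP.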

Tom.



#2839 From: Caitlin Bestler <cait@...>
Date: Thu Apr 8, 2004 3:29 pm
Subject: Re: List of small issues #8 - RDMA Write ordering non-guarantee
caitlinbestler
 
On Apr 8, 2004, at 9:42 AM, Steffen Persvold wrote:

> Caitlin Bestler wrote:
>>
>>
>> In general, having the same buffer in play
>> for multiple in-flight messages will get you
>> in trouble.
>>
>
> By having the same buffer, do you mean the same LMR or the same
> virtual address ? I'm thinking :
>
> WriteA, LMR 1, address 0x4000000, length 4096 bytes
> WriteB, LMR 1, address 0x4001000, length 128 bytes
>
> Any chance WriteB will be delivered before WriteA ?
>
>
The same targeted set of bytes, however represented.

In your example, the same set of bytes is not targeted,
so there is no risk.
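
As a quick check of the example above, here is a minimal sketch of the
"same targeted set of bytes" test, treating each write as a half-open
byte range. The helper name ranges_overlap is hypothetical, and the
sketch assumes both writes address the same memory region (LMR 1 in the
example), so plain address arithmetic is meaningful.

```python
def ranges_overlap(addr_a, len_a, addr_b, len_b):
    """True if byte ranges [addr, addr + len) share at least one byte."""
    return addr_a < addr_b + len_b and addr_b < addr_a + len_a


write_a = (0x4000000, 4096)  # covers 0x4000000 through 0x4000fff
write_b = (0x4001000, 128)   # starts at the first byte after WriteA

same_bytes_targeted = ranges_overlap(*write_a, *write_b)  # False
```

WriteA ends at 0x4000fff and WriteB begins at 0x4001000, so the two
writes touch disjoint bytes.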

#2840 From: "Kanevsky, Arkady" <arkady@...>
Date: Thu Apr 8, 2004 12:37 pm
Subject: FW: List of small issues #9 - RDMA Read flow control
arkadynetappcom
 
Resending!

Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300


> -----Original Message-----
> From: Kanevsky, Arkady
> Sent: Thursday, April 08, 2004 8:29 AM
> To: 'dat-discussions@yahoogroups.com'
> Subject: RE: [dat-discussions] List of small issues #9 - RDMA
> Read flow control
>
>
> Need to make one thing clear for Consumer in Usage statement:
>
> While Provider does guarantee flow control for RDMA Read DTOs
> Consumer should avoid posting more than max_rdma_read_out
> RDMA Read to a connection. Since all DTOs posted to the Send
> WQ of the EP are processed in order, inability to process
> RDMA Read that exceeds max_rdma_read_out may stall processing
> all other DTOs of the Send WQ of the EP.
>
> I am not clear if we should state MAY or WILL or CAN.
> For iWARP the statement is clear.
> In iWARP verbs:
> " 18. If more RDMA Read Type Work Requests are posted to the
> Send Queue than are indicated by the ORD QP Attribute, the RI
> MUST pause the processing of the Send Queue until at least
> one prior RDMA Read Type WR Completes. If zero outbound RDMA
> Read Request Messages are supported on the QP, and the
> Consumer posts an RDMA Read Type Work Request, the RI MUST
> Complete the Work Request in error."
>
> Strange, considering that there is RDMA Read fence bit. But
> the bit will not be considered if ORD QP Attribute is exceeded.
>
> For IB less so:
> InfiniBandTM Architecture Release 1.1 Software Transport
> Interface November 6, 2002 VOLUME 1 - GENERAL SPECIFICATIONS
> FINAL Ordering guarantees for processing and completion
> notifications exist only between Work Requests submitted to
> the same queue. The ordering across multiple Work Queues is undefined.
> C10-101: The CI shall provide the guarantees for processing
> and completion notifications between Work Requests submitted
> to the same Send Queue as specified by the ordering rules in
> Table 66. Ordering Rules: * Receive Queues are FIFO queues
> with the exception of the reliable datagram issue described
> above. * Send Queues are FIFO queues, according to the rules
> in Table 66 Work Request Operation Ordering. The Fence
> Indicator can be used to require strict ordering. Table 66
> Work Request Operation Ordering
>
> The processing ordering between RDMA Read and follow on other
> operations on Send WQ is controlled by Fence bit
> (barrier_fence dto_completion value.)
>
> The issue is what is the definition of processing.
> Is checking that max_rdma_read_out is exceeded is processing.
> Than the fact of checking starts the processing and Send
> queue pipeline does not need to be stalled. If processing
> mean ready to go on the wire then pipeline can be stalled
> until RDMA Read can proceed.
>
> In any case if Consumer ensures that # of RDMA Reads in
> pipeline does not exceed max_rdma_read_out everything is safe
> and pipeline is not stalled. But may be Consumer can control
> pipeline stalling via Fence_barrier RDMA Read bit.
>
>
> Arkady Kanevsky                                email:
> arkady@...
> Network Appliance                              phone: 781-768-5395
> 375 Totten Pond Rd.                            Fax: 781-895-1195
> 3rd Floor                                      http: www.netapp.com
> Waltham, MA 02451-2010                         general phone:
> 781-768-5300
>
>
> > -----Original Message-----
> > From: Kanevsky, Arkady
> > Sent: Wednesday, April 07, 2004 12:55 PM
> > To: dat-discussions@yahoogroups.com
> > Subject: [dat-discussions] List of small issues #9 - RDMA
> > Read flow control
> >
> >
> > Both IB and iWARP do provide guarantee for RDMA Read flow control.
> > Proposal:
> >
> > Add to RDMA Read Usage section:
> > The number of posted RDMA Reads on Send
> > WQ can exceed max_rdma_read_out attribute of the EP.
> > DAT Provider ensures that the number of outstanding RDMA
> > Read on the transport does not exceed the EP attribute.
> >
> > Consumer can rely on its own RDMA Read flow control to ensure
> > that the number of RDMA Reads for which completions have not
> > been generated does not exceed the EP max_rdma_read_out
> > Attribute vale.
> >
> >
> > We can generate a new req for Provider support for
> > RDMA Read flow control.
> > Arkady
> >
> >
> > Arkady Kanevsky                                email:
> > arkady@...
> > Network Appliance                              phone: 781-768-5395
> > 375 Totten Pond Rd.                            Fax: 781-895-1195
> > 3rd Floor                                      http: www.netapp.com
> > Waltham, MA 02451-2010                         general phone:
> > 781-768-5300
> >
> >

#2841 From: "Kanevsky, Arkady" <arkady@...>
Date: Thu Apr 8, 2004 12:28 pm
Subject: RE: List of small issues #9 - RDMA Read flow control
arkadynetappcom
 
We need to make one thing clear for the Consumer in the Usage statement:

While the Provider does guarantee flow control for RDMA Read DTOs,
the Consumer should avoid posting more than max_rdma_read_out
RDMA Reads to a connection. Since all DTOs posted to the Send
WQ of the EP are processed in order, the inability to process an
RDMA Read that exceeds max_rdma_read_out may stall the
processing of all other DTOs on the Send WQ of the EP.

I am not sure whether we should state MAY, WILL, or CAN.
For iWARP the statement is clear.
In iWARP verbs:
" 18. If more RDMA Read Type Work Requests are posted to the Send
Queue than are indicated by the ORD QP Attribute, the RI MUST
pause the processing of the Send Queue until at least one prior
RDMA Read Type WR Completes. If zero outbound RDMA Read Request
Messages are supported on the QP, and the Consumer posts an RDMA
Read Type Work Request, the RI MUST Complete the Work Request in
error."

Strange, considering that there is an RDMA Read fence bit. But the bit will
not be considered if the ORD QP Attribute is exceeded.
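
The pause described in rule 18 quoted above can be illustrated with a toy
model. This is only a sketch under stated assumptions, not how any RI is
implemented: the function name run_send_queue, the one-WR-per-tick timing,
and the fixed read_latency are all hypothetical and chosen for illustration.

```python
from collections import deque


def run_send_queue(work_requests, ord_limit, read_latency=3):
    """Toy model of an in-order Send Queue with an outbound RDMA Read limit.

    work_requests: list of (name, kind), kind is "read" or "other".
    ord_limit:     stand-in for the ORD QP Attribute / max_rdma_read_out.
    read_latency:  hypothetical number of ticks an RDMA Read stays outstanding.
    Returns a log of (tick, event, name), event in {"start", "stall", "error"}.
    """
    queue = deque(work_requests)
    in_flight = []  # ticks at which outstanding RDMA Reads complete
    log = []
    tick = 0
    while queue:
        # Retire RDMA Reads whose latency has elapsed.
        in_flight = [t for t in in_flight if t > tick]
        name, kind = queue[0]
        if kind == "read" and ord_limit == 0:
            # No outbound RDMA Reads supported: complete the WR in error.
            queue.popleft()
            log.append((tick, "error", name))
        elif kind == "read" and len(in_flight) >= ord_limit:
            # Limit reached: the whole queue pauses behind this WR.
            log.append((tick, "stall", name))
        else:
            queue.popleft()
            if kind == "read":
                in_flight.append(tick + read_latency)
            log.append((tick, "start", name))
        tick += 1
    return log


if __name__ == "__main__":
    wrs = [("r1", "read"), ("r2", "read"), ("r3", "read"), ("s1", "other")]
    for entry in run_send_queue(wrs, ord_limit=2):
        print(entry)
```

In this model the Send s1 cannot start until the third RDMA Read gets a
slot, even though s1 itself needs no RDMA Read resources. That is the
head-of-line stall the Usage statement would warn about.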

For IB less so:
InfiniBand(TM) Architecture Release 1.1 Software Transport Interface
November 6, 2002
VOLUME 1 - GENERAL SPECIFICATIONS FINAL
Ordering guarantees for processing and completion notifications exist
only between Work Requests submitted to the same queue. The ordering
across multiple Work Queues is undefined.
C10-101: The CI shall provide the guarantees for processing and
completion notifications between Work Requests submitted to the same
Send Queue as specified by the ordering rules in Table 66.
Ordering Rules:
* Receive Queues are FIFO queues with the exception of the reliable
datagram issue described above.
* Send Queues are FIFO queues, according to the rules in Table 66
Work Request Operation Ordering. The Fence Indicator can be used
to require strict ordering.
Table 66 Work Request Operation Ordering

The processing order between an RDMA Read and the operations that follow
it on the Send WQ is controlled by the Fence bit (the barrier_fence
dto_completion value).

The issue is the definition of "processing".
Does checking whether max_rdma_read_out is exceeded count as processing?
If so, the check itself starts the processing and the Send queue pipeline
does not need to be stalled. If processing means ready to go on the wire,
then the pipeline can be stalled until the RDMA Read can proceed.

In any case, if the Consumer ensures that the number of RDMA Reads in the
pipeline does not exceed max_rdma_read_out, everything is safe and the
pipeline is not stalled. But maybe the Consumer can control pipeline
stalling via the Fence_barrier RDMA Read bit.
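
A Consumer that wants to stay within max_rdma_read_out on its own could
keep a simple counter of RDMA Reads posted but not yet completed. The
sketch below is one possible shape for that, not part of the DAT API:
ReadThrottle, submit, on_completion, and the post_fn callback are
hypothetical names standing in for whatever the Consumer uses to post an
RDMA Read and reap its completion.

```python
from collections import deque


class ReadThrottle:
    """Consumer-side cap on RDMA Reads posted but not yet completed."""

    def __init__(self, max_rdma_read_out, post_fn):
        self.limit = max_rdma_read_out  # the EP attribute value
        self.post_fn = post_fn          # hypothetical: posts one RDMA Read
        self.outstanding = 0
        self.deferred = deque()         # reads held back by the Consumer

    def submit(self, read):
        """Post the read now if a slot is free, otherwise hold it back."""
        if self.outstanding < self.limit:
            self.outstanding += 1
            self.post_fn(read)
        else:
            self.deferred.append(read)

    def on_completion(self):
        """Call once per reaped RDMA Read completion."""
        if self.outstanding == 0:
            raise RuntimeError("completion without an outstanding read")
        self.outstanding -= 1
        if self.deferred:
            self.outstanding += 1
            self.post_fn(self.deferred.popleft())


if __name__ == "__main__":
    posted = []
    throttle = ReadThrottle(max_rdma_read_out=2, post_fn=posted.append)
    for name in ("a", "b", "c"):
        throttle.submit(name)
    print(posted)             # "c" is still held back by the Consumer
    throttle.on_completion()  # one read completes, so "c" is posted
    print(posted)
```

Because the excess reads wait in the Consumer's own list rather than on the
Send WQ, other DTOs posted in the meantime are not queued behind them.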


Arkady Kanevsky                                email: arkady@...
Network Appliance                              phone: 781-768-5395
375 Totten Pond Rd.                            Fax: 781-895-1195
3rd Floor                                      http: www.netapp.com
Waltham, MA 02451-2010                         general phone:
781-768-5300


> -----Original Message-----
> From: Kanevsky, Arkady
> Sent: Wednesday, April 07, 2004 12:55 PM
> To: dat-discussions@yahoogroups.com
> Subject: [dat-discussions] List of small issues #9 - RDMA
> Read flow control
>
>
> Both IB and iWARP do provide guarantee for RDMA Read flow control.
> Proposal:
>
> Add to RDMA Read Usage section:
> The number of posted RDMA Reads on Send
> WQ can exceed max_rdma_read_out attribute of the EP.
> DAT Provider ensures that the number of outstanding RDMA
> Read on the transport does not exceed the EP attribute.
>
> Consumer can rely on its own RDMA Read flow control to ensure
> that the number of RDMA Reads for which completions have not
> been generated does not exceed the EP max_rdma_read_out
> Attribute value.
>
>
> We can generate a new req for Provider support for
> RDMA Read flow control.
> Arkady
>
>
> Arkady Kanevsky                                email:
> arkady@...
> Network Appliance                              phone: 781-768-5395
> 375 Totten Pond Rd.                            Fax: 781-895-1195
> 3rd Floor                                      http: www.netapp.com
> Waltham, MA 02451-2010                         general phone:
> 781-768-5300
>
>

#2842 From: Steffen Persvold <sp@...>
Date: Thu Apr 8, 2004 1:42 pm
Subject: Re: List of small issues #8 - RDMA Write ordering non-guarantee
spersvol
 
Kanevsky, Arkady wrote:
> There is no completion for RDMA Write on remote side.

I know.

> So remote consumer can not distinguish between "completions"
> of the two RDMA Write.

It can if it knows what to look for (for example, a special pattern in the
data). Let's say the purpose of the (small) second write is to tell the remote
consumer that the first (large) transfer is done. Are you saying this won't
work?

> Both RDMA Write will complete remotely when Recv completes
> For the matching Send that was posted after two RDMA Writes have been
> posted.
>
> So talking about order between 2 RDMA Writes is useless and misleading.
>

But what if Send/Recv is never used? Why is ordering between RDMA Writes
useless? We depend on it in our MPI (i.e., we only use RDMA Write, no
Send/Recv at all)...

Regards,
--
Steffen Persvold
Senior Software Engineer
mob. +47 92 48 45 11
tel. +47 22 62 89 50
fax. +47 22 62 89 51

Scali - http://www.scali.com
High Performance Clustering

#2843 From: "Caitlin Bestler" <cait@...>
Date: Fri Apr 9, 2004 6:35 am
Subject: Re: List of small issues #8 - RDMA Write ordering non-guarantee
caitlinbestler
 
Steffen Persvold said:
> Kanevsky, Arkady wrote:
>> There is no completion for RDMA Write on remote side.
>
> I know.
>
>> So remote consumer can not distinguish between "completions"
>> of the two RDMA Write.
>
> It can if it knows what to look for (for example a special pattern in the
> data). Let's say the purpose of the (small) second write is to tell the
> remote consumer that the first
> (large) transfer is done. Are you saying this won't work ?
>

The data sink Consumer MUST NOT examine the data sink buffer
until after it reaps the completion. (Well, OK, since we are talking
about the Consumer, that is a SHOULD NOT, but the Consumer
MUST NOT complain when the buffer is still in an indeterminate
state, because it is only guaranteed to be determinate when the
completion is generated.)

The important issue here is caching. The RNIC/Provider is not
obligated to ensure that the RDMA Writes have been flushed into
Consumer accessible memory until it generates the Recv Completion.

The updated data could easily still be sitting in a buffer on the RNIC.

>> Both RDMA Write will complete remotely when Recv completes
>> For the matching Send that was posted after two RDMA Writes have been
>> posted.
>>
>> So talking about order between 2 RDMA Writes is useless and misleading.
>>
>
> But what if Send/Recv is never used ? Why is ordering between RDMA writes
> useless ? We depend on it in our MPI (i.e we only use RDMA write, no
> send/recv at all)...
>

You need to add zero-length Send/Recvs to synchronize.

Without them the RNIC cannot know when it must flush its buffers
to host memory. If access to host memory is relatively expensive
(for example, over a PCI bus), then it would be very likely
for the RNIC to defer host flushes as long as possible.
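
The argument above can be pictured with a toy model of the data sink side.
This is a deliberately pessimistic sketch, assuming an adapter that defers
every flush until it has to generate a Recv completion. A real RNIC may
flush earlier, and ToyRnic and its methods are hypothetical names, not any
vendor's or DAT's interface.

```python
class ToyRnic:
    """Data sink adapter that buffers RDMA Writes until a Send arrives."""

    def __init__(self):
        self.host_memory = {}       # what the data sink Consumer can read
        self.pending = []           # RDMA Writes still held on the adapter
        self.recv_completions = []  # payloads of completed Recvs

    def rdma_write(self, addr, value):
        # Incoming RDMA Write: accepted, but not yet placed in host memory.
        self.pending.append((addr, value))

    def send(self, payload=b""):
        # Incoming Send: earlier writes must be visible before the Recv
        # completion is generated, so flush them first, in arrival order.
        for addr, value in self.pending:
            self.host_memory[addr] = value
        self.pending.clear()
        self.recv_completions.append(payload)


if __name__ == "__main__":
    rnic = ToyRnic()
    rnic.rdma_write("data", "large transfer")
    rnic.rdma_write("flag", 1)           # the small "done" marker
    print(rnic.host_memory.get("flag"))  # polling host memory sees nothing yet
    rnic.send()                          # zero-length Send to synchronize
    print(rnic.host_memory.get("flag"), rnic.host_memory.get("data"))
```

In this model a Consumer that only polls the flag location never observes
the marker, while a Consumer that waits for the Recv completion of the
zero-length Send sees both writes.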




----------
Caitlin Bestler - cait@...
http://asomi.com/
