Thanks, I didn't know about either of those! Glad I asked. :)
Steve
On Sat, Jun 6, 2009 at 3:34 AM, Andrey Brito<aembrito@...> wrote:
>
>
> Hello Stephen,
>
> For C, I know TinySTM: http://www.tinystm.org/
>
> Regarding the lock-free structures, you can try take a look at:
> http://www.audiomulch.com/~rossb/code/lockfree/
>
> If you find something else, please let us know.
>
> []'s
>
> Andrey
>
>
> ________________________________
> De: Stephen Sinclair <radarsat1@...>
> Para: liblf-dev@yahoogroups.com
> Enviadas: Sexta-feira, 5 de Junho de 2009 21:09:58
> Assunto: [liblf-dev] STM in C?
>
> Are there any open source libraries implementing software
> transactional memory in C?
> Is there a web site anywhere with a list of open source resources for
> lock-free data structures?
>
> thanks,
> Steve
De: Stephen Sinclair
<radarsat1@...> Para: liblf-dev@yahoogroups.com Enviadas: Sexta-feira, 5 de Junho de 2009 21:09:58 Assunto: [liblf-dev] STM in C?
Are there any open source libraries implementing software
transactional memory in C?
Is there a web site anywhere with a list of open source resources for
lock-free data structures?
Are there any open source libraries implementing software
transactional memory in C?
Is there a web site anywhere with a list of open source resources for
lock-free data structures?
thanks,
Steve
Multicore Programming [6.05s]
C. Leiserson, M. Herlihy, N. Shavit
Multicores are bringing about a paradigm shift in programming. The
course exposes students to fundamental issues in the design of
concurrent programs and to the techniques necessary to make effective
use of multicore machines. It combines lectures and classwork to
gradually enhance students' intuition and technique.
July 20-24, 2009 | $3,250 | 3.3 CEUs
The issue that concerns me is the ordering of memory operations on the node pointer and on the node itself. For instance, the write of the node's address to the ring could execute earlier than the writes setting up the node itself; and (less common but still possible) the read of the node's address could occur *after* the reads of the node's contents. The net result would be a reader reading garbage from a node.
Try Googling for "double-checked locking is broken" for why you need memory barriers to solve this issue; it cannot be solved in portable C.
Cheers,
Chris
On 30 Mar 2009, at 04:48, Roman Dion <romandion78@...> wrote:
I'm very sorry that I can not find my posted reply in this groups , this is third time .why ?
--- In liblf-dev@yahoogroups.com, Chris Purcell <chris.purcell.39@...> wrote:
> > There don't appear to be any memory barriers in your code. How are you > avoiding the issue of out-of-order execution on multiprocessor machines? > > Chris
Thank you , Chris . Sorry for my poor english , I will try my best to express my own opinion .
It is true that I don't use any memory barriers in my code . First of all , we could assume that the assignment is atomic in uniprocessor machine , Such as int ret = 11 ;
in asm compiled by gcc 3.4.6 , it is c7 45 fc 0b 00 00 00 movl $0xb,0xfffffffc(%ebp)
There are two key ways I used in the implementation :
1. Combination of queue with ring array ; 2.There is almost no same memory which the enqueue thread and the dequeue need to access .
Look at the struct of queue_t : #define MAX_RING_SIZE 3 typedef struct _queue_st queue_t ; struct _queue_st{ qnode_t *nodes[MAX_RING_SIZE] ;
int enqueue ; int dequeue ;
qnode_t *enlast ; } ;
The enqueue function only use member enqueue/enlast/nodes , and the dequeue function only use member dequeue and nodes . Nodes is a small ring array , when enqueue function will put a node , it will
find a NULL element of nodes , otherwise , it put this node into old
link , pointed by member enlast ; if the dequeue want to get a node ,
it will get a node from a old link , indexed by member dequeue untill the link is NULL .
The enqueue function and dequeue function can not access the
same memory at the same time , only if queue_t's member enqueue is
equal to dequeue . I assume that the assignment is atomic in uniproccessor as before , so only after queue->nodes[queue->dequeue] = NULL is successful , then enqueue function can put its node to this link . On the other hand , only after queue->nodes[old] = node is successfull , then dequeue fucntion can get the node .
Even in multiproccessor machines , if movl instruction is atomic , the dequeue function only get the NULL value after nodes[old] = node is successfull .
I'm very sorry that I can not find my posted reply in this groups , this is third time .why ?
--- In liblf-dev@yahoogroups.com, Chris Purcell <chris.purcell.39@...> wrote:
> > There don't appear to be any memory barriers in your code. How are you > avoiding the issue of out-of-order execution on multiprocessor machines? > > Chris
Thank you , Chris . Sorry for my poor english , I will try my best to express my own opinion .
It is true that I don't use any memory barriers in my code . First of all , we could assume that the assignment is atomic in uniprocessor machine , Such as int ret = 11 ;
in asm compiled by gcc 3.4.6 , it is c7 45 fc 0b 00 00 00 movl $0xb,0xfffffffc(%ebp)
There are two key ways I used in the implementation :
1. Combination of queue with ring array ; 2.There is almost no same memory which the enqueue thread and the dequeue need to access .
Look at the struct of queue_t : #define MAX_RING_SIZE 3 typedef struct _queue_st queue_t ; struct _queue_st{ qnode_t *nodes[MAX_RING_SIZE] ;
int enqueue ; int dequeue ;
qnode_t *enlast ; } ;
The enqueue function only use member enqueue/enlast/nodes , and the dequeue function only use member dequeue and nodes . Nodes is a small ring array , when enqueue function will put a node , it will
find a NULL element of nodes , otherwise , it put this node into old
link , pointed by member enlast ; if the dequeue want to get a node ,
it will get a node from a old link , indexed by member dequeue untill the link is NULL .
The enqueue function and dequeue function can not access the
same memory at the same time , only if queue_t's member enqueue is
equal to dequeue . I assume that the assignment is atomic in uniproccessor as before , so only after queue->nodes[queue->dequeue] = NULL is successful , then enqueue function can put its node to this link . On the other hand , only after queue->nodes[old] = node is successfull , then dequeue fucntion can get the node .
Even in multiproccessor machines , if movl instruction is atomic , the dequeue function only get the NULL value after nodes[old] = node is successfull .
--- In liblf-dev@yahoogroups.com, Chris Purcell <chris.purcell.39@...>
wrote:
>
> There don't appear to be any memory barriers in your code. How are you
> avoiding the issue of out-of-order execution on multiprocessor machines?
>
> Chris
Thank you , Chris . Sorry for my poor english , I will try my best to
express my own opinion .
It is true that I don't use any memory barriers in my code and this queue
can only be used in single reader and single writer . First of all , we could
assume that the assignment is atomic on uniprocessor machine , Such as
int ret = 11 ;
in asm compiled by gcc 3.4.3 , it is
c7 45 fc 0b 00 00 00 movl $0xb,0xfffffffc(%ebp)
There are two key ways I used in the implementation :
1. Combination of queue with ring array ;
2.There is almost no same memory which the enqueue thread and the dequeue
thread need to access .
Look at the struct of queue_t :
#define MAX_RING_SIZE 3
typedef struct _queue_st queue_t ;
struct _queue_st{
qnode_t *nodes[MAX_RING_SIZE] ;
int enqueue ;
int dequeue ;
qnode_t *enlast ;
} ;
The enqueue function only use member enqueue/enlast/nodes , and the dequeue
function only use member dequeue and nodes . Nodes is a small ring array , when
enqueue function will put a node , it will find a NULL element of nodes ,
otherwise , it put this node into old link , pointed by member enlast ; if the
dequeue function want to get a node , it will get a node from a old link ,
indexed by member dequeue utill the link is NULL .
The enqueue function and dequeue function can not access the same memory
at the same time , only if queue_t's member enqueue is equal to dequeue . I
assume that the assignment is atomic on uniproccessor as before , so only after
queue->nodes[queue->dequeue] = NULL is successful , then enqueue function can
put its node to this link . On the other hand , only after queue->nodes[old] =
node is successful , then dequeue fucntion can get the node .
Even on multiproccessor machines , if movl instruction is atomic ,
the dequeue function only get the NULL value after odes[old] = node is
successful .
Best Regards .
There are two files in the file section of this list. I scanned the
emails of this list quickly and I think they both ran fine for many
people and failed for some. You'll have to see for yourself. The
discussion thread starts here:
http://tech.groups.yahoo.com/group/liblf-dev/message/136
bjorn
On Oct 31, 2008, at 3:17 PM, oja_i_i wrote:
> --- In liblf-dev@yahoogroups.com, Bjorn Roche <bjorn@...> wrote:
> >
> >
> > On Oct 30, 2008, at 5:10 PM, James McCartney wrote:
> >
> > > On Wed, Oct 29, 2008 at 11:37 AM, Bjorn Roche <bjorn@...>
> > > wrote:
> > > > Plus, gcc, for example, tends to
> > > > insert memory barriers in places, like at function calls.
> > >
> > > I really don't think so. citation?
> > >
> >
> > I can't find a citation right now, so I could be wrong. If I am
> wrong,
> > sorry about that. It's obviously not behavior to count on anway,
> since
> > even if you know you are always working with gcc functions can be
> > inlined.
>
> I've read so many citations, and other theories about ring buffers and
> memory barriers. But to me they're useless until someone can write a
> piece of code that fails when barriers are absent and succeeds when
> they're in, and thus proves they are needed.
>
> Theories should help writing such a test case, but for now, I (and
> others on linux-audio-dev and jack-devel) couldn't figure that out. Or
> better said, we thought we did, but running it on the most vulnerable
> hardware we could think about (PPC SMP), we observed no failure.
>
> Any link, suggestion?
>
> --
> Olivier Guilyardi / Samalyse
>
>
>
-----------------------------
Bjorn Roche
XO Wave
Digital Audio Production and Post-Production Software
http://www.xowave.comhttp://blog.bjornroche.comhttp://myspace.com/xowave
Hello,
I just saw this postedon reddit, I'm not sure but I thought it might
be relevant to this discussion.
http://bartoszmilewski.wordpress.com/2008/11/05/who-ordered-memory-fences-on-an-\
x86/
cheers,
Steve
On Fri, Oct 31, 2008 at 2:17 PM, oja_i_i <ml@...> wrote:
> --- In liblf-dev@yahoogroups.com, Bjorn Roche <bjorn@...> wrote:
>>
>>
>> On Oct 30, 2008, at 5:10 PM, James McCartney wrote:
>>
>> > On Wed, Oct 29, 2008 at 11:37 AM, Bjorn Roche <bjorn@...>
>> > wrote:
>> > > Plus, gcc, for example, tends to
>> > > insert memory barriers in places, like at function calls.
>> >
>> > I really don't think so. citation?
>> >
>>
>> I can't find a citation right now, so I could be wrong. If I am wrong,
>> sorry about that. It's obviously not behavior to count on anway, since
>> even if you know you are always working with gcc functions can be
>> inlined.
>
> I've read so many citations, and other theories about ring buffers and
> memory barriers. But to me they're useless until someone can write a
> piece of code that fails when barriers are absent and succeeds when
> they're in, and thus proves they are needed.
>
> Theories should help writing such a test case, but for now, I (and
> others on linux-audio-dev and jack-devel) couldn't figure that out. Or
> better said, we thought we did, but running it on the most vulnerable
> hardware we could think about (PPC SMP), we observed no failure.
>
> Any link, suggestion?
>
> --
> Olivier Guilyardi / Samalyse
>
>
>
> ------------------------------------
>
> Yahoo! Groups Links
>
>
>
>
--- In liblf-dev@yahoogroups.com, Bjorn Roche <bjorn@...> wrote:
>
>
> On Oct 30, 2008, at 5:10 PM, James McCartney wrote:
>
> > On Wed, Oct 29, 2008 at 11:37 AM, Bjorn Roche <bjorn@...>
> > wrote:
> > > Plus, gcc, for example, tends to
> > > insert memory barriers in places, like at function calls.
> >
> > I really don't think so. citation?
> >
>
> I can't find a citation right now, so I could be wrong. If I am wrong,
> sorry about that. It's obviously not behavior to count on anway, since
> even if you know you are always working with gcc functions can be
> inlined.
I've read so many citations, and other theories about ring buffers and
memory barriers. But to me they're useless until someone can write a
piece of code that fails when barriers are absent and succeeds when
they're in, and thus proves they are needed.
Theories should help writing such a test case, but for now, I (and
others on linux-audio-dev and jack-devel) couldn't figure that out. Or
better said, we thought we did, but running it on the most vulnerable
hardware we could think about (PPC SMP), we observed no failure.
Any link, suggestion?
--
Olivier Guilyardi / Samalyse
On Fri, Oct 31, 2008 at 10:34 AM, Bjorn Roche <bjorn@...> wrote:
>
> On Oct 30, 2008, at 5:10 PM, James McCartney wrote:
>
>> On Wed, Oct 29, 2008 at 11:37 AM, Bjorn Roche <bjorn@...>
>> wrote:
>> > Plus, gcc, for example, tends to
>> > insert memory barriers in places, like at function calls.
>>
>> I really don't think so. citation?
>>
>
>
> I can't find a citation right now, so I could be wrong. If I am wrong,
> sorry about that. It's obviously not behavior to count on anway, since
> even if you know you are always working with gcc functions can be
> inlined.
Even with -O0?
Steve
I don't think so either. I'd be surprised if gcc inserts any memory
barriers.
But I do remember reading somewhere that a C optimizer probably won't
reorder memory operations around a function with external linkage.. perhaps
thats what you're thinking of.
Ross.
----- Original Message -----
From: "Bjorn Roche" <bjorn@...>
To: <liblf-dev@yahoogroups.com>
Sent: Saturday, November 01, 2008 1:34 AM
Subject: Re: [liblf-dev] Lock-free ring buffer memory barriers test case
>
> On Oct 30, 2008, at 5:10 PM, James McCartney wrote:
>
>> On Wed, Oct 29, 2008 at 11:37 AM, Bjorn Roche <bjorn@...>
>> wrote:
>> > Plus, gcc, for example, tends to
>> > insert memory barriers in places, like at function calls.
>>
>> I really don't think so. citation?
>>
>
>
> I can't find a citation right now, so I could be wrong. If I am wrong,
> sorry about that. It's obviously not behavior to count on anway, since
> even if you know you are always working with gcc functions can be
> inlined.
>
> bjorn
>
> -----------------------------
> Bjorn Roche
> XO Wave
> Digital Audio Production and Post-Production Software
> http://www.xowave.com
> http://blog.bjornroche.com
> http://myspace.com/xowave
>
>
>
> ------------------------------------
>
> Yahoo! Groups Links
>
>
>
>
On Oct 30, 2008, at 5:10 PM, James McCartney wrote:
> On Wed, Oct 29, 2008 at 11:37 AM, Bjorn Roche <bjorn@...>
> wrote:
> > Plus, gcc, for example, tends to
> > insert memory barriers in places, like at function calls.
>
> I really don't think so. citation?
>
I can't find a citation right now, so I could be wrong. If I am wrong,
sorry about that. It's obviously not behavior to count on anway, since
even if you know you are always working with gcc functions can be
inlined.
bjorn
-----------------------------
Bjorn Roche
XO Wave
Digital Audio Production and Post-Production Software
http://www.xowave.comhttp://blog.bjornroche.comhttp://myspace.com/xowave
On Wed, Oct 29, 2008 at 11:37 AM, Bjorn Roche <bjorn@...> wrote:
> Plus, gcc, for example, tends to
> insert memory barriers in places, like at function calls.
I really don't think so. citation?
--
--- james mccartney
--- In liblf-dev@yahoogroups.com, Bjorn Roche <bjorn@...> wrote:
>
> Many optimizing compilers will make things work, not because they
> should work, but because optimizers like to remove operations and that
> tends to make things more atomic. Plus, gcc, for example, tends to
> insert memory barriers in places, like at function calls. In many
> cases, things can work because datastructures happen to be stable
> enough, even in light of weak ordering. However, all this is difficult
> to predict -- especially if you want your app to be portable across
> compilers and CPUs.
>
> Occasionally, I've seen people comment that you have to let these
> things run a really long time in order to see failure, but that's not
> usually the case. Usually, if a test fails to fail (!) it's probably
> because your compiler has done something, or you just haven't found a
> case where it fails. Since absence of proof is not proof of absence,
> and there were a lot of doubters out there, I wrote a counterexample
> that failed on a standard PPC mac. You can find it if you search the
> archives. I'm not sure anyone ever scrutinized it (even me -- I wrote
> it super fast), so it may have done what it did due to a bug, but if
> you doubt weak ordering I assure you very smart people have spent a
> long time building it into the next revision of c++ and I don't think
> they would have done that if it weren't necessary. But you can look at
> my code if you doubt it, and if it happens to run on your machine,
> you'll have to tweak it or take my word for it, as I no longer have
> the machine that code used to run on.
Thanks for your answer. Is this the counter example you mention:
http://tech.groups.yahoo.com/group/liblf-dev/message/203 ?
--
Olivier Guilyardi / Samalyse
Many optimizing compilers will make things work, not because they
should work, but because optimizers like to remove operations and that
tends to make things more atomic. Plus, gcc, for example, tends to
insert memory barriers in places, like at function calls. In many
cases, things can work because datastructures happen to be stable
enough, even in light of weak ordering. However, all this is difficult
to predict -- especially if you want your app to be portable across
compilers and CPUs.
Occasionally, I've seen people comment that you have to let these
things run a really long time in order to see failure, but that's not
usually the case. Usually, if a test fails to fail (!) it's probably
because your compiler has done something, or you just haven't found a
case where it fails. Since absence of proof is not proof of absence,
and there were a lot of doubters out there, I wrote a counterexample
that failed on a standard PPC mac. You can find it if you search the
archives. I'm not sure anyone ever scrutinized it (even me -- I wrote
it super fast), so it may have done what it did due to a bug, but if
you doubt weak ordering I assure you very smart people have spent a
long time building it into the next revision of c++ and I don't think
they would have done that if it weren't necessary. But you can look at
my code if you doubt it, and if it happens to run on your machine,
you'll have to tweak it or take my word for it, as I no longer have
the machine that code used to run on.
bjorn
On Oct 28, 2008, at 9:06 AM, oja_i_i wrote:
> Hi,
>
> Can any of you advise a good test case that would demonstrate memory
> barriers are needed in a lock-free ring buffer ? I mean: a piece of
> code that would fail because memory barriers are missing.
>
> I wrote the following test, and it succeeds on PowerPC SMP (dual IBM
> Cell system QS-20), which is supposed to have weak memory ordering:
>
> http://svn.samalyse.com/misc/rbtest/test-int-array.c
>
> It is part of a (4 implementations) ring buffer test suite, that one
> can run with:
>
> svn co http://svn.samalyse.com/misc/rbtest
> cd rbtest
> make test
>
> Best regards,
>
> --
> Olivier Guilyardi / Samalyse
>
-----------------------------
Bjorn Roche
XO Wave
Digital Audio Production and Post-Production Software
http://www.xowave.comhttp://blog.bjornroche.comhttp://myspace.com/xowave
Hi,
Can any of you advise a good test case that would demonstrate memory
barriers are needed in a lock-free ring buffer ? I mean: a piece of
code that would fail because memory barriers are missing.
I wrote the following test, and it succeeds on PowerPC SMP (dual IBM
Cell system QS-20), which is supposed to have weak memory ordering:
http://svn.samalyse.com/misc/rbtest/test-int-array.c
It is part of a (4 implementations) ring buffer test suite, that one
can run with:
svn co http://svn.samalyse.com/misc/rbtest
cd rbtest
make test
Best regards,
--
Olivier Guilyardi / Samalyse
I suggest reading http://en.wikipedia.org/wiki/Non-blocking_synchronization
Chris
On Sun, Sep 28, 2008 at 8:45 PM, Stephen Sinclair <radarsat1@...> wrote:
> On Fri, Sep 26, 2008 at 2:12 PM, Graham Wakefield
> <lists@...> wrote:
>> You could use the terminology 'wait-free' instead to transcend the issue?
>> Though it seems that your reviewer is wrongfully assigning purpose of TAS to
>> locks, instead of seeing TAS as necessary to locks.
>
> Yes, I only read recently that there is a subtle difference in the
> semantics of wait-free and lock-free. I'll have to look into it a bit
> deeper, perhaps wait-free is a better description. I think one
> implies the other, but not the other way around.
>
> Steve
>
> ------------------------------------
>
> Yahoo! Groups Links
>
>
>
>
On Fri, Sep 26, 2008 at 2:12 PM, Graham Wakefield
<lists@...> wrote:
> You could use the terminology 'wait-free' instead to transcend the issue?
> Though it seems that your reviewer is wrongfully assigning purpose of TAS to
> locks, instead of seeing TAS as necessary to locks.
Yes, I only read recently that there is a subtle difference in the
semantics of wait-free and lock-free. I'll have to look into it a bit
deeper, perhaps wait-free is a better description. I think one
implies the other, but not the other way around.
Steve
You could use the terminology 'wait-free' instead to transcend the issue? Though it seems that your reviewer is wrongfully assigning purpose of TAS to locks, instead of seeing TAS as necessary to locks.
On Sep 25, 2008, at 8:38 PM, Stephen Sinclair wrote:
Hello,
This seemed like an appropriate place for this question.
This is quite random, but I am emailing because I was wondering if I could get your opinion on a problem I am facing. I have been developing real-time simulation (VR) software. I don't currently know any professors who are experts on this issue so I am looking for an outside opinion. I'll just pose the question and you can ignore me if you like. :)
This software is multi-threaded, and I'm trying to take advantage of asynchronous threads as much as possible by running several simultaneous simulations (one doing graphics, one doing haptics, one doing physics) and while I'm currently using a slow method of communication between my threads so as to avoid locking my data structures (which would force synchroneity), I'm looking to improve it by taking advantage of shared memory for message passing. Each thread is considered soft real-time so they cannot block.
My plan is to do this using the test-and-set atomic operation (using the HP atomic_ops library). I have implemented the basic mechanism. Without describing everything in detail, basically message slots in a buffer are marked as in-use or not, as well as empty or non-empty, using test-and-set. The threads step through the queue looking for unused slots. The write operation then looks for empty slots and 'allocates' them, using test-and-swap. The read operation marks a slot as in-use before checking the message, then marking it empty after acting.
Anyways, I have received some criticism: "the authors claim that shared memory message-passing implementations using atomic test-and-set are 'lock-free'. This seems self-contradictory, as the whole purpose of test-and-set ops are to implement locks efficiently and robustly."
So, my question is, is this criticism correct? I never claimed that all message-passing implementations were lock free, only that I was developing a lock free message passing implementation. But I don't know if this person knows something I don't, and it's anonymous so I can't ask them.
I think that I've successfully implemented a message passing system which is free of locks, taking advantage of the fact that my threads are free to be asynchronous and so don't need to wait on each other but can just place messages in the queue. The atomic operations are simple, but sufficient to ensure that slots are not simultaneously used by multiple threads. Certainly my code doesn't implement any spinlocks or use pthread mutexes. Can I honestly say that I have implemented a lock free implementation, or is there something hidden in the atomic_ops library that does something contradicting my claim?
Also, do you have any knowledge of other lock-free message passing implementations, making use of atomic ops or not?
Thanks for any help you can offer. By the way, somewhat more on-topic, in the code I am talking about the message passing stuff is being implemented as a separate library and is GPL. It's available and seems to work okay, but I'll publish the URL when it's more ready. One issue I've had is that writing unit tests for the code is quite difficult since I don't know what to check for to see if any errors occurred. (Other than a segfault of course.)
Andrey,
Thanks a lot for your response, it gives me some confidence! :)
Also, I'll take a good look at that reference.
I found this list because I was looking for a place to pose this
question, but I'm actually quite interested in these types of data
structures since I do a lot of real-time programming. I'll hang
around on the list and see if I can contribute.
cheers,
Steve
On Fri, Sep 26, 2008 at 4:23 AM, Andrey Brito <aembrito@...> wrote:
>
> Hello Stephen,
>
> if this review you've got were right, there wouldn't exist lock-free
> algorithms, as the bases for lock-freedom are compare-and-swap and
> test-and-set. :-)
>
> But on the other hand, using test-and-set (or swap), doesn't guarantee that
> an algorithm is lock-free (e.g., TAS is used to implement locks). I would
> say something like: an algorithm is lock-free when a thread that stops can
> never prevent others to make progress (e.g., because it locked something,
> but did not released the lock).
>
> Anyway, there are a couple of papers from Keir Fraser dealing with
> lock-freedom, I am sure you can find a good formal definition with a
> reference to insert it in your paper.
>
> Good luck with your system,
>
> Andrey
>
>
>
> ----- Mensagem original ----
> De: Stephen Sinclair <radarsat1@...>
> Para: liblf-dev@yahoogroups.com
> Enviadas: Sexta-feira, 26 de Setembro de 2008 5:38:21
> Assunto: [liblf-dev] question on lock-free message passing, is it possible?
>
> Hello,
>
> This seemed like an appropriate place for this question.
>
> This is quite random, but I am emailing because I was wondering if I
> could get your opinion on a problem I am facing. I have been
> developing real-time simulation (VR) software. I don't currently know
> any professors who are experts on this issue so I
> am looking for an outside opinion. I'll just pose the question and
> you can ignore me if you like. :)
>
> This software is multi-threaded, and I'm trying to take advantage of
> asynchronous threads as much as possible by running several
> simultaneous simulations (one doing graphics, one doing haptics, one
> doing physics) and while I'm currently using a slow method of
> communication between my threads so as to avoid locking my data
> structures (which would force synchroneity) , I'm looking to improve it
> by taking advantage of shared memory for message passing. Each thread
> is considered soft real-time so they cannot block.
>
> My plan is to do this using the test-and-set atomic operation (using
> the HP atomic_ops library). I have implemented the basic mechanism.
> Without describing everything in detail, basically message slots in a
> buffer are marked as in-use or not, as well as empty or non-empty,
> using test-and-set. The threads step through the queue looking for
> unused slots. The write operation then looks for empty slots and
> 'allocates' them, using test-and-swap. The read operation marks a
> slot as in-use before checking the message, then marking it empty
> after acting.
>
> Anyways, I have received some criticism: "the authors claim that
> shared memory message-passing implementations using atomic
> test-and-set are 'lock-free'. This seems self-contradictory, as the
> whole purpose of test-and-set ops are to implement locks efficiently
> and robustly."
>
> So, my question is, is this criticism correct? I never claimed that
> all message-passing implementations were lock free, only that I was
> developing a lock free message passing implementation. But I don't
> know if this person knows something I don't, and it's anonymous so I
> can't ask them.
>
> I think that I've successfully implemented a message passing system
> which is free of locks, taking advantage of the fact that my threads
> are free to be asynchronous and so don't need to wait on each other
> but can just place messages in the queue. The atomic operations are
> simple, but sufficient to ensure that slots are not simultaneously
> used by multiple threads. Certainly my code doesn't implement any
> spinlocks or use pthread mutexes. Can I honestly say that I have
> implemented a lock free implementation, or is there something hidden
> in the atomic_ops library that does something contradicting my claim?
>
> Also, do you have any knowledge of other lock-free message passing
> implementations, making use of atomic ops or not?
>
> Thanks for any help you can offer. By the way, somewhat more
> on-topic, in the code I am talking about the message passing stuff is
> being implemented as a separate library and is GPL. It's available
> and seems to work okay, but I'll publish the URL when it's more ready.
> One issue I've had is that writing unit tests for the code is quite
> difficult since I don't know what to check for to see if any errors
> occurred. (Other than a segfault of course.)
>
> Steve
>
> ________________________________
> Novos endereços, o Yahoo! que você conhece. Crie um email novo com a sua
> cara @ymail.com ou @rocketmail.com.
if this review you've got were right, there wouldn't exist lock-free algorithms, as the bases for lock-freedom are compare-and-swap and test-and-set. :-)
But on the other hand, using test-and-set (or swap), doesn't guarantee that an algorithm is lock-free (e.g., TAS is used to implement locks). I would say something like: an algorithm is lock-free when a thread that stops can never prevent others to make progress (e.g., because it locked something, but did not released the lock).
Anyway, there are a couple of papers from Keir Fraser dealing with lock-freedom, I am sure you can find a good formal definition with a reference to insert it in your paper.
Good luck with your system,
Andrey
----- Mensagem original ---- De: Stephen Sinclair <radarsat1@...> Para: liblf-dev@yahoogroups.com Enviadas: Sexta-feira, 26 de Setembro de 2008 5:38:21 Assunto: [liblf-dev] question on lock-free message passing, is it possible?
Hello,
This seemed like an appropriate place for this question.
This is quite random, but I am emailing because I was wondering if I
could get your opinion on a problem I am facing. I have been
developing real-time simulation (VR) software. I don't currently know
any professors who are experts on this issue so I
am looking for an outside opinion. I'll just pose the question and
you can ignore me if you like. :)
This software is multi-threaded, and I'm trying to take advantage of
asynchronous threads as much as possible by running several
simultaneous simulations (one doing graphics, one doing haptics, one
doing physics) and while I'm currently using a slow method of
communication between my threads so as to avoid locking my data
structures (which would force synchroneity) , I'm looking to improve it
by taking advantage of shared memory for message passing. Each thread
is considered soft real-time so they cannot block.
My plan is to do this using the test-and-set atomic operation (using
the HP atomic_ops library). I have implemented the basic mechanism.
Without describing everything in detail, basically message slots in a
buffer are marked as in-use or not, as well as empty or non-empty,
using test-and-set. The threads step through the queue looking for
unused slots. The write operation then looks for empty slots and
'allocates' them, using test-and-swap. The read operation marks a
slot as in-use before checking the message, then marking it empty
after acting.
Anyways, I have received some criticism: "the authors claim that
shared memory message-passing implementations using atomic
test-and-set are 'lock-free'. This seems self-contradictory, as the
whole purpose of test-and-set ops are to implement locks efficiently
and robustly."
So, my question is, is this criticism correct? I never claimed that
all message-passing implementations were lock free, only that I was
developing a lock free message passing implementation. But I don't
know if this person knows something I don't, and it's anonymous so I
can't ask them.
I think that I've successfully implemented a message passing system
which is free of locks, taking advantage of the fact that my threads
are free to be asynchronous and so don't need to wait on each other
but can just place messages in the queue. The atomic operations are
simple, but sufficient to ensure that slots are not simultaneously
used by multiple threads. Certainly my code doesn't implement any
spinlocks or use pthread mutexes. Can I honestly say that I have
implemented a lock free implementation, or is there something hidden
in the atomic_ops library that does something contradicting my claim?
Also, do you have any knowledge of other lock-free message passing
implementations, making use of atomic ops or not?
Thanks for any help you can offer. By the way, somewhat more
on-topic, in the code I am talking about the message passing stuff is
being implemented as a separate library and is GPL. It's available
and seems to work okay, but I'll publish the URL when it's more ready.
One issue I've had is that writing unit tests for the code is quite
difficult since I don't know what to check for to see if any errors
occurred. (Other than a segfault of course.)
Steve
Novos endereços, o Yahoo! que você conhece. Crie um email novo com a sua cara @ymail.com ou @rocketmail.com.
Hello,
This seemed like an appropriate place for this question.
This is quite random, but I am emailing because I was wondering if I
could get your opinion on a problem I am facing. I have been
developing real-time simulation (VR) software. I don't currently know
any professors who are experts on this issue so I
am looking for an outside opinion. I'll just pose the question and
you can ignore me if you like. :)
This software is multi-threaded, and I'm trying to take advantage of
asynchronous threads as much as possible by running several
simultaneous simulations (one doing graphics, one doing haptics, one
doing physics) and while I'm currently using a slow method of
communication between my threads so as to avoid locking my data
structures (which would force synchroneity), I'm looking to improve it
by taking advantage of shared memory for message passing. Each thread
is considered soft real-time so they cannot block.
My plan is to do this using the test-and-set atomic operation (using
the HP atomic_ops library). I have implemented the basic mechanism.
Without describing everything in detail, basically message slots in a
buffer are marked as in-use or not, as well as empty or non-empty,
using test-and-set. The threads step through the queue looking for
unused slots. The write operation then looks for empty slots and
'allocates' them, using test-and-swap. The read operation marks a
slot as in-use before checking the message, then marking it empty
after acting.
Anyways, I have received some criticism: "the authors claim that
shared memory message-passing implementations using atomic
test-and-set are 'lock-free'. This seems self-contradictory, as the
whole purpose of test-and-set ops are to implement locks efficiently
and robustly."
So, my question is, is this criticism correct? I never claimed that
all message-passing implementations were lock free, only that I was
developing a lock free message passing implementation. But I don't
know if this person knows something I don't, and it's anonymous so I
can't ask them.
I think that I've successfully implemented a message passing system
which is free of locks, taking advantage of the fact that my threads
are free to be asynchronous and so don't need to wait on each other
but can just place messages in the queue. The atomic operations are
simple, but sufficient to ensure that slots are not simultaneously
used by multiple threads. Certainly my code doesn't implement any
spinlocks or use pthread mutexes. Can I honestly say that I have
implemented a lock free implementation, or is there something hidden
in the atomic_ops library that does something contradicting my claim?
Also, do you have any knowledge of other lock-free message passing
implementations, making use of atomic ops or not?
Thanks for any help you can offer. By the way, somewhat more
on-topic, in the code I am talking about the message passing stuff is
being implemented as a separate library and is GPL. It's available
and seems to work okay, but I'll publish the URL when it's more ready.
One issue I've had is that writing unit tests for the code is quite
difficult since I don't know what to check for to see if any errors
occurred. (Other than a segfault of course.)
Steve
Sorry I missed the licensing question. I don't want to lock down to
GPL either, but I am happy to be compatible with it. My understanding
has always been that BSD is compatible with GPL and LGPL, so I don't
see a need for tri-license, but if it clarifies any ambiguity that
works for me. Ideally, I'd like to keep it simple, available for
commercial purposes, and flexible enough for open source projects as
well. Personally, I'm quite happy with public domain, with something
BSD style simply as a request, because I think people honor those
kinds of requests as much as they honor actual requirements, but
whatever other folks want is good, too.
bjorn
On Jul 2, 2008, at 5:02 PM, Graham Wakefield wrote:
> Hi - great stuff!
>
> I'll do anything I can to help. I develop typically on OSX, but
> currently need to port to Linux (Ubuntu Hardy), so I can help
> testing etc, even if the actual atomic stuff is a little over my head.
>
> It's true that C++0x is on the way with better atomic/lock-free
> stuff, but it *will* still be a while before this is supported by
> the compilers that the masses are using ;-)
>
> I'd like to second Reed's question about the licensing. I recently
> saw the MPL-tri-license [1], which was new to me, and looks very
> appealing for my own work; flexible enough to support GPL, LGPL and
> BSD style projects. It would be a real shame to lock this into a
> GPL-only situation.
>
> Anyway, let me know if there's anything I can help with!
>
> Graham
>
> [1] http://www.mozilla.org/MPL/boilerplate-1.1/mpl-tri-license-c
>
> On Jul 2, 2008, at 12:42 PM, Bjorn Roche wrote:
>
>> Hey Reed,
>>
>> I haven't even looked at this stuff in forever, but I'm happy to
>> revisit if there's interest. I think the idea of some simple
>> datastructures is a good one. Of course, we should also keep in mind
>> that C++0x is going to have much better support for lock-free stuff
>> built in, and it may not be worth the effort. That said, I'm willing
>> to do some work if others are!
>>
>> bjorn
>>
>> On Jul 2, 2008, at 2:27 PM, Reed Hedges wrote:
>>
>> >
>> >
>> > Hi Bjorn, I'm learning about what's available in different
>> compilers
>> > in the way
>> > of atomic compare and swap (and other operations).
>> >
>> > Are you still working on this? It looks like you have C&S, for
>> > example, for OSX
>> > but not the others.
>> >
>> > I'd like to have some kind of lock free associative container,
>> maybe
>> > a skip list
>> > or something, maybe just a linked list of key/value records, not
>> > sure yet, that
>> > transparently falls back on granular locking if there's no atomic
>> > C&S (one mutex
>> > per node) -- probably by making the user supply a mutex to the
>> > "atomic"
>> > functions (even if it doesn't get used on platforms that have
>> atomic
>> > C&S).
>> >
>> > What do you think?
>> >
>> > Have you used ThreadSafeList much? How about the AsyncOperation
>> queue?
>> >
>> > Finally, what kind of license were you planning to use for this
>> code?
>> >
>> > Reed
>> >
>> > Bjorn Roche wrote:
>> > >
>> > >
>> > > Hey all,
>> > >
>> > > I've started a subversion repo here: http://liblf.xowave.com/.
>> > > <http://liblf.xowave.com/.> It is
>> > > publicly readable, and I'm happy to give people write access to
>> > folks
>> > > who want to contribute. It's pretty sloppy at the moment, but I
>> > think
>> > > it's a start.
>> > >
>> > > Here's a quick description of the files:
>> > >
>> > > Atomic.h - has atomic primitives for things needed in the rest of
>> > the
>> > > code. Conditional compiling should make this extensible to any
>> > > platform, but right now it works on Mac OS X and might be good
>> > enough
>> > > for Linux. I'd love to see the Linux side fixed up, not to
>> mention
>> > > some windows work.
>> > >
>> > > Fifo.h - your basic non-blocking fifo. Generically typed. There
>> is
>> > > also some old commented code that could be used as a blocking
>> Fifo.
>> > >
>> > > Async.cc
>> > > Async.h - implement an asynchronous Queue of "operations" that
>> can
>> > be
>> > > performed in a separate thread. Useful for reading from or to a
>> file
>> > > in the background. A nice improvement on this would be an
>> interface
>> > > for reclaiming completed operations.
>> > >
>> > > sleep.cc
>> > > sleep.h - just for short sleeps. Useful for testing and also
>> the few
>> > > functions that "block".
>> > >
>> > > ThreadSafeList.h - a Queue that can (in theory) be edited from
>> one
>> > > thread while iterated through in another.
>> > >
>> > > makefile - building and cleaning and running the test program. I
>> > > think this requires gmake.
>> > >
>> > > test.cc - some tests.
>> > >
>> > > bjorn
>> > >
>> > > -----------------------------
>> > > Bjorn Roche
>> > > XO Wave
>> > > Digital Audio Production and Post-Production Software
>> > > http://www.xowave.com <http://www.xowave.com>
>> > > http://blog.bjornroche.com <http://blog.bjornroche.com>
>> > > http://myspace.com/xowave <http://myspace.com/xowave>
>> > >
>> > >
>> >
>> >
>> >
>>
>> -----------------------------
>> Bjorn Roche
>> XO Wave
>> Digital Audio Production and Post-Production Software
>> http://www.xowave.com
>> http://blog.bjornroche.com
>> http://myspace.com/xowave
>>
>>
>>
>
> Be seeing you
>
> grrr waaa
> www.grahamwakefield.net
>
-----------------------------
Bjorn Roche
XO Wave
Digital Audio Production and Post-Production Software
http://www.xowave.comhttp://blog.bjornroche.comhttp://myspace.com/xowave
Hey Reed,
I haven't even looked at this stuff in forever, but I'm happy to
revisit if there's interest. I think the idea of some simple
datastructures is a good one. Of course, we should also keep in mind
that C++0x is going to have much better support for lock-free stuff
built in, and it may not be worth the effort. That said, I'm willing
to do some work if others are!
bjorn
On Jul 2, 2008, at 2:27 PM, Reed Hedges wrote:
>
>
> Hi Bjorn, I'm learning about what's available in different compilers
> in the way
> of atomic compare and swap (and other operations).
>
> Are you still working on this? It looks like you have C&S, for
> example, for OSX
> but not the others.
>
> I'd like to have some kind of lock free associative container, maybe
> a skip list
> or something, maybe just a linked list of key/value records, not
> sure yet, that
> transparently falls back on granular locking if there's no atomic
> C&S (one mutex
> per node) -- probably by making the user supply a mutex to the
> "atomic"
> functions (even if it doesn't get used on platforms that have atomic
> C&S).
>
> What do you think?
>
> Have you used ThreadSafeList much? How about the AsyncOperation queue?
>
> Finally, what kind of license were you planning to use for this code?
>
> Reed
>
> Bjorn Roche wrote:
> >
> >
> > Hey all,
> >
> > I've started a subversion repo here: http://liblf.xowave.com/.
> > <http://liblf.xowave.com/.> It is
> > publicly readable, and I'm happy to give people write access to
> folks
> > who want to contribute. It's pretty sloppy at the moment, but I
> think
> > it's a start.
> >
> > Here's a quick description of the files:
> >
> > Atomic.h - has atomic primitives for things needed in the rest of
> the
> > code. Conditional compiling should make this extensible to any
> > platform, but right now it works on Mac OS X and might be good
> enough
> > for Linux. I'd love to see the Linux side fixed up, not to mention
> > some windows work.
> >
> > Fifo.h - your basic non-blocking fifo. Generically typed. There is
> > also some old commented code that could be used as a blocking Fifo.
> >
> > Async.cc
> > Async.h - implement an asynchronous Queue of "operations" that can
> be
> > performed in a separate thread. Useful for reading from or to a file
> > in the background. A nice improvement on this would be an interface
> > for reclaiming completed operations.
> >
> > sleep.cc
> > sleep.h - just for short sleeps. Useful for testing and also the few
> > functions that "block".
> >
> > ThreadSafeList.h - a Queue that can (in theory) be edited from one
> > thread while iterated through in another.
> >
> > makefile - building and cleaning and running the test program. I
> > think this requires gmake.
> >
> > test.cc - some tests.
> >
> > bjorn
> >
> > -----------------------------
> > Bjorn Roche
> > XO Wave
> > Digital Audio Production and Post-Production Software
> > http://www.xowave.com <http://www.xowave.com>
> > http://blog.bjornroche.com <http://blog.bjornroche.com>
> > http://myspace.com/xowave <http://myspace.com/xowave>
> >
> >
>
>
>
-----------------------------
Bjorn Roche
XO Wave
Digital Audio Production and Post-Production Software
http://www.xowave.comhttp://blog.bjornroche.comhttp://myspace.com/xowave
Hi Bjorn, I'm learning about what's available in different compilers in the way
of atomic compare and swap (and other operations).
Are you still working on this? It looks like you have C&S, for example, for OSX
but not the others.
I'd like to have some kind of lock free associative container, maybe a skip list
or something, maybe just a linked list of key/value records, not sure yet, that
transparently falls back on granular locking if there's no atomic C&S (one mutex
per node) -- probably by making the user supply a mutex to the "atomic"
functions (even if it doesn't get used on platforms that have atomic C&S).
What do you think?
Have you used ThreadSafeList much? How about the AsyncOperation queue?
Finally, what kind of license were you planning to use for this code?
Reed
Bjorn Roche wrote:
>
>
> Hey all,
>
> I've started a subversion repo here: http://liblf.xowave.com/.
> <http://liblf.xowave.com/.> It is
> publicly readable, and I'm happy to give people write access to folks
> who want to contribute. It's pretty sloppy at the moment, but I think
> it's a start.
>
> Here's a quick description of the files:
>
> Atomic.h - has atomic primitives for things needed in the rest of the
> code. Conditional compiling should make this extensible to any
> platform, but right now it works on Mac OS X and might be good enough
> for Linux. I'd love to see the Linux side fixed up, not to mention
> some windows work.
>
> Fifo.h - your basic non-blocking fifo. Generically typed. There is
> also some old commented code that could be used as a blocking Fifo.
>
> Async.cc
> Async.h - implement an asynchronous Queue of "operations" that can be
> performed in a separate thread. Useful for reading from or to a file
> in the background. A nice improvement on this would be an interface
> for reclaiming completed operations.
>
> sleep.cc
> sleep.h - just for short sleeps. Useful for testing and also the few
> functions that "block".
>
> ThreadSafeList.h - a Queue that can (in theory) be edited from one
> thread while iterated through in another.
>
> makefile - building and cleaning and running the test program. I
> think this requires gmake.
>
> test.cc - some tests.
>
> bjorn
>
> -----------------------------
> Bjorn Roche
> XO Wave
> Digital Audio Production and Post-Production Software
> http://www.xowave.com <http://www.xowave.com>
> http://blog.bjornroche.com <http://blog.bjornroche.com>
> http://myspace.com/xowave <http://myspace.com/xowave>
>
>
Hey all,
I've been using Threading Building Blocks for the past year, and I'm
very happy with it. Lately I have been researching wait-free
algorithms, and have been reading "The Art of Multiprocessor
Programming". I would like to cooperate to implement some wait-free
algorithms using TBB as a basis. I fully appreciate the difficulty of
developing wait-free algorithms, so if anyone has a source of
algorithms that could just be coded into TBB in C++ I would be happy
to do so... although I suspect many algorithms may require fine tuning
to use the cache effectively.
Looking forward to helping. Also, I have an SVN repository setup as
part of "TBB Community Code" which hosts open source libraries
licensed under the same open source license as TBB itself.
I lurk on #tbb on irc.freenode.net, so come on out and chat.
AJ