Pub socket sending fail issue
Description
Environment
arm android
Activity
Christian Kamm June 29, 2013 at 4:04 PM (edited)
Pull request link: https://github.com/zeromq/zeromq3-x/pull/92
WooSung Lee June 29, 2013 at 3:42 AM (edited)
Christian: I have made a pull request.
It uses the code I originally suggested.
I based it on the earlier mailing-list discussion that Christian Kamm pointed out (http://lists.zeromq.org/pipermail/zeromq-dev/2013-March/020897.html) and on the failure-handling code in zmq::dist_t::write().
Christian Kamm June 27, 2013 at 4:42 PM
WooSung: Yes, but note that I'm new to zmq myself. It seems like moving the to-be-deleted pipe into the not-eligible range would suffice. Then the erase() call can only move another non-eligible pipe into the deleted one's spot.
WooSung Lee June 27, 2013 at 12:05 PM
Thanks, Christian and David.
I'll make a pull request.
Christian, do you mean swapping only with eligible-1?
Christian Kamm June 27, 2013 at 9:26 AM
Okay, so the problem is that dist::pipe_terminate() doesn't realize array::erase(N) works by reordering, moving the last element into the Nth spot in the array. That way pipe_terminate can accidentally move an ineligible pipe into the matching, active or eligible part of the pipes array, displacing an active one.
Example:
WooSung Lee, your patch looks good, will you make a pull request? Since matching <= active <= eligible, you could probably get away with only adding the swap to eligible-1.
This should be easy to reproduce with three subscribers: make the third hit the HWM and then terminate the first. The second should stop getting data. I'll give that a try.
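The three-subscriber scenario can be modelled without zmq at all. The following is a minimal sketch, assuming a simplified stand-in for the pipes array: `pipe_list`, `terminate_unpatched` and `can_receive` are hypothetical names for illustration, not libzmq API. It only shows how a reordering erase without a prior swap can push a healthy pipe out of the eligible range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the pipes array (not the real libzmq type).
struct pipe_list {
    std::vector<std::string> items;
    std::size_t eligible = 0;  // items[0 .. eligible) may receive data

    std::size_t index(const std::string &name) const {
        for (std::size_t i = 0; i < items.size(); ++i)
            if (items[i] == name) return i;
        return items.size();  // not found
    }

    // Reordering erase: the last element is moved into the vacated slot.
    void erase(std::size_t i) {
        items[i] = items.back();
        items.pop_back();
    }

    // Termination without a swap, as described in the comment above.
    void terminate_unpatched(const std::string &name) {
        std::size_t i = index(name);
        if (i == items.size()) return;  // unknown pipe, nothing to do
        if (i < eligible) --eligible;
        erase(i);
    }

    bool can_receive(const std::string &name) const {
        return index(name) < eligible;
    }
};

// S3 has hit the HWM and sits outside the eligible range; then S1 terminates.
pipe_list three_subscriber_scenario() {
    pipe_list p;
    p.items = {"S1", "S2", "S3"};
    p.eligible = 2;
    p.terminate_unpatched("S1");
    return p;
}
```

In this model the array ends up as S3, S2 with an eligible count of 1, so the blocked S3 occupies the only eligible slot and S2 is skipped.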
1. Symptom :
There are three nodes, A, B and C, connected to each other with zmq.
The Pub socket on A is connected to the Sub sockets on B and C.
When node B is repeatedly terminated abnormally and then reconnects, the existing Pub socket on A sometimes can no longer send to the Sub socket on C.
2. Cause :
In the Pub socket, write() failed on one pipe because its status was not active, and that pipe then entered the terminated() function.
After that pipe was erased, a normal (active) pipe sometimes ends up at an array index equal to or greater than the eligible count.
That normal pipe can then no longer be sent to, because it fails the condition in the match() function: if (pipes.index (pipe_) >= eligible) return
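To make the effect of that guard concrete, here is a small sketch of a match step over a simplified model. The `dist_model` struct and its boolean return value are assumptions made for illustration; only the eligibility condition itself is taken from the report.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the distributor's bookkeeping.
struct dist_model {
    std::vector<int> pipes;     // pipe ids in array order
    std::size_t matching = 0;   // pipes[0 .. matching) match the current message
    std::size_t eligible = 0;   // pipes[0 .. eligible) are allowed to match

    std::size_t index(int id) const {
        for (std::size_t i = 0; i < pipes.size(); ++i)
            if (pipes[i] == id) return i;
        return pipes.size();  // not found
    }

    // Returns true only if this call marked the pipe as matching.
    bool match(int id) {
        std::size_t i = index(id);
        if (i < matching) return false;   // already matching
        if (i >= eligible) return false;  // not eligible: silently skipped
        std::swap(pipes[i], pipes[matching]);
        ++matching;
        return true;
    }
};
```

A healthy pipe that has been displaced to an index at or beyond the eligible count is rejected by the second guard on every message, which matches the symptom that the subscriber silently stops receiving.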
3. Solution :
An active pipe must always sit at an array index lower than the eligible count.
To ensure that, we added a swap before the erase in the terminated() function.
---The original codes
dist.cpp
---The modified codes
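As a rough illustration of the swap-before-erase idea (a sketch over a hypothetical `dist_model` struct, not the patch submitted in the pull request), the terminating pipe can be moved to the end of each range it belongs to before that range shrinks. The reordering erase then only ever pulls a non-eligible pipe, or the pipe itself, into the vacated slot.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model; not the real zmq::dist_t.
struct dist_model {
    std::vector<int> pipes;     // pipe ids in array order
    std::size_t matching = 0;   // invariant: matching <= active <= eligible
    std::size_t active = 0;
    std::size_t eligible = 0;

    std::size_t index(int id) const {
        for (std::size_t i = 0; i < pipes.size(); ++i)
            if (pipes[i] == id) return i;
        return pipes.size();  // not found
    }

    // Reordering erase: the last element fills the vacated slot.
    void erase(std::size_t i) {
        pipes[i] = pipes.back();
        pipes.pop_back();
    }

    void terminated(int id) {
        std::size_t i = index(id);
        if (i == pipes.size()) return;  // unknown pipe, nothing to do
        if (i < matching) {
            std::swap(pipes[i], pipes[matching - 1]);
            i = --matching;  // pipe now sits just past the matching range
        }
        if (i < active) {
            std::swap(pipes[i], pipes[active - 1]);
            i = --active;    // pipe now sits just past the active range
        }
        if (i < eligible) {
            std::swap(pipes[i], pipes[eligible - 1]);
            i = --eligible;  // pipe now sits just past the eligible range
        }
        erase(i);
    }
};
```

With pipes 1, 2, 3 where only the first two are eligible, terminating pipe 1 leaves pipe 2 at index 0 inside the eligible range and the blocked pipe 3 outside it.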
4. For example :
Take an array of size 4 where initially matching is 2, eligible is 4 and active is 4.
I will write this state as m(2), e(4), a(4).
— In the original codes
— In the modified codes