First part of a multi-part message lost
Description
Environment
None
Attachments
3
Activity

PieterP May 28, 2012 at 8:51 AM
Not reproducible using provided test case; if this still happens with 3.1 we'll create a new issue.

Martin Hurton May 22, 2012 at 9:16 PM
Hi Emmanuel. I cannot reproduce this problem. Which exact version of the library did you use when testing? Thanks!

PieterP March 29, 2012 at 5:52 AM
Douglas, if you can test libzmq master against this issue, that'd be great.

DouglasY March 25, 2012 at 5:21 PM
Does the change in this pull request fix the issue? https://github.com/zeromq/libzmq/pull/291

Joshua Foster February 9, 2012 at 4:22 PM
I just saw this also under XPUB/XSUB.
Joshua
I am using Zmq 3.0 on a Linux box.
I am using the PUB/SUB framework.
My messages are multi-part messages with 4 parts.
A message is sent on a human action (hitting a key).
I notice that from "time to time", I am losing the first part of the
multi-part message. Wireshark confirms this. When this happens, it is
always the first message exchanged from the publisher to the
subscriber. Subsequent messages work fine.
There are several files attached to this bug report:
1 - pub.cpp which is the publisher code
2 - sub.cpp which is the subscriber code
3 - A wireshark file
To reproduce the problem, do the following:
1 - Start the pub with "pub <port number>" (pub 5555)
2 - In another window, start the sub with "sub <endpoint>" (sub tcp://*:5555)
3 - Wait for sub to be ready: It prints the string "entering zmq_poll"
4 - On the pub side, send a multi-part message by hitting a key then return
5 - The sub should print "Subscriber has received something"
6 - Kill the subscriber (not the publisher)
7 - Restart it
8 - When sub is ready, send a message from the pub
If it works, repeat steps 6 to 8.
After 4 to 5 tries, you should have a case where the message has been sent by
the publisher but not received by the subscriber.
The Wireshark file also attached to this bug report is a recording
of network traffic during one of my tests. Packet number 38 is a bad one
(missing one part of the multi-part message) while packets 10 and 24 are
correct messages.
I have added some tracing info in zmq lib and tried to do some debugging.
I see the method zmq::pipe_t::write() called 4 times, which I guess is due
to the four parts of the multi-part message.
I see the execution of the encoder::get_data() method, which I guess
retrieves data from the queue and sends it to the BSD socket.
When I am losing the first part of the message, this get_data()
method extracts only three chunks of data. Each chunk is 2 bytes for the ZMQ
protocol (I guess) plus the message part data.
If I compare with correct transmission of the multi-part message, it
should extract 4 chunks of data instead of 3.
Therefore, I guess that the first message part is lost somewhere
between this write() method of the pipe class and this
encoder::get_data() method.
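To make the "2 bytes plus the message part data" observation concrete, here is a minimal sketch of the framing as I understand it. It assumes each short part goes on the wire as one length octet (counting the flags octet plus the body), one flags octet whose bit 0 means "more parts follow", then the body. That layout is my assumption, not something confirmed from the library source, and the names encode_part, encode_message and decode_message are hypothetical helpers, not libzmq functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed frame layout (hypothetical, short frames only):
//   [length][flags][body...]
// where length = 1 (flags octet) + body size, and flags bit 0 = "more parts follow".
std::vector<std::uint8_t> encode_part(const std::string &body, bool more) {
    if (body.size() > 253)
        throw std::length_error("long frames are not handled in this sketch");
    std::vector<std::uint8_t> frame;
    frame.push_back(static_cast<std::uint8_t>(body.size() + 1));
    frame.push_back(more ? 0x01 : 0x00);
    frame.insert(frame.end(), body.begin(), body.end());
    return frame;
}

// Encode all parts of one multi-part message; every part except the last
// carries the "more" flag.
std::vector<std::uint8_t> encode_message(const std::vector<std::string> &parts) {
    std::vector<std::uint8_t> wire;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        const std::vector<std::uint8_t> frame =
            encode_part(parts[i], i + 1 < parts.size());
        wire.insert(wire.end(), frame.begin(), frame.end());
    }
    return wire;
}

// Read frames until one arrives without the "more" flag and return the bodies.
std::vector<std::string> decode_message(const std::vector<std::uint8_t> &wire) {
    std::vector<std::string> parts;
    std::size_t pos = 0;
    bool more = true;
    while (more) {
        if (wire.size() - pos < 2)
            throw std::runtime_error("truncated frame header");
        const std::size_t length = wire[pos];  // flags octet + body
        if (length == 0 || length == 0xFF)
            throw std::runtime_error("unsupported length octet");
        if (wire.size() - pos - 1 < length)
            throw std::runtime_error("truncated frame body");
        more = (wire[pos + 1] & 0x01) != 0;
        const auto first = wire.begin() + static_cast<std::ptrdiff_t>(pos + 2);
        const auto last = wire.begin() + static_cast<std::ptrdiff_t>(pos + 1 + length);
        parts.emplace_back(first, last);
        pos += 1 + length;
    }
    return parts;
}
```

With four parts "A", "BB", "CCC", "DDDD" this produces four chunks, each 2 bytes longer than its body. If the first chunk is dropped before encoding, the remaining three chunks still decode as a well-formed three-part message under this framing, because only the last chunk carries a cleared "more" flag. That would explain why the subscriber side sees no framing error and the loss only shows up as a missing part.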
Thanks for your help
Emmanuel