Losing first multi-part message when using PGM

Description

I am using the pub/sub pattern.
In my application, the publisher side has one publisher socket
using the TCP transport and another publisher socket using the PGM transport.
I can manually trigger the PGM publisher to publish one multipart
message.
The TCP publisher publishes messages at a regular interval
(every 9 seconds).

I start the subscriber process, which does nothing other than wait
for ZMQ messages.
To do so, it has a thread listening on ZMQ sockets using the zmq::poll()
call. It has two sockets (one SUB for TCP and another one for PGM).
The subscriber process receives the TCP messages without any problem.
However, the first message sent by the PGM publisher is lost
because the zmq::poll() call does not return for this first PGM message!

I looked at the network packets using Wireshark on the subscriber side.
I see the ODATA packet for the first PGM message sent from the
publisher host.

The problem is the same if I use epgm instead of pgm.
The same code works fine if all messages are transported using TCP.

The problem disappears if I use a simple (single-part) message instead of a multipart message.

Environment

None

Attachments

3
  • 22 Nov 2012, 04:19 PM
  • 03 Feb 2012, 01:21 PM
  • 03 Feb 2012, 01:20 PM

Activity

Martin Hurton March 25, 2013 at 3:57 PM
Edited

Thanks David. I am working on a patch that changes how identities are handled and it should fix this bug. I hope to review it and have it ready for merge soon.

David Walthour March 25, 2013 at 3:46 PM
Edited

This bug is occurring because the session_base_t class is expecting the first packet received to be an identity packet. However, for the pgm receiver this is never the case because the receiver is joining an ongoing stream, so the first packet is lost because the session_base_t mistakenly assumes it to be an identity packet and doesn't pass it to the receiver. I patched this bug by adding a setter for the identity_received variable and having the pgm_receiver class set this to true when the session_base_t is called by the pgm_receiver plug function. Hope this is properly addressed in a future release.
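
The mechanism described in the comment above can be illustrated with a small toy model. This is only a sketch, not libzmq code: `toy_session`, `push_frame`, `set_identity_received`, and `receive_stream` are hypothetical names invented for the illustration, and the frame strings are made up. The model assumes a session that treats the first incoming frame as an identity frame, and a receiver that joins an ongoing stream in which no identity frame is ever sent. What happens to the remaining parts of the affected message is not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: none of these names are libzmq's real API.
class toy_session {
public:
    // Hypothetical setter in the spirit of the patch described above.
    void set_identity_received(bool value) { identity_received_ = value; }

    // Called once per frame handed over by the transport.
    void push_frame(const std::string &frame) {
        if (!identity_received_) {
            // The first frame is assumed to be the peer identity and is consumed.
            identity_received_ = true;
            return;
        }
        delivered_.push_back(frame);
    }

    const std::vector<std::string> &delivered() const { return delivered_; }

private:
    bool identity_received_ = false;
    std::vector<std::string> delivered_;
};

// Simulates a receiver joining an ongoing stream that carries no identity frame.
// With mark_identity_on_plug set, the workaround is applied before data arrives.
std::vector<std::string> receive_stream(bool mark_identity_on_plug) {
    toy_session session;
    if (mark_identity_on_plug)
        session.set_identity_received(true);
    const std::vector<std::string> wire = {
        "msg1-part1", "msg1-part2", "msg2-part1", "msg2-part2"};
    for (const std::string &frame : wire)
        session.push_frame(frame);
    return session.delivered();
}

// Small self-check comparing both behaviours.
void demo() {
    assert(receive_stream(false).size() == 3);  // first data frame was swallowed
    assert(receive_stream(true).size() == 4);   // nothing lost
}
```

Calling `receive_stream(false)` drops the first data frame, while `receive_stream(true)` delivers all four, which is the effect the workaround aims for when the receiver marks the identity as already received at plug time.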

Martin Hurton November 23, 2012 at 1:52 PM

Can confirm the bug.
I hope the fix will be part of 3.2.3.

taurelf November 22, 2012 at 4:20 PM

Yes, both pub and sub use 3.2.1 rc2.
I have attached the Wireshark trace.
The first multi-part message is packet 14. The second one is packet 17.

Good luck

Martin Hurton November 22, 2012 at 4:07 PM

Both pub and sub using 3.2.1 rc2?
Can you please post the packet capture?

Details

Created October 6, 2011 at 12:06 PM
Updated March 25, 2013 at 3:57 PM