libzmq / LIBZMQ-270

A SUB socket with a message in queue should always have ZMQ_POLLIN set

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: core
    • Labels:
    • Environment:

      OSX, Linux

      Description

      When checking for events on a SUB socket that has a message queued, zmq_getsockopt(sock, ZMQ_EVENTS, &events, &size) should set ZMQ_POLLIN in the events variable. In 3.0, events is set to 0.

      This only occurs when the SUB socket binds and the PUB connects. If you swap the bind/connect on the sockets, it works.

      This same code works in 2.1.x regardless of the order of bind/connect.

      (issue has been committed to the issues repository.)


            Activity

            cremes Chuck Remes created issue -
            sustrik Martin Sustrik added a comment -

            This problem is caused by storing subscriptions in the SUB's application thread. SUB sets a subscription, then PUB connects. SUB can't forward the subscription to the new connection because its application thread is busy doing something else. Consequently, the subscription doesn't reach the PUB socket, and the PUB socket filters out the message.

            What's needed is storing subscriptions in I/O thread(s).

            The pre-requisite for that is patch f78d9b6bfca13e298c29fadabbbc870b37a0a573. I'll ask Pieter to backport it to 3.0.

            sustrik Martin Sustrik added a comment -

            It seems there are some problems backporting the patch.

            cremes Chuck Remes added a comment -

            This bug still exists on 3.1.0 beta release.

            sustrik Martin Sustrik added a comment -

            Ack. Solving this problem requires moving some of the subscription forwarding functionality to the I/O thread and is not a trivial fix. I'll try to solve this problem later on.

            cremes Chuck Remes made changes -
            Field: Priority; Original Value: Minor [ 4 ]; New Value: Critical [ 2 ]
            hurtonm Martin Hurton added a comment -

            It's not that zmq_getsockopt fails to indicate the message is available but that the receiver socket fails to receive the message.
            The problem is that after the sleep, the application immediately sends the message and the receiver is not yet subscribed.
            The library needs your application to call some operation (e.g. send/receive/getsockopt ...) so that it can subscribe the socket to the topic.
            To fix this, we need to make some internal changes first, which, as Martin Sustrik indicated, are not trivial.

            cremes Chuck Remes added a comment -

            Changed priority from "critical" to "major" since it will require significant modification to the internals.

            cremes Chuck Remes made changes -
            Field: Priority; Original Value: Critical [ 2 ]; New Value: Major [ 3 ]
            benjaminrk Min RK added a comment -

            To be specific, SUB sockets that bind will not receive any message sent before their first poll/recv call. If SUB binds and PUB connects, PUB can send as many messages as slowly as it likes, and none will ever arrive before SUB's first poll/recv call. This means that many SUB-binding cases are totally unusable. Would it not make sense for the PUB connection handshake to include subscriptions?

            pieterh Pieter Hintjens added a comment -

            There is a workaround (taken from LIBZMQ-559) that I've tested:

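            /* Poll the SUB socket once with a short timeout so the library gets
               a chance to process the new connection and send the subscription. */
            zmq_pollitem_t pollitems [] = {
                { sub, 0, ZMQ_POLLIN, 0 }
            };
            zmq_poll (pollitems, 1, 1);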

            See https://gist.github.com/hintjens/7344533 for a test case.

            pieterh Pieter Hintjens made changes -
            Comment [ Here is a workaround that works:

            - use XSUB instead of SUB
            - after XSUB connects allow 20 msec for connection to establish
            - send subscription manually, as message starting with 0x01
            - allow 20 msec for publisher to receive subscription (if you need to synch it)
            ]
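The "message starting with 0x01" in the list above is the subscription frame that an XSUB socket sends upstream: one command byte (0x01 to subscribe) followed by the topic prefix. A minimal sketch of building such a frame is shown below; the function name `make_subscribe_frame` and its buffer-based interface are hypothetical and only illustrate the byte layout. Note that the comment following this one reports the workaround as unreliable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes an XSUB subscribe frame into buf.
   Layout: buf[0] = 0x01 (subscribe), then topic_len bytes of topic prefix.
   Returns the frame length, or 0 if buf is too small to hold the frame. */
size_t make_subscribe_frame(unsigned char *buf, size_t cap,
                            const void *topic, size_t topic_len)
{
    /* The frame needs topic_len + 1 bytes; written this way to avoid overflow. */
    if (topic_len >= cap)
        return 0;

    buf[0] = 0x01;
    if (topic_len > 0)
        memcpy(buf + 1, topic, topic_len);
    return topic_len + 1;
}
```

The resulting bytes would then be sent on the XSUB socket as an ordinary message, for example with zmq_send. An empty topic (a frame containing only the 0x01 byte) subscribes to all messages.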
            pieterh Pieter Hintjens made changes -
            Comment [ Sadly the workaround doesn't work consistently... :-/ ]

              People

              • Assignee:
                sustrik Martin Sustrik
                Reporter:
                cremes Chuck Remes
              • Votes:
                3
                Watchers:
                8
