libzmq / LIBZMQ-356

Assertion in mailbox.cpp:84 (zeromq-2.1.11).

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.11
    • Fix Version/s: None
    • Component/s: core
    • Labels:
    • Environment:

      QEMU virtual CPU (cpu64-rhel6)
      3.00 GHz, 1 GB RAM
      Microsoft Windows XP

      Description

      From time to time we hit an assertion failure at mailbox.cpp:84.

        Activity

        Aleksandr Danshyn added a comment -

        The abort occurs under heavy load.

        GDB backtrace:

        #0 0xb7794424 in __kernel_vsyscall ()
        #1 0xb75af851 in raise () from /lib/tls/i686/nosegneg/libc.so.6
        #2 0xb75b2d42 in abort () from /lib/tls/i686/nosegneg/libc.so.6
        #3 0xb76efa37 in zmq::zmq_abort (errmsg_=0xb770f2fb "ok") at err.cpp:76
        #4 0xb76f2855 in zmq::mailbox_t::recv (this=0x83531c0, cmd_=0xb72d55e8, timeout_=0) at mailbox.cpp:84
        #5 0xb76fb7db in zmq::reaper_t::in_event (this=0x83531b0) at reaper.cpp:64
        #6 0xb76eeee1 in zmq::epoll_t::loop (this=0x8353d88) at epoll.cpp:161
        #7 0xb7706266 in thread_routine (arg_=0x8353dcc) at thread.cpp:75
        #8 0xb772b985 in start_thread () from /lib/tls/i686/nosegneg/libpthread.so.0
        #9 0xb765513e in clone () from /lib/tls/i686/nosegneg/libc.so.6

        Strace:

        2089 <... epoll_wait resumed> {{EPOLLIN, {u32=166350936, u64=166350936}}}, 256, -1) = 1
        2089 poll([{fd=4, events=POLLIN}], 1, 0) = 1 ([{fd=4, revents=POLLIN}])
        2089 write(2, "Assertion failed: ok (mailbox.cp"..., 38 <unfinished ...>
        2089 <... write resumed> ) = 38
        2089 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
        2089 tgkill(2088, 2089, SIGABRT) = 0
        2089 --- SIGABRT (Aborted) @ 0 (0) ---

        The same situation was also described here: http://lists.zeromq.org/pipermail/zeromq-dev/2012-February/015961.html

        Please take a look.

        Aleksandr Danshyn added a comment -

        Tested on stable 2.2: the same assertion occurs.

        meng added a comment - edited

        We are using zmq 2.2.0 in production and recently hit exactly the same issue. Below is the code snippet we use to switch the endpoint of the subscriber socket.

        static void *
        subscriber_thread(void *priv)
        {
            void *context = zmq_init(2);
            void *command_socket = zmq_socket(context, ZMQ_REP);
            zmq_bind(command_socket, "tcp://127.0.0.1:10000");

            void *subscriber = NULL;
            char *zmq_endpoint = NULL;
            int linger = 0;

        reconnect:
            /* (Re)create the subscriber and connect it to the current endpoint. */
            subscriber = zmq_socket(context, ZMQ_SUB);
            int rc = zmq_connect(subscriber, backends_priv->zeromq_endpoint);
            if (rc)
                LOG_ERROR("zmq_connect error [%d]", rc);
            const char *filter = "";
            zmq_setsockopt(subscriber, ZMQ_SUBSCRIBE, filter, strlen(filter));
            zmq_setsockopt(subscriber, ZMQ_LINGER, &linger, sizeof (int));

            while (1) {
                LOG_INFO("POLL");
                zmq_pollitem_t items[2];
                items[0].socket = command_socket;
                items[0].fd = 0;
                items[0].events = ZMQ_POLLIN;

                items[1].socket = subscriber;
                items[1].fd = 0;
                items[1].events = ZMQ_POLLIN;

                int result = zmq_poll(items, 2, -1);

                if (result <= 0) {
                    LOG_INFO("zmq_poll result: [%d,%s]", result, zmq_strerror(zmq_errno()));
                    continue;
                }

                /* A command arrived: close the subscriber and reconnect to the new endpoint. */
                if (items[0].revents > 0) {
                    zmq_endpoint = zstr_recv(command_socket);
                    LOG_INFO("going to switch to [%s]", zmq_endpoint);
                    if (zmq_endpoint) {
                        /* zmq_disconnect is expected here? */
                        if (subscriber) {
                            zmq_close(subscriber);
                        }

                        switch_zeromq_endpoint(zmq_endpoint, backends_priv);
                        goto reconnect;
                    }
                }

                if (items[1].revents > 0) {
                    /* business logic here */
                }
            }

            zmq_close(subscriber);
            zmq_close(command_socket);
            zmq_term(context);

            NEEDLESS_RETURN(NULL);
        }

        meng added a comment - edited

        Below is our stack trace from the abort:

        (gdb) bt
        #0 0x000000391ec328a5 in raise () from /lib64/libc.so.6
        #1 0x000000391ec34085 in abort () from /lib64/libc.so.6
        #2 0x00000032f2213749 in zmq::zmq_abort (errmsg_=0x3e43 <Address 0x3e43 out of bounds>) at err.cpp:76
        #3 0x00000032f22158e0 in zmq::mailbox_t::recv (this=<value optimized out>, cmd_=<value optimized out>, timeout_=<value optimized out>) at mailbox.cpp:84
        #4 0x00000032f221f721 in zmq::reaper_t::in_event (this=<value optimized out>) at reaper.cpp:64
        #5 0x00000032f2212fcd in zmq::epoll_t::loop (this=<value optimized out>) at epoll.cpp:153
        #6 0x00000032f222700b in thread_routine (arg_=0x7f3cd1e54130) at thread.cpp:75
        #7 0x000000391f007851 in start_thread () from /lib64/libpthread.so.0
        #8 0x000000391ece76dd in clone () from /lib64/libc.so.6

        Pieter Hintjens added a comment -

        This was fixed in version 3.x; it's a known issue in 2.x.
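
        For anyone who needs to guard against this at startup, here is a minimal sketch of a version check. It relies only on the comment above (2.x reported as affected, 3.x as fixed); the helper name `affected_by_libzmq_356` is hypothetical and not part of libzmq, and the exact release that carries the fix is not stated in this issue.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of libzmq.
 * Assumption taken from the maintainer comment in this issue:
 * every 2.x release is treated as affected, 3.x and later as fixed.
 * Nothing is claimed about 1.x or about specific 3.x patch levels.
 */
bool affected_by_libzmq_356(int major, int minor, int patch)
{
    (void) minor;   /* unused: the issue only distinguishes major versions */
    (void) patch;
    return major == 2;
}
```

        In a real program the three numbers would come from `zmq_version(&major, &minor, &patch)`, declared in `zmq.h`, so the check reflects the library actually linked at runtime rather than the headers used at build time.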


          People

          • Assignee: Unassigned
          • Reporter: Mikhail Navernyuk
          • Votes: 6
          • Watchers: 5
