  1. libzmq
  2. LIBZMQ-356

Assertion in mailbox.cpp:84 (zeromq-2.1.11).

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.11
    • Fix Version/s: None
    • Component/s: core
    • Labels:
    • Environment:

      qemu virtual cpu (cpu64-rhel6)
      3.00GHz 1 Gb Ram
      Microsoft Windows XP

      Description

      From time to time we hit an assertion failure in mailbox.cpp:84.

            Activity

            dalazx Aleksandr Danshyn added a comment -

            The abort occurs under heavy load.

            GDB backtrace:

            #0 0xb7794424 in __kernel_vsyscall ()
            #1 0xb75af851 in raise () from /lib/tls/i686/nosegneg/libc.so.6
            #2 0xb75b2d42 in abort () from /lib/tls/i686/nosegneg/libc.so.6
            #3 0xb76efa37 in zmq::zmq_abort (errmsg_=0xb770f2fb "ok") at err.cpp:76
            #4 0xb76f2855 in zmq::mailbox_t::recv (this=0x83531c0, cmd_=0xb72d55e8, timeout_=0) at mailbox.cpp:84
            #5 0xb76fb7db in zmq::reaper_t::in_event (this=0x83531b0) at reaper.cpp:64
            #6 0xb76eeee1 in zmq::epoll_t::loop (this=0x8353d88) at epoll.cpp:161
            #7 0xb7706266 in thread_routine (arg_=0x8353dcc) at thread.cpp:75
            #8 0xb772b985 in start_thread () from /lib/tls/i686/nosegneg/libpthread.so.0
            #9 0xb765513e in clone () from /lib/tls/i686/nosegneg/libc.so.6

            Strace:
            2089 <... epoll_wait resumed> {{EPOLLIN, {u32=166350936, u64=166350936}}}, 256, -1) = 1
            2089 poll([{fd=4, events=POLLIN}], 1, 0) = 1 ([{fd=4, revents=POLLIN}])
            2089 write(2, "Assertion failed: ok (mailbox.cp"..., 38 <unfinished ...>
            2089 <... write resumed> ) = 38
            2089 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
            2089 tgkill(2088, 2089, SIGABRT) = 0
            2089 --- SIGABRT (Aborted) @ 0 (0) ---

            The same situation was also described here: http://lists.zeromq.org/pipermail/zeromq-dev/2012-February/015961.html

            Please take a look.
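
For readers following the trace: the strace output above shows poll() reporting fd 4 readable right before the assertion message is written. The sketch below is a simplified model of that poll-then-read pattern, not libzmq's code. It assumes the failing check guards a command read that follows the wakeup signal. The struct mailbox type, its single-slot queue, and the mailbox_* functions are hypothetical names made up for illustration, built on a plain POSIX pipe.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical model: a signalling pipe plus a single-slot command queue. */
struct mailbox {
    int sig_r;      /* read end of the signalling pipe (the polled fd) */
    int sig_w;      /* write end of the signalling pipe */
    bool has_cmd;   /* true while a command is queued */
    int cmd;        /* the queued command */
};

int mailbox_init(struct mailbox *m)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    m->sig_r = fds[0];
    m->sig_w = fds[1];
    m->has_cmd = false;
    m->cmd = 0;
    return 0;
}

/* Well-behaved sender: queue the command first, then write the wakeup byte. */
int mailbox_send(struct mailbox *m, int cmd)
{
    char byte = 0;
    m->cmd = cmd;
    m->has_cmd = true;
    return write(m->sig_w, &byte, 1) == 1 ? 0 : -1;
}

/*
 * Non-blocking receive (timeout 0, as in the reported backtrace).
 * Returns  0 if a command was delivered,
 *         -1 if no wakeup is pending,
 *         -2 if a wakeup arrived but no command is queued.
 */
int mailbox_recv(struct mailbox *m, int *cmd_out)
{
    struct pollfd pfd = { .fd = m->sig_r, .events = POLLIN };
    char byte;

    if (poll(&pfd, 1, 0) <= 0)
        return -1;
    if (read(m->sig_r, &byte, 1) != 1)
        return -1;
    if (!m->has_cmd)
        return -2;              /* wakeup without a command */
    *cmd_out = m->cmd;
    m->has_cmd = false;
    return 0;
}

/* Fatal variant: treats "wakeup without a command" as a broken invariant. */
int mailbox_recv_strict(struct mailbox *m, int *cmd_out)
{
    int rc = mailbox_recv(m, cmd_out);
    assert(rc != -2);
    return rc;
}
```

In this model, a wakeup byte that arrives without a queued command makes mailbox_recv return -2, which is the state mailbox_recv_strict turns into an abort.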

            dalazx Aleksandr Danshyn added a comment -

            Tested on stable 2.2: the same assertion occurs.

            jammy meng added a comment - - edited

            We are using zmq 2.2.0 in production. Recently, we encountered the exact same issue. Below is the code snippet we use to switch the endpoint of the subscriber socket.

            static void *
            subscriber_thread(void *priv)
            {
                void *context = zmq_init(2);
                void *command_socket = zmq_socket(context, ZMQ_REP);
                zmq_bind(command_socket, "tcp://127.0.0.1:10000");

                void *subscriber = NULL;
                char *zmq_endpoint = NULL;
                int linger = 0;

            reconnect:
                subscriber = zmq_socket(context, ZMQ_SUB);
                int rc = zmq_connect(subscriber, backends_priv->zeromq_endpoint);
                if (rc)
                    LOG_ERROR("zmq_connect error [%d]", rc);
                const char *filter = "";
                zmq_setsockopt(subscriber, ZMQ_SUBSCRIBE, filter, strlen(filter));
                zmq_setsockopt(subscriber, ZMQ_LINGER, &linger, sizeof (int));

                while (1) {
                    LOG_INFO("POLL");
                    zmq_pollitem_t items[2];
                    items[0].socket = command_socket;
                    items[0].fd = 0;
                    items[0].events = ZMQ_POLLIN;

                    items[1].socket = subscriber;
                    items[1].fd = 0;
                    items[1].events = ZMQ_POLLIN;

                    int result = zmq_poll(items, 2, -1);

                    if (result <= 0) {
                        LOG_INFO("zmq_poll result: [%d,%s]", result, zmq_strerror(zmq_errno()));
                        continue;
                    }

                    if (items[0].revents > 0) {
                        zmq_endpoint = zstr_recv(command_socket);
                        LOG_INFO("going to switch to [%s]", zmq_endpoint);
                        if (zmq_endpoint) {
                            //zmq_disconnect is expected here?
                            if (subscriber) {
                                zmq_close(subscriber);
                            }
                            switch_zeromq_endpoint(zmq_endpoint, backends_priv);
                            goto reconnect;
                        }
                    }

                    if (items[1].revents > 0) {
                        //business logic here.
                    }
                }

                zmq_close(subscriber);
                zmq_close(command_socket);
                zmq_term(context);

                NEEDLESS_RETURN(NULL);
            }

            jammy meng added a comment - - edited

            Below is our stack trace from the abort:

            (gdb) bt
            #0 0x000000391ec328a5 in raise () from /lib64/libc.so.6
            #1 0x000000391ec34085 in abort () from /lib64/libc.so.6
            #2 0x00000032f2213749 in zmq::zmq_abort (errmsg_=0x3e43 <Address 0x3e43 out of bounds>) at err.cpp:76
            #3 0x00000032f22158e0 in zmq::mailbox_t::recv (this=<value optimized out>, cmd_=<value optimized out>, timeout_=<value optimized out>) at mailbox.cpp:84
            #4 0x00000032f221f721 in zmq::reaper_t::in_event (this=<value optimized out>) at reaper.cpp:64
            #5 0x00000032f2212fcd in zmq::epoll_t::loop (this=<value optimized out>) at epoll.cpp:153
            #6 0x00000032f222700b in thread_routine (arg_=0x7f3cd1e54130) at thread.cpp:75
            #7 0x000000391f007851 in start_thread () from /lib64/libpthread.so.0
            #8 0x000000391ece76dd in clone () from /lib64/libc.so.6

            pieterh Pieter Hintjens added a comment -

            This was fixed in version 3.x; it's a known issue in 2.x.


              People

              • Assignee:
                Unassigned
                Reporter:
                jupy Mikhail Navernyuk
              • Votes:
                6
                Watchers:
                5

                Dates

                • Created:
                  Updated:
                  Resolved: