libzmq / LIBZMQ-356

Assertion in mailbox.cpp:84 (zeromq-2.1.11)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.11
    • Fix Version/s: None
    • Component/s: core
    • Labels:
    • Environment:

      qemu virtual cpu (cpu64-rhel6)
      3.00 GHz, 1 GB RAM
      Microsoft Windows XP

      Description

      Intermittently, we get an assertion failure in mailbox.cpp:84.


          Activity

          dalazx Aleksandr Danshyn added a comment:

          The abort is raised under heavy load.

          GDB backtrace:

          #0 0xb7794424 in __kernel_vsyscall ()
          #1 0xb75af851 in raise () from /lib/tls/i686/nosegneg/libc.so.6
          #2 0xb75b2d42 in abort () from /lib/tls/i686/nosegneg/libc.so.6
          #3 0xb76efa37 in zmq::zmq_abort (errmsg_=0xb770f2fb "ok") at err.cpp:76
          #4 0xb76f2855 in zmq::mailbox_t::recv (this=0x83531c0, cmd_=0xb72d55e8, timeout_=0) at mailbox.cpp:84
          #5 0xb76fb7db in zmq::reaper_t::in_event (this=0x83531b0) at reaper.cpp:64
          #6 0xb76eeee1 in zmq::epoll_t::loop (this=0x8353d88) at epoll.cpp:161
          #7 0xb7706266 in thread_routine (arg_=0x8353dcc) at thread.cpp:75
          #8 0xb772b985 in start_thread () from /lib/tls/i686/nosegneg/libpthread.so.0
          #9 0xb765513e in clone () from /lib/tls/i686/nosegneg/libc.so.6

          Strace:
          2089 <... epoll_wait resumed> {{EPOLLIN, {u32=166350936, u64=166350936}}}, 256, -1) = 1
          2089 poll([{fd=4, events=POLLIN}], 1, 0) = 1 ([{fd=4, revents=POLLIN}])
          2089 write(2, "Assertion failed: ok (mailbox.cp"..., 38 <unfinished ...>
          2089 <... write resumed> ) = 38
          2089 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
          2089 tgkill(2088, 2089, SIGABRT) = 0
          2089 --- SIGABRT (Aborted) @ 0 (0) ---

          The same situation was also described on the zeromq-dev mailing list: http://lists.zeromq.org/pipermail/zeromq-dev/2012-February/015961.html

          Please take a look.
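          The backtrace above shows the reaper thread (reaper_t::in_event) calling mailbox_t::recv with timeout_=0, and the strace shows a zero-timeout poll() on fd 4 returning POLLIN immediately before the assertion message is written. To illustrate the kind of sanity check such a receive path performs, here is a minimal sketch, assuming a mailbox that passes fixed-size command records over a non-blocking socket pair. It is not libzmq's mailbox.cpp: command_t, mailbox_t and the mailbox_* functions are hypothetical, and where an asserting implementation would abort on a short read, the sketch returns -1 with errno set to EPROTO so the condition can be observed.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size command record (illustrative, not libzmq's command_t). */
typedef struct {
    int type;
    uint64_t arg;
} command_t;

/* Hypothetical mailbox: commands are written to w and read from r. */
typedef struct {
    int w;  /* write end (blocking) */
    int r;  /* read end (non-blocking) */
} mailbox_t;

int mailbox_init(mailbox_t *mb)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;
    mb->w = fds[0];
    mb->r = fds[1];
    /* Make the read end non-blocking so an empty mailbox yields EAGAIN. */
    int flags = fcntl(mb->r, F_GETFL, 0);
    if (flags == -1 || fcntl(mb->r, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}

/* Write one whole command record; returns 0 on success, -1 otherwise. */
int mailbox_send(mailbox_t *mb, const command_t *cmd)
{
    ssize_t n = send(mb->w, cmd, sizeof *cmd, 0);
    return n == (ssize_t) sizeof *cmd ? 0 : -1;
}

/* Returns 0 when a full command was read.
   Returns -1 with errno from recv() (e.g. EAGAIN) when nothing could be read.
   Returns -1 with errno = EPROTO when fewer bytes than a full record arrived:
   this is the point where an asserting implementation would abort. */
int mailbox_recv(mailbox_t *mb, command_t *cmd)
{
    ssize_t n = recv(mb->r, cmd, sizeof *cmd, 0);
    if (n == -1)
        return -1;
    if (n != (ssize_t) sizeof *cmd) {
        errno = EPROTO;
        return -1;
    }
    return 0;
}

void mailbox_close(mailbox_t *mb)
{
    close(mb->w);
    close(mb->r);
}
```

          In the sketch, a short read (for example, only part of a record written to the pair) is reported as an error instead of terminating the process, which makes the failing condition easy to reproduce in isolation.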

          dalazx Aleksandr Danshyn added a comment:

          Tested on stable 2.2: the same assertion occurs.

          jammy meng added a comment (edited):

          We are using ZeroMQ 2.2.0 in production and recently encountered exactly the same issue. Below is our code snippet for switching the endpoint of the subscriber socket.

          /* zmq_init/zmq_term are the libzmq 2.x API; zstr_recv comes from czmq.
             LOG_INFO, LOG_ERROR, NEEDLESS_RETURN, backends_priv and
             switch_zeromq_endpoint are application code not shown here. */
          #include <string.h>
          #include <zmq.h>
          #include <czmq.h>

          static void *
          subscriber_thread(void *priv)
          {
              void *context = zmq_init(2);
              void *command_socket = zmq_socket(context, ZMQ_REP);
              zmq_bind(command_socket, "tcp://127.0.0.1:10000");

              void *subscriber = NULL;
              char *zmq_endpoint = NULL;
              int linger = 0;

          reconnect:
              /* A brand-new SUB socket is created on every endpoint switch. */
              subscriber = zmq_socket(context, ZMQ_SUB);
              int rc = zmq_connect(subscriber, backends_priv->zeromq_endpoint);
              if (rc)
                  LOG_ERROR("zmq_connect error [%d]", rc);
              const char *filter = "";
              zmq_setsockopt(subscriber, ZMQ_SUBSCRIBE, filter, strlen(filter));
              zmq_setsockopt(subscriber, ZMQ_LINGER, &linger, sizeof (int));

              while (1) {
                  LOG_INFO("POLL");
                  zmq_pollitem_t items[2];
                  items[0].socket = command_socket;
                  items[0].fd = 0;
                  items[0].events = ZMQ_POLLIN;

                  items[1].socket = subscriber;
                  items[1].fd = 0;
                  items[1].events = ZMQ_POLLIN;

                  int result = zmq_poll(items, 2, -1);
                  if (result <= 0) {
                      LOG_INFO("zmq_poll result: [%d,%s]", result, zmq_strerror(zmq_errno()));
                      continue;
                  }

                  if (items[0].revents > 0) {
                      /* Command socket: receive the endpoint to switch to.
                         Note: zstr_recv returns a heap string the caller must free,
                         and a REP socket must send a reply before it can receive
                         the next request. */
                      zmq_endpoint = zstr_recv(command_socket);
                      LOG_INFO("going to switch to [%s]", zmq_endpoint);
                      if (zmq_endpoint) {
                          /* zmq_disconnect is expected here? */
                          if (subscriber) {
                              /* With linger 0 the old socket is discarded immediately. */
                              zmq_close(subscriber);
                          }
                          switch_zeromq_endpoint(zmq_endpoint, backends_priv);
                          goto reconnect;
                      }
                  }

                  if (items[1].revents > 0) {
                      /* business logic here. */
                  }
              }

              /* Not reached: the loop above never exits. */
              zmq_close(subscriber);
              zmq_close(command_socket);
              zmq_term(context);

              NEEDLESS_RETURN(NULL);
          }

          jammy meng added a comment (edited):

          Below is our stack trace from the abort:

          (gdb) bt
          #0 0x000000391ec328a5 in raise () from /lib64/libc.so.6
          #1 0x000000391ec34085 in abort () from /lib64/libc.so.6
          #2 0x00000032f2213749 in zmq::zmq_abort (errmsg_=0x3e43 <Address 0x3e43 out of bounds>) at err.cpp:76
          #3 0x00000032f22158e0 in zmq::mailbox_t::recv (this=<value optimized out>, cmd_=<value optimized out>, timeout_=<value optimized out>) at mailbox.cpp:84
          #4 0x00000032f221f721 in zmq::reaper_t::in_event (this=<value optimized out>) at reaper.cpp:64
          #5 0x00000032f2212fcd in zmq::epoll_t::loop (this=<value optimized out>) at epoll.cpp:153
          #6 0x00000032f222700b in thread_routine (arg_=0x7f3cd1e54130) at thread.cpp:75
          #7 0x000000391f007851 in start_thread () from /lib64/libpthread.so.0
          #8 0x000000391ece76dd in clone () from /lib64/libc.so.6

          pieterh Pieter Hintjens added a comment:

          This was fixed in version 3.x; it's a known issue in 2.x.


            People

            • Assignee:
              Unassigned
              Reporter:
              jupy Mikhail Navernyuk
            • Votes:
              6
              Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved: