Assertion in mailbox.cpp:84 (zeromq-2.1.11).

Description

From time to time we hit an assertion failure at mailbox.cpp:84.

Environment

QEMU virtual CPU (cpu64-rhel6)
3.00 GHz, 1 GB RAM
Microsoft Windows XP

Activity

PieterP January 31, 2013 at 2:48 PM

This was fixed in version 3.x; it's a known issue in 2.x.
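
Since the comment above ties the problem to the 2.x series, a deployment can check which series it runs. The following is only a minimal sketch under that assumption: `zmq_version_is_affected` and `report_zmq_version` are hypothetical helper names, not part of libzmq, and the rule "any 2.x release is affected" is taken directly from the comment rather than from an audit of individual releases.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when the version is in the 2.x series,
   which the comment above describes as carrying the known issue. */
int zmq_version_is_affected(int major, int minor, int patch)
{
    (void)minor;  /* assumption: every 2.x release is treated as affected */
    (void)patch;
    return major == 2;
}

/* Hypothetical helper: prints a one-line verdict for the given version. */
void report_zmq_version(int major, int minor, int patch)
{
    printf("libzmq %d.%d.%d: %s\n", major, minor, patch,
           zmq_version_is_affected(major, minor, patch)
               ? "2.x series, known issue"
               : "not in the 2.x series");
}
```

In a program linked against libzmq, the three numbers can be obtained at run time with `zmq_version(&major, &minor, &patch)` and passed to the helper.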

MengZ January 25, 2013 at 8:38 AM
Edited

Below is our stack trace from the abort:

(gdb) bt
#0 0x000000391ec328a5 in raise () from /lib64/libc.so.6
#1 0x000000391ec34085 in abort () from /lib64/libc.so.6
#2 0x00000032f2213749 in zmq::zmq_abort (errmsg_=0x3e43 <Address 0x3e43 out of bounds>) at err.cpp:76
#3 0x00000032f22158e0 in zmq::mailbox_t::recv (this=<value optimized out>, cmd_=<value optimized out>, timeout_=<value optimized out>) at mailbox.cpp:84
#4 0x00000032f221f721 in zmq::reaper_t::in_event (this=<value optimized out>) at reaper.cpp:64
#5 0x00000032f2212fcd in zmq::epoll_t::loop (this=<value optimized out>) at epoll.cpp:153
#6 0x00000032f222700b in thread_routine (arg_=0x7f3cd1e54130) at thread.cpp:75
#7 0x000000391f007851 in start_thread () from /lib64/libpthread.so.0
#8 0x000000391ece76dd in clone () from /lib64/libc.so.6

MengZ January 25, 2013 at 8:36 AM
Edited

We are using zmq 2.2.0 in production. Recently, we encountered the exact same issue. Below is our code snippet for switching the endpoint of the subscriber socket.

static void *
subscriber_thread(void *priv)
{
    void *context = zmq_init(2);
    void *command_socket = zmq_socket(context, ZMQ_REP);
    zmq_bind(command_socket, "tcp://127.0.0.1:10000");

    void *subscriber = NULL;
    char *zmq_endpoint = NULL;
    int linger = 0;
reconnect:
    subscriber = zmq_socket(context, ZMQ_SUB);
    int rc = zmq_connect(subscriber, backends_priv->zeromq_endpoint);
    if (rc)
        LOG_ERROR("zmq_connect error [%d]", rc);
    const char *filter = "";
    zmq_setsockopt(subscriber, ZMQ_SUBSCRIBE, filter, strlen(filter));
    zmq_setsockopt(subscriber, ZMQ_LINGER, &linger, sizeof (int));

    while (1) {
        LOG_INFO("POLL");
        zmq_pollitem_t items[2];
        items[0].socket = command_socket;
        items[0].fd = 0;
        items[0].events = ZMQ_POLLIN;

        items[1].socket = subscriber;
        items[1].fd = 0;
        items[1].events = ZMQ_POLLIN;

        int result = zmq_poll(items, 2, -1);

        if (result <= 0) {
            LOG_INFO("zmq_poll result: [%d,%s]", result, zmq_strerror(zmq_errno()));
            continue;
        }

        if (items[0].revents > 0) {

            zmq_endpoint = zstr_recv(command_socket);
            LOG_INFO("going to switch to [%s]", zmq_endpoint);
            if (zmq_endpoint) {
                //zmq_disconnect is expected here?
                if (subscriber) {
                    zmq_close(subscriber);
                }
                switch_zeromq_endpoint(zmq_endpoint, backends_priv);
                goto reconnect;

            }
        }

        if (items[1].revents > 0) {
            //business logic here.
        }
    }

    zmq_close(subscriber);
    zmq_close(command_socket);
    zmq_term(context);

    NEEDLESS_RETURN(NULL);

}

Oleksandr Danshyn May 24, 2012 at 6:36 PM

Tested on the stable 2.2 - the same assertion.

Oleksandr Danshyn May 24, 2012 at 4:44 PM

The abort occurs under heavy load.

GDB backtrace:

#0 0xb7794424 in __kernel_vsyscall ()
#1 0xb75af851 in raise () from /lib/tls/i686/nosegneg/libc.so.6
#2 0xb75b2d42 in abort () from /lib/tls/i686/nosegneg/libc.so.6
#3 0xb76efa37 in zmq::zmq_abort (errmsg_=0xb770f2fb "ok") at err.cpp:76
#4 0xb76f2855 in zmq::mailbox_t::recv (this=0x83531c0, cmd_=0xb72d55e8, timeout_=0) at mailbox.cpp:84
#5 0xb76fb7db in zmq::reaper_t::in_event (this=0x83531b0) at reaper.cpp:64
#6 0xb76eeee1 in zmq::epoll_t::loop (this=0x8353d88) at epoll.cpp:161
#7 0xb7706266 in thread_routine (arg_=0x8353dcc) at thread.cpp:75
#8 0xb772b985 in start_thread () from /lib/tls/i686/nosegneg/libpthread.so.0
#9 0xb765513e in clone () from /lib/tls/i686/nosegneg/libc.so.6

Strace:
2089 <... epoll_wait resumed> EPOLLIN, {u32=166350936, u64=166350936}, 256, -1) = 1
2089 poll([{fd=4, events=POLLIN}], 1, 0) = 1 ([{fd=4, revents=POLLIN}])
2089 write(2, "Assertion failed: ok (mailbox.cp"..., 38 <unfinished ...>
2089 <... write resumed> ) = 38
2089 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
2089 tgkill(2088, 2089, SIGABRT) = 0
2089 --- SIGABRT (Aborted) @ 0 (0) ---
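
The `poll` line in the strace above is a zero-timeout readiness check on a single descriptor, and it reports `POLLIN` just before the assertion message is written. As a rough illustration of that kind of check only (not libzmq's actual code), here is a minimal POSIX sketch. `fd_readable_now` and `demo_pipe_readiness` are hypothetical names, and the pipe is an assumed stand-in for fd 4 in the trace.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: zero-timeout readiness check on one descriptor,
   the same shape as the poll() call in the strace.
   Returns 1 if readable now, 0 if not, -1 on error. */
int fd_readable_now(int fd)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, 0);  /* timeout 0: return immediately */
    if (rc < 0)
        return -1;
    return (rc == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Demo: a pipe stands in for the signalling descriptor.
   Returns 0 if the pipe is not readable before a write and readable after. */
int demo_pipe_readiness(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int before = fd_readable_now(fds[0]);  /* nothing written yet */

    char byte = 'x';
    int wrote = (write(fds[1], &byte, 1) == 1);

    int after = fd_readable_now(fds[0]);   /* one byte is pending */

    close(fds[0]);
    close(fds[1]);
    return (wrote && before == 0 && after == 1) ? 0 : -1;
}
```

A readable descriptor only says that something is waiting on that descriptor. In the trace, the condition asserted immediately afterwards (`ok`) did not hold even though `poll` had reported the descriptor as readable.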

Also, the same situation was described here: http://lists.zeromq.org/pipermail/zeromq-dev/2012-February/015961.html

Please take a look.

Fixed

Created April 10, 2012 at 12:26 PM
Updated January 31, 2013 at 2:48 PM
Resolved January 31, 2013 at 2:48 PM