Assertion in mailbox.cpp:84 (zeromq-2.1.11).
Description
Environment
qemu virtual cpu (cpu64-rhel6)
3.00 GHz, 1 GB RAM
Microsoft Windows XP
Activity
PieterP January 31, 2013 at 2:48 PM
This was fixed in version 3.x; it's a known issue in 2.x.
MengZ January 25, 2013 at 8:38 AM (edited)
Below is our stack trace from the abort:
(gdb) bt
#0 0x000000391ec328a5 in raise () from /lib64/libc.so.6
#1 0x000000391ec34085 in abort () from /lib64/libc.so.6
#2 0x00000032f2213749 in zmq::zmq_abort (errmsg_=0x3e43 <Address 0x3e43 out of bounds>) at err.cpp:76
#3 0x00000032f22158e0 in zmq::mailbox_t::recv (this=<value optimized out>, cmd_=<value optimized out>, timeout_=<value optimized out>) at mailbox.cpp:84
#4 0x00000032f221f721 in zmq::reaper_t::in_event (this=<value optimized out>) at reaper.cpp:64
#5 0x00000032f2212fcd in zmq::epoll_t::loop (this=<value optimized out>) at epoll.cpp:153
#6 0x00000032f222700b in thread_routine (arg_=0x7f3cd1e54130) at thread.cpp:75
#7 0x000000391f007851 in start_thread () from /lib64/libpthread.so.0
#8 0x000000391ece76dd in clone () from /lib64/libc.so.6
MengZ January 25, 2013 at 8:36 AM (edited)
We are using zmq 2.2.0 in production. Recently, we encountered the exact same issue. Below is the code snippet we use to switch the endpoint of the subscriber socket.
static void *
subscriber_thread(void *priv)
{
    void *context = zmq_init(2);
    void *command_socket = zmq_socket(context, ZMQ_REP);
    zmq_bind(command_socket, "tcp://127.0.0.1:10000");
    void *subscriber = NULL;
    char *zmq_endpoint = NULL;
    int linger = 0;
reconnect:
    subscriber = zmq_socket(context,ZMQ_SUB);
    int rc = zmq_connect(subscriber,backends_priv->zeromq_endpoint);
    if(rc)
        LOG_ERROR("zmq_connect error [%d]",rc);
    const char *filter = "";
    zmq_setsockopt(subscriber, ZMQ_SUBSCRIBE, filter, strlen(filter));
    zmq_setsockopt(subscriber, ZMQ_LINGER, &linger, sizeof (int));
    while(1){
        LOG_INFO("POLL");
        zmq_pollitem_t items[2];
        items[0].socket = command_socket;
        items[0].fd = 0;
        items[0].events = ZMQ_POLLIN;
        items[1].socket = subscriber;
        items[1].fd = 0;
        items[1].events = ZMQ_POLLIN;
        int result = zmq_poll(items, 2, -1);
        if(result <= 0){
            LOG_INFO("zmq_poll result: [%d,%s]",result,zmq_strerror(zmq_errno()));
            continue;
        }
        if(items[0].revents > 0){
            zmq_endpoint = zstr_recv(command_socket);
            LOG_INFO("going to switch to [%s]",zmq_endpoint);
            if(zmq_endpoint){
                //zmq_disconnect is expected here?
                if(subscriber){
                    zmq_close(subscriber);
                }
                switch_zeromq_endpoint(zmq_endpoint,backends_priv);
                goto reconnect;
            }
        }
        if(items[1].revents > 0){
            //business logic here.
        }
    }
    zmq_close(subscriber);
    zmq_close(command_socket);
    zmq_term(context);
    NEEDLESS_RETURN(NULL);
}
Oleksandr Danshyn May 24, 2012 at 6:36 PM
Tested on the stable 2.2 - the same assertion.
Oleksandr Danshyn May 24, 2012 at 4:44 PM
The abort is raised under heavy load.
GDB backtrace:
#0 0xb7794424 in __kernel_vsyscall ()
#1 0xb75af851 in raise () from /lib/tls/i686/nosegneg/libc.so.6
#2 0xb75b2d42 in abort () from /lib/tls/i686/nosegneg/libc.so.6
#3 0xb76efa37 in zmq::zmq_abort (errmsg_=0xb770f2fb "ok") at err.cpp:76
#4 0xb76f2855 in zmq::mailbox_t::recv (this=0x83531c0, cmd_=0xb72d55e8, timeout_=0) at mailbox.cpp:84
#5 0xb76fb7db in zmq::reaper_t::in_event (this=0x83531b0) at reaper.cpp:64
#6 0xb76eeee1 in zmq::epoll_t::loop (this=0x8353d88) at epoll.cpp:161
#7 0xb7706266 in thread_routine (arg_=0x8353dcc) at thread.cpp:75
#8 0xb772b985 in start_thread () from /lib/tls/i686/nosegneg/libpthread.so.0
#9 0xb765513e in clone () from /lib/tls/i686/nosegneg/libc.so.6
Strace:
2089 <... epoll_wait resumed> EPOLLIN, {u32=166350936, u64=166350936}, 256, -1) = 1
2089 poll([{fd=4, events=POLLIN}], 1, 0) = 1 ([{fd=4, revents=POLLIN}])
2089 write(2, "Assertion failed: ok (mailbox.cp"..., 38 <unfinished ...>
2089 <... write resumed> ) = 38
2089 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
2089 tgkill(2088, 2089, SIGABRT) = 0
2089 --- SIGABRT (Aborted) @ 0 (0) ---
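In the strace above, poll() reports fd 4 as readable immediately before the "Assertion failed: ok" message is written. One possible reading is that the reaper's mailbox was signalled while no command was actually queued. The following is only a minimal sketch of that signal-then-read pattern, not libzmq's code: toy_mailbox_t and the toy_* functions are hypothetical names, a plain pipe stands in for the signaler, and a one-slot variable stands in for the command queue.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a mailbox: a pipe used as the wake-up
   signal plus a one-slot command queue. Not libzmq's data structure. */
typedef struct {
    int  sig_fd[2];   /* sig_fd[0] = read end, sig_fd[1] = write end */
    bool has_cmd;     /* true while a command is queued */
    int  cmd;         /* the queued command */
} toy_mailbox_t;

/* Create the signalling pipe; returns 0 on success, -1 on failure. */
int toy_init(toy_mailbox_t *m)
{
    m->has_cmd = false;
    m->cmd = 0;
    return pipe(m->sig_fd);
}

/* Normal path: queue the command first, then signal the reader. */
int toy_send(toy_mailbox_t *m, int cmd)
{
    char byte = 0;
    m->cmd = cmd;
    m->has_cmd = true;
    return write(m->sig_fd[1], &byte, 1) == 1 ? 0 : -1;
}

/* Faulty path (assumed for illustration): a signal with nothing queued. */
int toy_signal_only(toy_mailbox_t *m)
{
    char byte = 0;
    return write(m->sig_fd[1], &byte, 1) == 1 ? 0 : -1;
}

/* Returns 1 and stores the command if one was received, 0 if the
   signal fd was not readable, and -1 if the fd was readable but the
   queue was empty. The -1 case is where an assertion on the result
   of the queue read would abort the process. */
int toy_recv(toy_mailbox_t *m, int *cmd_out)
{
    struct pollfd pfd = { .fd = m->sig_fd[0], .events = POLLIN, .revents = 0 };
    char byte;

    if (poll(&pfd, 1, 0) != 1 || !(pfd.revents & POLLIN))
        return 0;                       /* no signal pending */
    if (read(m->sig_fd[0], &byte, 1) != 1)
        return 0;                       /* could not consume the signal */
    if (!m->has_cmd)
        return -1;                      /* signalled, but nothing queued */
    *cmd_out = m->cmd;
    m->has_cmd = false;
    return 1;
}

/* Release both ends of the signalling pipe. */
void toy_close(toy_mailbox_t *m)
{
    close(m->sig_fd[0]);
    close(m->sig_fd[1]);
}
```

Calling toy_send() followed by toy_recv() delivers the command, while toy_signal_only() followed by toy_recv() returns -1, which is the condition that would correspond to the failed "ok" check in this toy model.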
The same situation was also described here: http://lists.zeromq.org/pipermail/zeromq-dev/2012-February/015961.html
Please take a look.
From time to time we get an assertion failure in mailbox.cpp:84.