PUSH/PULL: sender blocks in zmq::socket_base_t::send

Description

Setup: PUSH/PULL over TCP, ZeroMQ version 3.2.2.

One PUSH socket: ZMQ_SNDHWM=1, ZMQ_SNDTIMEO=-1 (default)
Three PULL sockets: ZMQ_RCVHWM=1000

1) Three receivers (PULL sockets) each bind, then one sender (PUSH socket) connects to all three receivers.
2) The sender sends messages continuously and the three receivers receive them; everything works for hours.
3) The sender then blocks unexpectedly and stays blocked for hours.
4) Attaching GDB to the sender process, printing some info, and then exiting GDB makes the sender work again. This is very confusing.

GDB backtrace:

Thread 72 (Thread 0x49871940 (LWP 410912)):
#0 0x0000003a00cd3368 in epoll_wait () from /lib64/libc.so.6
#1 0x00002aaaab1bb227 in zmq::epoll_t::loop (this=0x2aaab4000ed0) at epoll.cpp:142
#2 0x00002aaaab1d847c in thread_routine (arg_=0x2aaab4000f40) at thread.cpp:83
#3 0x0000003a01406367 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003a00cd2f7d in clone () from /lib64/libc.so.6

Thread 71 (Thread 0x4a272940 (LWP 410913)):
#0 0x0000003a00cd3368 in epoll_wait () from /lib64/libc.so.6
#1 0x00002aaaab1bb227 in zmq::epoll_t::loop (this=0x2aaab4003b70) at epoll.cpp:142
#2 0x00002aaaab1d847c in thread_routine (arg_=0x2aaab4003be0) at thread.cpp:83
#3 0x0000003a01406367 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003a00cd2f7d in clone () from /lib64/libc.so.6

Thread 70 (Thread 0x4ac73940 (LWP 410914)):
#0 0x0000003a00cd3368 in epoll_wait () from /lib64/libc.so.6
#1 0x00002aaaab1bb227 in zmq::epoll_t::loop (this=0x2aaab4003fb0) at epoll.cpp:142
#2 0x00002aaaab1d847c in thread_routine (arg_=0x2aaab4004020) at thread.cpp:83
#3 0x0000003a01406367 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003a00cd2f7d in clone () from /lib64/libc.so.6

Thread 68 (Thread 0x4c075940 (LWP 410916)):
#0 0x0000003a00cca436 in poll () from /lib64/libc.so.6
#1 0x00002aaaab1ccbd1 in zmq::signaler_t::wait (this=<value optimized out>, timeout_=-1) at signaler.cpp:145
#2 0x00002aaaab1bff32 in zmq::mailbox_t::recv (this=0x2aaab40043d8, cmd_=0x4c074ec0, timeout_=-1) at mailbox.cpp:69
#3 0x00002aaaab1cd790 in zmq::socket_base_t::process_commands (this=0x2aaab4004180, timeout_=-1, throttle_=<value optimized out>) at socket_base.cpp:793
#4 0x00002aaaab1cdbe6 in zmq::socket_base_t::send (this=0x2aaab4004180, msg_=0x4c074f70, flags_=0) at socket_base.cpp:645
#5 0x00002aaaab1dff9a in s_sendmsg (s_=0x2aaab4004180, msg_=0x4c074f70, flags_=0) at zmq.cpp:337
#6 0x00002aaaab1e05a1 in zmq_send (s_=0x2aaab4004180, buf_=0x2aab0cff39c8, len_=167, flags_=0) at zmq.cpp:362
#7 0x00002aaaaae064c3 in zmq::socket_t::send (this=0x2aaab4000dd0, buf_=0x2aab0cff39c8, len_=167, flags_=0) at ../include/zmq/zmq.hpp:364
#8 0x00002aaaaadfe089 in LogE::ZmqClient::sendWorker_ (this=0x1111380) at ../src/zmq/ZmqClient.cpp:423
#9 0x00002aaaaadfe2ea in LogE::SendWorker (client=0x1111380) at ../src/zmq/ZmqClient.cpp:61
#10 0x0000003a01406367 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003a00cd2f7d in clone () from /lib64/libc.so.6

netstat result on the sender:
tcp 0 0 10.22.206.32:35127 10.4.20.38:55590 ESTABLISHED 209005/icebox
tcp 0 0 10.22.206.32:38920 10.4.20.37:55590 ESTABLISHED 209005/icebox
tcp 0 0 10.22.206.32:41685 10.4.20.36:55590 ESTABLISHED 209005/icebox

In thread 68, the sender is blocked in zmq::signaler_t::wait.
It seems to be waiting for the writer to become active, but can it wait for hours?
And why does everything work again once I attach GDB to the process and then exit GDB?
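
As a possible stopgap while this is investigated, the indefinite wait (timeout_=-1 in the backtrace) could be replaced by a bounded one, so that a stall shows up as a return value in the sender instead of a hang. The following is only a sketch and not libzmq code: send_bounded, SendResult and the try_send callback are hypothetical names introduced here, and try_send is assumed to behave like a non-blocking send that returns -1 with errno set to EAGAIN when the message cannot be queued.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// Outcome of a bounded sequence of send attempts (hypothetical helper type).
enum class SendResult { Sent, TimedOut, Failed };

// Calls try_send up to max_attempts times.
// Assumption: try_send returns >= 0 on success, or -1 with errno set;
// EAGAIN means "cannot queue the message right now", EINTR means the call
// was interrupted. Any other errno is treated as a real failure.
SendResult send_bounded(const std::function<int()>& try_send,
                        int max_attempts,
                        std::chrono::milliseconds pause)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        errno = 0;
        if (try_send() >= 0)
            return SendResult::Sent;
        if (errno != EAGAIN && errno != EINTR)
            return SendResult::Failed;      // real error: do not retry
        if (attempt + 1 < max_attempts)
            std::this_thread::sleep_for(pause);  // back off before retrying
    }
    return SendResult::TimedOut;            // caller can log and inspect state
}
```

With libzmq, try_send could wrap a call such as zmq_send (socket, buf, len, ZMQ_DONTWAIT), which returns -1 with errno set to EAGAIN when the message cannot be queued. Setting ZMQ_SNDTIMEO to a finite value instead of the default -1 should give a similar effect without a retry loop. Neither changes why the pipe stops becoming writable; it only makes the stall visible to the sender.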

Could someone please help? I have reproduced this five times.

Environment

CentOS, 64-bit

Activity

Guido G.
August 29, 2013, 10:27 AM

Without the code, it's close to impossible to reproduce the exact error you encountered. There are lots of things that might have gone wrong outside of libzmq.

Assignee

Unassigned

Reporter

imleong

Priority

Blocker