Zeromq messages getting dropped

Description

I have been using zeromq (2.2.0) through its Java binding, jzmq (https://github.com/zeromq/jzmq), which is also used by Storm.

Although I am using the jzmq binding from Storm, I think this is a zeromq bug. Attached is a Java JUnit test which reproduces the error.

Instructions to run test:

(1) Build and install version 2.2.0 of zeromq.
(2) Build and install the Java binding of zeromq from https://github.com/zeromq/jzmq (follow the build instructions there).
(3) Untar the attached Maven project into a directory.
(4) Change to the test project directory and run "mvn test -Dtest=RouterDealerBrokerLoadTest" (you may need to set -Djava.library.path to point to the installed jzmq/zeromq libs if they are not in your path).
(5) The test sends a big zeromq test message (about 41 KB) 100 times from each of 100 different threads, using a dealer socket, to a broker (router/dealer combination). In the output you will see the test failing, with something like this:

com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:08:54,921 INFO RouterDealerBrokerLoadTest:113 - TEST message size: 41500 bytes

com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,069 ERROR RouterDealerBrokerLoadTest:184 - *****************************************************************
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,069 ERROR RouterDealerBrokerLoadTest:185 - Did not receive: 7 sent messages.
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,069 ERROR RouterDealerBrokerLoadTest:186 - Did not receive messages from these threads:
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,070 ERROR RouterDealerBrokerLoadTest:192 - -CLIENT-4-MESSAGE-100
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,070 ERROR RouterDealerBrokerLoadTest:192 - -CLIENT-6-MESSAGE-100
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,070 ERROR RouterDealerBrokerLoadTest:192 - -CLIENT-15-MESSAGE-100
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,070 ERROR RouterDealerBrokerLoadTest:192 - -CLIENT-34-MESSAGE-100
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,070 ERROR RouterDealerBrokerLoadTest:192 - -CLIENT-69-MESSAGE-100
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,071 ERROR RouterDealerBrokerLoadTest:192 - -CLIENT-81-MESSAGE-100
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,071 ERROR RouterDealerBrokerLoadTest:192 - -CLIENT-78-MESSAGE-100
com.gaikai.zeromq.test.RouterDealerBrokerLoadTest-main 21:47:21,071 ERROR RouterDealerBrokerLoadTest:199 - *****************************************************************
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 43.334 sec <<< FAILURE!

Results :

Failed tests: testStartDEALERandDEALER(com.gaikai.zeromq.test.RouterDealerBrokerLoadTest): test failed Fail to get total events: 10000 total event received: 9993

Here are more details on the test program:

(1) I have a broker with a router socket and a dealer socket; it uses polling to transfer messages between them. I start the broker, which binds these two sockets.
(2) I have a Java test program that starts 100 concurrent threads; each sends 100 messages (each approximately 41 KB) through a dealer socket to the broker's router socket. (I made sure that sockets are not shared among threads and that they all use one parent zmq context; the HWM and LINGER options are set on all sockets.)
(3) I connect a dealer socket to the broker's dealer socket and count the messages. I expect the total count to be 10000.
(4) What I notice is that I do not get 10000 messages: the LAST MESSAGE from some of the test program's threads is dropped. I get 9996 or some other number of messages. Usually 2-20 messages are dropped.
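
To make the accounting in steps (3) and (4) concrete, here is a minimal sketch of how the missing messages could be identified. It is not the code from the attached project: the class name, the method names, the exact ID format (modelled on the "CLIENT-4-MESSAGE-100" entries in the log above) and the assumption that clients and messages are numbered from 1 are all hypothetical. It involves no zeromq at all; it only compares the IDs that were expected with the IDs that were actually counted.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not taken from the attached test project.
final class MissingMessageReport {

    // Assumed ID format: one ID per (client thread, message number) pair.
    static String messageId(int client, int message) {
        return "CLIENT-" + client + "-MESSAGE-" + message;
    }

    // Returns every expected ID that does not appear in `received`,
    // ordered by client number and then by message number.
    static List<String> findMissing(int clients, int messagesPerClient,
                                    Collection<String> received) {
        Set<String> expected = new LinkedHashSet<>();
        for (int c = 1; c <= clients; c++) {
            for (int m = 1; m <= messagesPerClient; m++) {
                expected.add(messageId(c, m));
            }
        }
        // Remove one ID at a time so the cost stays linear in the number of messages.
        for (String id : received) {
            expected.remove(id);
        }
        return new ArrayList<>(expected);
    }

    public static void main(String[] args) {
        // Simulate a run in which the last message of clients 4 and 6 is lost.
        List<String> received = new ArrayList<>();
        for (int c = 1; c <= 100; c++) {
            for (int m = 1; m <= 100; m++) {
                boolean dropped = (m == 100) && (c == 4 || c == 6);
                if (!dropped) {
                    received.add(messageId(c, m));
                }
            }
        }
        List<String> missing = findMissing(100, 100, received);
        System.out.println("Did not receive: " + missing.size() + " sent messages.");
        for (String id : missing) {
            System.out.println(id);
        }
    }
}
```

Tracking IDs rather than only a total count is what shows that the losses are always the final message of a thread, not random messages in the middle of the stream.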

I found that other people have reported this error, and one suggestion is to sleep after sending a message. If I add a sleep after sending on a socket, it helps, but not always, especially with large messages and under heavy load.

Environment

environment: Gentoo Linux

riteshadval@aw26 ~/work/source/git/cloud/zmq-dropped-msg $ uname -a
Linux aw26 3.5.3-gentoo #4 SMP Thu Oct 4 03:35:09 UTC 2012 x86_64 Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux

Attachments

1

Activity

ritesh adval February 1, 2013 at 6:34 PM

Hi Min,

Thanks for fixing this issue. Is there any chance this can be backported to 2.2 libzmq, or should I do it myself on my local branch?

PieterP February 1, 2013 at 9:08 AM

Backported to 3.2 stable.

PieterP February 1, 2013 at 8:56 AM

Fix merged to libzmq master.

MinM February 1, 2013 at 8:55 AM

ritesh adval January 23, 2013 at 9:49 PM

Java Test case

Fixed

Details

Created January 23, 2013 at 9:38 PM
Updated February 1, 2013 at 6:34 PM
Resolved February 1, 2013 at 9:08 AM