Assertion fail for IPv4 Interface String

Description

I am having an issue using an IPv4 address as the interface name:

My group looks like:
"epgm://10.145.17.1;239.192.1.1:7555"

Using a zmq::socket_t named my_zmq_socket:
my_zmq_socket.connect("epgm://10.145.17.1;239.192.1.1:7555");

When I run I get the following error:
Assertion failed: rc == 0 (connect_session.cpp:96)

This however works:
"eth1://10.145.17.1;239.192.1.1:7555"

My ifconfig:

eth0 Link encap:Ethernet HWaddr
inet addr:10.150.10.90 Bcast:10.150.255.255 Mask:255.255.0.0
inet6 addr: fe80::230:48ff:fe7d:cfac/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1473122 errors:0 dropped:0 overruns:0 frame:0
TX packets:1008461 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1244653467 (1.1 GiB) TX bytes:188923296 (180.1 MiB)
Memory:d8a20000-d8a40000

eth1 Link encap:Ethernet HWaddr
inet addr:10.145.17.1 Bcast:10.145.17.7 Mask:255.255.255.248
inet6 addr: fe80::230:48ff:fe7d:cfad/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:506008 errors:0 dropped:0 overruns:0 frame:0
TX packets:839568 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:43034651 (41.0 MiB) TX bytes:1134986294 (1.0 GiB)
Memory:d8a60000-d8a80000

Environment

x86_64 Debian
Linux 2.6.32.2011.01.27.13.37(

Activity


PieterP May 21, 2012 at 7:12 PM

Fixed in OpenPGM.

Steven McCoy May 20, 2012 at 2:22 AM

Committed fix, need to roll a release.

Reed December 23, 2011 at 9:41 PM

Yeah that was me. I logged issue 12.

Martin Sustrik December 23, 2011 at 9:35 PM

It seems this is an OpenPGM bug. Once the issue is solved (issue 12) we have to update the OpenPGM version packaged with 0MQ.

Reed December 21, 2011 at 9:03 PM
Edited

Alright I think I found the issue.

The actual error was coming from the eventfd(0,0) call in the inlined function pgm_notify_init from notify.h. The strerror(errno) was "Too many open files". I did indeed find that at some point the pgm_init code was opening > 1024 files.

Looking at the proc for the running process showed a whole lot of this:

~>sudo ls -al /proc/<my_cool_zmq_process_id>/fd
total 0
dr-x------ 2 root root 0 Dec 21 12:25 .
dr-xr-xr-x 8 root root 0 Dec 21 12:25 ..
lr-x------ 1 root root 64 Dec 21 12:25 100 -> /dev/urandom
lr-x------ 1 root root 64 Dec 21 12:25 1000 -> /dev/urandom
lr-x------ 1 root root 64 Dec 21 12:25 1001 -> /dev/urandom
lr-x------ 1 root root 64 Dec 21 12:25 1002 -> /dev/urandom
lr-x------ 1 root root 64 Dec 21 12:25 1003 -> /dev/urandom
lr-x------ 1 root root 64 Dec 21 12:25 1004 -> /dev/urandom
lr-x------ 1 root root 64 Dec 21 12:25 1005 -> /dev/urandom
...
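For anyone wanting to watch for this kind of leak from inside the process rather than with ls, here is a rough sketch that counts the entries in /proc/self/fd. The helper name count_open_fds is made up for illustration and is not part of OpenPGM or 0MQ; it assumes Linux with /proc mounted.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an OpenPGM or 0MQ function): returns the number
 * of descriptors currently open in this process, or -1 if /proc/self/fd
 * cannot be read. Linux-specific. */
static int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." directory entries. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);

    /* The directory stream itself held one descriptor while we iterated,
     * so leave it out of the total. */
    return count - 1;
}
```

Calling this before and after pgm_init (or any other suspect call) and comparing the two numbers should show whether descriptors are being left open.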

The problem is in OpenPGM's rand.c, where the function pgm_rand_create looks like:

60 PGM_GNUC_INTERNAL
61 void
62 pgm_rand_create (
63     pgm_rand_t* new_rand
64     )
65 {
66 /* pre-conditions */
67     pgm_assert (NULL != new_rand);
68
69 #ifndef _WIN32
70 /* attempt to read seed from kernel
71  */
72     FILE* fp;
73     do {
74         fp = fopen ("/dev/urandom", "rb");
75     } while (PGM_UNLIKELY(EINTR == errno));
76     if (fp) {
77         size_t items_read;
78         do {
79             items_read = fread (&new_rand->seed, sizeof(new_rand->seed), 1, fp);
80         } while (PGM_UNLIKELY(EINTR == errno));
81         fclose (fp);
82         if (1 == items_read)
83             return;
84     }
85 #endif /* !_WIN32 */
86     const pgm_time_t now = pgm_time_update_now();
87     new_rand->seed = (uint32_t)pgm_to_msecs (now);
88 }

The issue of course is this bit right here:

73     do {
74         fp = fopen ("/dev/urandom", "rb");
75     } while (PGM_UNLIKELY(EINTR == errno));

The author is NOT checking whether fopen actually failed (i.e. fp == NULL) before looking at errno. If errno still holds EINTR left over from some earlier, unrelated call, this do-while keeps reopening /dev/urandom even though every fopen succeeds, leaking one descriptor per iteration until the process hits its file descriptor ulimit (mine being 1024).

I was able to get by, by changing:

- 75 } while (PGM_UNLIKELY(EINTR == errno));

+ 75 } while ((fp == NULL) && (PGM_UNLIKELY(EINTR == errno)));

This is probably sub-optimal and not the best solution, but given it's a function which returns void I wasn't sure as to the best approach for signalling error back to the user.
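One possible way to signal the failure, sketched here only as an idea: split the /dev/urandom read into a helper that returns a bool, so the caller can fall back to the time-based seed when it returns false. The name read_urandom_seed is hypothetical and this is not the fix that was committed to OpenPGM. It clears errno before each attempt and only consults it when the call really failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not an OpenPGM function): fills *seed from
 * /dev/urandom. Returns true on success, false if the device could not
 * be opened or read. */
static bool read_urandom_seed(uint32_t *seed)
{
    assert(seed != NULL);

    FILE *fp;
    do {
        errno = 0;                      /* discard any stale error code */
        fp = fopen("/dev/urandom", "rb");
    } while (fp == NULL && errno == EINTR);  /* retry only on a real EINTR failure */
    if (fp == NULL)
        return false;

    size_t items_read;
    do {
        clearerr(fp);                   /* reset the stream error flag before retrying */
        errno = 0;
        items_read = fread(seed, sizeof *seed, 1, fp);
    } while (items_read != 1 && ferror(fp) && errno == EINTR);

    fclose(fp);
    return items_read == 1;
}
```

With something like this, pgm_rand_create could stay void and simply use the time-based seed whenever the helper returns false, which is the fallback the existing code already has.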

Thanks

Fixed

Created December 16, 2011 at 11:23 PM
Updated May 21, 2012 at 7:12 PM
Resolved May 21, 2012 at 7:12 PM