libzmq / LIBZMQ-496

Crash on heavy socket opening/closing: Device or resource busy (mutex.hpp:90)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.2.2
    • Fix Version/s: 3.2.3
    • Component/s: core
    • Labels:
    • Environment:

      CentOS release 6.3 (Final Santiago)

      Description

      Under heavy opening and closing of subscribe sockets, I hit an assertion failure when pthread_mutex_destroy is called in mutex.hpp.

      Attached is a piece of code which demonstrates this.
      Compiled with:

      gcc -O zmqpub.c -o zmqpub -lzmq -lpthread
      gcc -O zmqsub.c -o zmqsub -lzmq -lpthread

      Executed with:

      ./zmqpub &
      ./zmqsub

      The zmqsub process eventually crashes, but the failure is very rare.
      It took a few days to reproduce this issue in my environment.

      This test needs many open files, so you should raise the limits first.

      ulimit -n 8192
      ulimit -c unlimited
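The open-file limit can also be raised from inside a test program instead of the shell. The following is a minimal sketch under that assumption; the helper names `ensure_nofile_limit` and `require_nofile_limit` are hypothetical and are not part of the attached zmqpub.c or zmqsub.c.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_NOFILE to at least `wanted`,
 * capped at the hard limit. Returns 0 on success, -1 on error. */
int ensure_nofile_limit(rlim_t wanted)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;

    /* Nothing to do if the soft limit is already high enough. */
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted)
        return 0;

    /* An unprivileged process cannot exceed the hard limit. */
    if (lim.rlim_max != RLIM_INFINITY && wanted > lim.rlim_max)
        wanted = lim.rlim_max;

    lim.rlim_cur = wanted;
    return setrlimit(RLIMIT_NOFILE, &lim) == 0 ? 0 : -1;
}

/* Hypothetical fail-fast wrapper for use at the start of a test. */
void require_nofile_limit(rlim_t wanted)
{
    int rc = ensure_nofile_limit(wanted);
    assert(rc == 0);
}
```

Calling `require_nofile_limit(8192)` at the start of a test would roughly correspond to `ulimit -n 8192`, except that the soft limit is silently capped when the hard limit is lower.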

      I got a core file and printed the stack traces of all threads in the process.
      Please see bt.txt. However, I couldn't find any other thread holding the mutex.

      This issue looks the same as LIBZMQ-281,
      but the version is different, so I created a new issue for it.
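For context on the error in the title: "Device or resource busy" is the usual message for EBUSY, which pthread_mutex_destroy may return when the mutex is still in use. The following is a minimal sketch of that condition, not the libzmq code; the helper name `destroy_while_locked` is hypothetical. POSIX leaves destroying a locked mutex undefined, so the result depends on the C library: glibc typically reports EBUSY, while others may return 0.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: attempt to destroy a mutex that is still locked
 * and return the raw error code from pthread_mutex_destroy. */
int destroy_while_locked(void)
{
    pthread_mutex_t mutex;
    int rc;

    rc = pthread_mutex_init(&mutex, NULL);
    assert(rc == 0);

    rc = pthread_mutex_lock(&mutex);
    assert(rc == 0);

    /* The mutex is still held here. A wrapper that asserts rc == 0 at
     * this point would abort the process if EBUSY comes back. */
    rc = pthread_mutex_destroy(&mutex);

    if (rc != 0) {
        /* The destroy was refused, so the mutex is still valid:
         * release it and destroy it properly. */
        pthread_mutex_unlock(&mutex);
        pthread_mutex_destroy(&mutex);
    }
    return rc;
}
```

In the reported crash no other thread appeared to hold the mutex, so this sketch only shows how the error code can arise, not the race that triggered it.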


          Attachments

          1. bt.txt
            238 kB
          2. bt2.txt
            131 kB
          3. zmqpub.c
            0.8 kB
          4. zmqsub.c
            1 kB

            Activity

            kanekotky Takayuki Kaneko added a comment -

            I reproduced the crash and got another core file (bt2.txt).

            In bt.txt, Thread 52 was running at epoll_wait(). In bt2.txt, Thread 48 was running at write() on signaler.cpp:119.

            Is this a timing issue between the epoll thread and the thread closing the socket?

            mika.fischer Mika Fischer added a comment -

            Hi,

            could you please try to reproduce it after applying this patch / using this branch:
            https://github.com/mika-fischer/zeromq3-x/commit/1a17eb392e353a0c7606b127ac3100075427e424

            I suspect it's exactly the same issue as LIBZMQ-281, just harder to trigger in zeromq3-x than it was in zeromq2-x. I wasn't able to trigger it using the test case in LIBZMQ-281, but we ran into it on one of our production systems with ZeroMQ 3.2.2.

            kanekotky Takayuki Kaneko added a comment -

            Hi Mika,

            I ran the same test after applying your patch,
            and I could no longer reproduce the crash.

            I know this patch is a workaround, as you said, but it is very useful!
            I hope it is going to be merged into the 3-x branch.

            mika.fischer Mika Fischer added a comment -

            Thanks for testing! I opened a pull request for the fix: https://github.com/zeromq/zeromq3-x/pull/79

            mika.fischer Mika Fischer added a comment -

            This has been merged into https://github.com/zeromq/zeromq3-x.

            Takeshi, could you check that it's fixed for you with the latest version from the repository above, and close this issue if that's the case?

            Thanks!

            kanekotky Takayuki Kaneko added a comment -

            Mika,

            Thanks for your quick response! I closed this issue.

            Regards,


              People

              • Assignee: Unassigned
              • Reporter: kanekotky Takayuki Kaneko
              • Votes: 0
              • Watchers: 2
