[kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11

Description

Kamailio suddenly stops working (this is a production server so debug logging is disabled), when we restart it a core is generated and a segfault is logged.

Troubleshooting

The only thing I have seen is some reference to IPv6 in the backtrace. But no idea where to start looking. I'm pretty sure we will need more information, but I don't know how to read or understand backtraces, please let me know what you see in it so I can dig deeper into this problem.

Reproduction

I don't know how to reproduce yet, first I have to understand the cause.

Debugging Data

Backtrace:

[New LWP 12052]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/kamailio -P /var/run/kamailio/kamailio.pid -f /etc/kamailio/sbc.cfg -'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007efcf3b5df69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#0  0x00007efcf3b5df69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#1  0x00007efcf3b5e234 in OPENSSL_cleanup () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#2  0x00007efcf7721910 in __run_exit_handlers (status=0, listp=0x7efcf7a855d8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:83
        atfct = <optimized out>
        onfct = <optimized out>
        cxafct = <optimized out>
        f = <optimized out>
#3  0x00007efcf772196a in __GI_exit (status=<optimized out>) at exit.c:105
No locals.
#4  0x000055f0a9483d54 in handle_sigs () at main.c:699
        chld = 845005216
        chld_status = 845005248
        any_chld_stopped = 32766
        memlog = 0
        __func__ = "handle_sigs"
#5  0x000055f0a948ea43 in main_loop () at main.c:1758
        i = 8
        pid = 12091
        si = 0x0
        si_desc = "udp receiver child=7 sock=[2602:FF37:0:1:0:0:C601:377A]:5060\000\367v\215\200\262+\363\372~\000\000\025\067\236\003\000\000\000\000\300\017H\251\360U\000\000\340\305]2\376\177", '\000' <repeats 18 times>, "\020\303]2\376\177\000\000a\202g\251\360U\000"
        nrprocs = 8
        woneinit = 1
        __func__ = "main_loop"
#6  0x000055f0a9494fb9 in main (argc=13, argv=0x7ffe325dc5e8) at main.c:2646
        cfg_stream = 0x55f0aa7fb010
        c = -1
        r = 0
        tmp = 0x7ffe325dceb6 ""
        tmp_len = -133383536
        port = 32508
        proto = 845006016
        options = 0x55f0a97d80e8 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
        ret = -1
        seed = 3679304743
        rfd = 4
        debug_save = 0
        debug_flag = 0
        dont_fork_cnt = 0
        n_lst = 0x0
        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
        st = {st_dev = 19, st_ino = 14769, st_nlink = 2, st_mode = 16832, st_uid = 109, st_gid = 113, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1498523393, tv_nsec = 731915583}, st_mtim = {tv_sec = 1498533865, tv_nsec = 274210743}, st_ctim = {tv_sec = 1498533865, tv_nsec = 274210743}, __glibc_reserved = {0, 0, 0}}
        __func__ = "main"

Info locals:

[New LWP 12052]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/kamailio -P /var/run/kamailio/kamailio.pid -f /etc/kamailio/sbc.cfg -'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007efcf3b5df69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.

List:

[New LWP 12052]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/kamailio -P /var/run/kamailio/kamailio.pid -f /etc/kamailio/sbc.cfg -'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007efcf3b5df69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1

Log Messages

We have regular logging with no issues, then suddenly we get:

Jun 28 01:57:51 kam-1 sbc[12075]: WARNING: <script>: [DISPATCHER] - Destination down: OPTIONS sip:A.A.A.A:5060 (<null>)
Jun 28 01:57:51 kam-1 sbc[12075]: WARNING: <script>: [DISPATCHER] - Destination down: OPTIONS sip:B.B.B.B:5060 (<null>)
Jun 28 01:58:40 kam-1 sbc[12076]: NOTICE: <script>: Node unavailable, trying next sip:A.A.A.A:5060 - M=INVITE R=sip:+[hidden email]:5060 ID=334f3vqDtj
Jun 28 01:58:40 kam-1 sbc[12076]: NOTICE: <script>: Forwarding request - M=INVITE R=sip:+[hidden email]:5060 ID=334f3vqDtj
Jun 28 01:58:55 kam-1 sbc[12075]: WARNING: dialog [dlg_handlers.c:1574]: dlg_ontimeout(): timeout for dlg with CallID '00g4gNy5KZ' and tags 'FHmrD-CM3' 'SZ3a5Sg58c5pB'
Jun 28 01:58:55 kam-1 sbc[12075]: WARNING: dialog [dlg_handlers.c:1574]: dlg_ontimeout(): timeout for dlg with CallID 'Kj7uLRzJfd' and tags 'wRSVjfGaT' 'yQHvjNaZ5Z3rc'
Jun 28 01:58:55 kam-1 sbc[12075]: WARNING: dialog [dlg_handlers.c:1574]: dlg_ontimeout(): timeout for dlg with CallID 'S5QUIXtS6-' and tags 'FV4ySEouz' 'eS7F910ycNg8H'
Jun 28 01:59:10 kam-1 sbc[12075]: ERROR: <script>: No available gateways - M=INVITE R=sip:+[hidden email]:5060 ID=334f3vqDtj
Jun 28 02:01:45 kam-1 sbc[12080]: NOTICE: dialog [dlg_hash.c:248]: dlg_clean_run(): dialog in delete state is too old (0x7efaf3904008 ref 1)
Jun 28 02:03:15 kam-1 sbc[12080]: NOTICE: dialog [dlg_hash.c:248]: dlg_clean_run(): dialog in delete state is too old (0x7efaf40f0f50 ref 1)
Jun 28 02:04:45 kam-1 sbc[12080]: NOTICE: dialog [dlg_hash.c:248]: dlg_clean_run(): dialog in delete state is too old (0x7efaf4d92108 ref 1)
Jun 28 02:55:59 kam-1 sbc[12071]: INFO: <core> [main.c:823]: sig_usr(): signal 15 received
Jun 28 02:55:59 kam-1 sbc[12052]: NOTICE: <core> [main.c:696]: handle_sigs(): Thank you for flying kamailio!!!

NOTE: The process stops logging at 02:04:45, then at 02:55:59 we manually restart kamailio and a core is generated.

Also, in dmesg we have:

[138953.744199] kamailio[12052]: segfault at 7efaf32f51b8 ip 00007efcf3b5df69 sp 00007ffe325dc0c0 error 4 in libcrypto.so.1.1[7efcf39f6000+265000]

SIP Traffic

I don't have sip traffic, If required I can start capturing it.

Additional Information

# kamailio -v
version: kamailio 5.0.2 (x86_64/linux)
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled with gcc 6.3.0
#
# dpkg -l | grep kamailio
ii  kamailio                         5.0.2+stretch                  amd64        very fast, dynamic and configurable SIP server
ii  kamailio-dbg:amd64               5.0.2+stretch                  amd64        very fast and configurable SIP server [debug symbols]
ii  kamailio-extra-modules:amd64     5.0.2+stretch                  amd64        Extra modules for the Kamailio SIP Server
ii  kamailio-tls-modules:amd64       5.0.2+stretch                  amd64        TLS support for the Kamailio SIP server (authentication, transport)
#
# dpkg -l |grep  libssl
ii  libssl1.0.2:amd64                1.0.2l-2                       amd64        Secure Sockets Layer toolkit - shared libraries
ii  libssl1.1:amd64                  1.1.0f-3                       amd64        Secure Sockets Layer toolkit - shared libraries#
  • Operating System:
# uname -a
Linux kam-1 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u1 (2017-06-18) x86_64 GNU/Linux
# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 9.0 (stretch)
Release:	9.0
Codename:	stretch
#


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"DESCRIPTION","message":"Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11

I'm now capturing traffic in case it happens again.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: I'm now capturing traffic in case it happens again."}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-311779093"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

I'm wondering, is this a kamailio issue or a debian stretch libssl1.1 issue?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: I'm wondering, is this a kamailio issue or a debian stretch libssl1.1 issue?"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-311788848"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

The same problem occurred on the second SBC, I have the packet capture from that moment. Please let me know what information you need.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: The same problem occurred on the second SBC, I have the packet capture from that moment. Please let me know what information you need. "}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-311835421"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

The segfault happens at shutdown, which should be avoided if possible, but it is not relevant for what happens at runtime.

From the log messages, it seems that you use dispatcher (or some other routing module) and there is no active gateway where the sip request should be sent.

What you have seen in the packet capture? Are the SIP requests replied? Or no response from Kamailio?

Do you have any monitoring tool that may stop kamailio? The signal 15 (TERM) is the usual way to stop an application.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@miconda in #1172: The segfault happens at shutdown, which should be avoided if possible, but it is not relevant for what happens at runtime.\r\n\r\nFrom the log messages, it seems that you use dispatcher (or some other routing module) and there is no active gateway where the sip request should be sent.\r\n\r\nWhat you have seen in the packet capture? Are the SIP requests replied? Or no response from Kamailio?\r\n\r\nDo you have any monitoring tool that may stop kamailio? The signal 15 (TERM) is the usual way to stop an application."}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-311872602"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

In the capture at some point Kamailio stops responding to any request. We have Nagios sending OPTIONS requests, when we get the CRITICAL because Kamailio doesn't reply, then we manually restart it (so the signal 15 TERM is caused in the restart).

We have a dispatcher group with 2 nodes, if both are down, Kamailio should reply with a 503.

I can send you the pcap in private, I would have to hide a lot of data to make post it here.

Also, I'm pretty sure the problem will reproduce, when it is happening how can I get some traces or something useful to send to you before we restart it?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: In the capture at some point Kamailio stops responding to any request. We have Nagios sending OPTIONS requests, when we get the CRITICAL because Kamailio doesn't reply, then we manually restart it (so the signal 15 TERM is caused in the restart).\r\n\r\nWe have a dispatcher group with 2 nodes, if both are down, Kamailio should reply with a 503.\r\n\r\nI can send you the pcap in private, I would have to hide a lot of data to make post it here.\r\n\r\nAlso, I'm pretty sure the problem will reproduce, when it is happening how can I get some traces or something useful to send to you before we restart it?\r\n\r\n"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-311886954"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Have you enabled one core per process (pid)? If not, core files from other processes may be overwritten by the main process doing cleanup at shut down. Anyhow, if a worker process crashes, you should see a log message in syslog. not only the one about signal 15.

If you don't get a response, then the sip trace is not that useful for the moment.

First thing to double check -- if you enabled pike module, be sure you don't block the traffic that is trusted.

Otherwise, if you can reproduce, do following:

  • when kamailio runs fine, do kamctl ps to get the list of running processes
  • when kamailio stops routing sip traffic, attach with gdb to a worker process for your kind of traffic (UDP, TCP ... for example for UDP, attach to the PID listed for the process with ID 1) and get the backtrace:
gdb /path/to/kamailio PID
bt full

Also see if the CPU is at high load. That could be an indication of a dead (spin) lock. If CPU is low, maybe there is some blocking operation dns, db query, ...


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@miconda in #1172: Have you enabled one core per process (pid)? If not, core files from other processes may be overwritten by the main process doing cleanup at shut down. Anyhow, if a worker process crashes, you should see a log message in syslog. not only the one about signal 15.\r\n\r\nIf you don't get a response, then the sip trace is not that useful for the moment.\r\n\r\nFirst thing to double check -- if you enabled pike module, be sure you don't block the traffic that is trusted.\r\n\r\nOtherwise, if you can reproduce, do following:\r\n\r\n * when kamailio runs fine, do `kamctl ps` to get the list of running processes\r\n * when kamailio stops routing sip traffic, attach with gdb to a worker process for your kind of traffic (UDP, TCP ... for example for UDP, attach to the PID listed for the process with ID 1) and get the backtrace:\r\n\r\n```\r\ngdb /path/to/kamailio PID\r\nbt full\r\n```\r\n\r\nAlso see if the CPU is at high load. That could be an indication of a dead (spin) lock. If CPU is low, maybe there is some blocking operation dns, db query, ..."}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-311888817"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Ok! I'll wait until it happens again.

Some extra info:

1- Pike is enabled, but we have logging when it blocks something (and no logs appeared).
2- CPU is always < 1% usage.
3- I have enabled kernel.core_uses_pid=1.
4- Our traffic is:

[clients]--> IPv4/IPv6 tls_443 [Kamailio] IPv4 udp_5060 --> IPv4 udp_5060 [dispatcher setid 4].

If I run the bt full right now to a worker process, should we get random stuff?

Becase right now I'm getting mostly the same on each process:

root:~# for i in `kamctl ps | grep PID | awk '{print $2}' | tr -d ","`; do gdb /usr/sbin/kamailio -ex "bt full" --batch $i >> $i.txt 2>&1; done
root:~# 

root:~# ls -l
total 140
-rw-r--r-- 1 root root 1769 Jun 29 01:04 23585.txt
-rw-r--r-- 1 root root 3300 Jun 29 01:04 23586.txt
-rw-r--r-- 1 root root 3300 Jun 29 01:04 23587.txt
-rw-r--r-- 1 root root 3301 Jun 29 01:04 23588.txt
-rw-r--r-- 1 root root 3300 Jun 29 01:04 23589.txt
-rw-r--r-- 1 root root 3301 Jun 29 01:04 23590.txt
-rw-r--r-- 1 root root 3300 Jun 29 01:04 23591.txt
-rw-r--r-- 1 root root 3301 Jun 29 01:04 23592.txt
-rw-r--r-- 1 root root 3301 Jun 29 01:04 23593.txt
-rw-r--r-- 1 root root 3432 Jun 29 01:04 23594.txt
-rw-r--r-- 1 root root 3432 Jun 29 01:04 23595.txt
-rw-r--r-- 1 root root 3431 Jun 29 01:04 23596.txt
-rw-r--r-- 1 root root 3432 Jun 29 01:04 23597.txt
-rw-r--r-- 1 root root 3432 Jun 29 01:04 23598.txt
-rw-r--r-- 1 root root 3432 Jun 29 01:04 23599.txt
-rw-r--r-- 1 root root 3432 Jun 29 01:04 23600.txt
-rw-r--r-- 1 root root 3432 Jun 29 01:04 23601.txt
-rw-r--r-- 1 root root 2388 Jun 29 01:04 23602.txt
-rw-r--r-- 1 root root 1836 Jun 29 01:04 23603.txt
-rw-r--r-- 1 root root 2245 Jun 29 01:04 23604.txt
-rw-r--r-- 1 root root 4023 Jun 29 01:04 23605.txt
-rw-r--r-- 1 root root 2950 Jun 29 01:04 23606.txt
-rw-r--r-- 1 root root 2837 Jun 29 01:04 23607.txt
-rw-r--r-- 1 root root 4691 Jun 29 01:04 23608.txt
-rw-r--r-- 1 root root 2740 Jun 29 01:04 23609.txt
-rw-r--r-- 1 root root 2691 Jun 29 01:04 23610.txt
-rw-r--r-- 1 root root 2691 Jun 29 01:04 23611.txt
-rw-r--r-- 1 root root 2691 Jun 29 01:04 23612.txt
-rw-r--r-- 1 root root 2691 Jun 29 01:04 23613.txt
-rw-r--r-- 1 root root 2691 Jun 29 01:04 23614.txt
-rw-r--r-- 1 root root 2691 Jun 29 01:04 23615.txt
-rw-r--r-- 1 root root 2691 Jun 29 01:04 23616.txt
-rw-r--r-- 1 root root 2691 Jun 29 01:04 23617.txt
-rw-r--r-- 1 root root 2151 Jun 29 01:04 23618.txt
root:~#

root:~# grep -n ffff *
23585.txt:31:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23586.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23587.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23588.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23589.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23590.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23591.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23592.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23593.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23594.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23595.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23596.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23597.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23598.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23599.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23600.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23601.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23602.txt:40:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23603.txt:33:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23604.txt:39:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23605.txt:75:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23606.txt:51:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23607.txt:49:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23608.txt:75:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23609.txt:50:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23610.txt:47:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23611.txt:47:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23612.txt:47:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23613.txt:47:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23614.txt:47:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23615.txt:47:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23616.txt:47:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23617.txt:47:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
23618.txt:41:        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
root:~#

Am I missing something or am I doing anything wrong here?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: Ok! I'll wait until it happens again. \r\n\r\nSome extra info:\r\n\r\n1- Pike is enabled, but we have logging when it blocks something (and no logs appeared).\r\n2- CPU is always \u003c 1% usage.\r\n3- I have enabled kernel.core_uses_pid=1. \r\n4- Our traffic is:\r\n\r\n[clients]--\u003e IPv4/IPv6 tls_443 [Kamailio] IPv4 udp_5060 --\u003e IPv4 udp_5060 [dispatcher setid 4].\r\n\r\nIf I run the bt full right now to a worker process, should we get random stuff?\r\n\r\nBecase right now I'm getting mostly the same on each process:\r\n\r\n```\r\nroot:~# for i in `kamctl ps | grep PID | awk '{print $2}' | tr -d \",\"`; do gdb /usr/sbin/kamailio -ex \"bt full\" --batch $i \u003e\u003e $i.txt 2\u003e\u00261; done\r\nroot:~# \r\n\r\nroot:~# ls -l\r\ntotal 140\r\n-rw-r--r-- 1 root root 1769 Jun 29 01:04 23585.txt\r\n-rw-r--r-- 1 root root 3300 Jun 29 01:04 23586.txt\r\n-rw-r--r-- 1 root root 3300 Jun 29 01:04 23587.txt\r\n-rw-r--r-- 1 root root 3301 Jun 29 01:04 23588.txt\r\n-rw-r--r-- 1 root root 3300 Jun 29 01:04 23589.txt\r\n-rw-r--r-- 1 root root 3301 Jun 29 01:04 23590.txt\r\n-rw-r--r-- 1 root root 3300 Jun 29 01:04 23591.txt\r\n-rw-r--r-- 1 root root 3301 Jun 29 01:04 23592.txt\r\n-rw-r--r-- 1 root root 3301 Jun 29 01:04 23593.txt\r\n-rw-r--r-- 1 root root 3432 Jun 29 01:04 23594.txt\r\n-rw-r--r-- 1 root root 3432 Jun 29 01:04 23595.txt\r\n-rw-r--r-- 1 root root 3431 Jun 29 01:04 23596.txt\r\n-rw-r--r-- 1 root root 3432 Jun 29 01:04 23597.txt\r\n-rw-r--r-- 1 root root 3432 Jun 29 01:04 23598.txt\r\n-rw-r--r-- 1 root root 3432 Jun 29 01:04 23599.txt\r\n-rw-r--r-- 1 root root 3432 Jun 29 01:04 23600.txt\r\n-rw-r--r-- 1 root root 3432 Jun 29 01:04 23601.txt\r\n-rw-r--r-- 1 root root 2388 Jun 29 01:04 23602.txt\r\n-rw-r--r-- 1 root root 1836 Jun 29 01:04 23603.txt\r\n-rw-r--r-- 1 root root 2245 Jun 29 01:04 23604.txt\r\n-rw-r--r-- 1 root root 4023 Jun 29 01:04 23605.txt\r\n-rw-r--r-- 1 root root 2950 Jun 29 01:04 23606.txt\r\n-rw-r--r-- 1 root root 2837 Jun 29 01:04 23607.txt\r\n-rw-r--r-- 1 root root 4691 Jun 29 01:04 23608.txt\r\n-rw-r--r-- 1 root root 2740 Jun 29 01:04 23609.txt\r\n-rw-r--r-- 1 root root 2691 Jun 29 01:04 23610.txt\r\n-rw-r--r-- 1 root root 2691 Jun 29 01:04 23611.txt\r\n-rw-r--r-- 1 root root 2691 Jun 29 01:04 23612.txt\r\n-rw-r--r-- 1 root root 2691 Jun 29 01:04 23613.txt\r\n-rw-r--r-- 1 root root 2691 Jun 29 01:04 23614.txt\r\n-rw-r--r-- 1 root root 2691 Jun 29 01:04 23615.txt\r\n-rw-r--r-- 1 root root 2691 Jun 29 01:04 23616.txt\r\n-rw-r--r-- 1 root root 2691 Jun 29 01:04 23617.txt\r\n-rw-r--r-- 1 root root 2151 Jun 29 01:04 23618.txt\r\nroot:~#\r\n\r\nroot:~# grep -n ffff *\r\n23585.txt:31: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23586.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23587.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23588.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23589.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23590.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23591.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23592.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23593.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23594.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23595.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23596.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23597.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23598.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23599.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23600.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23601.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23602.txt:40: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23603.txt:33: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23604.txt:39: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23605.txt:75: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23606.txt:51: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23607.txt:49: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23608.txt:75: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23609.txt:50: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23610.txt:47: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23611.txt:47: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23612.txt:47: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23613.txt:47: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23614.txt:47: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23615.txt:47: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23616.txt:47: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23617.txt:47: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n23618.txt:41: p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\nroot:~#\r\n```\r\n\r\nAm I missing something or am I doing anything wrong here?"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-311893980"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Can you paste the content of a bactrace for one udp and one tcp worker?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@miconda in #1172: Can you paste the content of a bactrace for one udp and one tcp worker?"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-311913726"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Hi Daniel,

Here is the output:

root:~/bt# kamctl ps | grep -B1 "tcp receiver"
      "PID":  23610,
      "DSC":  "tcp receiver (generic) child=0"
--
      "PID":  23611,
      "DSC":  "tcp receiver (generic) child=1"
--
      "PID":  23612,
      "DSC":  "tcp receiver (generic) child=2"
--
      "PID":  23613,
      "DSC":  "tcp receiver (generic) child=3"
--
      "PID":  23614,
      "DSC":  "tcp receiver (generic) child=4"
--
      "PID":  23615,
      "DSC":  "tcp receiver (generic) child=5"
--
      "PID":  23616,
      "DSC":  "tcp receiver (generic) child=6"
--
      "PID":  23617,
      "DSC":  "tcp receiver (generic) child=7"
root:~/bt#

root:~/bt# gdb /usr/sbin/kamailio -ex "bt full" --batch 23610
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f73f4cbd0d3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
84	../sysdeps/unix/syscall-template.S: No such file or directory.
#0  0x00007f73f4cbd0d3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x000055a667819c13 in io_wait_loop_epoll (h=0x55a667cbee80 <io_w>, t=2, repeat=0) at core/io_wait.h:1034
        n = 2
        r = 1
        fm = 0x7ffddf805cf0
        revents = 1734426560
        __func__ = "io_wait_loop_epoll"
#2  0x000055a66782afd3 in tcp_receive_loop (unix_sock=40) at core/tcp_read.c:1789
        __func__ = "tcp_receive_loop"
#3  0x000055a66770f088 in tcp_init_children () at core/tcp_main.c:4796
        r = 0
        i = 7
        reader_fd_1 = 40
        pid = 0
        si_desc = "tcp receiver (generic)\000\000\360\\\200\337\375\177", '\000' <repeats 18 times>, "\320X\200\337\375\177\000\000u\340tg\246U\000\000\201\016\234g\246U\000\000\001\000\000\000\000\000\000\000p\022\230g\000\000\000\000\070\352C\364s\177\000\000\300?ag\246U\000\000\f\372\000s\000\000\000\000\360X\200\337\375\177\000\000\f\372\000s\000\000\000"
        si = 0x0
        __func__ = "tcp_init_children"
#4  0x000055a667620ff7 in main_loop () at main.c:1715
        i = 8
        pid = 23603
        si = 0x0
        si_desc = "udp receiver child=7 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\000\257$\366\200:z\360q\177\000\000\f\372\000s\000\000\000\000\300?ag\246U\000\000\360\\\200\337\375\177", '\000' <repeats 18 times>, " Z\200\337\375\177\000\000a\262\200g\246U\000"
        nrprocs = 8
        woneinit = 1
        __func__ = "main_loop"
#5  0x000055a667627fb9 in main (argc=13, argv=0x7ffddf805cf8) at main.c:2646
        cfg_stream = 0x55a668b42010
        c = -1
        r = 0
        tmp = 0x7ffddf806eb5 ""
        tmp_len = -178570608
        port = 32627
        proto = -545236016
        options = 0x55a66796b0e8 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
        ret = -1
        seed = 83065607
        rfd = 4
        debug_save = 0
        debug_flag = 0
        dont_fork_cnt = 0
        n_lst = 0x0
        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
        st = {st_dev = 19, st_ino = 15809, st_nlink = 2, st_mode = 16832, st_uid = 109, st_gid = 113, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1498523393, tv_nsec = 31134570}, st_mtim = {tv_sec = 1498687722, tv_nsec = 986617235}, st_ctim = {tv_sec = 1498687722, tv_nsec = 986617235}, __glibc_reserved = {0, 0, 0}}
        __func__ = "main"
root:~/bt#

root:~/bt# kamctl ps | grep -B1 "udp receiver"
      "PID":  23586,
      "DSC":  "udp receiver child=0 sock=198.1.55.121:5060"
--
      "PID":  23587,
      "DSC":  "udp receiver child=1 sock=198.1.55.121:5060"
--
      "PID":  23588,
      "DSC":  "udp receiver child=2 sock=198.1.55.121:5060"
--
      "PID":  23589,
      "DSC":  "udp receiver child=3 sock=198.1.55.121:5060"
--
      "PID":  23590,
      "DSC":  "udp receiver child=4 sock=198.1.55.121:5060"
--
      "PID":  23591,
      "DSC":  "udp receiver child=5 sock=198.1.55.121:5060"
--
      "PID":  23592,
      "DSC":  "udp receiver child=6 sock=198.1.55.121:5060"
--
      "PID":  23593,
      "DSC":  "udp receiver child=7 sock=198.1.55.121:5060"
--
      "PID":  23594,
      "DSC":  "udp receiver child=0 sock=[2602:FF37:0:1:0:0:C601:3779]:5060"
--
      "PID":  23595,
      "DSC":  "udp receiver child=1 sock=[2602:FF37:0:1:0:0:C601:3779]:5060"
--
      "PID":  23596,
      "DSC":  "udp receiver child=2 sock=[2602:FF37:0:1:0:0:C601:3779]:5060"
--
      "PID":  23597,
      "DSC":  "udp receiver child=3 sock=[2602:FF37:0:1:0:0:C601:3779]:5060"
--
      "PID":  23598,
      "DSC":  "udp receiver child=4 sock=[2602:FF37:0:1:0:0:C601:3779]:5060"
--
      "PID":  23599,
      "DSC":  "udp receiver child=5 sock=[2602:FF37:0:1:0:0:C601:3779]:5060"
--
      "PID":  23600,
      "DSC":  "udp receiver child=6 sock=[2602:FF37:0:1:0:0:C601:3779]:5060"
--
      "PID":  23601,
      "DSC":  "udp receiver child=7 sock=[2602:FF37:0:1:0:0:C601:3779]:5060"
root:~/bt#

root:~/bt# gdb /usr/sbin/kamailio -ex "bt full" --batch 23586
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f73f4cbd9f3 in __recvfrom_nocancel () at ../sysdeps/unix/syscall-template.S:84
84	../sysdeps/unix/syscall-template.S: No such file or directory.
#0  0x00007f73f4cbd9f3 in __recvfrom_nocancel () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1  0x000055a6676ad0f2 in udp_rcv_loop () at core/udp_server.c:441
        len = 392
        buf = "SIP/2.0 200 canceling\r\nVia: SIP/2.0/UDP 198.1.55.121;branch=z9hG4bK1b16.c1f15688f268122a121deccea3eb8c11.0;i=eff9\r\nFrom: \"Deniz Altunta?\" <sip:[hidden email]>;tag=Zcf"...
        tmp = 0x55a66774e70c <init_child+52> "\311\303UH\211\345AWAVAUATSH\203\354\070H\211}\270H\203}\270"
        from = 0x7f73f44b6348
        fromlen = 16
        ri = {src_ip = {af = 2, len = 4, u = {addrl = {2067202502, 140135995956936}, addr32 = {2067202502, 0, 4097990344, 32627}, addr16 = {454, 31543, 0, 0, 24264, 62530, 32627, 0}, addr = "\306\001\067{\000\000\000\000\310^B\364s\177\000"}}, dst_ip = {af = 2, len = 4, u = {addrl = {2033648070, 0}, addr32 = {2033648070, 0, 0, 0}, addr16 = {454, 31031, 0, 0, 0, 0, 0, 0}, addr = "\306\001\067y", '\000' <repeats 11 times>}}, src_port = 5060, dst_port = 5060, proto_reserved1 = 0, proto_reserved2 = 0, src_su = {s = {sa_family = 2, sa_data = "\023\304\306\001\067{\000\000\000\000\000\000\000"}, sin = {sin_family = 2, sin_port = 50195, sin_addr = {s_addr = 2067202502}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 2, sin6_port = 50195, sin6_flowinfo = 2067202502, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}}, bind_address = 0x7f73f4425d48, proto = 1 '\001'}
        __func__ = "udp_rcv_loop"
#2  0x000055a66761ff1e in main_loop () at main.c:1625
        i = 0
        pid = 0
        si = 0x7f73f4425d48
        si_desc = "udp receiver child=0 sock=198.1.55.121:5060\000o\000\000\000H֘g\246U\000\000\000*r\331D\257$\366\200:z\360q\177\000\000\f\372\000s\000\000\000\000\300?ag\246U\000\000\360\\\200\337\375\177", '\000' <repeats 18 times>, " Z\200\337\375\177\000\000a\262\200g\246U\000"
        nrprocs = 8
        woneinit = 0
        __func__ = "main_loop"
#3  0x000055a667627fb9 in main (argc=13, argv=0x7ffddf805cf8) at main.c:2646
        cfg_stream = 0x55a668b42010
        c = -1
        r = 0
        tmp = 0x7ffddf806eb5 ""
        tmp_len = -178570608
        port = 32627
        proto = -545236016
        options = 0x55a66796b0e8 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"
        ret = -1
        seed = 83065607
        rfd = 4
        debug_save = 0
        debug_flag = 0
        dont_fork_cnt = 0
        n_lst = 0x0
        p = 0xffffffff <error: Cannot access memory at address 0xffffffff>
        st = {st_dev = 19, st_ino = 15809, st_nlink = 2, st_mode = 16832, st_uid = 109, st_gid = 113, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1498523393, tv_nsec = 31134570}, st_mtim = {tv_sec = 1498687722, tv_nsec = 986617235}, st_ctim = {tv_sec = 1498687722, tv_nsec = 986617235}, __glibc_reserved = {0, 0, 0}}
        __func__ = "main"
root:~/bt#


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: Hi Daniel, \r\n\r\nHere is the output:\r\n\r\n```\r\nroot:~/bt# kamctl ps | grep -B1 \"tcp receiver\"\r\n \"PID\": 23610,\r\n \"DSC\": \"tcp receiver (generic) child=0\"\r\n--\r\n \"PID\": 23611,\r\n \"DSC\": \"tcp receiver (generic) child=1\"\r\n--\r\n \"PID\": 23612,\r\n \"DSC\": \"tcp receiver (generic) child=2\"\r\n--\r\n \"PID\": 23613,\r\n \"DSC\": \"tcp receiver (generic) child=3\"\r\n--\r\n \"PID\": 23614,\r\n \"DSC\": \"tcp receiver (generic) child=4\"\r\n--\r\n \"PID\": 23615,\r\n \"DSC\": \"tcp receiver (generic) child=5\"\r\n--\r\n \"PID\": 23616,\r\n \"DSC\": \"tcp receiver (generic) child=6\"\r\n--\r\n \"PID\": 23617,\r\n \"DSC\": \"tcp receiver (generic) child=7\"\r\nroot:~/bt#\r\n\r\nroot:~/bt# gdb /usr/sbin/kamailio -ex \"bt full\" --batch 23610\r\n[Thread debugging using libthread_db enabled]\r\nUsing host libthread_db library \"/lib/x86_64-linux-gnu/libthread_db.so.1\".\r\n0x00007f73f4cbd0d3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84\r\n84\t../sysdeps/unix/syscall-template.S: No such file or directory.\r\n#0 0x00007f73f4cbd0d3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84\r\nNo locals.\r\n#1 0x000055a667819c13 in io_wait_loop_epoll (h=0x55a667cbee80 \u003cio_w\u003e, t=2, repeat=0) at core/io_wait.h:1034\r\n n = 2\r\n r = 1\r\n fm = 0x7ffddf805cf0\r\n revents = 1734426560\r\n __func__ = \"io_wait_loop_epoll\"\r\n#2 0x000055a66782afd3 in tcp_receive_loop (unix_sock=40) at core/tcp_read.c:1789\r\n __func__ = \"tcp_receive_loop\"\r\n#3 0x000055a66770f088 in tcp_init_children () at core/tcp_main.c:4796\r\n r = 0\r\n i = 7\r\n reader_fd_1 = 40\r\n pid = 0\r\n si_desc = \"tcp receiver (generic)\\000\\000\\360\\\\\\200\\337\\375\\177\", '\\000' \u003crepeats 18 times\u003e, \"\\320X\\200\\337\\375\\177\\000\\000u\\340tg\\246U\\000\\000\\201\\016\\234g\\246U\\000\\000\\001\\000\\000\\000\\000\\000\\000\\000p\\022\\230g\\000\\000\\000\\000\\070\\352C\\364s\\177\\000\\000\\300?ag\\246U\\000\\000\\f\\372\\000s\\000\\000\\000\\000\\360X\\200\\337\\375\\177\\000\\000\\f\\372\\000s\\000\\000\\000\"\r\n si = 0x0\r\n __func__ = \"tcp_init_children\"\r\n#4 0x000055a667620ff7 in main_loop () at main.c:1715\r\n i = 8\r\n pid = 23603\r\n si = 0x0\r\n si_desc = \"udp receiver child=7 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\\000\\257$\\366\\200:z\\360q\\177\\000\\000\\f\\372\\000s\\000\\000\\000\\000\\300?ag\\246U\\000\\000\\360\\\\\\200\\337\\375\\177\", '\\000' \u003crepeats 18 times\u003e, \" Z\\200\\337\\375\\177\\000\\000a\\262\\200g\\246U\\000\"\r\n nrprocs = 8\r\n woneinit = 1\r\n __func__ = \"main_loop\"\r\n#5 0x000055a667627fb9 in main (argc=13, argv=0x7ffddf805cf8) at main.c:2646\r\n cfg_stream = 0x55a668b42010\r\n c = -1\r\n r = 0\r\n tmp = 0x7ffddf806eb5 \"\"\r\n tmp_len = -178570608\r\n port = 32627\r\n proto = -545236016\r\n options = 0x55a66796b0e8 \":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:\"\r\n ret = -1\r\n seed = 83065607\r\n rfd = 4\r\n debug_save = 0\r\n debug_flag = 0\r\n dont_fork_cnt = 0\r\n n_lst = 0x0\r\n p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n st = {st_dev = 19, st_ino = 15809, st_nlink = 2, st_mode = 16832, st_uid = 109, st_gid = 113, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1498523393, tv_nsec = 31134570}, st_mtim = {tv_sec = 1498687722, tv_nsec = 986617235}, st_ctim = {tv_sec = 1498687722, tv_nsec = 986617235}, __glibc_reserved = {0, 0, 0}}\r\n __func__ = \"main\"\r\nroot:~/bt#\r\n\r\nroot:~/bt# kamctl ps | grep -B1 \"udp receiver\"\r\n \"PID\": 23586,\r\n \"DSC\": \"udp receiver child=0 sock=198.1.55.121:5060\"\r\n--\r\n \"PID\": 23587,\r\n \"DSC\": \"udp receiver child=1 sock=198.1.55.121:5060\"\r\n--\r\n \"PID\": 23588,\r\n \"DSC\": \"udp receiver child=2 sock=198.1.55.121:5060\"\r\n--\r\n \"PID\": 23589,\r\n \"DSC\": \"udp receiver child=3 sock=198.1.55.121:5060\"\r\n--\r\n \"PID\": 23590,\r\n \"DSC\": \"udp receiver child=4 sock=198.1.55.121:5060\"\r\n--\r\n \"PID\": 23591,\r\n \"DSC\": \"udp receiver child=5 sock=198.1.55.121:5060\"\r\n--\r\n \"PID\": 23592,\r\n \"DSC\": \"udp receiver child=6 sock=198.1.55.121:5060\"\r\n--\r\n \"PID\": 23593,\r\n \"DSC\": \"udp receiver child=7 sock=198.1.55.121:5060\"\r\n--\r\n \"PID\": 23594,\r\n \"DSC\": \"udp receiver child=0 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\"\r\n--\r\n \"PID\": 23595,\r\n \"DSC\": \"udp receiver child=1 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\"\r\n--\r\n \"PID\": 23596,\r\n \"DSC\": \"udp receiver child=2 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\"\r\n--\r\n \"PID\": 23597,\r\n \"DSC\": \"udp receiver child=3 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\"\r\n--\r\n \"PID\": 23598,\r\n \"DSC\": \"udp receiver child=4 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\"\r\n--\r\n \"PID\": 23599,\r\n \"DSC\": \"udp receiver child=5 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\"\r\n--\r\n \"PID\": 23600,\r\n \"DSC\": \"udp receiver child=6 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\"\r\n--\r\n \"PID\": 23601,\r\n \"DSC\": \"udp receiver child=7 sock=[2602:FF37:0:1:0:0:C601:3779]:5060\"\r\nroot:~/bt#\r\n\r\nroot:~/bt# gdb /usr/sbin/kamailio -ex \"bt full\" --batch 23586\r\n[Thread debugging using libthread_db enabled]\r\nUsing host libthread_db library \"/lib/x86_64-linux-gnu/libthread_db.so.1\".\r\n0x00007f73f4cbd9f3 in __recvfrom_nocancel () at ../sysdeps/unix/syscall-template.S:84\r\n84\t../sysdeps/unix/syscall-template.S: No such file or directory.\r\n#0 0x00007f73f4cbd9f3 in __recvfrom_nocancel () at ../sysdeps/unix/syscall-template.S:84\r\nNo locals.\r\n#1 0x000055a6676ad0f2 in udp_rcv_loop () at core/udp_server.c:441\r\n len = 392\r\n buf = \"SIP/2.0 200 canceling\\r\\nVia: SIP/2.0/UDP 198.1.55.121;branch=z9hG4bK1b16.c1f15688f268122a121deccea3eb8c11.0;i=eff9\\r\\nFrom: \\\"Deniz Altunta?\\\" \u003csip:b9c825417f684a93a03cff1dde3f6c44@app.nextplus.me\u003e;tag=Zcf\"...\r\n tmp = 0x55a66774e70c \u003cinit_child+52\u003e \"\\311\\303UH\\211\\345AWAVAUATSH\\203\\354\\070H\\211}\\270H\\203}\\270\"\r\n from = 0x7f73f44b6348\r\n fromlen = 16\r\n ri = {src_ip = {af = 2, len = 4, u = {addrl = {2067202502, 140135995956936}, addr32 = {2067202502, 0, 4097990344, 32627}, addr16 = {454, 31543, 0, 0, 24264, 62530, 32627, 0}, addr = \"\\306\\001\\067{\\000\\000\\000\\000\\310^B\\364s\\177\\000\"}}, dst_ip = {af = 2, len = 4, u = {addrl = {2033648070, 0}, addr32 = {2033648070, 0, 0, 0}, addr16 = {454, 31031, 0, 0, 0, 0, 0, 0}, addr = \"\\306\\001\\067y\", '\\000' \u003crepeats 11 times\u003e}}, src_port = 5060, dst_port = 5060, proto_reserved1 = 0, proto_reserved2 = 0, src_su = {s = {sa_family = 2, sa_data = \"\\023\\304\\306\\001\\067{\\000\\000\\000\\000\\000\\000\\000\"}, sin = {sin_family = 2, sin_port = 50195, sin_addr = {s_addr = 2067202502}, sin_zero = \"\\000\\000\\000\\000\\000\\000\\000\"}, sin6 = {sin6_family = 2, sin6_port = 50195, sin6_flowinfo = 2067202502, sin6_addr = {__in6_u = {__u6_addr8 = '\\000' \u003crepeats 15 times\u003e, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}}, bind_address = 0x7f73f4425d48, proto = 1 '\\001'}\r\n __func__ = \"udp_rcv_loop\"\r\n#2 0x000055a66761ff1e in main_loop () at main.c:1625\r\n i = 0\r\n pid = 0\r\n si = 0x7f73f4425d48\r\n si_desc = \"udp receiver child=0 sock=198.1.55.121:5060\\000o\\000\\000\\000H֘g\\246U\\000\\000\\000*r\\331D\\257$\\366\\200:z\\360q\\177\\000\\000\\f\\372\\000s\\000\\000\\000\\000\\300?ag\\246U\\000\\000\\360\\\\\\200\\337\\375\\177\", '\\000' \u003crepeats 18 times\u003e, \" Z\\200\\337\\375\\177\\000\\000a\\262\\200g\\246U\\000\"\r\n nrprocs = 8\r\n woneinit = 0\r\n __func__ = \"main_loop\"\r\n#3 0x000055a667627fb9 in main (argc=13, argv=0x7ffddf805cf8) at main.c:2646\r\n cfg_stream = 0x55a668b42010\r\n c = -1\r\n r = 0\r\n tmp = 0x7ffddf806eb5 \"\"\r\n tmp_len = -178570608\r\n port = 32627\r\n proto = -545236016\r\n options = 0x55a66796b0e8 \":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:\"\r\n ret = -1\r\n seed = 83065607\r\n rfd = 4\r\n debug_save = 0\r\n debug_flag = 0\r\n dont_fork_cnt = 0\r\n n_lst = 0x0\r\n p = 0xffffffff \u003cerror: Cannot access memory at address 0xffffffff\u003e\r\n st = {st_dev = 19, st_ino = 15809, st_nlink = 2, st_mode = 16832, st_uid = 109, st_gid = 113, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1498523393, tv_nsec = 31134570}, st_mtim = {tv_sec = 1498687722, tv_nsec = 986617235}, st_ctim = {tv_sec = 1498687722, tv_nsec = 986617235}, __glibc_reserved = {0, 0, 0}}\r\n __func__ = \"main\"\r\nroot:~/bt#\r\n```"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-311996070"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

The backtraces show that the worker processes are waiting to receive traffic from network, which is the normal situation. When you get it to the state of not processing sip traffic, grab again the backtraces.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@miconda in #1172: The backtraces show that the worker processes are waiting to receive traffic from network, which is the normal situation. When you get it to the state of not processing sip traffic, grab again the backtraces."}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-312110717"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Hi Daniel,

Today it happened again. I have more information:

We have currently running 4x Debian 9 servers with Kamailio v5:

KamailioA & KamailioB
KamailioC & KamailioD

Servers A/B only take care of SSL offloading and loadbalance/failover between servers C/D. (using dispatcher with minimal config)

Flow would be:

User <-> KamailioA/B <-> KamailioC/D <-> ...

Today KamailioB stop replying... and I got the backtraces.

From KamailioA: 0 problems
From KamailioB:

root:~/bt2# grep DISPATCHER /var/log/kamailio/kamailio.log
Jul  1 15:03:06 13cn4 sbc[14833]: WARNING: <script>: [DISPATCHER] - Destination down: OPTIONS sip:A.A.A.A:5060 (<null>)
Jul  1 15:03:06 13cn4 sbc[14833]: WARNING: <script>: [DISPATCHER] - Destination down: OPTIONS sip:B.B.B.B:5060 (<null>)
Jul  1 15:39:50 13cn4 sbc[14818]: WARNING: <script>: [DISPATCHER] - Destination up: OPTIONS sip:A.A.A.A:5060
Jul  1 15:39:50 13cn4 sbc[14818]: WARNING: <script>: [DISPATCHER] - Destination up: OPTIONS sip:B.B.B.B:5060
root:~/bt2#

A.A.A.A and B.B.B.B would be KamailioC/D.

Note, this only happened in KamailioB, KamailioA had 0 problems.

When I connected to the server, I did the following:

# for i in `kamctl ps | grep PID | awk '{print $2}' | tr -d ","`; do gdb /usr/sbin/kamailio -ex "bt full" --batch $i >> $i.txt 2>&1; done

Well, I got the backtraces per PID, the thing is, right after running that command, traffic started flowing again (I didn't restart or anything), hence the timestamps:

root:~/bt2# ls -lh
total 356K
-rw-r--r-- 1 root root 1.8K Jul  1 15:39 14816.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14817.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14818.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14819.txt
-rw-r--r-- 1 root root  48K Jul  1 15:39 14820.txt
-rw-r--r-- 1 root root  47K Jul  1 15:39 14821.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14822.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14823.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14824.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14825.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14826.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14827.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14828.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14829.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14830.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14831.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14832.txt
-rw-r--r-- 1 root root  42K Jul  1 15:39 14833.txt
-rw-r--r-- 1 root root 1.8K Jul  1 15:39 14834.txt
-rw-r--r-- 1 root root 2.2K Jul  1 15:39 14835.txt
-rw-r--r-- 1 root root 4.0K Jul  1 15:39 14836.txt
-rw-r--r-- 1 root root 2.9K Jul  1 15:39 14837.txt
-rw-r--r-- 1 root root 2.8K Jul  1 15:39 14838.txt
-rw-r--r-- 1 root root 4.7K Jul  1 15:39 14839.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14840.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14841.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14842.txt
-rw-r--r-- 1 root root 7.5K Jul  1 15:39 14843.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14844.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14845.txt
-rw-r--r-- 1 root root 7.5K Jul  1 15:39 14846.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14847.txt
-rw-r--r-- 1 root root 7.5K Jul  1 15:39 14848.txt
-rw-r--r-- 1 root root 4.2K Jul  1 15:39 14849.txt
root:~/bt2#

backtrace_20170701_1539.tar.gz

I have tried to look at the backtraces, but to me they seem ok (kind of like the previous ones, where Kamailio is just waiting for new requests).

So at this point my assumption is that something is triggering dispatcher to see both nodes down and therefor stops processing traffic, when dispatcher sees the nodes up again, all starts working.

Let me know what you think Daniel and how I can investigate this further.

I also have tcpdump captures at the time, I will check to see if OPTIONS are actually being sent out or not to try to narrow this a little more.

Thanks!
Joel.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: Hi Daniel, \r\n\r\nToday it happened again. I have more information:\r\n\r\nWe have currently running 4x Debian 9 servers with Kamailio v5:\r\n\r\nKamailioA \u0026 KamailioB\r\nKamailioC \u0026 KamailioD\r\n\r\nServers A/B only take care of SSL offloading and loadbalance/failover between servers C/D. (using dispatcher with minimal config)\r\n\r\nFlow would be:\r\n\r\nUser \u003c-\u003e KamailioA/B \u003c-\u003e KamailioC/D \u003c-\u003e ...\r\n\r\nToday KamailioB stop replying... and I got the backtraces.\r\n\r\nFrom KamailioA: 0 problems\r\nFrom KamailioB: \r\n\r\n```\r\nroot:~/bt2# grep DISPATCHER /var/log/kamailio/kamailio.log\r\nJul 1 15:03:06 13cn4 sbc[14833]: WARNING: \u003cscript\u003e: [DISPATCHER] - Destination down: OPTIONS sip:A.A.A.A:5060 (\u003cnull\u003e)\r\nJul 1 15:03:06 13cn4 sbc[14833]: WARNING: \u003cscript\u003e: [DISPATCHER] - Destination down: OPTIONS sip:B.B.B.B:5060 (\u003cnull\u003e)\r\nJul 1 15:39:50 13cn4 sbc[14818]: WARNING: \u003cscript\u003e: [DISPATCHER] - Destination up: OPTIONS sip:A.A.A.A:5060\r\nJul 1 15:39:50 13cn4 sbc[14818]: WARNING: \u003cscript\u003e: [DISPATCHER] - Destination up: OPTIONS sip:B.B.B.B:5060\r\nroot:~/bt2#\r\n```\r\nA.A.A.A and B.B.B.B would be KamailioC/D.\r\n\r\nNote, this only happened in KamailioB, KamailioA had 0 problems.\r\n\r\n\r\nWhen I connected to the server, I did the following:\r\n\r\n```\r\n# for i in `kamctl ps | grep PID | awk '{print $2}' | tr -d \",\"`; do gdb /usr/sbin/kamailio -ex \"bt full\" --batch $i \u003e\u003e $i.txt 2\u003e\u00261; done\r\n```\r\n\r\nWell, I got the backtraces per PID, the thing is, right after running that command, traffic started flowing again (I didn't restart or anything), hence the timestamps:\r\n\r\n```\r\nroot:~/bt2# ls -lh\r\ntotal 356K\r\n-rw-r--r-- 1 root root 1.8K Jul 1 15:39 14816.txt\r\n-rw-r--r-- 1 root root 15K Jul 1 15:39 14817.txt\r\n-rw-r--r-- 1 root root 15K Jul 1 15:39 14818.txt\r\n-rw-r--r-- 1 root root 15K Jul 1 15:39 14819.txt\r\n-rw-r--r-- 1 root root 48K Jul 1 15:39 14820.txt\r\n-rw-r--r-- 1 root root 47K Jul 1 15:39 14821.txt\r\n-rw-r--r-- 1 root root 15K Jul 1 15:39 14822.txt\r\n-rw-r--r-- 1 root root 15K Jul 1 15:39 14823.txt\r\n-rw-r--r-- 1 root root 15K Jul 1 15:39 14824.txt\r\n-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14825.txt\r\n-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14826.txt\r\n-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14827.txt\r\n-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14828.txt\r\n-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14829.txt\r\n-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14830.txt\r\n-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14831.txt\r\n-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14832.txt\r\n-rw-r--r-- 1 root root 42K Jul 1 15:39 14833.txt\r\n-rw-r--r-- 1 root root 1.8K Jul 1 15:39 14834.txt\r\n-rw-r--r-- 1 root root 2.2K Jul 1 15:39 14835.txt\r\n-rw-r--r-- 1 root root 4.0K Jul 1 15:39 14836.txt\r\n-rw-r--r-- 1 root root 2.9K Jul 1 15:39 14837.txt\r\n-rw-r--r-- 1 root root 2.8K Jul 1 15:39 14838.txt\r\n-rw-r--r-- 1 root root 4.7K Jul 1 15:39 14839.txt\r\n-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14840.txt\r\n-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14841.txt\r\n-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14842.txt\r\n-rw-r--r-- 1 root root 7.5K Jul 1 15:39 14843.txt\r\n-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14844.txt\r\n-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14845.txt\r\n-rw-r--r-- 1 root root 7.5K Jul 1 15:39 14846.txt\r\n-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14847.txt\r\n-rw-r--r-- 1 root root 7.5K Jul 1 15:39 14848.txt\r\n-rw-r--r-- 1 root root 4.2K Jul 1 15:39 14849.txt\r\nroot:~/bt2#\r\n```\r\n[backtrace_20170701_1539.tar.gz](https://github.com/kamailio/kamailio/files/1117494/backtrace_20170701_1539.tar.gz)\r\n\r\n\r\nI have tried to look at the backtraces, but to me they seem ok (kind of like the previous ones, where Kamailio is just waiting for new requests).\r\n\r\nSo at this point my assumption is that something is triggering dispatcher to see both nodes down and therefor stops processing traffic, when dispatcher sees the nodes up again, all starts working.\r\n\r\nLet me know what you think Daniel and how I can investigate this further.\r\n\r\nI also have tcpdump captures at the time, I will check to see if OPTIONS are actually being sent out or not to try to narrow this a little more.\r\n\r\nThanks!\r\nJoel.\r\n\r\n"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-312461004"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Hi again, unfortunately the tcpdump where only capturing the external side, I have restarted tcpdump now capturing internal too so we can see the OPTIONS between the servers.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: Hi again, unfortunately the tcpdump where only capturing the external side, I have restarted tcpdump now capturing internal too so we can see the OPTIONS between the servers."}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-312462955"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Hi Daniel,

More information. I'm attaching the log (with debug=2) between the dispatcher nodes down & up at the time of the problem.

kamailio_sbc_2.log.txt

Some extra stuff:

  1. There is a gap a little bit after the dispatcher sets the nodes down, until a little before the dispatcher sets the nodes up, hope this helps.
  2. We use VoIPmonitor and the sensor receives the traffic from a port mirror on the switch, in the timeframe of the issue, there is no traffic detected from the affected server.
  3. We pull performance metrics from Kamailio every 10 seconds using xhttp and json-rpc modules, during the issue our tool can't pool metric either:
2017-07-01T22:02:15Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: timeout awaiting response headers (Client.Timeout exceeded while awaiting headers)
2017-07-01T22:02:25Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2017-07-01T22:02:35Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
[...]
[...]
[...]
2017-07-01T22:39:25Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2017-07-01T22:39:35Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2017-07-01T22:39:45Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I will leave the tcpdumps enabled anyway in case next time it happens we can get more or better information.

Joel.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: Hi Daniel, \r\n\r\nMore information. I'm attaching the log (with debug=2) between the dispatcher nodes down \u0026 up at the time of the problem.\r\n\r\n[kamailio_sbc_2.log.txt](https://github.com/kamailio/kamailio/files/1117574/kamailio_sbc_2.log.txt)\r\n\r\n\r\nSome extra stuff:\r\n\r\n1. There is a gap a little bit after the dispatcher sets the nodes down, until a little before the dispatcher sets the nodes up, hope this helps.\r\n2. We use VoIPmonitor and the sensor receives the traffic from a port mirror on the switch, in the timeframe of the issue, there is no traffic detected from the affected server.\r\n3. We pull performance metrics from Kamailio every 10 seconds using xhttp and json-rpc modules, during the issue our tool can't pool metric either:\r\n\r\n```\r\n2017-07-01T22:02:15Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: timeout awaiting response headers (Client.Timeout exceeded while awaiting headers)\r\n2017-07-01T22:02:25Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)\r\n2017-07-01T22:02:35Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)\r\n[...]\r\n[...]\r\n[...]\r\n2017-07-01T22:39:25Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)\r\n2017-07-01T22:39:35Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)\r\n2017-07-01T22:39:45Z E! Error in plugin [inputs.httpjson]: Get http://127.0.0.1/statistics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)\r\n```\r\n\r\nI will leave the tcpdumps enabled anyway in case next time it happens we can get more or better information.\r\n\r\n\r\nJoel.\r\n\r\n\r\n"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-312466536"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Hi,

It happened again, I checked the captures and this is what is happening during the issue:

From KamailioB to KamailioC/D

  1. Kamailio B is sending OPTIONS to KamailioC/D.
  2. KamailioC/D are replying 200.
  3. Kamailio B is not acting to those 200 and has the nodes DOWN (probably because it is in that "hung" state).

At the same time:

From KamailioC/D to KamailioB

  1. KamailioC/D are sending OPTIONS to KamailioB.
  2. KamailioB is NOT replying anything.

This time I had to restart Kamailio and it started working right away again.

backtrace_20170701_2001.tar.gz

Sorry for so many posts, but I update this as things happen.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: Hi, \r\n\r\nIt happened again, I checked the captures and this is what is happening during the issue:\r\n\r\nFrom KamailioB to KamailioC/D\r\n\r\n1. Kamailio B is sending OPTIONS to KamailioC/D.\r\n2. KamailioC/D are replying 200.\r\n3. Kamailio B is not acting to those 200 and has the nodes DOWN (probably because it is in that \"hung\" state).\r\n\r\nAt the same time:\r\n\r\nFrom KamailioC/D to KamailioB\r\n\r\n1. KamailioC/D are sending OPTIONS to KamailioB.\r\n2. KamailioB is NOT replying anything.\r\n\r\nThis time I had to restart Kamailio and it started working right away again.\r\n\r\n[backtrace_20170701_2001.tar.gz](https://github.com/kamailio/kamailio/files/1117612/backtrace_20170701_2001.tar.gz)\r\n\r\nSorry for so many posts, but I update this as things happen."}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-312470108"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

It looks like something blocked on tls processing. The process 14848 is doing:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
No locals.
#1  0x00007f4fdee39c50 in __GI___pthread_rwlock_unlock (rwlock=0x7f4dde916288) at pthread_rwlock_unlock.c:42
        futex_shared = 0
#2  0x00007f4fdf207f09 in CRYPTO_THREAD_unlock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#3  0x00007f4fdf1d6cb9 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4  0x00007f4fdf5216a0 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#5  0x00007f4fdf5152cf in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6  0x00007f4fdf50d9f1 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7  0x00007f4fdf77d73c in tls_accept (c=0x7f4ddf75e2f0, error=0x7ffd3522dc90) at tls_server.c:411

The 14847 and many others:

#0  0x00007f4fdee39450 in futex_wait (private=<optimized out>, expected=1446580, futex_word=0x7f4dde916294) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
        __ret = -512
        err = <optimized out>
#1  futex_wait_simple (private=<optimized out>, expected=1446580, futex_word=0x7f4dde916294) at ../sysdeps/nptl/futex-internal.h:135
No locals.
#2  __pthread_rwlock_wrlock_slow (rwlock=0x7f4dde916288) at pthread_rwlock_wrlock.c:67
        waitval = 1446580
        result = 0
        futex_shared = <optimized out>
#3  0x00007f4fdf207ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4  0x00007f4fdf1d6658 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#5  0x00007f4fdf51500f in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6  0x00007f4fdf50d9f1 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7  0x00007f4fdf77d73c in tls_accept (c=0x7f4ddf1993a8, error=0x7ffd3522dc90) at tls_server.c:411

As I assume by function names, one is trying to do unlock, but still waiting for something, while the others try to acquire the lock.

The thing is that openssl (libssl) 1.1.0 don't use anymore the custom lock functions for their internal stuff, being thread safe implicitely. I pushed a patch to master branch, disabling the part where we set the lock/unlock functions for libssl, but that mainly avoids compile warnings, otherwise it doesn't affect how libssl works internally for v1.1.0+.

If that unlock takes too long inside libssl/libcrypto, either is bug there in the libs or somehow the pointer to the lock (or the lock structure content) got corrupted. For the first option, you would have to use a libssl less than 1.1.0 -- maybe you can search on the web and see if there are others reporting issues with locking/deadlocks for openssl 1.1.0+.

The second option could be also the fault of kamailio, if there is a buffer overflow somewhere. But this should pop up as a bug with openssl less than 1.1.0 (in that case our internal lock structure should be overwritten/corrupted) and nobody reported any deadlock/crash related to tls for very long time, although it is used extensively for very long time.

I would try to use first openssl less than 1.1.0 and see if the same situation occurs. If not, then it is likely something related to the new locking system done inside openssl 1.1.0.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@miconda in #1172: It looks like something blocked on tls processing. The process 14848 is doing:\r\n\r\n```\r\n#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135\r\nNo locals.\r\n#1 0x00007f4fdee39c50 in __GI___pthread_rwlock_unlock (rwlock=0x7f4dde916288) at pthread_rwlock_unlock.c:42\r\n futex_shared = 0\r\n#2 0x00007f4fdf207f09 in CRYPTO_THREAD_unlock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1\r\nNo symbol table info available.\r\n#3 0x00007f4fdf1d6cb9 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1\r\nNo symbol table info available.\r\n#4 0x00007f4fdf5216a0 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1\r\nNo symbol table info available.\r\n#5 0x00007f4fdf5152cf in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1\r\nNo symbol table info available.\r\n#6 0x00007f4fdf50d9f1 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1\r\nNo symbol table info available.\r\n#7 0x00007f4fdf77d73c in tls_accept (c=0x7f4ddf75e2f0, error=0x7ffd3522dc90) at tls_server.c:411\r\n```\r\n\r\nThe 14847 and many others:\r\n\r\n```\r\n#0 0x00007f4fdee39450 in futex_wait (private=\u003coptimized out\u003e, expected=1446580, futex_word=0x7f4dde916294) at ../sysdeps/unix/sysv/linux/futex-internal.h:61\r\n __ret = -512\r\n err = \u003coptimized out\u003e\r\n#1 futex_wait_simple (private=\u003coptimized out\u003e, expected=1446580, futex_word=0x7f4dde916294) at ../sysdeps/nptl/futex-internal.h:135\r\nNo locals.\r\n#2 __pthread_rwlock_wrlock_slow (rwlock=0x7f4dde916288) at pthread_rwlock_wrlock.c:67\r\n waitval = 1446580\r\n result = 0\r\n futex_shared = \u003coptimized out\u003e\r\n#3 0x00007f4fdf207ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1\r\nNo symbol table info available.\r\n#4 0x00007f4fdf1d6658 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1\r\nNo symbol table info available.\r\n#5 0x00007f4fdf51500f in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1\r\nNo symbol table info available.\r\n#6 0x00007f4fdf50d9f1 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1\r\nNo symbol table info available.\r\n#7 0x00007f4fdf77d73c in tls_accept (c=0x7f4ddf1993a8, error=0x7ffd3522dc90) at tls_server.c:411\r\n```\r\n\r\nAs I assume by function names, one is trying to do unlock, but still waiting for something, while the others try to acquire the lock.\r\n\r\nThe thing is that openssl (libssl) 1.1.0 don't use anymore the custom lock functions for their internal stuff, being thread safe implicitely. I pushed a patch to master branch, disabling the part where we set the lock/unlock functions for libssl, but that mainly avoids compile warnings, otherwise it doesn't affect how libssl works internally for v1.1.0+.\r\n\r\nIf that unlock takes too long inside libssl/libcrypto, either is bug there in the libs or somehow the pointer to the lock (or the lock structure content) got corrupted. For the first option, you would have to use a libssl less than 1.1.0 -- maybe you can search on the web and see if there are others reporting issues with locking/deadlocks for openssl 1.1.0+.\r\n\r\nThe second option could be also the fault of kamailio, if there is a buffer overflow somewhere. But this should pop up as a bug with openssl less than 1.1.0 (in that case our internal lock structure should be overwritten/corrupted) and nobody reported any deadlock/crash related to tls for very long time, although it is used extensively for very long time.\r\n\r\nI would try to use first openssl less than 1.1.0 and see if the same situation occurs. If not, then it is likely something related to the new locking system done inside openssl 1.1.0.\r\n"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-312634272"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Hi Daniel, thanks for helping troubleshoot the issue.

I have Kamailio v4.4 with Debian 8 working with TLS on other servers and we have had 0 problems (OpenSSL v1.0)

If I install the kamailio 5 nightly deb build from tomorrow, would your latest patch be applied? Or should I manually compile from master branch?

Also, for some reason I had a feeling it could be OpenSSL 1.1, Debian 9 seems to have OpenSSL 1.0 also:

root:~# dpkg -l | grep libssl
ii  libssl1.0.2:amd64                1.0.2l-2                       amd64        Secure Sockets Layer toolkit - shared libraries
ii  libssl1.1:amd64                  1.1.0f-3                       amd64        Secure Sockets Layer toolkit - shared libraries
root:~#

Can I build Kamailio deb package from the sources, compiling with the ssl v1.0 version instead of v1.1, and then install that resulting deb package and wait and see if the problem reproduces?

Let me know what you think.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: Hi Daniel, thanks for helping troubleshoot the issue.\r\n\r\nI have Kamailio v4.4 with Debian 8 working with TLS on other servers and we have had 0 problems (OpenSSL v1.0)\r\n\r\nIf I install the kamailio 5 nightly deb build from tomorrow, would your latest patch be applied? Or should I manually compile from master branch?\r\n\r\nAlso, for some reason I had a feeling it could be OpenSSL 1.1, Debian 9 seems to have OpenSSL 1.0 also:\r\n\r\n```\r\nroot:~# dpkg -l | grep libssl\r\nii libssl1.0.2:amd64 1.0.2l-2 amd64 Secure Sockets Layer toolkit - shared libraries\r\nii libssl1.1:amd64 1.1.0f-3 amd64 Secure Sockets Layer toolkit - shared libraries\r\nroot:~#\r\n```\r\n\r\nCan I build Kamailio deb package from the sources, compiling with the ssl v1.0 version instead of v1.1, and then install that resulting deb package and wait and see if the problem reproduces?\r\n\r\nLet me know what you think.\r\n\r\n"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-312690313"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Yes, the patches to tls module will be in nightly builds. But again, they just skipped setting the locking functions for libssl/libcrypto, being not used and the compiler throwing unused functions warnings.

Probably you can link against libssl 1.0, but you need to update the deb specs and/or Makefile for tls module.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@miconda in #1172: Yes, the patches to tls module will be in nightly builds. But again, they just skipped setting the locking functions for libssl/libcrypto, being not used and the compiler throwing unused functions warnings.\r\n\r\nProbably you can link against libssl 1.0, but you need to update the deb specs and/or Makefile for tls module."}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-312818646"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Hi Daniel, just an update... I have done the following:

1 - Download src package for v5.0.2: # apt-src install kamailio.
2- Edit the control file: ./kamailio-5.0.2+stretch/debian/control.
3- In the "Build-Depends" section, replace libssl-dev with libssl1.0-dev.
4- Build: dpkg-buildpackage -b

I checked the resulting TLS module package:

# dpkg -I kamailio-tls-modules_5.0.2+stretch_amd64.deb
 new debian package, version 2.0.
 size 362518 bytes: control archive=1025 bytes.
      22 bytes,     1 lines      conffiles
     757 bytes,    18 lines      control
     842 bytes,    10 lines      md5sums
 Package: kamailio-tls-modules
 Source: kamailio
 Version: 5.0.2+stretch
 Architecture: amd64
 Maintainer: Debian VoIP Team <[hidden email]>
 Installed-Size: 995
 Depends: kamailio (= 5.0.2+stretch), libc6 (>= 2.14), libcurl3 (>= 7.16.2), libssl1.0.2 (>= 1.0.2d)
 Section: net
 Priority: optional
 Multi-Arch: same
 Homepage: http://www.kamailio.org/
 Description: TLS support for the Kamailio SIP server (authentication, transport)
  Kamailio is a very fast and flexible SIP (RFC3261)
  server. Written entirely in C, Kamailio can handle thousands calls
  per second even on low-budget hardware.
  .
  This package provides TLS support for encrypted and authenticated
  SIP connections as well as generic TLS support for many Kamailio modules.
root@debian9build:/opt/kamailio-5.0.2-deb#

So it seems it is correctly linked to libssl1.0.2.

I will install this version of Kamailio on one of the SBCs tonight and keep it in close observation. I will update here as soon as I have more info.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: Hi Daniel, just an update... I have done the following:\r\n\r\n1 - Download src package for v5.0.2: `# apt-src install kamailio`.\r\n2- Edit the control file: `./kamailio-5.0.2+stretch/debian/control`.\r\n3- In the \"Build-Depends\" section, replace `libssl-dev` with `libssl1.0-dev`.\r\n4- Build: `dpkg-buildpackage -b`\r\n\r\nI checked the resulting TLS module package:\r\n\r\n```\r\n# dpkg -I kamailio-tls-modules_5.0.2+stretch_amd64.deb\r\n new debian package, version 2.0.\r\n size 362518 bytes: control archive=1025 bytes.\r\n 22 bytes, 1 lines conffiles\r\n 757 bytes, 18 lines control\r\n 842 bytes, 10 lines md5sums\r\n Package: kamailio-tls-modules\r\n Source: kamailio\r\n Version: 5.0.2+stretch\r\n Architecture: amd64\r\n Maintainer: Debian VoIP Team \u003cpkg-voip-maintainers@lists.alioth.debian.org\u003e\r\n Installed-Size: 995\r\n Depends: kamailio (= 5.0.2+stretch), libc6 (\u003e= 2.14), libcurl3 (\u003e= 7.16.2), libssl1.0.2 (\u003e= 1.0.2d)\r\n Section: net\r\n Priority: optional\r\n Multi-Arch: same\r\n Homepage: http://www.kamailio.org/\r\n Description: TLS support for the Kamailio SIP server (authentication, transport)\r\n Kamailio is a very fast and flexible SIP (RFC3261)\r\n server. Written entirely in C, Kamailio can handle thousands calls\r\n per second even on low-budget hardware.\r\n .\r\n This package provides TLS support for encrypted and authenticated\r\n SIP connections as well as generic TLS support for many Kamailio modules.\r\nroot@debian9build:/opt/kamailio-5.0.2-deb#\r\n```\r\n\r\nSo it seems it is correctly linked to libssl1.0.2.\r\n\r\nI will install this version of Kamailio on one of the SBCs tonight and keep it in close observation. I will update here as soon as I have more info."}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-317508340"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [kamailio/kamailio] Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting. (#1172)

Daniel-Constantin Mierla-11
In reply to this post by Daniel-Constantin Mierla-11

Hi Daniel,

Just an update, we have just restarted the SBC with the new Kamailio build using OpenSSL v1.0:

Jul 25 00:59:09 voicesbc1 sbc[1112]: INFO: tls [tls_init.c:639]: init_tls_h(): tls: _init_tls_h:  compiled  with  openssl  version "OpenSSL 1.0.2l  25 May 2017" (0x100020cf), kerberos support: off, compression: on

Now to wait and see if it reproduces. We have 2 SBCs, both Debian9+Kamailio5.0.2:

voicesbc1: Kamailio compiled with OpenSSL v1.0
voicesbc2: Kamailio compiled with OpenSSL v1.1

Until now, we have had to restart kamailio every time the issue appeared on any/both servers, let's see what happens in the next couple days.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/kamailio/kamailio","title":"kamailio/kamailio","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/kamailio/kamailio"}},"updates":{"snippets":[{"icon":"PERSON","message":"@joelsdc in #1172: Hi Daniel, \r\n\r\nJust an update, we have just restarted the SBC with the new Kamailio build using OpenSSL v1.0:\r\n\r\n```\r\nJul 25 00:59:09 voicesbc1 sbc[1112]: INFO: tls [tls_init.c:639]: init_tls_h(): tls: _init_tls_h: compiled with openssl version \"OpenSSL 1.0.2l 25 May 2017\" (0x100020cf), kerberos support: off, compression: on\r\n```\r\n\r\nNow to wait and see if it reproduces. We have 2 SBCs, both Debian9+Kamailio5.0.2:\r\n\r\nvoicesbc1: Kamailio compiled with OpenSSL v1.0\r\nvoicesbc2: Kamailio compiled with OpenSSL v1.1\r\n\r\nUntil now, we have had to restart kamailio every time the issue appeared on any/both servers, let's see what happens in the next couple days.\r\n"}],"action":{"name":"View Issue","url":"https://github.com/kamailio/kamailio/issues/1172#issuecomment-317664494"}}}</script>
_______________________________________________
Kamailio (SER) - Development Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
12
Loading...