crash after using async_route()

Vitaliy Aleksandrov
Hello kamailio list,

I recently found a problem in my configuration, which uses async_route().
Kamailio crashes after several calls, when the wait timer fires.

#0  0xb74a8556 in raise () from /lib/libc.so.6
#1  0xb74a9d78 in abort () from /lib/libc.so.6
#2  0x08293ae2 in qm_free (qmp=0xad65d000, p=0x3d64692d, file=0xb6216a16 "tm: h_table.c", func=0xb621663c <__FUNCTION__.18751> "free_cell_helper", line=187, mname=0xb621664d "tm") at core/mem/q_malloc.c:471
#3  0xb613f103 in free_cell_helper (dead_cell=0xae2cd210, silent=0, fname=0xb6239ea5 "timer.c", fline=655) at h_table.c:187
#4  0xb61e7758 in wait_handler (ti=557858937, wait_tl=0xae2cd258, data=0xae2cd210) at timer.c:655
#5  0x0826a2cc in timer_list_expire (t=557858937, h=0xad6b9668, slow_l=0xad6ba144, slow_mark=312) at core/timer.c:874
#6  0x08267cb1 in timer_handler () at core/timer.c:939
#7  0x0826a4d3 in timer_main () at core/timer.c:978
#8  0x08069575 in main_loop () at main.c:1721
#9  0x080707ca in main (argc=11, argv=0xbf85f044) at main.c:2723

When the crash happens, Kamailio prints the following message:
Sep  4 16:15:38 [18938]: : <core> [core/mem/q_malloc.c:469]: qm_free(): BUG: qm_free: bad pointer 0x70707553 (out of memory block!) called from tm: h_table.c: free_cell_helper(187) - aborting
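
An observation on the two bad pointer values, offered as a hint rather than a confirmed diagnosis: both 0x70707553 (from the log line above) and 0x3d64692d (the p argument in frame #2 of the backtrace) consist entirely of printable ASCII bytes. That pattern is typical of a pointer field read from memory that actually holds text. The helper below is a hypothetical standalone sketch, not Kamailio code; the name ptr_bytes_as_text is invented for illustration, and it assumes a little-endian 32-bit build, which the 32-bit addresses in the backtrace suggest but do not prove.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Kamailio): render the four bytes of a
 * 32-bit value in little-endian memory order as printable characters,
 * using '.' for any non-printable byte. out must hold at least 5 bytes. */
static void ptr_bytes_as_text(uint32_t value, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        /* Byte i in memory is bits 8*i .. 8*i+7 on a little-endian CPU. */
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xffu);
        out[i] = isprint(b) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

With a 5-byte buffer buf, ptr_bytes_as_text(0x70707553u, buf) gives "Supp" and ptr_bytes_as_text(0x3d64692du, buf) gives "-id=". Both resemble fragments of SIP message text; that reading is an inference and is not stated anywhere in the thread.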

I also had a few crashes in retransmission_handler():

#0  0xb750b556 in raise () from /lib/libc.so.6
#1  0xb750cd78 in abort () from /lib/libc.so.6
#2  0xb6249b5a in retransmission_handler (r_buf=0xae036674) at timer.c:367
#3  0xb6247558 in retr_buf_handler (ticks=1234464444, tl=0xae036688, p=0x1f40) at timer.c:594
#4  0x0826a2cc in timer_list_expire (t=1234464444, h=0xad71c668, slow_l=0xad71cd44, slow_mark=2232) at core/timer.c:874
#5  0x08267cb1 in timer_handler () at core/timer.c:939
#6  0x0826a4d3 in timer_main () at core/timer.c:978
#7  0x08069575 in main_loop () at main.c:1721
#8  0x080707ca in main (argc=11, argv=0xbff64134) at main.c:2723

ERROR: tm [timer.c:366]: retransmission_handler(): transaction 0xae0365e0 scheduled for deletion and called from RETR timer (flags 6d)

Both timers fired for an INVITE transaction that was previously suspended by async_route(), then resumed, sent out and received a 4xx reply (407).

This configuration worked fine with Kamailio 4.2.x; the problem appeared after upgrading to 5.0.2.

I am trying to figure out how to narrow down the problem. Any input is appreciated.

_______________________________________________
Kamailio (SER) - Users Mailing List
[hidden email]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users

Re: crash after using async_route()

Daniel-Constantin Mierla-6

Hello,

do you happen to have a pcap (or ngrep capture) of the SIP traffic for the call? It would be useful to see the flow of requests, replies and retransmissions with their timestamps.

Is this version a snapshot of the 5.0.2 release or a build from branch 5.0?

Cheers,
Daniel


-- 
Daniel-Constantin Mierla
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio Advanced Training - www.asipto.com
Kamailio World Conference - www.kamailioworld.com

Re: crash after using async_route()

Daniel-Constantin Mierla-6

I think I caught the issue and fixed it with commit b672d8ef63715cf816390a05ce7a441377c3e468 in the master branch.

It was caused by the T_ASYNC_CONTINUE flag not being reset after t_continue(), which kept other parts of the code from resetting the reply field of the branches. The reply field could have been set by another process, so by the time the transaction was destroyed, the pointer could refer to the memory zone of another process, and accessing it caused the crash.
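
The mechanism described above can be sketched with a toy model. This is not the tm module source: every name below (toy_tx, TOY_ASYNC_CONTINUE, toy_continue, toy_handle_reply, toy_destroy_would_touch_stale) is hypothetical, and the exact conditions under which tm resets the reply field are an assumption, since the thread does not show them. The sketch only illustrates how a flag left set after resuming can leave a process-private pointer behind in a shared structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the T_ASYNC_CONTINUE transaction flag. */
#define TOY_ASYNC_CONTINUE (1u << 0)

/* Toy transaction: in the real server this would live in shared memory. */
struct toy_tx {
    unsigned flags;
    const void *reply;   /* stand-in for a branch's reply field */
};

/* Stand-in for t_continue(): marks the transaction while the resumed
 * route runs. clear_flag == 0 models the buggy behaviour (flag left set),
 * clear_flag != 0 models the fix (flag reset afterwards). */
static void toy_continue(struct toy_tx *t, int clear_flag)
{
    assert(t != NULL);
    t->flags |= TOY_ASYNC_CONTINUE;
    /* ... the resumed route block would run here ... */
    if (clear_flag)
        t->flags &= ~TOY_ASYNC_CONTINUE;
}

/* Stand-in for reply handling in some worker process: it stores a pointer
 * to its own private message and normally resets the field before
 * returning, but (by assumption) skips the reset while the flag is set. */
static void toy_handle_reply(struct toy_tx *t, const void *private_msg)
{
    assert(t != NULL);
    t->reply = private_msg;
    /* ... reply processing ... */
    if (!(t->flags & TOY_ASYNC_CONTINUE))
        t->reply = NULL;
}

/* Returns 1 if destroying the transaction (for example from the timer
 * process) would find a leftover pointer it must not dereference or free. */
static int toy_destroy_would_touch_stale(const struct toy_tx *t)
{
    assert(t != NULL);
    return t->reply != NULL;
}
```

In this model, calling toy_continue(&tx, 0) followed by toy_handle_reply(&tx, &msg) leaves tx.reply set, so toy_destroy_would_touch_stale(&tx) returns 1. With toy_continue(&tx, 1) the same sequence leaves tx.reply NULL and the check returns 0.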

Along with this fix, I added a few other safety checks while investigating the issue.

Can you cherry-pick this commit and test it in branch 5.0? I want to be sure there are no obvious side effects before backporting it.

Cheers,
Daniel


Re: crash after using async_route()

Vitaliy Aleksandrov
Thanks for the quick fix.

I installed the latest 5.0 branch with the mentioned patch and have had no crashes so far.
I will do some additional testing and report back if I find any issues.

Re: crash after using async_route()

Daniel-Constantin Mierla-6

OK, I will wait a bit and then backport.

Thanks for testing and assisting with troubleshooting.

Daniel


Re: crash after using async_route()

Vitaliy Aleksandrov
Everything is OK so far. I haven't found any issues with the patch.

Re: crash after using async_route()

Daniel-Constantin Mierla-6

Thanks for following up! I backported the commit to branch 5.0.

Cheers,
Daniel

