|
Posted by Ben Ramsey on 09/08/05 17:22
The following message is from a co-worker. I'm passing it along as a
favor. Any help would be greatly appreciated since we're both somewhat
stuck with no clue how to proceed with this.
Thanks,
Ben Ramsey
I'm having a major problem with my PHP scripts getting stuck on one of
our production servers.
The system runs normally for a while, but after some time, certain httpd
processed get wedged. This causes the server to spawn more httpd
processes, but those eventually get stuck too, and the whole mess shuts
down once we reach MaxServers.
Without knowing too much about PHP, this appears to be related to some
sort of deadlock in the session handling code. The problem seems to
occur regardless of which script is being used, but I admittedly cannot
figure out how to reproduce it. It only seems to occur when the system
gets under heavy load, though. (The Red Cross linked to our site this
morning to help collect volunteers for the hurricane relief effort, and
that caused the server to die in short order. :-( )
We're using Apache 2.0.46-RH (prefork MPM) and PHP 4.3.2 (RedHat pl23).
Unfortunately, we are running in a managed hosting environment, so we
cannot easily change Apache or PHP versions.
I was able to attach a GDB process to one of the running Apaches, and I
found the following stack trace. Unfortunately, it couldn't figure out
how to find the symbols (see above comment about the managed hosting
environment), but there were a few useful tidbits:
(gdb) bt
#0 0x0042b291 in flock () from /lib/tls/libc.so.6
#1 0x010c23de in zm_info_session () from /etc/httpd/modules/libphp4.so
#2 0x00000002 in ?? ()
#3 0x00000180 in ?? ()
#4 0x094919cc in ?? ()
#5 0x09273220 in ?? ()
#6 0xbfffa5a4 in ?? ()
#7 0xbfffa594 in ?? ()
#8 0xbfffa110 in ?? ()
#9 0x094107f4 in ?? ()
#10 0x093f32cf in ?? ()
#11 0x00000001 in ?? ()
#12 0xbfffa120 in ?? ()
#13 0x706d742f in ?? ()
#14 0x7365732f in ?? ()
#15 0x32645f73 in ?? ()
#16 0x63393966 in ?? ()
#17 0x62393063 in ?? ()
#18 0x37663530 in ?? ()
#19 0x38643466 in ?? ()
#20 0x34353166 in ?? ()
#21 0x32366630 in ?? ()
#22 0x33663961 in ?? ()
#23 0x00003965 in ?? ()
#24 0x011f9003 in ?? () from /etc/httpd/modules/libphp4.so
#25 0xb5650760 in ?? ()
#26 0xb5650758 in ?? ()
#27 0xbfffa198 in ?? ()
#28 0xbfffa594 in ?? ()
#29 0x00000000 in ?? ()
I poked around in the PHP code until I found the zm_info_session
function in the session handler. I didn't see how that function could
be related, but I noticed an associated flock() call in the files
module. On that hunch, I then decided to see what I could find with lsof:
/usr/sbin/lsof | sort -k 9 | grep sess
httpd 15498 apache 81uW REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 15594 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 15728 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 15934 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 15955 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 15957 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 15959 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 15982 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 16071 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 16072 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
[... many, many httpd processes later ...]
httpd 16368 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 16372 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 16373 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 16375 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
httpd 16376 apache 81u REG 104,5 1729 17162
/tmp/sess_d2f99cc09b05f7f4d8f1540f62a9f3e9
Aha! The "81uW" appears to tell us that PID 15498 has an exclusive
write lock on the entire file, and that the other httpd processes have
read or write locks of some length. All of these processes *stayed* in
this state for a very long time.
As an experiment, I tried killing PID 15498, but it just moved the "W"
lock to some other process within the group, per lsof. Nonetheless, it
seemed like all of the processes were waiting for something.
In one other case, later in the day, I found a situation where the
processes also seemed to be in the same state. It hadn't gotten to the
point where it killed the machine, but I saw a bunch of httpd's running
and trying to access the same session file.
In that particular case, I was able to get a stack trace. It appears
that Apache is trying to shut the process down
(ap_graceful_stop_signalled?), but it seemed to get stuck forever in the
poll() loop. Like I said, I'm not sure if this is the same bug as the
above (and whether or not it is even normal behavior), but it does have
more details:
#0 0x002a8689 in poll () from /lib/tls/libc.so.6
#1 0x00e0bc54 in apr_poll () from /usr/lib/libapr-0.so.0
#2 0x00e0c3a2 in apr_wait_for_io_or_timeout () from /usr/lib/libapr-0.so.0
#3 0x00e01d73 in apr_socket_sendv () from /usr/lib/libapr-0.so.0
#4 0x00e02244 in apr_sendv () from /usr/lib/libapr-0.so.0
#5 0x0807bb65 in ap_add_output_filters_by_type ()
#6 0x0807d038 in ap_core_translate ()
#7 0x0807483c in ap_pass_brigade ()
#8 0x08062d11 in ap_http_header_filter ()
#9 0x0807483c in ap_pass_brigade ()
#10 0x08076edf in ap_content_length_filter ()
#11 0x0807483c in ap_pass_brigade ()
#12 0x08064aa4 in ap_byterange_filter ()
#13 0x0807483c in ap_pass_brigade ()
#14 0x01161a2b in execute () from /etc/httpd/modules/libphp4.so
#15 0x01134195 in php_ob_get_length () from /etc/httpd/modules/libphp4.so
#16 0x0113422f in php_ob_get_length () from /etc/httpd/modules/libphp4.so
#17 0x01133039 in php_body_write () from /etc/httpd/modules/libphp4.so
#18 0x01122916 in php_request_shutdown () from /etc/httpd/modules/libphp4.so
#19 0x0114e0dc in zend_print_zval_ex () from /etc/httpd/modules/libphp4.so
#20 0x0114e062 in zend_print_zval () from /etc/httpd/modules/libphp4.so
#21 0x0114dc47 in zend_print_variable () from /etc/httpd/modules/libphp4.so
#22 0x0115c2ff in execute () from /etc/httpd/modules/libphp4.so
#23 0x00486126 in apc_copy_class_entry_for_execution () from
/usr/lib/php4/apc.so
#24 0x0114f306 in zend_execute_scripts () from /etc/httpd/modules/libphp4.so
#25 0x01123e33 in php_execute_script () from /etc/httpd/modules/libphp4.so
#26 0x0116269d in execute () from /etc/httpd/modules/libphp4.so
#27 0x0807483c in ap_pass_brigade ()
#28 0x0807c489 in ap_core_translate ()
#29 0x080685c5 in ap_run_handler ()
#30 0x08068bdf in ap_invoke_handler ()
#31 0x08065266 in ap_process_request ()
#32 0x080608bc in _start ()
#33 0x08e64d78 in ?? ()
#34 0x00000004 in ?? ()
#35 0x08e64d78 in ?? ()
#36 0x080722dc in ap_run_pre_connection ()
#37 0x08072195 in ap_run_process_connection ()
#38 0x08066ae1 in ap_graceful_stop_signalled ()
#39 0x08066c34 in ap_graceful_stop_signalled ()
#40 0x08066ed9 in ap_graceful_stop_signalled ()
#41 0x08067570 in ap_mpm_run ()
#42 0x0806da4f in main ()
Does this provide any clues? Like I said, this is happening on a
production machine, and I'm getting slammed by my superiors to make this
work Real Soon Now. Any help or hints you can provide would be appreciated!
Thanks,
Scott
--
Scott Dudley
Technology Manager
Hands On Network
http://www.handsonnetwork.org
[Back to original message]
|