Kernel stack issues on master backend
My logical setup:
mythtv0
- Master backend
- DVB-T (unused at present)
- MySQL server
- MythTV filestore server (via NFS)
- Headless
- 1GHz PIII, 384MB RAM
mythtv1
- Slave backend
- PVR350 fed from cable-box
- Primary frontend driving TV
- 2.53GHz Celeron D, 512MB RAM
As the master is headless, I rarely got to read any console errors, and when I did attach a monitor, a hang sent reams and reams of kernel debugging output flying past. To debug this properly, I used netconsole, a kernel module that sends kernel messages to another box over the network. You need to give the module a few options and set up a box on your network to receive the messages. That's fairly easy: either open up syslogd to allow remote logging, or run netcat (nc) to listen on the UDP port and dump the output to a file.
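For reference, the netconsole option string follows the pattern given in the kernel's netconsole documentation: source port, source IP and device, then target port, target IP and target MAC. Here is a rough sketch of putting one together. The function and variable names are made up for illustration, and the address is just an example value. As I understand it, any field left empty falls back to the module's default.

```shell
#!/bin/sh
# Assemble a netconsole option string of the documented form:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# netconsole_opts is a hypothetical helper, not part of any package.
# Empty arguments leave the field blank so the module uses its default.
netconsole_opts() {
    src_port=$1; src_ip=$2; dev=$3
    tgt_port=$4; tgt_ip=$5; tgt_mac=$6
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Only the target port and target IP are given; everything else is left blank.
netconsole_opts "" "" "" 6666 192.168.0.6 ""
```

With only the target port and IP filled in, this prints the short form `netconsole=@/,6666@192.168.0.6/`.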
As I already had mythtv1 logging via syslog to mythtv0, I ran nc from /etc/rc.d/rc.local on mythtv1 rather than using syslogd, to avoid creating a network logging loop:
# nc -u -l -p 6666 > /var/log/mythtv0.log &
On mythtv0, I modprobed netconsole in /etc/rc.d/rc.local and added these options to /etc/modprobe.conf (192.168.0.6 is the IP address of mythtv1):
options netconsole netconsole=@/,6666@192.168.0.6/
Finally, I started getting useful debugging and it was pointing to kernel stack corruption. The cause of the stack corruption wasn't entirely obvious but it always seemed to involve either XFS or the device mapper (DM). After reading LKML and googling a bit, I discovered the following:
- FC3 kernels use 4k stacks
- XFS uses a lot of stack
- NFS uses a lot of stack
- DM/LVM uses a lot of stack
- The combination of an XFS filesystem sitting on top of LVM and exported over NFS can cause excessive stack pressure and actually exhaust the kernel stack
- kernel/driver developers are continuing to reduce stack usage wherever possible
Rather than revert to 8k stacks (likely to be deprecated soon) with a 2.6.12 FC3-patched kernel, I pulled the vanilla 2.6.15 kernel and built a custom config to include the following features:
- 100Hz
- 4k stacks
- no preemption
- deadline scheduler default
- optimise for Pentium III
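A quick way to confirm that a .config really carries these choices is to grep it for the matching symbols. The sketch below is only that: the CONFIG_* names are my best guess at how these options are spelled in a 2.6.15 i386 tree, so check them against your own Kconfig. check_config is a made-up helper name.

```shell
#!/bin/sh
# Report whether each expected option is set to "y" in a kernel .config.
# ASSUMPTION: these CONFIG_* names match a 2.6.15 i386 tree; adjust as needed.
# check_config is a hypothetical helper, not part of the kernel build system.
check_config() {
    config=$1
    missing=0
    for opt in CONFIG_HZ_100 CONFIG_4KSTACKS CONFIG_PREEMPT_NONE \
               CONFIG_DEFAULT_DEADLINE CONFIG_MPENTIUMIII; do
        # Match the whole line so "is not set" comments do not count
        if grep -q "^${opt}=y\$" "$config"; then
            echo "ok       $opt"
        else
            echo "missing  $opt"
            missing=1
        fi
    done
    # Exit status is 0 only when every option was found
    return $missing
}
```

Call it with the path to the config you built from, for example `check_config .config` from the top of the kernel source tree. It returns non-zero if any of the options is missing.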
UPDATE 25/2/06: Vanilla 2.6.15 definitely cured the stack problem. I've had no crashes on my master backend since making the change!