Forgot to mention I could reboot after the crash with Alt+PrtSc+REISUB (if that matters).
A crash threw me from the desktop into the console to show the Oops. This hasn’t happened before to my knowledge. It looks different from bug #917668 (which I’m affected by) but similar to bug #886706 (and bug #495322), though kernel bugs are too far out of my league for me to tell for sure.
I wasn’t doing anything specific that I could use to reproduce this. At least Chromium, Transmission and Rhythmbox were running at the time.
Syslog has some data, I’ll attach it manually if apport doesn’t bring it in automatically.
I’ll happily do further testing and/or log uploads on request.
@jsalisbury: On this setup, there were seemingly similar freezes before Precise (I was using Lucid until then), but being so far apart and without a reliable recipe for reproducing, I mostly just ignored the issue. To give a clue as to the rarity, an (again seemingly) similar freeze happened just now, for the first time since I reported the bug, so if it’s the same issue, it’s been in hiding for over a month.
Unfortunately the logs didn’t have anything about this crash, and I couldn’t ssh in either. As I have yet to gather any substantial data apart from the little I posted above, there’s no way of knowing whether it’s always been the same issue or not. The symptom on the surface has always been very similar, but I guess that’s true for most freezes, even if they’re brought on by unrelated causes.
I’m not afraid of testing the mainline kernel per se, but I’m hesitant because with this occurrence rate, wouldn’t I be trying to prove a negative? Would 2 months without the issue constitute a ’kernel-fixed-upstream’? 6 months? Also, should I install v3.3-rc2-precise as you suggested, or the more recent 3.3-rc4-precise now? If the more recent one, should I then stick to it, or keep upgrading as new mainline kernels are built?
With both 3.2.0-16 and 3.2.0-17, what I said in #8 still holds, with -16 and -17 behaving just as -15 did. 3.2.0-17 added something interesting though: booting 3.2.0-17-pae in recovery mode ”breaks” the -pae’s like (non-recovery booting) 3.2.0-14-pae does. To be sure, I tried recovery booting other kernels going back to 3.2.0-14, and couldn’t reproduce this with them (not even with 3.2.0-14-pae!). Recovery booting 3.2.0-17 non-pae also doesn’t bring it on; it’s just recovery booting 3.2.0-17-pae.
The steps to reproduce this freeze with 3.2.0-17 are:
1. Boot 3.2.0-17-pae in recovery mode.
2. In the recovery menu, select ”root”.
3. From the root prompt, just reboot.
4. Boot 3.2.0-17-pae (normally).
The ”fix” also still holds: just boot a non-pae kernel once, and the pae’s again work.
After dozens and dozens of boots with the 3.2.0-14 and 3.2.0-15 kernels, here’s what I know.
1. This *is* tied to wistron_btns as I reported. Without it, boot never fails (the way I initially reported, though I’ll redefine what ”fails” means further below).
2. With non-pae kernels, boot never fails.
3. With 3.2.0-14-pae, the boot always fails.
4. A cold boot with 3.2.0-15-pae never fails.
5. A re-boot with 3.2.0-15-pae after a *non-failing* boot never fails.
6. A re-boot of 3.2.0-15-pae, after a *failing* boot (of 3.2.0-14-pae for instance), is *almost* sure to fail. I’d give it a 10% chance of not failing.
Put another way, this is pretty interesting:
1. You can ”break” 3.2.0-15-pae by booting 3.2.0-14-pae first.
2. You ”fix” a thus ”broken” 3.2.0-15-pae by booting a non-pae kernel.
I suspect this brokenness is actually hidden in the hardware, in something (the wifi key perhaps?) controlled by wistron_btns. Booting 3.2.0-14-pae puts the controller(?) in a ”broken” state from which 3.2.0-15-pae can’t recover, but a non-pae kernel can. And though 3.2.0-15-pae can’t recover a ”broken” controller, it also cannot put it into that ”broken” state (which is a good turn of development).
So now, about that ”fails” part.
I discovered by accident that although the system appears to freeze in the boots I referred to as ”fails”, it has in fact been brought down to an *almost* complete halt, but *just* almost. If I’m patient enough to wait, it does actually boot into LDM, from where I can switch to another VT and log in… slooooooowly.
Thus I was able to find out what’s going on that makes it so slow:
jani@amilo:~$ head dmesg.fail
stron_btns: Unknown key code 10
[ 1011.554522] wistron_btns: Unknown key code 10
[ 1011.554722] wistron_btns: Unknown key code 10
[ 1011.554921] wistron_btns: Unknown key code 10
[ 1011.555120] wistron_btns: Unknown key code 10
[ 1011.555320] wistron_btns: Unknown key code 10
[ 1011.555518] wistron_btns: Unknown key code 10
[ 1011.555717] wistron_btns: Unknown key code 10
[ 1011.555916] wistron_btns: Unknown key code 10
[ 1011.556134] wistron_btns: Unknown key code 10
jani@amilo:~$ grep wistron dmesg.fail | wc -l
2520
Note that this is unrelated to pressing any actual physical buttons. It’s wistron_btns misbehaving under the conditions I described above.
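To put a number on how fast these messages arrive, the bracketed timestamps can be turned into a rough rate. This is only a sketch: wistron_rate is a made-up helper name, and it assumes every line of interest has the same "[ seconds] wistron_btns: Unknown key code" shape as the excerpt above (the truncated first line is skipped).

```shell
# Hypothetical helper (not part of any package): estimate the wistron_btns
# message rate from the dmesg timestamps (seconds since boot).
wistron_rate() {
    # Keep only the timestamp of each matching line.
    sed -n 's/^\[ *\([0-9.]*\)] wistron_btns: Unknown key code.*/\1/p' "$1" |
    awk '
        { if (n == 0) first = $1; last = $1; n++ }
        END {
            if (n > 1 && last > first)
                printf "%d messages in %.6f s (~%.0f per second)\n",
                       n, last - first, (n - 1) / (last - first)
            else
                printf "%d messages, not enough timestamps for a rate\n", n + 0
        }'
}
```

Running wistron_rate dmesg.fail would then print one summary line. Going by the nine timestamped lines shown above (about 1.6 ms from first to last), the rate is on the order of 5,000 messages per second, which would explain the crawl.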
jani@amilo:~$ LC_ALL=C aptitude show linux-image-3.2.0-14-generic-pae | grep "more then"
Geared toward 32 bit desktop systems with more then 4GB RAM.
I believe the fix is to ’s/more then/more than/’ in DEBIAN/control.
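For what it’s worth, here is that substitution applied to the description line quoted above, as a quick sanity check. It only touches a copy of the string in a shell variable, not the actual DEBIAN/control file, and the variable names are just for illustration.

```shell
# Sample line copied from the aptitude output above.
desc='Geared toward 32 bit desktop systems with more then 4GB RAM.'

# Apply the proposed substitution to the sample text only.
fixed=$(printf '%s\n' "$desc" | sed 's/more then/more than/')
printf '%s\n' "$fixed"
```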
While waiting for this to reoccur, it occurred to me that I have radeon.audio=1 in my kernel parameters due to Bug #864735. Radeon audio is considered too buggy by developers to be enabled by default, so my force-enabling it definitely makes it a suspect here.
Hi David, thanks for responding. I tested fglrx just now and every time I launched VLC with audio, or in this case even Totem with audio, the X session went boom right before any sound came out. So if you meant ”does it work without tsched=0 when using fglrx”, I guess the answer is no. I’ll attach the X log, although this crash is probably unrelated to this report. (I only use the free drivers myself so I won’t bother to report this separately. It was pretty consistent and should be easily reproducible though.)
I didn’t know whether the radeon.audio=1 kernel parameter matters when fglrx is in use, so I tried both with it and without it, with the same result (X crash).
The only thing audiowise that didn’t crash the session was PA’s speaker test (from the audio settings). It didn’t make any sound either though.
Luckily, enabling Radeon audio in the kernel hasn’t given me any problems on this setup, at least none that I could link to it. I do have Bug #917668 filed, but it’ll have to reoccur before I can get more data to see if that’s connected.
Bug #751265 describes the symptom: when VLC uses Pulseaudio for audio output, its sound becomes garbled after playing for a while, with heavy digital artefacts and echoing. Comment #23 in that report suggests modifying /etc/pulse/default.pa so that load-module module-udev-detect is followed by tsched=0. I’ve done that, and with it VLC seems to work fine with Pulseaudio. Furthermore, in comment #30 @David Henningsson asked those of us who suffer from this, and for whom the tsched=0 workaround works, to file our own reports for each specific hardware. This is my report.
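For anyone comparing setups, a quick way to see whether the workaround is in place is to look for tsched=0 on the module-udev-detect line. A minimal sketch; has_tsched0 is a made-up helper name, and it assumes the line has not been commented out or split across lines.

```shell
# Hypothetical helper: succeeds if the given PulseAudio config file has
# tsched=0 on its "load-module module-udev-detect" line.
has_tsched0() {
    grep -Eq '^[[:space:]]*load-module[[:space:]]+module-udev-detect.*tsched=0' "$1"
}
```

Usage would be something like: has_tsched0 /etc/pulse/default.pa && echo "tsched=0 workaround is active"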
I believe apport adds data about the hardware automatically. I’ll add that for me this only occurs with the Radeon HDMI output; through the analog output (via headphones) the audio works fine. As Bug #864735 describes, Radeon audio is off by default in recent kernels, but I’ve re-enabled it by passing the radeon.audio=1 kernel command-line parameter.
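To confirm that the running kernel was actually booted with that parameter, /proc/cmdline can be checked. A small sketch; radeon_audio_forced is a made-up helper name, and the optional file argument exists only so the check can be tried against a saved copy of the command line.

```shell
# Hypothetical helper: succeeds if radeon.audio=1 appears as a separate
# parameter on the kernel command line (default: the running kernel's).
radeon_audio_forced() {
    # One parameter per line, then require an exact whole-line match.
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'radeon\.audio=1'
}
```

Usage would be something like: radeon_audio_forced && echo "Radeon HDMI audio is force-enabled"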
If I switch to ALSA output for VLC (without tsched=0), VLC audio goes mute after a while. After some time of silence it sort of fast forwards itself to get up to sync with the video again. This keeps repeating, so it’s not really a workaround.
(My previous comment was after trying 3.2.0-14.)