I still don’t have a surefire recipe for reproducing this, but it seems especially prone to occur when I have multiple GNOME Terminal windows open, or one with multiple tabs, in addition to other apps. I have a gut feeling that four times out of five it’s triggered by switching from something else to the GNOME Terminal windows (with the mouse, via the launcher).
I went back through the Precise kernels [1] as far as 3.0.0-12.20. There seem to be no easy answers: now even 3.0.0-12.20 crashes with -intel.
I think this means either that the hardware is broken, or that the issue has been lurking in kernels all the way back to (at least) 3.0.0-12.20 and was only triggered by some early Precise updates (during the time window I described above). As I said, it (definitely) wasn’t there when I filed Bug #903831 on 2011-12-13, because with this one in the way I couldn’t have gotten far enough to trigger that bug.
I’ll attach screenshots of the current results with the early Precise kernels below, just in case there’s anything useful there.
I think I’ll try ruling out hardware failure with Oneiric, either with the live disc (if that uses -intel) or by reinstalling.
* [1] https://launchpad.net/ubuntu/precise/+source/linux/
v3.3-rc6 still crashes, regardless of whether RC6 is enabled or disabled.
That’s a negative: this one persists.
Confirming: the fix works.
By booting repeatedly I was able to get a panic [1] to show with 3.3.0-030300rc4, and it looks the same as the dmesg I posted in comment #29 [2]: print_bad_pte+0x187/0x1e0 is at the top of the trace. Despite numerous boots I was still unable to reproduce the original printk+0x2d/0x2f, so it may be fixed in mainline or masked by the print_bad_pte+0x187/0x1e0 (though this is still based on only two data points in a frustratingly random issue).
Whether RC6 is enabled or disabled doesn’t seem to have any bearing on this. 3.2.0-17 produces printk+0x2d/0x2f either way [3], and 3.2.0-18.28 also panics, though less consistently: I was able to produce a definite printk+0x2d/0x2f only once [4], with the 3.2.0-18.28 non-PAE kernel. Mostly the errors don’t reveal themselves, and when they do, they differ both from printk+0x2d/0x2f and from each other: a couple of times a warn_slowpath_common+0x72/0xa0 occurred [5] (as in Bug #917668, though the hardware and pointers are different), and once it was a “Bad page map” [6] in unity-greeter.
* [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/32
* [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/29
* [3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/33
* [4] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/34
* [5] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/35
* [6] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/36
I’ll upload a bunch of new screenshots for reference. They’re all related to testing this, so bear with me; I’ll explain them after uploading.
Still present in upstream 3.3.0-030300rc4 as it was in 3.2.0-17.27.
I tested 3.3.0-030300rc4 and couldn’t verify that the panic shared by all the 3.2 kernels above is still present. Unfortunately, I couldn’t prove that it isn’t, either: with -intel, the first boot ended in the ‘low graphics mode’ failsafe dialog, with traces in dmesg (I’m attaching it). All subsequent boots ended in panics that didn’t reveal a trace, so they may or may not have been the one at hand. The panics still occurred at the point where LDM should have launched; the screen showed either just the last lines of the boot log, or those plus the mouse cursor. (An indicator light on the keyboard was also blinking.)
I’ve been waiting for an i386 build of RC5 to appear in the directory, but it hasn’t. Should I try RC4 instead, or keep waiting until a newer i386 build appears? The processor doesn’t support AMD64.
Unfortunately, I didn’t make a note of exactly when the issue began, but I can give you a timeframe: it wasn’t there when I filed Bug #903831 on 2011-12-13, it probably still wasn’t there on 2011-12-16 when I made comment #5 on that bug, and it probably was there when I made comment #7 on that bug on 2012-01-06. (I’m being cautious with the ‘probably’s because of all the overlapping issues here.)