Keyword: Linux

wistron_btns ”breaks” -pae: floods with ”Unknown key code 10”, causing severe slowdown

3 March 2012, 15:01
Location: Bug trackers: Kernel Bug Tracker
Keywords: Linux

My summary’s crap because this is difficult to summarize; hopefully the explanation below makes it clearer. I have little understanding of kernel internals, so I’ll first just try to describe the symptom as it appears.

I’ve come across an issue on my Fujitsu Siemens Amilo M7400 laptop with wistron_btns that is triggered by certain kernels, and once triggered, seems to affect all subsequent attempts to reboot with -pae kernels until a non-pae kernel is booted. I initially reported this on Launchpad [1].

I can currently trigger the issue by (cold or re-) booting 3.2.0-14-pae (these are Ubuntu’s packaged kernels) or by booting (for example) 3.3.0-030300rc4-generic-pae in recovery mode (= ”ro recovery nomodeset”). The recovery boot seems to work normally, but the 3.2.0-14-pae boot already exhibits the failure: it seemingly freezes. (More about the exact nature of ”failure” below.)

Once I’ve triggered the issue, rebooting with any -pae kernel fails the same way 3.2.0-14-pae does, regardless of preceding boots.

I can ”fix” this by booting a non-pae kernel (which never fails). After that subsequent reboots with -pae kernels (apart from 3.2.0-14-pae) no longer fail — not until I do any of the triggering actions again.

Now, the ”failure” looks like a freeze, but it’s actually just an extreme slowdown. With patience, I can actually have the boot finish and can inspect logs. Dmesg reveals that wistron_btns is repeating ”Unknown key code 10” over and over.
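
To gauge how badly the log is being flooded, one could count the repeats in a saved dmesg dump. This is a minimal Python sketch, assuming only that each flooded line contains the text "Unknown key code 10" quoted above. The function name and the sample lines are made up for illustration and do not reproduce the driver's exact message format.

```python
def count_flood(dmesg_text, needle="Unknown key code 10"):
    """Return how many lines of a dmesg dump contain the given message."""
    return sum(1 for line in dmesg_text.splitlines() if needle in line)


# Hypothetical sample lines; real input would be a saved copy of the dmesg output.
sample = "\n".join([
    "[   12.000001] Unknown key code 10",
    "[   12.000002] Unknown key code 10",
    "[   12.000003] some unrelated message",
])
print(count_flood(sample))  # → 2
```

Note that a plain substring match would also count a message such as "Unknown key code 100", so a stricter match may be needed on other hardware.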

If I comment wistron_btns out of /etc/modules so that it isn’t loaded, the issue goes away, meaning I can no longer trigger it.
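
The workaround amounts to prefixing the module's line in /etc/modules with a "#". This is a rough Python sketch of that edit, working on the file's text only. The helper name is hypothetical, the "lp" entry in the sample is just an illustrative neighbour, and writing the result back to /etc/modules (which needs root) is deliberately left out.

```python
def comment_out_module(modules_text, module="wistron_btns"):
    """Prefix with '#' every line whose first word is exactly `module`."""
    out = []
    for line in modules_text.splitlines():
        # Lines may carry module parameters, so compare only the first word.
        if line.split()[:1] == [module]:
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Illustrative file content, not the actual /etc/modules from this laptop.
before = "lp\nwistron_btns\n"
print(comment_out_module(before), end="")
```

Lines that are already commented out are left alone, because their first word starts with "#" and no longer equals the module name.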

As I said, I have little understanding of kernel bugs, so what I say next may be completely off, but the way I’ve interpreted this is that the ”brokenness” is actually hidden in the hardware, in something controlled by wistron_btns. Booting 3.2.0-14-pae, or recovery-booting any -pae kernel, puts the controller(?) in a ”broken” state from which a -pae kernel can’t recover, but a non-pae kernel can. And although -pae kernels later than 3.2.0-14 can’t recover a ”broken” controller, they also cannot put it into that ”broken” state (which is a good turn of development).
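
The interpretation above can be written down as a tiny two-state model, which makes it easier to check against a list of observed boots. This Python sketch is only a restatement of the guess, not a description of what the hardware really does. The function name, the "ok"/"slow" labels and the idea of a single hidden flag are assumptions, and so is the outcome of a recovery boot made while the flag is already set.

```python
def simulate(boots):
    """Replay (kernel, recovery) boots and return 'ok' or 'slow' for each."""
    broken = False  # hypothesised hidden state of the controller
    results = []
    for kernel, recovery in boots:
        if not kernel.endswith("-pae"):
            # Non-pae kernels never fail and clear the broken state.
            broken = False
            results.append("ok")
        elif kernel == "3.2.0-14-pae":
            # This kernel both sets the broken state and shows the slowdown.
            broken = True
            results.append("slow")
        elif recovery:
            # Recovery boot seems fine but leaves the state broken.
            # Assumption: it is slow if the state was already broken.
            results.append("slow" if broken else "ok")
            broken = True
        else:
            # Later -pae kernels neither set nor clear the state.
            results.append("slow" if broken else "ok")
    return results


pae = "3.3.0-030300rc4-generic-pae"
print(simulate([(pae, False), (pae, True), (pae, False),
                ("3.3.0-030300rc4-generic", False), (pae, False)]))
```

If a real boot sequence ever disagrees with the model's output, the single-flag guess would need revising.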

I’ll be happy to provide more info as requested. I’m attaching dmesg output for starters.

* [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926012

With persistent booting I was able to get a panic

3 March 2012, 11:25
Location: Bug trackers: Launchpad
Keywords: Intel, Linux

With persistent booting I was able to get a panic [1] showing with 3.3.0-030300rc4, and it looks the same as what the dmesg I posted in #29 [2] showed: print_bad_pte+0x187/0x1e0 is on top of the Trace. Despite the numerous boots I was still unable to reproduce the initial printk+0x2d/0x2f, so it may be fixed in Main or masked by the print_bad_pte+0x187/0x1e0 (though this is still based on only two datapoints in a frustratingly random issue).

Whether RC6 is enabled or disabled doesn’t seem to have any bearing on this. 3.2.0-17 produces printk+0x2d/0x2f either way [3], and 3.2.0-18.28 also panics, though less consistently: I was only able to produce a sure printk+0x2d/0x2f once [4], with 3.2.0-18.28 non-pae. Mostly the errors fail to reveal themselves, and when they do, they differ both from printk+0x2d/0x2f and from each other: a couple of times a warn_slowpath_common+0x72/0xa0 (as in Bug #917668, though the hardware and pointers are different) occurred [5], and once it was a Bad page map [6] in unity-greeter.

* [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/32
* [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/29
* [3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/33
* [4] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/34
* [5] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/35
* [6] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926007/comments/36

I’ll upload a bunch of new screenshots for reference

3 March 2012, 11:20
Location: Bug trackers: Launchpad
Keywords: Intel, Linux

I’ll upload a bunch of new screenshots for reference. They’re all related to testing this so bear with me, I’ll explain them further after uploading.

Still present in upstream 3.3.0-030300rc4 as it was in 3.2.0-17.27.

2 March 2012, 21:57
Location: Bug trackers: Launchpad
Keywords: Linux

Still present in upstream 3.3.0-030300rc4 as it was in 3.2.0-17.27.

Couldn’t verify that the panic is still present in 3.3.0-030300rc4

2 March 2012, 21:53
Location: Bug trackers: Launchpad
Keywords: Intel, Linux

I tested 3.3.0-030300rc4 and couldn’t verify that the panic that all the 3.2’s above have is still present. Unfortunately I couldn’t prove that it’s gone either: with -intel, the first boot resulted in the ’low graphics mode’ failsafe dialog with Traces in dmesg (I’m attaching it). All subsequent boots resulted in panics that didn’t reveal a Trace, so they may or may not have been the one at hand. The panics still occurred when LDM should’ve launched; visually, the screen showed either just the last lines of the boot log or those lines with the mouse cursor. (The what’s-that-key was also blinking on the keyboard.)

I’ve been waiting for an i386 build of RC5 to appear in the directory but it hasn’t

2 March 2012, 17:25
Location: Bug trackers: Launchpad
Keywords: Intel, Linux

I’ve been waiting for an i386 build of RC5 to appear in the directory but it hasn’t. Should I try RC4 instead or keep waiting until a newer i386 build appears? AMD64 isn’t supported by the processor.

Unfortunately I didn’t make a note of when exactly the issue began. But I can give you a timeframe: it wasn’t there when I filed Bug #903831 on 2011-12-13, probably still not there on 2011-12-16 when I made comment #5 on the bug, and probably was there when I made comment #7 on that bug on 2012-01-06. (I’m being cautious with the ’probablies’ because of all the overlapping issues here.)

3.2.0-17.27 seems to be interchangeable with 3.2.0-17.26

29 February 2012, 17:03
Location: Bug trackers: Launchpad
Keywords: Linux

3.2.0-17.27 seems to be interchangeable with 3.2.0-17.26 in what I described above, i.e. no change wrt. this bug.

@jsalisbury: Yeah, my Intel hardware’s got its own set of problems. :) I’ll get back to doing tests on those later this week.

28 February 2012, 17:07
Location: Bug trackers: Launchpad
Keywords: Intel, Linux

@jsalisbury: Yeah, my Intel hardware’s got its own set of problems. :) I’ll get back to doing tests on those later this week.

I’ve just had a GPU lockup with 3.3.0-030300rc4-generic

26 February 2012, 20:20
Location: Bug trackers: Launchpad
Keywords: Linux, Radeon

I’ve just had a GPU lockup with 3.3.0-030300rc4-generic. Would this be yet another issue (i.e. neither this bug nor bug #938894)? If so, where (if anywhere) should I file it? I’m pasting below what syslog had about it.

Feb 26 19:53:38 saegusa kernel: [ 8031.040107] radeon 0000:01:05.0: GPU lockup CP stall for more than 419884msec
Feb 26 19:53:38 saegusa kernel: [ 8031.040113] GPU lockup (waiting for 0x00103023 last fence id 0x00103022)
Feb 26 19:53:38 saegusa kernel: [ 8031.040125] [drm] Disabling audio support
Feb 26 19:53:38 saegusa kernel: [ 8031.041548] radeon 0000:01:05.0: GPU softreset
Feb 26 19:53:38 saegusa kernel: [ 8031.041550] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
Feb 26 19:53:38 saegusa kernel: [ 8031.041551] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
Feb 26 19:53:38 saegusa kernel: [ 8031.041553] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20000040
Feb 26 19:53:38 saegusa kernel: [ 8031.041559] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Feb 26 19:53:38 saegusa kernel: [ 8031.056444] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
Feb 26 19:53:38 saegusa kernel: [ 8031.072335] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
Feb 26 19:53:38 saegusa kernel: [ 8031.072337] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
Feb 26 19:53:38 saegusa kernel: [ 8031.072339] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20008040
Feb 26 19:53:38 saegusa kernel: [ 8031.073333] radeon 0000:01:05.0: GPU reset succeed
Feb 26 19:53:38 saegusa kernel: [ 8031.094096] [drm] PCIE GART of 512M enabled (table at 0x00000000C0040000).
Feb 26 19:53:38 saegusa kernel: [ 8031.094153] radeon 0000:01:05.0: WB enabled
Feb 26 19:53:38 saegusa kernel: [ 8031.094156] [drm] fence driver on ring 0 use gpu addr 0xa0000c00 and cpu addr 0xffff88020f609c00
Feb 26 19:53:38 saegusa kernel: [ 8031.127631] [drm] ring test on 0 succeeded in 1 usecs
Feb 26 19:53:38 saegusa kernel: [ 8031.127656] [drm] ib test on ring 0 succeeded in 1 usecs
Feb 26 19:53:38 saegusa kernel: [ 8031.127659] [drm] Enabling audio support
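
When comparing lockup logs like the one above, it can help to see which status registers changed across the soft reset. This is a small Python sketch that assumes the `R_<offset>_<NAME>=0x<value>` layout seen in these lines. The function name is made up, and the sample is a shortened copy of four of the register lines with the syslog prefix stripped.

```python
import re

# Matches e.g. "R_000E50_SRBM_STATUS=0x20000040" as printed in the log above.
REG = re.compile(r"R_[0-9A-F]{6}_(\w+)=(0x[0-9A-Fa-f]+)")


def register_history(log_text):
    """Map each register name to the values logged for it, in order."""
    history = {}
    for name, value in REG.findall(log_text):
        history.setdefault(name, []).append(int(value, 16))
    return history


log = """\
radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20000040
radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20008040
"""
for name, values in register_history(log).items():
    if values[0] != values[-1]:
        # Print the bits that differ between the first and last reading.
        print(name, hex(values[0] ^ values[-1]))  # → SRBM_STATUS 0x8000
```

The sketch only reports which bits differ. What those bits mean would have to be looked up in the radeon driver's register definitions.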

Running 3.3.0-030300rc4-generic now

23 February 2012, 17:10
Location: Bug trackers: Launchpad
Keywords: Linux

@jsalisbury: Alright, thanks. I’m running 3.3.0-030300rc4-generic now. Let’s see how it works out!
