This answer is starting to look good. I also found allow_discards in dm-crypt's current documentation; everything seems to imply it's not a kernel parameter but an option for the dm-crypt device-mapper target. I'm still trying to find out whether those can be passed on the Linux command line. That would explain the instructions parroted all over; otherwise it is probably just misinformation.
Sorry, I had forgotten about this one after it vanished and was only reminded by a private email from someone suffering from something similar.
I went through my collection of panic photos and, as I also recalled, there seem to have been none of the "warn_slowpath_common" kind since I last commented.
Except for one just a week ago, on completely new hardware: that one was with 3.8.0-rc2, which I was testing with respect to Bug #1096802, which turned out to be caused by bad card reader firmware. It was tied to usb-storage, as were most if not all of the panics caused by the firmware problem, so it was most likely another symptom of that. I'm posting it here too, though, just in case it still contains a hint of the conditions under which "warn_slowpath_common" can occur.
Meanwhile, I'm marking this as fixed as per Joseph's request above. For the record, as far as I'm concerned, installing a 3.3 or newer series kernel was a definite fix for this issue.
A lot of SSD-related instructions online currently say you should add allow-discards and root_trim=yes to your GRUB_CMDLINE_LINUX. I have yet to find one that says why you should do that, i.e. what exactly (if anything!) those parameters do. Where is the documentation on this, and what does it say about those two parameters' purpose?
According to Cryptsetup 1.4.0 Release Notes,
Since kernel 3.1, dm-crypt devices optionally (not by default) support block discards (TRIM) commands. If you want to enable this operation, you have to enable it manually on every activation using --allow-discards
cryptsetup luksOpen --allow-discards /dev/sdb test_disk
but is it the same when passed to the kernel (via GRUB_CMDLINE_LINUX)?
Edit: Kernel.org’s list of kernel parameters doesn’t (currently, Jan 2013, at least) have either of these options.
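For comparison, the persistent way to get the same effect as --allow-discards on Debian/Ubuntu is not a kernel parameter but the discard option in /etc/crypttab (see crypttab(5)), and whether the flag actually took effect can be checked with dmsetup. A sketch, assuming a mapping named test_disk as in the luksOpen command above:

```shell
# /etc/crypttab entry (sketch): the "discard" option makes cryptsetup
# activate the mapping with --allow-discards on every boot:
#
#   test_disk  /dev/sdb  none  luks,discard

# Verify on the running system: a mapping with discards enabled shows
# the allow_discards flag at the end of its device-mapper table line.
sudo dmsetup table test_disk
```

This is a config-level sketch, not a substitute for the missing documentation the question asks about; it only shows where the option is normally set.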
I'm happy to report that this seems to have been a firmware issue: with a temporary install of MS Windows, which the card reader manufacturer's firmware upgrade software required, I managed to upgrade the card reader from the firmware version 551 it shipped with to the manufacturer's current latest version, 563 (released just last month). After this there have been no more "disabled ep" messages in any boot, the reader works just fine, and there have been no kernel panics of any kind.
This was with the mainline 3.8 kernel so I’m not marking this bug invalid just yet. I’ve now switched back to the Quantal kernel I initially reported this with and will report here next week on how it goes.
The lsusb listing attached by apport above seems not to list the card reader at all. This did happen in some sessions; IIRC there were no panics or "disabled ep" messages then either, but naturally the reader also wouldn't read any cards, as if it were disconnected.
I’m attaching output of `sudo lsusb -v` here, with the card reader (004:002) detected and showing.
There's a clear pattern here, and it's definitely tied to xhci_hcd, usb-storage and the Akasa/Genesys card reader. Here's what I've done since reporting this:
1) Set "Legacy USB 3.0" in the BIOS from "Enabled" to "Disabled", and "Intel xHCI Mode" from "Smart Auto" to "Enabled". I tested briefly with the latter set to "Disabled" and Legacy 3.0 "Enabled", but then the card reader wasn't detected at all, and I'd prefer a working xHCI anyway.
2) Switched to mainline kernel 3.8.0-030800rc2-generic #201301022235. With the earlier kernels (3.5 and 3.2 from the Precise repos) things seemed similar to my findings below with mainline, but the data for 3.2 and 3.5 are too sparse to say conclusively that there's no difference at all. I've concentrated my testing on mainline just to keep things simpler.
Here’s the pattern with mainline:
1) Cold boot. Early in the boot, the card reader on USB #4 is asked to reset. This results in a flood of "xHCI xhci_drop_endpoint called with disabled ep ffff880403c8d500":
Jan 8 10:32:12 saegusa kernel: [ 721.015249] usb 4-4: reset SuperSpeed USB device number 2 using xhci_hcd
Jan 8 10:32:12 saegusa kernel: [ 721.032596] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880403c8d500
Jan 8 10:32:12 saegusa kernel: [ 721.032599] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880403c8d540
When these messages are present, the session will crash (panic/freeze) at some point. Using the card reader isn't necessary (it eventually panics without it too), but the crash is easy to trigger just by sticking an SD card into the reader: the reader reports buffer errors on the card and then boom.
Jan 8 10:32:13 saegusa kernel: [ 721.713295] sd 8:0:0:2: [sdf] Unhandled error code
Jan 8 10:32:13 saegusa kernel: [ 721.713299] sd 8:0:0:2: [sdf]
Jan 8 10:32:13 saegusa kernel: [ 721.713300] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jan 8 10:32:13 saegusa kernel: [ 721.713303] sd 8:0:0:2: [sdf] CDB:
Jan 8 10:32:13 saegusa kernel: [ 721.713304] Read(10): 28 00 00 00 20 00 00 00 08 00
Jan 8 10:32:13 saegusa kernel: [ 721.713314] end_request: I/O error, dev sdf, sector 8192
Jan 8 10:32:13 saegusa kernel: [ 721.713318] Buffer I/O error on device sdf1, logical block 0
Another USB #4-related message that often precedes a panic in syslog is this:
Jan 8 17:38:21 saegusa kernel: [ 148.282256] usb 4-4: Disable of device-initiated U1 failed.
Jan 8 17:38:21 saegusa kernel: [ 148.285747] usb 4-4: Disable of device-initiated U2 failed.
The panics, when visible, are always in Pid: usb-storage, and mostly of the "ring_doorbell_for_active_rings" type (above), but I had at least one "warn_slowpath_common" too (I'll attach a picture if requested).
2) Reboot after the panic: USB #4 doesn't get reset, and no "disabled ep" or "Disable of device-initiated …" messages appear in syslog. The card reader works perfectly (i.e. an SD card can be inserted and read/written without problems).
3) The panic isn't necessary to get the card reader working: it's enough to reboot after one cold boot in which the reset signal was sent. Just don't stay in that cold-boot session, because it will eventually panic.
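Given the pattern above, a doomed session can be told apart from a clean one right after boot by grepping the kernel log for the "disabled ep" signature quoted earlier. A hypothetical helper (my wording, not part of any tool):

```shell
# Check whether this session saw the bad xHCI reset flood described above.
# The message text is taken verbatim from the syslog excerpts in this report.
if dmesg | grep -q 'xhci_drop_endpoint called with disabled ep'; then
    echo 'bad session: the reader got the reset; reboot before it panics'
else
    echo 'clean session: card reader should work'
fi
```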
I’ll attach a complete syslog from yesterday and today to give context to what I’ve quoted above.
I don't see the syslog I mentioned attached above, so I'm attaching it here. Also, the USB-related activity just preceding the panic starts slightly earlier than I claimed above, with this reset attempt:
Jan 7 10:45:01 saegusa kernel: [ 413.991127] usb 4-4: reset SuperSpeed USB device number 2 using xhci_hcd
Last week I replaced the internals of my desktop computer with a new ASUS P8H77-M PRO, an Intel G2120 and 16 GB of RAM. With one round of Memtest passed, I booted into my old Precise install and got bitten hard by what appears to be Bug #993187: frequent hard lockups (multiple within a few hours of use). I installed linux-image-generic-lts-quantal (currently 188.8.131.52.29), and that seemed to resolve the lockups: I got more than two days of uptime (ending with an intentional shutdown), half of which I was actively using the computer.
This morning, soon (half an hour?) after login, the kernel panicked with a reference to usb-storage (I'll attach a picture). This was different from the lockups with the stock Precise (3.2) kernel: with those I never saw a panic (only the frozen desktop) and the system had to be powered off to reboot, whereas after this panic I could reboot using the chassis reset button.
There's some USB activity in syslog just prior to the panic (Jan 7 at around 10:50), but I wasn't using any USB devices at the time. I have used them with this kernel previously, though, without issues, and am currently too (to transfer the panic picture from my phone). There's a memory card reader/USB port panel (Akasa AK-ICR-17) permanently plugged into internal USB 2 and 3.
It was a rolling release, if I remember correctly… yes, Arch, just as I recalled.
From a similar question on ServerFault (and particularly one response there), one possible explanation for the disparity is that there are processes hanging on to files they’ve accessed on /tmp that have since been deleted.
# lsof | grep deleted
will list such files along with the processes still attached to them.
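If you want to see the effect for yourself, here's a minimal reproduction (file names and sizes are arbitrary, my choice): a process that keeps a deleted file open keeps its blocks allocated until the process exits, which is exactly the space that df still counts but du no longer sees.

```shell
# Minimal reproduction: a deleted-but-open file still occupies disk space.
tmp=$(mktemp /tmp/held.XXXXXX)
dd if=/dev/zero of="$tmp" bs=1M count=5 2>/dev/null
sleep 60 < "$tmp" &      # this process holds the file open on fd 0
pid=$!
rm "$tmp"                # gone from the directory tree...
ls -l "/proc/$pid/fd/0"  # ...but /proc shows it as "/tmp/held.XXXXXX (deleted)"
kill "$pid"              # the 5 MB are released once the holder exits
```

Restarting (or killing) the processes that `lsof | grep deleted` points at is what actually returns the space.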