Interesting apparent 5.19 kernel dmesg bug re. SSD - seems harmless though...


    Kind of a long tale, so don't be afraid to skip this post, LOL

    I had a couple of older SATA SSDs on the shelf (250GB Samsung 840 Pros with almost 60,000 power-on hours), and my server had two open SATA ports, so I put the SSDs in there. A little extra space couldn't hurt, right?

    Formatted them both as a single BTRFS file system (not RAID, just JBOD) using the whole disks: full-disk format, no partition table. Not long after, I ran dmesg and found these messages repeated thousands of times:

    [Fri Jan 5 10:47:32 2024] ata1.00: Enabling discard_zeroes_data
    [Fri Jan 5 10:47:54 2024] ata5.00: Enabling discard_zeroes_data


    About every 5 minutes it repeats, ad nauseam. ata1 and ata5 are the SATA ports these two drives are connected to. Note that there are two other SATA-connected drives with BTRFS full-disk formatting that do not produce this message; however, both are HDDs, not SSDs. There is also another SSD that does not produce the messages.
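
    For anyone wanting to confirm the "every 5 minutes" pattern on their own machine, here is a rough Python sketch that counts the messages per ATA port and averages the gap between them. It assumes the text comes from dmesg with human-readable timestamps in the same format as the two lines quoted above; the function name is just something I made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
# [Fri Jan 5 10:47:32 2024] ata1.00: Enabling discard_zeroes_data
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+(?P<port>ata\d+\.\d+): Enabling discard_zeroes_data"
)


def summarize_discard_messages(dmesg_text):
    """Return {port: {"count": n, "mean_gap_s": seconds or None}}."""
    stamps_by_port = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # unrelated kernel message
        # Collapse runs of spaces (single-digit days are often padded).
        ts_text = " ".join(match.group("ts").split())
        stamp = datetime.strptime(ts_text, "%a %b %d %H:%M:%S %Y")
        stamps_by_port[match.group("port")].append(stamp)

    summary = {}
    for port, stamps in stamps_by_port.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        summary[port] = {
            "count": len(stamps),
            "mean_gap_s": sum(gaps) / len(gaps) if gaps else None,
        }
    return summary
```

    Feed it the saved dmesg output as a string. A port that only logged the message once gets a mean gap of None.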

    Web searching revealed a few similar complaints. Most of the time the "fix" was a new SATA cable or the drive itself was failing. But most of these posts reference several other errors along with the above message. Seemed unlikely that I would have grabbed two new SATA cables that were bad and both drives checked out just fine. So I kept digging.

    I found a couple reports determining that the message (note - just a message, not ERROR or WARNING) had something to do with some SSDs and Trim and even mentioned Samsung drives as being more susceptible to this occurrence. I finally found a suggestion that a kernel parameter would stop the dmesg spam:

    libata.force=noncq

    I tried it, but it made no difference.
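
    One sanity check worth doing with any kernel parameter is confirming that the running kernel actually booted with it. A minimal Python sketch, assuming the usual /proc/cmdline location and the comma-separated libata.force=[ID:]VAL syntax; the function names are my own:

```python
def libata_force_options(cmdline):
    """Return the list of libata.force values found in a kernel command line."""
    options = []
    for token in cmdline.split():
        if token.startswith("libata.force="):
            value = token.split("=", 1)[1]
            # Multiple settings may be given as a comma-separated list.
            options.extend(v for v in value.split(",") if v)
    return options


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

    Calling libata_force_options(read_kernel_cmdline()) on the server should include "noncq" if the parameter took effect at boot. An empty list would mean the bootloader change never made it into the running kernel.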

    Then later that day I was reviewing what to actually do with the space provided by two drives and worked out that I didn't want them how I had initially configured them. So I split them up and partitioned the drive on SATA1 (/dev/sda) as a backup boot device - which meant a partition table - and left the drive on SATA5 as a whole disk BTRFS device.

    So now the messages still appear, but only for the drive on SATA5! This kind of proves my assumption that it's not a cable or the drives. The only change was reformatting sda and adding a partition table. The other SSD that does not produce the messages has also been formatted with a partition table.

    So I think I've discovered a bug (or whatever) that appears in a very specific combination of events:

    SSD
    BTRFS whole drive format
    Kernel 5.19.0-91
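
    To see at a glance which disks match the first two conditions, something like the Python sketch below could be used. It assumes the standard sysfs layout (queue/rotational is 0 for non-rotational drives, and each partition shows up as a subdirectory containing a "partition" file). It does not check the file system type, and describe_disk is a made-up name.

```python
from pathlib import Path


def describe_disk(name, sysfs_root="/sys/block"):
    """Report whether a disk is non-rotational and which partitions it has."""
    disk = Path(sysfs_root) / name
    rotational = (disk / "queue" / "rotational").read_text().strip()
    # Partition directories (e.g. sda1) each contain a file named "partition".
    partitions = sorted(
        entry.name for entry in disk.iterdir() if (entry / "partition").is_file()
    )
    return {"ssd": rotational == "0", "partitions": partitions}
```

    A disk that reports "ssd": True with an empty partition list would be a whole-disk SSD candidate like the one on SATA5.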

    I'm still getting the messages even though the drive isn't even mounted, which seems to indicate that it's not discard/trim and has nothing to do with drive activity.

    I had already moved 163GB of data to the drive, but when I get the time I may move the data off, add a partition table to the drive, and see if the message goes away. Since the kernel and install are old (Ubuntu Server 20.04), I'm not going to bother reporting this. If it remains after I eventually upgrade to 24.04 later this year, then I'll report it.
