zfs issues

This topic is closed.
    #1

    Hi All,

    My first post. Wish it were for something more pleasant. I'm relatively new to Kubuntu, but I've been using Linux and KDE since the RH5 days. I built a new system earlier this year and took a different path than I typically have in the past. I've always used hardware RAID for my bulk storage, but I read about the advantages of a software RAID like ZFS, so I decided to give it a try. I installed Kubuntu 19.10, set up a ZFS RAID, and off I went.
    Things worked perfectly for a while, but a few weeks back a batch of routine updates seemed to throw everything sideways. My RAID was no longer there. I tried a few things, got it working (so I thought), and life was good again, until the next reboot. The RAID was gone again and I couldn't get it back. I saw the 20.04 release and decided to try it in hopes that it would fix things. The install was fast and uneventful, but it didn't help my RAID.

    I've poked around the interwebs looking for a solution, but nothing seems to fit my problem (or maybe I'm so new to zfs that I don't know how to connect the dots). Here are some of the things I've tried and the results.
    Code:
    zpool status -x
    no pools available
    Code:
    zpool list
    no pools available
    Code:
    sudo zpool import -D
    no pools available to import
    Code:
    sudo zpool create raid raidz2 /dev/sda /dev/sdc /dev/sdd /dev/sdf /dev/sdg
    invalid vdev specification
    use '-f' to override the following errors:
    /dev/sda1 is part of exported pool 'raid'
    /dev/sdc1 is part of exported pool 'raid'
    /dev/sdd1 is part of exported pool 'raid'
    /dev/sdf1 is part of exported pool 'raid'
    /dev/sdg1 is part of exported pool 'raid'
    I don't remember ever exporting the pool and I'm not confident enough to override those errors until I know more about what I'm doing. The physical drives and partitions all appear to be healthy. Anyone have any thoughts on what I should try next? Hoping not to destroy over 6TB of data. TIA.

    #2
    Have you tried
    sudo zpool import -d /dev raid
    assuming your pool name is "raid"?
    "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
    – John F. Kennedy, February 26, 1962.



      #3
      Originally posted by GreyGeek View Post
      Have you tried
      sudo zpool import -d /dev raid
      assuming your pool name is "raid"?
      Thanks! That appears to have done something at least. I can see the array now. Here's what zpool status is returning.
      Code:
      zpool status
      pool: raid
      state: DEGRADED
      status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
      action: Wait for the resilver to complete.
      scan: resilver in progress since Thu May 21 20:15:00 2020
            2.85T scanned at 13.3G/s, 77.3G issued at 362M/s, 7.48T total
            30.9G resilvered, 1.01% done, 0 days 05:57:34 to go
      config:
      
            NAME        STATE     READ WRITE CKSUM
            raid        DEGRADED     0     0     0
              raidz2-0  DEGRADED     0     0     0
                sda     DEGRADED     0     0     0  too many errors
                sdc     ONLINE       0     0     0  (resilvering)
                sdd     DEGRADED     0     0     0  too many errors
                sdf     DEGRADED     0     0     0  too many errors
                sdg     ONLINE       0     0     0  (resilvering)
      errors: List of errors unavailable: permission denied
      
      errors: 1 data errors, use '-v' for a list
      Those errors don't look promising. Not really a happy camper right now. I thought ZFS was supposed to be rock solid. I'll wait for it to finish, of course, but if this is the sort of stuff I'll be dealing with, it will be kicked to the curb in short order. Any recommendations for a better RAID solution? Thanks again.



        #4
        Any recommendations for a better RAID solution? Thanks again.
        He asks that of the author of the first post in the BTRFS subforum ...

        However, I'd recommend doing what the error message says ...

        The pool will continue to function, possibly in a degraded state.
        action: Wait for the resilver to complete.
        ZFS is pretty solid.

        BTW, my first foray into Linux was RH5 on May 1, 1998.
        I've been running Linux ever since, and KDE since SuSE 5.3 in September of 1998.

        Almost 5 years ago I began using BTRFS.
        I've tried every configuration of it, and switched back and forth between configurations. I settled on a SINGLETON for my SSD sda and an archive for my SSD sdb and my HD sdc.
        Since 14.04 I've been running the LTS and installed BTRFS as the root filesystem on 16.04 LTS.

        I am currently running Kubuntu 20.04 with BTRFS as the root fs.
        Whether running 1-, 2- or 3-drive RAIDs, I've never had a failure or even a hiccup.
        I keep around 4 snapshots of @ and @home on my sda and do incrementals to sdb and sdc.
        On occasion I slap on a USB drive and send a snapshot set to it.
        I always do snapshots and sends manually because it is so easy.
        Never had a failure doing that either.

        https://btrfs.wiki.kernel.org/index.php/Status#RAID56
        Last edited by GreyGeek; Jun 19, 2020, 01:40 PM.
        "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
        – John F. Kennedy, February 26, 1962.



          #5
          So, I too am suddenly having issues with my ZFS file system. Last week I had to replace one of my HDDs with a spare, due to a lot of checksum errors that appeared suddenly (as in, when I powered it on). Today I powered it on and got a bunch more, over several drives. I am currently waiting for the resilver to finish. Q. Are there any ZFS support forums to ask this question on, or is this forum OK with it? Q. ZFS vs BTRFS? Is there much of a difference?

          PS. I just remembered: a month or so ago, one HDD showed problems. I swapped it out, tested it, then put it back. It's not one of the HDDs I currently have problems with. Could having this HDD out for a few days and then putting it back cause any problems, if it wasn't reformatted?



            #6
            GreyGeek may be the only real ZFS expert around here. Search his posts for his findings and conclusions.

            Most of us that foray into file systems other than EXT use BTRFS and that includes GreyGeek.

            Honestly, there's not much discussion about ZFS on this forum, but if you all solve your issues it would be helpful to post what you find out.

            Please Read Me



              #7
              I must apologize that I never followed up with the resolution to my issue. I let the resilvering process run for over 6 days before I gave up. I had a good backup, so I just blew the whole thing away and started over. The one thing I did differently was instead of just creating folders on the root, I created zfs datasets. They look just like folders so I don't know what the significance is there other than maybe zfs manages them differently. I was still getting some errors even though everything was new and empty, so I ran the command "zpool clear raid" and it got happy. It's been rock solid ever since. (Fingers crossed that I didn't just jinx it.)
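              For reference, creating a dataset instead of a plain folder and clearing the old error counters boils down to something like this (the pool name "raid" matches my setup; the dataset name is only an example):
              Code:
              sudo zfs create raid/media     # new dataset, mounted at /raid/media by default
              sudo zfs list                  # confirm it shows up as a dataset
              sudo zpool clear raid          # reset the pool's error counters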

              Here are the steps that have worked for me when zfs starts acting up and the array doesn't appear.

              Code:
              sudo zpool import
              This should give you some basic info about the array. Next,
              Code:
              sudo zpool import -f raid
              This should mount the array. Check it to make sure with:
              Code:
              sudo zpool status
              If this comes back good then you should be gold.



                #8
                I'm glad you jumped in with your ZFS experience, Neoneuro.
                BTW, I use Hard Disk Sentinel GUI to monitor the health of my drives.

                Far from being the resident ZFS expert, I don't even qualify as a neophyte, since I only tested ZFS in a VM for a few weeks to see if it could match or beat BTRFS and make my backup life easier. In my own experience, since I do not run a server, I found it couldn't beat, and in many areas couldn't match, BTRFS.

                For those who are interested, ZFS depends on a large amount of RAM, especially error-correcting (ECC) RAM; ECC isn't required, just recommended. Most people running it on laptops don't use ECC RAM. I have only 16GB of RAM. Reading about the problem CharlieDaves is having suggested to me that he has bad RAM.

                Also, ZFS has dozens of switches that the user may have to set for his/her circumstances. BTRFS has only one, which turns the rw switch on a snapshot on or off. (The send command to send a BTRFS snapshot to another ROOT_FS only works if the snapshot is read-only.) All other adjustments to BTRFS are automatic.

                Another area of difference is recovery using a snapshot. If I have BTRFS snapshots A, B, C, D and E, where A is the oldest and E is the most recent, and I restore from, say, C, the other snapshots remain and are not harmed. If they were ZFS snapshots and I restored from C, then D and E would be lost. But ZFS is more complicated than just A-through-E snapshots. When I made a single snapshot of /home it created 11 different snapshots, as shown in this forum posting. They were confusing, and ten too many for me.

                Last fall I merged @home into @ so that I need to make only one snapshot per day. I keep a week of snapshots (7) on my main SSD and send each one to my archive SSD using the incremental switch ("-p") of the send command. The creation of the daily snapshot takes only an instant. Sending it to the archive SSD takes only about 15 seconds. I've been using BTRFS for about 6 years now, and it has not presented me with a single problem. I have written 15.89TB to my primary SSD and 919GB to my archive SSD.
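                For illustration, the daily routine amounts to something like this (the mount points and snapshot names are only examples, not my exact paths):
                Code:
                # read-only snapshot of today's @
                sudo btrfs subvolume snapshot -r /mnt/ssd/@ /mnt/ssd/snapshots/@_20210217
                # incremental send to the archive SSD, using yesterday's snapshot as the parent
                sudo btrfs send -p /mnt/ssd/snapshots/@_20210216 /mnt/ssd/snapshots/@_20210217 | sudo btrfs receive /mnt/archive/snapshots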
                Last edited by GreyGeek; Feb 17, 2021, 07:55 PM.
                "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
                – John F. Kennedy, February 26, 1962.



                  #9
                  Originally posted by GreyGeek View Post
                  I'm glad you jumped in with your ZFS experience, Neoneuro.
                  "Hey what about me. It isn't fair"
                  Originally posted by GreyGeek View Post
                  BTW, I use Hard Disk Sentinel GUI to monitor the health of my drives.
                  Hmmm. Interesting program. I'll try smartmontools, as it is an Ubuntu-recommended package.
                  Originally posted by GreyGeek View Post
                  Far from being the resident ZFS expert, I don't even qualify as a neophyte, since I only tested ZFS in a VM for a few weeks to see if it could match or beat BTRFS and make my backup life easier. In my own experience, since I do not run a server, I found it couldn't beat, and in many areas couldn't match, BTRFS. For those who are interested, ZFS depends on a large amount of RAM, especially error-correcting (ECC) RAM; ECC isn't required, just recommended. Most people running it on laptops don't use ECC RAM. I have only 16GB of RAM. Reading about the problem CharlieDaves is having suggested to me that he has bad RAM.
                  I'm about to run UBCD and test memory.

                  FYI, I have 2 ZFS pools (actually 3; one is external). Never had a problem with the first, older vdev pool, save for the occasional drive going offline due to not being available when the PC is turned on. (I just power off, wiggle all the connectors, and switch it on again.) If I had RAM issues, they would be affecting both vdev pools. If you, GreyGeek, are not as expert as advertised, do you know of any other ZFS support forums? I only ask because the ones I found when searching appeared to be for other places where ZFS exists, like macOS, open NAS, etc. I was hoping for ZFS support regardless of where it's used. I asked a support question on openNAS once before; still waiting for a reply 5 years later.
                  Originally posted by GreyGeek View Post
                  Also, ZFS has dozens of switches that the user may have to set for his/her circumstances. BTRFS has only one, which turns the rw switch on a snapshot on or off. (The send command to send a BTRFS snapshot to another ROOT_FS only works if the snapshot is read-only.) All other adjustments to BTRFS are automatic. Another area of difference is recovery using a snapshot.

                  If I have BTRFS snapshots A, B, C, D and E, where A is the oldest and E is the most recent, and I restore from, say, C, the other snapshots remain and are not harmed. If they were ZFS snapshots and I restored from C, then D and E would be lost. But ZFS is more complicated than just A-through-E snapshots. When I made a single snapshot of /home it created 11 different snapshots, as shown in this forum posting. They were confusing, and ten too many for me. Last fall I merged @home into @ so that I need to make only one snapshot per day. I keep a week of snapshots (7) on my main SSD and send each one to my archive SSD using the incremental switch ("-p") of the send command. The creation of the daily snapshot takes only an instant. Sending it to the archive SSD takes only about 15 seconds. I've been using BTRFS for about 6 years now, and it has not presented me with a single problem. I have written 15.89TB to my primary SSD and 919GB to my archive SSD.
                  I could copy and paste a section of the TAX code/laws if you like.

                  Would make just as much sense as this did.



                    #10
                    Oh Shhhh... Sugar. I just found out that 4 of the 5 HDDs in this failing RAIDZ are normal NAS 5900 RPM drives, and the replacement is a NAS Pro at 7200 RPM. Is this a problem? I remember reading yonks ago, when I first looked into ZFS, that it couldn't care less about the drives: it just uses whatever vdevs you allocate as one device and spreads parity across all of them, unlike some hardware RAID, which dedicates an actual HDD to parity. This was one selling point that I liked. The next hard part was trying to get it to work correctly within Linux. FYI: when creating a zpool, add the vdevs (or disks) as /dev/disk/by-id.

                    Do NOT use /dev/sda1 or /dev/sda... If you connect a new disk or an external USB drive, this may upset the order of the disks, and your zpool will fail. Using disk-by-id means that wherever the disk ends up, it will be found and used correctly.
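                    A minimal sketch of what that looks like (the disk IDs below are placeholders; use the names that ls shows for your own drives):
                    Code:
                    ls -l /dev/disk/by-id/            # find the stable name of each drive
                    sudo zpool create raid raidz2 \
                        /dev/disk/by-id/ata-EXAMPLE_DISK_1 \
                        /dev/disk/by-id/ata-EXAMPLE_DISK_2 \
                        /dev/disk/by-id/ata-EXAMPLE_DISK_3 \
                        /dev/disk/by-id/ata-EXAMPLE_DISK_4 \
                        /dev/disk/by-id/ata-EXAMPLE_DISK_5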



                      #11
                      Originally posted by CharlieDaves View Post
                      "Hey what about me. It isn't fair" Hmmm. Interesting program. I'll try smartmontools as it is a ubuntu recommended software I'm about to run UBCD and test memory.
                      In order to use HD Sentinel GUI you must have smartmontools installed because that is where the GUI gets its data. When I was using smartmontools alone (CLI) I had to compute the TB Written and calculate the estimated life remaining in the drive. The GUI does that automatically.
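                      For anyone who wants to do that calculation by hand, something like this works on many SSDs (the attribute name and its units vary by vendor; many report Total_LBAs_Written in 512-byte units, so check your drive's documentation):
                      Code:
                      sudo smartctl -A /dev/sda | grep -i lbas_written
                      # TB written ~ raw value x 512 / 1e12, assuming 512-byte units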

                      Originally posted by CharlieDaves View Post
                      FYI, I have 2 ZFS pools (actually 3; one is external). Never had a problem with the first, older vdev pool, save for the occasional drive going offline due to not being available when the PC is turned on. (I just power off, wiggle all the connectors, and switch it on again.) If I had RAM issues, they would be affecting both vdev pools. If you, GreyGeek, are not as expert as advertised, do you know of any other ZFS support forums? I only ask because the ones I found when searching appeared to be for other places where ZFS exists, like macOS, open NAS, etc. I was hoping for ZFS support regardless of where it's used. I asked a support question on openNAS once before; still waiting for a reply 5 years later. I could copy and paste a section of the TAX code/laws if you like.

                      Would make just as much sense as this did.
                      I said I wasn't even a neophyte. What more proof do you need?
                      I know of one ZFS forum that Ubuntu supports specifically for ZFS: https://ubuntuforums.org/tags.php?tag=zfs

                      Originally posted by CharlieDaves View Post
                      Oh Shhhh... Sugar. I just found out that 4 of the 5 HDDs in this failing RAIDZ are normal NAS 5900 RPM drives, and the replacement is a NAS Pro at 7200 RPM. Is this a problem? I remember reading yonks ago, when I first looked into ZFS, that it couldn't care less about the drives: it just uses whatever vdevs you allocate as one device and spreads parity across all of them, unlike some hardware RAID, which dedicates an actual HDD to parity. This was one selling point that I liked. The next hard part was trying to get it to work correctly within Linux. FYI: when creating a zpool, add the vdevs (or disks) as /dev/disk/by-id.
                      There seems to be a difference of opinion on mixing rust with SSDs.
                      https://jrs-s.net/2018/07/14/zfs-doe...isks-and-ssds/



                      Originally posted by CharlieDaves View Post
                      Do NOT use /dev/sda1 or /dev/sda... If you connect a new disk or an external USB drive, this may upset the order of the disks, and your zpool will fail. Using disk-by-id means that wherever the disk ends up, it will be found and used correctly.
                      FOR SURE!
                      Never, never, never use "/dev/sdX" to mount a drive, either manually or in fstab. My preferred approach is to use the blkid command to find the UUID of the desired drive:
                      Code:
                      :~# blkid
                      /dev/sda1: UUID="ce2b5741-c01e-4b3d-b6ba-401ad7f7fcdf" UUID_SUB="e4e0902f-6a80-47cd-a53a-571632f78cc5" TYPE="btrfs" PARTUUID="e00dfb49-01"
                      /dev/sdc1: UUID="e84e2cdf-d635-41c5-9f6f-1d0235322f48" UUID_SUB="c78731d5-d423-4546-9335-f9751c148174" TYPE="btrfs" PARTUUID="dc864468-01"
                      /dev/sdb1: LABEL="sdb1" UUID="17f4fe91-5cbc-46f6-9577-10aa173ac5f6" UUID_SUB="4d5f96d5-c6c6-4183-814b-88118160b615" TYPE="btrfs" PARTUUID="5fa5762c-9d66-4fdf-ba8f-5c699763e636"
                      and use the UUID part, and mount it thusly:
                      Code:
                      mount /dev/disk/by-uuid/ce2b5741-c01e-4b3d-b6ba-401ad7f7fcdf /mnt
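                      The same idea applies in /etc/fstab; a sketch of an entry using the UUID above (the mount point and options here are just examples):
                      Code:
                      UUID=ce2b5741-c01e-4b3d-b6ba-401ad7f7fcdf  /mnt/data  btrfs  defaults  0  0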

                      When I first set up my BTRFS I used two spinners and mounted them as /dev/sda1 and /dev/sdb1 to make a RAID. Later, I replaced my CDROM with an HD caddy and plugged it in as my 3rd drive. When I looked for it, expecting to find /dev/sdc, I found instead that my RAID was now composed of /dev/sda1 and /dev/sdc1 and the new drive was /dev/sdb.

                      Since then I've never used /dev/sdX to mount a drive of any type.
                      Last edited by GreyGeek; Feb 17, 2021, 08:38 PM.
                      "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
                      – John F. Kennedy, February 26, 1962.



                        #12
                        As John McEnroe stated, several times: "You have got to be F***ing kidding me." Trying to copy off data before this ZFS RAID gives up the ghost. Every time I power on my PC, I do a "zpool status" just to check. For this conversation: it indicated the 2nd drive in the vdev had errors, with 29 reads, 13 writes, 1 checksum. Just now, all is fine, resilvering in progress. 20 minutes later, resilvering completed, and all drives in the vdev are still OK. This is doing my nut in. This is really not fun. YEP.

                        Read your article on mixing rust. But honestly, I didn't realise I had screwed up. I just ordered a spare drive months ago (just in case) and last week I swapped it in. Today I realised it was the wrong RPM. Until the data is safe elsewhere, it will have to do. YEP.
                        Originally posted by GreyGeek View Post
                        I said I wasn't even a neophyte. What more proof do you need?
                        Just looked it up. Newbie or script kiddie. YEP.

                        Read some information on BTRFS. Still don't understand it, except that some people prefer it over ZFS. ZFS is now open source, and according to the 'other' people (those who don't like BTRFS), they like it. Final question: what exactly is this setting or switch that you prefer, which ZFS doesn't do? Please? Just so I know.



                          #13
                          Originally posted by CharlieDaves View Post
                          Read some information on BTRFS. Still don't understand it, except that some people prefer it over zfs.
                          Curious as to what it is about BTRFS you don't understand? From a ZFS outsider's viewpoint, it seems maybe it's so dang simple that you just think you've missed something? I can't really comment on ZFS, as I've never had one reason given to me why it's better than BTRFS, and to be fair I've never bothered to spend much time looking. Posts like yours and the obvious complexity of ZFS have kept me well clear of it. I actually USE my computers, so what seems like hours of maintenance and "fixing" things turned me off of ZFS for good long ago. Good to hear that it's open source, if that's actually so. The last post I read from Linus about it said "no way" is ZFS going in the kernel, but that was a year ago.

                          I have been using BTRFS since tools v0.19, which is 2009. IIRC it was available in *buntu 09.04, and I have been using it constantly since then. GG is probably not too far behind me. We drummed up enough interest in it here to start a subforum dedicated to it. Over the last 4-5 years many members have started using it, or at least playing with it. Still, there are some who will stick to the dinosaur EXT until it goes the way of ReiserFS (actually, ReiserFS was better than EXT will ever be). Regardless, it's good to have more ZFS expertise on the forum.

                          Please Read Me



                            #14
                            Originally posted by Neoneuro View Post
                            I let the resilvering process run for over 6 days before I gave up.
                            Most of this thread is way over my head (except that I'm one of the btrfs converts) but...

                            Do you know about SMR (shingled magnetic recording) drives and ZFS?

                            Resilvering for days sounds exactly like the reported problems. Some drives were sold without making it plain they were SMR, and they cause grief with ZFS. If you have got any SMR drives, you may be able to get them replaced.
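                            One way to check is to pull the exact model number and look it up against the manufacturer's published SMR/CMR lists (smartctl itself doesn't always say whether a drive is SMR):
                            Code:
                            sudo smartctl -i /dev/sda     # note the "Device Model", then check the vendor's SMR list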
                            Regards, John Little



                              #15
                              Originally posted by CharlieDaves View Post
                              ....
                              Read your article on mixing rust. But honestly, I didn't realise I had screwed up. I just ordered a spare drive months ago (just in case) and last week I swapped it in. Today I realised it was the wrong RPM. Until the data is safe elsewhere, it will have to do. YEP. Just looked it up. Newbie or script kiddie. YEP.
                              Actually, I have been programming since I took "Programming" from the Barns School of Business in 1959. There I learned how to operate the IBM 540 Gangpunch and plug the breadboard on the IBM 402 Tabulator, which used those Hollerith cards. Coming soon was the IBM 700 series transistorized computer, IIRC. After I graduated I was 18 and looked 14, and no one offered me a job "programming". An opportunity came to attend a junior college in Nebraska and I took it. Later, in Texas, while in grad school earning an MS in Science, I learned FORTRAN IV, entering physics and math programs on a KSR-133 keyboard that punched holes in a yellow tape, which was then rolled up and sent to the local bank to run on their Honeywell B200 mainframe, also IIRC. Ten years later, in Sept of 1978, I bought the first Apple II sold in the state of Nebraska. By 1980 I had quit teaching and started my own computer consulting business, mainly writing accounting software and using my physics and math doing forensics for local law enforcement. One of my clients, a college, asked me to computerize their school, then asked me to teach computer programming classes. I agreed as long as I could continue with my business. My last client, a state finance dept, gave me an offer my wife wouldn't let me refuse, so I finished my programming career spending 11 years there, and I enjoyed every minute. From 1959 to 2008 I learned and used perhaps a dozen programming languages, maybe more. My favorite was Forth. I retired in 2008 and haven't written much software since then, but at 79 I'm certainly not a kiddie!

                              Wrong RPM? Well, Life is notorious for giving the test first, and then the lesson. I've taken many lessons from it.

                              Originally posted by CharlieDaves View Post
                              Read some information on BTRFS. Still don't understand it, except that some people prefer it over ZFS. ZFS is now open source, and according to the 'other' people (those who don't like BTRFS), they like it. Final question: what exactly is this setting or switch that you prefer, which ZFS doesn't do? Please? Just so I know.
                              If you have the right hardware, then ZFS is a fine file system. IMO, it's just too complicated for a personal laptop. It also has too many settings, and folks who go exploring without understanding what those settings do are striking sparks around gunpowder. As I pointed out before, when I was trying ZFS and took a snapshot of my home account, I was stunned to find it had created 11 different snapshots. Why so complicated? I don't know, and I decided I didn't want to find out. I don't have enough gray cells left to run ZFS.

                              BTRFS, OTOH, automatically tunes itself. The only human-adjustable setting is the one which changes the rw status of a snapshot. (The send command can only send RO snapshots to remote subvolumes, or to non-BTRFS filesystems if the "-F" parameter is used.) If one makes all their snapshots RO, then before they mv a @YYYYMMDD snapshot to @ they have to switch it to RW. RO snapshots are fixed; they freeze that moment in time in the FS when they are taken. Making a snapshot without the "-r" parameter creates a RW snapshot that can be changed by COW while you are using your system. I always keep one around to recover changes made between RO snapshots, but all of my archival snapshots are RO.
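                              A quick sketch of that one switch (the snapshot path is just an example):
                              Code:
                              # check whether a snapshot is read-only
                              sudo btrfs property get -ts /mnt/ssd/snapshots/@_20210217 ro
                              # flip it to read-write before moving it into place as @
                              sudo btrfs property set -ts /mnt/ssd/snapshots/@_20210217 ro false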

                              As far as stability is concerned, I have been using BTRFS for over 6 years and have not experienced a single problem with it, even while trying various RAIDS and a Singleton, which I am running now.

                              So, bottom line, Linux is blessed with three fine filesystems: BTRFS, ZFS and EXT4. Pick your poison.
                              Last edited by GreyGeek; Feb 18, 2021, 05:56 PM.
                              "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
                              – John F. Kennedy, February 26, 1962.
