Dummies' guide to BTRFS incremental backups


    I've been trying to understand some of the methods for doing incremental backups with btrfs but I'm not that good of a bash shell programmer. I have managed to create some scripts that work for me and thought I'd document them here in case they might be helpful for others.

    My goal was to run daily snapshots and then create a backup from those snapshots so I can recover from a disaster. I also manually create snapshots and backups before major upgrade events in case I need to roll back those changes, but that is independent of my daily stuff.

    Assumptions:
    1. In my /etc/fstab I have an entry to mount my root (/) at /subvol, and an entry for my external drive to mount at /data.
    2. The external drive is formatted as btrfs
    3. My snapshots are stored in /subvol/snapshots.
    4. My backups are stored in /data/snapshots.

    To get the whole thing started I needed an initial script to create the first snapshot and backup.
    create-initial-snapshot-backup.sh
    Code:
    #!/bin/bash
    #
    if [[ $EUID -ne 0 ]]; then
       echo -e  "must be run as root. use sudo"
       exit 1
    fi
    
    btrfs su sn -r /subvol/@ /subvol/snapshots/@_backup
    btrfs su sn -r /subvol/@home /subvol/snapshots/@home_backup
    sync
    
    btrfs send /subvol/snapshots/@_backup | btrfs receive /data/snapshots
    btrfs send /subvol/snapshots/@home_backup | btrfs receive /data/snapshots
    Next I needed a script to do the daily incremental stuff
    incremental-snapshot-backup.sh
    Code:
    #!/bin/bash
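    #
    # Abort on the first failed command (including a failed send in a pipeline),
    # so the old parent snapshots are never deleted after a failed transfer.
    set -euo pipefail
    exec 1> >(logger -s -t "$(basename "$0")") 2>&1 # Log the script activity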
    
    if [[ $EUID -ne 0 ]]; then
       echo -e  "must be run as root. use sudo"
       exit 1
    fi
    
    btrfs su sn -r /subvol/@ /subvol/snapshots/@_backup-new
    sync
    btrfs su sn -r /subvol/@home /subvol/snapshots/@home_backup-new
    sync
    
    btrfs send -p /subvol/snapshots/@_backup /subvol/snapshots/@_backup-new | btrfs receive /data/snapshots
    sync
    btrfs send -p /subvol/snapshots/@home_backup /subvol/snapshots/@home_backup-new | btrfs receive /data/snapshots
    sync
    
    btrfs su de /subvol/snapshots/@_backup
    btrfs su de /subvol/snapshots/@home_backup
    
    mv /subvol/snapshots/@_backup-new /subvol/snapshots/@_backup
    mv /subvol/snapshots/@home_backup-new /subvol/snapshots/@home_backup
    
    btrfs su de /data/snapshots/@_backup
    btrfs su de /data/snapshots/@home_backup
    mv /data/snapshots/@_backup-new /data/snapshots/@_backup
    mv /data/snapshots/@home_backup-new /data/snapshots/@home_backup
    For my special snapshot-backups before major events I use the following script.
    create-date-tagged-snapshot-backup.sh

    Code:
    #!/bin/bash
    #
    if [[ $EUID -ne 0 ]]; then
       echo -e  "must be run as root. use sudo"
       exit 1
    fi
    
    snapshot_name=$(date +%Y-%m-%d)
    
    btrfs su sn -r /subvol/@ /subvol/snapshots/@_$snapshot_name
    sync
    btrfs su sn -r /subvol/@home /subvol/snapshots/@home_$snapshot_name
    sync
    
    btrfs send /subvol/snapshots/@_$snapshot_name | btrfs receive /data/snapshots
    btrfs send /subvol/snapshots/@home_$snapshot_name | btrfs receive /data/snapshots
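    One thing to watch with the date tag: running the script twice on the same day fails because the snapshot name already exists. Here is a minimal sketch of one way around that, appending -1, -2, and so on, like the 20220215-1 names used later in this thread. The function name unique_tag is made up for illustration and is not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): print a date tag
# that is not yet used under directory $1 for the subvolume prefix $2.
unique_tag() {
    local dir=$1 prefix=$2
    local base tag n=0
    base=$(date +%Y-%m-%d)
    tag=$base
    # Append -1, -2, ... until no snapshot with that name exists.
    while [[ -e "$dir/${prefix}_$tag" ]]; do
        n=$((n + 1))
        tag="$base-$n"
    done
    printf '%s\n' "$tag"
}

# Possible use in the script above (assumption: @ and @home are always
# snapshotted together, so checking @ is enough):
#   snapshot_name=$(unique_tag /subvol/snapshots @)
```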
    To run the daily script I use a systemd timer and service. So in /etc/systemd/system I have the following files:

    kubuntu-btrfs-snapshot.service
    Code:
    # This service unit is for daily btrfs-snapshots
    #
    [Unit]
    Description=Daily btrfs snapshots
    
    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/incremental-snapshot-backup.sh
    kubuntu-btrfs-snapshot.timer
    Code:
    # This timer unit is for kubuntu-btrfs-snapshot.service
    #
    [Unit]
    Description=start kubuntu-btrfs-snapshot.service
    
    [Timer]
    Unit=kubuntu-btrfs-snapshot.service
    OnCalendar=*-*-* 06:00:00
    AccuracySec=5minutes
    RandomizedDelaySec=10minutes
    Persistent=true
    
    [Install]
    WantedBy=timers.target
    Once those files are created you just need to enable the timer.
    Code:
    sudo systemctl enable --now kubuntu-btrfs-snapshot.timer
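    To confirm the timer is scheduled and to see what the last run logged, something like the following should work. It is only a sketch: the wrapper function check_backup_timer is a made-up name, and the journal tag assumes the script is installed as /usr/local/bin/incremental-snapshot-backup.sh as in the service file above.

```shell
#!/bin/bash
# Hypothetical wrapper around standard systemd commands; the unit and
# script names are the ones used in this post.
check_backup_timer() {
    # When the timer last fired and when it will fire next.
    systemctl list-timers kubuntu-btrfs-snapshot.timer
    # Result of the most recent run of the service.
    systemctl status kubuntu-btrfs-snapshot.service --no-pager
    # Lines the script logged through "logger -t <script name>".
    journalctl -t incremental-snapshot-backup.sh --since today --no-pager
}

# Run it by hand when you want to check on things:
#   check_backup_timer
```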
    Last edited by jfabernathy; Feb 14, 2022, 09:33 AM.

    #2
    Clear and lucid! Nice job, especially the use of systemd units to automate the process.
    One question: how are you limiting the dated snapshots so that they don't accumulate and eat up all the disk space?
    "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
    – John F. Kennedy, February 26, 1962.



      #3
      Originally posted by GreyGeek View Post
      Clear and lucid! Nice job, especially the use of systemd units to automate the process.
      One question: how are you limiting the dated snapshots so that they don't accumulate and eat up all the disk space?
      The manual use of the dated backups is going to have to be handled as needed. I figure that I do one prior to a major upgrade and watch the system for a week and then delete it. Right now my boot drive is 500GB with @ and @home only having about 12GB. The daily stuff is replaced daily so no build up there. My external backup is 4TB so I think it's manageable for now.

      And of course all the important files needed to rebuild a system from scratch are on my NAS and also in the Cloud.
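      If the manual cleanup ever gets tedious, a small helper could list the dated snapshots that fall outside the newest few. This is only a sketch under the naming used by the date-tagged script (@_YYYY-MM-DD and @home_YYYY-MM-DD); the function name prune_candidates is made up, and it only prints names rather than deleting anything.

```shell
#!/bin/bash
# Hypothetical helper: print the dated snapshots of one subvolume that
# are older than the newest $3.
# $1 = snapshot directory, $2 = prefix such as "@" or "@home", $3 = how many to keep.
prune_candidates() {
    local dir=$1 prefix=$2 keep=$3
    local -a snaps=()
    local path i
    # Match only names like @_2022-02-15, so @_backup is never listed.
    for path in "$dir/${prefix}"_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*; do
        [[ -d $path ]] && snaps+=("$path")
    done
    # Glob results are sorted by name, and ISO dates sort chronologically,
    # so the oldest snapshots come first.
    local count=${#snaps[@]}
    for ((i = 0; i < count - keep; i++)); do
        printf '%s\n' "${snaps[i]}"
    done
}

# Review the list first, then delete by hand, for example:
#   prune_candidates /subvol/snapshots @ 2
#   btrfs subvolume delete <each path you are happy to lose>
```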




        #4
        Originally posted by jfabernathy View Post
        Right now my boot drive is 500GB with @ and @home only having about 12GB. The daily stuff is replaced daily so no build up there... My external backup is 4TB...
        You delete the previous backup snapshots both on the source and the backup volumes. Unless you have a very active system, I suggest there's no need to, other than the naming issue.

        Perhaps you have another mechanism for regular snapshots.

        Being able to look back at the state of files into the past just by navigating a few directories is a boon I didn't appreciate till I'd resorted to it a few times. A typical use case is a file that has been damaged in some way, usually by inadvertent deletions.

        And, on the backup volume, you are giving up a major advantage of using btrfs incremental backups. My backup hard drive has weekly (ish) backups going back a year, and monthly back two years. Because the snapshots are incremental on the backup volume too, data that haven't changed only take up space once, just as they do on the source side.

        Now, every so often, a few times a year, I have to check that source or backup volumes are not getting too full. It helps that I've identified data that are subject to a lot of churn and don't need to be in backups, and arrange for them to be in other subvolumes. The main offenders are browsers' caches, and big downloads.
        Regards, John Little



          #5
          Originally posted by jlittle View Post
          You delete the previous backup snapshots both on the source and the backup volumes. Unless you have a very active system, I suggest there's no need to, other than the naming issue.

          Perhaps you have another mechanism for regular snapshots.

          Being able to look back at the state of files into the past just by navigating a few directories is a boon I didn't appreciate till I'd resorted to it a few times. A typical use case is a file that has been damaged in some way, usually by inadvertent deletions.

          And, on the backup volume, you are giving up a major advantage of using btrfs incremental backups. My backup hard drive has weekly (ish) backups going back a year, and monthly back two years. Because the snapshots are incremental on the backup volume too, data that haven't changed only take up space once, just as they do on the source side.

          Now, every so often, a few times a year, I have to check that source or backup volumes are not getting too full. It helps that I've identified data that are subject to a lot of churn and don't need to be in backups, and arrange for them to be in other subvolumes. The main offenders are browsers' caches, and big downloads.
          There are 2 reasons I don't save more snapshots on either side.

          1. Once this system is in full operation, I will only log into it once a month or so. Its primary function is to be the NAS of the home network, a media server, and a MythTV DVR for the home. The NAS and media data are on my mirror and are not snapshotted, but sent to the cloud. The boot drive rarely needs new software, but does get the usual system updates/upgrades.

          2. I don't fully understand the snapshots or the backups.

          I think that snapshots are simpler to understand. My thinking is that they are a "snapshot" in time of the system storage. In the case of weekly snapshots, you can look back at the condition of your system at the weekly points in time when each snapshot was taken, and deleting one has no effect on the more recent snapshots or the current state of your system. At least I hope it works this way.

          My concern is mostly with the incremental backups created with send/receive. I'm not sure how to save the incremental backups for some period of time. I get that the initial backup is complete and the incremental backups are just the changes, but somehow I can delete the old backup and "mv" the newest to old status and it's supposed to make things right. I don't get that part. I just copied my backup code from a wiki. Maybe that's why people use snapper to avoid thinking about all this.

          I'm also concerned that if I did understand it I could not translate that into a script that would work the way I want.
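          For what it's worth, keeping several incremental backups around might not need much more than picking the right parent each time: the newest snapshot that exists on both the source and the backup drive. Below is a rough sketch of that idea, assuming dated names like the ones from my date-tagged script. The function latest_common is a made-up name, and it only compares names; it does not verify that the copy on the backup drive really is the received version of the same snapshot.

```shell
#!/bin/bash
# Hypothetical helper: print the newest dated snapshot name that exists
# in both the source directory ($1) and the backup directory ($2) for
# the subvolume prefix $3 ("@" or "@home"). Returns non-zero if none.
latest_common() {
    local src=$1 dst=$2 prefix=$3
    local path name latest=""
    # Globs expand in sorted order, so the last match is the newest date.
    for path in "$src/${prefix}"_[0-9]*; do
        name=${path##*/}
        # A usable parent must be present on the source AND on the backup.
        if [[ -d $path && -d $dst/$name ]]; then
            latest=$name
        fi
    done
    [[ -n $latest ]] && printf '%s\n' "$latest"
}

# Possible use, with $new being today's snapshot name (nothing is deleted,
# so older backups stay available on both sides):
#   parent=$(latest_common /subvol/snapshots /data/snapshots @home)
#   btrfs send -p "/subvol/snapshots/$parent" "/subvol/snapshots/$new" | btrfs receive /data/snapshots
```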



            #6
            Your rationale makes sense to me.

            I don't get that part.
            A snapshot is a subvolume just like the original, of equal status so to speak. It's mostly just another pointer to the same place in the data tree. When data are updated in one subvolume, that part of the tree gets split a little, with a new branch just big enough to hold the new data. A send traverses the tree, finds where these splits are, and sends just them to the receiving side to replicate the branching there.
            Regards, John Little



              #7
              Originally posted by jlittle View Post
              Your rationale makes sense to me.



              A snapshot is a subvolume just like the original, of equal status so to speak. It's mostly just another pointer to the same place in the data tree. When data are updated in one subvolume, that part of the tree gets split a little, with a new branch just big enough to hold the new data. A send traverses the tree, finds where these splits are, and sends just them to the receiving side to replicate the branching there.
              So to test the incremental backup, I created a R/O snapshot of @ and @home called 20220215@ and 20220215@home. As normal this is instantaneous. Then I did a send/receive backup; 20220215@ took several minutes and 20220215@home took 10 seconds.

              Next I created a test file in /home and in / so there would be differences in the system since the last snapshot. I took a new pair of snapshots called 20220215-1@ and 20220215-1@home. This time I did an incremental backup.
              Code:
              btrfs send -p /subvol/snapshots/20220215@ /subvol/snapshots/20220215-1@ | btrfs receive /data/snapshots/
              btrfs send -p /subvol/snapshots/20220215@home /subvol/snapshots/20220215-1@home | btrfs receive /data/snapshots/
              The send/receive was much faster as expected this time. But the puzzling thing is both the first and second snapshots were full size. I was expecting the second (incremental) snapshot to be just the differences?
              Code:
              du -h -d 1 /data/snapshots/
              7.8G /data/snapshots/@_backup
              514M /data/snapshots/@home_backup
              7.8G /data/snapshots/20220215@
              514M /data/snapshots/20220215@home
              7.8G /data/snapshots/20220215-1@
              514M /data/snapshots/20220215-1@home
              25G /data/snapshots/
              So from where I sit, the incremental backup reduces the time needed to create the backup, but not the storage needs. In my case 8 GB per backup pair is not too much for a dozen or so saved backups, but I don't see the savings in storage space.
              Last edited by jfabernathy; Feb 15, 2022, 07:18 PM.



                #8
                Disk usage with btrfs is a murky art. The standard Linux du does not know anything about btrfs subvolumes sharing data and counts the same space multiple times, so your listing gives little indication of "the storage needs". The figure for /data/snapshots is, if not bogus, very misleading.

                For example, given a subvolume with 1 GB, then snapshot it 10 times. du will say there's 10 GB, but there's only 1 GB, plus some extra metadata. Delete the first subvolume, and there's still 1 GB used.

                And it's not just sharing between subvolumes; one can cp --reflink files within a subvolume, taking up only some metadata space, but du will count it twice. In principle, btrfs supports deduplication, which would be brilliant but make determining usage even murkier.

                I've heard it argued that this mess is implicit with COW file systems.

                There are tools that can give a clearer picture, but I haven't used any of them. btrfs-snapshot-diff appears to give a fair idea where space has gone, assuming an ordering of subvolumes; whether the ordering given is always useful I've no idea.
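                One built-in option worth a look is btrfs filesystem du, which reports Total, Exclusive and Set shared columns per path, so shared data is not counted once per snapshot. As a rough sketch, the helper below adds up the Exclusive column, i.e. the data that each listed snapshot does not share with anything else. The function name sum_exclusive is made up, and it assumes the output starts with a single header line and that --raw is used so sizes are plain bytes.

```shell
#!/bin/bash
# Hypothetical helper: read "btrfs filesystem du -s --raw" output on
# stdin and print the sum of the Exclusive column in bytes.
sum_exclusive() {
    # Skip the header line, then add up column 2 (Exclusive).
    awk 'NR > 1 { total += $2 } END { printf "%d\n", total }'
}

# Possible use (needs root and a btrfs mount; the path is the one from
# earlier in this thread):
#   btrfs filesystem du -s --raw /data/snapshots/* | sum_exclusive
```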
                Regards, John Little



                  #9
                  Thanks, it kind of makes sense that Linux has to think BTRFS snapshots are complete copies of a directory so all Linux commands can operate on the directory just like in EXT4, etc. All the COW stuff is behind the scenes. I installed the btrfs-snapshot-diff tools and it shows more of what I expected to see. So it is a saving of space on the drive, you just don't see it with normal disk tools.



                    #10
                    The btrfs incremental backup requires that a copy of the "parent" snapshot be on the destination subvol. The send command compares the parent with the current snapshot and just sends the differences. The receive command looks for the full copy of the parent and creates a new snapshot using the current snapshot name, which it then adds the differences to.
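                    Since the parent has to be present on both sides, a script could check for it before choosing between an incremental and a full send. A minimal sketch of that check follows; the function build_send_cmd is a made-up name, it only tests that the directories exist, and it prints the command for review instead of running it.

```shell
#!/bin/bash
# Hypothetical helper: print the send/receive pipeline to use.
# $1 = source snapshot dir, $2 = backup dir, $3 = parent name, $4 = new snapshot name.
build_send_cmd() {
    local src=$1 dst=$2 parent=$3 new=$4
    if [[ -d $src/$parent && -d $dst/$parent ]]; then
        # Parent exists on both sides: an incremental send is possible.
        echo "btrfs send -p $src/$parent $src/$new | btrfs receive $dst"
    else
        # Parent missing somewhere: fall back to a full send.
        echo "btrfs send $src/$new | btrfs receive $dst"
    fi
}

# Example with the snapshot names listed in this post:
#   build_send_cmd /mnt/snapshots /backup @202202141813 @202202152053
```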

                    A btrfs send & receive command, parent or not, results in a FULL copy of the current snapshot on the destination subvolume.

                    I recently added a 1 TB NVMe PCIe M.2 2280 Samsung EVO 960 SSD to my system and divided it into two partitions: a 500 GB partition labeled "BACKUP" and the rest labeled "DATA". My install partition, /dev/sda3, shows:
                    $ sudo btrfs filesystem usage /
                    ...
                    Overall:
                    Device size: 441.04GiB
                    Device allocated: 143.05GiB
                    Device unallocated: 297.99GiB
                    Device missing: 0.00B
                    Used: 138.24GiB
                    Free (estimated): 301.15GiB (min: 301.15GiB)
                    Data ratio: 1.00
                    Metadata ratio: 1.00
                    Global reserve: 218.52MiB (used: 0.00B)

                    Data,single: Size:140.01GiB, Used:136.85GiB (97.74%)
                    /dev/sda3 140.01GiB
                    Metadata,single: Size:3.01GiB, Used:1.39GiB (46.31%)
                    /dev/sda3 3.01GiB
                    System,single: Size:32.00MiB, Used:48.00KiB (0.15%)
                    /dev/sda3 32.00MiB
                    Unallocated:
                    /dev/sda3 297.99GiB
                    Here is what is on /dev/sda3
                    # vdir /mnt/snapshots/
                    total 0
                    drwxr-xr-x 1 root root 346 Feb 8 15:59 @202202101404
                    drwxr-xr-x 1 root root 334 Feb 10 16:28 @202202112359
                    drwxr-xr-x 1 root root 334 Feb 10 16:28 @202202121903
                    drwxr-xr-x 1 root root 334 Feb 10 16:28 @202202132029
                    drwxr-xr-x 1 root root 334 Feb 10 16:28 @202202141813
                    drwxr-xr-x 1 root root 334 Feb 10 16:28 @202202152053
                    As you can see, I have six -r snapshots. Here is what is on my BACKUP drive:
                    # vdir /backup
                    total 0
                    drwxr-xr-x 1 root root 334 Feb 12 00:01 @202202112359
                    drwxr-xr-x 1 root root 334 Feb 12 20:15 @202202121903
                    drwxr-xr-x 1 root root 334 Feb 13 20:31 @202202132029
                    drwxr-xr-x 1 root root 334 Feb 14 18:43 @202202141813
                    drwxr-xr-x 1 root root 334 Feb 15 20:54 @202202152053
                    Both my primary and my backup drive are 500 GB. On my primary I have the complete KDE Neon User Edition installed, plus a ton of Python and Jupyter development software, Minecraft, Universe Sandbox^2, Stellarium, and a ton of other software, in addition to six snapshots. It still has 297 GiB of unallocated disk space. If I try to send a sixth snapshot to my BACKUP without first deleting the oldest, the process will fail with an "out of space" error message. The reason is simple: the snapshots on my primary are not fully populated, and the oldest is usually deleted (I keep no more than six) after I add the seventh. However, the send and receive commands result in a fully populated copy of the snapshot being sent. Using send without -p takes about 25 minutes. Using -p usually takes less than a minute.
                    (Of course, you'll notice that I have combined @home with @ so that I have only one subvolume, @, which is all I need to snapshot.)

                    So, as you've written, the savings of space isn't seen with the normal file management tools.
                    Last edited by GreyGeek; Feb 16, 2022, 03:07 PM.
