    Discussion - BTRFS backups, full vs. incremental

    I've recently been party to a few short discussions about using "Incremental" backups with BTRFS vs. a "Full" backup. I thought it would be good to have a discussion about the differences and maybe weigh the pluses and minuses.

    I'm a big fan of using Incremental backups. Why?

    Advantages
    • Saves time - in some cases a considerable amount.
    • Can act as a "rollback" or secondary backup to prevent unintended file loss.

    Disadvantages
    • Not as straightforward - you have to keep your mind wrapped around what you're doing.
    • Takes a few more commands (assuming you're using the command line to do this rather than a script).
    • May take up more space on your source file system than a simple or "Full" backup.


    Initially, you must make a full backup of your subvolume, but from then on you need not. You can send a partial backup containing only the differences between the previous backup and the current state. This is where the extra space may be consumed: for a partial backup to be calculated, there must be something to compare against - specifically, the previous backup. So to use incremental backups, you must keep at least one previous backup snapshot.

    For this discussion, I will refer to @home as the source subvolume (the one being backed up). All the backups will be named @home_backup along with a number so we can keep them straight. All snapshots/backups will be read-only, as this is required for send|receive, so I will leave out further references to read-only to save some words. Root access is required for this and I typically use "sudo -i" to begin a root session prior to making backups. I will also abbreviate the btrfs commands as allowed by the command line.
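
    One assumption in the examples that follow: the backup file system is already mounted at /mnt/backup. Getting there looks something like this, with the device name being whatever your backup drive actually is:

    sudo -i
    mount /dev/sdb1 /mnt/backup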

    Here's the process for incremental backups.
    Take a snapshot of @home as @home_backup1:
    btrfs su sn -r @home @home_backup1

    Send this snapshot to the backup file system:
    btrfs send @home_backup1 | btrfs receive /mnt/backup/

    Now you have a full backup of your @home subvolume. Next week I want to make a new backup, so I take another snapshot:
    btrfs su sn -r @home @home_backup2

    But rather than sending the entire subvolume again, I will only send the difference:
    btrfs send -p @home_backup1 @home_backup2 | btrfs receive /mnt/backup/
    The "-p" switch here means "parent" as is @home_backup1 is the parent of @home_backup2

    I now have my @home subvolume and two backup snapshots on my main file system:
    @home @home_backup1 @home_backup2

    and two subvolumes in my backup file system:
    @home_backup1 @home_backup2

    Here's the part you have to understand and keep your mind around:
    The only unique subvolume is @home on the main file system (assuming any changes were made after sending @home_backup2). This will always be true.

    @home_backup1 contains the data as it was when the initial backup snapshot was made. @home_backup2 is a complete snapshot of @home a week later, but the only space it consumes of its own is for the changes to @home that occurred during that week - the differences from the time @home_backup1 was made to the time @home_backup2 was made. This is the same on both the main file system and the backup file system.

    So looking at just @home_backup1 and @home_backup2 - remember, snapshots share file data where they overlap. In other words, any files that are unchanged from the first snapshot to the second do not make the second snapshot larger. This is also why incremental backups are faster: only the changes are sent.
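
    If you want to see the sharing for yourself, "btrfs filesystem du" reports how much data each snapshot holds exclusively versus shared. A quick check, assuming the snapshots live under /mnt:

    btrfs fi du -s /mnt/@home_backup1 /mnt/@home_backup2
    # "Exclusive" is data unique to that snapshot; "Set shared" is data the listed subvolumes have in common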

    Once the send|receive operation is complete, you are free to delete the older backup snapshots, but you must retain the latest snapshot on both the main and (obviously) the backup file systems. The reason is so that you may continue to send only a small portion of data each week. In other words, once @home_backup2 is received as a backup, you may delete @home_backup1 in both locations. Why? Because when you delete the "parent" snapshot, any file data shared with the "child" snapshot remains in place. Only the data unique to @home_backup1 - the parts that changed between the two snapshots - is actually freed.
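
    Using the names in this example, that cleanup is just two delete commands:

    btrfs su delete @home_backup1              # on the main file system
    btrfs su delete /mnt/backup/@home_backup1  # on the backup file system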

    Putting numbers and times to the operation to make the concept clearer (these are totally made up, as actual times and sizes will vary wildly):
    Full backup:
    My @home subvolume of 60GB takes two hours to transmit.
    If I do a full backup every week, it takes two hours every week to send the backup.

    Incremental backup:
    If the changes to @home are about 1GB every week, an incremental backup takes only 1/60th of the time - about 2 minutes each week - to complete.
    The additional commands to delete the previous backups take only a few seconds.

    So you can see there may be a tremendous time savings. The downside? Since you must keep the previous snapshot on your source file system to retain the ability to use incremental backups, there will be some additional space used by this extra subvolume. Exactly how much will depend on the changes made to the original subvolume and how long you go between backups. The great thing is: by simply retaining one previous backup snapshot, you can have a complete and up-to-date backup without spending the time required to send all the data every time.

    If you opt to keep several backups rather than delete all the previous ones, you retain a "rollback" ability to a specific week, or the ability to restore a file deleted some time ago. Using my example of a weekly backup, if you retain 5 backups you can go back a full month to recover an accidental deletion. Obviously, a longer interval would usually mean larger backups, so retaining a year's worth of backups might get cumbersome.

    So should you use full or incremental backups? The choice depends on your backup strategy, the time interval between backups, the backup device (installed in your computer or external), and the type of data you're backing up. I use different strategies depending on my use. For example, I don't often add music or videos to my permanent collections so a bi-monthly backup without retaining any rollback is sufficient. My work documents folder undergoes changes almost daily so a month's worth of incremental backups protects me from accidental deletions.
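
    For anyone who wants to automate the weekly routine above, here's a rough sketch of a script. The paths, snapshot location, and naming scheme are only examples - adjust them to your own layout, and prune old snapshots according to whatever retention you settle on:

    Code:
    #!/bin/bash
    # rough sketch - adjust these paths to your own layout
    SRC=/mnt/@home                 # subvolume being backed up
    SNAPDIR=/mnt                   # where backup snapshots live on the source fs
    DEST=/mnt/backup               # mounted backup file system
    NEW=@home_backup_$(date +%Y%m%d)

    # the newest existing backup snapshot, if any, becomes the parent
    PREV=$(ls -d ${SNAPDIR}/@home_backup_* 2>/dev/null | sort | tail -n 1)

    # read-only snapshot of the source
    btrfs subvolume snapshot -r "$SRC" "${SNAPDIR}/${NEW}"

    if [ -n "$PREV" ]; then
        # incremental: send only the differences from the previous snapshot
        btrfs send -p "$PREV" "${SNAPDIR}/${NEW}" | btrfs receive "$DEST"
    else
        # first run: full send
        btrfs send "${SNAPDIR}/${NEW}" | btrfs receive "$DEST"
    fi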


    #2
    Excellent post. As a reference, my @ and @home total 115GB. It takes about 10 minutes to send @20180902 and 15 minutes to send @home20180902 to my backup HD. The @20180904 incremental took 6 seconds. The @home20180904 incremental took 12 seconds! My next incremental will be:

    btrfs send -p /mnt/snapshots/@20180904 /mnt/snapshots/@201809DD | btrfs receive /backup
    btrfs send -p /mnt/snapshots/@home20180904 /mnt/snapshots/@home201809DD | btrfs receive /backup

    On /backup will be, in addition to previous backups, @201809DD and @home201809DD for whatever day DD I choose to do my next set of snapshots.

    For a post earlier today I pulled out code from a Qt app I developed in the early 2000s. It was in a @home backup of my Neon installation made a year ago, on a 320GB USB passport drive. I mounted it, browsed the @home20171011 subvolume with Dolphin, and used Kate to pull out some code and paste it into a post. Since I did not install that "work" directory on this Kubuntu Bionic installation, the contents of that directory exist only in that snapshot and on some CDs stored in the garage.
    "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
    – John F. Kennedy, February 26, 1962.



      #3
      What is an incremental snapshot based on - storage block level or file level? Meaning, if one byte changes in a large file, does only the changed storage block (whatever size it is) go into the snapshot, or the whole file?
      A practical example would be my 70GB VirtualBox image which I use frequently for some Powershell work. Would that lead to 70 GB snapshots over and over again or just the few blocks which got changed in the virtual disk?



        #4
        To be clear, an incremental snapshot requires that a base, or beginning, snapshot exists on both the system and the archival storage which is the target of the send command. So, to begin a process of making incremental backups one must FIRST create a snapshot, say @home20180905, and send it to a mounted archival storage, say /backup. Depending on the size of your @home subvolume this can take 15 to 60 minutes or more. Now, both your system and /backup have an identical copy of a snapshot of @home.
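
        In command form, that first full backup looks something like this (using the same paths as in the examples below):

        btrfs su sn -r /mnt/@home /mnt/snapshots/@home20180905
        btrfs send /mnt/snapshots/@home20180905 | btrfs receive /backup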

        Tomorrow, at the end of the day, you decide to create an incremental backup. First you snapshot @home, creating @home20180906, but you DO NOT send it to /backup.
        After mounting /backup you use the following command:

        btrfs send -p /mnt/snapshots/@home20180905 /mnt/snapshots/@home20180906 | btrfs receive /backup

        The receive command sees the "parent" subvolume referenced in the command as @home20180905 and uses ITS copy on /backup to make @home20180906. It then uses the information in the send stream to convert /backup/@home20180906 into an exact copy of the @home20180906 that exists on the system. However, since only the differences are sent, the send & receive operation is much faster. In my case it reduces the total send & receive time for both @ and @home from 25 minutes to 18 seconds. The amount of time actually saved will depend on how much change has taken place between the 05 and the 06 snapshots: little change, short times; lots of change, longer times.


        Btrfs keeps track of file data in lists of blocks called extents. When a block is updated, btrfs may copy the entire extent if it is small, or it may break the extent into pieces, keeping unchanged sections of the file as smaller extents called "bookends", plus the new changed section. These new smaller bookend extents are just new pointers to the original data blocks, so the unchanged data is not copied. So snapshots will share the unchanged blocks in a large file. IF you make AND KEEP a bunch of snapshots, then over time the changes you make to your system will result in older snapshots holding more and more exclusive data. If the total space for @ + @home + all your snapshots reaches 90% or more of your HD space, your Btrfs system will begin to slow down. That is why Btrfs developers recommend that you limit your total number of snapshots to about a dozen per subvolume. People who think they can take snapshots every 5 minutes (or less!) are living in a fool's paradise, which will come back to bite them severely.
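
        To put a number on your 70GB VirtualBox image question, you can measure how large an incremental send stream would actually be without writing it anywhere - something like this, with the snapshot names being whatever yours are:

        # after changing a small part of the big file and taking a new snapshot:
        btrfs send -p /mnt/snapshots/@home20180905 /mnt/snapshots/@home20180906 | wc -c
        # the byte count reflects the changed extents plus metadata, not the whole 70GB file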

        More information can be found here: https://en.wikipedia.org/wiki/Btrfs#Extents
        "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
        – John F. Kennedy, February 26, 1962.



          #5
          Thanks GreyGeek, that is good to know! I wasn't able to test this here because for some reason my backups stop after a couple of hours with an error:
          root@hermes:~# btrfs send /mnt/snapshots/@home20180907 | btrfs receive /media/thomas/Backup/
          At subvol /mnt/snapshots/@home20180907
          At subvol @home20180907
          ERROR: crc32 mismatch in command
          Btrfs device stats shows no errors and I already did a btrfs scrub on the sending and receiving end with no effect. Think I'll try btrfs check next.



            #6
            USB device? Have you tried again just to see if the device timed out?

            Last time I was on IRC talking about this, using USB devices was considered iffy for send|receive. The preferred method was to send to a file and copy the file to the USB device instead. However, I have tested this here without issue. The concern was the stability of the USB device connection: the above error would occur if the device momentarily disconnected. Usually, during a file copy, a momentary disconnect will result in a pause and resume of the copy process, whereas I believe BTRFS send|receive does not yet handle that sort of interruption gracefully.
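
            To illustrate the send-to-file approach (the file name and paths here are just examples):

            # write the send stream to a file somewhere with enough free space
            btrfs send -f /mnt/@home_backup1.send @home_backup1
            # then copy the stream file to the USB drive at your leisure
            cp /mnt/@home_backup1.send /media/usb/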




              #7

              Originally posted by Thomas00
              Thanks GreyGeek, that is good to know! I wasn't able to test this here because for some reasons my backups stop after a couple of hours with an error:

              Btrfs device stats shows no errors and I already did a btrfs scrub on the sending and receiving end with no effect. Think I'll try btrfs check next.
              I wouldn't bother with a btrfs check if you can send & receive and stats and scrub indicate no problems.

              Are you saying that your send & receive command to a USB device stops after a "couple hours"? I'd agree with oshunluvr about it timing out. Did the stick get over 90% full? That slows down a lot of file systems.

              My @home subvolume is 105GB. I made a snapshot of it labeled @home20180907 and then used the send command to send the ASCII version to /backup:

              btrfs send -f /backup/@hometxt /mnt/snapshots/@home20180907

              It took 8 minutes to send the 109GB. I browsed @hometxt with the F3 View function in mc. It was a normal ASCII file that represented my @home subvolume. As oshunluvr says, I could copy that text file anywhere, even to a USB stick, if the stick is big enough.

              The only problem I have with the "btrfs send -f" command, which truly creates an ASCII file of your subvolume, is that I have NOT found a command sequence in the man pages that turns that file back into a subvolume again.

              Using send & receive doesn't work:
              Code:
              btrfs send /backup/@home07txt | btrfs receive /mnt/snapshots/ 
              ERROR: failed to get flags for subvolume /backup/@home07txt: Invalid argument
              Snapshotting it doesn't work:
              Code:
              btrfs su snapshot  /backup/@home07txt /mnt/snapshots/@hometxtsubvol   
              ERROR: not a subvolume: @home07txt
              Got me stumped.


              EDIT:
              I figured it out. I took an ro snapshot of @home:

              btrfs su snapshot -r /mnt/@home /mnt/snapshots/@home20180907
              Create a readonly snapshot of '/mnt/@home' in '/mnt/snapshots/@home20180907'

              I sent it to a text file:

              btrfs send -f /mnt/snapshots/@home07txt /mnt/snapshots/@home20180907
              At subvol /mnt/snapshots/@home20180907

              The "send -f" syntax is the <outfile> first and the subvolume source second. The <outfile> was created with "send -f" so "receive -f" should receive it and convert it to a normal subvolume:

              btrfs receive -f /mnt/snapshots/@home07txt /backup/
              At subvol @home20180907

              Notice the "At subvol" give the name of the orginal subvolume, @home20180907.

              As a text file, @home07txt can be moved between btrfs file systems or outside of them entirely, and the "btrfs receive -f <filename> /mountpoint" command will convert it back into a subvolume.

              Ah, all is well in the garden now.
              "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
              – John F. Kennedy, February 26, 1962.



                #8
                Yes, USB device. An external drive with 500GB. I have tried a few times by now and it's usually after 1-2 hours that the error occurs. I have it running in the background so I won't spot the error immediately.
                As far as timing is concerned, creating the snapshot is almost immediate, which I think is to be expected. Given that this is not (yet) an incremental backup, I would expect that with the send command everything will have to be copied to the USB drive, which in my case means ~240GB, so the timing looks about right to me.
                Maybe I'll start all over again, empty the drive, and try again. I will also try the file approach.

                By the way, I heard today that Dropbox will stop supporting non-EXT4 file systems in November. Not good, as the Linux client was working really well for me...



                  #9
                  Yes, snapshots take basically zero time and zero data space - at least until you begin making changes to the source subvolume. They do take metadata space.

                  I dropped dropbox (pun intended) a few weeks ago when I heard the initial announcement.




                    #10
                    Originally posted by oshunluvr
                    Yes, snapshots take basically zero time and zero data space - at least until you begin making changes to the source subvolume. They do take metadata space.

                    I dropped dropbox (pun intended) a few weeks ago when I heard the initial announcement.
                    So did I.
                    I created an account with MEGA (because they have megatools in the repository, which adds services to Dolphin that work great). Their "Free" offer is for 50GB, except that it's really only 15GB because 35GB is a "limited time bonus" for signing up. I installed their FF button, which works great. Yesterday I moved 343 folders (Documents and subfolders) containing 19,544 files to MEGA. They take 7.85GB out of 15GB. I was getting upload speeds of 35Mb/s.

                    Here's what the Dolphin services look like:
                    [screenshot: mega_dolphin_services.jpg]

                    megatools installs the following apps:

                    /usr/bin/megacopy
                    /usr/bin/megadf
                    /usr/bin/megadl
                    /usr/bin/megaget
                    /usr/bin/megals
                    /usr/bin/megamkdir
                    /usr/bin/megaput
                    /usr/bin/megareg
                    /usr/bin/megarm
                    and they can be run from the CLI or from bash scripts. Here is the man page for megaput:
                    MEGAPUT(1) Megatools Manual MEGAPUT(1)

                    NAME
                    megaput - upload files to your Mega.nz account

                    SYNOPSIS
                    megaput [--no-progress] [--path <remotepath>] <paths>...

                    DESCRIPTION
                    Uploads files to your Mega.nz account.

                    NOTE: If you want to upload entire directories, use megacopy(1).

                    OPTIONS
                    --path <remotepath>
                    Remote path to upload to. If this path is a directory, files are placed into the directory. If this path doesn’t
                    exist, and it’s parent directory does, the file will be uploaded to a specified path (this only works if you specify
                    exactly one file).

                    --no-progress
                    Disable upload progress reporting.

                    --disable-previews
                    Never generate and upload file previews, when uploading new files

                    -u <email>, --username <email>
                    Account username (email)

                    -p <password>, --password <password>
                    Account password

                    --no-ask-password
                    Never ask interactively for a password

                    --reload
                    Reload filesystem cache

                    --speed-limit <speed>
                    Set maximum allowed upload and download speed in KiB/s. This option overrides config file settings. 0 means no limit.

                    --proxy <proxy>
                    Use proxy server to connect to mega.nz. This option overrides config file settings. More information can be found in
                    libcurl documentation at https://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html. Some acceptable values are:

                    · none : Disable proxy if it was enabled in the config file.

                    · socks5://localhost:9050 : Local SOCKSv5 proxy server

                    · socks5h://localhost:9050 : Local SOCKSv5 proxy server with DNS handled by the proxy

                    --config <path>
                    Load configuration from a file

                    --ignore-config-file
                    Disable loading .megarc

                    --debug [<options>]
                    Enable debugging of various aspects of the megatools operation. You may enable multiple debugging options separated
                    by commas. (eg. --debug api,fs)

                    Available options are:

                    · api: Dump Mega.nz API calls

                    · fs: Dump Mega.nz filesystem (may require --reload to actually print something)

                    · cache: Dump cache contents

                    --version
                    Show version information

                    <paths>
                    One or more local files to upload.

                    EXAMPLES
                    · Upload file to the /Root:

                    $ megaput README
                    $ megals /Root

                    /Root
                    /Root/README

                    · Upload file, while naming it differently:

                    $ megaput --path /Root/README.TXT README
                    $ megals /Root

                    /Root
                    /Root/README.TXT

                    REMOTE FILESYSTEM
                    Mega.nz filesystem is represented as a tree of nodes of various types. Nodes are identified by a 8 character node handles
                    (eg. 7Fdi3ZjC). Structure of the filesystem is not encrypted.

                    Megatools maps node tree structure to a traditional filesystem paths (eg. /Root/SomeFile.DAT).

                    NOTE: By the nature of Mega.nz storage, several files in the directory can have the same name. To allow access to such
                    files, the names of conflicting files are extended by appending dot and their node handle like this:

                    /Root/conflictingfile
                    /Root/conflictingfile.7Fdi3ZjC
                    /Root/conflictingfile.mEU23aSD

                    You need to be aware of several special folders:

                    /Root
                    Writable directory representing the root of the filesystem.

                    /Trash
                    Trash directory where Mega.nz web client moves deleted files. This directory is not used by megatools when removing
                    files.

                    /Inbox
                    Not sure.

                    /Contacts
                    Directory containing subdirectories representing your contacts list. If you want to add contacts to the list, simply
                    create subdirectory named after the contact you want to add.

                    /Contacts/<email>
                    Directories representing individual contacts in your contacts list. These directories contain folders that others
                    shared with you. All shared files are read-only, at the moment.

                    SEE ALSO
                    megatools(7), megarc(5), megadf(1), megadl(1), megaget(1), megals(1), megamkdir(1), megaput(1), megareg(1), megarm(1),
                    megacopy(1).

                    MEGATOOLS
                    Part of the megatools(7) suite.

                    BUGS
                    Report bugs at https://github.com/megous/megatools or megous@megous.com.

                    AUTHOR
                    Megatools was written by Ondrej Jirman <megous@megous.com>, 2013-2016.

                    Official website is http://megatools.megous.com.

                    megatools 1.9.98 11/03/2016 MEGAPUT(1)
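
                    For example, tying this back to the send-to-file discussion above, a send stream could be pushed off-site with megaput (the file name and remote folder here are made up):

                    btrfs send -f /backup/home20180907.send /mnt/snapshots/@home20180907
                    megamkdir /Root/btrfs-backups
                    megaput --path /Root/btrfs-backups /backup/home20180907.send
                    megals /Root/btrfs-backups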
                    "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
                    – John F. Kennedy, February 26, 1962.

