How do Incremental Backups Work?


    How do Incremental Backups Work?

    Let's say that I have 2 snapshots with some changes between them.

    I also have a new disk that has no backups on it. If I send an incremental backup to this new hard drive, what exactly happens?

    Does it have a list of changes from the original system? If it does, how does it know what the original system looked like?
    Does btrfs detect that there isn't a backup on the disk and send the original system state in this case?

    I'm guessing that something similar happens if you send multiple incremental backups to the drive and delete the original?

    How does this work exactly?

    #2
    Originally posted by PhysicistSarah View Post
    Does it have a list of changes from the original system? If it does, how does it know what the original system looked like?
    BTRFS keeps track of what has changed: https://www.tummy.com/blogs/2010/11/...-have-changed/

    Originally posted by PhysicistSarah View Post
    Does btrfs detect that there isn't a backup on the disk and send the original system state in this case?
    AFAIK, no, but try it and see what happens. In any case, it wouldn't send the "original" state because that has no meaning. It will either fail or send garbage unless the prior full backup is on the target. I suspect the incremental send will just fail.

    Originally posted by PhysicistSarah View Post
    I'm guessing that something similar happens if you send multiple incremental backup to the drive and delete the original? How does this work exactly?
    I'm not sure what you're asking here. Deleting the previous backup would be the normal course of action unless you were keeping a catalog of backups. "Incremental" refers to what is sent, not what exists on the target.

    As a simple example:

    The source btrfs file system has 3 files: A, B, and C.
    You send a full backup to another btrfs file system. It now contains files A, B, and C.
    You add file D to the source.
    You send an incremental backup and only file D is sent.
    Your backup file system now contains two subvolumes. The first has A, B, and C and the second has A, B, C, and D.
    The two backup subvolumes share the data space for files A, B, and C.
    If you delete the first subvolume, the second still contains A, B, C, and D.
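
    In command form, that example might look something like this (a sketch only; /mnt/source and /backup are assumed mount points, and @data is a hypothetical subvolume holding A, B, and C):

    Code:
    # Full backup: take a read-only snapshot and send all of it
    btrfs subvolume snapshot -r /mnt/source/@data /mnt/source/@data_snap1
    btrfs send /mnt/source/@data_snap1 | btrfs receive /backup

    # ...file D is added to the source, then a second snapshot is taken...
    btrfs subvolume snapshot -r /mnt/source/@data /mnt/source/@data_snap2

    # Incremental: -p sends only what changed since @data_snap1 (file D)
    btrfs send -p /mnt/source/@data_snap1 /mnt/source/@data_snap2 | btrfs receive /backup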



      #3
      How do incremental backups work?

      Adding to what oshunluvr said:

      Assume your main BTRFS subvolumes @ and @home are on sda. Also assume that you eventually back them up to the remote storage on sdb. (Remote in the sense that the subvolumes moved to that device have no real-time connection to the live BTRFS file system on sda.)

      Day 1
      Become root (supply your account password when prompted):
      Code:
      sudo -i
      Mount sda to /mnt

      Code:
      mount /dev/disk/by-uuid/uuid_of_sda  /mnt
      You can determine the UUID of a storage device using blkid. It will be on the sda line, between the quotes following UUID=.
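
      For example, running blkid as root produces something like this (trimmed, and the UUID value here is purely illustrative):
      Code:
      blkid
      /dev/sda1: UUID="f3a1c0de-1234-4b5e-9abc-0123456789ab" TYPE="btrfs" PARTUUID="0a1b2c3d-01"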

      Create the backup subdirectory under / while in the root konsole
      Code:
      mkdir /backup
      Mount sdb to /backup
      Code:
      mount /dev/disk/by-uuid/uuid_of_sdb /backup
      Create a subdirectory (not subvolume) to hold snapshots under /mnt
      Code:
      mkdir /mnt/snapshots
      Still in the root konsole, create your first set of read-only snapshots using the "-r" parameter:
      Code:
      btrfs su snapshot -r /mnt/@ /mnt/snapshots/@20190728
      sync
      btrfs su snapshot -r /mnt/@home /mnt/snapshots/@home20190728
      sync

      Send those two snapshots to /backup
      Code:
      btrfs send /mnt/snapshots/@20190728 | btrfs receive /backup
      sync
      btrfs send /mnt/snapshots/@home20190728 | btrfs receive /backup
      sync
      (Repeat sync until it comes back quickly)
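
      If you want to confirm the base snapshots actually arrived, you can list the subvolumes now on the backup drive:
      Code:
      btrfs subvolume list /backup
      You should see @20190728 and @home20190728 in the output.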

      Unmount /backup and /mnt
      Code:
      umount /backup
      umount /mnt
      exit root
      exit konsole

      Now /backup contains a set of "base" snapshots that can be used as the basis for incremental backups.

      Day 2 (May be the next day or a number of days after Day 1)
      Enter a root konsole and mount sda to /mnt and sdb to /backup
      Create a set of snapshots on /mnt

      Code:
      btrfs su snapshot -r /mnt/@ /mnt/snapshots/@20190729
      sync
      btrfs su snapshot -r /mnt/@home /mnt/snapshots/@home20190729
      sync
      Now we use the incremental backup parameter, "-p", to send the DIFFERENCE between the snapshots taken on the 29th and those taken on the 28th.
      Since the snapshots taken on the 28th already reside on /backup, the receive command will create a copy of the 28th's subvolumes and populate them with the data sent by the send command, which, because of the -p parameter, sends only the difference between the snapshot taken on the 28th and the one taken on the 29th.

      Code:
      btrfs send -p /mnt/snapshots/@20190728 /mnt/snapshots/@20190729 | btrfs receive /backup
      sync
      btrfs send -p /mnt/snapshots/@home20190728 /mnt/snapshots/@home20190729 | btrfs receive /backup
      sync
      sync
      The next day, and on all subsequent days, the -p parameter is used to send the incremental stream to the backup storage. In my case, the send/receive commands for both @somedate and @homesomedate were taking a total of 45 minutes for 130GB of data. After I got incremental backups going, the send/receive for both subvolumes usually took less than 3 minutes.

      Notice the form for the -p command:
      btrfs send -p /mnt/snapshots/@some_previously_made_ro_snapshot /mnt/snapshots/@current_ro_snapshot | btrfs receive /backup

      How the send command works:

      The send command compares the previous snapshot with the current one and sends ONLY the difference between the two. The receive command reads the send stream and expects to find @some_previously_made_ro_snapshot among its own list of snapshots; if it does not, the receive command will fail. The send stream also tells the receive command the name of the new snapshot: @current_ro_snapshot. If a snapshot by that name already exists among the list of snapshots on /backup, the receive command will also fail. Otherwise, receive opens a copy of the @some_previously_made_ro_snapshot already residing on /backup, fills it with the data from the send stream, and then renames it to @current_ro_snapshot.
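
      If you're curious what the send stream actually contains, newer versions of btrfs-progs can decode it instead of applying it (a sketch reusing the example names above; --dump prints the operations carried in the stream rather than writing anything):
      Code:
      btrfs send -p /mnt/snapshots/@20190728 /mnt/snapshots/@20190729 | btrfs receive --dump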

      Because changes made by adding and removing software may affect BOTH the @ and @home subvolumes, it is recommended that both subvolumes have snapshots made at the same time using a datestamp (or other unique identifier). If necessity requires that you restore from @home_somedate, then restore @_somedate as well. Never mix the @ snapshot from one date with the @home snapshot from another date. The reboot may or may not work, but it is certain that some files and/or applications will be broken.
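
      To make that concrete, restoring a matched pair might look something like this (a sketch only; it assumes you are booted from live/rescue media so @ and @home are not in use, and it reuses the example names from above):
      Code:
      mount /dev/disk/by-uuid/uuid_of_sda /mnt
      # set the broken pair aside
      mv /mnt/@ /mnt/@broken
      mv /mnt/@home /mnt/@home_broken
      # make writable snapshots of the SAME-DATE read-only pair
      btrfs su snapshot /mnt/snapshots/@20190729 /mnt/@
      btrfs su snapshot /mnt/snapshots/@home20190729 /mnt/@home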

      A note of caution:
      As I pointed out (and linked to), the BTRFS devs recommend against storing more than 8 snapshots per subvolume. Accumulating too many snapshots in /mnt/snapshots can fill your storage medium and bring BTRFS to a halt, or slow it down considerably.
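
      A quick way to see how many have piled up (assuming the /mnt/snapshots directory used above holds nothing but snapshots):
      Code:
      ls /mnt/snapshots | wc -l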

      Before you use the send command, you can query /backup to determine how much storage space remains:
      Code:
      btrfs fi usage /backup
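      The output will look something like this (abbreviated here, with purely illustrative numbers); "Device unallocated" and "Free (estimated)" are the lines to watch:
      Code:
      Overall:
          Device size:                 931.51GiB
          Device allocated:            500.06GiB
          Device unallocated:          431.45GiB
          Free (estimated):            463.50GiB    (min: 247.77GiB)

      Data,single: Size:494.00GiB, Used:459.88GiB
      Metadata,DUP: Size:3.00GiB, Used:2.66GiB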
      If the size of your pending snapshot is bigger than the available free space, you should delete one or more pairs of your older snapshots:

      Code:
      btrfs subvol delete -C /mnt/snapshots/@oldest_date
      btrfs subvol delete -C /mnt/snapshots/@home_oldest_date



        #4
        Originally posted by GreyGeek View Post
        ...The send command compares the previous snapshot with the current one
        In the spirit of answering the OP's question (how does it work), I'm adding this. Btrfs stores data in a tree structure that gets a branch when data are changed or added. The branches are labelled with a generation id and a date. So to find what's changed, it just has to traverse the tree until it finds a branch that's later than the previous send. It doesn't have to compare the data; it just has to send the changes to the tree, which can be applied efficiently at the receiving btrfs.
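
        You can poke at this generation mechanism yourself: btrfs subvolume find-new lists the files changed in a subvolume after a given generation (a sketch; 12345 is just a placeholder generation number):
        Code:
        # show the subvolume's current generation
        btrfs subvolume show /mnt/@ | grep -i generation
        # list files changed after generation 12345
        btrfs subvolume find-new /mnt/@ 12345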

        A note of caution:
        As I pointed out (and linked to) the BTRFS devs recommended against storing more than 8 snapshots per subvolume.
        What? Aargh! My main SSD btrfs has more than 20 on the root subvolume, mostly "taken" automatically by snapper, even after drastically reducing the numbers from the default configuration.
        Regards, John Little



          #5
          Originally posted by jlittle View Post
          What? Aargh! My main SSD btrfs has more than 20 on the root subvolume, mostly "taken" automatically by snapper, even after drastically reducing the numbers from the default configuration.
          That's a practical limit, not a hard one. The hard subvolume limit is 2 to the 64th, or 18,446,744,073,709,551,616.

          From the btrfs mailing list:
          The (practical) answer depends to some extent on how you use btrfs.

          Btrfs does have scaling issues due to too many snapshots (or actually the reflinks snapshots use, dedup using reflinks can trigger the same scaling issues), and single to low double-digits of snapshots per snapshotted subvolume remains the strong recommendation for that reason.

          But the scaling issues primarily affect btrfs maintenance commands themselves: balance, check, subvolume delete. While millions of snapshots will make balance for example effectively unworkable (it'll sort of work but could take months), normal filesystem operations like reading and saving files don't tend to be affected, except to the extent that fragmentation becomes an issue (tho cow filesystems such as btrfs are noted for fragmentation, unless steps like defrag are taken to reduce it).
          My take-away is: have as many snapshots as you like, but clean house before doing things like balance. I think storing snapshots isn't the issue; it's what you do with them. I keep a rolling 7 daily snapshots of my important subvolumes and a weekly backup.
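
          A rolling scheme like that is easy to script; a minimal sketch, assuming the dated names used earlier in this thread (@YYYYMMDD under /mnt/snapshots) so that sorting by name also sorts by date:
          Code:
          cd /mnt/snapshots
          # keep the 7 newest @ snapshots, delete the rest
          for snap in $(ls -d @2* | sort | head -n -7); do
              btrfs subvolume delete -C "$snap"
          done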

          There are some use-cases where I could see having more snapshots, like doing something potentially destructive or a sequential build-up of a new installation, etc. As soon as the procedure is done, simply delete those not needed.



            #6
            What jlittle and oshunluvr said ...
            My main concern is that as snapshots grow in number AND changes take place, the older snapshots begin to fill with changed data. Doing something like an update that downloads 400-500 app updates may cause the older snapshots to become nearly or completely populated. If one's total data for a subvolume pair (@ and @home) is, for example, 100GB, and one accumulates a large number of snapshots per subvolume pair, say 20 or 30 or more, then eventually the older snapshot pairs will begin to approach full population. If your primary drive (or archive medium) is, say, 1 TB, then it would take only 10 fully-populated snapshot pairs to fill the drive up.

            The command "btrfs fi usage /path" give an output which includes the "unallocated" size and the "metadata" size. BTRFS first allocates chunks, mostly data or metadata before it can write to them, So, although you may have lots of data chunks with free space, IF your metadata is very close to being full then the chunk allocation cannot continue and the storage medium will be reported as full even if it reports more "unallocated" space than what you need to do another set of snapshots. The solution to the disparity between the size of "unallocated" and with metadata used being almost equal to max metadata is to use the balance command:
            btrfs balance start -dusage=75 /mnt

            If that command fails (you'll get ENOSPC errors), then redo it with a smaller value, like 5 or 10, and repeat it with increasing values (15, 25, 35, etc.) until you reach 75.
            If the 5 value fails, try it with 1. If 1 fails, then it is time to move or delete older subvolumes until you can do a balance.
            Redoing the balance should minimize the metadata value and allow you to use some more of that "unallocated" space.
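
            That retry procedure is easy to script as a loop; a minimal sketch (the value list is only an example):
            Code:
            for usage in 1 5 10 15 25 35 50 75; do
                # stop at the first value that fails (likely ENOSPC)
                btrfs balance start -dusage=$usage /mnt || break
            done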
            See "man btrfs-balance" for more info on the use of -d, -m and -s filters.
            "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
            – John F. Kennedy, February 26, 1962.
