    Backing up to a networked computer using BTRFS and SSH

    I have a server and a desktop, both using btrfs for most of my file systems. The server contains media: documents, pictures, music, etc., for use via DLNA. The server has 2 pairs of drives in RAID1 (duplicate) configuration using btrfs. RAID1 is an excellent way to protect against a drive failure. If one drive fails, you replace it and rebuild the RAID1 array.

    However, most people will tell you RAID1 isn't a backup - and it's not. For example, if I log into the server and accidentally delete my family photos folder from 2009, it's gone - on both drives. So backups are needed to protect from dumb mistakes. Also, random file system corruption or some other catastrophic event could wipe the entire file system.

    So I needed a backup procedure to go along with the RAID1 configuration. It is better to have your backup on a different system when possible, so I decided backing up to my desktop was in order, but how to do this?

    The btrfs backup procedure uses the "send" and "receive" commands to copy an entire subvolume from one btrfs filesystem to another. I have used this many times on a single machine, but never tried it between two machines. Since I already use ssh to access the server, it seemed a natural choice for this process. There are other methods to copy files from one computer to another, but most (if not all) of them use a network file system of one type or another, which means I can't use btrfs. I even tried mounting the target drive using "sshfs", which works, but it's not btrfs, so send|receive was not usable.

    My desire was to use a simple bash script and a cron job to send backups of the important data, like family photos, to my desktop computer in an automated fashion - totally unattended. This means passwords can't be required, since I won't be there to enter them. Also, btrfs commands require root (or sudo) access. Even if it could be done with sudo (it can't over ssh), I don't want my password stored in plain text. Obviously, removing all security just to make backups is a horrible idea, so I needed something better: root access via SSH using a secure key.
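    To preview where this is headed, the end result is a single pipeline like the one below, built step by step in the tasks that follow (the mount points and the "office" host alias come from my setup and are configured later in this thread):
    Code:
    # btrfs send /mnt/all_media/@Pictures_ro | ssh office "btrfs receive /mnt/server_backups"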

    The environment:
    A server and a desktop computer on an internal network, behind the firewall, both with openssh-client and -server installed.

    The tasks:
    1. Securely gain root level ssh access from the server to the desktop without requiring a password.
    2. Verify the use of btrfs send|receive over ssh.
    3. Automate the backup process.


    This thread borrowed heavily from here and lightly from other locations on the internet.

    Tags: btrfs, backup, ssh
    Last edited by oshunluvr; Nov 20, 2017, 09:47 AM. Reason: credit and tags


    #2
    Task 1

    Task 1, Part 1:
    The goal of this first part of the task is to be able to send commands via ssh from my server computer to my desktop computer. To use ssh in this way, my server computer becomes the "Client" (where I ssh from) and the desktop computer becomes the "Server" (where I ssh to). I will refer to them as Client and Server from here on out.

    First, logging into a root session using "sudo -i", I prepared the Client and created a secure key without a password:

    --On the Client--
    $ sudo -i
    # mkdir /root/.ssh
    # cd /root/.ssh
    # ssh-keygen -f serverpass


    "ssh-keygen" will prompt you for a password for the new key - just hit enter and your key will not require a password. "serverpass" is the name I choose. You can use something else.

    Now we need to get the key onto the "server" (the desktop computer in this case). This requires a root login, and I don't have a root password on my desktop (I use sudo), so I need to make one - which I will delete later:

    --On the Server--
    $ sudo passwd

    I will also allow root access via ssh using a password - again, a generally ill-advised way to configure your computer, but it will also be temporary. This requires 2 edits of /etc/ssh/sshd_config and a restart of the ssh server program:

    --On the Server--

    $ sudo nano /etc/ssh/sshd_config

    Look for these two lines:
    PermitRootLogin prohibit-password
    PasswordAuthentication no
    and change them to:
    PermitRootLogin yes
    PasswordAuthentication yes
    and save the edits. Then restart the ssh server:
    $ sudo service ssh restart

    Now we need to send the secure key to the Server:

    --On the Client--
    Code:
    # ssh-copy-id -i ~/.ssh/serverpass root@server
    Run this from the root session opened earlier, so that "~/.ssh" resolves to /root/.ssh.
    Now we can lock the Server back down:

    --On the Server--
    $ sudo nano /etc/ssh/sshd_config

    Look for these two lines:
    PermitRootLogin yes
    PasswordAuthentication yes
    and change them to:
    PermitRootLogin without-password
    PasswordAuthentication no
    and save the edits. Then restart the ssh server:
    $ sudo service ssh restart

    Finally, remove the root password:
    $ sudo passwd -d root

    You should now be able to log in as root on the Server from the Client:

    --On the Client, still in the root session--
    # ssh -i ~/.ssh/serverpass root@server
    The "@server" would need to be either the IP address of the Server computer, like "root@192.169.1.100" or the hostname of the Server computer. In my case, the computer on the receiving end of the ssh command (the Server) is my office desktop so I use "root@office".

    Task 1, Part 2
    The goal of this second part of the first task is to simplify sending ssh commands and access.

    As of now, you must specify the identity file with each SSH command. Also, it's generally safer to use non-standard ports for SSH; I tend to use a different port on each of my computers. This means each time I send a command from the Client to the Server, I must specify the location of the identity file, the port, the user name, and the hostname or IP of the Server computer, plus the command itself. I don't want to have to remember all that, so with a simple config file on the Client I can reduce:
    $ ssh -i ~/.ssh/serverpass root@office -p 2345 ls

    to simply
    $ ssh office ls

    The config file is kept in the home folder of the user under the .ssh folder. The dot preceding ssh means the folder is hidden, and the file named config is not there by default. In this case, I'm using my root user to access the root user on the other computer, so I need to put the config file under /root/.ssh/. In it I will put all the info needed for the ssh command to connect automatically.

    --On the Client--
    $ sudo nano /root/.ssh/config
    This will open an empty editor window - since the file doesn't exist yet. Now paste or type this into it, using the correct pieces for your system(s):
    Code:
    Host office
      Port 2345
      User root
      Hostname office
      IdentityFile ~/.ssh/serverpass
    and save and exit nano.

    The first line - "Host office" - means I can now issue a command using just "ssh office" followed by the command, or open a remote shell on the office computer using just "ssh office".
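    For example, with the config file in place, both of these now work from a root shell on the Client (the remote path is from my setup):
    Code:
    # ssh office
    # ssh office ls /mnt/server_backups
    The first opens a root shell on the office computer; the second runs a single command there and returns.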

    I usually take the final step of adding this to my ~/.bash_aliases file to shorten it even further:
    Code:
    alias office='ssh office'
    Task 1 is complete.
    Last edited by oshunluvr; Nov 23, 2017, 07:50 AM.



      #3
      Task 2

      Task 2:
      One of the many benefits of BTRFS is the ability to make a copy of an entire subvolume and send it to another btrfs file system. No easier backup method exists. Moving a subvolume to another computer is only slightly more difficult.

      To make and send a subvolume backup, we must take a read-only "snapshot" of the subvolume and send|receive it to another file system. This creates a backup - a full copy - of the subvolume. The addition of using SSH will allow the receive part of the command to take place on another computer. The completion of Task 1 makes this possible, with the added benefit of not requiring a password, which means we can do this unattended if desired.

      I'm backing up a subvolume named "@Pictures" on my media server (the "Client" in Task 1) and sending it to my office desktop computer (the "Server" in Task 1). To make and send snapshots, you have to mount the "root" or top level of the btrfs filesystem to expose the subvolumes. In my case, the media subvolumes are all on /dev/sdc3.
      ***For those new to BTRFS, subvolumes work similarly to a device or drive and can be mounted in the same way. Each of my media folders on my media server is actually a separately mounted subvolume. This keeps the data segregated for backup purposes like this task. With several TB of media, backing it all up at once is an extremely time-consuming task. Anything like that can take hours or even days, and is therefore subject to failure from an unintended interruption like a power loss. By dividing the data into subvolumes, I can back up only the portion (subvolume) that requires it and spread the backup operations out over time, while keeping each individual backup much shorter than a full one-shot backup.***
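      For anyone building a similar layout from scratch, subvolumes are cheap to create and can be mounted individually; a minimal sketch, assuming the top-level filesystem is mounted at /mnt/all_media (the /srv/pictures mount point is hypothetical):
      Code:
      # btrfs subvolume create /mnt/all_media/@Pictures
      # mount -o subvol=@Pictures /dev/sdc3 /srv/pictures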

      First, we mount the root level file system and "cd" into it:
      $ sudo mount /dev/sdc3 /mnt/all_media
      $ cd /mnt/all_media


      This puts us in the best location to make and send the snapshot. A quick listing shows all my subvolumes on /dev/sdc3
      $ ls -l
      Code:
      drwxrwsr-x 1 nobody   share    41094 Nov 18 09:46 @Movies/
      drwxrwsr-x 1 nobody   share    11310 Sep 20 09:10 @Music/
      drwxrwsr-x 1 nobody   share      418 Nov 19 13:23 @Pictures/
      drwxrwsr-x 1 nobody   share      206 Nov 18 18:50 @Videos/
      This command makes the read-only snapshot required before sending it:
      $ sudo btrfs subvolume snapshot -r @Pictures @Pictures_ro


      and now we have:
      Code:
      drwxrwsr-x 1 nobody   share    41094 Nov 18 09:46 @Movies/
      drwxrwsr-x 1 nobody   share    11310 Sep 20 09:10 @Music/
      drwxrwsr-x 1 nobody   share      418 Nov 19 13:23 @Pictures/
      drwxrwsr-x 1 nobody   share      418 Nov 21 08:49 @Pictures_ro/
      drwxrwsr-x 1 nobody   share      206 Nov 18 18:50 @Videos/
      Now we need only send it to the backup computer. I have already prepared and mounted a BTRFS file system on the other computer (my "office" computer) at /mnt/server_backups, so that's where I'm sending it. Using a root shell via "sudo -i" is required because btrfs subvolume commands require root access, and we set up our SSH configuration so root commands can be sent to the "Server" as well:
      $ sudo -i

      This command puts us in the root home folder, so let's go back to the mounted media subvolumes location:
      # cd /mnt/all_media

      and send the backup to the office computer:
      # btrfs send @Pictures_ro/ | ssh office "btrfs receive /mnt/server_backups"

      You should see this somewhat cryptic response in the terminal window:
      Code:
      At subvol @Pictures_ro/
      At subvol @Pictures_ro
      This means the command is working, and the cursor will return when it's done. That's it. Now we just wait for it to complete. If you want, you can force the command into the background by ending it with " &" so you can go on doing other things. Or you could just open a new terminal window if you have other tasks to complete. With few exceptions, BTRFS allows you to continue using the file system while you're doing even big tasks like this one. The reason a read-only snapshot is required is to prevent accidental corruption during the send|receive operation.
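      If you do background it, capturing the output makes it easy to check on later; a minimal sketch (the log path is arbitrary):
      Code:
      # { btrfs send @Pictures_ro/ | ssh office "btrfs receive /mnt/server_backups"; } > /tmp/send.log 2>&1 &
      The braces group the whole pipeline so messages from both ends of it land in the log.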

      Depending on the size of the subvolume and the speed of your network, this can take hours. In this case, 42.6GB took about twenty minutes. Since this is a one-for-one copy, the result is an identical subvolume that is now in the backup location. I can access it like any other btrfs file system or subvolume.
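      To verify the copy landed, you can list the subvolumes on the receiving end (path per my setup):
      Code:
      # ssh office "btrfs subvolume list /mnt/server_backups"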
      Last edited by oshunluvr; Nov 23, 2017, 07:55 AM.



        #4
        “The make and send ...” s/b “To make and send ...”

        I’m looking forward to finding out how fast using ssh is.
        "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
        – John F. Kennedy, February 26, 1962.

        Comment


          #5
          Task 3

          At this point, I have to take a break and do some real work. I'll come back to automating (Task 3) when I have time.
          The purpose of this exercise is to make my computing life more secure while lessening my daily workload.

          I plan to be able to at least (a rough sketch follows the list):
          1. Determine when a change in a subject subvolume has occurred.
          2. Ensure the target location is available with enough space.
          3. Trigger the backup based on the change without user intervention.
          4. When needed, perform cleanup of the backups on a schedule and with priority.
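          As a very rough, untested sketch of where this could go - the paths, hostnames, and change-detection approach are all assumptions on my part, not a finished script:
          Code:
          #!/bin/bash
          # Hypothetical sketch: snapshot a subvolume and send it only if it changed
          SRC=/mnt/all_media
          SUB=@Pictures
          DEST=office
          DESTPATH=/mnt/server_backups
          STAMP=$(date +%Y%m%d)

          # 1. Detect a change since the last backup by comparing btrfs generation numbers
          LASTGEN=$(cat /var/local/${SUB}.gen 2>/dev/null || echo 0)
          CURGEN=$(btrfs subvolume find-new "$SRC/$SUB" 99999999 | awk '{print $NF}')
          [ "$CURGEN" = "$LASTGEN" ] && exit 0

          # 2. Make sure the target is reachable (a real script would also check free space)
          ssh "$DEST" "test -d $DESTPATH" || exit 1

          # 3. Make a read-only snapshot, send it, and record the generation we backed up
          btrfs subvolume snapshot -r "$SRC/$SUB" "$SRC/${SUB}_${STAMP}"
          btrfs send "$SRC/${SUB}_${STAMP}" | ssh "$DEST" "btrfs receive $DESTPATH" && echo "$CURGEN" > /var/local/${SUB}.gen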


          More tasks or ideas might come to light when I get back to this.

          Also worth noting: BTRFS has an "incremental" backup feature, so you only have to send the difference between two snapshots of the same subvolume. This will drastically reduce the time to send|receive, and I will be incorporating this in my backup plan.
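          Incremental send works by passing the previous read-only snapshot as a parent with "-p", so only the delta crosses the network. A sketch using hypothetical snapshot names (the parent snapshot must already exist on both ends):
          Code:
          # btrfs send -p @Pictures_ro_old @Pictures_ro_new | ssh office "btrfs receive /mnt/server_backups"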



            #6
            Originally posted by GreyGeek
            “The make and send ...” s/b “To make and send ...”

            I’m looking forward to finding out how fast using ssh is.
            TY, fixed.

            The real benefit will be using Incremental send|receive later. My 42.6GB of photos will no doubt grow as we add our cellphone and digital camera shots periodically, but an incremental backup would only require the new data be sent.
            Last edited by oshunluvr; Nov 20, 2017, 09:51 AM.



              #7
              I just corrected the send post with regard to the time. I incorrectly assumed the 42GB send took two hours the last time I did it, but then I realized the date/time stamp on the snapshot mimics the source subvolume. I'm doing another send|receive right now with a smaller subvolume, and I used the "time" command this time.

              I'll post the results.



                #8
                26.6GB subvolume:

                real 4m44.119s
                user 1m8.952s
                sys 1m18.300s


                This is over a gigabit network with PCs connected through a switch - roughly 26.6GB in 284 seconds, or about 94MB/s, close to gigabit line speed.



                  #9
                  So, over my 100Mbps connection those times would be 10X longer. Taking into account my 90GB of snapshots in total, I'd be looking at around 34X as long, or roughly 2.7 hours.

                  Thanks for the info.
                  "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
                  – John F. Kennedy, February 26, 1962.

                  Comment
