Block adverts with an automatically generated hosts file on a Linux router

    [ROUTER] Block adverts with an automatically generated hosts file on a Linux router

    Steve has written a great script to automate building an adblocking hosts file.

    This is great for blocking adverts on a specific device, but I'd like to use it on my router to block all adverts at home without the fuss of updating the hosts file on each device individually: my girlfriend's laptop, her phone, two tablets, etc.

    I've created this thread because I don't want to take over Steve's, which might be best remaining specific to Kubuntu.

    My router runs OpenWrt, but the plan is to create something that would run on any Linux router (DD-WRT etc).

    Steve's original script, which creates a hosts file from multiple sources, is here (call it gethosts.sh):
    Code:
    #!/bin/bash
    
    # If this is our first run, save a copy of the system's original hosts file and set to read-only for safety
    if [ ! -f ~/hosts-system ]
    then
     echo "Saving copy of system's original hosts file..."
     cp /etc/hosts ~/hosts-system
     chmod 444 ~/hosts-system
    fi
    
    # Perform work in temporary files
    temphosts1=$(mktemp)
    temphosts2=$(mktemp)
    
    # Obtain various hosts files and merge into one
    echo "Downloading ad-blocking hosts files..."
    wget -nv -O - http://winhelp2002.mvps.org/hosts.txt >> $temphosts1
    wget -nv -O - http://hosts-file.net/ad_servers.asp >> $temphosts1
    wget -nv -O - http://someonewhocares.org/hosts/hosts >> $temphosts1
    wget -nv -O - "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&showintro=0&mimetype=plaintext" >> $temphosts1
    
    # Do some work on the file:
    # 1. Remove MS-DOS carriage returns
    # 2. Delete all lines that don't begin with 127.0.0.1
    # 3. Delete any lines containing the word localhost because we'll obtain that from the original hosts file
    # 4. Replace 127.0.0.1 with 0.0.0.0 because then we don't have to wait for the resolver to fail
    # 5. Scrunch extraneous spaces separating address from name into a single tab
    # 6. Delete any comments on lines
    # 7. Clean up leftover trailing blanks
    # Pass all this through sort with the unique flag to remove duplicates and save the result
    echo "Parsing, cleaning, de-duplicating, sorting..."
    sed -e 's/\r//' -e '/^127.0.0.1/!d' -e '/localhost/d' -e 's/127.0.0.1/0.0.0.0/' -e 's/ \+/\t/' -e 's/#.*$//' -e 's/[ \t]*$//' < $temphosts1 | sort -u > $temphosts2
    
    # Combine system hosts with adblocks
    echo Merging with original system hosts...
    echo -e "\n# Ad blocking hosts generated "$(date) | cat ~/hosts-system - $temphosts2 > ~/hosts-block
    
    # Clean up temp files and remind user to copy new file
    echo "Cleaning up..."
    rm $temphosts1 $temphosts2
    echo "Done."
    echo
    echo "Copy ad-blocking hosts file with this command:"
    echo " sudo cp ~/hosts-block /etc/hosts"
    echo
    echo "You can always restore your original hosts file with this command:"
    echo " sudo cp ~/hosts-system /etc/hosts"
    echo "so don't delete that file! (It's saved read-only for your protection.)"
    echo
    I wanted to re-enable Google Analytics for my website, which is on a server behind my router, so I have written another script that lets you re-enable certain hosts.

    Call this one edit-hosts.sh
    Code:
    #!/bin/bash
    
    # Before calling this script, create a whitelist file containing phrases to allow, one phrase per line
    
    if [ $# -ne 1 ]; then
        echo "Usage: $0 whitelist_file_location" >&2
        exit 1
    fi
    
    WHITELIST_FILE=$1
    INPUT_FILE=~/hosts-block
    OUTPUT_FILE=~/hosts-block-less-whitelist
    TMP_FILE=$(mktemp)
    
    # First, remove empty lines from the whitelist (or the next step will throw an error)
    sed '/^$/d' "$WHITELIST_FILE" > "$TMP_FILE"
    cat "$TMP_FILE" > "$WHITELIST_FILE"
    echo 'Removed empty lines from whitelist_file'
    
    cp "$INPUT_FILE" "$OUTPUT_FILE"
    
    # Now, read lines from the whitelist file and remove entries with matching content from OUTPUT_FILE
    while read -r line; do
        sed -e "/$line/d" "$OUTPUT_FILE" > "$TMP_FILE"
        cat "$TMP_FILE" > "$OUTPUT_FILE"
        echo "Removed any lines containing $line"
    done < "$WHITELIST_FILE"
    
    # Clean up the temporary file
    rm -f "$TMP_FILE"
    The script takes the output of Steve's gethosts.sh (the file ~/hosts-block) and removes any lines containing terms from a whitelist file that you create. The whitelist file should contain one phrase per line and can be located anywhere: you tell edit-hosts.sh where it is when you call it (e.g. edit-hosts.sh ~/whitelist). The result is the file ~/hosts-block-less-whitelist.
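
    The per-line sed loop could also be written as a single grep pass. This is only a sketch under a couple of assumptions: it treats each whitelist entry as a fixed string rather than a regular expression (-F), and the file names are throwaway examples in a temporary directory, not the real ~/hosts-block.

```shell
#!/bin/sh
# Sketch: drop every hosts line that contains any whitelist phrase, in one pass.
# All paths are temporary examples, not the files used by the scripts above.
workdir=$(mktemp -d)
hosts_block="$workdir/hosts-block"
whitelist="$workdir/whitelist"
hosts_out="$workdir/hosts-block-less-whitelist"

# Made-up sample data: three blocked hosts and a whitelist with one phrase
# followed by an empty line.
printf '0.0.0.0\tads.example.com\n'          >  "$hosts_block"
printf '0.0.0.0\twww.google-analytics.com\n' >> "$hosts_block"
printf '0.0.0.0\ttracker.example.net\n'      >> "$hosts_block"
printf 'google-analytics\n\n' > "$whitelist"

# Strip empty lines first: an empty pattern would match (and remove) every line.
grep -v '^$' "$whitelist" > "$workdir/whitelist.clean"

# -v inverts the match, -F disables regex interpretation,
# -f reads the patterns from a file.
grep -v -F -f "$workdir/whitelist.clean" "$hosts_block" > "$hosts_out"

cat "$hosts_out"
```

    With the sample data above, only the google-analytics line is dropped. If you rely on regular expressions in your whitelist, remove -F.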

    The script works on Kubuntu, and I've now figured out how to make it work on OpenWrt too. In fact, I've merged the two scripts into one and added some extra bits. Have a look in the next post for the new script.
    Last edited by Feathers McGraw; Oct 31, 2013, 04:47 PM.
    samhobbs.co.uk

    #2
    Below is 'my' all-in-one script in its current form:

    Code:
    #!/bin/bash
    
    # check root
    if [ $(id -u) != "0" ]; then
        echo "You must have root access to make changes to your /etc/hosts file" >&2
        exit 1
    fi
    
    # Folder for all script files
    FILES_DIRECTORY=~/adblock
    
    if [ ! -d $FILES_DIRECTORY ]
    then
     echo "Before running this script, please create the directory $FILES_DIRECTORY"
     echo "Create the directory as whichever user you would like to own the config files"
     echo "i.e.       mkdir $FILES_DIRECTORY"
     exit 1
    fi
    
    # More variables
    WHITELIST_FILE=$FILES_DIRECTORY/whitelist
    SYSTEM_HOSTS=$FILES_DIRECTORY/hosts-system
    ADBLOCK_HOSTS=$FILES_DIRECTORY/hosts-block
    ADBLOCK_HOSTS_PERSONAL=$FILES_DIRECTORY/hosts-block-personal
    HOSTS_SOURCES=$FILES_DIRECTORY/hosts-sources
    
    PERMISSION=$(ls -ld $FILES_DIRECTORY | awk '{print $3}')
    
    # Create and populate host sources file if it doesn't exist
    if [ ! -f $HOSTS_SOURCES ]
    then
     echo
     echo "No sources file detected, creating one for you now: $HOSTS_SOURCES"
     echo "Edit this file to add or remove sources for your host file"
     echo
     touch $HOSTS_SOURCES
     echo -e "http://winhelp2002.mvps.org/hosts.txt" >> $HOSTS_SOURCES
     echo -e "http://hosts-file.net/ad_servers.asp" >> $HOSTS_SOURCES
     echo -e "http://someonewhocares.org/hosts/hosts" >> $HOSTS_SOURCES
     echo -e "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&showintro=0&mimetype=plaintext" >> $HOSTS_SOURCES
    fi
    
    
    # Create whitelist file if it doesn't exist
    if [ ! -f $WHITELIST_FILE ]
    then
     echo "No whitelist file detected, creating a blank one here: $WHITELIST_FILE"
     echo "Add keywords you want to unblock to this file, one per line, and re-run script"
     echo
     touch $WHITELIST_FILE
    fi
    
    # If this is our first run, save a copy of the system's original hosts file
    if [ ! -f $SYSTEM_HOSTS ]
    then
     echo "Saving copy of system's original hosts file..."
     echo
     cp /etc/hosts $SYSTEM_HOSTS
    fi
    
    # Perform work in temporary files
    temphosts1=$(mktemp)
    temphosts2=$(mktemp)
    temphosts3=$(mktemp)
    temphosts4=$(mktemp)
    
    # Obtain various hosts files and merge into one
    echo "Downloading ad-blocking hosts files..."
    while read -r line; do
        wget -nv -O - "$line" >> "$temphosts1"
    done < "$HOSTS_SOURCES"
    echo
    
    # Do some work on the file:
    # 1. Remove MS-DOS carriage returns
    # 2. Delete all lines that don't begin with 127.0.0.1
    # 3. Delete any lines containing the word localhost because we'll obtain that from the original hosts file
    # 4. Replace 127.0.0.1 with 0.0.0.0 because then we don't have to wait for the resolver to fail
    # 5. Scrunch extraneous spaces separating address from name into a single tab
    # 6. Delete any comments on lines
    # 7. Clean up leftover trailing blanks
    # Pass all this through sort with the unique flag to remove duplicates and save the result
    echo "Parsing, cleaning, de-duplicating, sorting..."
    echo
    sed -e 's/\r//'               \
        -e '/^127.0.0.1/!d'       \
        -e '/localhost/d'         \
        -e 's/127.0.0.1/0.0.0.0/' \
        -e 's/ \+/\t/'            \
        -e 's/#.*$//'             \
        -e 's/[ \t]*$//'          \
        < $temphosts1 |
        sort -u > $temphosts2
    
    # Combine system hosts with adblocks
    echo Merging with original system hosts...
    echo
    echo -e "\n# Ad blocking hosts generated "$(date) |
        cat $SYSTEM_HOSTS - $temphosts2 > $ADBLOCK_HOSTS
    
    # EDITING HOST FILE TO RE-ENABLE WHITELIST
    
    # First, remove empty lines from $WHITELIST_FILE (or next step will throw an error)
    sed '/^$/d' $WHITELIST_FILE > $temphosts3
    mv $temphosts3 $WHITELIST_FILE
    echo "Removed empty lines from $WHITELIST_FILE"
    echo
    cp $ADBLOCK_HOSTS $temphosts3
    
    # Now, read lines from whitelist file and remove entries with matching content from $ADBLOCK_HOSTS
    while read -r line; do
    	sed -e "/$line/d" "$temphosts3" > "$temphosts4"
    	mv "$temphosts4" "$temphosts3"
    	echo "Removed lines matching $line from $ADBLOCK_HOSTS_PERSONAL"
    done < "$WHITELIST_FILE"
    cp $temphosts3 $ADBLOCK_HOSTS_PERSONAL
    
    
    # Tell the user what has been removed from their host file
    echo
    echo "The following lines have been removed from your personal hosts file:"
    comm -3 $ADBLOCK_HOSTS $ADBLOCK_HOSTS_PERSONAL
    echo
    
    # Copy new host file to /etc/hosts
    echo "Copying new host file to /etc/hosts"
    cp $ADBLOCK_HOSTS_PERSONAL /etc/hosts
    echo
    
    # Change ownership of files to match owner of $FILES_DIRECTORY
    chown -R $PERMISSION:$PERMISSION $FILES_DIRECTORY
    
    # Make sure the user has access to all files created
    chmod 644 $FILES_DIRECTORY/*
    # Set system hosts file read only for safety
    chmod 444 $SYSTEM_HOSTS
    
    # Clean up temp files
    echo "Cleaning up..."
    rm -f $temphosts1 $temphosts2 $temphosts3 $temphosts4
    echo "Done."
    echo
    echo "You can always restore your original hosts file with this command:"
    echo " cp $SYSTEM_HOSTS /etc/hosts (requires root)"
    echo "so don't delete that file! (It's saved read-only for your protection.)"
    echo
    The script should explain itself, but here are some instructions anyway:

    INSTRUCTIONS
    1) Ensure you have all of the necessary utilities installed - OpenWrt comes with BusyBox by default, which contains many commands like wget, but you may need to look for a few extras (like comm from the package coreutils-comm). You may also need the full (coreutils) versions of some commands, and not the BusyBox versions - I'm still investigating this. You may want to comment out the line "cp $ADBLOCK_HOSTS_PERSONAL /etc/hosts" the first time you run the script, and make sure the output is sensible before you copy it to /etc/hosts.
    2) Place the script somewhere sensible, such as ~/bin/adblock.sh
    3) Create (as your normal user) the directory ~/adblock
    4) Run the script. It will generate some config files inside the folder you created.
    5) If you want to add exceptions, add them to ~/adblock/whitelist (one keyword per line) and re-run. The script will tell you which lines it has removed as a result of your whitelist.
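
    For the sanity check suggested in step 1, something like the following may help. It is a rough sketch: the function name check_hosts is made up for illustration, and it only checks that the file is non-empty, still contains a localhost entry, and has at least one 0.0.0.0 blocking entry.

```shell
#!/bin/sh
# check_hosts FILE: hypothetical helper, not part of the script above.
# Succeeds only if FILE is non-empty, still has a localhost entry,
# and contains at least one 0.0.0.0 blocking entry.
check_hosts() {
    file=$1
    [ -s "$file" ] || { echo "$file is missing or empty" >&2; return 1; }
    grep -q 'localhost' "$file" || { echo "no localhost entry in $file" >&2; return 1; }
    # grep -c prints 0 (and returns non-zero) when nothing matches
    blocked=$(grep -c '^0\.0\.0\.0' "$file")
    [ "$blocked" -gt 0 ] || { echo "no blocking entries in $file" >&2; return 1; }
    echo "$file looks plausible: $blocked blocked hosts"
}
```

    It could then guard the copy by hand, e.g. check_hosts ~/adblock/hosts-block-personal && cp ~/adblock/hosts-block-personal /etc/hosts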

    Summary of differences in procedure between OpenWrt and Kubuntu (WIP)

    1) OpenWrt uses ash not bash
    OpenWrt's default shell is ash (part of BusyBox), not bash. As far as I am aware, the main difference is size: ash is much smaller and leaves out some of bash's extras.

    If you try to run the script as it is, it will fail because bash is not installed. There are two solutions. The first is simply to install bash.

    Since I'm running OpenWrt with root on an external filesystem (a USB flash drive), I have loads of space and was able to install bash using OpenWrt's "opkg":

    Code:
    opkg update
    opkg install bash
    A different solution that requires no extra space is to simply change the first line to call ash and not bash, i.e. "#!/bin/bash" becomes "#!/bin/ash". I have tested this and it works.
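
    If you would rather not edit the file by hand, the first line can be swapped with sed. This is a small sketch: use_ash is a made-up helper name, and it assumes your sed supports in-place editing with -i.

```shell
#!/bin/sh
# use_ash SCRIPT: hypothetical helper that rewrites a "#!/bin/bash" first line
# to "#!/bin/ash" in place. All other lines are left untouched.
use_ash() {
    sed -i '1s|^#!/bin/bash$|#!/bin/ash|' "$1"
}
```

    For example: use_ash ~/bin/adblock.sh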

    2) On OpenWrt, you are the root user
    Unlike Kubuntu, when you log in to OpenWrt you are the root user. Your home folder (~) is /root, not /home/username. On Kubuntu, if you create a folder called bin in your home directory (~/bin), it is automatically added to your path, so you can call the script by typing "gethosts.sh" instead of "~/bin/gethosts.sh". This doesn't work on OpenWrt, so I'll have to find a way to add /root/bin to $PATH. WIP.
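
    One possible approach, untested on the router and offered only as a sketch: append the directory to PATH from a file your login shell reads at startup (I am assuming ~/.profile or /etc/profile here). The helper name add_to_path is made up; it avoids adding the same directory twice.

```shell
#!/bin/sh
# add_to_path DIR: hypothetical helper that appends DIR to PATH only if it
# is not already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # append to the end of the search path
    esac
    export PATH
}

add_to_path "$HOME/bin"
```

    Putting these lines in the startup file should make ~/bin/gethosts.sh callable as plain gethosts.sh in new login sessions.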
    Last edited by Feathers McGraw; Oct 31, 2013, 05:24 PM.

      #3
      Here's the current output of Steve's script on the router.

      The script wouldn't run with ash. After installing bash, the output is:

      Code:
      root@OpenWrt:~# ~/bin/gethosts.sh
      Saving copy of system's original hosts file...
      Downloading ad-blocking hosts files...
      wget: invalid option -- n
      BusyBox v1.19.4 (2013-03-14 11:28:31 UTC) multi-call binary.
      
      Usage: wget [-c|--continue] [-s|--spider] [-q|--quiet] [-O|--output-document FILE]
              [--header 'header: value'] [-Y|--proxy on/off] [-P DIR]
              [--no-check-certificate] [-U|--user-agent AGENT] URL...
      
      Retrieve files via HTTP or FTP
      
              -s      Spider mode - only check file existence
              -c      Continue retrieval of aborted transfer
              -q      Quiet
              -P DIR  Save to DIR (default .)
              -O FILE Save to FILE ('-' for stdout)
              -U STR  Use STR for User-Agent header
              -Y      Use proxy ('on' or 'off')
      
      wget: invalid option -- n
      BusyBox v1.19.4 (2013-03-14 11:28:31 UTC) multi-call binary.
      
      Usage: wget [-c|--continue] [-s|--spider] [-q|--quiet] [-O|--output-document FILE]
              [--header 'header: value'] [-Y|--proxy on/off] [-P DIR]
              [--no-check-certificate] [-U|--user-agent AGENT] URL...
      
      Retrieve files via HTTP or FTP
      
              -s      Spider mode - only check file existence
              -c      Continue retrieval of aborted transfer
              -q      Quiet
              -P DIR  Save to DIR (default .)
              -O FILE Save to FILE ('-' for stdout)
              -U STR  Use STR for User-Agent header
              -Y      Use proxy ('on' or 'off')
      
      wget: invalid option -- n
      BusyBox v1.19.4 (2013-03-14 11:28:31 UTC) multi-call binary.
      
      Usage: wget [-c|--continue] [-s|--spider] [-q|--quiet] [-O|--output-document FILE]
              [--header 'header: value'] [-Y|--proxy on/off] [-P DIR]
              [--no-check-certificate] [-U|--user-agent AGENT] URL...
      
      Retrieve files via HTTP or FTP
      
              -s      Spider mode - only check file existence
              -c      Continue retrieval of aborted transfer
              -q      Quiet
              -P DIR  Save to DIR (default .)
              -O FILE Save to FILE ('-' for stdout)
              -U STR  Use STR for User-Agent header
              -Y      Use proxy ('on' or 'off')
      
      /root/bin/gethosts.sh: line 48: syntax error near unexpected token `)'
      /root/bin/gethosts.sh: line 48: `echo "so don't delete that file! (It's saved read-only for your protection.)"'

      V---------------------------------EDIT-----------------------------------V
      It seems as though a lightweight version of wget was being used (perhaps it comes in BusyBox?). I installed a full version of wget and the script now gets further:

      Code:
      root@OpenWrt:~/bin# opkg update
      Downloading http://downloads.openwrt.org/attitude_adjustment/12.09/ar71xx/generic/packages/Packages.gz.
      Updated list of available packages in /var/opkg-lists/attitude_adjustment.
      root@OpenWrt:~/bin# opkg install wget
      Installing wget (1.13.4-1) to root...
      Downloading http://downloads.openwrt.org/attitude_adjustment/12.09/ar71xx/generic/packages/wget_1.13.4-1_ar71xx.ipk.
      Installing libopenssl (1.0.1e-1) to root...
      Downloading http://downloads.openwrt.org/attitude_adjustment/12.09/ar71xx/generic/packages/libopenssl_1.0.1e-1_ar71xx.ipk.
      Configuring libopenssl.
      Configuring wget.
      root@OpenWrt:~/bin# ~/bin/gethosts.sh
      Downloading ad-blocking hosts files...
      2013-10-28 20:29:56 URL:http://winhelp2002.mvps.org/hosts.txt [566133/566133] -> "-" [1]
      2013-10-28 20:30:03 URL:http://hosts-file.net/.%5Cad_servers.txt [423245/423245] -> "-" [1]
      2013-10-28 20:30:06 URL:http://someonewhocares.org/hosts/hosts [311201/311201] -> "-" [1]
      /root/bin/gethosts.sh: line 48: syntax error near unexpected token `)'
      /root/bin/gethosts.sh: line 48: `echo "so don't delete that file! (It's saved read-only for your protection.)"'
      Looks like the temporary file that the host files are downloaded to isn't being created properly. Will investigate.

      V-----------------------------EDIT_2----------------------------------V

      Turns out the command mktemp is a package that isn't available for OpenWrt.

      Code:
      root@OpenWrt:~# opkg update
      Downloading http://downloads.openwrt.org/attitude_adjustment/12.09/ar71xx/generic/packages/Packages.gz.
      Updated list of available packages in /var/opkg-lists/attitude_adjustment.
      root@OpenWrt:~# opkg install mktemp
      Unknown package 'mktemp'.
      Collected errors:
       * opkg_install_cmd: Cannot install package mktemp.
      root@OpenWrt:~#
      I may need to edit Steve's script so that it creates an actual file in /root, and then deletes it when done.

      V-----------------------------EDIT_3----------------------------------V

      Spoke too soon. It is available, it's just not called mktemp

      Code:
      root@OpenWrt:~# opkg list | grep mktemp
      coreutils-mktemp - 8.16-1 - Full version of standard GNU mktemp utility. Normally, you would not use this package, since the functionality in BusyBox is more than sufficient.
      root@OpenWrt:~# opkg install coreutils-mktemp
      Installing coreutils-mktemp (8.16-1) to root...
      Downloading http://downloads.openwrt.org/attitude_adjustment/12.09/ar71xx/generic/packages/coreutils-mktemp_8.16-1_ar71xx.ipk.
      Installing coreutils (8.16-1) to root...
      Downloading http://downloads.openwrt.org/attitude_adjustment/12.09/ar71xx/generic/packages/coreutils_8.16-1_ar71xx.ipk.
      Configuring coreutils.
      Configuring coreutils-mktemp.
      The output of the script is the same after installing.

      Could mktemp be behaving badly because I'm running it as root?

      V-----------------------------EDIT_4----------------------------------V

      I've checked the /tmp directory on the router, and it looks like the files are being created properly, but the second file never gets populated.

      Code:
      root@OpenWrt:/tmp# ls -l
      -rw-------    1 root     root       1300579 Oct 28 22:11 tmp.1WREOC
      -rw-------    1 root     root             0 Oct 28 22:10 tmp.fjTnbf
      Looking at the script, I guess this means it is a problem with either sed or sort.
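
      One way to narrow that down is to feed a few sample lines through each stage by hand and count what comes out. The sketch below uses made-up host names and the same sed expressions as the script. Those expressions rely on GNU-style escapes (\r, \+, \t), so whether a given sed handles them is part of what this checks.

```shell
#!/bin/sh
# Feed a few made-up lines through the same sed expressions as the script,
# then through sort -u, to see which stage (if any) produces no output.
sample=$(mktemp)
printf '127.0.0.1  ads.example.com #banner\r\n' >  "$sample"
printf '127.0.0.1 localhost\n'                  >> "$sample"
printf '# a comment line\n'                     >> "$sample"
printf '127.0.0.1 tracker.example.net\n'        >> "$sample"
printf '127.0.0.1  ads.example.com\n'           >> "$sample"

after_sed=$(sed -e 's/\r//' -e '/^127.0.0.1/!d' -e '/localhost/d' \
                -e 's/127.0.0.1/0.0.0.0/' -e 's/ \+/\t/' \
                -e 's/#.*$//' -e 's/[ \t]*$//' < "$sample")
after_sort=$(printf '%s\n' "$after_sed" | sort -u)

# Count non-empty lines after each stage
sed_lines=$(printf '%s\n' "$after_sed" | grep -c .)
sort_lines=$(printf '%s\n' "$after_sort" | grep -c .)

echo "sed produced $sed_lines lines"
echo "sort -u left $sort_lines lines"
rm -f "$sample"
```

      If sed reports zero lines, the problem is upstream of sort. If sed is fine but sort -u leaves nothing, look at sort.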

      V-----------------------------EDIT_5----------------------------------V

      Installed sed (package name "sed") but no change. Tried to install sort (package name "coreutils-sort") but there was a conflict with the binary provided by busybox... will try and figure out how to change the symlink from /usr/bin/sort to one pointing to coreutils-sort.

      Code:
      root@OpenWrt:/tmp# opkg install coreutils-sort
      Installing coreutils-sort (8.16-1) to root...
      Downloading http://downloads.openwrt.org/attitude_adjustment/12.09/ar71xx/generic/packages/coreutils-sort_8.16-1_ar71xx.ipk.
      Collected errors:
       * check_data_file_clashes: Package coreutils-sort wants to install file /usr/bin/sort
              But that file is already provided by package  * busybox
       * opkg_install_cmd: Cannot install package coreutils-sort.
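
      To see which implementation a command name currently points at, the symlink can be followed with readlink. A minimal sketch, assuming readlink -f is available; which_impl is a made-up helper name.

```shell
#!/bin/sh
# which_impl CMD: hypothetical helper that prints the file a command name
# finally resolves to after following symlinks.
which_impl() {
    path=$(command -v "$1") || { echo "$1: not found" >&2; return 1; }
    readlink -f "$path"
}

which_impl sort
```

      If the printed path is the busybox binary, the command is still the BusyBox applet rather than the coreutils version.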
      V-----------------------------EDIT_6----------------------------------V

      To get coreutils-sort to install I had to force overwrite the symlink to busybox:

      Code:
      root@OpenWrt:~# opkg update
      Downloading http://downloads.openwrt.org/attitude_adjustment/12.09/ar71xx/generic/packages/Packages.gz.
      Updated list of available packages in /var/opkg-lists/attitude_adjustment.
      root@OpenWrt:~# opkg install --force-overwrite coreutils-sort
      Installing coreutils-sort (8.16-1) to root...
      Downloading http://downloads.openwrt.org/attitude_adjustment/12.09/ar71xx/generic/packages/coreutils-sort_8.16-1_ar71xx.ipk.
      Configuring coreutils-sort.
      Here is the current output:

      Code:
      root@OpenWrt:/usr/bin# ~/bin/gethosts.sh
      Downloading ad-blocking hosts files...
      2013-10-29 15:26:52 URL:http://winhelp2002.mvps.org/hosts.txt [566133/566133] -> "-" [1]
      2013-10-29 15:26:54 URL:http://hosts-file.net/.%5Cad_servers.txt [434375/434375] -> "-" [1]
      2013-10-29 15:26:57 URL:http://someonewhocares.org/hosts/hosts [311201/311201] -> "-" [1]
      Parsing, cleaning, de-duplicating, sorting...
      /root/bin/gethosts.sh: line 48: syntax error near unexpected token `('
      /root/bin/gethosts.sh: line 48: `echo "so don't delete that file! (It's saved read-only for your protection.)"'
      This still results in two temporary files, one of which is empty.
      Last edited by Feathers McGraw; Oct 29, 2013, 09:28 AM. Reason: progress

        #4
        Oh dear. I'm such an idiot!

        When I copied the script over from Kubuntu to the router, some of the lines got truncated.

        So I wasn't actually running Steve's script, I was running the Swiss Cheese version.
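
        A quick check would have caught this earlier. The sketch below compares checksums and then asks the shell to parse the copy without running it (sh -n). verify_copy is a made-up helper name, and it assumes both files are reachable from the same machine. Otherwise, run md5sum on each machine and compare the results by eye.

```shell
#!/bin/sh
# verify_copy ORIGINAL COPY: hypothetical helper.
# Succeeds only if both files have the same MD5 checksum and the copy
# passes a syntax check (sh -n parses the file without executing it).
verify_copy() {
    sum_a=$(md5sum < "$1") || return 1
    sum_b=$(md5sum < "$2") || return 1
    [ "$sum_a" = "$sum_b" ] || { echo "checksums differ: $2 is not an exact copy" >&2; return 1; }
    sh -n "$2" || { echo "$2 has syntax errors" >&2; return 1; }
    echo "$2 matches $1 and parses cleanly"
}
```

        For a script that depends on bash-only syntax, bash -n would be the better syntax check.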



        Now it works. Probably no need to install the full GNU versions of all those utilities.
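
        The one clear exception in the output earlier in the thread is wget: the BusyBox build rejected -nv. Its usage text lists -q and -O, so a download step limited to those two flags should work with either implementation. A sketch, with made-up names (fetch_sources, FETCH_CMD):

```shell
#!/bin/sh
# fetch_sources SOURCES_FILE OUTPUT_FILE: hypothetical helper.
# Downloads every URL listed in SOURCES_FILE and appends it to OUTPUT_FILE.
# FETCH_CMD can be overridden; the default uses only -q and -O, both of
# which appear in the BusyBox wget usage text shown earlier in the thread.
FETCH_CMD=${FETCH_CMD:-"wget -q -O -"}

fetch_sources() {
    while read -r url; do
        [ -n "$url" ] || continue      # skip blank lines
        # $FETCH_CMD is deliberately unquoted so it splits into command + options
        $FETCH_CMD "$url" >> "$2" || echo "failed to fetch $url" >&2
    done < "$1"
}
```

        In the all-in-one script this would be called as fetch_sources "$HOSTS_SOURCES" "$temphosts1". Note that -q also hides the progress lines that -nv printed.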