Script to automate building an adblocking hosts file


    After comparing the performance of browser-based ad blockers to custom-crafted hosts files, I've concluded that the latter is better. I've found four reasonably updated sources -- the winhelp one is the largest and probably most familiar, but it seems to be updated less frequently than some of the others.

    So I've spent the last couple hours teaching myself bash scripts and especially the handy little sed utility. I've built a script that downloads the files, cleans out all their comments, de-duplicates entries, and merges the result with your system's original hosts file.

    Code:
    #!/bin/bash
    
    # If this is our first run, save a copy of the system's original hosts file and set to read-only for safety
    if [ ! -f ~/hosts-system ]
    then
     echo "Saving copy of system's original hosts file..."
     cp /etc/hosts ~/hosts-system
     chmod 444 ~/hosts-system
    fi
    
    # Perform work in temporary files
    temphosts1=$(mktemp)
    temphosts2=$(mktemp)
    
    # Obtain various hosts files and merge into one
    echo "Downloading ad-blocking hosts files..."
    wget -nv -O - http://winhelp2002.mvps.org/hosts.txt >> "$temphosts1"
    wget -nv -O - http://hosts-file.net/ad_servers.asp >> "$temphosts1"
    wget -nv -O - http://someonewhocares.org/hosts/hosts >> "$temphosts1"
    wget -nv -O - "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&showintro=0&mimetype=plaintext" >> "$temphosts1"
    
    # Do some work on the file:
    # 1. Remove MS-DOS carriage returns
    # 2. Delete all lines that don't begin with 127.0.0.1
    # 3. Delete any lines containing the word localhost because we'll obtain that from the original hosts file
    # 4. Replace 127.0.0.1 with 0.0.0.0 because then we don't have to wait for the resolver to fail
    # 5. Scrunch extraneous spaces separating address from name into a single tab
    # 6. Delete any comments on lines
    # 7. Clean up leftover trailing blanks
    # Pass all this through sort with the unique flag to remove duplicates and save the result
    echo "Parsing, cleaning, de-duplicating, sorting..."
    sed -e 's/\r$//' -e '/^127\.0\.0\.1[ \t]/!d' -e '/localhost/d' -e 's/^127\.0\.0\.1/0.0.0.0/' -e 's/[ \t]\+/\t/' -e 's/#.*$//' -e 's/[ \t]*$//' < "$temphosts1" | sort -u > "$temphosts2"
    
    # Combine system hosts with adblocks
    echo "Merging with original system hosts..."
    echo -e "\n# Ad blocking hosts generated $(date)" | cat ~/hosts-system - "$temphosts2" > ~/hosts-block
    
    # Clean up temp files and remind user to copy new file
    echo "Cleaning up..."
    rm -f "$temphosts1" "$temphosts2"
    echo "Done."
    echo
    echo "Copy ad-blocking hosts file with this command:"
    echo " sudo cp ~/hosts-block /etc/hosts"
    echo
    echo "You can always restore your original hosts file with this command:"
    echo " sudo cp ~/hosts-system /etc/hosts"
    echo "so don't delete that file! (It's saved read-only for your protection.)"
    echo
    Save the text above into a file called ~/gethosts. Make it executable with this command:
    Code:
    chmod +x ~/gethosts
    To run the script, simply:
    Code:
    ~/gethosts
    The first time you run the script, it saves your existing /etc/hosts to ~/hosts-system because it will reuse this each time you run it. You can re-run the script whenever you feel like updating your ad blocking hosts.
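To see what the cleanup stage does before trusting it with real lists, here is a minimal sketch that runs the same seven cleanup steps on a few invented lines. The names clean_hosts and sample_hosts and the example.com/example.net hostnames are made up for illustration, and the expressions assume GNU sed.

```shell
#!/bin/bash
# Sketch: the script's cleanup steps wrapped in a function (hypothetical
# name) so they can be tried on sample input. Assumes GNU sed.
clean_hosts() {
  sed -e 's/\r$//' \
      -e '/^127\.0\.0\.1[ \t]/!d' \
      -e '/localhost/d' \
      -e 's/^127\.0\.0\.1/0.0.0.0/' \
      -e 's/[ \t]\+/\t/' \
      -e 's/#.*$//' \
      -e 's/[ \t]*$//' | sort -u
}

# Invented sample: a comment line, a localhost line, a duplicate entry
# with an inline comment, and an entry with a DOS line ending.
sample_hosts() {
  printf '%s\n' \
    '# sample list' \
    '127.0.0.1 localhost' \
    '127.0.0.1  ads.example.com' \
    '127.0.0.1 ads.example.com  # duplicate'
  printf '127.0.0.1 tracker.example.net\r\n'
}

sample_hosts | clean_hosts
```

With GNU sed this prints two tab-separated entries, one for ads.example.com and one for tracker.example.net, each prefixed with 0.0.0.0.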

    The script outputs the file ~/hosts-block. Each time you run it, you'll need to manually replace your existing hosts file with this command:
    Code:
    sudo cp ~/hosts-block /etc/hosts
    I think this would be a neat thing to schedule in /etc/cron.weekly, but I'll need to put some more smarts into it first. At least you can start playing around with it on your own now. Enjoy.
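One piece of the "more smarts" an unattended cron run would probably want is a sanity check before anything overwrites /etc/hosts, since a failed download could leave a nearly empty block list. The following is only a sketch of that idea: hosts_file_ok, the min_entries threshold, and the /root paths in the closing comment are assumptions, not part of the script above.

```shell
#!/bin/bash
# Sketch of a pre-install check for an unattended run.
# hosts_file_ok and min_entries are hypothetical names.
hosts_file_ok() {
  local file="$1" min_entries="${2:-1000}"
  local count

  # Refuse a missing or empty file.
  [ -s "$file" ] || return 1

  # The merged file should still carry the system's localhost entry.
  grep -q 'localhost' "$file" || return 1

  # Require a minimum number of blocked entries.
  count=$(grep -c '^0\.0\.0\.0[[:space:]]' "$file")
  [ "$count" -ge "$min_entries" ]
}

# Possible use from a root cron job (paths are assumptions):
#   hosts_file_ok /root/hosts-block && cp /root/hosts-block /etc/hosts
```

The threshold is a guess; pick a number comfortably below what the lists normally produce.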

    Minor addition
    If you want to slightly reduce the number of keystrokes required to run the utility, you can create a bin subdirectory in your home folder. When you start a login shell, if the directory ~/bin exists, it is automatically added to your $PATH (this comes from the stock ~/.profile on Ubuntu-family systems, so you may need to log out and back in after creating the directory). Now, place the script in this subdirectory. Then you can simply run
    Code:
    gethosts
    without any of that extra tedious punctuation.
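For anyone who wants the steps spelled out, a minimal sketch follows. install_to_bin is a hypothetical helper name, and the PATH adjustment only affects the current shell; it stands in for what the login profile normally does.

```shell
#!/bin/bash
# Sketch: copy the script into ~/bin and make sure ~/bin is on PATH.
# install_to_bin is a hypothetical helper name.
install_to_bin() {
  local script="$1"
  mkdir -p "$HOME/bin"
  cp "$script" "$HOME/bin/gethosts"
  chmod +x "$HOME/bin/gethosts"

  # Prepend ~/bin for this shell only if it is not already on PATH.
  case ":$PATH:" in
    *":$HOME/bin:"*) ;;
    *) PATH="$HOME/bin:$PATH" ;;
  esac
}

# Example call, assuming the script was saved as ~/gethosts:
#   install_to_bin ~/gethosts
```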

    (Thanks to SecretCode for the idea!)
    Last edited by SteveRiley; Nov 14, 2013, 03:33 AM.

    #2

    Originally posted by SteveRiley
    After comparing the performance of browser-based ad blockers to custom-crafted hosts files, I've concluded that the latter is better. I've found four reasonably updated sources -- the winhelp one is the largest and probably most familiar, but it seems to be updated less frequently than some of the others.
    I second this observation, though I don't have any technical information to back it up. I just notice that the browser seems to be a bit faster.

    The only advertisements that sneak through that the browser ad-blocker used to catch are the occasional Flash-based ones. I'm looking for ways to block those, but at this point, I like the performance of the hosts file.


      #3

      Wow -- verrrrry cool, Steve! I wish I could script like that.

      I might add that chromium-browser has the Ghostery plugin, which does a nice job of blocking trackers (and letting you see them).


        #4

        Already discovered a minor inconvenient side effect... if you receive marketing email and want to click the opt-out link, many of the hostnames in those URLs are included in the block lists. You'll have to temporarily disable the blocking hosts file (sudo mv, then sudo mv back) for the opt-out to work.
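A toggle would make that less tedious. The sketch below copies between the two files the script already maintains rather than using the sudo mv approach mentioned above. set_blocking and the HOSTS_TARGET variable are hypothetical; HOSTS_TARGET exists only so the function can be tried against a scratch file before pointing it at /etc/hosts.

```shell
#!/bin/bash
# Sketch of an on/off switch for the ad-blocking hosts file.
# set_blocking and HOSTS_TARGET are hypothetical names.
set_blocking() {
  local target="${HOSTS_TARGET:-/etc/hosts}"
  local src

  case "$1" in
    on)  src="$HOME/hosts-block" ;;    # file generated by the script
    off) src="$HOME/hosts-system" ;;   # saved copy of the original hosts
    *)   echo "usage: set_blocking on|off" >&2; return 2 ;;
  esac

  [ -r "$src" ] || { echo "missing $src" >&2; return 1; }

  # Use sudo only when the target is not writable by the current user.
  if [ -w "$target" ]; then
    cp "$src" "$target"
  else
    sudo cp "$src" "$target"
  fi
}

# Example: set_blocking off, click the opt-out link, then set_blocking on
```

The browser may cache DNS results for a while, so a change might not take effect immediately.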

        Over the long weekend I plan to make this script smarter. I hope to figure out a way to incorporate it into VMs, too... if you're using Windows on VirtualBox, say, then when you update your host's hosts (haha), you can pull that into your VM's hosts file, too.

        I will likely start a new thread and move this over there, to improve discoverability.


          #5

          Originally posted by SteveRiley
          ......
          Over the long weekend I plan to make this script smarter. I hope to figure out a way to incorporate it into VMs, too... if you're using Windows on VirtualBox, say, then when you update your host's hosts (haha), you can pull that into your VM's hosts file, too.
          ....
          Steve,
          Did you ever get around to making that script smarter?
          If so, I'd like to mooch it!
          GG
          "I would rather have questions that can't be answered, than answers that can't be questioned." ― Richard Feynman


            #6

            Originally posted by GreyGeek
            Did you ever get around to making that script smarter?
            Not yet...got distracted by other things, mostly my KDE-from-scratch experiment on my Mini. Still planning to extend the script's capabilities, though. Besides the VM integration, is there anything else you'd like to see?


              #7

              Not to crap on Steve's parade, but I found an app called MoBlock which I've just begun playing with. It's cross-platform and auto-updating, and it uses Blue Tack's hosts list or any others you want to configure. Worth looking at.

              As far as Steve's tool goes: add different settings for different users. As in: allow the adults to see 18+ sites, but the kids can't, everyone gets ad-blocking, and Mom (for some strange reason) can't log into any shopping sites...<chuckle>
              Please Read Me
              Be not the first by whom the new are tried, Nor yet the last to lay the old aside. - Alexander Pope, An Essay on Criticism, 1711


                #8

                Originally posted by oshunluvr
                I found an app called MoBlock which I've just begun playing with. Cross-platform and auto-updating and uses Blue Tack's hosts list or any others you want to config
                Interesting...it appears the project stopped a while ago, and the work is transitioning to a Linux version of PeerGuardian. I have PG on Windows but didn't even think to check whether there was a Linux version.

                And, alas, Bluetack is having some problems staying afloat...

                Originally posted by oshunluvr
                As far as Steve's tool - Add different settings for different users. As in: allow the adults to see 18+ sites, but the kids can't, everyone get ad-blocking, and Mom (for some strange reason) can't log into any shopping sites...<chuckle>
                That's a lot easier to do with browser plug-ins like AdBlock because those tools rely on lists of regular expressions. I'd have to search for hosts file lists that distinguish 18+ sites from everything else, and also add some form of authentication awareness to my script so it knows who's logged into the box.


                  #9

                  Originally posted by SteveRiley
                  ....
                  Besides the VM integration, is there anything else you'd like to see?
                  The wgets are very fast and the sed command is completed in under a second on my box, leaving a file with 30,328 blocked entries.

                  What you have is, for most people, pretty inclusive. It is for me, anyway.

                  Perhaps a kdialog gui and also include the option to
                  temporarily disable the blocking hosts file (sudo mv, then sudo mv back)
                  to allow clicking some links in email?

                  Code:
                  Usage: kdialog [Qt-options] [KDE-options] [options] [arg] 
                  
                  KDialog can be used to show nice dialog boxes from shell scripts
                  
                  Generic options:
                   --help          Show help about options
                   --help-qt         Show Qt specific options
                   --help-kde        Show KDE specific options
                   --help-all        Show all options
                   --author         Show author information
                   -v, --version       Show version information
                   --license         Show license information
                   --            End of options
                  
                  Options:
                   --yesno <text>      Question message box with yes/no buttons
                   --yesnocancel <text>   Question message box with yes/no/cancel buttons
                   --warningyesno <text>   Warning message box with yes/no buttons
                   --warningcontinuecancel <text> Warning message box with continue/cancel buttons
                   --warningyesnocancel <text> Warning message box with yes/no/cancel buttons
                   --sorry <text>      'Sorry' message box
                   --error <text>      'Error' message box
                   --msgbox <text>      Message Box dialog
                   --inputbox <text> <init> Input Box dialog
                   --password <text>     Password dialog
                   --textbox <file> [width] [height] Text Box dialog
                   --textinputbox <text> <init> [width] [height] Text Input Box dialog
                   --combobox <text> item [item] [item] ... ComboBox dialog
                   --menu <text> [tag item] [tag item] ... Menu dialog
                   --checklist <text> [tag item status] ... Check List dialog
                   --radiolist <text> [tag item status] ... Radio List dialog
                   --passivepopup <text> <timeout> Passive Popup
                   --getopenfilename [startDir] [filter] File dialog to open an existing file
                   --getsavefilename [startDir] [filter] File dialog to save a file
                   --getexistingdirectory [startDir] File dialog to select an existing directory
                   --getopenurl [startDir] [filter] File dialog to open an existing URL
                   --getsaveurl [startDir] [filter] File dialog to save a URL
                   --geticon [group] [context] Icon chooser dialog
                   --progressbar <text> [totalsteps] Progress bar dialog, returns a D-Bus reference for communication
                   --getcolor        Color dialog to select a color
                   --title <text>      Dialog title
                   --default <text>     Default entry to use for combobox, menu and color
                   --multiple        Allows the --getopenurl and --getopenfilename options to return multiple files
                   --separate-output     Return list items on separate lines (for checklist option and file open with --multiple)
                   --print-winid       Outputs the winId of each dialog
                   --dontagain <file:entry> Config file and option name for saving the "do-not-show/ask-again" state
                   --slider <text> [minvalue] [maxvalue] [step] Slider dialog box, returns selected value
                   --calendar <text>     Calendar dialog box, returns selected date
                   --attach <winid>     Makes the dialog transient for an X app specified by winid
                  
                  Arguments:
                   arg            Arguments - depending on main option
                  Examples are here.


                    #10

                    Originally posted by GreyGeek
                    Perhaps a kdialog gui and also include the option to
                    temporarily disable the blocking hosts file (sudo mv, then sudo mv back)
                    to allow clicking some links in email?
                    Hey, I like that idea -- using the GUI for the interactive portions of the script. Dunno why, but I was originally going to stick with something in text mode. Seems kinda silly now that I think about it!
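As a rough illustration of what the interactive part could look like, here is a sketch that asks a yes/no question through kdialog when a graphical session appears to be available and falls back to a text prompt otherwise. confirm is a hypothetical helper; the --yesno and --title options are taken from the help output quoted above, and the check on $DISPLAY is only an assumption about how to detect a GUI session.

```shell
#!/bin/bash
# Sketch: GUI confirmation with a text-mode fallback.
# confirm is a hypothetical helper name.
confirm() {
  local question="$1" reply

  if [ -n "$DISPLAY" ] && command -v kdialog >/dev/null 2>&1; then
    # kdialog --yesno exits with status 0 when the user picks Yes.
    kdialog --title "gethosts" --yesno "$question"
  else
    read -r -p "$question [y/N] " reply
    case "$reply" in
      [yY]*) return 0 ;;
      *)     return 1 ;;
    esac
  fi
}

# Possible use at the end of the script:
#   confirm "Install the new hosts file now?" && sudo cp ~/hosts-block /etc/hosts
```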


                      #11

                      I forgot to include the link to the Kdialog Tutorial


                      [Attached image: table2.png]
                      Last edited by Snowhog; Jun 12, 2013, 12:06 AM.


                        #12

                        In my case, I'm using multiple hosts files tailored to each user and building the live hosts file at login time. I use a base file called hosts.local, and each user has a .hosts.bad in their home directory. At login, hosts.local is combined with that user's .hosts.bad and the result is installed as /etc/hosts.

                        The nice thing about MoBlock is being able to turn it off when needed. The hosts-file method makes that more difficult. However, I like that it doesn't use RAM or CPU time.
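The login-time merge described in this post might look something like the sketch below. It is a guess at the mechanics, not the poster's actual setup: build_user_hosts is a hypothetical name, and the /etc/hosts.local location in the closing comment is an assumption, since the post does not say where the base file lives.

```shell
#!/bin/bash
# Sketch: combine a base hosts file with one user's block list.
# build_user_hosts is a hypothetical name; all paths are passed in.
build_user_hosts() {
  local base="$1" user_list="$2" out="$3"
  local tmp status

  tmp=$(mktemp) || return 1
  cat "$base" > "$tmp" || { rm -f "$tmp"; return 1; }

  # A user without a personal block list just gets the base file.
  if [ -r "$user_list" ]; then
    cat "$user_list" >> "$tmp"
  fi

  # Write with cat rather than mv so an existing target keeps its
  # owner and permissions.
  cat "$tmp" > "$out"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# At login this might be called as root roughly like so (paths assumed):
#   build_user_hosts /etc/hosts.local "/home/$USER/.hosts.bad" /etc/hosts
```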


                          #13
                          Jeez, I was not aware that adblock+ was eating so much RAM... thanks for the script Steve. +1
                          Ok, got it: Ashes come from burning.


                            #14
                            Originally posted by SteveRiley
                            ....
                            So I've spent the last couple hours teaching myself bash scripts and especially the handy little sed utility. ...
                            A couple hours, eh?

                            Tell me again what planet you are from .... I forgot!

                            Thanks for the fish!!! I'm going to eat it!


                              #15
                              Uh...Jerry? You OK?
