Script to automate building an adblocking hosts file

This topic is closed.
This is a sticky topic.

    #46
    Wow! This is a PRIME example of how Open Source is supposed to work. Steve creates a very neat Bash script, his first, to create a specialized /etc/hosts file, and folks jump in and add mods for their special purposes. Everyone benefits! Now, suppose he had created a binary to sell as shareware? Only he could have made changes, depriving himself and others of the improvements, changes, bug fixes, etc. that other, more experienced Bash script writers could have contributed. Everyone benefits from Steve, Feathers and the other contributors.
    Last edited by GreyGeek; Oct 28, 2013, 05:22 AM.
    "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people."
    – John F. Kennedy, February 26, 1962.

    Comment


      #47
      Plus, if either of us has done something stupid (even if it does work), people can suggest an alternative, and we all learn. Now all I need to do is work out which Google things to enable, which may turn out to be the most difficult bit!
      samhobbs.co.uk

      Comment


        #48
        Google Analytics for WordPress works by inserting the analytics code into the header of each page. This is the code:

        Code:
                    <script type="text/javascript">//<![CDATA[
                    // Google Analytics for WordPress by Yoast v4.3.3 | http://yoast.com/wordpress/google-analytics/
                    var _gaq = _gaq || [];
                    _gaq.push(['_setAccount', 'XXXXXXXXX']);
                    _gaq.push(['_setCustomVar',2,'post_type','page',3],['_setCustomVar',4,'year','2013',3],['_trackPageview']);
                    (function () {
                        var ga = document.createElement('script');
                        ga.type = 'text/javascript';
                        ga.async = true;
                        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        
                        var s = document.getElementsByTagName('script')[0];
                        s.parentNode.insertBefore(ga, s);
                    })();
                    //]]></script>
        So I'm going to try putting "google-analytics" in that whitelist file and see what happens. Thought I might need a few more phrases!
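A rough sketch of how a phrase whitelist like that could be applied, assuming the whitelist is a plain text file with one phrase per line and the block list is already in `0.0.0.0<TAB>hostname` form. The file names and sample hostnames below are made up for illustration, and this is not necessarily how the script in this thread does it.

```shell
#!/bin/bash
# Sketch only: file names and sample entries are hypothetical.
workdir=$(mktemp -d)
blocklist="$workdir/hosts-block"
whitelist="$workdir/whitelist"
filtered="$workdir/hosts-filtered"

# A tiny stand-in for the generated block list.
printf '0.0.0.0\t%s\n' \
    ads.example.com \
    ssl.google-analytics.com \
    www.google-analytics.com \
    tracker.example.net > "$blocklist"

# One phrase per line; any blocked host containing a phrase is let through.
printf '%s\n' google-analytics > "$whitelist"

# -v inverts the match, -F treats phrases as fixed strings (so dots are
# literal), -f reads the phrases from the whitelist file.
grep -v -F -f "$whitelist" "$blocklist" > "$filtered"

cat "$filtered"
```

One thing to watch: a blank line in the whitelist is an empty pattern, which matches every line, so `grep -v` would then throw the whole block list away.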

        Feathers
        Last edited by Feathers McGraw; Oct 28, 2013, 12:11 PM.
        samhobbs.co.uk

        Comment


          #49
          I don't want to take over Steve's thread, so I created a new one here for the router project, so I don't clutter this one with stuff not specific to Kubuntu.
          samhobbs.co.uk

          Comment


            #50
            Kum-by-yah, Jerr-ee, kum-by-yah...

            LOLOLOLOLOL

            Comment


              #51
              Hehe
              samhobbs.co.uk

              Comment


                #52
                Nice script, but there are a few inherent weaknesses in /etc/hosts blocking. For example, it only affects name lookups (DNS queries), so anything that connects to an IP address directly is unaffected (I think this is becoming more common these days, as with Google Safe Browsing).

                I'd personally prefer something like privoxy (http://en.wikipedia.org/wiki/Privoxy) for ad-blocking/privacy, but of course there is nothing wrong in using both.

                In case you're interested in suggestions for the script:
                1. since /etc/hosts is system-wide, I'd probably use something like /var/local/hostblock instead of the user's $HOME for storing the backup of hosts, and /usr/local/sbin for the script.
                2. use variables instead of hardcoded paths/filenames for easier modification.
                3. You could create separate dynamic hostblock files in /var/local/hostblock which could be used to generate the hosts file, like:
                hostblock.localhost (localhost hosts entries)
                hostblock.static (static addresses, like static lan addresses)
                hostblock.dynamic (user configurable dynamic addresses queried at runtime, like shortcut entries for dynamic DNS hostnames)
                hostblock.block (null addresses for ad-blocking)
                hostblock.blacklist (user configurable additions to blocked hosts)
                hostblock.whitelist (addresses user wants to whitelist, removed from blocked hosts)
                and possibly some configuration files:
                hostblock.conf (could be used to store the variables)
                hostblock.blocklists (store list of urls of adblocking hosts-files downloaded from the net)
                4. make it cron-friendly. This could include different update intervals for different parts (entries in hostblock.dynamic could be checked every 10 minutes, hostblock.block once a week, etc.) and some error checking to make sure it still produces a valid hosts file when /etc/hosts is modified automatically.
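A minimal sketch of how suggestions 1 to 4 might fit together, under some loud assumptions: a temp directory stands in for /var/local/hostblock, the output is written to a hypothetical `hosts.new` rather than /etc/hosts, the whitelist holds bare hostnames, and `build_hosts` is an invented name. Treat it as one possible shape, not the layout anyone in this thread has actually implemented.

```shell
#!/bin/bash
# Sketch only. Every path here is an assumption: a temp directory stands in
# for /var/local/hostblock and the result goes to hosts.new, not /etc/hosts.
confdir=$(mktemp -d)
output="$confdir/hosts.new"

# Sample parts so the sketch runs on its own (all entries are made up).
printf '127.0.0.1\tlocalhost\n'         > "$confdir/hostblock.localhost"
printf '192.168.1.10\tnas.lan\n'        > "$confdir/hostblock.static"
printf '0.0.0.0\t%s\n' ads.example.com stats.example.org > "$confdir/hostblock.block"
printf '0.0.0.0\ttracker.example.net\n' > "$confdir/hostblock.blacklist"
printf 'stats.example.org\n'            > "$confdir/hostblock.whitelist"

build_hosts() {
    tmp=$(mktemp) || return 1

    # Fixed sections first, in a predictable order; missing parts are skipped.
    for part in localhost static dynamic; do
        if [ -f "$confdir/hostblock.$part" ]; then
            cat "$confdir/hostblock.$part" >> "$tmp"
        fi
    done

    # Blocked hosts: block list plus blacklist, minus whitelisted hostnames.
    cat "$confdir/hostblock.block" "$confdir/hostblock.blacklist" |
        sort -u |
        awk -v wl="$confdir/hostblock.whitelist" '
            BEGIN { while ((getline line < wl) > 0) allow[line] = 1 }
            !($2 in allow)' >> "$tmp"

    # Error check: never install a file that has lost its localhost entry.
    if ! grep -Eq '^127\.0\.0\.1[[:space:]]+localhost' "$tmp"; then
        echo "refusing to install: no localhost entry" >&2
        rm -f "$tmp"
        return 1
    fi

    # mktemp files are mode 600; a hosts file should be world-readable.
    install -m 644 "$tmp" "$output" && rm -f "$tmp"
}

build_hosts && cat "$output"
```

Because the file is assembled in a temp file and only installed after the check passes, a failed run leaves the previous hosts file alone, which is what makes it reasonable to run unattended. For the weekly refresh, a root crontab line along the lines of `0 4 * * 0 /usr/local/sbin/hostblock` would do, where the script name is again hypothetical.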

                (All just suggestions, of course, if you prefer to keep it simple that's completely fine)

                Comment


                  #53
                  Can you not add an IP address to a hosts file?
                  samhobbs.co.uk

                  Comment


                    #54
                    Originally posted by Feathers McGraw View Post
                    Can you not add an IP address to a hosts file?
                    Connections to IP addresses bypass DNS (DNS is there to match domain names to IP addresses, which obviously isn't necessary for IP addresses)...so the hosts file doesn't come into play.
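To make the distinction concrete, here is a small sketch that says whether a hosts file could have any say over a given target. The helper name `hosts_verdict`, the temp hosts file and the sample targets are all made up for illustration; it only recognises IPv4 literals and does no real networking.

```shell
#!/bin/bash
# Sketch only: a temp file stands in for /etc/hosts, names are hypothetical.
hostsfile=$(mktemp)
printf '0.0.0.0\tads.example.com\n' > "$hostsfile"

hosts_verdict() {
    target=$1
    # A bare IPv4 address needs no name lookup, so the hosts file is never read.
    if printf '%s\n' "$target" | grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
        echo "bypasses hosts file"
    # Otherwise the name is looked up, and a matching hosts entry applies.
    elif awk -v h="$target" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }' "$hostsfile"; then
        echo "blocked"
    else
        echo "not listed"
    fi
}

hosts_verdict ads.example.com     # name listed in the hosts file
hosts_verdict 203.0.113.5         # literal address, no lookup happens
hosts_verdict www.example.org     # name not listed
```

So adding `203.0.113.5` itself to the hosts file would achieve nothing: the left-hand column is what a name resolves to, and a connection made straight to an address never asks for a name to be resolved.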

                    Comment


                      #55
                      I see.

                      Having a read of that Privoxy link now, looks really interesting, thanks!
                      samhobbs.co.uk

                      Comment


                        #56
                        "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people."
                        – John F. Kennedy, February 26, 1962.

                        Comment


                          #57
                          Originally posted by kubicle View Post
                          In case you're interested in suggestions for the script
                          Thanks, those are great suggestions. They would certainly be useful for broadening my Bash script understanding, and would make the code more in line with Linux conventions.

                          Comment


                            #58
                            Since, two years later, you're still interested in comments...

                            Feathers had a problem with truncation copying the script, I think, and it brought to mind a reaction I had originally, but dismissed at the time as just being pernickety, though now I regret not speaking up. Anyway, I would lay out the long sed command using line continuations, maybe:
                            Code:
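                            # Strip CRs, keep only 127.0.0.1 entries, drop localhost, point them
                            # at 0.0.0.0, then tidy whitespace and trailing comments.
                            sed -e 's/\r//'                  \
                                -e '/^127\.0\.0\.1/!d'       \
                                -e '/localhost/d'            \
                                -e 's/127\.0\.0\.1/0.0.0.0/' \
                                -e 's/ \+/\t/'               \
                                -e 's/#.*$//'                \
                                -e 's/[ \t]*$//'             \
                                < "$temphosts1" |
                                sort -u > "$temphosts2"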
                            and the merge:
                            Code:
                            echo -e "\n# Ad blocking hosts generated $(date)" |
                                cat ~/hosts-system - "$temphosts2" > ~/hosts-block
                            Looks cool IMO with syntax colouring, like Kate provides.
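For anyone wanting to see what each expression does, here is the same sed chain run over a few made-up lines, with the dots escaped so they match literally. It assumes GNU sed (as shipped with Kubuntu), since `\r`, `\+` and `\t` are GNU extensions; the sample hostnames and temp files are invented for the demo.

```shell
#!/bin/bash
# Sketch only: sample input and temp files are hypothetical. Needs GNU sed.
temphosts1=$(mktemp)
temphosts2=$(mktemp)

# Downloaded lists often have DOS line endings, comments and duplicates.
printf '%s\r\n' \
    '# some comment line' \
    '127.0.0.1  localhost' \
    '127.0.0.1  ads.example.com # banner ads' \
    '127.0.0.1 tracker.example.net' \
    '127.0.0.1  ads.example.com' > "$temphosts1"

sed -e 's/\r//'                  \
    -e '/^127\.0\.0\.1/!d'       \
    -e '/localhost/d'            \
    -e 's/127\.0\.0\.1/0.0.0.0/' \
    -e 's/ \+/\t/'               \
    -e 's/#.*$//'                \
    -e 's/[ \t]*$//'             \
    < "$temphosts1" |
    sort -u > "$temphosts2"

# Show tabs explicitly so the result is easy to check by eye.
sed 's/\t/<TAB>/' "$temphosts2"
```

The comment line and the localhost entry are dropped, the trailing comment and spaces are trimmed, and `sort -u` folds the two ads.example.com lines into one, leaving two entries.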

                            Regards, John Little

                            Comment


                              #59
                              Originally posted by Feathers McGraw View Post
                              If I've done anything embarrassingly inefficient, let me know lol.
                              Right you are then, the comm command does this stuff, only it needs sorted input, so would have to be applied before Steve's merge step, and the whitelist would need to be sorted:

                              Code:
                              comm -23 "$temphosts2" whitelist > "$temphosts3"
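A small worked sketch of that, with made-up entries and temp files standing in for the real ones. Note the assumption that the whitelist lines are in exactly the same `0.0.0.0<TAB>hostname` form as the block list, since `comm` compares whole lines.

```shell
#!/bin/bash
# Sketch only: sample data and temp files are hypothetical.
export LC_ALL=C          # sort and comm must agree on the collation order
temphosts2=$(mktemp)
temphosts3=$(mktemp)
whitelist=$(mktemp)

# Block list, already sorted and de-duplicated as in the sed step.
printf '0.0.0.0\t%s\n' ads.example.com stats.example.org tracker.example.net |
    sort -u > "$temphosts2"

# Whitelist in the same line format, written unsorted and then sorted.
printf '0.0.0.0\t%s\n' tracker.example.net ads.example.com |
    sort -u > "$whitelist"

# -2 drops lines found only in the whitelist, -3 drops lines found in both,
# leaving the blocked hosts that are not whitelisted.
comm -23 "$temphosts2" "$whitelist" > "$temphosts3"

cat "$temphosts3"
```

Only the stats.example.org entry survives. If the whitelist held bare hostnames or phrases instead, `comm` would find nothing in common and the block list would pass through untouched.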
                              Regards, John Little

                              Comment


                                #60
                                Originally posted by jlittle View Post
                                Right you are then, the comm command does this stuff, only it needs sorted input, so would have to be applied before Steve's merge step, and the whitelist would need to be sorted:

                                Code:
                                comm -23 "$temphosts2" whitelist > "$temphosts3"
                                Regards, John Little
                                Thanks for this. If I put it all in one script (probably will for convenience) I'll look into it.
                                samhobbs.co.uk

                                Comment
