URL Filter and self updating blacklists?

@mutley
I have been using it for… I forget how many years.

Glad to see the script is still maintained: I shut down my Raspberry Pi 1 (oh yeah!), which was providing Pi-hole service, when I discovered your script.

Happy to report it also works on Core 162… or at least I have not spotted any errors (yet).

At one point, just for fun, I reached a few million lines in blocklist.conf by using 10 or more sources that each aggregate hundreds of other sources…

Thank you,
Great script!
H&M

Hi @hjkl would you mind sharing your set of additional blocklists?
Thanks,
@cbrown

@hjkl Really glad it’s been useful to you, and also glad you posted about it. It’s been running for years on my system without issues. There are a few small updates that will make managing your own lists easier now.

Still need to figure out why the IPFire safe search unbound configuration affects it, though. But after looking at how that works, I’m kinda surprised it was even implemented; it seems to defeat the purpose of configuring your own DNS servers.

@cbrown you can use any blocklist that uses a known format, which covers the majority of Pi-hole blocklists. There is a good list at the URL below.

@mutley:

I guess that this is not possible with your script?

Deep CNAME inspection

“This will allow Pi-hole to find whether any domain in the CNAME chain is known to be blocked. If one is found, Pi-hole can now block the original query. The feature defaults to being enabled but can be disabled with an FTL config option ( CNAME_DEEP_INSPECT=false ).”

Yes it can. Read the NXDOMAIN part of the information.

Unbound can’t return an IP for these cases; it has to reject at the domain level. You turn this on by setting the return value to something other than an IP address. Valid return values are listed on the page, but something like -r always_nxdomain will do what you want.

A word of caution: I found this to be too restrictive. Many of the blocklists maintained by individuals or small teams are not the most accurate, and once a domain is added, it usually stays there forever. So when you start blocking blindly at the next level up, you tend to block valid stuff. Be careful with the blocklists you pick.
But on the positive side, it will reduce the size of the blocklist passed to unbound, so there is less overhead there.
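
To make the difference between the two return styles concrete, here is a minimal sketch of the Unbound lines each one could produce. The helper name emit_unbound_entry and the exact output layout are assumptions for illustration and are not taken from dns_blocklist.sh; local-zone, local-data and the always_nxdomain zone type are standard Unbound configuration.

```shell
#!/bin/sh
# Hypothetical helper (not from dns_blocklist.sh): print the Unbound
# configuration line for one blocked name.
#   $1 = host or domain name
#   $2 = return value: an IPv4 address, or an Unbound zone type
emit_unbound_entry() {
    name=$1
    ret=$2
    case $ret in
        *[!0-9.]*)
            # Not an address: block the whole zone (name and everything below it)
            printf 'local-zone: "%s" %s\n' "$name" "$ret"
            ;;
        *)
            # Address: answer this exact name with the given IP
            printf 'local-data: "%s A %s"\n' "$name" "$ret"
            ;;
    esac
}

emit_unbound_entry ads.example.com 0.0.0.0
emit_unbound_entry example.com always_nxdomain
```

The first call answers only the exact host name, while the second rejects example.com and every name underneath it, which is why the zone-level style blocks more than the list author may have intended.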

Thank you very much. With your script, Pi-hole is no longer necessary. The disadvantage is less convenience compared to a GUI with statistics (especially if you want to find out why something causes problems).

Edit:
I tried the script:

dns_blocklist.sh -s 1,2,3,4,"https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/spam.mails","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/easylist","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/crypto","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/gambling","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/malware","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/notserious","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/pornblock1","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/pornblock2","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/pornblock3","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/pornblock4","https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts","https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt","https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt","https://raw.githubusercontent.com/ookangzheng/dbl-oisd-nl/master/dbl.txt","https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Fake-Science","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Phishing-Angriffe","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Streaming","https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Win10Telemetry","https://easylist.to/easylistgermany/easylistgermany.txt","https://easylist.to/easylist/easyprivacy.txt","https://easylist.to/easylist/fanboy-social.txt","https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/extra.txt","https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/spy.txt","https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt","https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt","https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt","https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/spyware.txt" -r 0.0.0.0
Retreived 0 domain names from local blacklist file
Retreived 7528 domain names from https://adaway.org/hosts.txt
Retreived 8730 domain names from https://winhelp2002.mvps.org/hosts.txt
Retreived 97346 domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Retreived 76779 domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/hostnames.txt
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/spam.mails
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/easylist
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/crypto
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/gambling
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/malware
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/notserious
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/pornblock1
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/pornblock2
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/pornblock3
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/pornblock4
Retreived 97346 domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Retreived 0 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
Retreived 0 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
Retreived 0 domain names from https://raw.githubusercontent.com/ookangzheng/dbl-oisd-nl/master/dbl.txt
Retreived 0 domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Fake-Science
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Phishing-Angriffe
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Streaming
Retreived 0 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Win10Telemetry
Retreived 33 domain names from https://easylist.to/easylistgermany/easylistgermany.txt
Retreived 12893 domain names from https://easylist.to/easylist/easyprivacy.txt
Retreived 4 domain names from https://easylist.to/easylist/fanboy-social.txt
Retreived 295 domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/extra.txt
Retreived 378 domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/spy.txt
Retreived 0 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
Retreived 0 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
Retreived 967 domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt
Retreived 0 domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/spyware.txt
Cleaning & Sorting list of 302299 entries
Writing list of 148965 entries to unbound configuration
Stopping Unbound DNS Proxy… [ OK ]
Starting Unbound DNS Proxy… [ OK ]
/root/bin/dns_blocklist.sh: Blocked Hosts Update, 148965 hosts blocked

Retreived 0 => this probably means that the list is incompatible. Similar issue: https://forum.kuketz-blog.de/viewtopic.php?p=76319#p76319

That is correct. When I wrote this script, all block-lists followed one of 3 standards. Now that Pi-hole has become popular, it looks like another “standard” (or non-standard) has emerged that is simply a list of domains with nothing else.
I have been thinking about whether (and how) to incorporate those. But I also really haven’t seen anything in those lists that you can’t get in the better-maintained ones.
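
As a rough illustration of why a bare-domain list yields 0 entries for a parser that only expects hosts-style lines, here is a minimal sketch that accepts both shapes. It is not the script’s actual parser: the function name extract_domains is hypothetical, and only two formats are assumed here (hosts-style lines starting with 0.0.0.0 or 127.0.0.1, and one bare domain per line). Anything else, such as adblock-style rules, is simply skipped.

```shell
#!/bin/sh
# Hypothetical sketch: read a block-list on stdin, print one domain per line.
extract_domains() {
    tr -d '\r' |            # tolerate Windows line endings
    sed -e 's/#.*$//' |     # strip comments
    awk '{
        name = ""
        if (NF == 1)
            name = $1       # bare-domain format
        else if (NF >= 2 && ($1 == "0.0.0.0" || $1 == "127.0.0.1"))
            name = $2       # hosts format: address, then host name
        name = tolower(name)
        # keep only dotted names that are not plain IP addresses
        if (name ~ /^[a-z0-9_-]+(\.[a-z0-9_-]+)+$/ && name !~ /^[0-9.]+$/)
            print name
    }'
}

printf '%s\n' '0.0.0.0 ads.example.com' 'tracker.example.net' | extract_domains
```

A hosts-only parser would see a single field on the second input line and skip it, which matches the “Retreived 0” symptom above.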

@anon87475738 I’ve posted an update that now supports that type of block-list.
Using all those sources you’ll now get 9460672 domains sorted to 8381088, and a file just under 400MB. The quality of many of those lists is quite poor IMO, but they will now work.
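
The drop from the raw count to the sorted count comes from duplicates across sources. A minimal sketch of that merge step, assuming each source has already been reduced to one domain per line; the function name merge_lists and the message wording are hypothetical and not the script’s own:

```shell
#!/bin/sh
# Hypothetical sketch: merge domain lists, drop duplicates, report counts.
#   $1 = output file, remaining arguments = input files (one domain per line)
merge_lists() {
    out=$1
    shift
    before=$(cat -- "$@" | wc -l)
    # LC_ALL=C gives a fast, byte-wise sort that is stable across locales
    cat -- "$@" | LC_ALL=C sort -u > "$out"
    after=$(wc -l < "$out")
    echo "Merged $((before)) entries down to $((after)) unique domains" >&2
}
```

With lists in the millions of lines, sorting once over the concatenated input is usually cheaper than de-duplicating each source separately.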

Thank you for the improvements. I will try it out soon.

Hey folks,

I found this list that seems to work:

ftp://ftp.ut-capitole.fr/pub/reseau/cache/squidguard_contrib/blacklists.tar.gz

I have my URL Filter settings configured to update it weekly, and so far so good.

-Paul

Hi @mutley
Here’s a weird wart. When I try to redirect stdout to a file, I get a long list of errors at line 240 in
./dns_blocklist.sh

[root@ipfire bin]# ./dns_blocklist.sh -o blocklist.conf > output.txt
expr: syntax error: missing argument after ‘-’
./dns_blocklist.sh: line 240: [: -le: unary operator expected
expr: syntax error: missing argument after ‘-’
expr: syntax error: missing argument after ‘-’
./dns_blocklist.sh: line 240: [: -le: unary operator expected
expr: syntax error: missing argument after ‘-’
expr: syntax error: missing argument after ‘-’

Any clue as to what is going on here?

Thanks,
@cbrown
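
For what it’s worth, “[: -le: unary operator expected” is what the shell prints when an unquoted variable in a numeric test expands to nothing, and an empty operand handed to expr would plausibly explain the other message. What line 240 actually contains is not shown in this thread, so the following is only a sketch of the failure pattern and a defensive rewrite; the variable name cols and the idea that it holds a terminal width are assumptions.

```shell
#!/bin/sh
# Failure pattern (illustrative only; the real line 240 is not shown here):
#   cols=$(some command that prints nothing when stdout is not a terminal)
#   [ $cols -le 40 ]     # expands to: [ -le 40 ]  -> unary operator expected
#
# Defensive version: validate the value, fall back to a default, and quote it.
is_narrow() {
    cols=${1:-}
    case $cols in
        ''|*[!0-9]*) cols=80 ;;   # empty or not a number: assume 80 columns
    esac
    [ "$cols" -le 40 ]
}

if is_narrow ""; then echo narrow; else echo wide; fi
```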

This guy seems to make lists for “every” device out there.

https://oisd.nl/downloads

I wonder if it is usable for you

Nick has some lists for Unbound as well

For facebook, scroll all the way down

It’s a “feature”… :slight_smile: The script logs to syslog when run from something without a terminal (like fcron), or to the terminal if one exists (like SSH access). I’ll fix the issue, but I suggest running it without redirection for the moment.
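
A minimal sketch of that kind of terminal-aware logging, for anyone curious how a plain “> output.txt” can change a script’s behaviour. This is not the code from dns_blocklist.sh; the function name log_msg and the tag are assumptions, while “[ -t 1 ]” and “logger -t” are standard.

```shell
#!/bin/sh
# Hypothetical sketch, not the code from dns_blocklist.sh.
TAG=dns_blocklist_demo    # assumed syslog tag

log_msg() {
    if [ -t 1 ]; then
        # stdout is a terminal (e.g. an ssh session): print directly
        printf '%s\n' "$*"
    else
        # no terminal (e.g. fcron, or stdout redirected to a file): use syslog
        logger -t "$TAG" "$*"
    fi
}
```

With a check like this, redirecting stdout to a file makes the script look exactly like an fcron run, so anything that depends on terminal properties takes the non-interactive path.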

Hmm, here’s another weird wart. When I have it run from fcron, the blocklist somehow shrinks markedly. The sources I use produce over 3 million blocked entries when run from the console but somehow shrink to fewer than 200k when run from fcron. Here’s the relevant info from syslog:

Feb 1 07:31:23 ipfire dns_blocklist.sh: Retreived domain names from https://adaway.org/hosts.txt
Feb 1 07:31:24 ipfire dns_blocklist.sh: Retreived domain names from https://winhelp2002.mvps.org/hosts.txt
Feb 1 07:31:25 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 1 07:31:25 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/hostnames.txt
Feb 1 07:31:26 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/easylist.txt
Feb 1 07:31:26 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/fanboy-annoyance.txt
Feb 1 07:31:26 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/easyprivacy.txt
Feb 1 07:31:59 ipfire dns_blocklist.sh: Retreived domain names from https://adaway.org/hosts.txt
Feb 1 07:31:59 ipfire dns_blocklist.sh: Retreived domain names from https://winhelp2002.mvps.org/hosts.txt
Feb 1 07:32:00 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 1 14:06:48 ipfire dns_blocklist.sh: Retreived domain names from https://adaway.org/hosts.txt
Feb 1 14:06:49 ipfire dns_blocklist.sh: Retreived domain names from https://winhelp2002.mvps.org/hosts.txt
Feb 1 14:06:50 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 2 01:25:01 ipfire dns_blocklist.sh: Retreived domain names from https://adaway.org/hosts.txt
Feb 2 01:25:01 ipfire dns_blocklist.sh: Retreived domain names from https://winhelp2002.mvps.org/hosts.txt
Feb 2 01:25:02 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 2 01:25:03 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/hostnames.txt
Feb 2 01:25:03 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/easylist.txt
Feb 2 01:25:03 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/fanboy-annoyance.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/easyprivacy.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/fanboy-social.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylistgermany/easylistgermany.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/spyware.txt
Feb 2 01:25:05 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Fake-Science
Feb 2 01:25:06 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Phishing-Angriffe
Feb 2 01:25:07 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Streaming
Feb 2 01:25:07 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Win10Telemetry
Feb 2 01:25:07 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/crypto
Feb 2 01:25:08 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/easylist
Feb 2 01:25:11 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/gambling
Feb 2 01:25:13 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/malware
Feb 2 01:25:14 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/notserious
Feb 2 01:25:14 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/spam.mails
Feb 2 01:25:15 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 2 01:25:15 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/extra.txt
Feb 2 01:25:15 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/spy.txt
Feb 2 01:25:16 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
Feb 2 01:25:20 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/ookangzheng/dbl-oisd-nl/master/dbl.txt
Feb 2 01:25:20 ipfire dns_blocklist.sh: Retreived domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
Feb 2 01:25:21 ipfire dns_blocklist.sh: Retreived domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
Feb 2 01:25:21 ipfire dns_blocklist.sh: Retreived domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
Feb 2 01:25:21 ipfire dns_blocklist.sh: Retreived domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
Feb 2 01:25:21 ipfire dns_blocklist.sh: Cleaning & Sorting list of 322805 entries
Feb 2 01:25:22 ipfire dns_blocklist.sh: Written 165870 entries to /var/tmp/unbound_blocklist.conf

When I run from console it looks like this:

dns_blocklist.sh: Retreived 7041 domain names from https://adaway.org/hosts.txt
dns_blocklist.sh: Retreived 8730 domain names from https://winhelp2002.mvps.org/hosts.txt
dns_blocklist.sh: Retreived 100077 domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
dns_blocklist.sh: Retreived 76566 domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/hostnames.txt
dns_blocklist.sh: Retreived 15654 domain names from https://easylist.to/easylist/easylist.txt
dns_blocklist.sh: Retreived 103 domain names from https://easylist.to/easylist/fanboy-annoyance.txt
dns_blocklist.sh: Retreived 12903 domain names from https://easylist.to/easylist/easyprivacy.txt
dns_blocklist.sh: Retreived 4 domain names from https://easylist.to/easylist/fanboy-social.txt
dns_blocklist.sh: Retreived 34 domain names from https://easylist.to/easylistgermany/easylistgermany.txt
dns_blocklist.sh: Retreived 967 domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt
dns_blocklist.sh: Retreived 1 domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/spyware.txt
dns_blocklist.sh: Retreived 2381 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Fake-Science
dns_blocklist.sh: Retreived 464400 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Phishing-Angriffe
dns_blocklist.sh: Retreived 4187 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Streaming
dns_blocklist.sh: Retreived 22 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Win10Telemetry
dns_blocklist.sh: Retreived 51435 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/crypto
dns_blocklist.sh: Retreived 259564 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/easylist
dns_blocklist.sh: Retreived 805643 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/gambling
dns_blocklist.sh: Retreived 1033884 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/malware
dns_blocklist.sh: Retreived 87321 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/notserious
dns_blocklist.sh: Retreived 1081 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/spam.mails
dns_blocklist.sh: Retreived 100077 domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
dns_blocklist.sh: Retreived 295 domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/extra.txt
dns_blocklist.sh: Retreived 378 domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/spy.txt
dns_blocklist.sh: Retreived 258686 domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
dns_blocklist.sh: Retreived 1119532 domain names from https://raw.githubusercontent.com/ookangzheng/dbl-oisd-nl/master/dbl.txt
dns_blocklist.sh: Retreived 2701 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
dns_blocklist.sh: Retreived 2701 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
dns_blocklist.sh: Retreived 34 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
dns_blocklist.sh: Retreived 34 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
dns_blocklist.sh: Cleaning & Sorting list of 4416436 entries
dns_blocklist.sh: Removed 158 domain names due to whitelist
Writing list of 3256256 entries to unbound configuration
dns_blocklist.sh: Written 3256256 entries to blocklist.conf

Note the missing number value between Retrieved and domain in the syslog produced when run from fcron.

Look at the number of lists/sources it’s consuming. Your config for fcron looks to be different from the one on the command line: one is 30 lists, the other 43.
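
One way to check this is to pull the source URLs out of each log and compare them. A minimal sketch, assuming the two runs have been saved to separate files and that the log lines keep the “Retreived … domain names from URL” shape shown above; the function names are hypothetical. Since syslog accumulates several runs, the fcron log should be trimmed to a single run first.

```shell
#!/bin/sh
# Hypothetical sketch: list the source URLs named in a log, sorted and unique.
sources_in_log() {
    sed -n 's|.*Retreived .*domain names from \([a-z][a-z]*://.*\)$|\1|p' "$1" |
        LC_ALL=C sort -u
}

# Print the sources that appear in log $1 but not in log $2.
missing_sources() {
    a=$(mktemp)
    b=$(mktemp)
    sources_in_log "$1" > "$a"
    sources_in_log "$2" > "$b"
    comm -23 "$a" "$b"    # lines only in the first file
    rm -f "$a" "$b"
}
```

Running missing_sources in both directions shows whether either run is using a source the other is not; empty output both ways means the source lists match and the difference lies elsewhere.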

The shrinking blocklist is what led me to play with redirecting stdout, hoping to debug the issue. I wasn’t aware of output already going to syslog. I’ve now set it up to run from /etc/fcron.hourly with output to a file (no unbound restart) to see what gets produced, running the script with the same args used at the command line.

Well, hourly just ticked – got the short / shrunk blocklist with only 165902 entries
Again I would like to point out there is no value showing in the log lines for each fetched source between Retrieved and domain names in the syslog entries. Whereas when run from the console I see Retrieved “some-number-value” domain names in the stdout on the console.

[edit] to be clear: when run from fcron, the log lines show: Retrieved blank domain names from some-source

Just updated the script. This will overcome the redirection of stdout and print full information to syslog.

Cool, I’ll queue that up for the next fcron.hourly run :grinning: