URL Filter and self updating blacklists?

@anon87475738 I’ve posted an update that now supports that type of block-list.
Using all those sources you’ll now get 9460672 domains sorted to 8381088, and a file just under 400MB. The quality of many of those lists is quite poor IMO, but they will now work.
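
For anyone wondering why the count drops after sorting: most of it is presumably the same domain turning up in several lists. The script's actual cleaning step isn't shown in this thread, so the following is only a minimal sketch of the general idea, and `dedupe_domains` is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper (not taken from dns_blocklist.sh): normalise and
# de-duplicate a list of domain names read from stdin.
dedupe_domains() {
    # Lower-case everything, drop carriage returns and blank lines,
    # then sort and keep one copy of each name.
    tr 'A-Z' 'a-z' | tr -d '\r' | grep -v '^$' | LC_ALL=C sort -u
}

# Three input names, two of them the same domain in different case.
printf 'ads.example.com\nADS.example.com\ntracker.example.net\n' | dedupe_domains
```

Piping the combined download through something like this and comparing `wc -l` before and after gives a rough idea of how much overlap the sources have.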

Thank you for the improvements. I will try it out soon.

Hey folks,

I found this list that seems to work:

ftp://ftp.ut-capitole.fr/pub/reseau/cache/squidguard_contrib/blacklists.tar.gz

I have my URL Filter settings configured to update it weekly, and so far so good.

-Paul

Hi @mutley
Here’s a weird wart. When I try to redirect stdout to a file, I get a long list of errors at line 240 in
./dns_blocklist.sh

[root@ipfire bin]# ./dns_blocklist.sh -o blocklist.conf > output.txt
expr: syntax error: missing argument after ‘-’
./dns_blocklist.sh: line 240: [: -le: unary operator expected
expr: syntax error: missing argument after ‘-’
expr: syntax error: missing argument after ‘-’
./dns_blocklist.sh: line 240: [: -le: unary operator expected
expr: syntax error: missing argument after ‘-’
expr: syntax error: missing argument after ‘-’

Any clue as to what is going on here?

Thanks,
@cbrown
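
For reference, those two messages are what `expr` and `[` typically print when an unquoted variable expands to nothing, so some value used at line 240 is probably empty when stdout is not a terminal. Line 240 itself isn't shown here, so the names below (`remaining_width`, `total`, `used`) are made up; this is only a sketch of the failure pattern and one defensive way around it.

```shell
#!/bin/bash
# Hypothetical illustration; line 240 of the real script is not shown in this thread.

# With an empty, unquoted operand the commands collapse to the failing forms:
#   used=""; expr 80 - $used      # actually runs: expr 80 -
#   left=""; [ $left -le 10 ]     # actually runs: [ -le 10 ]

# Defensive version: default empty values to 0 and use shell arithmetic.
remaining_width() {
    local total="${1:-0}" used="${2:-0}"
    echo $(( total - used ))
}

remaining_width 80 ""     # the empty operand is treated as 0
remaining_width 80 30
```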

This guy seems to make lists for “every” device out there.

https://oisd.nl/downloads

I wonder if it is usable for you

Nick has some lists for Unbound as well

For facebook, scroll all the way down

It’s a “feature” … :) The script logs to syslog when it is run from something without a terminal (like fcron), or to the terminal if one exists (like ssh access). I’ll fix the issue, but I suggest running it without redirection for the moment.
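
A common way to get that kind of behaviour in a shell script is to test whether stdout is a terminal. The sketch below is not the script's actual code; `log_msg` and the `LOGGER_CMD` override are assumptions made so the example can be tried without writing to the real syslog.

```shell
#!/bin/bash
# Hypothetical logging helper; the internals of dns_blocklist.sh are not shown here.
log_msg() {
    if [ -t 1 ]; then
        # stdout is a terminal (for example an ssh session): print directly.
        echo "dns_blocklist.sh: $*"
    else
        # No terminal (fcron, or stdout redirected to a file): hand the
        # message to logger, which writes it to syslog.
        "${LOGGER_CMD:-logger}" -t dns_blocklist.sh "$*"
    fi
}
```

Called as `log_msg "Written 123 entries"`, this prints on the console in an interactive session and goes to syslog otherwise. Note that with this approach a plain `> output.txt` also counts as "no terminal", which matches the behaviour described above.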

Hmm, here’s another weird wart. When I run it from fcron, the blocklist somehow shrinks markedly. The sources I use produce over 3 million blocked entries when run from the console, but that somehow shrinks to less than 200k when run from fcron. Here’s the relevant info from syslog:

Feb 1 07:31:23 ipfire dns_blocklist.sh: Retreived domain names from https://adaway.org/hosts.txt
Feb 1 07:31:24 ipfire dns_blocklist.sh: Retreived domain names from https://winhelp2002.mvps.org/hosts.txt
Feb 1 07:31:25 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 1 07:31:25 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/hostnames.txt
Feb 1 07:31:26 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/easylist.txt
Feb 1 07:31:26 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/fanboy-annoyance.txt
Feb 1 07:31:26 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/easyprivacy.txt
Feb 1 07:31:59 ipfire dns_blocklist.sh: Retreived domain names from https://adaway.org/hosts.txt
Feb 1 07:31:59 ipfire dns_blocklist.sh: Retreived domain names from https://winhelp2002.mvps.org/hosts.txt
Feb 1 07:32:00 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 1 14:06:48 ipfire dns_blocklist.sh: Retreived domain names from https://adaway.org/hosts.txt
Feb 1 14:06:49 ipfire dns_blocklist.sh: Retreived domain names from https://winhelp2002.mvps.org/hosts.txt
Feb 1 14:06:50 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 2 01:25:01 ipfire dns_blocklist.sh: Retreived domain names from https://adaway.org/hosts.txt
Feb 2 01:25:01 ipfire dns_blocklist.sh: Retreived domain names from https://winhelp2002.mvps.org/hosts.txt
Feb 2 01:25:02 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 2 01:25:03 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/hostnames.txt
Feb 2 01:25:03 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/easylist.txt
Feb 2 01:25:03 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/fanboy-annoyance.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/easyprivacy.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylist/fanboy-social.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://easylist.to/easylistgermany/easylistgermany.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt
Feb 2 01:25:04 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/spyware.txt
Feb 2 01:25:05 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Fake-Science
Feb 2 01:25:06 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Phishing-Angriffe
Feb 2 01:25:07 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Streaming
Feb 2 01:25:07 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Win10Telemetry
Feb 2 01:25:07 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/crypto
Feb 2 01:25:08 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/easylist
Feb 2 01:25:11 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/gambling
Feb 2 01:25:13 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/malware
Feb 2 01:25:14 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/notserious
Feb 2 01:25:14 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/spam.mails
Feb 2 01:25:15 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
Feb 2 01:25:15 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/extra.txt
Feb 2 01:25:15 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/spy.txt
Feb 2 01:25:16 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
Feb 2 01:25:20 ipfire dns_blocklist.sh: Retreived domain names from https://raw.githubusercontent.com/ookangzheng/dbl-oisd-nl/master/dbl.txt
Feb 2 01:25:20 ipfire dns_blocklist.sh: Retreived domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
Feb 2 01:25:21 ipfire dns_blocklist.sh: Retreived domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
Feb 2 01:25:21 ipfire dns_blocklist.sh: Retreived domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
Feb 2 01:25:21 ipfire dns_blocklist.sh: Retreived domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
Feb 2 01:25:21 ipfire dns_blocklist.sh: Cleaning & Sorting list of 322805 entries
Feb 2 01:25:22 ipfire dns_blocklist.sh: Written 165870 entries to /var/tmp/unbound_blocklist.conf

When I run from console it looks like this:

dns_blocklist.sh: Retreived 7041 domain names from https://adaway.org/hosts.txt
dns_blocklist.sh: Retreived 8730 domain names from https://winhelp2002.mvps.org/hosts.txt
dns_blocklist.sh: Retreived 100077 domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
dns_blocklist.sh: Retreived 76566 domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/hostnames.txt
dns_blocklist.sh: Retreived 15654 domain names from https://easylist.to/easylist/easylist.txt
dns_blocklist.sh: Retreived 103 domain names from https://easylist.to/easylist/fanboy-annoyance.txt
dns_blocklist.sh: Retreived 12903 domain names from https://easylist.to/easylist/easyprivacy.txt
dns_blocklist.sh: Retreived 4 domain names from https://easylist.to/easylist/fanboy-social.txt
dns_blocklist.sh: Retreived 34 domain names from https://easylist.to/easylistgermany/easylistgermany.txt
dns_blocklist.sh: Retreived 967 domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt
dns_blocklist.sh: Retreived 1 domain names from https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/spyware.txt
dns_blocklist.sh: Retreived 2381 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Fake-Science
dns_blocklist.sh: Retreived 464400 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Phishing-Angriffe
dns_blocklist.sh: Retreived 4187 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Streaming
dns_blocklist.sh: Retreived 22 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/Win10Telemetry
dns_blocklist.sh: Retreived 51435 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/crypto
dns_blocklist.sh: Retreived 259564 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/easylist
dns_blocklist.sh: Retreived 805643 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/gambling
dns_blocklist.sh: Retreived 1033884 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/malware
dns_blocklist.sh: Retreived 87321 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/notserious
dns_blocklist.sh: Retreived 1081 domain names from https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/spam.mails
dns_blocklist.sh: Retreived 100077 domain names from https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
dns_blocklist.sh: Retreived 295 domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/extra.txt
dns_blocklist.sh: Retreived 378 domain names from https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/spy.txt
dns_blocklist.sh: Retreived 258686 domain names from https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
dns_blocklist.sh: Retreived 1119532 domain names from https://raw.githubusercontent.com/ookangzheng/dbl-oisd-nl/master/dbl.txt
dns_blocklist.sh: Retreived 2701 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
dns_blocklist.sh: Retreived 2701 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
dns_blocklist.sh: Retreived 34 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
dns_blocklist.sh: Retreived 34 domain names from https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
dns_blocklist.sh: Cleaning & Sorting list of 4416436 entries
dns_blocklist.sh: Removed 158 domain names due to whitelist
Writing list of 3256256 entries to unbound configuration
dns_blocklist.sh: Written 3256256 entries to blocklist.conf

Note the missing number value between Retrieved and domain in the syslog produced when run from fcron.

Look at the number of lists/sources it’s consuming: your config for fcron looks to be different from the one used on the command line. One is 30 lists, the other 43.
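
One way to check this rather than counting by eye is to pull the source URLs out of each log and compare them. This is only a sketch: `list_sources` is a made-up helper and the file names in the usage comments are placeholders for wherever the two outputs were saved.

```shell
#!/bin/bash
# Hypothetical helper: print the distinct source URLs mentioned in a log on stdin.
list_sources() {
    # -o prints only the matching part of each line; sort -u removes repeats.
    grep -oE 'https?://[^[:space:]]+' | LC_ALL=C sort -u
}

# Example usage (file names are placeholders):
#   list_sources < fcron_run.log   > fcron_sources.txt
#   list_sources < console_run.log > console_sources.txt
#   diff fcron_sources.txt console_sources.txt
```

If `diff` prints nothing, both runs read the same set of sources and the difference has to come from somewhere else.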

The shrinking blocklist is what led me to play with redirecting stdout, hoping to debug the issue. I wasn’t aware of the output already going to syslog. I’ve now set it up to run out of /etc/fcron.hourly with output to a file (no unbound restart) to see what gets produced, running the script with the same args used at the command line.

Well, hourly just ticked – I got the short / shrunk blocklist with only 165902 entries.
Again I would like to point out there is no value showing in the log lines for each fetched source between Retrieved and domain names in the syslog entries. Whereas when run from the console I see Retrieved “some-number-value” domain names in the stdout on the console.

[edit] to be clear: when run from fcron, the log lines show: Retrieved blank domain names from some-source

Just updated the script. This will overcome the redirection of stdout and print full information to syslog.

Cool, I’ll queue that up for the next fcron.hourly :)

Well, that did the trick, @mutley :)

Feb 2 14:01:40 ipfire dns_blocklist.sh: Cleaning & Sorting list of 4415935 entries
Feb 2 14:01:43 ipfire dns_blocklist.sh: Removed 158 domain names due to whitelist
Feb 2 14:01:47 ipfire dns_blocklist.sh: Writing list of 3256031 entries to unbound nxdomain configuration
Feb 2 14:01:49 ipfire dns_blocklist.sh: Written 3256031 entries to /var/tmp/blocklist.conf

Thanks,
@cbrown

Hello,
Adding that source does not add more lines to the blacklist.
It seems that the script needs a file (a filename) to download…
@mutley - can you confirm that the source must be a file?

wget https://oisd.nl/downloads

does return +900K lines…but no filename

@hjkl That is a link to a webpage that lists URLs of blacklists. So no, that specific URL will not have anything that the script can use, since it’s a webpage and not a blacklist file.

From that URL it looks like the link in the section called “hosts” would be the best one to use.
That would be ‘https://hosts.oisd.nl/’ for the complete list.
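
A quick way to tell whether a URL serves a usable list or just a web page is to fetch it and see how many hosts-format entries come out. The sketch below assumes the common `0.0.0.0 domain` / `127.0.0.1 domain` layout; `extract_hosts_domains` is a made-up name and is not how the script itself parses its sources, which evidently handle other formats too.

```shell
#!/bin/bash
# Hypothetical helper: pull domain names out of hosts-format text on stdin.
extract_hosts_domains() {
    # Strip carriage returns and comments, then keep the second field of
    # lines that start with a sinkhole address, skipping localhost.
    tr -d '\r' | sed 's/#.*//' | awk '
        ($1 == "0.0.0.0" || $1 == "127.0.0.1") && $2 != "" && $2 != "localhost" {
            print tolower($2)
        }'
}

# Example usage (needs network access):
#   wget -qO- https://hosts.oisd.nl/ | extract_hosts_domains | wc -l
```

An HTML page such as the downloads overview has no lines in that layout, so it yields a count of zero, while a real hosts file yields one line per blocked domain.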

That works: Retreived 970152 domain names from https://hosts.oisd.nl/!

Many thanks!
