AdGuardHome does not apply rules from popular filter lists (EasyList, EasyPrivacy) #867

Closed
mmotti opened this issue Jul 2, 2019 · 11 comments

Comments

@mmotti

mmotti commented Jul 2, 2019

Steps to reproduce

  1. Add: https://easylist.to/easylist/easylist.txt
  2. Add: https://easylist.to/easylist/easyprivacy.txt

Expected behavior

Rules should be processed and domains should be blocked.

Actual behavior

No rules in these files are processed, as most of them have a $third-party suffix. I assume AdGuardHome skips these intentionally, but the user is given no notice of this. In fact, it's quite misleading: a 'rule count' is displayed (the number of lines, not the number of rules actually processed), which leads the user to believe that entries have been found and will be processed accordingly.

Extra

I'm aware that your 'AdGuard Simplified Domain Names filter' may well include the rules from these files, but is there no way for AdGuardHome to process these lists directly? I know that at the DNS level we can't determine whether a request is third-party or not, but is it not safe to assume that, even if these rules are marked as third-party, they could be blocked in the same way as ||test.com^?
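A minimal Python sketch of the gap described above, comparing the number of non-comment lines with the number of rules a DNS-level blocker could actually use. It assumes a simplified model in which only `||domain^` rules are usable, optionally also accepting rules whose only modifier is `$third-party` (the relaxed treatment proposed here). `count_rules` and `ignore_third_party` are hypothetical names, not AdGuard Home code.

```python
import re

# Simplified model of a DNS-level filter (an assumption for this sketch):
# only ||domain^ rules, optionally with modifiers after $, are considered.
DOMAIN_RULE = re.compile(r"^(@@)?\|\|([a-z0-9.-]+)\^(?:\$(.+))?$")


def count_rules(lines, ignore_third_party=False):
    """Return (non_comment_lines, applicable_rules) for a list of filter lines.

    A rule counts as applicable if it is a plain ||domain^ rule or, when
    ignore_third_party is True, if its only modifier is $third-party.
    """
    total = 0
    applicable = 0
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, "!" comments and the "[Adblock Plus x.y]" header.
        if not line or line.startswith(("!", "[")):
            continue
        total += 1
        match = DOMAIN_RULE.match(line)
        if not match:
            continue  # cosmetic rules, URL paths, etc. cannot be applied in DNS
        modifiers = match.group(3)
        if modifiers is None:
            applicable += 1
        elif ignore_third_party and modifiers == "third-party":
            applicable += 1
    return total, applicable
```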

Your environment

Version of AdGuard Home server: 0.96-hotfix
How did you set up DNS configuration: Router
Device model (if it's a router or IoT): Raspberry Pi 3b
Operating system and version: Raspbian Buster
@ameshkov
Member

ameshkov commented Jul 2, 2019

but is it not safe to assume that even if they are listed as third-party, they could be blocked in the same way as ||test.com^

Tbh, it's hard to say what exact issues it will cause. It's easier to try and see how it goes.

I'll mark this as a feature request.

@ameshkov ameshkov added this to the v0.99 milestone Jul 2, 2019
@ameshkov
Member

ameshkov commented Jul 2, 2019

If anyone decides to help with this task, the changes need to be done here:
https://github.com/AdguardTeam/urlfilter/blob/master/dns_engine.go#L74

Request needs to be marked as "third-party".
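A rough Python model of what marking the request as third-party could mean. A DNS query carries no page context, so one option is to treat every query as third-party: rules with `$third-party` then match, while first-party-only rules (`$~third-party`) do not. This is a hypothetical sketch; `Rule`, `parse_rule` and `matches` are illustrative names, not the urlfilter API, and only these two modifiers are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    domain: str
    # None: no restriction; True: $third-party; False: $~third-party
    third_party: Optional[bool]


def parse_rule(text: str) -> Optional[Rule]:
    """Parse ||domain^ with at most a $third-party or $~third-party modifier."""
    if not text.startswith("||"):
        return None
    body, _, modifiers = text[2:].partition("$")
    if not body.endswith("^"):
        return None
    domain = body[:-1]
    if modifiers == "":
        return Rule(domain, None)
    if modifiers == "third-party":
        return Rule(domain, True)
    if modifiers == "~third-party":
        return Rule(domain, False)
    return None  # any other modifier is out of scope for this sketch


def matches(rule: Rule, host: str, request_is_third_party: bool = True) -> bool:
    """Match host (or a subdomain of the rule's domain), honouring the flag."""
    if not (host == rule.domain or host.endswith("." + rule.domain)):
        return False
    if rule.third_party is None:
        return True
    return rule.third_party == request_is_third_party
```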

@mmotti
Author

mmotti commented Jul 2, 2019

Tbh, it's hard to say what exact issues it will cause. It's easier to try and see how it goes.
I'll mark this as a feature request.

Thanks! I appreciate your consideration :-)

@alexsannikov

I personally use a prepared hosts file, RuAdList + EasyList, from raletag's repo (http://cdn.raletag.gq/rueasyhosts.txt). I have no idea how the original filter list is converted into a hosts file, but it is definitely useful for me and works perfectly. I suppose there must be some script that converts EasyList records into hosts-file records, even though most of the "high-level" logic is not supported by DNS.
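It isn't clear from the thread how rueasyhosts.txt is generated. One plausible approach, sketched here in Python under that assumption, keeps only plain `||domain^` rules (no modifiers, paths or wildcards) and writes each as a hosts line. `adblock_to_hosts` and the `0.0.0.0` sink address are illustrative choices, not taken from that repo.

```python
import re

# Only rules that are exactly ||domain^ survive; everything DNS can't express is dropped.
PLAIN_DOMAIN_RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^$")


def adblock_to_hosts(lines, sink_ip="0.0.0.0"):
    """Convert plain ||domain^ rules into deduplicated hosts-file lines."""
    hosts = []
    seen = set()
    for raw in lines:
        match = PLAIN_DOMAIN_RULE.match(raw.strip().lower())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            hosts.append(f"{sink_ip} {match.group(1)}")
    return hosts
```

Note that `@@` exceptions and anything with `$` modifiers are silently dropped, which is exactly the information loss discussed in the rest of this thread.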

@mmotti
Author

mmotti commented Jul 8, 2019

@alexsannikov the problem with converting filters to the standard hosts format is that you won't block all of the subdomains. E.g. 2o7.net, which is listed in your hosts file, has plenty of blacklisted subdomains in other hosts files, but your hosts file lists only the top-level entry.

I believe, although I could be wrong, that uBlock Origin treats each standard hosts-file entry as ||something.com^, which would let this list work properly with the browser extension. But obviously things are processed differently in AdGuard.
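To make the difference concrete: a hosts-file entry matches only the exact name, while an adblock-style `||domain^` rule also covers every subdomain. The Python sketch below compares the two for the 2o7.net example and converts a hosts line into a `||domain^` rule. The helper names are hypothetical, and `sub.2o7.net` is a made-up subdomain used only for illustration.

```python
def hosts_blocks(entries, host):
    """Hosts-file semantics: only exact names are blocked."""
    return host in entries


def adblock_blocks(domains, host):
    """||domain^ semantics: the domain itself and all its subdomains are blocked."""
    labels = host.split(".")
    # Check the host and each parent: a.b.c -> a.b.c, b.c, c
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


def hosts_line_to_rule(line):
    """Convert a '0.0.0.0 example.com' hosts line into '||example.com^'."""
    parts = line.split()
    if len(parts) < 2 or parts[0].startswith("#"):
        return None  # blank line or comment
    return f"||{parts[1]}^"
```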

I've extracted what I believe are the appropriate filters (restrictive and whitelist) from various sources over here: https://github.com/mmotti/adguard-home-filters (filters.txt).

The easiest way to do it is (in Python, at least):

  1. Run regex string replacements on the filter text (in multiline mode, so ^ and $ match at each line):
      replacements = {
          # Remove lines that don't start with @@|| or ||
          r'^(?!(?:@@)?\|\|).+$': '',
          # Remove $third-party suffix
          r'\$third-party$': '',
          # Remove IP addresses
          r'^\|\|(?:[0-9]{1,3}\.){3}[0-9]{1,3}\^$': '',
          # Remove empty lines
          r'^[\t\s]*(?:\r?\n|\r)+': '',
      }
  2. Match the remaining entries against the patterns below (I realise there is little validation here of whether the actual domain is valid):
      filters = r'^\|\|([a-z0-9-_.]+)\^(?:\$(?:third-party|document))?$'
      # whitelist (exception) rules
      wildcards = r'^@@\|\|([a-z0-9-_.]+)\^(?:\||\$third-party)?$'
  3. For each filter entry (e.g. test.com), check whether test.com itself, or any subdomain of it (.test.com), is in the whitelist.
  4. Reverse-match against the whitelist using the partial string (.test.com) to identify the whitelisted subdomain that matched, e.g. domain.test.com.
  5. Collect only the verified whitelist rules (those that have a conflicting restrictive filter rule) into an array or set. Otherwise we import lots of unnecessary whitelist entries that don't apply to our standard wildcard blocks.
  6. Convert filter strings: test.com --> ||test.com^
  7. Convert whitelist strings: domain.test.com --> @@||domain.test.com^

Or at least that's how I had to do it in Python.

I know this is probably blatantly obvious to you guys, but if it saves you any time at all, then it's worth me writing it down.
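The seven steps listed above might look roughly like this in Python. This is a sketch, not the author's actual script: `parents` and `convert` are hypothetical names, and the input is assumed to be the raw text of a list such as EasyPrivacy. Steps 3 to 5 are implemented by walking each whitelist entry's parent domains instead of a reverse string match, which yields the same set of verified exceptions.

```python
import re

# Step 1: the regex replacements from the list above, applied in order
# with re.MULTILINE so ^ and $ anchor at each line.
CLEANUP = [
    (r"^(?!(?:@@)?\|\|).+$", ""),                   # lines not starting with @@|| or ||
    (r"\$third-party$", ""),                         # $third-party suffix
    (r"^\|\|(?:[0-9]{1,3}\.){3}[0-9]{1,3}\^$", ""),  # IP-address rules
    (r"^[\t\s]*(?:\r?\n|\r)+", ""),                  # empty lines
]

# Step 2: patterns for blocking rules and whitelist (exception) rules.
FILTER_RE = re.compile(r"^\|\|([a-z0-9-_.]+)\^(?:\$(?:third-party|document))?$")
WHITELIST_RE = re.compile(r"^@@\|\|([a-z0-9-_.]+)\^(?:\||\$third-party)?$")


def parents(domain):
    """Yield the domain itself and every parent: a.b.c -> a.b.c, b.c, c."""
    labels = domain.split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def convert(text):
    """Turn an adblock-style list into ||domain^ blocks plus verified @@ exceptions."""
    for pattern, replacement in CLEANUP:
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)

    blocked, allowed = set(), set()
    for line in text.splitlines():
        if m := FILTER_RE.match(line):
            blocked.add(m.group(1))
        elif m := WHITELIST_RE.match(line):
            allowed.add(m.group(1))

    # Steps 3-5: keep a whitelist entry only if it equals, or is a subdomain
    # of, a blocked domain; otherwise it has nothing to override.
    verified = {d for d in allowed if any(p in blocked for p in parents(d))}

    # Steps 6-7: emit the final rules.
    return [f"||{d}^" for d in sorted(blocked)] + [f"@@||{d}^" for d in sorted(verified)]
```

The walrus operator requires Python 3.8 or later.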

@alexsannikov

@mmotti You are right, without additional scripting using of plain hosts files is painful.
You have great logic for preparing AdGuard Home filter lists from third-party hosts files. I personally use a similar approach: I sort the list and go through it line by line. If I have a parent domain, I keep only that record and remove all the lines with its subdomains. Finally, I just add || at the beginning, and thus all the subdomains are blocked by default.
This logic makes sense, because if we block a root (parent) domain, all its subdomains should be blocked too. I have never needed to block a root domain but unblock one of its subdomains.
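A minimal Python sketch of the collapsing approach described above, assuming the input is a plain list of domain names (the function name `collapse` is hypothetical). Sorting by reversed labels places every parent directly before its subdomains, so a single pass can drop anything already covered.

```python
def collapse(domains):
    """Keep only the topmost domain of each subtree and wrap it as ||domain^."""
    # Sort by reversed labels (com.example, com.example.ads, ...) so that a
    # parent always comes before all of its subdomains.
    ordered = sorted(set(domains), key=lambda d: d.split(".")[::-1])
    kept = []
    for domain in ordered:
        if kept and (domain == kept[-1] or domain.endswith("." + kept[-1])):
            continue  # already covered by the parent kept before it
        kept.append(domain)
    return [f"||{d}^" for d in kept]
```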
Anyway, as you already explained, we can whitelist some domains/subdomains with an additional "@@ ... $important" rule (don't forget about "$important"!), which has higher priority than any other filter and allows the required domain.
And yes, ||something.com^ is enough to block whole domain name.
P.S. I don't use raw hosts files: I collapse them (usually 2-3 times or even more), match records across all of them to avoid duplicates, and then add some AdGuard Home syntax (||, @@, ^, $important etc.).
My post was about how to easily get a list of domains for blocking in "pure" format. Otherwise, some extra logic must be applied to EasyList or any other filter list in AdBlock format to exclude unsupported expressions. As I am not a programmer, I chose the simplest way :)
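The "@@ ... $important" priority mentioned above can be modelled as a small ranking function. This is a simplified sketch of the ordering as we understand AdGuard's documented behaviour (an `@@...$important` exception beats an `$important` block, which beats a plain `@@` exception, which beats a plain block); the tuple representation and the name `decide` are hypothetical.

```python
def decide(matching_rules):
    """Return 'allow', 'block' or None for the rules that matched a hostname.

    Each rule is an (is_exception, is_important) tuple.
    """
    if not matching_rules:
        return None
    # Rank by importance first; at the same importance level, exceptions win.
    best = max(matching_rules, key=lambda rule: (rule[1], rule[0]))
    return "allow" if best[0] else "block"
```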

@ameshkov
Member

ameshkov commented Jul 8, 2019

Guys, if you're trying to convert adblock rules to the AG format, this might be useful:
https://github.com/AdguardTeam/AdGuardSDNSFilter/blob/master/Filters/parser.py

Basically, here's what you need:

  1. Fork https://github.com/AdguardTeam/AdGuardSDNSFilter/tree/master/Filters
  2. Clean exceptions.txt, exclusions.txt and rules.txt
  3. Edit filter.template and include the filter lists you'd like to have there
  4. Run parser.py
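Steps 2 and 4 above can be scripted; a hedged Python sketch follows. It assumes that "clean" means emptying the three files, that `parser.py` is run from inside the forked Filters directory, and that it works under the same Python interpreter; none of this is confirmed in the thread. Step 3 (editing filter.template) is left manual because its format isn't described here. `prepare` and `run_parser` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Files the instructions above say to clean before running parser.py.
FILES_TO_CLEAN = ("exceptions.txt", "exclusions.txt", "rules.txt")


def prepare(filters_dir):
    """Step 2: empty the three files in a fork's Filters directory."""
    root = Path(filters_dir)
    for name in FILES_TO_CLEAN:
        (root / name).write_text("", encoding="utf-8")


def run_parser(filters_dir):
    """Step 4: run parser.py with the current interpreter, raising on failure."""
    return subprocess.run([sys.executable, "parser.py"], cwd=filters_dir, check=True)
```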

@alexsannikov

@ameshkov Wow, that's cool! Will look into that. Thanks!

@mmotti
Author

mmotti commented Jul 17, 2019

Just to note with this:
I noticed that ||ospserver.net^$third-party from EasyPrivacy breaks Samsung phone updates.

It may be a one-off, but it could also mean that significant whitelist updates are required, or that users need to be aware that, if they aren't using the specially provided AdGuard filter, they may need to be more active with whitelisting.
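When a list breaks something, as with Samsung updates here, it helps to find the responsible rule before writing an exception. A hypothetical Python helper: given list lines and a hostname, return the `||domain^` rules (with or without `$third-party`) that cover it, and build a matching `@@` exception. The hostname `example.ospserver.net` in the test is made up for illustration.

```python
import re

# ||domain^ with an optional $third-party suffix, treated as a plain domain block.
RULE_RE = re.compile(r"^\|\|([a-z0-9.-]+)\^(?:\$third-party)?$")


def rules_blocking(lines, host):
    """Return the rules whose domain equals host or is a parent of it."""
    hits = []
    for raw in lines:
        line = raw.strip()
        match = RULE_RE.match(line)
        if not match:
            continue
        domain = match.group(1)
        if host == domain or host.endswith("." + domain):
            hits.append(line)
    return hits


def exception_for(rule):
    """Build an @@ exception that unblocks the domain of a ||domain^ rule."""
    domain = RULE_RE.match(rule).group(1)
    return f"@@||{domain}^"
```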

@ameshkov
Member

Merging into #1160

@LitileXueZha

It still happens: it seems that AdGuardHome rejects rules longer than 1024 characters or containing non-printable characters. See @ainar-g's reply in #6003.

v0.107.34 on an Armbian arm64 device. One more thing: why not just ignore these rules?
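A pre-processing sketch in Python that sets such rules aside, so the rest of a list can still load. The 1024 limit is taken from the comment above; measuring length in characters (rather than bytes) and the name `split_valid_rules` are assumptions, and AdGuard Home's actual validation may differ.

```python
MAX_RULE_LENGTH = 1024  # limit as described in the comment above (assumed to count characters)


def split_valid_rules(lines, max_len=MAX_RULE_LENGTH):
    """Return (kept, skipped); skipped rules are too long or contain non-printable characters."""
    kept, skipped = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if len(line) > max_len or not line.isprintable():
            skipped.append(line)
        else:
            kept.append(line)
    return kept, skipped
```

Reporting the `skipped` list to the user, rather than dropping it silently, would also address the visibility concern raised at the start of this issue.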


4 participants