Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-processor directives for filters #917

Closed
ameshkov opened this issue Dec 15, 2017 · 26 comments
Closed

Pre-processor directives for filters #917

ameshkov opened this issue Dec 15, 2017 · 26 comments

Comments

@ameshkov
Copy link
Member

ameshkov commented Dec 15, 2017

Based on this discussion:
gorhill/uBlock#3331

We should provide multiple pre-processor directives that can be used by filters maintainers to improve compatibility with different ad blockers.

Syntax

!#if condition
Anything goes here
!#include URL_or_a_relative_path
!#endif
  • !#if, !#endif -- filters maintainers can use these conditions to supply different rules depending on the ad blocker type.
  • condition -- just like in some popular programming languages, pre-processor conditions are based on constants declared by ad blockers. Ad blocker authors define on their own what exact constants do they declare.
  • !#include -- this directive allows to include contents of a specified file into the filter.

Conditions

When an adblocker encounters an !#if directive, followed eventually by an !#endif directive, it will compile the code between the directives only if the specified condition is true. Condition supports all the basic logical operators.

Example:

!#if (adguard && !adguard_ext_safari)
||example.org^$third-party
!#endif

Including a file

The !#include directive supports only files from the same origin to make sure that the filter maintainer is in control of the specified file. The included file can also contain pre-processor directives (even other !#include directives).

Ad blockers should consider the case of recursive !#include and implement a protection mechanism.

Examples

Filter URL: https://example.org/path/filter.txt

!
! Valid (same origin):
!#include https://example.org/path/includedfile.txt
!
! Valid (relative path):
!#include /includedfile.txt
!#include ../path2/includedfile.txt
!
! Invalid (another origin):
!#include https://example.com/path/includedfile.txt

Remarks

  • If included file is not found or unavailable, the whole filter update should fail.
  • A conditional directive beginning with a #if directive must explicitly be terminated with a #endif directive.
  • Whitespaces matter. !#if is a valid directive, while !# if is not.

AdGuard-specific

  • Any mistake in a pre-processor directive will lead to AdGuard failing the filter update in the same way as if the filter URL was unavailable.
  • Pre-processor directives can be used in the user filter (or in the custom local filters). Same-origin limitation should be disabled for local filters.

What constants should we declare

  • adguard -- Declared always. Lets maintainers know that this is one of AdGuard products. Should be enough in 95% of cases.

Product-specific constants for some rare cases (or not so rare, thx Safari):

  • adguard_app_windows -- AG for Windows
  • adguard_app_mac -- AG for Mac
  • adguard_app_android -- AG for Android
  • adguard_app_ios -- AG for iOS
  • adguard_ext_chromium -- AG browser extension for Chrome
  • adguard_ext_firefox -- AG browser extension for Firefox
  • adguard_ext_edge -- AG browser extension for Edge
  • adguard_ext_safari -- AG browser extension for Safari
  • adguard_ext_opera -- AG browser extension for Opera
  • adguard_ext_android_cb -- AG content blocker for Samsung/Yandex
@ameshkov
Copy link
Member Author

@gorhill check this out plz, would you like to change something?

As I understand, @mat41997 and @hawkeye116477 are okay with this solution.

@ameshkov
Copy link
Member Author

ameshkov commented Dec 15, 2017

@mat41997 frankly, I don't understand how do you manage to live with a single file.

Most of the large filters I know split the rules into multiple files just because it's easier to support it this way:
https://github.com/easylist/easylist/tree/master/easylist
https://github.com/easylist/easylistgermany/tree/master/easylistgermany

@gorhill
Copy link

gorhill commented Dec 15, 2017

@ameshkov Looks all good to me, thanks for fleshing out this. I started implementing the !#include directive (I just need to remove the space) because it solves an immediate issue, for the other directives I will personally go on a per-need basis.

For the !#include directive, you state:

!#include relative_path_to_a_file.txt

I think for simplicity it should be the full path, but for the same origin as the main list. Personally I accept full URL, but the origins have to match. If no full URL is given, I treat as full path to which the main origin is prepended. Relative paths can be rather complicated to deal with, and having to specify full path shouldn't be such a burden on filter list maintainers, while keeping the code simpler.

gorhill added a commit to gorhill/uBlock that referenced this issue Dec 15, 2017
@ameshkov
Copy link
Member Author

Relative paths can be rather complicated to deal with, and having to specify full path shouldn't be such a burden on filter list maintainers, while keeping the code simpler.

As I understand, this is almost enough to support relative paths:

I treat as full path to which the main origin is prepended

XHR will understand a path like that properly: "https://filters.adtidy.org/android/filters/../../windows/filters/2.txt"

@hawkeye116477
Copy link

hawkeye116477 commented Dec 15, 2017

@mat41997 frankly, I don't understand how do you manage to live with a single file.

Most of the large filters I know split the rules into multiple files just because it's easier to support it this way:
https://github.com/easylist/easylist/tree/master/easylist
https://github.com/easylist/easylistgermany/tree/master/easylistgermany

@ameshkov We want to have one list with rules for Adblock and rules which Adblock have problems with it, because of no scroll for example. So rules should be commented to Adblock not use it. But for beginning it can be just !#include relative_path_to_a_file.txt And I hope that soon this will be extended also to this what I talk 😄 .

@gorhill
Copy link

gorhill commented Dec 15, 2017

We want to have one list with rules for Adblock and rules which Adblock have problems with

@hawkeye116477 Having to comment every single filter does not make sense to me. The !#include directive is what works best and what is most sound design-wise. Having rules for specific purpose in one separate file is what works best, and this is actually already like this for POL. The only change required by POL filter list maintainers is to merely add a one line !#include directive in the main list, with no further changes -- the uBO-specific filter list is already a separate list. Can't be simpler than this.

We want to have one list

Who is "we"?

@jspenguin2017
Copy link

jspenguin2017 commented Dec 15, 2017

I propose !#include to just try to load the supplement filter and just move on if it fails to load, and !#require to load the supplement filter and abort if it fails to load.

@mikhaelkh
Copy link

mikhaelkh commented Dec 15, 2017

  1. I hope you could handle the situation when list a includes itself or there exists a cycle: a0 includes a1, a1 includes a2, ..., an includes a1
  2. Will this raise the amount of net requests to the server? What about update time for adblock users?

@ameshkov
Copy link
Member Author

  1. That's explicitly mentioned in the description:

Ad blockers should consider the case of recursive !#include and implement a protection mechanism.

2.Will this raise the amount of net requests to the server? What about update time?

Sure, by the number of include directives.

@mikhaelkh
Copy link

mikhaelkh commented Dec 15, 2017

Then wouldn't it be better for everybody to merge lists to one on server side, not on client side? Users wouldn't have to download multiple files and your servers wouldn't have to answer to multiple net requests. include relative paths should not be used in popular filterlists.

@ameshkov
Copy link
Member Author

Then wouldn't it be better for everybody to merge lists to one on server side, not on client side? Users wouldn't have to download multiple files and your servers wouldn't have to answer to multiple net requests.

Ideally, it would indeed.

However, unfortunately, not everyone possesses the necessary knowledge to automate building different filters for different ad blockers.

Also, the suggested !#if directives are more flexible. Note that maintainers can distinguish not just ad blockers, but even browsers.

@mikhaelkh
Copy link

mikhaelkh commented Dec 15, 2017

Also many lists have multiple mirrors with identical content. Maybe another directive !#mirror_of main_url should provide main url of the current filterlist to prevent multiple downloads of the same filterlist?

@ameshkov
Copy link
Member Author

ameshkov commented Dec 15, 2017

Guys, I find it quite important to keep development incremental and to do as less as possible, just enough to achieve the current goal. Let's first see how it goes with a single include directive @gorhill has implemented.

@Hunter-Github
Copy link

As a bystander and a happy user of uBO, I am less than thrilled about the include directive from a security perspective (and will defer upgrading to uBO 23 for a while). If a list maintainer's site is compromised or an HTTP site is MITM'ed, users' browsers can be forced to do requests to unknown sites, possibly opening them up to drive-by attacks.

Just my 2c.

@Hunter-Github
Copy link

Another consideration: GNU m4 is a perfect macro preprocessor that can do includes, defines, dozens of other things. A single run, and a new list is done. This saves thousands and millions of adblock list users a bunch of DNS lookups and connections.

Moving towards adding Turing-completeness to adblock lists looks a bit ... excessive.

@gorhill
Copy link

gorhill commented Dec 16, 2017

users' browsers can be forced to do requests to unknown sites

The implementation in uBO does not allow cross-origin requests, the embedded filter lists have to be on the same server. If not, it is rejected.

@gorhill
Copy link

gorhill commented Dec 16, 2017

Also, note that if a server is compromised, an attacker may as well redirect the request for the main list to another server, so if this is what worries you, you will have to prevent your blocker from downloading all lists from any server. In any case, as said above, uBO reject cross-origin requests for embedded filter lists.

@Hunter-Github
Copy link

@gorhill - thanks for the clarification, upgrading to 1.14.23 now.

From an outsider's viewpoint, this restriction is important enough to land in the spec.

@ameshkov
Copy link
Member Author

From an outsider's viewpoint, this restriction is important enough to land in the spec.

It is there:

The !#include directive supports only files from the same origin to make sure that the filter maintainer is in control of the specified file.

@gorhill
Copy link

gorhill commented Jan 1, 2018

Product-specific constants for some rare cases

I would already need now an ability to negate -- not sure what is best:

!#ifnot ext_firefox
...
!#endif

Or:

!#if !ext_firefox
...
!endif

Or whatever else is deemed better.

@jspenguin2017
Copy link

I vote for !#if !ext_firefox, this also allow more complex combination with && and ||.

@ameshkov
Copy link
Member Author

ameshkov commented Jan 3, 2018

@gorhill I am for supporting boolean operators, it'd be much more helpful to filters maintainers.

Eval-based implementation is rather trivial. Although, for the production-ready we should use a simple expressions parser instead:

var str = '(ublock_ext_firefox || adguard_ext_firefox) && !ublock_ext_firefox_mobile';
var defines = ['adguard', 'adguard_ext_firefox'];
var ctx = {};
str.split(/[()|&! ]/).forEach(function(el) {
    if (el) {
        ctx[el] = (defines.indexOf(el) !== -1);
    }
});
var result = false;
with (ctx) {
    result = eval(str);
}

@ameshkov ameshkov added this to the 2.9 milestone Feb 9, 2018
@collinbarrett
Copy link

A bit of a thread hijack, but wasn't sure where else to ask. Does a rule starting with ! ever have any other meaning other than being a comment or now potentially a pre-processor directive? Are there any other ways that Adguard, uBlock, etc. parse a rule starting with !? Thanks.

@ameshkov
Copy link
Member Author

@collinbarrett filter metadata looks like a comment (! Expires, ! Title, etc).

@ameshkov ameshkov modified the milestones: 2.9, 2.10 Mar 29, 2018
Mizzick added a commit that referenced this issue Mar 30, 2018
Mizzick added a commit that referenced this issue Mar 30, 2018
Mizzick added a commit that referenced this issue Mar 30, 2018
Mizzick added a commit that referenced this issue Apr 23, 2018
Mizzick added a commit that referenced this issue Apr 23, 2018
Mizzick added a commit that referenced this issue Apr 23, 2018
…issues/917-2 to master

* commit 'c58f6e9f6195c51b4310f771f4d8670c0eb624a0':
  #917 filter directives
  #917 filter directives
@zebrum zebrum closed this as completed Aug 13, 2018
adguard pushed a commit that referenced this issue Jun 21, 2021
Merge in EXTENSIONS/browser-extension from feature/AG-8668 to feature/AG-2737

Squashed commit of the following:

commit b275da0
Merge: b0b40c9 34fa747
Author: Vladimir Zhelvis <[email protected]>
Date:   Mon Jun 21 15:46:10 2021 +0300

    Merge branch 'feature/AG-2737' into feature/AG-8668

commit b0b40c9
Author: Vladimir Zhelvis <[email protected]>
Date:   Mon Jun 21 14:39:01 2021 +0300

    add comments,  separate conditional rendering in request preview component

commit 5fea753
Author: Vladimir Zhelvis <[email protected]>
Date:   Mon Jun 21 13:47:32 2021 +0300

    refactoring request preview state machine, fix styles
@mjethani
Copy link

@gorhill any idea why EasyList does not use these directives?

For example, EasyList could have two versions of the same filter:

!#if ext_ublock
||example.com^$important
!#else
||example.com^
!#endif

uBlock Origin could ignore the text in the !#else block.

I see people asking for features in Adblock Plus that the developers will almost certainly never implement (e.g. adblockpluscore#323).

One more thing that might help is if two !s would cancel each other out:

!#if ext_ublock
!!||example.com^$important
!#else
||example.com^
!#endif

This way Adblock Plus would treat the filter with the $important flag as a comment whereas uBlock Origin would parse it as a regular filter.

@gorhill
Copy link

gorhill commented Jul 30, 2021

EasyList has already started using uBO syntax, for example in https://secure.fanboy.co.nz/fanboy-cookiemonster.txt, which has a !#include directive understood by uBO but not by APB:

! Include ubO specific
!#include easylist_cookie_specific_uBO.txt

Probably the best approach since it groups together filters targeted at a specific platform.

For ABP-specific filters explictly not meant for uBO, they can already use !#if !ext_ublock blocks.

I consider !! too dangerous since this could be used as a matter of style, for instance there are such lines starting with !! in AdGuard Japanese.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests