Format
Keep the list of suffixes (if more than one) in alphabetical order. Sort first by the TLD, then by the first label to the left of the TLD, and so forth. For example, the following is sorted by TLD first (com, invalid, net, org, test), then within each TLD by the first label (example), then, for entries with more labels, by each further label, with shorter entries appearing first:
beta.example.com
alpha.beta.example.com // alpha.beta has more labels than beta, and thus comes second.
delta.example.invalid // invalid sorts between com and net
beta.example.net
delta.example.net // delta sorts after beta, as example and net are the same
charlie.example.org // org sorts after net, because sorting is by the right-most label, not the left-most
alpha.example.test
This is a simple Ruby script (called psl-sort) you can use to sort entries in a file:
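#!/usr/bin/env ruby
# Read one entry per line from standard input.
input = STDIN.read.split("\n")
# Sort by labels from right to left (TLD first). Comparing label arrays rather
# than a joined string keeps "example" ahead of "example-site".
output = input.sort_by { |x| x.split(".").reverse }
puts output.join("\n")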
Usage:
$ cat file.txt | psl-sort
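To check an existing file instead of rewriting it, the same right-to-left label ordering can be used to find the first misplaced entry. This is only a sketch: `psl_sort_key` and `first_unsorted_pair` are hypothetical helper names, not part of any PSL tooling, and the input is assumed to contain entries only (no comments or blank lines).

```ruby
# Sort key: the labels of an entry in right-to-left order (TLD first).
def psl_sort_key(entry)
  entry.split(".").reverse
end

# Returns the first adjacent pair [previous, current] that is out of order,
# or nil when the entries are already sorted.
def first_unsorted_pair(entries)
  entries.each_cons(2).find do |previous, current|
    (psl_sort_key(previous) <=> psl_sort_key(current)) > 0
  end
end

entries = %w[
  beta.example.com
  alpha.beta.example.com
  delta.example.invalid
  beta.example.net
  delta.example.net
  charlie.example.org
  alpha.example.test
]

p first_unsorted_pair(entries)          # the sample list from above is in order
p first_unsorted_pair(entries.reverse)  # reports the first misplaced pair
```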
- The list is a set of rules, with one rule per line.
- Each line is only read up to the first whitespace; entire lines can also be commented using //.
- Each line that is neither entirely whitespace nor begins with a comment contains a rule.
- Each rule lists a public suffix, with the subdomain portions separated by dots (.) as usual. There is no leading dot.
- The wildcard character * (asterisk, specifically U+002A ASTERISK) matches any valid sequence of characters in a hostname part. Wildcards in the PSL follow the syntax defined in RFC 1034, section 4.3.3 (pp. 24-25), and are restricted to the leftmost position, where they must stand for an entire label.
- If a hostname matches more than one rule in the file, the longest matching rule (the one with the most levels) will be used.
- An exclamation mark (!) at the start of a rule marks an exception to a previous wildcard rule. An exception rule takes priority over any other matching rule.
- The list uses Unicode, not Punycode forms, and is encoded using UTF-8.
  The following characters are used explicitly; please avoid Unicode variants of them: Space " " (Dec/Hex: 32/20), Exclamation "!" (Dec/Hex: 33/21), Forward Slash "/" (Dec/Hex: 47/2F), Period/Dot "." (Dec/Hex: 46/2E), and Asterisk "*" (Dec/Hex: 42/2A). When in doubt, use the ASCII characters between Dec/Hex 32/20 and 126/7E.
- Entries should not have trailing whitespace.
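The line-level rules above can be sketched as a small parser. This is an illustration under assumptions, not the reference implementation: `Rule` and `parse_rule` are hypothetical names, and a line that starts with whitespace is treated as containing no rule because nothing precedes the first whitespace.

```ruby
# Hypothetical container for one parsed rule.
Rule = Struct.new(:labels, :wildcard, :exception)

# Parses one line of the list; returns nil for blank lines and comments.
def parse_rule(line)
  # Only the text up to the first whitespace is read.
  text = line.split(/\s/, 2).first.to_s
  return nil if text.empty? || text.start_with?("//")

  # A leading "!" marks an exception rule and is not part of the suffix.
  exception = text.start_with?("!")
  text = text[1..-1] if exception

  labels = text.split(".")
  Rule.new(labels, labels.first == "*", exception)
end

lines = [
  "// a comment line",
  "",
  "*.foo",
  "!specificsite.foo",
  "bar.foo anything after the first whitespace is ignored"
]

rules = lines.map { |line| parse_rule(line) }.compact
rules.each { |rule| p rule.to_a }
```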
Examples of valid entries using the exception (!) and wildcard (*) characters:
Entry | Valid/Invalid | Why |
---|---|---|
*.foo | Valid | Correct Use of Wildcard |
!specificsite.foo | Valid | Correct use of Exclamation to indicate exception to previous wildcard rule |
*.bar.foo | Valid | Correct Use of Wildcard |
*.예 | Valid | Correct Use of Wildcard |
*.예.예 | Valid | Correct Use of Wildcard |
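How the precedence rules interact (exceptions first, otherwise the rule with the most labels) can be sketched with the entries from the table. The function names `rule_matches?` and `prevailing_rule` are hypothetical, and the sketch assumes hostnames are already in the same case and Unicode form as the rules; it only reports which rule applies and does not compute the resulting suffix.

```ruby
# True when the rule's labels match the rightmost labels of the hostname.
# A "*" label matches any single hostname label.
def rule_matches?(rule, hostname)
  rule_labels = rule.sub(/\A!/, "").split(".").reverse
  host_labels = hostname.split(".").reverse
  return false if rule_labels.size > host_labels.size

  rule_labels.each_with_index.all? do |label, i|
    label == "*" || label == host_labels[i]
  end
end

# Picks the rule that applies to a hostname, or nil when none matches.
def prevailing_rule(rules, hostname)
  matching = rules.select { |rule| rule_matches?(rule, hostname) }

  # An exception rule takes priority over any other matching rule.
  exception = matching.find { |rule| rule.start_with?("!") }
  return exception if exception

  # Otherwise the rule with the most labels wins.
  matching.max_by { |rule| rule.split(".").size }
end

rules = ["*.foo", "!specificsite.foo", "*.bar.foo"]

p prevailing_rule(rules, "www.example.foo")       # "*.foo"
p prevailing_rule(rules, "www.baz.bar.foo")       # "*.bar.foo"
p prevailing_rule(rules, "www.specificsite.foo")  # "!specificsite.foo"
p prevailing_rule(rules, "example.test")          # nil
```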
Examples of invalid entries misusing the exception (!) or wildcard (*) characters:
Entry | Valid/Invalid | Why |
---|---|---|
*.*.bar.foo | Invalid | Multiple wildcards |
bar.*.foo | Invalid | Nested wildcard (must be in the leftmost position only) |
*bar.foo | Invalid | Wildcard not delimited by a dot |
예.*.foo | Invalid | Nested wildcard |
ǃspecificsite.예.예 | Invalid | Uses the Latin letter retroflex click (U+01C3) instead of the exclamation mark (U+0021, decimal 33) |
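The checks illustrated by both tables could be automated along these lines. This is a rough sketch: `entry_problems` is a hypothetical helper, and `LOOKALIKES` is a small, non-exhaustive set of look-alike characters chosen for illustration, so passing these checks does not guarantee that an entry is acceptable.

```ruby
# A few Unicode look-alikes of "!", "*", "." and "/" (illustrative, not exhaustive).
LOOKALIKES = ["\u01C3", "\uFF01", "\uFF0A", "\uFF0E", "\uFF0F"].freeze

# Returns a list of problems found in one entry; an empty list means none were found.
def entry_problems(entry)
  problems = []
  problems << "contains whitespace" if entry =~ /\s/
  problems << "contains a Unicode look-alike character" if LOOKALIKES.any? { |c| entry.include?(c) }

  # Drop a leading ASCII "!" before inspecting the labels.
  labels = entry.sub(/\A!/, "").split(".", -1)
  problems << "empty label (leading, trailing, or doubled dot)" if labels.any?(&:empty?)
  problems << "wildcard must be an entire label" if labels.any? { |l| l.include?("*") && l != "*" }
  problems << "wildcard must be the leftmost label only" if labels.drop(1).any? { |l| l.include?("*") }
  problems
end

valid = ["*.foo", "!specificsite.foo", "*.bar.foo", "*.예", "*.예.예"]
invalid = ["*.*.bar.foo", "bar.*.foo", "*bar.foo", "예.*.foo", "\u01C3specificsite.예.예"]

(valid + invalid).each do |entry|
  problems = entry_problems(entry)
  puts "#{entry}: #{problems.empty? ? 'no problems found' : problems.join('; ')}"
end
```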