Jothan Frakes edited this page Feb 8, 2022 · 9 revisions

Format

Right-to-left sorting

Keep the list of suffixes (if more than one) in alphabetical order, sorted from right to left: first by the TLD, then by the first label to the left of the TLD, and so forth. For example, the following list is sorted by TLD first (com, invalid, net, org, test), then, within each TLD, by the next label to the left (example), and so on for each further label. Where a longer entry shares all the labels of a shorter one, the shorter entry appears first:

beta.example.com
alpha.beta.example.com   // alpha.beta has more labels than beta, and thus comes second.
delta.example.invalid    // invalid sorts between com and net
beta.example.net
delta.example.net        // delta sorts after beta, as example and net are the same
charlie.example.org      // org sorts after net, because sorting is by the right-most label, not the left-most
alpha.example.test

This is a simple Ruby script (called psl-sort) you can use to sort entries in a file:

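#!/usr/bin/env ruby

# Read one entry per line from standard input.
input  = STDIN.read.split("\n")
# Compare label by label from the right, so the TLD is the primary sort key.
# Comparing arrays of labels (rather than a re-joined string) keeps a label
# such as "a" ahead of "a-b", because "-" sorts before "." in ASCII.
output = input.sort_by { |x| x.split(".").reverse }

puts output.join("\n")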

Usage:

$ cat file.txt | psl-sort
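
To check whether a set of entries is already in the expected order, rather than re-sorting it, the same label-by-label comparison from the right can be applied to neighbouring entries. The following is a minimal sketch: the names `sort_key`, `sorted_right_to_left?` and `EXAMPLE` are illustrative and not part of any PSL tooling.

```ruby
# Hypothetical helpers (not part of the PSL tooling).

# Builds the right-to-left sort key for an entry:
# "alpha.beta.example.com" -> ["com", "example", "beta", "alpha"]
def sort_key(entry)
  entry.split(".").reverse
end

# True when no entry's key is smaller than the key of the entry before it.
def sorted_right_to_left?(entries)
  entries.each_cons(2).all? { |a, b| (sort_key(a) <=> sort_key(b)) <= 0 }
end

# The example list from this page, without the trailing comments.
EXAMPLE = %w[
  beta.example.com
  alpha.beta.example.com
  delta.example.invalid
  beta.example.net
  delta.example.net
  charlie.example.org
  alpha.example.test
].freeze

puts sorted_right_to_left?(EXAMPLE)          # prints "true"
puts sorted_right_to_left?(EXAMPLE.reverse)  # prints "false"
```

Ruby compares arrays element by element and treats a shorter array as smaller when it is a prefix of a longer one, which is what places beta.example.com before alpha.beta.example.com.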

Entry Specification

  • The list is a set of rules, with one rule per line.

  • Each line is only read up to the first whitespace; entire lines can also be commented using //.

  • Each line that is not entirely whitespace and does not begin with a comment contains a rule.

  • Each rule lists a public suffix, with the subdomain portions separated by dots (.) as usual. There is no leading dot.

  • The wildcard character * (asterisk, specifically U+002A ASTERISK) matches any valid sequence of characters in a hostname part. Wildcards in the PSL follow the syntax defined in RFC 1034, section 4.3.3 (pp. 24-25); they may appear only in the leftmost position and must replace an entire label.

  • If a hostname matches more than one rule in the file, the longest matching rule (the one with the most levels) will be used.

  • An exclamation mark (!) at the start of a rule marks an exception to a previous wildcard rule. An exception rule takes priority over any other matching rule.

  • The list uses Unicode, not Punycode forms, and is encoded using UTF-8.
    The following characters are used explicitly; please avoid Unicode variants of them: Space " " (Dec/Hex: 32/20), Exclamation "!" (Dec/Hex: 33/21), Forward Slash "/" (Dec/Hex: 47/2F), Period/Dot "." (Dec/Hex: 46/2E), and Asterisk "*" (Dec/Hex: 42/2A). When in doubt, use the ASCII characters between Dec/Hex 32/20 and 126/7E.

  • Entries should not have trailing whitespace.
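
The reading and matching rules above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not an official PSL parser: the helper names `parse_rules`, `rule_matches?` and `prevailing_rule` are hypothetical, the sample rules are made up, and hostnames are assumed to be lowercase and in Unicode form already, like the list itself. The sketch only selects which rule applies; it does not derive the public suffix from that rule.

```ruby
# Hypothetical helpers illustrating the entry specification.

# Returns the rules in +text+: each line is read up to the first whitespace,
# and blank lines and // comment lines are skipped.
def parse_rules(text)
  text.lines.map do |line|
    rule = line[/\A\S+/]   # text before the first whitespace, or nil
    rule unless rule.nil? || rule.start_with?("//")
  end.compact
end

# True when +rule+ (ignoring a leading "!") matches +hostname+ label by label,
# comparing from the right; "*" stands for any single whole label.
def rule_matches?(rule, hostname)
  rule_labels = rule.delete_prefix("!").split(".").reverse
  host_labels = hostname.split(".").reverse
  return false if rule_labels.length > host_labels.length

  rule_labels.zip(host_labels).all? { |r, h| r == "*" || r == h }
end

# Picks the rule that applies to +hostname+: a matching exception rule if
# there is one, otherwise the matching rule with the most labels.
# Returns nil when no rule matches.
def prevailing_rule(rules, hostname)
  matches   = rules.select { |rule| rule_matches?(rule, hostname) }
  exception = matches.find { |rule| rule.start_with?("!") }
  exception || matches.max_by { |rule| rule.count(".") }
end

# Made-up sample rules, not taken from the real list.
RULES = parse_rules(<<~LIST)
  // a comment line
  foo
  *.foo
  !specificsite.foo
  bar.foo   anything after the first whitespace is ignored
LIST

puts prevailing_rule(RULES, "www.example.foo")       # prints "*.foo"
puts prevailing_rule(RULES, "www.specificsite.foo")  # prints "!specificsite.foo"
```

For www.example.foo both foo and *.foo match, and *.foo wins because it has more labels. For www.specificsite.foo the exception rule wins even though the wildcard rule also matches.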

Examples of valid entries using the exception (!) and wildcard (*) markers:

Entry | Valid/Invalid | Why
*.foo | Valid | Correct use of wildcard
!specificsite.foo | Valid | Correct use of exclamation mark to indicate an exception to a previous wildcard rule
*.bar.foo | Valid | Correct use of wildcard
*.예 | Valid | Correct use of wildcard
*.예.예 | Valid | Correct use of wildcard

Examples of invalid entries using the exception (!) and wildcard (*) markers:

Entry | Valid/Invalid | Why
*.*.bar.foo | Invalid | Multiple wildcards
bar.*.foo | Invalid | Nested wildcard (must be in the leftmost position only)
*bar.foo | Invalid | Wildcard not delimited by a dot
예.*.foo | Invalid | Nested wildcard
ǃspecificsite.예.예 | Invalid | Uses the Unicode character LATIN LETTER RETROFLEX CLICK (hex 01C3) instead of the ASCII exclamation mark (Dec/Hex: 33/21)
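
The constraints illustrated by the two tables above can be checked mechanically. The sketch below is a hypothetical validator, not part of the PSL tooling: `valid_entry?` and `LOOKALIKES` are illustrative names, the look-alike set is deliberately small and incomplete, and the check covers only marker placement, not whether each label is a valid hostname part.

```ruby
# Illustrative, incomplete set of look-alikes for "!", "*" and ".":
# U+01C3 (retroflex click), U+FF01 (fullwidth !), U+FF0A (fullwidth *),
# U+FF0E (fullwidth full stop).
LOOKALIKES = /[\u01C3\uFF01\uFF0A\uFF0E]/

# Hypothetical check of "!" and "*" placement in a single entry.
def valid_entry?(entry)
  return false if entry.empty? || entry.match?(/\s/) || entry.match?(LOOKALIKES)

  labels = entry.delete_prefix("!").split(".", -1)     # -1 keeps empty labels
  return false if labels.any?(&:empty?)                # stray or doubled dots
  return false if labels.any? { |l| l.include?("!") }  # "!" only at the start

  first, *rest = labels
  # "*" must be the whole leftmost label and must not appear anywhere else.
  (first == "*" || !first.include?("*")) && rest.none? { |l| l.include?("*") }
end

entries = ["*.foo", "!specificsite.foo", "*.예.예",
           "*.*.bar.foo", "bar.*.foo", "*bar.foo",
           "\u01C3specificsite.예.예"]  # last entry starts with U+01C3, not "!"

entries.each do |entry|
  puts "#{entry}: #{valid_entry?(entry) ? 'Valid' : 'Invalid'}"
end
```

The first three entries are reported as Valid and the remaining four as Invalid, matching the tables. Rejecting a fixed set of look-alike characters is one possible design; restricting the marker characters to the ASCII range, as the specification suggests, would be a stricter alternative.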