Public Suffix List

For those first discovering the PSL, a brief review.

There are ~90 prior comments concentrated mostly in two prior submissions from 2016 and 2021 so far: https://news.ycombinator.com/from?site=publicsuffix.org

This is the top comment on the 2021 discussion:

> Before you begin to make use of the PSL, consider some of its problems: https://github.com/sleevi/psl-problems

There are another couple dozen comments on a few submissions of that: https://news.ycombinator.com/from?site=github.com/sleevi

HN frequently suggests that DNS should be used to solve this; sleevi replied a few years back with:

> This has been a common suggestion since before the Public Suffix List existed, as you can see from the linked issues in the text (and the references to the IETF DBOUND WG). Like most things, on first glance, it seems like it does make sense. Except it has a lot of issues, which you can see have been discussed for 15 years without resolution, even though yes, it would scale better.

a day ago altairprime

[flagged]

a day ago trympet

The Public Suffix List changes often. I once worked with a team that built a major feature on top of the PSL, but the person who built it did not consider how it would handle changes to the list. Basically, the feature analyzed domains and used PSL data to extract the "important part" of the domain, then stored that in the database as part of a primary key in a table. Whenever the PSL changed, the database had to be taken offline so that certain tables could be completely rebuilt, and the code querying the database had to be updated in lockstep with the database changes. This design made zero-downtime deployments difficult. It then took quite a while for the team to evolve the schema so that the database contents no longer depended on the PSL.

This is just one cautionary tale I have personally experienced.
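The schema lesson in that story can be sketched roughly as follows. This is only an illustration under assumptions, not what the team actually built: the `hits` table, the `make_registrable` helper, and both rule snapshots are hypothetical, and the helper ignores wildcard and exception rules. The idea is to key rows on the full hostname and derive the PSL-dependent part at query time.

```python
import sqlite3

def make_registrable(rules):
    """Build a host -> registrable-domain function for one PSL snapshot."""
    def registrable(host):
        labels = host.split(".")
        suffix_len = 1  # fall back to treating the last label as the suffix
        for i in range(len(labels)):
            # Scanning from the left, the first listed suffix is the longest.
            if ".".join(labels[i:]) in rules:
                suffix_len = len(labels) - i
                break
        return ".".join(labels[-(suffix_len + 1):])
    return registrable

conn = sqlite3.connect(":memory:")
# The primary key is the raw hostname, which never depends on the PSL.
conn.execute("CREATE TABLE hits (host TEXT PRIMARY KEY, n INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [("alice.github.io", 3), ("eve.github.io", 5), ("www.example.com", 2)],
)

def totals(rules):
    # Re-register the SQL function with whichever snapshot is current.
    conn.create_function("registrable", 1, make_registrable(rules))
    rows = conn.execute(
        "SELECT registrable(host), SUM(n) FROM hits "
        "GROUP BY registrable(host) ORDER BY registrable(host)"
    )
    return rows.fetchall()

before = totals({"com", "io"})              # snapshot without github.io
after = totals({"com", "io", "github.io"})  # snapshot with github.io
```

Under the first snapshot both github.io hosts roll up into one site; under the second they split apart, and no stored row had to be rewritten.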

a day ago kccqzy

It's also full of non-ICANN extensions, so a naive implementation will identify "github.io" as a TLD. There are lots of nuances to working with this list. Our team has a pretty robust internal (Python) library now that we hope to open source soon.
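For what it's worth, the list file itself separates the two groups with marker comments, so a parser can keep them apart. A minimal sketch, assuming a tiny inline excerpt in the PSL file format (the `SAMPLE` text and the `parse_psl` name are illustrative, not from any particular library):

```python
SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
uk
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
// GitHub, Inc.
github.io
// ===END PRIVATE DOMAINS===
"""

def parse_psl(text):
    """Return (icann_rules, private_rules) as two sets of rule strings."""
    icann, private = set(), set()
    current = None  # which section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("// ===BEGIN ICANN DOMAINS"):
            current = icann
        elif line.startswith("// ===BEGIN PRIVATE DOMAINS"):
            current = private
        elif line.startswith("// ===END"):
            current = None
        elif line and not line.startswith("//") and current is not None:
            # A rule is the text up to the first whitespace on the line.
            current.add(line.split()[0])
    return icann, private

icann, private = parse_psl(SAMPLE)
```

A caller that only wants registry-level suffixes can then match against `icann` alone, while cookie-style isolation would use both sets.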

a day ago whalesalad

The whole point of PSL is to identify "github.io" as a TLD. Anyone can create a subdomain of it. Just like anyone can create a new subdomain of "com" (a real TLD).

a day ago kccqzy

The difference is you don't register a domain under github.io, you merely loan it. Some countries, like Poland, have a bunch that are real domain suffixes:

https://www.dns.pl/en/list_of_functional_domain_names

a day ago type0

Loaning or renting (registering) amount to the same thing for the purposes of the public suffix list: because the *public* can create entries under github.io, you cannot assume that alice.github.io and eve.github.io are controlled by the same entity, so you should not share alice.github.io's data (e.g. cookies) with eve.github.io.
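That check can be sketched in a few lines. This is a simplified version of the PSL matching rules (longest match wins, `*.` wildcards, `!` exceptions) against a hand-picked rule subset; `RULES` and the function names are illustrative assumptions, and a real implementation would load the full list and handle IDNs.

```python
# Illustrative subset of rules; the real list is far larger.
RULES = {"com", "io", "github.io", "uk", "co.uk", "*.ck", "!www.ck"}

def public_suffix(host):
    labels = host.lower().rstrip(".").split(".")
    best = 1  # default rule "*": the last label alone is a public suffix
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if "!" + candidate in RULES:
            # Exception rule: the suffix is the rule minus its leftmost label.
            return ".".join(labels[i + 1:])
        rest = ".".join(labels[i + 1:])
        if candidate in RULES or (rest and "*." + rest in RULES):
            best = max(best, len(labels) - i)
    return ".".join(labels[-best:])

def registrable_domain(host):
    """Public suffix plus one label, or None if host is itself a suffix."""
    labels = host.lower().rstrip(".").split(".")
    suffix_len = len(public_suffix(host).split("."))
    if len(labels) <= suffix_len:
        return None
    return ".".join(labels[-(suffix_len + 1):])

def same_site(a, b):
    """True only if both hosts share one registrable domain."""
    ra, rb = registrable_domain(a), registrable_domain(b)
    return ra is not None and ra == rb
```

With these rules, `same_site("alice.github.io", "eve.github.io")` is false while `same_site("www.example.com", "shop.example.com")` is true, which is the cookie boundary described above.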

a day ago degamad

There is no formal ICANN TLD list. The PSL is your best shot. So it is actually wrong to assume that your situation is the sole purpose.

For instance, https://data.iana.org/TLD/tlds-alpha-by-domain.txt

Where is .co.uk? That is, for all intents and purposes, considered a TLD.

So PSL is currently doing double-duty and the distinction is very important.

12 hours ago whalesalad

This list sees a lot more updates than you'd probably think: https://github.com/publicsuffix/list/commits/main/

I was looking at this in terms of trying to keep an app up-to-date, and there was a lot more churn than I expected. If you have a security reason to be reading this, you may need to put some effort into maintaining this... at least, technically. I doubt there's an app out there "properly" keeping up with this, and the world seems to largely hold together even so.

a day ago jerf

TIL! Guess I have to do a `go get -u golang.org/x/net/publicsuffix` now.

a day ago yegle

It says this is a project of Mozilla, but it seems like something that would make sense under IANA. Is there a reason why it is not maintained by a standards organization? Maybe the definition of what is/isn't a public suffix is too fuzzy to standardize?

edit: After reading https://github.com/sleevi/psl-problems maybe the standards organizations just don't think it's a good idea

a day ago csb6

What sort of backwards system is this? Why is this not in DNS? Just drop an RFC that says how to add a trust demarcation record already. Here is how I would do it.

TXT v=ps1 ;trust boundary at this point

TXT v=ps2 exception1.my.network. ; trust boundary with exceptions at this point

And then let the big operators argue for a few years on why this is insufficient and why we need a complicated DSL (cough SPF cough) v=ps3, and what to do when both ps1 and ps2 entries exist (confused operator, ignore exceptions).
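To make the idea concrete, here is one possible reading of those records as a lookup procedure. Everything in it is an assumption: the `v=ps1`/`v=ps2` record format exists only in this comment, `FAKE_TXT` is a dict standing in for real DNS queries, and the rule that an excepted name is simply not split off by the boundary is my interpretation.

```python
# Hypothetical TXT data standing in for real DNS lookups.
FAKE_TXT = {
    "github.io": ["v=ps1"],
    "my.network": ["v=ps2 exception1.my.network."],
}

def boundary(host, txt=FAKE_TXT):
    """Return the closest ancestor of host declaring a trust boundary, or None."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(1, len(labels)):
        parent = ".".join(labels[i:])
        child = ".".join(labels[i - 1:])
        for record in txt.get(parent, []):
            fields = record.split()
            if not fields:
                continue
            if fields[0] == "v=ps1":
                # Plain boundary; if a ps2 record also exists, ps1 still
                # wins, which amounts to ignoring the exceptions.
                return parent
            if fields[0] == "v=ps2":
                exceptions = {f.rstrip(".").lower() for f in fields[1:]}
                if child not in exceptions:
                    return parent
    return None
```

Here `boundary("alice.github.io")` gives `"github.io"`, while `boundary("exception1.my.network")` gives `None` because that name is listed as an exception. The sketch says nothing about the hard parts (caching, DNSSEC, who is allowed to publish the record), which is where the years of discussion went.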

a day ago somat

I worked on a DNS resolver that detects DNS exfiltration in part by using this list to aggregate high-entropy subdomains to the first level below the public suffix. And indeed, I didn’t account for the list updating frequently, so I need to fix that.
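Roughly, that aggregation could look like the sketch below. It is not the resolver's actual code: `SUFFIXES` is a tiny stand-in for the PSL, the thresholds are made-up defaults, and the suffix matching skips wildcard and exception rules.

```python
import math
from collections import Counter, defaultdict

SUFFIXES = {"test", "uk", "co.uk"}  # illustrative stand-in for the PSL

def registrable_domain(name):
    labels = name.split(".")
    suffix_len = 1  # unknown suffixes fall back to the last label
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            suffix_len = len(labels) - i
            break
    return ".".join(labels[-(suffix_len + 1):])

def shannon_entropy(s):
    """Entropy of the character distribution of s, in bits per character."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def suspicious_domains(queries, min_entropy=3.5, min_count=3):
    """Group queries by registrable domain and flag high-entropy groups."""
    by_domain = defaultdict(list)
    for q in queries:
        name = q.lower().rstrip(".")
        reg = registrable_domain(name)
        sub = name[: len(name) - len(reg)].rstrip(".")
        if sub:  # skip queries for the registrable domain itself
            by_domain[reg].append(shannon_entropy(sub))
    return sorted(
        d for d, ents in by_domain.items()
        if len(ents) >= min_count and sum(ents) / len(ents) >= min_entropy
    )

queries = [
    "k3j9x0q7z1m4v8b2.attacker.test",
    "p5w2e8r1t6y0u3i9.attacker.test",
    "a7s4d1f8g5h2j9l6.attacker.test",
    "www.example.co.uk",
    "mail.example.co.uk",
    "www.example.co.uk",
]
flagged = suspicious_domains(queries)
```

Because the grouping key comes from the suffix set, a stale list would lump traffic for different registrants together (or split one registrant apart), which is why the update frequency matters here.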

a day ago mlhpdx

I only became passively aware of this because Let's Encrypt uses the PSL for limits on registrations for domains not in the PSL. Been meaning to set up a dyndns service for a few of my domains and need to get them on the PSL so users can manage to do HTTPS without issue.

Edit: I still think that domains hosted with major dyndns services (like freedns.afraid.org) should be treated like PSLs.

a day ago tracker1

I'm surprised most of the free dyndns domains aren't in there already. The first time I learned about the list was when Let's Encrypt was in closed beta, and they already had a warning on the site telling people not to add their own domain as a means to circumvent registration limits for ACME certs.

a day ago extraduder_ire

Story time!

I came across the PSL when a state government department contacted my consultancy and asked what the impact would be of uncommenting a line in the PSL. They were focused on the effect this would have on DMARC and SPF records of child agencies under the parent TLD, but I realised that it also meant that cookies that could previously be shared across agency boundaries would suddenly be siloed at a different level, potentially breaking web apps. (Think authentication portals using shared cookies across a bunch of things.)

But how to test this!?

I discovered that the PSL is embedded in browser executables when they’re compiled. So I came up with the approach of making two Chromium builds, one with the PSL change and one without the change. Since it has a nice blue icon I changed the modified build to have a red icon. I called these the “red pill” and “blue pill” versions.

The idea was that web devs could test their sites with the two nearly identical browsers side-by-side, so any observed difference would be a sign of a potential issue. I also used Playwright to scan over ten thousand public URLs with both and compared the traces programmatically.

Another trick I used was to spin up spot priced “HPC” instances in Azure with 120 AMD EPYC cores to run the builds.

One of the most fun projects I’ve ever worked on.

No, they never changed the PSL, it’s still incorrect.

I only found one site that had an issue, but that made them too nervous and they gave up…

a day ago jiggawatts

Why "suffix"? They are technically domains?

a day ago vzaliva

They can happen at multiple levels of the hierarchy

a day ago akerl_

That just means it is not limited to "top-level" domains. example.foo.com is a domain, as are foo.com and com.

a day ago vzaliva

This feels like you've accidentally waxed pedantic a bit. In common parlance, com is a TLD, example.com is a domain, foo.example.com is a subdomain. The suffix list is designed to capture all of that and maps to how it's used (you take the suffix list and check if anything in it is a suffix match for the name you've been given).

a day ago akerl_

I always thought:

  - com, example.com, foo.example.com are all domains
  - com is a TLD
  - subdomain is a relative term, not an absolute one:
    . example.com is a subdomain of com
    . foo.example.com is a subdomain of example.com
    . bar.foo.example.com is a subdomain of foo.example.com
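
The relative reading in that list can be written as a label-wise comparison. A minimal sketch, with `is_subdomain` being an illustrative name rather than an existing API:

```python
def is_subdomain(name, ancestor):
    """True if name sits strictly below ancestor in the DNS tree."""
    n = name.lower().rstrip(".").split(".")
    a = ancestor.lower().rstrip(".").split(".")
    # Compare whole labels, not raw string suffixes, so that
    # "badexample.com" is not mistaken for a child of "example.com".
    return len(n) > len(a) and n[-len(a):] == a
```

So `is_subdomain("example.com", "com")` and `is_subdomain("foo.example.com", "example.com")` both hold, while `is_subdomain("com", "example.com")` does not.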

17 hours ago roelschroeven

Yup, you’re correct. But in common usage, it would be weird to refer to example.com as a subdomain. Depending on the context, it would also be weird to refer to foo.example.com as a domain instead of a subdomain.

If somebody asked me what domain you’re using and you said “com”, you would technically have answered accurately but they’d be confused.

15 hours ago akerl_

OK, makes sense.