Cugini213

How do domain names get into the WHOIS database, and which databases get updated for each domain?

@Cugini213

Posted in: #Database #Dns #DomainRegistration #Domains #Whois

The WHOIS query and response protocol is widely used for databases holding various Internet resources, such as IPv4 assignments, IPv6 assignments and AS numbers, maintained by the RIRs (for example ARIN in the USA or RIPE in Europe). Members are required to maintain their records in those databases (the details depend on the RIR).

In addition, a lot of automation in the ISP world is built on the objects in those RIR WHOIS databases, for example whois.ripe.net for the RIPE region or whois.apnic.net for the APNIC region.

In which WHOIS databases are domain names kept? Is it mandatory for a domain registrar to create a record in this/those WHOIS database(s)? I would guess "yes", because otherwise the other domain registrars would not know that the domain is in use. Is there an RFC which defines what information, besides the domain name, goes into this database?




2 Comments


 

@Courtney195

whois is a very poorly defined protocol (or more precisely: defined well enough at the moment it went into use, but now clearly outdated and missing key elements needed today, like tiered access and formatted output), with only RFC 3912 covering it (obsoleting RFC 954), which says mostly nothing: you send a query on one line, the server replies with a blob of text, that is all.
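
To make the "one line in, blob of text out" exchange concrete, here is a minimal Python sketch of a client. The function names `build_query` and `whois_query` are my own, chosen for illustration; the only things taken from RFC 3912 are the TCP port (43), the CRLF-terminated single-line request, and the fact that the server closes the connection when it is done. Decoding the reply as UTF-8 is an assumption, since the protocol does not specify any encoding.

```python
import socket


def build_query(name: str) -> bytes:
    """Build the request: the query on a single line, terminated by CRLF."""
    return name.strip().encode("ascii") + b"\r\n"


def whois_query(server: str, name: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one query to a whois server and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(name))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server signals the end by closing the connection
                break
            chunks.append(data)
    # Assumption: UTF-8 with replacement; the protocol defines no encoding.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call would look like `whois_query("whois.ripe.net", "193.0.0.0")`. Whatever comes back is free-form text whose layout depends entirely on the server.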

To put it in context for domain names let us go back and explain things historically.


Network Solutions aka NSI (and later Verisign) was the registry of all .COM/.NET/.ORG domain names. It was a thin registry, which means it did not store any contact data for the domain name: it basically had only the name, dates and nameservers (plus some extra data such as the sponsoring registrar, the statuses and so on). The registry had a whois server, and a specific "field" in its output gave the name of the registrar whois server to query for the other data, mainly contact info (name, postal address, phone and fax numbers, and email address of the registrant and of the administrative, billing and technical contacts of the domain name).
Since the whois protocol has no concept of autodiscovery (finding the registry whois server from the domain name, which was needed because other registries such as ccTLDs had whois servers too) or of "redirection" (hopping from the registry whois server to the registrar whois server), whois clients often have hardcoded lists of servers to connect to (see for example github.com/rfc1036/whois/blob/next/tld_serv_list or raw.githubusercontent.com/whois-server-list/whois-server-list/master/whois-server-list.xml), first for registries and sometimes for registrars. See my other answer here: unix.stackexchange.com/a/407030/211833 for more information on alternative ways, especially the last point. Beyond that, clients are programmed to read the registry whois server reply, extract the relevant field to find the appropriate registrar whois server, then connect to it, repeat the same query and get back more data. Note that this last "redirection" step is purely under the control of the client (so it can depend on which software you use), as nothing in the whois protocol standard defines it. Note also that ICANN recently mandated a change in registry whois output, in particular to the field carrying the registrar whois server name, and this broke multiple whois clients, which (until updated) could not find the registrar whois server in the registry whois output.
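
As a rough illustration of what such a client does, here is a Python sketch of the two client-side pieces: a hardcoded TLD-to-server table and the extraction of the registrar whois server from a thin registry reply. Everything in it is an assumption for illustration: the table is deliberately tiny and may be out of date, the function names are made up, the sample reply is invented, and the two field labels in the pattern ("Whois Server" and "Registrar WHOIS Server") are simply the ones I would expect to meet. Real clients carry much longer tables and more special cases.

```python
import re
from typing import Optional

# Illustrative, deliberately tiny table; real clients ship far longer lists.
TLD_SERVERS = {
    "com": "whois.verisign-grs.com",
    "net": "whois.verisign-grs.com",
}

# Assumed field labels for the referral line (older and newer output styles).
REFERRAL_RE = re.compile(
    r"^\s*(?:Registrar WHOIS Server|Whois Server):\s*(\S+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def registry_server_for(domain: str) -> Optional[str]:
    """Look up the registry whois server from the hardcoded table."""
    tld = domain.lower().rstrip(".").rsplit(".", 1)[-1]
    return TLD_SERVERS.get(tld)


def extract_referral(registry_reply: str) -> Optional[str]:
    """Return the registrar whois server named in a registry reply, if any."""
    match = REFERRAL_RE.search(registry_reply)
    return match.group(1).lower() if match else None


# Invented sample, shaped like a thin registry reply.
SAMPLE_REPLY = """\
   Domain Name: EXAMPLE.COM
   Registrar WHOIS Server: whois.registrar.example
   Registrar URL: http://www.registrar.example
"""
```

The client would then send the same query again to the server returned by `extract_referral`. If the registry renames that field, the pattern stops matching and the second hop silently disappears, which is exactly the kind of breakage described above.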
For gTLDs, registrars and registries are under a contract with ICANN (see www.icann.org/resources/pages/registries/registries-agreements-en and www.icann.org/resources/pages/registrars/registrars-en); these contracts mandate them to run a whois server, each one at its level for the domain names it sponsored (see specifically Specification 4 of registry contract and for registrars: www.icann.org/resources/pages/approved-with-specs-2013-09-17-en#whois). In ccTLDs on the contrary, at that time very few used registrars, and even then the registry had all the information so registry whois output would have displayed all data needed.
It was then decided to put .ORG through a bidding process (see www.icann.org/resources/board-material/prelim-report-2002-03-14-en#orgReassignment). It was finally delegated to PIR (see www.icann.org/resources/board-material/prelim-report-2002-10-14-en#SuccessorOperatorfororgRegistry) and soon thereafter switched from RRP to EPP for registrar-registry communications (this was part of the bid, see archive.icann.org/en/tlds/org/applications/isoc/section5.html#c27A3; it made sense at that time even though EPP was not yet completely published as an RFC). This also brought a switch to a thick model: the registry would store all data, including contacts, and hence whois output for .ORG would start to display everything, with no need to query the registrar whois server at all. But the contracts in place did not change, so registrars are still mandated to run a whois server for the domain names they manage in .ORG even though the registry already has all the data. By the way, there were also initially some plans to put .NET through a bidding process to give it to someone other than Verisign (see for example afilias.info/news/2005/01/19/afilias-bids-rights-run-net-registry or www.core-plusplus.net/faq.do), but a change of course resulted in .NET being permanently tied to .COM for the foreseeable future.
ICANN started to standardize the whois output format, first at the registry level and then at the registrar level. At the beginning, since the whois protocol itself does not define how the response is structured, each whois server had its own format, from simple key-value ones (easy to parse) to some really complicated ones (some whois servers even changed the format, for the same query, from one response to the next to deter automated scraping of all this data). That was a problem for registrars in gTLDs as they needed, especially for transfers, to know the email addresses of the contacts of the domain, and the best source of data was then the whois output (and if you followed what is written above, this meant going to the registrar whois output, as the registry output, for .COM/.NET which remains the king, did not have the relevant data). But at the same time many people were scraping whois data for various purposes, from legal and useful ones (like IP protection) to less legal and less useful ones (like spam offering webhosting services for newly registered domain names).
.COM/.NET also switched to EPP some time later (in 2005-2006, see www.circleid.com/posts/additional_domain_name_transfer_requirement/ and web.archive.org/web/20061017164241/http://www.verisign.com/Resources/Naming_Services_Resources/Registrar_Connections/page_038962.html#01000005), while still remaining a thin registry.
But a process is underway to convert the last gTLD thin registries (.COM/.NET and .JOBS) into thick ones. The process got a little delayed, but the goal is still there. See www.icann.org/resources/pages/thick-whois-transition-policy-2017-02-01-en . Note that once that is achieved there is no longer a (technical) reason for registrars to have a whois server, as the registry has all the data, but there is still no specific plan to stop registrars from running one (on the contrary, their contract mandates them to continue). Things may however change in 2018 with the introduction of the GDPR, the new European privacy regulation, which has a direct impact on services like whois that show private data of individuals.
The RDAP protocol was defined at the IETF (see RFC 7480, RFC 7481, RFC 7482, RFC 7483, RFC 7484, and RFC 8056) to replace whois in the future, as it addresses multiple shortcomings of whois: structured output (thanks to the use of JSON), possible redirections (thanks to the use of HTTP), authentication features (again thanks to HTTP), internationalisation (the whois protocol, for example, does not define anything related to encoding, which is a challenge for non-ASCII based registries and a nightmare for interoperability), the possibility of extending it and adapting it to all kinds of registries, etc. It has a bootstrap mechanism so RDAP clients can work without any hardcoded values for the RDAP server to query.
There are now discussions at ICANN to mandate all registries (and possibly registrars) to implement RDAP and to schedule a plan for sunsetting whois. The open questions are no longer technical but political: defining exactly what to display to which audience (see www.icann.org/resources/pages/rdap-operational-profile-2016-07-26-en), and this again is impacted by national regulations and laws, like the new GDPR. For now there is only a pilot for domain name registries (RDAP is already used by RIRs in production): community.icann.org/display/RP/RDAP+Pilot and a bunch of useful data and links are available at about.rdap.org/
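
To show how the RDAP bootstrap mechanism mentioned above removes the need for hardcoded servers, here is a Python sketch of the lookup as I understand it from RFC 7484: IANA publishes a JSON document whose `services` array pairs lists of TLDs (or longer suffixes) with lists of base URLs, the client picks the entry with the longest label-wise match, and the query URL is then built with the `domain/` path segment from RFC 7482. The sample data, the server names and the function names are invented for illustration; a real client would fetch the published bootstrap file instead of using an inline dict.

```python
from typing import Optional

# Invented sample with the same shape as the RFC 7484 bootstrap document.
SAMPLE_BOOTSTRAP = {
    "version": "1.0",
    "services": [
        [["example"], ["https://rdap.registry.example/"]],
        [["co.example"], ["http://rdap.co-registry.example/v1", "https://rdap.co-registry.example/v1"]],
    ],
}


def find_rdap_base(bootstrap: dict, domain: str) -> Optional[str]:
    """Return the base URL of the entry with the longest label-wise match."""
    labels = domain.lower().rstrip(".").split(".")
    best_len, best_url = 0, None
    for entries, urls in bootstrap["services"]:
        for entry in entries:
            entry_labels = entry.lower().split(".")
            n = len(entry_labels)
            if n > best_len and labels[-n:] == entry_labels:
                # Prefer an HTTPS base URL when the entry offers one.
                best_url = next((u for u in urls if u.startswith("https://")), urls[0])
                best_len = n
    return best_url


def rdap_domain_url(bootstrap: dict, domain: str) -> Optional[str]:
    """Build the RDAP lookup URL for a domain, or None if nothing matches."""
    base = find_rdap_base(bootstrap, domain)
    if base is None:
        return None
    return base.rstrip("/") + "/domain/" + domain.lower().rstrip(".")
```

Fetching the resulting URL over HTTP returns a JSON object instead of a free-form text blob, and an HTTP redirect can send the client on to another server, which covers the two things whois clients have to improvise.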

To specifically reply to your points: things would work even if registrars did not have a whois/RDAP server. The authoritative information on who sponsors which domain name does not live there; it lives at the registry level, and only there would you, purely technically, need a whois/RDAP server to see who manages which domain. The fact that registrars run one, especially today in gTLDs, is only because their contract with ICANN mandates it, and that contract now also defines precisely the format that needs to be implemented. It is however even a cause of confusion sometimes, see my other reply here: serverfault.com/a/885149/396475 for an example.

Also, please do not use the term "whois database"; it is both technically incorrect and misleading. There is a "database" maintained by the registry with various data in it about all domain names under the TLD concerned. The content of this database can be modified or queried through various protocols, for different clients and uses: registrars use EPP to provision it, the registry makes it available for querying through the whois protocol (note that in gTLDs, and sometimes in ccTLDs, you can query the registry whois server for data other than domain names, like contact data or nameserver data; this is rarely used but it exists), the registry uses it to publish the zone on its authoritative nameservers, etc.



 

@Smith883

I'll be quoting Wikipedia on this, but I believe in this case the information is reliable:


WHOIS servers operated by Regional Internet Registries (RIR) can be
queried directly to determine the Internet Service Provider
responsible for a particular resource.

The records of each of these registries are cross-referenced, so that
a query to ARIN for a record which belongs to RIPE will return a
place-holder pointing to the RIPE WHOIS server. This lets the WHOIS
user making the query know that the detailed information resides on
the RIPE server. In addition to the RIRs servers, commercial services
exist, such as the Routing Assets Database used by some large networks
(e.g., large Internet providers that acquired other ISPs in several
RIR areas).


About Server Discovery, they say this:


There is currently no standard for determining the responsible WHOIS
server for a DNS domain, though a number of methods are in common use
for top-level domains (TLDs). Some WHOIS lookups require searching the
procuring domain registrar to display domain owner details.


Source: Whois


