Looking Up Domain Names
Article by Steve Fryatt
In the previous howtos in this series, we have had an introduction to IP addresses and seen how they can be selected and set on a home network which includes RISC OS systems. We have seen that each machine needs to be given an IP address, and that these addresses can then be used to transfer data between any two points on the network.
This article is one of the twelve that form The WROCC Guide to Networking, which is available to purchase from the Club in A5 printed or PDF form.
Easy to remember
In the first article I mentioned that similar addresses are used across the internet: all the computers attached to the network are identified by an IP address, and – with the exception of networks protected by routers performing Network Address Translation (NAT) – each one has its own address which is completely unique.
When we visit websites, we usually type in addresses like www.bbc.co.uk or www.wrocc.org.uk – not least because they are easy to remember. Before your browser can visit the site and fetch the pages it contains, however, it needs to turn those textual names into numerical IP addresses. You could just as easily type http://18.104.22.168/ into NetSurf, and you would still end up on the BBC website (or at least on the server that powers it).
The process of converting textual addresses (or ‘domains’) into IP addresses is known as ‘name resolution’, and on RISC OS it is handled by the Resolver (although some applications use their own implementation, supplied via libraries such as UnixLib).
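As a rough illustration of name resolution from a program's point of view, the following Python sketch asks the operating system's resolver to turn a name into an IPv4 address. It uses the standard library's socket.gethostbyname; the wrapper name resolve is made up for this example, and nothing here is specific to the RISC OS Resolver module.

```python
import socket


def resolve(name: str) -> str:
    """Ask the system resolver for the IPv4 address of a host name.

    'resolve' is an illustrative wrapper, not part of any RISC OS API.
    Raises socket.gaierror if the name cannot be resolved.
    """
    return socket.gethostbyname(name)


# Example use (needs a working network connection and DNS settings):
#     print(resolve("www.wrocc.org.uk"))
```

If the string passed in is already a dotted IPv4 address, gethostbyname simply hands it back unchanged.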
Addresses on the internet
As already noted, when NetSurf is faced with a domain such as www.wrocc.org.uk, or POPstar has to access smtp.orpheusnet.co.uk, they will make use of a resolver. If the application chooses to use the central RISC OS Resolver module it can access it via a set of SWI calls; otherwise it will use one of its own. Similar options exist on other systems.
Either way, the first thing that the resolver will do is to check whether the address it has been given is one that it already knows about: that is, one that it has been asked to resolve ‘recently’. If POPstar fetches mail every ten minutes from smtp.orpheusnet.co.uk, for example, then the first time in a session it will need to look it up but on subsequent visits it may be able to remember the IP address from the last time.
The reason for that ‘may’ is that while a domain will always map to the same IP address for simple servers, this isn’t always the case for larger sites. Places like Google, for example, will have many servers located around the world, and the address that www.google.com resolves to may change on a regular basis as demand varies or sites go offline for maintenance. To allow for this, when the resolver gets details of the address belonging to a domain it will also be told how long it can remember it for.
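The idea of remembering an address only for as long as permitted can be sketched as a small cache in Python. This is a minimal illustration under stated assumptions, not how the RISC OS Resolver is implemented: the class name TtlCache, its methods and the address 192.0.2.10 (taken from a range reserved for documentation) are all invented for the example.

```python
import time


class TtlCache:
    """Remember name -> address mappings until their time limit expires."""

    def __init__(self, clock=time.monotonic):
        # 'clock' returns the current time in seconds; it can be replaced
        # with a fake clock when experimenting.
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        """Store an address along with how long it may be remembered."""
        self._entries[name.lower()] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the remembered address, or None if unknown or expired."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[key]  # too old: forget it
            return None
        return address
```

A mail client polling every ten minutes would get a hit from get() only while the stored time limit had not yet run out; after that, get() returns None and a fresh lookup is needed.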
If the resolver doesn’t know the details of the address itself, the next step is to check local lists of addresses in something called the ‘hosts file’. If a match for the domain is found here, then the associated IP address will be used. This can be a useful tool, and we’ll return to it later in the article.
A giant internet directory
If the resolver can’t find details of the address in its memory or the hosts file, the final step is to look up the domain in the Domain Name System (or DNS). This is a directory which contains details of every possible domain (the bit of an internet address following the initial ‘http://’ and up to the first ‘/’) – as you might expect, it contains a lot of information.
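The order of the three steps described so far (memory, then the hosts file, then the DNS) can be sketched as follows. This is only an outline under assumptions: the function lookup, the plain dictionaries standing in for the cache and hosts file, and the query_dns callable (assumed to return an address and a time limit in seconds) are all hypothetical.

```python
import time


def lookup(name, cache, hosts, query_dns, now=time.monotonic):
    """Resolve 'name', returning (address, where the answer came from).

    cache     -- dict of name -> (address, expiry time); updated in place
    hosts     -- dict of name -> address, standing in for the hosts file
    query_dns -- hypothetical callable: name -> (address, ttl_seconds)
    """
    key = name.lower()

    # Step 1: is it something we were asked about recently?
    cached = cache.get(key)
    if cached is not None and cached[1] > now():
        return cached[0], "cache"

    # Step 2: is it listed locally?
    if key in hosts:
        return hosts[key], "hosts"

    # Step 3: ask the DNS, and remember the answer for as long as allowed.
    address, ttl = query_dns(key)
    cache[key] = (address, now() + ttl)
    return address, "dns"
```

Passing the DNS query in as a parameter keeps the sketch independent of any particular network library.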
To make the system manageable and robust, it is divided up into a strict hierarchy of servers, each of which looks after part of the domain name. At the top of the tree is a set of servers known as the ‘root’, which deal with splitting the system up into groups based on the Top Level Domain (or TLD). Domains ending .com will be handled separately from those ending .org; those ending .uk will be handled separately again, and so on. The system is looked after by the IANA along with ICANN.
At the next level down the tree are servers that deal with the addresses belonging to each TLD. In the UK, all addresses ending .uk will be handled by a collection of DNS servers managed by Nominet – the body who also handle the registration of most .uk domain names. For small TLDs this may be all that is required, but generally there will be further levels below this: in the UK, for example, Nominet break the domains down into groups such as .co.uk, .org.uk, .me.uk and so on; they handle most of them, but some have been delegated to third parties (such as .ac.uk and .gov.uk, which are administered by JANET).
At the bottom of the tree, each domain has a DNS server that knows its details. For the club site at www.wrocc.org.uk, for example, this is handled by the DNS servers at our hosts, Purley Hosting. Every request to look up the IP address of our site will end up here, although, just like the resolver, the servers in between will remember the details after the first request for as long as they are allowed to, which eases the load on Purley Hosting’s server.
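To see how a name is worked through from the top of the tree downwards, the sketch below splits a domain into the successively longer suffixes that a lookup passes through. It is a simplification: the function name delegation_chain is invented here, and in the real DNS the handover from one set of servers to the next does not necessarily happen at every dot.

```python
def delegation_chain(domain):
    """List the suffixes of 'domain' from the TLD down to the full name."""
    labels = domain.rstrip(".").lower().split(".")
    # Start with the last label (the TLD) and add one label at a time.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(delegation_chain("www.wrocc.org.uk"))
# ['uk', 'org.uk', 'wrocc.org.uk', 'www.wrocc.org.uk']
```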
Using the DNS
To use the DNS on our computers, we need to point the resolver to one or more DNS servers. On RISC OS, details are set up in either the Resolver section of Network or the Host names section of Network – Internet Configuration in Configure, depending on what version of the OS you have installed. Three fields here allow primary, secondary and tertiary servers to be set up: the idea is that if the resolver can’t get an answer from the first, it will try the second and then as a last resort the third.
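The way the three fields are tried in turn can be sketched like this. The function resolve_with_fallback and the query callable it is given are hypothetical; the sketch assumes that query raises OSError (which includes timeouts) when a server gives no answer, and that blank fields are represented by empty strings.

```python
def resolve_with_fallback(name, servers, query):
    """Try each configured DNS server in order until one answers.

    servers -- list of server addresses: primary, secondary, tertiary
    query   -- hypothetical callable (server, name) -> address, assumed
               to raise OSError if the server cannot be reached
    """
    for server in servers:
        if not server:
            continue  # field left blank in the configuration
        try:
            return query(server, name)
        except OSError:
            continue  # no answer: move on to the next server
    raise LookupError("no DNS server could resolve " + name)
```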
Most ISPs will supply two DNS servers which act as a gateway between their subscribers’ machines and the rest of the DNS system: the IP addresses will usually be given in their sign-up information (this is one place where using textual domain names isn’t possible). Generally it isn’t necessary to have a third, tertiary server specified – if you wish to do so, then there are some free services available. Take care here – a malicious DNS server can direct you to whatever internet addresses it chooses, so make sure the ones you choose are legitimate.
If you access the internet via a broadband router, then it will usually receive details of your ISP’s DNS servers automatically as part of the connection process. In this case, it is often best to set the primary server to be the local address of your router and leave the other two fields blank: this way the router handles all DNS requests and takes care of any problems should your ISP change the addresses of its servers. Since the mainstream operating systems can also collect this information automatically, many ISPs don’t bother to notify their subscribers of such changes – leaving us RISC OS users scratching our heads when the resolver stops working.
The host names dialogue also contains fields to set the machine’s own name and local domain. If you set up names for the machines on your network as described below, then the host name should be the same as the one given here. The local domain doesn’t really matter: if you don’t know better, using ‘home’, ‘invalid’ or something similar will be OK.
Earlier I mentioned that the resolver will look for a local hosts file before going to the DNS servers that have been configured. This is a simple text file stored on the machine, and contains a list of domains and the IP addresses that they relate to. It can be used to give names to addresses on the local network, as well as to override entries in the public DNS.
On RISC OS, the hosts file lives at InetDBase:Hosts, which is usually located inside the !Internet resource within !Boot. Fortunately its exact location doesn’t usually matter, as it can be accessed from Configure.
If you have RISC OS Six or a recent version of RISC OS 4, then opening Configure and going to Network – Hosts will access the hosts database. This will normally contain a single entry for loopback, which should be left alone; new entries can be added by selecting New host... from the menu and entering an IP address and name. More than one name can be given to the same machine using Alias: the entry for 127.0.0.1 has two additional names by default.
On other versions of RISC OS, go to the Network section of Configure, then to Internet – Host names. On RISC OS 5 there is a text file icon at the bottom with the name Hosts file, which can be double-clicked to open Hosts into a text editor; the same dialogue on RISC OS 4.02 has a Hosts file... button which does the same thing.
If you have to edit the file manually, the format is fairly simple. It consists of a series of entries, one per line; lines starting with ‘#’ are comments and get ignored. Each entry consists of an IP address followed by some spaces or tabs, and a domain name. For example, if the following line were in a hosts file:
    192.168.0.4    latrigg
then typing http://latrigg would have the same effect as http://192.168.0.4 to a browser.
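The file format described above is simple enough to read with a few lines of Python. The sketch below is illustrative only: the function name parse_hosts and the sample text are made up, and treating any further names on a line as aliases for the same address is an assumption based on the usual hosts file convention.

```python
def parse_hosts(text):
    """Turn the text of a hosts file into a dict of name -> IP address."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()  # split on any run of spaces or tabs
        if len(fields) < 2:
            continue  # an address with no name: nothing to record
        address, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry found if a name is listed twice.
            table.setdefault(name.lower(), address)
    return table


SAMPLE = """\
# Local machines
127.0.0.1    localhost loopback
192.168.0.4  latrigg
"""

print(parse_hosts(SAMPLE)["latrigg"])
# 192.168.0.4
```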
Changing the internet
Entries in the hosts file don’t have to be on the local network, however. Assuming that the details above about the BBC’s website always hold true, we could add an entry reading
    18.104.22.168    www.bbc.co.uk
and then all attempts to look up www.bbc.co.uk would be handled without having to ask the DNS for help. In most cases this would not be a good idea, as there is no guarantee that the BBC will never change the IP address of their website. However, if you wanted to block all access to the BBC’s site from the computer in question, an entry of
    127.0.0.1    www.bbc.co.uk
could be added to Hosts and then all attempts to access the site would simply try to connect to a local webserver (which would fail unless something like WebJames was installed).