IT people often take DNS servers for granted. They perform an essential but simple task, and they're just supposed to work. Most of the time they do, but if they go wrong, the effects can be disastrous. An unreliable server means an unreliable Internet connection. A rogue server can let malicious sites impersonate legitimate ones. DNS monitoring can catch problems before they become major.
The Domain Name System is highly distributed. Local name servers all over the world, including ones in private networks, ultimately get their information from authoritative name servers. A query may pass through several recursive servers before reaching an authoritative one. Recursive servers function as caches, retaining each record for as long as its TTL (time to live) value allows.
If any server along the path has problems, devices may not get accurate information about a domain's IP address.
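The caching behavior can be sketched in a few lines of Python. This is a toy model, not how any particular resolver is implemented. The class name TtlCache and its methods are hypothetical, and the clock is injectable only so the expiry logic can be exercised without waiting.

```python
import time


class TtlCache:
    """Toy model of a recursive server's cache (hypothetical, for illustration)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._records = {}  # lowercased name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        # Remember the answer until its TTL runs out.
        self._records[name.lower()] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        # Return the cached address, or None if it is absent or expired.
        key = name.lower()
        entry = self._records.get(key)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._records[key]
            return None
        return address
```

The sketch also shows why a bad answer is persistent: once a wrong address is stored, every client of that cache keeps receiving it until the TTL expires.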
Catching device failure
Many networks run local DNS servers for efficiency. If all domain queries on a network go through one, it becomes a critical component. Its failure will look like a loss of Internet service to users. Keeping the outage as short as possible requires identifying its source quickly. If the server fails intermittently, or if one server in a pool misbehaves, quick detection is more difficult.
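One way to detect such failures quickly is to probe each server in the pool with a known-good query at regular intervals. The following is a minimal sketch using only the Python standard library. The function names, the default test name example.com, and the two-second timeout are assumptions for illustration, and the probe handles IPv4 server addresses over UDP only.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        question += bytes([len(encoded)]) + encoded
    # Root label terminator, then QTYPE=A (1) and QCLASS=IN (1).
    question += b"\x00" + struct.pack("!HH", 1, 1)
    return header + question


def response_code(packet):
    """Return the RCODE of a reply (0 = no error, 2 = SERVFAIL, 3 = NXDOMAIN)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than a DNS header")
    flags = struct.unpack("!H", packet[2:4])[0]
    return flags & 0x000F


def probe(server_ip, name="example.com", timeout=2.0):
    """Return True if the server answers the query without an error code."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return False  # a timeout or network error counts as a failure
    # The reply must be long enough, echo our query ID, and report no error.
    return len(reply) >= 12 and reply[:2] == query[:2] and response_code(reply) == 0
```

Running probe against every server in a pool separately, rather than through the normal resolver path, is what exposes a single misbehaving member. Recording the results over time helps with intermittent failures that a one-off check would miss.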
Configuration errors are the source of many DNS problems. If one entry in a client's list of name server IP addresses is wrong, client machines waste time trying to reach a nonexistent server, while one of the actual servers is left idle.
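A simple sanity check on the configured list catches the most obvious mistakes. The sketch below assumes the list is in the resolv.conf style used on Unix-like systems, with one "nameserver" entry per line. The function name is hypothetical.

```python
import ipaddress


def parse_nameservers(resolv_conf_text):
    """Split resolv.conf-style text into valid and invalid nameserver entries."""
    valid, invalid = [], []
    for line in resolv_conf_text.splitlines():
        # Drop anything after a '#' and split the rest into fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or fields[0] != "nameserver":
            continue
        try:
            valid.append(str(ipaddress.ip_address(fields[1])))
        except ValueError:
            invalid.append(fields[1])
    return valid, invalid
```

This only catches addresses that are malformed. A typo that produces a different but well-formed address passes the check, so it still takes a live query against each listed server to confirm that something is answering there.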
DNS spoofing and blocking
A compromised name server can give a deliberately inaccurate IP address. The spoofed address could belong to a malicious site, or it might not lead to a working server at all. A name server may also incorrectly return a name error, claiming the domain doesn't have an IP address when it does.
Many people treat the terms "DNS spoofing" and "DNS cache poisoning" as synonyms, but the first one is the broader term. Cache poisoning means getting a name server, router, or other device to store and pass on a different IP address from the one it should have received. Spoofing can also include subverting an authoritative name server or impersonating a legitimate one.
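A common way to spot inaccurate answers is to ask several resolvers the same question and compare the results. The sketch below covers only the comparison step and assumes the answers have already been collected. The function name and the resolver labels are hypothetical.

```python
def find_disagreements(answers, trusted):
    """Return resolvers whose answers share no address with the trusted one.

    answers maps a resolver label to the set of addresses it returned,
    or to None if it reported a name error.
    """
    reference = answers[trusted]
    suspects = {}
    if not reference:
        return suspects  # nothing to compare against
    for resolver, addresses in answers.items():
        if resolver == trusted:
            continue
        if addresses is None:
            suspects[resolver] = "name error, but trusted resolver has records"
        elif not (addresses & reference):
            suspects[resolver] = "no overlap with trusted answer"
    return suspects
```

A mismatch is a reason to investigate, not proof of spoofing. Content delivery networks and load balancers legitimately hand out different addresses to different resolvers, so a check like this works best on domains whose addresses are known to be stable.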
Blocking can happen for several reasons. A local name server may block the lookup of certain domains as a matter of policy. It functions as a firewall against them. Some governments mandate that public name servers block lookups of certain sites, or they directly control the most important name servers for their nation.
A network's internal name server may use spoofing or blocking as part of its security setup. It refuses to resolve domains that appear on a list of malicious ones or that deliver content contrary to company policy. This practice is often called "blackholing." Errors in the blackholing configuration could block necessary business traffic.
Government-controlled name servers sometimes spoof DNS information rather than blocking it. Spoofing is more effective, since devices that get an error message or timeout may try another name server. The "Great Firewall of China" uses this technique, thereby poisoning the caches of all devices that rely on its controlled servers. Sometimes a machine outside the controlled area will accidentally get the false IP data, and others may rely in turn on its incorrect information.
Secure connections are less vulnerable to domain spoofing, since a connection to an impersonator will fail SSL/TLS certificate validation. Users may override browser warnings, though, and then give their passwords to the bogus site before realizing their mistake.
Malware-aided spoofing
In 2011, the FBI discovered a network of rogue name servers. Trojan horse software called DNSChanger altered computers' DNS information so they used those servers. If people with infected computers tried to restore their previous name server settings, the malware changed them back. The rogue servers mostly delivered accurate domain information, but they would replace some content on browsers with pages or ads from malicious sites.
The FBI redirected the rogue servers' IP addresses to a legitimate domain name service, since otherwise the affected machines would have lost their ability to access the Internet. It finally shut down the service on July 9, 2012, cutting off about 300,000 computers from the Net.
Denial of service attacks
DDoS attacks can target DNS servers, indirectly crippling access to sites that put their name records there. They can also use DNS servers as a tool to attack other sites.
The most famous DDoS attack on name servers was the one against Dyn on October 21, 2016. Major sites such as Twitter, the New York Times, Amazon, and GitHub became almost unreachable because Dyn's servers were too overwhelmed to handle legitimate domain lookups. The situation lasted for most of a day. The attack hit multiple data centers and used several techniques, including massive numbers of requests for nonexistent subdomains of real sites.
Defenses exist against all types of denial of service attacks. Sometimes, though, the magnitude of the attack overwhelms the defense.
DNS amplification attacks make name servers the weapon rather than the target. The attacker issues a large number of DNS requests to multiple name servers and spoofs the victim's IP address as the request's originating address. The victim is flooded with responses, and they all come from respectable machines that have no connection to the attacker. Blocking those name servers might make the victim unable to perform its own domain lookups.
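The reason this works is that a DNS response can be much larger than the request that triggered it. The arithmetic can be sketched as follows. The function names are hypothetical, and the figures in the usage note are illustrative, not measurements from a real attack.

```python
def amplification_factor(request_bytes, response_bytes):
    """Ratio of response size to request size for one query."""
    if request_bytes <= 0:
        raise ValueError("request size must be positive")
    return response_bytes / request_bytes


def reflected_bandwidth(attacker_bits_per_second, factor):
    """Traffic arriving at the victim for a given attacker send rate."""
    return attacker_bits_per_second * factor
```

For example, if a 60-byte request draws a 3,000-byte response, the factor is 50, and an attacker sending 10 Mbit/s of requests would direct about 500 Mbit/s at the victim. From a monitoring standpoint, a name server whose outbound traffic is many times its inbound traffic, with responses concentrated on one destination, deserves a closer look.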
Open recursive servers are especially vulnerable to becoming weaponized. They will accept a request from anyone and pass it on to an authoritative server. Authoritative servers are less vulnerable because they handle only certain domain names and will ignore most random requests. A recursive server which is unintentionally left open to public access and has no protection against abuse is just waiting for someone with evil intentions to discover it.
DNS tunneling
DNS traffic is everywhere, and no one thinks much about seeing it. The packets don't often get analyzed. But data is just data and could mean anything. A name server could accept packets with secret encodings. The domain name in a query can be as long as 255 bytes in its encoded form (about 253 characters as written), with each dot-separated label limited to 63 bytes, and it doesn't have to be a registered domain. A record's RDATA field can hold as many as 65,535 bytes, and TXT records can carry arbitrary text.
This technique is called DNS tunneling. A command-and-control server can exchange any kind of information with devices this way, and many kinds of security software won't pay any attention. The server will function correctly with valid requests, so it will be hard to detect. Traffic analysis can catch tunneling by discovering bursts of traffic to a previously unused name server or by a large number of abnormally long requests and responses.
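The "abnormally long requests" test can be approximated with a simple heuristic on query names. The sketch below flags names that are unusually long or that contain a long label of random-looking characters, which is typical of encoded data. The function names and all three thresholds are assumptions for illustration and would need tuning against a network's normal traffic.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Bits of entropy per character in text (0.0 for an empty string)."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_tunneling(query_name, max_length=100,
                         min_label_length=20, max_label_entropy=4.0):
    """Heuristic: True if the name is very long or has a long, random-looking label."""
    name = query_name.rstrip(".")
    if len(name) > max_length:
        return True
    longest = max(name.split("."), key=len)
    return (len(longest) >= min_label_length
            and shannon_entropy(longest) > max_label_entropy)
```

A heuristic like this produces false positives, since some legitimate services also use long generated hostnames. It is best treated as one signal alongside the traffic-volume patterns described above.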
DNS tunneling has uses which are legitimate, if secretive. It can be a way to send data through a stringent firewall. It can sneak messages past government-mandated censorship. Within a business's network, though, any unauthorized passing of disguised information is a cause for concern.
Domain hunting
A certain rate of failed DNS lookups is normal. People mistype domains, and formerly active domains may die. A high rate, though, could be a sign of malicious activity. Rogue domains tend to get shut down or blocked, so malware often uses a flexible strategy to find a command-and-control server. It uses a list of possible domains or an algorithm to generate a series of names. The people behind the operation use the same process to register one domain after another.
When malware infects a device, it may start out by trying a sequence of domains till it can reach a live server. It's likely to switch among servers after that to avoid detection. It could engage in a lot of unsuccessful domain lookups while doing its dirty work.
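This behavior shows up in resolver logs as a client with an unusually high share of failed lookups. A minimal sketch of that check follows. It assumes the log has already been reduced to pairs of client address and DNS response code, where code 3 means the name does not exist. The function name and the thresholds are hypothetical.

```python
from collections import defaultdict


def clients_with_high_failure_rate(log_entries, min_queries=20,
                                   max_failure_rate=0.5):
    """Return {client: failure rate} for clients exceeding the threshold.

    log_entries is an iterable of (client_ip, rcode) pairs; rcode 3 is
    NXDOMAIN (name error).
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for client, rcode in log_entries:
        totals[client] += 1
        if rcode == 3:
            failures[client] += 1
    flagged = {}
    for client, total in totals.items():
        rate = failures[client] / total
        # Ignore clients with too few queries to judge fairly.
        if total >= min_queries and rate > max_failure_rate:
            flagged[client] = rate
    return flagged
```

The minimum query count keeps a device that made a handful of mistyped lookups from being flagged. Comparing each client against its own history, not a fixed threshold, would be a natural refinement.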
Logging and monitoring
Even a small network will generate a huge amount of DNS traffic. Discovering problems with it requires a well-designed approach. Analytical and monitoring software can go through the traffic and catch significant failure rates and suspicious patterns. Atypical domain queries and high failure rates will trigger notifications that something may be wrong.
A monitoring setup that pays attention to domain queries keeps the network safe against both name server failures and malicious activity. There's less downtime, and administrators can catch and remove malware more quickly.