Dictionary attack

From Citizendium
Revision as of 21:25, 26 July 2010 by imported>Sandy Harris (→‎Defenses)
Jump to navigation Jump to search
This article is a stub and thus not approved.
Main Article
Discussion
Related Articles  [?]
Bibliography  [?]
External Links  [?]
Citable Version  [?]
 
This editable Main Article is under development and subject to a disclaimer.

A dictionary attack is an attack on a password or other user authentication system where the system being attacked stores an encrypted form of the passwords. The attacker encrypts an entire dictionary, builds a large table of encrypted candidate passwords, then compares the actual encrypted passwords to that. If anyone's password is in the dictionary, then he can break into that account. The technique is fairly widely used, both by actual attackers and by systems administrators who want to check if users are using weak passwords which make the system vulnerable.

The attacker's methods

The advantage to the attacker is that he does most of his work offline. He can do the whole dictionary encryption at his leisure before even approaching the target system. It is usually possible to use a single encrypted dictionary against many targets; the same encrypted dictionary is useful against all machines using the same password encryption algorithm, which often means all machines running the same operating system. Also, a single dictionary attack generally checks all passwords on a system. This makes it worthwhile for the attacker to expend considerable resources on a large dictionary, both time to compute all the encrypted forms and space to store them.

The resources needed are not huge. One common algorithm for password encryption is SHA-1 which gives a 160-bit or 20-byte hash, so with a million-word dictionary the attacker needs 20 megabytes to store the hashes. A password test cannot take more than about a second without inconveniencing users, so an enemy can certainly encrypt a million-word dictionary in under a million seconds, about 12 days, using a single machine about as fast as the target systems. In many cases the password algorithm is faster than that. It may also be faster on the attacker's machine than on the target; consider an attacker with a good PC trying to break into an embedded system with a limited CPU. In some cases, an attacker may deploy larger resources; the dictionary calculation is easily parallelized if the attacker has access to multiple machines.

It is quite common to add additional things to the dictionaries, Nearly all password-cracking dictionaries include a list of women's names; these have been shown to be very common password choices. Many dictionaries also add names of movie, TV or comic book characters, cities, and so on. This catches the user who imagines that "Gandalf" or "Isphahan" is obscure enough to be a good password. Entire foreign language dictionaries may also be added. A word in Spanish or Hindi is not a good password either. Nor are common phrases; "hello,world" is a dreadful choice. Some attackers' dictionaries even contain initials from common phrases to catch users who create passwords that way, so "ouatiagfa" from "Once upon a time in a galaxy far away" may not be a good password either.

The attacker can use variants on dictionary words as well. If the dictionary has "wombat", he might try "Wombat", "w0mbat" and "tabmow" as well, and perhaps also wombat1, wombat2, ... Generating such variants is easily automated. There is a trade-off he can make, doing more work and using more storage versus improving the chances of success.

The resource constraints are not tight. With a dictionary size of a few tens of millions of words — say half a dozen languages plus lists of names and geographic terms — all you need to try the attack is access to the encrypted passwords, a reasonable computer, and a few weeks to work on it.

Defenses

Back in the 1970s, Unix introduced the idea of adding salt to passwords and today nearly all systems do that. This is a random number that is added to the password before encryption. The attacker, not knowing the salt, is forced to try all possibilities. For a 12-bit salt, as used in the original Unix method, each word in the attacker's dictionary gives 4096 possibilities. If encrypting the unsalted dictionary would need a few hours and a few megabytes of storage, he now needs months and gigabytes. Modern systems often use much larger salt. A minor but desirable side effect of this is that if a user uses the same password on several systems (not a good idea, but fairly common) then, because the salt is different, the encrypted forms will be different on each system.

If the attacker can copy the password file from the target, then he can do all the comparisons between the dictionary and the encrypted passwords offline as well. This was possible against early Unix systems; the password file was readable by any user and it contained all the encrypted passwords. The primary defense against dictionary attacks is to prevent the enemy from getting the password list. For example, on modern Unix systems, the data is partitioned; there is still a world-readable /etc/passwd file so anyone can look up user names, etc. but the actual password data is in a "shadow" password file which only root (the system administrator) can read.

If the attacker cannot get the password file for an offline attack, then his only dictionary-based methods are to try repeated logins or, from a user account, to make repeated tries to elevate his privileges to those of an administrator. This is inherently much slower than an offline attack — trying all of a million possible passwords is far more expensive than taking a known user password and checking for a match in a million-entry table. It is also much easier for defenders to detect; with good logging or an intrusion detection system, or alert system administrators, an attack that tests many passwords is quite likely to be noticed. Also, on some systems, every time a user logs on there is a little notice about when and where he or she last logged in; this allows an alert user to detect compromise of the account. As a general rule, these direct password-guessing attacks are impractical; they take too long and the risk of detection is too high.

Various methods can be used to slow such attacks. One technique is to add a delay in login processing. Ideally, make it an exponentially increasing delay on successive login attempts — say half a second on the first attempt and doubling for each later attempt. It is more effective to apply this to all attempts from the same terminal or IP address, rather than just attempts to log in to the same account. The inconvenience to legitimate users is small, but any delay makes testing many passwords more difficult and exponential increase makes it effectively impossible.

Another defense is to set up the system to automatically disable an account after some number of failed login attempts. This prevents password-guessing attacks, but it opens the system up to a form of denial of service attack. An attacker with a list of valid user IDs, which can be easily obtained for many systems, need only set up a program that tries to log in to all of them enough times with an arbitrary improbable password. If the target is a company with a 5-day work week and he runs that program over the weekend, utter chaos will ensue on Monday morning. If the attacker has even a small botnet at his disposal, he can cause chaos rather quickly. Suppose a login attempt takes 10 seconds and an account is disabled after three bad attempts; a single attacking computer can then disable two accounts a minute or 120 an hour. A botnet with a few hundred zombies (compromised machines) could do quite a bit of damage before the system administrators could notice and block the attack. In most cases, it is probably better to just log failed login attempts and set up something that alerts administrators if there are too many.

Using challenge-response protocol blocks dictionary attacks. In such a protocol, the system sends a random challenge on each login attempt. To gain access, the other player must produce a response that depends in some fairly complex way on both the password and the challenge.