1Password Cracking Challenge - Part 1

I had this idea for the 1Password cracking contest. The contest is a brute force hash cracking contest. There are 7 three-word passwords that need to be cracked. Finding even one is enough for a win. Simple enough.

The word list is 18,328 words long. We know from the contest hints that the passphrases contain three words, separated by spaces. That makes for $18328^3 = 6156660823552$ possible combinations.

There are a few problems to start with. First, the possible key space is pretty huge. If every passphrase is 20 bytes, the entire list would take up 123 TB of space. I don't have any drives this big (or want to provision one in AWS and pay for it). When doing some testing, it also took forever to create these huge files. I'm using hashcat and know it does have some modes to pipe in input and also modes to combine a couple of files, so I may not have to generate all of the data up front. However, in testing the combination approach it seemed to slow down hashcat too much. Also, since this is a contest, I would need to run this on many servers at once.

To avoid having to coordinate a bunch of workers and create a centralized database, I had another idea. Spin up a bunch of servers. Each server would then generate a sub-list from the main list. For example, each server could generate a random 250-word list and go over all the possibilities in that list. If all the servers are generating different 250-word lists, then we do not need to coordinate them - we just need to know when one of them has found a match.

This sounds great, but I'd like to be able to calculate how much worse this approach is. It seems like it would be worse, but I need to know if it's just a little worse, or significantly worse.

Cost of Linear Run Through the Keyspace

Calculating the cost of finding a solution by a linear search of the key space is straight forward. I'll calculate this in time time and in dollar cost since I am interested in finding a solution quickly, and cheaply.

\begin{equation*} time = \frac{ keyspace }{hash\:rate * number\:of\:servers } * confidence \end{equation*}
\begin{equation*} cost = time * number\:of\:servers * cost\:per\:server \end{equation*}

For this specific challenge, to be 90 % sure of a find, I just need to search 90 % of the key space.

I am most interested in using AWS instances to do the cracking. I ran a quick benchmark on an AWS GPU instance (p3.8xlarge) and came up with this:

Session..........: hashcat
Status...........: Running
Hash.Type........: PBKDF2-HMAC-SHA256
Hash.Target......: sha256:100000:HMDm4nxLqKHvZhX62DQ1kQ==:TWUgfWwoHXtn...lqsdc=
Time.Started.....: Thu Aug 16 04:27:54 2018 (2 mins, 1 sec)
Time.Estimated...: Thu Aug 16 04:39:20 2018 (9 mins, 25 secs)
Guess.Base.......: File (wordlist)
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#1.....:    29530 H/s (70.70ms) @ Accel:64 Loops:64 Thr:640 Vec:1
Speed.Dev.#2.....:    29523 H/s (70.69ms) @ Accel:64 Loops:64 Thr:640 Vec:1
Speed.Dev.#3.....:    29599 H/s (70.45ms) @ Accel:64 Loops:64 Thr:640 Vec:1
Speed.Dev.#4.....:    29603 H/s (70.45ms) @ Accel:64 Loops:64 Thr:640 Vec:1
Speed.Dev.#*.....:   118.3 kH/s
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 13107200/80000000 (16.38%)
Rejected.........: 0/13107200 (0.00%)
Restore.Point....: 3276800/80000000 (4.10%)
Candidates.#1....: aardvark autobahn redact -> aardvark ban lie
Candidates.#2....: aardvark ban lied -> aardvark bebop enslave
Candidates.#3....: aardvark ape caspian -> aardvark artless uppercut
Candidates.#4....: aardvark artless uppish -> aardvark autobahn red
HWMon.Dev.#1.....: Temp: 72c Util:100% Core:1530MHz Mem: 877MHz Bus:16
HWMon.Dev.#2.....: Temp: 66c Util:100% Core:1530MHz Mem: 877MHz Bus:16
HWMon.Dev.#3.....: Temp: 64c Util:100% Core:1530MHz Mem: 877MHz Bus:16
HWMon.Dev.#4.....: Temp: 75c Util:100% Core:1530MHz Mem: 877MHz Bus:16


What's important is that the GPU cores are all 100% utilized, and the total hashrate is 118,300 hashes/sec.

Plugging this into the previous equation yields:

\begin{equation*} time = \frac{ 18328^3 }{118300 * 1 } * 0.9 = 542.11 \:days \end{equation*}
\begin{equation*} cost = 542.11 * 1 * (24 * 13.464) = $175,175.26 \end{equation*} The cost of being 90% sure of finding one password, using AWS resources doing straight crawl of the keyspace, is$175k.

Cost of Creating Random Lists

Let's compare this to the cost of the other idea I had: creating a bunch of short word lists and hoping eventually we create a list that contains the three words that make up a passphrase.

First, I need to know the odds of creating a short word list out of a long one and successfully choosing all 3 words. When dealing with statistics, I've found If I can rephrase my problem into something describing dice rolls, card draws, or choosing colored balls from an urn, I can then google enough things to come across a forumla that should work for me. Imagine for this test that there is an urn containing 18328 balls, 3 of them colored green, and you draw 250 balls out. What are the odds of drawing all 3 balls? This is one experiment. Then you just repeat that experiment over and over.

Enter the Hypergeometric Distribution

\begin{equation*} P(X = k) = \frac{number\: of\: ways\: for\: success * number \:of \:ways \:for \: N - i \:failures}{ total\: number\: of\: ways \: to \: select} \end{equation*}

Or, more concisely, using the combination function from http://www.statisticshowto.com/hypergeometric-distribution-examples/

\begin{equation*} P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}} \end{equation*}

Where:

• K is the number of successes in the population
• k is the number of observed successes
• N is the population size
• n is the number of draws
\begin{equation*} P(X = 3) = \frac{\binom{3}{3} \binom{18328 - 3}{250 - 3}}{\binom{18328}{250}} \end{equation*}

and the combination function is

\begin{equation*} \binom{n}{k} = \frac{n!}{(n - k)k!} \end{equation*}

Here is the python code for calculating the combination function and hypergeometric probability.

from decimal import Decimal

# This one from stack overflow : https://gist.github.com/rougier/ebe734dcc6f4ff450abf
#
def c(n, k):
"""
A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
"""
if 0 <= k <= n:
ntok = 1
ktok = 1
for t in xrange(1, min(k, n - k) + 1):
ntok *= n
ktok *= t
n -= 1
return Decimal(ntok) / Decimal(ktok)
else:
return 0

# The rest I wrote
def geom(K, k, N, n):
return (c(K, k) * c(N - K, n - k)) / c(N, n)

p_x_3 = geom(3, 3, 18328, 250)
print "probability: %0.15f" % (p_x_3)


And running it yields:

$python odds.py probability: 0.000002507938592  So the odds of choosing 250 words, and those words including the three words of the passphrase, is 0.000002507938592 , in other words, super tiny. Now that we know the odds of finding a match, we just need to repeat the experiment enough times. The more times we repreat the experiment, the higher the odds will be. The binomial distribution will let us calculate exactly how many times we need to repeat the experiment to have a certain confidence of finding a match. The key here is that we are going to repeat an experiment a number of times, and have the same chance for success on each experiment. \begin{equation*} b(x; n, P) = \binom{n}{x} * P^x * (1 - P)^{n - x} \end{equation*} Where: • x = the number of successes • n = the number of trials (number of 250-word lists we create) • P = the probability of success on any trial Python code for this is as follows: def binomial(x, n, P): return c(n, x) * Decimal(pow(P, x)) * Decimal(pow(1 - P, n - x))  Now we just need to figure out how many trials would yield a 90% chance of success. trials = 950000 prob_nonzero = 1 - binomial(0, trials, prob) print "%d trials would yield %f prob of success" % (trials, prob_nonzero)  with output: 950000 trials would yield 0.907685 prob of success  950,000 trials, before we have a good chance of success. Let's figure out how long this is and how much it would actually cost. \begin{equation*} time = \frac{keyspace\: for\: one\: trial * number\: of\: trials }{ hashrate } \end{equation*} \begin{equation*} time = \frac{250^3 * 950000}{118300} = 1452.26 \:days \end{equation*} \begin{equation*} cost = 1452.26 * 24 * 13.464 =$469,277.49 \end{equation*}

$469k!! Compare that with the$175k from a straight run through. This is not only worse, but almost 3 times as bad as just going straight through the list. I thought about joining in the contest by cranking up some AWS resources and doing exactly this strategy. I ran a bunch of simulations but could only really calculate the average of my simulations. Doing the math here, rather than simulations, showed me this idea was much worse than I though and saved me from spending a lot of money on AWS hardware.

It turns out, fiddling with the numbers above, even creating a list of 5000 words, which is almost 1/3 of the original list size, would require 1406 days to guarantee a 90% probability of winning. This would also require a 2.5TB list, which is larger than I had wanted to create.

Summary: Just Do A Linear Run

So the best solution to this problem is going to be going through the list in a straightforward manner. To do that on one server will take too long, so the problem needs to be split up on multiple servers. Doing a little research (ok, just googling) distributed hashcat, there are a few options:

I really just need a bare bones approach with a way to run some code to generate a slice of the dictionary, weed out pasphrases I don't want, not repeat work, and know when I've found a match. I'm not sure if any of the above really do what I need, or if I should just roll my own. The work coordination, locking, and not duplicating slices would be the most difficult part of this.

I have some more thoughts on the contest, but, as it's not over yet, and I might jump in if conditions are right, I'll save it for another post.

Pricing Notes

One quick note on AWS hardware and pricing - it might be possible to reduce the cost by using spot instances, but there is no guaranteed availability and the pricing is difficult to predict. In some tests, I was able to create spot instances but in some other later tests there was no capacity when I tried to create spot instances, which is why I based my price figures on the AWS on-demand pricing.

Also, the folks making the challenge had a far lower estimate of the total solution price than I did. They must have estimated solving this challege on pre-owned hardware and not public cloud / on-demand hardware.

Now, on to build some more ops contraptions.