Question: For the first task, you are to create hulk.py, which is a script that uses brute-force to smash a set of MD5 hashes: $ ./hulk.py
For the first task, you are to create hulk.py, which is a script that uses brute-force to smash a set of MD5 hashes:
$ ./hulk.py -h Usage: hulk.py [-a alphabet -c CORES -l LENGTH -p PREFIX -s HASHES] -a ALPHABET Alphabet to use in permutations -c CORES CPU Cores to use -l LENGTH Length of permutations -p PREFIX Prefix for all permutations -s HASHES Path of hashes file
Given an ALPHABET (default is abcdefghijklmnopqrstuvwxyz0123456789), hulk.py will compute the MD5 hash of every permutation of the ALPHABET for the specified LENGTH and check if it is in the set of HASHES. If a PREFIX is specified, then this should be inserted before each candidate permutation.
Examples
For instance, suppose we had a sample.hashes file that contained the following MD5 checksums:
0cc175b9c0f1b6a831c399e269772661 92eb5ffee6ae2fec3ad71c777531578f 4a8a08f09d37b73795649038408b5f33 900150983cd24fb0d6963f7d28e17f72
If we executed hulk.py with a LENGTH of 1, we should get the following result:
$ ./hulk.py -l 1 -s sample.hashes a b c
That is, hulk.py determined that three of the hashes correspond to the passwords a, b, and c.
If we executed hulk.py with a LENGTH of 2 and a PREFIX of a, we should get the following result:
$ ./hulk.py -l 2 -s sample.hashes -p a abc
That is, hulk.py determined that 1 of the hashes corresponds to the password abc. Note, we could have achieved the same results by executing:
$ ./hulk.py -l 3 -s sample.hashes abc
The difference is the former searches each permutation in the form aXX where X is a letter from the ALPHABET due to the PREFIX of a, while the latter searches each permutation in the form XXX.
Finally, if CORES is greater than 1 and the LENGTH is greater than 1, the script will use the concurrent.futures module to brute-force passwords in parallel with the specified number of CORES:
$ ./hulk.py -l 1 -c 2 -s sample.hashes a b c
Skeleton Code
To help you get started, the instructor has provided you with the following skeleton code:
# Download skeleton code $ curl -LO https://raw.githubusercontent.com/nd-cse-20289-sp23/cse-20289-sp23-assignments/master/homework06/hulk.py
The skeleton code contains the following:
import concurrent.futures import hashlib import os import string import sys # Constants ALPHABET = string.ascii_lowercase + string.digits # Functions def usage(exit_code=0): progname = os.path.basename(sys.argv[0]) print(f'''Usage: {progname} [-a ALPHABET -c CORES -l LENGTH -p PATH -s HASHES] -a ALPHABET Alphabet to use in permutations -c CORES CPU Cores to use -l LENGTH Length of permutations -p PREFIX Prefix for all permutations -s HASHES Path of hashes file''') sys.exit(exit_code) def md5sum(s): ''' Compute md5 digest for given string. ''' return '' def permutations(length, alphabet=ALPHABET): ''' Recursively yield all permutations of the given length using the provided alphabet. ''' yield None def flatten(sequence): ''' Flatten sequence of iterators. ''' yield None def crack(hashes, length, alphabet=ALPHABET, prefix=''): ''' Return all password permutations of specified length that are in hashes by sequentially trying all permutations. ''' return [] def whack(arguments): ''' Call the crack function with the specified list of arguments ''' return [] def smash(hashes, length, alphabet=ALPHABET, prefix='', cores=1): ''' Return all password permutations of specified length that are in hashes by concurrently subsets of permutations concurrently. ''' return [] # Main Execution def main(): arguments = sys.argv[1:] alphabet = ALPHABET cores = 1 hashes_path = 'hashes.txt' length = 1 prefix = '' if __name__ == '__main__': main() You are to complete the sections marked TODO in order to complete the hulk.py script.
Hints
For md5sum, you must compute the MD5 hash of a string by using the hashlib library. To convert a unicode string into bytes, you can call the str.encode method of the string object.
For permutations, you must use a recursive algorithm to implement a generator using the yield statement. You may not use any of the utilities in the itertools module for this function.
For the base case, you will want to consider what happens with a length of 1 and for the recursive case, how you can combine the alphabet with the results of permutations of length - 1 to yield each permutation of the specified length.
if length == 1: # Base case yield each letter in alphabet else: # Recursive case yield the combination of each letter with each permutation of length - 1
For flatten, you must use the yield statement to implement a generator that takes a sequence of iterables and converts it into a single sequence of values from the nested data.
The idea with this function is that given a nested sequence of iterators such as [[A, B, C], [D, E, F]], you yield each item from the subsequences: A, B, C, D, E, F.
For crack, you must implement the function using only list comprehensions or generator expressions (you can use multiple ones).
Pay careful attention to not use too much memory in this function, particularly when calling permutations.
For whack, you must implement the function by calling the crack function with the provided list of arguments.
This function exists because map can only pass a single argument to the function it applies. To workaround this limitation, we create the whack helper function which takes a tuple of arguments which can then unpack to call crack. That is, whack takes a tuple of four items which correspond to the parameters to crack and we simply need to call crack with these items:
# Calling whack should result in calling crack with the arguments # tuple unpacked as parameters. whack((hashes, length, alphabet, prefix)) -> crack(hashes, length, alphabet, prefix)
For smash, you must implement the function by using the map method of the ProcessPoolExecutor from the concurrent.futures module in order to take advantage of multiple CPU cores.
To allow for parallel execution, you will need to do divide up the work normally done by one process. The easiest way to accomplish this is by taking advantage of the prefix feature of crack.
For instance, if the specified LENGTH is 4, you can crack passwords of length 3 with prefixes of [a, b, ..., 9] to achieve the same result:
Passwords of length 4 Prefixes + Passwords of length 3 aaaa a + aaa -> aaaa aaab a + aab -> aaab .... .... baaa b + aaa -> baaa baab b + aab -> baab .... ....
In smash, you will need to create an arguments sequence that consists of the argument tuples to pass to whack. As explained above, each tuple consists of the arguments to crack: hashes, length, alphabet, prefix. Based on the previous description, only the length and prefix arguments need to be modified in the arguments sequence:
(original hashes, new length, original alphabet, new prefix)
For simplicity, we suggest that you use each letter in the alphabet as part of the prefix in the arguments tuple, while keeping in mind that you still need to account for the prefix passed to smash itself. Because we are now using the prefix feature of crack, you will also need to update the length argument in the tuple.
Your code should look something like this:
# Generator expression representing a sequence of arguments to pass # to `whack` (hashes, length, alphabet remain constant, but prefix # should change for each argument tuple). arguments = ((...) for ... in ...) # Create a ProcessPoolExecutor and then apply whack to the # arguments with concurrent.futures.ProcessPoolExecutor(cores) as executor: ... executor.map(...)
Because the executor.map will return a sequence of sequences, you will want to use the flatten function you wrote to convert the result into a single sequence.
You can parse the command line options as we have done in the past, or you can investigate using argparse or getopt.
You must use a set to store the collection of MD5 hashes.
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
