Day 4: Data Structures – I Made a Hash Table

Apart from hashtags, my association with hashes is encrypting passwords in a database. That association is correct in terms of what a hash is, but there are also hash tables, hash functions, hash maps, and oh my gosh, more hash things for sure. Let’s focus on a hash function and a hash table and building…

A diagram of a hash table if people's names

Apart from hashtags, my association with hashes is encrypting passwords in a database. In a WordPress MySQL database, for example, user passwords are encrypted with the MD5 hash. I know this because I once forgot my password to a local, unconnected-to-email WordPress install and went rooting around the database to change it…let’s not focus on that now though.

A Metaphor

Hashes in the hash table sense refer to using hash codes that link data to its location in an array. In this BaseCS article, Vaidehi Joshi outlines a fantastic metaphor for hash tables as organizing books on shelves. I recommend reading the article, but to summarize her metaphor:

If you have a bunch of books, you could organize them alphabetically or by category or by publish date or by any other means. Any way you choose, there will still be some overhead where you have to sift through other, similar books to get to the one you want.

What if you could create some kind of magic tool/mapping situation where the title of the book could be translated to a shelf for the book to go? That would be SO cool.

In this metaphor, the shelves are the hash table and that magic tool/mapping situation is a hash function.

Like stacks and queues, hash tables are an abstract data structure meaning that they are made of other, literal data types and have a certain behavior. In this case, our shelves are an array. Let’s get away from this metaphor now though, and refer to the hash table as an array of buckets (a term you will see when researching hash tables) rather than shelves.

To reiterate, remember that the literal data type of a hash table is an array, but because we are using hashes to handle the sorting, it is a hash table.

Let’s clarify some terms

The bucket is where the data will be stored, usually an index in an array e.g. [_,_,_,_] would be an array of four buckets, so each “_” is a bucket.

The hash table is the entire set, or array, of buckets. These are sometimes referred to as hash maps or dictionary, I think. It’s kind of an array, but since there can be multiple entries in one bucket, it becomes more like a table. Again, I think so.

The key is a readable string that will be used to identify the data to be stored in the bucket.

The hash is a number determined from the key that will determine in which bucket the data will go.

The hash function maps the key to a bucket.

In JavaScript

In this video (and several others) names and phone numbers are used as examples, and I’m going to piggy back off of that here. Here is a basic set up of some data with a key and associated data. If we were following the book metaphor, the key would be “Harry Potter and the Prisoner of Azkaban” and the data would be the book text.

But back to phone numbers. Let’s say we have a hash table with 10 buckets for phone numbers. This is our first phone number:

const laraData = {
    key: "Lara",
    data: "126-126-1262"
};

Note: that’s not my phone number and laraData reminds me of Lara Dutta who is a Bollywood star and was Miss Universe in 2000. I’ll be sticking to hash tables for now, but you know, #lifegoals.

In order to determine which of the 10 buckets this phone number should be placed in, we can run the key, “Lara” through a simple hash function that will spit out an index for us. Here’s what that simple hash function could look like (credit to Jenn’s CodePen for getting me started on this!):

function simpleHash(str, tableSize) {
    let hash = 0; // Create a variable to hold the hash
    
    // Loop through all the characters in a string
    for( let i; i < str.length; i++) { 
        // Generate an arbitrary number from the character codes
        hash += str.charCodeAt(i);
    }

    // Use a modulus operator to return a number from the hash that is within the table size
    return hash % tableSize; 
}

const bucketCount = 10; // i.e. table size or buckets.length if you had a buckets array
const hash = simpleHash("Lara", bucketCount); // Generate a hash value from the string "Lara"

console.log(hash); // This would be 0, and we can that as the index for where to insert "Lara"

The above is just the hash function, we haven't created the hash table yet. Incredibly, I was pretty much able to figure this out on my own in this here CodePen, but here it is as a JS Bin so you can see the output. Open the console to see the output!

See the Pen WIP: Pretty Simple Hash Table by Lara Schenck (@laras126) on CodePen.

I definitely dig ES6 classes - especially since a lot of the stuff I am studying for CS this and that is in C or Java or Python, having an actual class syntax is pretty cool.

So, one thing about hash tables: we rarely have to implement them on our own. This little hash table I wrote is very, very rudimentary and probably misses a bunch of marks, but hey, I did it! In the real world, they come baked into most programming languages and definitely baked into frameworks, and if not, this is definitely a case where you would seek out a library with a bullet-proof hash table rather than handling it yourself.

Resources

Whew!

I must say, I have exceeded my own expectations in actually keeping up with these posts! Lately, I've been battling a starting-not-finishing things phenomenon - potentially a result of having so many ideas and a partner who is forcing a good work-life balance. Anyhow, it's been quite cathartic to force myself to just write and publish and not obsess over the details – details are important, but in this case, it's my website so it's my call :).

Okay, it's time to move on to algorithms from data structures, I'm sad to say. I liked learning about data structures, and algorithms are intimidating AF. I still need to cover Tries, Heaps, and Vectors/ArrayLists, so maybe, just maybe, there will be some kind of loose-ends data structures post...but don't hold your breath. Algorithms tomorrow!