A Hash Table is an array-based data structure that offers fast insertion and searching of elements. It organizes data so that you can uniquely identify a specific object from a group of similar objects. It is limited in size because it is based on array.

For example, in schools, each student is assigned a unique roll number that can be used to retrieve information about them. So, here the students are hashed to a unique number.

If you need only the basic set operations like add, search, delete, you can implement Hash Table data structure. It uses hash function to convert the key into the index of an array element, where the associated value is to be sought.

So, key values are assigned to elements in a Hash Table using a Hash Function.

A Hash Function helps calculate the best index an item should go in. The index must be small enough for the array size so that it stays within the constraints of the array size. It canâ€™t overwrite other data in the Hash Table.

A Hash Functions basic job is to store values in an array with a limited size. It does it in a way that the array doesnâ€™t need to be searched through whenever the user decides to go and try to find that information in the Hash Table. This allows you to enter values in any order and be able to find them using a calculation instead of searching through the array. That is why it is so fast.

So, Hash Function is a very important part of hash table design and provides uniform distribution of hash values.

So, the information in a hash table has two main components- key and value.

For example, let us take the key as my name and value as my phone number,

```
Key: Abhinab
Value: Phone#
```

So, we are going to map the key to the value. At the heart of the Hash Table, we are going to have an array like the one stated above.

Now, we are going to write a Hash Function. What this Hash Function will do is, it will look at certain key and will evaluate the key. Then, it will spit out some sort of index number and tell us what location in the array will store the information.

```
Hash(Key) -> index
Hash(Abhinab) -> 3
```

Here, the Hash Function evaluated the key(my name) and stored the information in index 3. So, every time you enter the same key, it is going to spit out the same index number.

If we represent keys to their values in terms of symbols, we get something like this-

Here, you can see that Ashu key has the same index number as Abhinab key. So, there is a collision here. I have created a link here to put the Ashu key in index 3. This is called chaining. So, if we get collision where many keys get the same index number, then we can represent it by using chaining.

Collisions are practically unavoidable. Due to collisions, keys are stored also stored in the table and one can distinguish between key-value pairs having the same hash. There are various ways to resolve collision. So, there are two different strategies:-

**Closed Addressing(Open Hashing):-**Each slot of the Hash Table contains a link to another linked list, which stores key-value pairs with the same hash. When there is a collision, it searches for a key-value pair which matches the key. So hash table becomes an array of buckets (linked lists) where the hash points to the bucked where the element can be found. For this reason if hash function simply returns a constant then all the elements would end up in a linked list and worst case search performance from our expected O(1) would degrade to O(n)! Thatâ€™s why it is so important to choose a good hash function for your data.**Open Addressing(Closed Hashing):-**Each slot contains a key-value pair. When there is a collision, open addressing algorithm calculates another location to locate a free slot. There is a drastic performance decrease when you implement open addressing strategy.

For a good hashing mechanism, you should have a good hash function with the following basic requirements:-

- It should be easy to compute
- It should provide a uniform distribution across the Hash Table
- You should avoid collisions that occur when pairs of elements are mapped to the same hash value.

For example, assume that you have to store strings in the hash table by using hashing techniques {â€œabcdeâ€?, â€œbcdefaâ€?, â€œcdefabâ€?, â€œdefabcâ€?}.

Let us take a hash function that states the following:

The index for a specified string will be equal to the sum of the ASCII values of the characters modulo 599.

The ASCII values of a, b, c, d, e, and f are 97, 98, 99, 100, 101, and 102 respectively. Since all the strings contain the same characters with different permutations, the sum will 599.

As the index of all the strings is the same, we can create a list on that index and insert all the strings in that list.

Here, all the strings are sorted at the same index. It will take **O(n)** time to access a specific string. This means that this hash function is not a good hash function.

Now, let us take a different hash function that states:

The index for a specific string will be equal to sum of ASCII values of characters multiplied by their respective order in the string after which it is modulo with 2069.

```
String Hash Function Index
abcdef (971+982+1004+1015+1026)%2069 38
bcdefa (981+992+1003+1014+1025+976)%2069 23
cdefab (991+1002+1013+1024+975+986)%2069 14
defabc (1001+1012+1023+974+985+996)%2069 11
```

Here, all the strings are stored at different indices.

- Lookup take O(1) time on an average.
- It has flexible keys. Most data types can be used for keys a slong as they are hashable.

- Lookups take O(n) time in the worst case.
- The keys are unordered. If you are looking for the largest key, you will look through every key to find it.
- Many hash table implementations use linked lists, which are not placed next to each other in memory.

Under certain assumptions, the average time required to search for an element in a hash table is **O(1)**.

Let us take an example of a string S. We need to count the frequency of all the characters in the string.

`String S = â€œababcdâ€?`

We can achieve this by iterating over all the possible characters and counting their frequency one by one. The time complexity of this approach is O(26*N) where N is the size of the string.

```
for(char c = â€˜aâ€™;c <= â€˜zâ€™;++c)
{
int frequency = 0;
for(int i = 0;i < S.length();++i)
if(S[i] == c)
frequency++;
System.out.println(c + â€˜ â€™+ endl);
}
```

```
Output:
a 2
b 2
c 1
d 1
e 0
f 0
...
z 0
```

Let us apply hashing to this problem. Take an array frequency of size 26 and hash the 26 characters with indices of the array using the hash function. Then, iterate over the string and increase the value in the frequency at the corresponding index for each other. The **complexity of the approach is O(N)** where N is the size of the string.

```
int Frequency[26];
int hashFunc(char c)
{
return (c - â€˜aâ€™);
}
public class HashTable{
public void countFre(string S)
{
for(int i = 0;i < S.length();++i)
{
int index = hashFunc(S[i]);
Frequency[index]++;
}
for(int i = 0;i < 26;++i)
System.out.println((char)(i+â€™aâ€™) + â€˜â€™ + Frequency[i] + endl);
}
}
```

```
Output:
a 2
b 2
c 1
d 1
e 0
f 0
...
z 0
```

**Index = hashFun(char)**

If all the keys caused hash collisions, we will be at risk to walk through all of the values for a single lookup. We will have one big linked list. This is the worst case.

As the number of keys and values in a hash map exceeds the number of indices, hash collisions are bound to take place. To avoid this, we can expand the array whenever things start to get crowded.

That requires allocating a larger array and rehashing all of the existing keys to figure out their new position- O(n) time.

Myself KarthiQ, I am the author of this blog, I know ways to write a good article but some how I donot have the skills to make it to reach people, would you like help me to reach more people By sharing this Article in the social media.

You can attend first 3 classes for free, the total

The course time would be 8.00 PM(IST) for the first three classes

If you are interested to learn, then you can join the course by sending email to

| |||||