Friday, 6 March 2020

Why Set do not have get(int index) method?

Set in Java works on the principle of hashing and do not maintain any order while storing. This is the reason ‘why a set do not have get method by index’.

What is Hashing?
Hashing is a principle used to store and retrieve data in constant O(1) time.

Hash Table
Hash table is a data structure used to store collection of items in a way that it is easy to find them in later. Usually you can store and search data of Hash table in constant (O(1)) time.

For example, consider following hash table of size 13 (Technically each position of hash table is called slot, so it has 13 slots). Initially hash table is empty. Let us try to insert some data into hash table based on some hash function.
What is Hash function?
Hash function is used to map an item to a slot (position) in Hash table. Hash function takes an item as input and returns a slot in range between 0 and hash table size (exclusive).

To demonstrate example, I am going to use following hash function.

h(item)=item%13

Item
Hash Value
103
12
65
0
35
9
19
6
23
10
76
11

Based on the Hash value computed, all the items are placed in respective slots. Element 103 is placed in slot 12, Element 65 is placed in slot 0 etc.,
There is a problem in above approach. Suppose you want to insert an element 26 in Hash table, 26 % 13 = 0, 26 has to be placed in slot 0, but slot 0 is already occupied with element 65. Here the concept of collision comes into picture. I will explain more about this later in the post.

Load Factor
Load factor is denoted by λ, λ = (Number Of Items)/(Hash table size). In our example λ = 6/13 = 0.46153846153846156.

How to search for an element in Hash Table?
We can use the same hash function to search for an element in Hash table. Suppose you want to search for an element 23.

a.    Apply hash function on element 23, to find the slot. 23 % 13 = 10
b.    Go to slot 10 and check whether element 23 exist or not.

What is Perfect Hash function?
Perfect Hash function maps each element to a unique slot. Ideally there is no systematic way to construct a perfect hash function. Usually depends on the load factor, hash table is resized to reduce the collisions.

Collision Resolution
Collision occurs when two items mapped to same slot. Collision resolution techniques play an important role in hashing. One way to address the problem of collisions is linear probing.

Linear Probing
In linear probing technique, we try to find next open slot in hash table sequentially. For example, hash table is already occupied with elements 103, 65, 35, 19, 23, 76.

When we try to place element 26 (slot = 26 % 13 = 0) in hash table, collision occurs. We try to find next open slot from the position 0 and place the element 26 in it. Next open position after slot 0 is 1, so we place the element 26 in slot 1.

Suppose you want to insert element 48 into hash table, Find the slot for the element 48 (slot = 48 % 13 = 9). Since slot 9 is already occupied with element 35, we need to find next open slot sequentially.  Slots 10, 11, 12, 0, 1 are occupied with elements; slot 2 is open for element 48. Place the element 48 into the slot 2. Same linear probing technique is used while searching elements in Hash table.
How you search for an element in linear probing?
Suppose you want to search for an element 35, find the slot value using hash function (slot = 35 % 13 = 9). Check the slot 9 for the element 35. Slot 9 has the element 35, return TRUE.

Suppose you want to search for an element 48, calculate the slot value (slot = 48 % 13 = 9). Check the slot 9 for the value 48, slot 9 is occupied with value 35. We cannot simply return FALSE, since there is a possibility of collisions, we must do a sequential search, starting at position 10, looking until either we find the item 48 or we find an empty slot.

Chaining
Another way of solving the Collisions is using chaining. Chaining maintains a linked list kind of data structure at each slot to resolve collisions.

For example, Elements 103, 65, 35, 19, 23, 76 are inserted into hash table of size 13 using the hash function h(item)=item%13.
Suppose you want insert an element 130 into Hash table, calculate the slot (slot = 130 % 13  = 0). Slot 0 is already occupied with element 65, so create a new node with element 130 attach this node the slot 0. If you add elements 130, 91, 71, 50, 89, 206 to the hash table, hash table change like below.

Suppose you want to search for an element 206 in the hash table, you calculate slot for the element 206 (slot = 206 % 13 = 11), you go to the slot 11 and traverse entire linked list at slot 11, if you found the value 206 in the linked list, you return TRUE, else FALSE. One problem with this approach is if linked list size grows, time complexity increases for search and insert operations. To solve this issue Java8 maintains a binary search tree kind of data structure.

I hope with this you got some idea in Hashing, load factor, collision resolution techniques, lets start discuss on HashMap internal working in Java.

HashMap works on the principle of Hashing. HashMap Use two properties, which impact the performance.

a. Capacity
b. Load Factor

Capacity
Capacity is the number of buckets(slots) in the hash table.

Load Factor
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. By default Load Factor for HashMap is 0.75.

For Example let us assume
Initial capacity is 10 (I.e, Number Of Buckets (or) slots)
Load Factor is 0.75
Each Bucket has the capacity to store 100 data items.

So when all the items reached 0.75 * 100 * 10 = 750, then the internal data structures rehashed and the Capacity (Number Of Buckets (or) slots) increases depends on implementation. The same logic applies in future also. HashMap is not synchronized, So user has to take care while performing put kind of operations.

You may like

No comments:

Post a Comment