Tuesday, 12 December 2023

How databases, data ware houses benefitted from Bitmap indexing?

 

A bitmap index is a specialized form of database index designed to optimize the performance of certain types of queries. Instead of maintaining a list of row identifiers for each unique value in a column, a bitmap index employs bit arrays, known as bitmaps. These bitmaps effectively represent the presence or absence of values for each distinct value in the indexed column, providing an efficient way to query and retrieve data.

 

Cardinality

Cardinality is the number of unique values within a column.

 

For instance, consider a "gender" column, which might only encompass two distinct values ("male" and "female"). Here the cardinality of gender column is 2.

 

Bitmap indexes are well-suited for low-cardinality columns

Bitmap indexes are optimized for low-cardinality columns. Each distinct value in the column is represented by a single bit in the array. If a row has that value, the corresponding bit is set to 1. else, it is set to 0.

 

Conversely, bitmap indexes may not be as efficient for high-cardinality columns. As the number of distinct values in a column increases, the size of the bitmap index also increases. This can lead to increased storage requirements and slower query performance.

 

For columns with many distinct values, You can use a traditional B-tree index instead of a bitmap index.

 

Let’s try to understand better with below data set.

 

order_id

customer_id

amount

order_status

payment_type

1

23

1000

SHIPPED

CASH

2

43

550

DELIVERED

CASH

3

1

345

RETURNED

ONLINE

4

298

10000

CACNCELLED

ONLINE

5

543

45000

SHIPPED

CASH

6

987

87000

DELIVERED

ONLINE

7

65

450

DELIVERED

ONLINE

 

Here is an example to illustrate the significance of bitmap indexing in DBMS and data warehouses. Imagine an E-commerce database has a table called 'orders,' responsible for storing order details such as order_id, customer_id, amount, order_status, and payment_type.

 

As it is crucial to note that bitmap indexes yield optimal results when applied to columns with low cardinality. In the provided scenario, it becomes advantageous to create a bitmap index for columns like order_status and payment_type.

 

Cardinality for order_status is 4 (There are four distinct values here SHIPPED, DELIVERED, RETURNED, CANCELLED).

 

Cardinality for payment_type column is 2 (There are two distinct values CASH and ONLINE).

 

Bitmap index for the payment_type column

 

1

2

3

4

5

6

7

CASH

1

1

0

0

1

0

0

ONLINE

0

0

1

1

0

1

1

 

Given that the payment_type for records 1, 2, and 5 is CASH, the CASH bitmap index assigns bits at positions 1, 2, and 5 a value of 1.

 

Likewise, for records 3, 4, 6, and 7 where the payment_type is ONLINE, the ONLINE bitmap index designates bits at positions 3, 4, 6, and 7 as 1.

 

Bitmap index for the order_status column

 

1

2

3

4

5

6

7

SHIPPED

1

0

0

0

1

0

0

DELIVERED

0

1

0

0

0

1

1

RETURNED

0

0

1

0

0

0

0

CANCELLED

0

0

0

1

0

0

0

 

Let’s see how the queries are performed fastly using bitmap indexes.

 

Query 1: Select all the orders of payment_type is CASH

Select * from orders where payment_type= 'CASH'

 

In this example, we can get all the set bits location of CASH bitmap and return the records with specific ids.

 

Query 2: Select all the oders of payment_type is ONLINE and order_status is DELIVERED.

 

In this, example, we need to apply bitwise AND operator on ONLINE and DELIVERED bitmaps.

 

 

1

2

3

4

5

6

7

ONLINE

0

0

1

1

0

1

1

 

                                                      AND

 

 

1

2

3

4

5

6

7

DELIVERED

0

1

0

0

0

1

1

 

Final result is

 

 

1

2

3

4

5

6

7

ONLINE and DELIVERED

0

0

0

0

0

1

1

 

So, we can return the records 6 and 7.

 

Advantages:

Efficient Space Utilization: Bitmap indexes demonstrate notable space efficiency, especially for columns exhibiting low cardinality, where there are few distinct values.

 

Faster Query Execution: Bitmap indexes prove highly effective in speeding up queries that involve multiple conditions or range queries.

 

Even though Bitmap indexes has greater advantages, it's essential to be mindful that frequent updates to the underlying data might leads to updating multiple bitmap indexes.

 

Demo Application

Let’s implement the same using Java BitSet. java.util.BitSet is a class in the Java programming language that provides a compact way to represent sets of bits. A BitSet is dynamically resizable, meaning it can grow or shrink as needed.

 

Let’s define BitSet objects to represent bitmap indexes.

BitSet orderShippedBitset = new BitSet();
BitSet orderDeliverdBitset = new BitSet();
BitSet orderReturnedBitset = new BitSet();
BitSet orderCancelledBitset = new BitSet();
BitSet paymentTypeCashBitset = new BitSet();
BitSet paymentTypeOnineBitset = new BitSet();

 

Let’s read all the Order objects and populate BitSet objects.

for (Order order : orders) {
	int orderId = order.getOrderId();
	if (order.getOrderStatus() == OrderStatus.SHIPPED) {
		orderShippedBitset.set(orderId);
	}

	if (order.getOrderStatus() == OrderStatus.DELIVERED) {
		orderDeliverdBitset.set(orderId);
	}

	if (order.getOrderStatus() == OrderStatus.RETURNED) {
		orderReturnedBitset.set(orderId);
	}

	if (order.getOrderStatus() == OrderStatus.CACNCELLED) {
		orderCancelledBitset.set(orderId);
	}

	if (order.getPaymentType() == PaymentType.CASH) {
		paymentTypeCashBitset.set(orderId);
	}

	if (order.getPaymentType() == PaymentType.ONLINE) {
		paymentTypeOnineBitset.set(orderId);
	}

}

 

Select all the orders which are RETURNED or CANCELLED

BitSet result = new BitSet();
result = (BitSet) orderReturnedBitset.clone(); 
result.or(orderCancelledBitset);

 

‘result’ bitset has all the bits set, whose orders are either RETURNED or CANCELLED

 

Find the below working application.

 


OrderStatus.java

package com.sample.app.enums;

public enum OrderStatus {
	SHIPPED, DELIVERED, RETURNED, CACNCELLED
}

 

PaymentType.java

package com.sample.app.enums;

public enum PaymentType {
	CASH, ONLINE
}

 

Order.java

package com.sample.app.model;

import com.sample.app.enums.OrderStatus;
import com.sample.app.enums.PaymentType;

public class Order {
	private Integer orderId;
	private Integer customerId;
	private Double amount;
	private OrderStatus orderStatus;
	private PaymentType paymentType;

	public Order(Integer orderId, Integer customerId, Double amount, OrderStatus orderStatus, PaymentType paymentType) {
		this.orderId = orderId;
		this.customerId = customerId;
		this.amount = amount;
		this.orderStatus = orderStatus;
		this.paymentType = paymentType;
	}

	public Integer getOrderId() {
		return orderId;
	}

	public void setOrderId(Integer orderId) {
		this.orderId = orderId;
	}

	public Integer getCustomerId() {
		return customerId;
	}

	public void setCustomerId(Integer customerId) {
		this.customerId = customerId;
	}

	public Double getAmount() {
		return amount;
	}

	public void setAmount(Double amount) {
		this.amount = amount;
	}

	public OrderStatus getOrderStatus() {
		return orderStatus;
	}

	public void setOrderStatus(OrderStatus orderStatus) {
		this.orderStatus = orderStatus;
	}

	public PaymentType getPaymentType() {
		return paymentType;
	}

	public void setPaymentType(PaymentType paymentType) {
		this.paymentType = paymentType;
	}

	@Override
	public String toString() {
		return "Order [orderId=" + orderId + ", customerId=" + customerId + ", amount=" + amount + ", orderStatus="
				+ orderStatus + ", paymentType=" + paymentType + "]";
	}

}

 

App.java

package com.sample.app;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

import com.sample.app.enums.OrderStatus;
import com.sample.app.enums.PaymentType;
import com.sample.app.model.Order;

public class App {
	private static void printOrders(List<Order> orders, List<Integer> orderIds) {
		for (Order order : orders) {
			if (orderIds.contains(order.getOrderId())) {
				System.out.println(order);
			}
		}
	}

	private static List<Integer> getAllSetBits(BitSet bitSet) {
		List<Integer> setBits = new ArrayList<>();

		// Iterate through the set bits
		int nextSetBit = bitSet.nextSetBit(0);
		while (nextSetBit != -1) {
			setBits.add(nextSetBit);
			nextSetBit = bitSet.nextSetBit(nextSetBit + 1);
		}

		return setBits;
	}

	public static void main(String[] args) {
		Order order1 = new Order(1, 23, 1000.0, OrderStatus.SHIPPED, PaymentType.CASH);
		Order order2 = new Order(2, 43, 550.0, OrderStatus.DELIVERED, PaymentType.CASH);
		Order order3 = new Order(3, 1, 345.0, OrderStatus.RETURNED, PaymentType.ONLINE);
		Order order4 = new Order(4, 298, 10000.0, OrderStatus.CACNCELLED, PaymentType.ONLINE);
		Order order5 = new Order(5, 543, 45000.0, OrderStatus.SHIPPED, PaymentType.CASH);
		Order order6 = new Order(6, 987, 87000.0, OrderStatus.DELIVERED, PaymentType.ONLINE);
		Order order7 = new Order(7, 65, 450.0, OrderStatus.DELIVERED, PaymentType.ONLINE);

		List<Order> orders = Arrays.asList(order1, order2, order3, order4, order5, order6, order7);

		BitSet orderShippedBitset = new BitSet();
		BitSet orderDeliverdBitset = new BitSet();
		BitSet orderReturnedBitset = new BitSet();
		BitSet orderCancelledBitset = new BitSet();
		BitSet paymentTypeCashBitset = new BitSet();
		BitSet paymentTypeOnineBitset = new BitSet();

		for (Order order : orders) {
			int orderId = order.getOrderId();
			if (order.getOrderStatus() == OrderStatus.SHIPPED) {
				orderShippedBitset.set(orderId);
			}

			if (order.getOrderStatus() == OrderStatus.DELIVERED) {
				orderDeliverdBitset.set(orderId);
			}

			if (order.getOrderStatus() == OrderStatus.RETURNED) {
				orderReturnedBitset.set(orderId);
			}

			if (order.getOrderStatus() == OrderStatus.CACNCELLED) {
				orderCancelledBitset.set(orderId);
			}

			if (order.getPaymentType() == PaymentType.CASH) {
				paymentTypeCashBitset.set(orderId);
			}

			if (order.getPaymentType() == PaymentType.ONLINE) {
				paymentTypeOnineBitset.set(orderId);
			}

		}
		
		// Select all the orders of payment_type is CASH
		System.out.println("Orders whose payment type is CASH");
		printOrders(orders, getAllSetBits(paymentTypeCashBitset));
		
		// Select all the orders which are RETURNED or CANCELLED
		System.out.println("\nSelect all the orders which are RETURNED or CANCELLED");
		BitSet result = new BitSet();
        result = (BitSet) orderReturnedBitset.clone(); // Clone to preserve the original bitSet1
        result.or(orderCancelledBitset);
        printOrders(orders, getAllSetBits(result));
		
	
	}

}

 

Output

Orders whose payment type is CASH
Order [orderId=1, customerId=23, amount=1000.0, orderStatus=SHIPPED, paymentType=CASH]
Order [orderId=2, customerId=43, amount=550.0, orderStatus=DELIVERED, paymentType=CASH]
Order [orderId=5, customerId=543, amount=45000.0, orderStatus=SHIPPED, paymentType=CASH]

Select all the orders which are RETURNED or CANCELLED
Order [orderId=3, customerId=1, amount=345.0, orderStatus=RETURNED, paymentType=ONLINE]
Order [orderId=4, customerId=298, amount=10000.0, orderStatus=CACNCELLED, paymentType=ONLINE]

 

You can download this application from this link.



 

                                                             System Design Questions

No comments:

Post a Comment