Tuesday, 12 March 2024

Column vs row oriented Databases

One of the ways to classify databases is by how the data is stored on disk: row- or column-wise. Tables can be partitioned either horizontally (storing values belonging to the same row together), or vertically (storing values belonging to the same column together).

 

Row-oriented databases

Row-oriented database management systems store data in records or rows. Their layout is quite close to the tabular data, xls sheet representation, where every row has the same set of fields.

 

Data is organized into tables and rows

a. Data is organized into tables, similar to spreadsheets. Each table represents a specific category of information, like "Customers" or "Orders."

 

b. Each row in a table represents a single record or data entry. A row holds all the attributes (columns) associated with that particular record.

 

For instance, a row-oriented database can effectively manage employee records, containing information such as names, cities, ages, birth dates, phone numbers, and more.

| id | name           | city          | age | birth_date | phone_number    |
|----|----------------|---------------|-----|------------|-----------------|
| 1  | John Doe       | New York      | 35  | 1989-05-21 | +1 (555) 123456 |
| 2  | Alice Lee      | Los Angeles   | 28  | 1996-12-10 | +1 (555) 987654 |
| 3  | Emily Smith    | Chicago       | 40  | 1982-08-03 | +1 (555) 246810 |
| 4  | Michael Johnson| Houston       | 45  | 1977-03-15 | +1 (555) 135790 |
| 5  | Sarah Brown    | San Francisco | 32  | 1992-09-28 | +1 (555) 369258 |

How to data is stored in row oriented databases?

Data for a single row is typically stored contiguously on disk. This means all the values belonging to a particular record (from different columns) are placed as close together as possible on the storage medium.

 

When data for one row is stored contiguously on a disk, it means that all the information related to that row, like a person's name, age, or city, is kept close together. This organization helps the computer quickly find and access all the details for a single entry, like a person's record in a database.

 

Benefits of row oriented database

1. Quick Retrieval of Entire Rows: Row-oriented databases make it easy to fetch complete records because all the data for a row is stored together. This is great when you need to find specific information about something, like looking up a employee’s details using their ID.

 

2. Faster Updation of Records: When you need to change something in a record, like updating an employee's name, city and age, row-oriented databases make it straightforward. You just need to make changes to the entire row where the data is stored, which is simple because everything is already together.

 

3. Smooth Transaction Processing: Row-oriented databases are also good for handling transactions, like when you're buying something online and your purchase needs to be recorded. They handle these kinds of tasks efficiently, making sure your transactions are processed quickly and accurately.

 

Drawbacks of Row-Oriented Storage

1. Slow when Aggregating Columns: If you're trying to do something like finding the average age of all the employees, row-oriented databases might not be the best choice. They have to look through every row to find the information you need, which can take longer because the data is spread out.

 

2. Compression Isn't as Good: Row-oriented databases have trouble compressing data efficiently because they store different types of information, like numbers and text (name, age, city, id etc.,), all mixed together. This can make them take up more space compared to databases that store similar types of data together.

 

3. Might Not Handle Big Data Analysis Well: If you're working with really big sets of data and need to do complicated analysis involving lots of different columns, row-oriented databases might not be up to the task. They will struggle with these kinds of complex jobs, leading to slower performance compared to databases designed specifically for this type of work.

 

In summary, Row-oriented databases are proficient at handling complete records, while column-oriented databases may deliver superior performance for tasks requiring extensive aggregation or analysis across specific columns. The optimal choice between these database types hinges on your unique data access patterns and query needs.

 

 

Column Oriented databases

Column-oriented database management systems organize data vertically, meaning by column rather than by row. In this arrangement, values within the same column are stored sequentially on disk, contrasting with the contiguous storage of rows seen in row-oriented databases.

 

Similar to row-oriented databases, data in column-oriented systems is structured into tables, yet the internal storage methodology differs significantly. Instead of each record having its own row, the data for each column is grouped and stored together.

 

For instance, take an employee table with columns for id, name, age, city, birth date, and phone number. In a column-oriented database, all the ids would be stored together, followed by all the names, and so forth. These groupings of individual column data are often referred to as column chunks or segments.

 

For example, let’s take same employee table.

| id | name           | city          | age | birth_date | phone_number    |
|----|----------------|---------------|-----|------------|-----------------|
| 1  | John Doe       | New York      | 35  | 1989-05-21 | +1 (555) 123456 |
| 2  | Alice Lee      | Los Angeles   | 28  | 1996-12-10 | +1 (555) 987654 |
| 3  | Emily Smith    | Chicago       | 40  | 1982-08-03 | +1 (555) 246810 |
| 4  | Michael Johnson| Houston       | 45  | 1977-03-15 | +1 (555) 135790 |
| 5  | Sarah Brown    | San Francisco | 32  | 1992-09-28 | +1 (555) 369258 |

In a column-oriented database, the data from the employee table  would be stored differently compared to a row-oriented database. Each column's data would be stored together in contiguous segments. Here's how it would look:

1: John Doe; 2: Alice Lee; 3: Emily Smith; 4: Michael Johnson; 5: Sarah Brown (Name Column)

1: New York; 2: Los Angeles; 3: Chicago; 4: Houston; 5: San Francisco (City Column)

1: 35; 2: 28; 3: 40; 4: 45; 5: 32 (Age Column)

1: 1989-05-21; 2: 1996-12-10; 3: 1982-08-03; 4: 1977-03-15; 5: 1992-09-28 (Birth Date Column)

1: +1 (555) 123456; 2: +1 (555) 987654; 3: +1 (555) 246810; 4: +1 (555) 135790; 5: +1 (555) 369258 (Phone Number Column)

Each column's data is stored contiguously, allowing for efficient retrieval and processing of columnar data.

 

Let’s calculate average age of all the employees in column oriented database

Since the age data is stored contiguously in its own segment, the database system can efficiently process only the age column to calculate the average age. This process involves iterating through the age values and computing the average directly from the stored data, without needing to access any other columns or unnecessary records. This approach significantly improves the performance of such analytical queries compared to row-oriented databases, where scanning all records may be necessary.

 

Benefits of column oriented databases

1. Quick Aggregation for Columns: Column-oriented databases excel in aggregating data across specific columns because all the information for each column is stored together. Tasks like finding the total, average, or count of values become much faster since the database only needs to look through a single, continuous set of data for that particular column. This is particularly useful for applications that need to analyze large amounts of data, like data warehouses or analytics tools.

 

2. Effective Data Compression: Storing similar data types together in each column chunk makes it easier to compress the data efficiently. Column-oriented databases can use advanced compression techniques tailored to the specific data types within each column, resulting in significant reductions in storage space needed.

 

3. Better Performance for Queries: Column-oriented databases often perform better when handling complex queries and large volumes of data. The way they organize and access data makes queries run faster, improving the overall speed and responsiveness of the database system.

 

Drawbacks of column oriented databases

1. Slower Retrieval of Entire Rows: Getting all the data for a complete record (row) can take longer in column-oriented databases compared to row-oriented ones. This is because the database has to gather data fragments from different parts of the storage to put together a full row.

 

2. Complicated Updates: Changing information in a single record involves updating multiple column chunks. This makes updates more complicated compared to row-oriented databases, where you only need to modify one row.

 

3. Less Effective for OLTP Work: Column-oriented databases might not work as well for tasks like Online Transaction Processing (OLTP), where lots of individual transactions need to be quickly recorded and managed. The way the database is set up for analyzing data can make these kinds of transactions slower compared to databases designed specifically for them.

 

In summary, column-oriented databases excel in data warehousing and analytics, offering superior performance for aggregations and filtering by specific columns. Conversely, for tasks necessitating frequent access to complete records, row-oriented databases may prove more suitable. Ultimately, the optimal choice lies on the nature of your workload and query patterns.



 

 

                                                  System Design Questions

No comments:

Post a Comment