What is Big Data?
Big data refers to amounts of data so large that they usually cannot be stored or processed on a single system.
For example, facebook.com generates 4 petabytes of data every day. It is impossible to store and process this much data with a single computer.
Big data is commonly characterized by 4 V's:
a. Volume
b. Velocity
c. Variety
d. Veracity
Volume
Volume refers to the sheer size of the data. For example, data sizes run into terabytes, petabytes, or more, which traditional systems can't handle.
Velocity
Velocity represents the speed at which new data is generated. For example, facebook.com generates 4 petabytes of data every day.
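To put that in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes only the 4-petabytes-per-day figure quoted above, using binary units:

# Rough sustained ingest rate implied by 4 petabytes per day.
PB = 1024 ** 5                       # bytes in a petabyte (binary units)
bytes_per_day = 4 * PB
seconds_per_day = 24 * 60 * 60

rate_gb_per_sec = bytes_per_day / seconds_per_day / (1024 ** 3)
print(f"~{rate_gb_per_sec:.1f} GB of new data every second")  # ~48.5 GB/s

Writing roughly 48 GB every second, around the clock, is far beyond the disk and network bandwidth of any single machine.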
Variety
Data can arrive in the different formats given below (a small read sketch follows the list).
a. Structured: RDBMS table data.
b. Semi-structured: CSV, JSON, XML data.
c. Unstructured: images, videos, application log files, etc.
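As a rough illustration, the Python sketch below reads one sample of each variety using only the standard library; the file names (users.db, events.csv, events.json, photo.jpg) and the users table are hypothetical placeholders:

import csv
import json
import sqlite3

# Structured: rows in an RDBMS table with a fixed schema.
conn = sqlite3.connect("users.db")                 # hypothetical database
users = conn.execute("SELECT * FROM users").fetchall()

# Semi-structured: CSV and JSON carry structure, but no enforced schema.
with open("events.csv", newline="") as f:          # hypothetical file
    rows = list(csv.reader(f))
with open("events.json") as f:                     # hypothetical file
    records = json.load(f)

# Unstructured: an image is opaque bytes until specialized code interprets it.
with open("photo.jpg", "rb") as f:                 # hypothetical file
    blob = f.read()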
Veracity
Veracity represents the quality of the data. Most of the time the data is messy, and we need to pre-process it before we can extract insights from it. For example, some columns in the data may have no value at all.
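Here is a minimal pre-processing sketch in plain Python, using made-up user records where the age column is sometimes missing; it shows two common choices, dropping the bad rows or filling in a default:

# Hypothetical messy records: "age" is missing in two different ways.
records = [
    {"name": "alice", "age": 34},
    {"name": "bob", "age": None},    # column present but empty
    {"name": "carol"},               # column absent entirely
]

# Option 1: drop rows with a missing value.
clean = [r for r in records if r.get("age") is not None]

# Option 2: fill in a default so downstream analysis keeps every row.
DEFAULT_AGE = -1                     # sentinel; a mean or median also works
filled = [dict(r, age=r.get("age") if r.get("age") is not None else DEFAULT_AGE)
          for r in records]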
Why can't traditional systems handle big data?
Big data is huge in size, and terabytes of new data accumulate every day. We can't store this much data on a single system. Traditional systems, which are not distributed by default, need vertical scaling (adding more CPUs and memory to the machine) to handle it. Since the volume of data generated each day is huge, at some point this vertical scaling reaches its upper limit and can no longer solve the use case.
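By contrast, distributed systems scale horizontally: they spread the data across many ordinary machines instead of growing one machine forever. A toy Python sketch of the idea, with made-up node names, routes each record to a machine by hashing its key:

import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]   # hypothetical cluster

def node_for(key: str) -> str:
    # Deterministic hash, so the same key always routes to the same node.
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

for user_id in ["u1001", "u1002", "u1003", "u1004"]:
    print(user_id, "->", node_for(user_id))

When the cluster fills up, you add more nodes rather than a bigger node.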
Big data also comes in different formats: structured, semi-structured, and unstructured. RDBMS systems, which require data to follow a fixed schema, can't store all these varieties of data.