- What is Big Data? Understanding of Big Data
- The term Big Data applies to information that can’t be processed or analyzed using traditional processes or tools. Increasingly, organizations today face more and more Big Data challenges. They have access to a wealth of information, but they don’t know how to get value out of it because it is sitting in its rawest form or in a semi-structured or unstructured format; as a result, they don’t even know whether it’s worth keeping (or whether they are even able to keep it, for that matter).
- Big data is a collection of digital information whose size is beyond the ability of most software tools and people to capture, manage, and process.
- Big Data solutions are ideal for analyzing not only raw structured data, but also semi-structured and unstructured data from a wide variety of sources.
- Big Data solutions are ideal when all, or most, of the data needs to be analyzed: a sample of the data is not nearly as effective as the larger set from which the analysis is derived.
- Big Data solutions are ideal for iterative and exploratory analysis, when the business measures to be computed on the data are not predetermined.
- Big Data technologies describe a new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis.
- “Big data” is a big buzz phrase in the IT and business world right now, and there is a dizzying array of opinions on just what these two simple words really mean. Technology vendors in the legacy database or data warehouse spaces say “big data” simply refers to a traditional data warehousing scenario involving data volumes in either the single or multi-terabyte range. Others disagree: they say “big data” isn’t limited to traditional data warehouse situations, but includes real-time or operational data stores used as the primary data foundation for online applications that power key external or internal business systems. It used to be that these transactional/real-time databases were typically “pruned” so they could remain manageable from a data-volume standpoint. Their most recent or “hot” data stayed in the database, and older information was archived to a data warehouse via extract-transform-load (ETL) routines.
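The pruning described above is, at its core, a small ETL routine: extract rows older than a cutoff from the hot operational store, load them into the warehouse, then delete them from the source. The sketch below is a minimal illustration of that idea; the SQLite database, the `orders` and `orders_archive` tables, and the sample rows are all invented for this example, not taken from any particular system.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: a hot operational "orders" table pruned into an
# "orders_archive" warehouse table. Names and data are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER, created_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, '2005-01-15'), (2, '2099-01-01')")

def prune_to_warehouse(keep_days: int = 90) -> int:
    """Extract rows older than the cutoff, load them into the archive,
    then delete them from the operational table (a minimal ETL routine)."""
    cutoff = (datetime.now() - timedelta(days=keep_days)).strftime("%Y-%m-%d")
    cur = conn.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE created_at < ?",
        (cutoff,),
    )
    conn.execute("DELETE FROM orders WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

print(prune_to_warehouse(), "row(s) archived")  # 1 row(s) archived
```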
- But big data has changed dramatically. The evolution of the web has redefined:
• The speed at which information flows into these primary online systems.
• The number of customers a company must deal with.
• The acceptable interval between the time that data first enters a system and its transformation into information that can be analyzed to make key business decisions.
- Big Data is a term used to describe large collections of data (also known as data sets) that may be unstructured, and that grow so large and so quickly that they are difficult to manage with regular database or statistics tools.
- Characteristics of Big Data
Four characteristics define Big Data: Volume, Velocity, Variety, and Value.
1. Volume – terabytes (TB) to petabytes (PB) of data
2. Velocity – how fast the data is coming in
3. Variety – all types of data are now being captured (structured, semi-structured, unstructured)
4. Value – mining the valuable pieces of data from among the data that does not matter
- The Volume of Data
The sheer volume of data being stored today is exploding. In the year 2000, 800,000 petabytes (PB) of data were stored in the world; we expect this number to reach 35 zettabytes (ZB) by 2020. Twitter alone generates more than 7 terabytes (TB) of data every day, Facebook 10 TB, and some enterprises generate terabytes of data every hour of every day of the year. Of course, a lot of the data that’s being created today isn’t analyzed at all, and that’s another problem we’re trying to address with BigInsights. The volume of data available to organizations today is on the rise, while the percentage of data they can analyze is on the decline.
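To put those units in perspective, here is a quick back-of-the-envelope conversion (a minimal sketch; the stored-volume and Twitter figures are the ones quoted above, while the growth factor and annualized total are our own derived arithmetic):

```python
# Back-of-the-envelope unit conversions for the volume figures quoted above.
# 1 ZB = 1,000 EB = 1,000,000 PB (decimal SI units assumed).
PB_PER_ZB = 1_000_000

stored_2000_pb = 800_000        # petabytes stored worldwide in 2000
projected_2020_zb = 35          # zettabytes projected for 2020
projected_2020_pb = projected_2020_zb * PB_PER_ZB

growth_factor = projected_2020_pb / stored_2000_pb
print(f"2000 -> 2020 growth: ~{growth_factor:,.0f}x")  # ~44x

twitter_tb_per_day = 7          # Twitter figure quoted above
twitter_pb_per_year = twitter_tb_per_day * 365 / 1_000
print(f"Twitter: ~{twitter_pb_per_year:.1f} PB/year")  # ~2.6 PB/year
```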
- The Variety of Data
The volume associated with the Big Data phenomenon brings along a new challenge for data centers trying to deal with it: its variety. With the explosion of sensors and smart devices, as well as social collaboration technologies, data in an enterprise has become complex, because it includes not only traditional relational data, but also raw, semi-structured, and unstructured data from web pages, web log files (including click-stream data), search indexes, social media forums, e-mail, documents, sensor data from active and passive systems, and so on. What’s more, traditional systems can struggle to store and perform the required analytics to gain understanding from the contents of these logs, because much of the information being generated doesn’t lend itself to traditional database technologies. In our experience, although some companies are moving down the path, by and large most are just beginning to understand the opportunities of Big Data (and what’s at stake if it’s not considered).
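As one illustration of turning semi-structured data into relational rows, the sketch below parses a common web-server access-log line into named fields (a minimal sketch; the log format, field names, and sample line are assumptions for illustration, not tied to any particular product):

```python
import re

# Parse an Apache "combined"-style access-log line (semi-structured text)
# into a structured record; the sample line below is invented.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_line(line: str):
    """Return a dict of named fields, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = '203.0.113.7 - - [10/Oct/2011:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'
print(parse_line(sample))
# {'ip': '203.0.113.7', 'timestamp': '10/Oct/2011:13:55:36 -0700',
#  'method': 'GET', 'path': '/index.html', 'status': '200', 'bytes': '2326'}
```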
- The Velocity of Data
Just as the sheer volume and variety of data we collect and store have changed, so, too, has the velocity at which it is generated and needs to be handled. A conventional understanding of velocity typically considers how quickly the data is arriving and being stored, and its associated rates of retrieval. While managing all of that quickly is good, and the volumes of data we are looking at are a consequence of how quickly the data arrives, we believe the idea of velocity is actually something far more compelling than these conventional definitions.
To accommodate velocity, a new way of thinking about the problem must start at the inception point of the data. Rather than confining the idea of velocity to the growth rates associated with your data repositories, we suggest you apply this definition to data in motion: the speed at which the data is flowing. After all, we’re in agreement that today’s enterprises are dealing with petabytes of data instead of terabytes, and the increase in RFID sensors and other information streams has led to a constant flow of data at a pace that has made it impossible for traditional systems to handle.
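To make “data in motion” concrete, the sketch below processes readings the moment they arrive rather than storing the whole stream and analyzing it afterwards (a minimal sketch; the simulated sensor feed and the running-average metric are our own illustrative choices):

```python
import random

def sensor_stream(n_events: int):
    """Simulate a feed of sensor readings arriving over time."""
    for _ in range(n_events):
        yield random.gauss(20.0, 2.0)  # e.g., temperature readings

# Process each reading as it arrives: maintain a running mean
# incrementally instead of batching the stream for later analysis.
count, mean = 0, 0.0
for reading in sensor_stream(1000):
    count += 1
    mean += (reading - mean) / count  # incremental (streaming) average
    if count % 250 == 0:
        print(f"after {count} events: running mean = {mean:.2f}")
```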
- The Value of Data
The economic value of different data varies significantly. Typically there is good information hidden amongst a larger body of non-traditional data; the challenge is identifying what is valuable, and then transforming and extracting that data for analysis.
Let's start with the most widely discussed use case: sentiment analysis. Whether looking for broad economic indicators, specific market indicators, or sentiment concerning a specific company or its stock, there is obviously a trove of data to be harvested here, available from traditional as well as new media (including social media) sources. While news keyword analysis and entity extraction have been in play for a while, and are readily offered by many vendors, the availability of social media intelligence is relatively new and has certainly captured the attention of those looking to gauge public opinion. (In a previous post, I discussed the applicability of Semantic technology and Entity Extraction for this purpose, but as promised, I'm sticking to the usage topic this time.)
Sentiment analysis is considered straightforward, as the data resides outside the institution and is therefore not confined by organizational boundaries. In fact, sentiment analysis is becoming so popular that some hedge funds are basing their entire strategies on trading signals generated by Twitter analytics. While this is an extreme example, most financial institutions at this point are using some sort of sentiment analysis to gauge public opinion about their company, market, or the economy as a whole.
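The sketch below shows the simplest possible form of keyword-based sentiment scoring over a batch of posts (a minimal sketch; the word lists and sample posts are invented for illustration, and real trading-signal systems use far richer models):

```python
# Naive keyword-based sentiment scoring over a batch of social posts.
# Word lists and sample posts are invented purely for illustration.
POSITIVE = {"gain", "growth", "beat", "strong", "bullish"}
NEGATIVE = {"loss", "drop", "miss", "weak", "bearish"}

def sentiment_score(text: str) -> int:
    """+1 per positive keyword, -1 per negative keyword."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Strong quarter, earnings beat expectations",
    "Shares drop after weak guidance",
    "Analysts bullish on growth outlook",
]
for post in posts:
    print(f"{sentiment_score(post):+d}  {post}")
```

Aggregating such scores over time, by company or by market, is what turns raw posts into the kind of opinion gauge described above.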
- Predictive Analytics
- Risk Management
- Rogue Trading
- Fraud
- Retail Banking
Banks, however, have additional concerns, as their products all revolve around risk, and the ability to accurately assess the risk profile of an individual or a loan is paramount to offering (or denying) services to a customer. Though the need to protect consumer privacy will always prevail, banks now have more access to web data about their customers, undoubtedly putting more informational options at their fingertips and providing the valuable information needed to target service offerings with a greater level of sophistication and certainty. Additionally, web data can help to signal customer life events such as a marriage, childbirth, or a home purchase, which can help banks introduce opportunities for more targeted services. And again, with location information (available from almost every cell phone), banks can achieve extremely granular customer targeting.