The simplest definition of big data is large, complex, largely unstructured data (images posted on Facebook, email, text messages, GPS signals from mobile phones, tweets and other social media updates, etc.) that cannot be processed by traditional database tools. To give a sense of the volume of such data, Walmart collects over 2.5 petabytes (1 petabyte = 10¹⁵ bytes) of data every hour from customer transactions.
Before discussing big data analytics, it helps to define a few related terms. Starting from the basics, statistics uses numbers to quantify data. Data mining uses statistics and programming languages to find hidden patterns in data. Machine learning uses data mining to build models that predict future outcomes. Artificial intelligence uses models built by machine learning to make machines act intelligently, for example by playing a game or driving a car (e.g., IBM’s Watson supercomputer and Google’s driverless car). Big data analytics is the process of studying big data to uncover hidden patterns and correlations that support better decisions, using technologies such as NoSQL databases, Hadoop, and MapReduce. The main goal of big data analytics is to help organizations make better business decisions.
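To make the MapReduce idea more concrete, here is a minimal single-process sketch of its classic word-count example in Python. It only imitates the programming model (map, then group by key, then reduce); it is not Hadoop's API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster the framework would run many map and reduce tasks in parallel across machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every emitted value under its key, which the
    # framework would normally do between the map and reduce steps.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(records: list[str]) -> dict[str, int]:
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample "social media updates".
    posts = ["big data is big", "data streams in fast"]
    print(word_count(posts))
```

Because each map call looks at one record and each reduce call looks at one key, both steps can be split across many machines independently, which is what lets this model scale to big data volumes.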
The next question is: what is the difference between business intelligence (BI) and big data analytics? BI is a reactive, ad hoc approach to analysis that looks at the past, while big data analytics is a proactive approach that extracts relevant information and analyzes it to help businesses focus on the future.
As far back as 2001, industry analyst Doug Laney (currently with Gartner) articulated the now-mainstream definition of big data as the Three Vs: volume, velocity, and variety.
- Volume. Ever-larger amounts of data are being collected, including unstructured data streaming in from social media and growing amounts of sensor and machine-to-machine data.
- Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner.
- Variety. Data today comes in all types of formats: structured, numeric data in traditional databases and information created from line-of-business applications, as well as unstructured data such as email, text messages, and images.
Big data analytics appeals to businesses by offering savings on three essential levels of any business: time, money, and people. Reducing the time needed to process data translates into saving money and using fewer resources to present the data for better decisions. For example, one set of cost figures puts a traditional relational database at $37,000, a database appliance at $5,000, and a Hadoop cluster at only $2,000 (Paul Barth at NewVantage Partners supplied these cost figures).
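The savings implied by those three figures can be worked out directly. The short Python sketch below expresses each option as a fraction of the traditional relational database figure. The text does not state what unit the figures are quoted in, so the sketch only assumes they are directly comparable; `COSTS` and `relative_costs` are hypothetical names used for illustration.

```python
# Cost figures as quoted in the text (supplied by Paul Barth, NewVantage Partners).
# Assumption: the three figures are in the same, directly comparable unit.
COSTS = {
    "traditional relational database": 37_000,
    "database appliance": 5_000,
    "Hadoop cluster": 2_000,
}


def relative_costs(costs: dict[str, int], baseline: str) -> dict[str, float]:
    """Return each option's cost as a fraction of the baseline option's cost."""
    base = costs[baseline]
    return {name: cost / base for name, cost in costs.items()}


if __name__ == "__main__":
    baseline = "traditional relational database"
    for name, ratio in relative_costs(COSTS, baseline).items():
        savings = 1 - ratio
        print(f"{name}: {ratio:.1%} of baseline cost ({savings:.1%} savings)")
```

On these numbers the Hadoop cluster comes to about 5.4% of the relational database figure (37,000 / 2,000 = 18.5 times cheaper), and the database appliance to about 13.5%.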
Analytics 3.0 is the new wave of big data analytics, compared with Analytics 1.0, which is BI, and Analytics 2.0, which is used only by online companies (Google, Yahoo, Facebook, etc.). Analytics 3.0 is a new resolve to apply powerful data-gathering and analysis methods not just to a company’s operations but also to its offerings, embedding data smartness into the products and services customers buy.
Some of the attributes defining Analytics 3.0:
- The most important trait is that not only online firms, but virtually any type of firm in any industry, can participate in the data-driven economy.
- Multiple data types: Organizations are combining large and small volumes of data, internal and external sources, and structured and unstructured formats to yield new insights in predictive and prescriptive models.
- A new set of integration options: database appliances, Hadoop clusters, SQL-to-Hadoop environments.
- Technologies and methods are much faster: Big data technologies include a variety of hardware/software architectures, including clustered parallel servers using Hadoop/MapReduce, in-memory analytics, in-database processing, and so forth. All of these technologies are considerably faster than previous generations.
- Integrated and embedded: analytics are built into consumer-oriented products and features.
- Data science/analytics/IT teams will work together.
- Chief analytics officer is a new leadership position.
- Prescriptive analytics: There have always been three types of analytics: descriptive, which reports on the past; predictive, which uses models based on past data to predict the future; and prescriptive, which uses models to specify optimal behaviors and actions. Analytics 3.0 includes all types, but there is an increased emphasis on prescriptive analytics.
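The difference between the three types of analytics can be illustrated with a toy example. The Python sketch below uses hypothetical weekly sales data: the descriptive step summarizes past sales, the predictive step fits a least-squares linear trend and forecasts next week, and the prescriptive step picks the order quantity with the highest average profit over a few demand scenarios around the forecast (a simple newsvendor-style setup). The data, prices, scenarios, and function names are all assumptions chosen for illustration, not a method from the text.

```python
from statistics import mean


def describe(sales: list[float]) -> dict[str, float]:
    # Descriptive: summarize what already happened.
    return {"mean": mean(sales), "min": min(sales), "max": max(sales)}


def predict_next(sales: list[float]) -> float:
    # Predictive: fit y = a + b*t by ordinary least squares, then extrapolate one step.
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    ts = range(n)
    t_bar, y_bar = mean(ts), mean(sales)
    b = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, sales)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    a = y_bar - b * t_bar
    return a + b * n


def prescribe_order(forecast: float, scenarios: list[float], price: float,
                    unit_cost: float, candidates: range) -> int:
    # Prescriptive: choose the order quantity with the highest average profit
    # across demand scenarios (forecast plus each offset in `scenarios`).
    def profit(q: int, demand: float) -> float:
        return price * min(q, demand) - unit_cost * q

    return max(candidates,
               key=lambda q: mean(profit(q, forecast + s) for s in scenarios))


if __name__ == "__main__":
    weekly_sales = [100, 110, 120, 130]  # hypothetical history
    print(describe(weekly_sales))
    forecast = predict_next(weekly_sales)
    print(forecast)
    print(prescribe_order(forecast, [-20, 0, 20], price=5.0, unit_cost=3.0,
                          candidates=range(100, 201, 10)))
```

The descriptive and predictive steps only report or estimate numbers; the prescriptive step is the one that turns those estimates into a recommended action, which is the shift in emphasis the Analytics 3.0 list describes.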
In January 2014, Google announced its acquisition of Nest (smart home devices), a source of massive data from homes all over the United States, confirming the direction of Analytics 3.0 by an online company at the leading edge of Analytics 2.0.
The views expressed in this article are solely those of the author(s) and do not represent the views of Purdue Global.