The rise in digital devices and social media has led to an exponential increase in data production, and with the advent of the Internet of Things (IoT), even common household items like refrigerators are getting in on the act. The result is a digital universe filled with enormous amounts of data, and this trend has no end in sight. According to an IDC report, the digital universe is likely to have grown by a factor of 300 from 2005 to 2020, with the amount of data doubling roughly every two years. [1]
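As a quick sanity check on the two IDC figures above, the short Python sketch below computes the doubling time implied by 300-fold growth over 15 years. It assumes a constant growth rate across the whole period, which is a simplification made here for illustration rather than something the report states. The function name is hypothetical.

```python
import math

# Figures quoted from the IDC report cited above.
GROWTH_FACTOR = 300      # growth of the digital universe, 2005 to 2020
YEARS = 2020 - 2005      # 15 years

def implied_doubling_time(growth_factor: float, years: float) -> float:
    """Return the constant doubling time (in years) that produces
    `growth_factor` over `years`. Assumes steady exponential growth."""
    doublings = math.log2(growth_factor)  # number of doublings needed
    return years / doublings

doubling_years = implied_doubling_time(GROWTH_FACTOR, YEARS)
print(f"Implied doubling time: {doubling_years:.2f} years")
```

Under that assumption the implied doubling time comes out at about 1.8 years, which is consistent with the report's "roughly every two years."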
Data collected from the digital universe is rich with information about market trends, customer sentiment, company performance, and the effectiveness of a company’s assets. This information can produce key insights — in terms of predictive maintenance, brand performance, future product releases, and more accurate plan numbers — helping differentiate an organization from its competition and create new revenue opportunities.
However, the complexity of this data is also increasing dramatically. Data now comes in a variety of formats and structures, such as text, geographical location, video, machine sensor readings, and web traffic. This volume and complexity have put an extraordinary strain on traditional data management systems, causing organizations either to forgo collecting the data or to discard it after a short period of time.
With legacy systems unable to cost-effectively store and process the huge amounts of data available, how can companies gain meaningful insight?
The Need for a Big Data Strategy
To get the most from the digital universe, businesses need to accommodate a variety of new types of data, including structured, unstructured, and IoT information, all of which present challenges for systems not designed to handle such massive volumes and so many data sources. Organizations that have invested heavily in legacy systems need to maximize that investment while taking advantage of new big data platforms. There is no need to replace existing technologies in a data center. Rather, organizations can evolve to a modern data architecture as part of their big data strategy. Big data platforms can be integrated with a company’s existing systems to handle the huge influx of new types of data in real time, so that the information collected is immediately useful to the enterprise.
Apache Hadoop, delivered by the Hortonworks Data Platform (HDP), is an open-source framework for distributed storage and processing of large data sets on commodity hardware. As an enterprise data management platform, HDP enables businesses to gain insight from vast amounts of structured and unstructured data cost-effectively through its broad integration with data center tools and applications, allowing organizations to evolve their infrastructure to a modern data architecture.
Instead of forgoing valuable data or storing it in silos where access is limited to certain departments or individuals, companies can use the Hortonworks Data Platform to store all of their data in a data lake: a relatively inexpensive, massively scalable place to store unlimited amounts of data in any format, accessible by familiar tools and applications. A data lake provides new efficiencies for data architecture through a significantly lower cost of storage and through optimization of data processing workloads such as data transformation and integration. In addition, the data lake opens new business opportunities through flexible “schema-on-read” access to all enterprise data, and through multi-use and multi-workload data processing on the same sets of batch and real-time data. But the proper steps must be taken for your organization to benefit from a data lake. Here are four simple, actionable steps that take your existing systems to an architecture with a data lake:
Step #1. Prepare
- Increase awareness of big data solutions within IT
- Determine business requirements
- Evaluate business and IT processes
- Prioritize a use case that aligns with a single application driven by a line of business
Step #2. Implement
- Implement Apache Hadoop on the Hortonworks Data Platform in a focused application deployment and then a point production rollout
- Repeat the process for additional point production rollouts
Step #3. Expand
- Expand the scope and scale of deployment
- Adopt an enterprise-wide data lake
- Implement shared data services
Step #4. Benefit
- Move to a multi-application deployment
- Ensure leadership is rewarded
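The “schema-on-read” access that the data lake offers can be illustrated with a minimal Python sketch. It is not HDP code and uses no Hadoop APIs. The sample records, schema dictionaries, and the `read_with_schema` helper are all hypothetical and exist only to show the idea: data is stored raw, and each reader applies its own schema when it reads.

```python
import json

# Hypothetical raw records, stored exactly as they arrived.
# No schema is enforced at write time.
RAW_EVENTS = [
    '{"device": "fridge-01", "temp_c": "4.5", "ts": "2015-03-01T10:00:00"}',
    '{"device": "fridge-02", "temp_c": "5.1", "ts": "2015-03-01T10:00:05", "door": "open"}',
    '{"user": "alice", "page": "/pricing", "ts": "2015-03-01T10:00:07"}',
]

# A schema here is a mapping of field name -> type converter,
# chosen by whoever reads the data (illustrative only).
SENSOR_SCHEMA = {"device": str, "temp_c": float}
WEB_SCHEMA = {"user": str, "page": str}

def read_with_schema(raw_lines, schema):
    """Parse raw lines and yield only the records that contain every
    schema field, keeping just those fields and converting their types."""
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: convert(record[field]) for field, convert in schema.items()}

# Two different "views" over the same stored data.
sensor_rows = list(read_with_schema(RAW_EVENTS, SENSOR_SCHEMA))
web_rows = list(read_with_schema(RAW_EVENTS, WEB_SCHEMA))

print(sensor_rows)
print(web_rows)
```

The same raw records serve both a sensor analysis and a web-traffic analysis without being reshaped on ingest, which is the flexibility the schema-on-read approach is meant to provide.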
Best of Both Worlds: HDP and SAP HANA
Storage, processing, and data exploration in HDP are only half of the equation. For modern businesses, being able to take immediate advantage of data can lead to a significant competitive advantage. By adding SAP HANA to the mix, organizations can analyze the massive amounts of data stored in the data lake to gain meaningful insight in real time. Together, Hadoop delivered by the Hortonworks Data Platform and SAP HANA provide a powerful platform that ensures actionable results with infinite scale (see Figure 1).
Other benefits of integrating SAP HANA with HDP include:
- Cost-effective storage of large amounts of historical information
- “Noisy” data, like information from machine sensors, can be analyzed quickly and efficiently
- Instant insight based on a significant wealth of data
Hortonworks was founded in 2011 by engineers and architects from the original Yahoo! Hadoop development and operations team. Our engineers are active participants and leaders in Hadoop open-source development, including designing, building, and testing the core Hadoop platform, making Hadoop enterprise grade. Hortonworks architects, implements, and supports enterprise Hadoop for big data solutions that can protect and leverage current investments in SAP HANA.
For more about the Hortonworks Data Platform and to view use cases, visit http://hortonworks.com/partners/sap or contact us via email at firstname.lastname@example.org.
[1] IDC, “The Digital Universe in 2020” (December 2012; http://idcdocserv.com/1414).