As an open source software platform, Apache Hadoop is subject to the needs of the times. And for many large enterprises in various industries, such as retail, telecommunications, and oil and gas, the times are changing rapidly, and analysts are demanding real-time access to big data in order to make instantaneous, informed business decisions.
Because Apache Hadoop was born from the need for large web-scale organizations, like Yahoo, Google, and Facebook, to conduct batch analysis of large unstructured datasets, enterprises dealing with mostly structured transactional data were initially slow to see the value of this technology. Soon, however, businesses coping with growing volumes of data got in the game, and as mining larger and faster sets of unstructured data becomes increasingly important to an organization’s bottom line, this enterprise adoption of Apache Hadoop will undoubtedly continue.
Within this context, it is only natural for Intel and SAP — having collaborated to make SAP HANA running on Intel Xeon processor E7 family-based servers the platform of choice for in-memory analytics — to innovate jointly on an enterprise-class solution that combines the big data capabilities of Apache Hadoop with the real-time analytics framework of SAP HANA.
This vision has come closer to reality since the release of the Intel Distribution for Apache Hadoop earlier this year. Even as Hadoop continues to evolve in an open source model to meet the demands of data-rich organizations, the collaboration between Intel and SAP will remain focused on optimizing the entire platform to run smarter, faster, and simpler.
For many SAP enterprise customers exploring SAP HANA in a proof-of-concept stage or testing a limited-scale version, the question may loom: Why Hadoop? Won’t SAP HANA deliver the in-memory analytics these businesses need? After all, it is a platform capable of processing both transactional and analytical data in real time, especially with optimizations delivered by Intel Xeon E7 servers, such as hardware compression tools that help drive the speed for which SAP HANA is known.
Hadoop, though, in addition to analyzing massive amounts of unstructured data, offers enterprises scalability, which is crucial for organizations that intend to compile unstructured data and gain insight from it to grow the business. Because its cluster-based architecture allows Hadoop to store even petabytes of data, an organization can add nodes to the cluster rather than continuing to invest in additional database servers and other infrastructure. This scalability also largely eliminates the need to devote resources to archiving, because storage capacity grows as nodes are added. Intel pairs that scalability with its Xeon E7 servers, optimizing the platform even when Hadoop serves as a standalone data source.
With SAP HANA integrated with the Intel Distribution for Apache Hadoop, optimization is twofold: Both SAP HANA and the Apache Hadoop platform are optimized for maximum efficiency on Intel Xeon E7 servers, and the integration of SAP HANA with Hadoop optimizations from Intel delivers interactive, real-time access to analytics on large volumes of unstructured data.
Nowhere in the integration of Hadoop with SAP HANA has the co-innovation and collaboration between Intel and SAP been more fruitful than in enabling the SAP HANA smart data access connectors for optimal performance. The connectors are what allow SAP HANA to connect to the Hadoop clusters and offer real-time analytics. In simple terms, if you envision unstructured data sitting in the Hadoop clusters, SAP HANA acts as a middleman, making sure the right data is in the right place at the right time. Upon being queried, SAP HANA determines how best to extract the data, and if the required data isn’t stored in memory, SAP HANA mines Hadoop data and decides where and how the data will be processed.
This is the game-changer that enables online interactive queries in real time over unstructured data, transcending the traditional use of Apache Hadoop as an engine for overnight batch processing. This is where Intel sees the natural progression of Hadoop. Intel’s partnership with SAP is a key component of the innovations designed to benefit our joint enterprise customers.
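The lookup behavior described above, in which a query is answered from memory when possible and from the Hadoop cluster otherwise, can be pictured with a small Python sketch. This is purely an illustration under stated assumptions: the `TieredStore` class, its `hot` and `cold` tiers, and the sample tables are hypothetical names invented here, and none of this is the SAP HANA smart data access API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class TieredStore:
    """Hypothetical stand-in for an in-memory tier in front of a cluster tier."""

    hot: Dict[str, List[Row]] = field(default_factory=dict)   # data held in memory
    cold: Dict[str, List[Row]] = field(default_factory=dict)  # data left in the cluster
    log: List[str] = field(default_factory=list)              # where each query ran

    def query(self, table: str, predicate: Callable[[Row], bool]) -> List[Row]:
        # Prefer the in-memory tier when it holds the requested table.
        if table in self.hot:
            self.log.append(f"{table}: memory")
            return [row for row in self.hot[table] if predicate(row)]
        # Otherwise fall back to the cluster tier and filter there,
        # so that only matching rows come back to the caller.
        if table in self.cold:
            self.log.append(f"{table}: remote")
            return [row for row in self.cold[table] if predicate(row)]
        raise KeyError(f"unknown table: {table}")


# Made-up sample data: structured orders in memory, clickstream in the cluster.
store = TieredStore(
    hot={"orders": [{"id": 1, "total": 250}, {"id": 2, "total": 40}]},
    cold={"clickstream": [{"id": 1, "page": "/cart"}, {"id": 2, "page": "/home"}]},
)

big_orders = store.query("orders", lambda row: row["total"] > 100)
cart_views = store.query("clickstream", lambda row: row["page"] == "/cart")
```

The caller issues both queries the same way and never says where the data lives. That single entry point is the idea the sketch is meant to convey, not how the connectors are actually implemented.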
Focusing on this progression, Intel and SAP realize that prioritizing features and enhancements to Apache Hadoop that enable more efficient management of unstructured data in an SAP HANA deployment will help make the joint solution the platform of choice for real-time analytics in the enterprise. So while SAP has, of course, made SAP HANA the focus of its growth strategy, Intel’s own distribution for Apache Hadoop gives Intel an even greater stake in SAP HANA’s continued success.
To understand which features and enhancements will draw SAP and Intel resources, it helps to work backward from what’s important to the end customer. Enterprises don’t have the luxury of making massive investments to maintain an open source platform from which they will only partially benefit. The Intel and SAP Hadoop collaboration lets enterprises focus on what they have generally come to expect from their software vendors, namely availability, scalability, reliability, and security.
Availability, like scalability, is something Hadoop delivers by virtue of its architecture: Its replicated data sets help offset node failures. Reliability and security, however, are not inherent to the Apache Hadoop architecture and therefore are areas of focus for Intel and SAP.
In the case of reliability, for example, Intel is actively working with partners and customers to enable disaster tolerance for some of the more critical single points of failure in a Hadoop cluster. In other words, rather than capitalize on the availability of the cluster by trusting the replicated data, we are looking at ways to detect and correct errors before they cause a failure.
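To put a rough number on what replicated data contributes to availability, the following Python sketch computes the chance that every copy of a block is unreachable at once. It rests on simplifying assumptions that are not from this article: node outages are treated as independent, and the 1% outage probability is an invented figure. The function names are hypothetical. HDFS keeps three replicas of each block by default, which is why three is used in the comparison.

```python
def all_replicas_lost(p_node_down: float, replicas: int) -> float:
    """Probability that every replica of a block is on a node that is down.

    Assumes each node is down independently with probability p_node_down,
    which is a simplification (real outages are often correlated).
    """
    if not 0.0 <= p_node_down <= 1.0:
        raise ValueError("p_node_down must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_node_down ** replicas


def block_availability(p_node_down: float, replicas: int) -> float:
    """Probability that at least one replica of a block is reachable."""
    return 1.0 - all_replicas_lost(p_node_down, replicas)


# Illustrative comparison with an assumed 1% chance that any given node is down.
single_copy = block_availability(0.01, replicas=1)
three_copies = block_availability(0.01, replicas=3)
```

Under these assumptions, going from one copy to three cuts the chance of losing access to a block from about one in a hundred to about one in a million. The model says nothing about components that are not replicated, which is where the single points of failure mentioned above come in.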
Where security is concerned, Intel launched Project Rhino as an open source effort to enhance the existing data protection capabilities of Hadoop from the silicon up. Intel aims to bring hardware-enhanced encryption, authentication, role-based access control, and auditing in the form of a common, consistent framework to all components of the Apache Hadoop platform, as well as integration with external tools such as the Intel API security gateway. By hardening Apache Hadoop from the inside out, Intel aims to provide enterprises the assurance that they can entrust their precious data to the data protection mechanisms in the Intel Distribution for Apache Hadoop.
Of course, these ongoing enhancements benefit existing SAP customers who choose the Intel Distribution because they don’t have to forgo any of the benefits they typically associate with SAP software (with security and reliability being two of the most important benefits). And in the open source environment, enhancements can be made to keep pace with shifting customer expectations.
Another expectation from SAP customers is that their applications and software are delivered with enterprise-quality maintenance and support. The close working relationship between Intel and SAP helps ensure that customer support issues surrounding the integration of Intel’s Hadoop Distribution with SAP HANA follow a quick path to resolution, thanks to specific support models put in place. In a mission-critical environment where time is of the essence, this dual support model can be highly beneficial.
A Proven Partnership
The release of and continual enhancements to SAP HANA integrated with the Intel Distribution for Apache Hadoop are a natural extension of the long-standing partnership between SAP and Intel. As the analysis of large volumes of both structured and unstructured data becomes more central to all business operations, and as SAP HANA continues to gain traction in the market, customers have come to expect an optimized solution offering seamless integration with Hadoop. Because of Intel and SAP’s track record of co-innovation, those customers can count on the joint solution, which has been tested, certified, and validated by SAP and Intel, to deliver value.
For more about SAP HANA, visit saphana.com/welcome. Go to hadoop.intel.com to learn more about the Intel Distribution for Apache Hadoop. For more on SAP HANA integrated with the Intel Distribution for Apache Hadoop, visit sap.com/corporate-en/news.epx?PressID=20498. And for information about the Intel Xeon processor E7 family-based servers, go to intel.com/content/www/us/en/processors/xeon/xeon-processor-e7-family.html.
Vin Sharma is responsible for strategic planning for Hadoop at Intel and marketing for the Intel Distribution for Apache Hadoop. In his previous role, Vin helped drive enterprise adoption of Linux, KVM, and OpenStack on Intel Architecture, representing Intel on the boards of the Open Virtualization Alliance and the OpenStack Foundation. Before joining Intel in 2011, Vin held various engineering and marketing roles at HP for 15 years, building enterprise software products based on Linux, Java, XML, and other open source software.
Dietrich O. Banschbach is Director of SAP Engineering in Intel’s Software and Services Group. He leads Intel’s software enabling efforts on site at SAP headquarters in Walldorf, Germany, ensuring that SAP enterprise applications run best on Intel platforms. Previously, he headed all EMEA-scale marketing programs for the Intel Software Network, the Intel Software Partner Program, the Intel AppUp (SM) developer program, the Intel Academic Community Program, and Market Segment Management. Prior to joining Intel in 2007, Dietrich served as Director of R&D EMEA and global SAP Partner Manager at SAS Institute for 15 years.