
 

You Can Achieve High Availability in Your Current SAP Landscape

by Dr. Franz-Josef Fritz | SAPinsider

January 1, 2006

by Dr. Franz-Josef Fritz, SAP AG SAPinsider - 2006 (Volume 7), January (Issue 1)
 




As more and more business processes rely on IT — everything from order processing to financial asset management, even the ability of consumers to choose products and make purchases — the availability of IT services is increasingly mission-critical in every industry and business area:

  • Customers increasingly and urgently demand that unplanned downtime of systems and services be minimized. Depending on the type of business and service, an outage of one hour can cost millions of dollars.

  • Because of the 24x7 nature of IT-supported activities like customer purchases, IT faces very high pressure to reduce planned downtime for release upgrades and other types of maintenance for systems, services, applications, and underlying technical components like hardware, operating systems, and database systems. Many companies can afford only a few hours per quarter or per year for such planned outages.

  • It is no longer sufficient to address just the availability of an ERP system or CRM system. What really counts is the availability of mission-critical services from an end-user point of view, regardless of which system or combination of systems is needed to provide this availability.

The consequences of IT not meeting the demands of business are increasingly costly and severe. Therefore, a very important goal for SAP is not only to provide high availability of systems like SAP NetWeaver and mySAP ERP, but also to facilitate the high and near-continuous availability of cross-system business processes like order management, production management, asset management, and more.

This article will explain what SAP is already doing on the high-availability front and will lay out SAP's roadmap toward providing the continuous availability of mission-critical business services. You'll get practical information and advice about how today's systems can achieve improved high availability at a reasonable cost, and details at the end of the article guide you to additional high-availability resources.

Current Availability Challenges

To achieve the (almost) continuous availability of services and the entire underlying system landscape, your IT team needs to take additional cost and complexity into account. Since decisions must be based on business goals, you have to look at the drivers and trade-offs for these factors.

Cost

The main ingredient for successfully increasing availability in any kind of technical system is redundancy. In other words, if one component or subsystem fails, at least one other component or subsystem must be available to take over. What's more, if you have three identical components, each one capable of providing availability of 99%, the coordinated cooperation of these three components together can provide an overall availability of 99.9999% — assuming, of course, that each component can seamlessly take over for any other component in case of failure.
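The redundancy arithmetic above can be checked with a short sketch. It assumes that component failures are independent of each other and that failover is instantaneous and always succeeds, which real systems only approximate. The function name `parallel_availability` is illustrative and does not come from any SAP tool.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components, any one of which suffices.

    Assumes independent failures and seamless failover (an idealization).
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    # The group is unavailable only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} component(s) at 99%: {parallel_availability(0.99, n):.4%}")
```

Each additional copy multiplies the remaining unavailability by the failure probability of one component, which is why three 99% components reach 99.9999% under these assumptions.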

On the other hand, and this must be stated very clearly, with redundancy come costs, including the cost of any additional components needed and the cost of ensuring takeover in case of failure ("failover"). And the higher the degree of availability you want to provide, the more challenging and expensive that availability is to achieve.

Sometimes you may read about availability definitions of six "9"s, meaning 99.9999% uptime, or only about 30 seconds of downtime per year. Realistically, most organizations can plan for 99.5% uptime, which means about 11 hours of downtime per quarter. While it's possible for the amount of downtime to be much smaller than that, keep in mind that very high levels of availability can only be achieved by appropriate redundancy, and this has its price.

Businesses need to determine their realistic business needs for availability and balance this availability level with the cost involved — just as you would when determining your premiums for an insurance policy. There may well be cases where 99% availability (outage of less than two hours per week) is good enough.
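To weigh an availability target against its cost, it helps to translate the percentage into a concrete downtime budget. The following is a minimal sketch of that conversion; the constant and function names are illustrative, and the period lengths ignore leap years.

```python
HOURS_PER_YEAR = 365 * 24              # 8,760 hours, ignoring leap years
HOURS_PER_QUARTER = HOURS_PER_YEAR / 4
HOURS_PER_WEEK = 7 * 24


def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at the given availability level."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_hours * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    per_year = allowed_downtime_hours(99.9999, HOURS_PER_YEAR) * 3600
    per_quarter = allowed_downtime_hours(99.5, HOURS_PER_QUARTER)
    per_week = allowed_downtime_hours(99.0, HOURS_PER_WEEK)
    print(f"99.9999% -> {per_year:.1f} seconds per year")
    print(f"99.5%    -> {per_quarter:.2f} hours per quarter")
    print(f"99%      -> {per_week:.2f} hours per week")
```

Running the same function against your own target and maintenance calendar shows quickly whether the planned outages you need actually fit into the budget.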

Complexity

Beyond cost, redundancy also has the price of complexity — not only complexity within the redundant systems, but complexity of management and operations. If one component goes down, other dependent components may also be severely affected.

For example, if we look at a chain of five dependent components, each with 99% availability, the whole chain has an overall availability of only about 95%. By adding more dependent components, you're not gaining availability; instead, you're moving in the wrong direction. Therefore, minimizing dependencies is a key IT task. You need strategies to "survive" the failure of some subsystems, components, and auxiliary services.
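The chain calculation can be reproduced with a few lines of code. This sketch assumes that the components fail independently and that the chain works only when every component works; the function name `serial_availability` is illustrative.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a chain in which every component is required."""
    if not availabilities:
        raise ValueError("need at least one component")
    if any(not 0.0 <= a <= 1.0 for a in availabilities):
        raise ValueError("each availability must be between 0 and 1")
    # The chain is up only when all components are up at the same time.
    return prod(availabilities)


if __name__ == "__main__":
    for length in range(1, 6):
        chain = [0.99] * length
        print(f"{length} dependent component(s): {serial_availability(chain):.2%}")
```

Every dependency added to the chain multiplies in another factor below 1, so the overall figure can only go down.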

One important strategy is to minimize the number of single points of failure (SPOFs) within the dependent components. While the principle of redundancy demands that there should be as few SPOFs as possible, situations arise where they cannot be avoided. The guideline, then, is to keep them as isolated as possible, both to minimize the impact of failure and to streamline the time and effort needed for replacement (see Figure 1). For example, the transactional database of mySAP ERP is a SPOF, but techniques like shadow databases and log shipping provide an equivalent to redundancy and minimize switchover time in case of failure.

Figure 1
Strategies for Minimizing Single Points of Failure
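To see why an isolated SPOF deserves special protection, consider a simplified landscape with a redundant application tier in front of a single database. The sketch below is an illustration only: the 99% figures are assumed values, failures are treated as independent, and the shadow database is idealized as a perfect second copy, which log shipping only approximates. All names are hypothetical.

```python
from math import prod


def redundant(availability: float, copies: int) -> float:
    """Tier of identical copies where any single copy can carry the load."""
    return 1.0 - (1.0 - availability) ** copies


def landscape(tiers: dict[str, float]) -> float:
    """Overall availability when every tier is required (tiers form a chain)."""
    return prod(tiers.values())


if __name__ == "__main__":
    # Illustrative figures: 99% per component is an assumption, not a measurement.
    with_spof = {
        "application servers (3 redundant)": redundant(0.99, 3),
        "database (single instance)": 0.99,
    }
    with_shadow = {
        "application servers (3 redundant)": redundant(0.99, 3),
        # Idealization: the shadow database counts as a full second copy.
        "database (primary + shadow)": redundant(0.99, 2),
    }
    print(f"single database:   {landscape(with_spof):.4%}")
    print(f"database + shadow: {landscape(with_shadow):.4%}")
```

Under these assumptions the single database caps the whole landscape at roughly 99%, no matter how many application servers are added, while protecting the database lifts the overall figure noticeably.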

Addressing the Issues

To deal with the cost and complexity that can arise with high availability, two key elements must be in place:

  • Cooperation between various parties, both internal teams and external providers

  • Adequate change management and governance policies to limit planned downtime

Let's look at these in more detail.

Cooperation

To achieve a highly available IT landscape, four parties must contribute:

  1. Your company's IT organization (or service provider), which is responsible for an appropriate set of proven operating procedures and change management rules, up to and including disaster recovery measures in worst-case scenarios

  2. SAP, as the provider of the SAP NetWeaver technology platform and the business applications and processes running on SAP NetWeaver

  3. The technical infrastructure providers, who are responsible for hardware, operating system, database system, storage systems, and network components

  4. Your facility management team, which is charged with providing adequate power, cooling, and other physical conditions

Only when these four groups coordinate their plans and actions — coordinating maintenance activities with downtime, defining and testing recovery strategies in the case of breakdowns, etc. — can high-availability landscapes be realized (see Figure 2).

Role                       Area of Responsibility
IT organization            • Disaster recovery plans
                           • Operating procedures
SAP                        • Business processes
                           • Services
                           • SAP NetWeaver systems
Infrastructure providers   • Database
                           • Storage
                           • Operating system
                           • Network
                           • Physical server
Facility management        • Power, cooling, etc.
Figure 2
Distribution of Responsibilities in Achieving High Availability

Change Management and Governance

Changes in any system layer have a big potential for disruption and unwanted side effects, and therefore are the biggest challenge for availability. And while change is inevitable, it needs to happen in a controlled fashion — that's why a stringent change management strategy and governance are key success factors in any high-availability project. When deciding on the necessity of changes, organizations need to base their decisions on the risk/benefit trade-offs.

Also, the choreography of changes (in other words, which changes can and should go together and which changes need to be separated) needs clear decision-making and execution governance. For example, during major changes like release upgrades, additional levels of redundancy (e.g., extra machines and shadow systems) may be required to minimize planned downtime.

Which Steps Has SAP Already Taken?

Some of the high-availability issues we've described above are not new and have been considered by SAP since the design of the R/3 architecture. One SAP system can consist of multiple redundant application servers; also, the database has been clearly separated from the application servers and can protect itself.

With the advent of distributed systems — like supply chain management (SCM), customer relationship management (CRM), and supplier relationship management (SRM) — and the expanded use of Java technology, complexity has increased. Customers kept telling us that some new challenges had to be addressed. To that end, SAP has developed the following techniques and features over the last couple of years:

  • System switch upgrade — Switch-based upgrade procedures reduce planned downtime by temporarily adding another server and set of database tables. The benefit of this "temporary redundancy" approach is that you can do much of the upgrade work in parallel to the normal operation of the system. Statistics, however, show that not all customers leverage the benefits of downtime-minimized upgrade approaches yet.

  • Replicated enqueue service — Like the database, the enqueue service for managing logical locks can exist only once within an SAP NetWeaver system, and it needs special protection. This protection has been achieved by replicating all locks into the main memory of a standby process.

  • Redundant application servers and load distribution for the Java stack — The proven approach from the ABAP stack has been transferred to the Java stack in the SAP NetWeaver 2004 release.

  • Virtual IP addresses: To be able to relocate and replace application servers and central services, we support the use of virtual IP addresses; this also supports the flexible capacity-management capabilities provided by an adaptive computing infrastructure (see footnote 1).

  • Failover support — SAP's standard installation and upgrade procedures support the setup of a failover-capable system.

  • A symmetric and simplified setup of the ABAP and Java stack for the central enqueue and messaging services (possible with SAP NetWeaver 2004s) — This setup also simplifies the protection of these services with hardware clusters.

Taking these developments and features into account, Figure 3 shows the recommended setup for high availability in an SAP NetWeaver 2004s system. This setup is the result of some significant streamlining and reflects best practices to achieve optimal availability at a reasonable cost.

Figure 3
Recommended High-Availability Setup in SAP NetWeaver 2004s

What's more, with the significantly increased focus on high availability during the last few years, SAP is poised to do even more work on the high-availability front.

What Is SAP's High-Availability Outlook?

We've made major steps to get to where we are today in terms of providing high system availability to meet the ever-evolving needs of business. But additional work needs to be done to get closer to the ultimate goal of providing continuous availability. To that end, SAP is involved in these current projects and workstreams:

  • SAP has started to work with customers to explore approaches for the continuous availability of SAP NetWeaver Portal and SAP NetWeaver Exchange Infrastructure. One key idea is to run multiple systems for the same purpose and to establish timely synchronization between these systems. This will also be the prerequisite for a "rolling upgrade" of systems, providing continuous availability even during times when part of the system is down for maintenance.

  • SAP is working on concepts to clearly separate the technical upgrade and business-process upgrade. The goal here is twofold: to keep business processes unaffected by technical infrastructure upgrades and to provide evolutionary business-process changes without having to do technical changes at the same time.

  • SAP is working on projects to minimize downtime for the support-package processes, since applying support packages occurs much more frequently than release upgrades and therefore is a bigger pain point.

  • SAP is finalizing the infrastructure and processes to provide a "rolling kernel update" that will allow for one-by-one kernel patches in an R/3-based or SAP NetWeaver-based system, taking down one application server at a time. A critical piece of this process is compatibility management between kernel patch versions, which is still in the works.

  • SAP is making further investments in the robustness and stability of the Java stack, which will lead to higher availability of the application server instances, better failover capabilities, and faster restart times.
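The "rolling" approach mentioned in the list above, taking down one application server at a time, can be sketched in a few lines. This is a conceptual illustration and not an SAP procedure: the `AppServer` type, the `rolling_update` function, and the compatibility check passed in as `is_compatible` are all hypothetical, and the actual compatibility rules between kernel patch versions are not described in this article.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AppServer:
    name: str
    kernel_patch: int
    running: bool = True


def rolling_update(
    servers: list[AppServer],
    target_patch: int,
    is_compatible: Callable[[int, int], bool],
) -> int:
    """Patch servers one by one; return the lowest number running at any time."""
    if len(servers) < 2:
        raise RuntimeError("need at least two servers to stay available")
    # Old and new patch levels run side by side during the roll,
    # so every existing level must be compatible with the target.
    if not all(is_compatible(s.kernel_patch, target_patch) for s in servers):
        raise RuntimeError("patch levels cannot be mixed; rolling update refused")

    min_running = sum(s.running for s in servers)
    for server in servers:
        server.running = False               # take one instance out of service
        min_running = min(min_running, sum(s.running for s in servers))
        server.kernel_patch = target_patch   # patch it while it is down
        server.running = True                # restart before touching the next one
    return min_running


if __name__ == "__main__":
    landscape = [AppServer("as1", 100), AppServer("as2", 100), AppServer("as3", 100)]
    # Hypothetical rule: adjacent patch levels may run side by side.
    lowest = rolling_update(landscape, 101, lambda old, new: abs(new - old) <= 1)
    print("servers running at the lowest point:", lowest)
```

The compatibility check runs before any server is stopped, which reflects why compatibility management between patch versions is the critical piece: without it, a mixed landscape in the middle of the roll could not be operated safely.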

Conclusion

Continuous availability of business operations has become increasingly important for many of our customers. At SAP, we have already strengthened focus and investments in this area over the last few years, and we will continue to increase attention and efforts in the high-availability arena.

We realize that customer review and input is a very important part of our high-availability strategy. Feedback about this article is extremely welcome! Please send your comments and questions to franz-josef.fritz@sap.com.

Important High-Availability References

SAP Service Marketplace

http://service.sap.com/ha
http://service.sap.com/upgrade
http://service.sap.com/upgradeservices

SAP Notes*

803018: Central note for NetWeaver04 High Availability capabilities
711093: Release Restriction Note for Web AS 6.40
709354: Release Restrictions for SAP EP 6.0 on Web AS 6.40
792910: MSCS Installation for SAP Web AS 6.40 SR1 on Windows
774116: SAP XI 3.0 Installation in HA environments
538081: High-availability SAPLICENSE
181543: License key for high-availability environment
676073: MSCS Installation for SAP Web AS 6.40 on Windows
728879: MaxDB: MSCS installation based on Web AS 6.40
106275: Availability of R/3 on Microsoft Cluster Server
527843: Oracle RAC support in the SAP environment
826706: zSeries: HA System Setup
569996: High availability and automation solution for DB2 on z/OS
457512: DB2-z/OS: DB2 V7 Features
606682: High availability and automation solution for Linux zSeries
757692: Changing the hostname for J2EE Engine 6.40 installation
524816: Standalone enqueue server

* SAP notes regarding high availability are subject to revision on an ongoing basis.


1. An adaptive computing infrastructure provides the interfaces and administration components to enable flexible and on-the-fly deployment of services to hardware resources and to instantly adapt to changing workload requirements.


Franz J. Fritz has a Ph.D. in mathematics and 30 years of experience in all areas of IT. Workflow and business process management have been particular areas of interest for much of his life. He has worked at SAP since 1993 as Program Director and Vice President with responsibility for the Business Process Technology and Internet-Business Framework departments. Since 2003, he has been responsible for several areas within SAP NetWeaver Product Management.
