SAP NetWeaver does not make disaster recovery planning any easier. "It makes it necessary." So says Hendrik Knopper of EMC Corp., a manager of the company's global alliance team with SAP.
Moving to SAP NetWeaver requires you to revisit your business continuity plans. Not only do you need to examine how SAP NetWeaver technology will improve your business processes, but you need to consider how you will continue to do business should this powerful enabling technology be interrupted.
A "Properly Architected" Portal
Let's start this discussion at the portal level. SAP Enterprise Portal (SAP EP) factors prominently into SAP NetWeaver shops. The portal is leveraged for everything from access to enterprise applications, news feeds, internal corporate information, and online training to a communications medium for external customers.
If SAP EP serves as the primary user interface to your organization's enterprise applications, services, and intranet, what happens if it goes down? If it is your face to your customers, every minute a portal is unavailable puts revenue at risk. "If you lose that user who wanted to come in through the Web, you've lost money," says IBM's Sanjoy Das, a senior technical staff member, enterprise solutions development and marketing. "The portal is a vital piece."
"With a properly architected solution, there would be no risk of that," says Kyle Warfield, a software consultant at the Unisys SAP Competency Center in Reston, Virginia. Warfield recently consulted on a large SAP implementation that included SAP EP for a major U.S. government agency. "As long as you have redundant application servers, and the mechanism to route users to those servers," Warfield says. The caveat is, of course, a "properly architected" system.
The rule of thumb, at least when it comes to proper architecture, is to not put all your technology eggs in one basket. According to Warfield and other experts, build your system with enough redundancy so that it won't fail. When one server goes down, users may get thrown off, but when they reconnect, the message server sends them to another application server that is operational and has the capacity to carry the additional load.
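The reconnect behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not SAP's message server logic: the `AppServer` class, its fields, and the `route_logon` function are hypothetical names, and "pick the server with the most spare capacity" is an assumed selection rule.

```python
from dataclasses import dataclass


@dataclass
class AppServer:
    name: str
    up: bool           # is the server currently operational?
    capacity: int      # maximum concurrent sessions (hypothetical figure)
    sessions: int = 0  # sessions currently assigned


def route_logon(servers):
    """Send a reconnecting user to an operational server with spare capacity."""
    candidates = [s for s in servers if s.up and s.sessions < s.capacity]
    if not candidates:
        raise RuntimeError("no application server available")
    # Assumed rule: prefer the server with the most free session slots.
    target = max(candidates, key=lambda s: s.capacity - s.sessions)
    target.sessions += 1
    return target.name


servers = [
    AppServer("app1", up=False, capacity=100),             # the failed server
    AppServer("app2", up=True, capacity=100, sessions=90),
    AppServer("app3", up=True, capacity=100, sessions=20),
]
print(route_logon(servers))
```

In this sketch the user who was thrown off `app1` lands on `app3`, the survivor with the most headroom. If every server sits in the same building, the candidate list can be empty, which is the gap Knopper describes next.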
SAP is designed for redundancy so that if a server fails, another will take its place. "The problem with redundancy is that you can protect your hardware, network, and system from errors, but this doesn't protect the business from disaster," says EMC's Knopper. An SAP customer may possess an enterprise services architecture (ESA) that contains plenty of redundancy, but what if all its servers are housed in the same location? All it would take is a flood, fire, earthquake, or terrorist attack - you name it - and suddenly the company's workforce would be paralyzed. "You really need to have an automated disaster restart process that gives you a good, known point in time to fall back to for the business, not just the infrastructure," Knopper says.
Federated Database Dilemma
- Hardware failure
- File system corruption
- Logical inconsistencies in the data
- Software damage from viruses
- Issues when installing updates, patches, or new software
SAP NetWeaver enables organizations to create landscapes that contain a wide variety of applications that interact with diverse databases. In an SAP NetWeaver world, business processes become the driving force within the landscape, and a single business process can affect a multitude of systems. That poses a considerable challenge in a disaster recovery scenario.
"Take a sales order, for example," says Lisa Roderick, senior manager of SAP partner engineering at EMC. "A sales order may have 20 or more database transactions resulting in hundreds of I/Os. Every DBA will tell you, 'I can promise you will get back your data, and all you will lose will be your in-flight database transactions.' With SAP NetWeaver's ability to integrate xApps and other, non-SAP applications across heterogeneous databases and platforms, you've now got six or seven databases that are mutually dependent on each other. Now the DBA will say, 'I can bring up those databases and each database will be transactionally consistent - but don't talk to me about the business transactions being consistent.'" Should a catastrophe occur, "that sales order that I kicked off may have talked to three other databases, and now it may or may not be complete within the system." Multiply that by hundreds or thousands of transactions in various states of completion, and any attempt to restore a landscape of databases and make sure they are synchronized would be maddening and practically impossible, she says.
The Advanced Technology Group (ATG) is part of SAP's Active Global Support Organization and is the central point of contact for storage technologies and storage-based solutions within SAP. The group is on the front lines when it comes to storage technologies and how to make them work in the new SAP NetWeaver world: backup, recovery, high availability, and disaster recovery. Jochen Wolter of SAP's ATG has spent the past two years working on disaster recovery solutions within SAP NetWeaver.
According to Wolter, many different solutions are available for disaster recovery. An organization can generate a copy of a database to a second database in real time (known as synchronous mirroring) or with a delay to improve performance (asynchronous mirroring). Mirrored databases can reside inside a single data center or at a second data center.
"If you have synchronous mirroring to a second data center, that's quite an expensive solution," says Wolter. "I think if you've already invested that much to have a good disaster recovery solution, then you should add 'consistency' technology on top of that, as it won't make too much of a difference from a cost perspective." Consistency technology, according to Wolter, enables an organization to safely and effectively back up a federated landscape.
"With consistency technology, you can make a copy of the entire production environment for a restartable purpose at a point in time," says Roderick of EMC, which offers a consistency technology product, Symmetrix Automated Replication, that has been tested with SAP. "Making this copy has no impact on production users since EMC consistency technology allows splitting mirrors without altering or freezing the databases. Since you cannot roll forward the databases, you will lose transactions after that split. However, you are guaranteed that you will come up on that copy with a business consistent image."
"Within that there is something called consistency groups, which ensure anything on the primary is copied on the secondary, but if there is a disruption or corruption, it will sense it, suspend the copy process, and cut off the relationships between the primary and secondary databases," says IBM's Das. IBM's Consistency Technology Group's technology has also been tested with SAP.
"Consistency technology can ensure that as soon as one database cannot mirror the data to the disaster recovery site, the second database is also prevented from mirroring the data there," says SAP's Wolter. "But that would mean we lose something that could have been mirrored, so by preserving data consistency, we might get a bit more data loss in our disaster recovery image."
"Consistency technology will ensure the proper ordering of the transactions and ensure that databases will be synchronized when you restore," adds IBM's Das. "It has a series of actions it takes to ensure that no corruption is transmitted to the target site, because once you replicate corruption, it's a terrible situation. It also has ways of managing what they call rolling disasters. Disasters don't always happen at once. Some things go wrong, more things go wrong, more things go wrong, and then it collapses."
SAP customers are well aware of the potential problems of running a federated database system within an SAP infrastructure, according to EMC's Knopper. "What we can do with consistency technology is overcome this pain point for customers."
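The suspend behavior Wolter describes, in which one database losing its mirror link stops mirroring for the others as well, can be modeled with a toy Python class. This is a sketch under stated assumptions, not any vendor's implementation: `ConsistencyGroup`, its attributes, and the database names `erp` and `crm` are all hypothetical.

```python
class ConsistencyGroup:
    """Toy model: several databases mirrored to a disaster recovery site as one unit."""

    def __init__(self, names):
        self.primary = {n: [] for n in names}    # writes at the production site
        self.mirror = {n: [] for n in names}     # writes that reached the DR site
        self.link_ok = {n: True for n in names}  # per-database mirror link state
        self.suspended = False

    def write(self, db, record):
        self.primary[db].append(record)
        if self.suspended:
            return  # mirroring is frozen for every member of the group
        if not self.link_ok[db]:
            # One member cannot mirror, so the whole group stops mirroring.
            self.suspended = True
            return
        self.mirror[db].append(record)


group = ConsistencyGroup(["erp", "crm"])
group.write("erp", "order-1")
group.write("crm", "order-1-ack")
group.link_ok["erp"] = False         # the erp mirror link fails
group.write("erp", "order-2")        # not mirrored; the group is suspended
group.write("crm", "order-2-ack")    # crm link is healthy, but also not mirrored
print(group.mirror)
```

The disaster recovery image ends with `order-1` and its acknowledgment in both databases. The `crm` write for `order-2` could have been mirrored but was held back, which is the extra data loss Wolter accepts in exchange for a consistent image.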
Consistency technology enables an SAP customer to group several diverse databases into a "consistency group" and treat it as a single IT object. "You can create a split image of the entire federated landscape," says Knopper. "In case of trouble, you can restart from this split image, and all databases will be in sync and at the same status."
Consider a simple transaction between two systems. A sending system (System 1) inserts a single record into a table and hands over the same record to the Remote Function Call (RFC) engine. The RFC is stored locally and will be executed after the commit work is issued on the sending system.
Execution of the function on the receiving system will then asynchronously insert the record into a table on System 2. What follows are examples of the type of inconsistencies that can occur with a federated database landscape.
The transaction will be missing in the receiving system if the copy of the sending system is more recent than the copy of the receiving system, or if the sending system is restored to a later point in time than the receiving system. As a result, the transaction will appear in the sending system but not the receiving system, making the two databases inconsistent.
The transaction will be missing in the sending system if the copy of the receiving system is more recent than the copy of the sending system, or if the receiving system is restored to a later point in time than the sending system. As a result, the transaction will appear in the receiving system but not the sending system, again making the two databases inconsistent.
A transaction can be processed twice in the receiving system in those circumstances where the receiving system is restored to a time later than the sending system. If the sending system is in mid-transaction at the restore, it may attempt to send the transaction again. The receiving system, if it is restored later, may have already deleted the information saying it had already processed this transaction and will then perform the transaction again.
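The first two inconsistencies can be expressed as a small Python check. It is a simplified sketch, not an SAP tool: the `classify` function and its four timestamp parameters are hypothetical, times are plain numbers, and the duplicate-processing case is deliberately left out because it depends on bookkeeping inside the RFC layer that this model does not represent.

```python
def classify(commit_time, exec_time, sender_restore, receiver_restore):
    """Compare what each system contains after an independent restore.

    commit_time:      when the sending system committed the record
    exec_time:        when the receiving system executed the RFC
    sender_restore:   point in time the sending system is restored to
    receiver_restore: point in time the receiving system is restored to
    """
    in_sender = commit_time <= sender_restore
    in_receiver = exec_time <= receiver_restore
    if in_sender == in_receiver:
        return "consistent"
    return "missing in receiver" if in_sender else "missing in sender"


# Record committed at t=10, executed on the receiving system at t=12.
print(classify(10, 12, sender_restore=15, receiver_restore=11))
print(classify(10, 12, sender_restore=9, receiver_restore=15))
print(classify(10, 12, sender_restore=15, receiver_restore=15))
```

Only when both systems are restored to the same point in time does the check report a consistent pair for every transaction, which is the guarantee a split image of the whole landscape is meant to give.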
Challenges Bigger, but Not Insurmountable
The challenges to disaster recovery posed by SAP NetWeaver are not new, but they need to be carefully considered and not treated as an afterthought. By properly architecting solutions such as SAP Enterprise Portal and adding the extra protection of consistency technology to protect federated databases, the risks to these systems will diminish. For more information about disaster recovery, contact SAP Advanced Technology Group (email@example.com). The ATG is available to consult with SAP customers, including reviewing architectures and backup/recovery concepts. ATG also holds workshops for those investigating storage technology in an SAP NetWeaver landscape. ATG will deliver this as a new service. This SMO (System Management Optimization) service on "Continuity Management" is in preparation and will be available through SAP's service catalog.
Five Tips for Averting Disaster
In an SAP NetWeaver world, the stakes have become larger. It is one thing for the few hundred users of an R/3 system to lose their connection. It's another when the entire company loses its enterprise user interface and suddenly no one can work. Here are five tips for protecting your technological landscape.
1. Distinguish between backup and disaster recovery.
"For me, the most important thing is to distinguish disaster recovery from an ordinary restore," says Jochen Wolter of SAP's Advanced Technology Group, which has been studying and testing disaster recovery solutions within SAP NetWeaver. "For an ordinary restore, customers want to save as much data as possible. So if one database goes down, and maybe your log files are corrupt, you would lose some data in one system; but no one would accept data loss in other systems. If a disaster strikes, people are willing to accept some data loss. For disaster recovery, it's better to have consistency between the databases and a bit more of a data loss."
Unisys's Warfield agrees. "A majority of our customers will allow for some data loss." One reason is that users can usually remember the final transactions they were working on when the disaster struck.
2. Determine what you can't afford to lose.
"For every level of disaster recovery you want, you pay more," says Unisys's Warfield. "The customer has got to come to grips with a number of decisions: How fast do they want failover? How many transactions are they willing to lose? What budget do they have for disaster recovery? All affect the outcome of an architected solution."
While most companies are willing to accept loss of some data, Unisys has worked with organizations, such as airlines and stock exchanges, that cannot afford to lose a single transaction, he says. "You've got to guarantee that the live site and remote site are in sync."
There are disaster recovery solutions that can almost ensure that a company will not lose a single transaction, "but it's very, very expensive - two to three times the price of equipment," Warfield says. If you look at disaster recovery on a sliding scale in which zero means full data loss and 10 equals no data loss, "the difference between nine and 10 is a hundredfold greater than between 1 and 2. That last step is an enormous amount."
3. Determine the solution that best meets your operating goals.
When Unisys implemented SAP NetWeaver for a large government agency, it protected the SAP applications layer by installing redundant systems in different locations. To protect the databases, Unisys implemented a three-tier approach to disaster recovery:
Clustering. "You basically buy a redundant server and configure it so if a server fails, the redundant server takes over," Warfield says. In addition to offering protection from a disaster, it also makes it easier to do upgrades or other software changes. "We can do 'rolling upgrades,'" Warfield says. "The second server can be upgraded while the first server is doing the work. When you want to upgrade the first server, you move all the services over to the other node, and then you can upgrade the server."
Log shipping. According to Warfield, what is frequently overlooked is "logical protection." You can cluster servers, put in enough redundancy, "but if someone deletes some data, you'll have to do a restore." To create the restored database, Unisys's large government client uses log shipping. Every transaction that changes a database is written to a log. That log is applied to a remote server on a delayed basis. Should someone corrupt a database by inadvertently deleting a table, for example, the database can be restored using the log up to the point of corruption.
Business Continuance Volume (BCV). BCV creates a mirrored version of the database, a real-time clone of a database. "The nice part about this technology is if I have a large database that is multiple terabytes, I can restore it in minutes," Warfield says.
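Of the three tiers, log shipping is the easiest to illustrate in code. The Python sketch below replays a transaction log up to, but not including, the point of corruption. It is a minimal model, not a database vendor's recovery procedure: the `replay` function, the tuple layout of a log entry, and the sample orders are hypothetical, and the log is assumed to be sorted by timestamp.

```python
def replay(log, stop_before):
    """Rebuild table state from log entries with a timestamp earlier than stop_before."""
    table = {}
    for ts, op, key, value in log:
        if ts >= stop_before:
            break  # stop just short of the damaging change
        if op == "put":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
    return table


log = [
    (1, "put", "order-1", "open"),
    (2, "put", "order-2", "open"),
    (3, "put", "order-1", "shipped"),
    (4, "delete", "order-2", None),  # the inadvertent deletion
]

restored = replay(log, stop_before=4)
print(restored)
```

Because the remote server applies the log on a delay, the deletion at timestamp 4 has not yet been applied there when the mistake is noticed, so the replay can stop at timestamp 3 and `order-2` survives.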
4. Backup tape or backup site?
A disaster, by its very nature, can affect an entire site. One major power outage or terrorist attack, and your entire company can be shut down if you have located your operations there. If you have a global operation, an outage at one facility that stops all work may be unacceptable. Therefore, a backup site may be required. There are three types: a hot site, which is a live failover site that replicates the production environment; a warm site, which contains all the equipment and infrastructure of a hot site but does not have the synchronous or asynchronous feed from the live site; and a cold site, which does not even contain the equipment and infrastructure.
"One of the things I've been seeing with our customers are Category Five rooms," says IBM's Das. These are synchronous sites, usually within the same campus as the prime location but a few hundred yards apart. The secondary site is built like a bunker, able to withstand a Category Five hurricane.
5. Test, test, test.
"Part of the thing with disaster recovery is I am a true believer that on the day you set it up, you must test it before you can claim it is working," says Warfield of Unisys. "If you don't test it out, you've paid a lot of money for something that likely will not work."
He suggests that a disaster recovery system be tested the day it gets turned on and then tested on a regular basis thereafter. "Some of our customers in the financial arena, they test their disaster recovery systems every week," he says.
Do not stop at a rigorous initial test; retest every couple of months. With these complex designs, it is all too easy for a simple network change that is benign at your production site to adversely affect your remote site.