Considering Unicode? Then Think About Archiving — It Can Save You Time and Money

by Georg Fischer | SAPinsider

January 1, 2007

by Georg Fischer, SAP AG, and Tanja Kaufmann, SAP AG | SAPinsider, 2007 (Volume 8), January (Issue 1)

As SAP customers move to mySAP ERP 2005, many are finding that it's also the time to convert to a purely Unicode-based system. The shift toward Unicode signifies a market-wide trend in global business, and customers are embracing the character-encoding standard to support communication in all major world languages (see sidebar below).

What some customers may not realize, though, is that archiving data before converting to Unicode can bring important benefits to the project. Data archiving is a best-practice strategy that many companies perform as part of their regular data management plan (see the "Refresher" sidebar). Data archiving can often reduce data growth by 50% or more, and can bring additional benefits — such as reduced system downtime and hardware costs — to a Unicode conversion or upgrade project.

Why the Market Is Embracing Unicode

As the international character-encoding standard, Unicode enables software products and Web sites to be accessed across multiple platforms, languages, and countries. With more business crossing international borders, the market is demanding that business software support Unicode because of its ability to communicate and seamlessly conduct transactions in all major world languages. International business is just not possible anymore without Unicode; it is foundational for Web services, and it allows companies to transfer data across SAP and non-SAP systems without corruption or the need for reengineering.

Beyond Unicode's capacity for multilingual communication, companies are also finding that by converting, they gain a prime opportunity to make adaptations to their system landscape. With the help of SAP's System Landscape Optimization (SLO) service, companies can consolidate systems that previously ran on different code pages by converting them all to Unicode.1 This is especially helpful for enterprises looking to consolidate systems running in all areas of the world.

In response to these market-wide developments, SAP has adapted its products accordingly, providing full Unicode support starting from SAP Web Application Server (SAP Web AS) 6.20. SAP R/3 Enterprise, as well as all current releases of SAP NetWeaver and mySAP Business Suite, can also be run on Unicode. In addition, new releases of SAP NetWeaver and SAP applications based on SAP NetWeaver that are released in 2007 or later will no longer support new installations of non-Unicode systems.2 If you are considering an upgrade to mySAP ERP 2005, you'll want to check out the possibility of converting to Unicode as well, if you haven't already.3

For a more detailed look at Unicode and its role in the future of SAP solutions, see "Unicode: Overhead or Necessity?" by Dr. Franz-Josef Fritz in the April-June 2006 issue of SAP Insider.

Why the Right Time to Archive Is Before a Unicode Conversion

Let's take a closer look at how archiving data before moving to Unicode can enhance the conversion process.

Data archiving in a Unicode environment does not affect CPU or memory, only available disk space. After a Unicode conversion, additional CPU and memory consumption mainly occur on the application servers, which are scalable and generally less expensive and less critical than database servers.

1. Reduce System Downtime

During a database conversion to Unicode, you must shut down the system in order to convert all data correctly. Some current estimates place the downtime at around 12 hours per 1TB of data.4

Because data archiving decreases the size of the database, it will directly reduce the amount of downtime needed for a Unicode conversion. How much you can reduce the database size depends on several factors, such as whether you have archived before, how large your tables are, and what kind of data you need to archive. Regardless of these factors, though, you will find that having less data to work with reduces downtime not just in a Unicode conversion, but for any sort of conversion project.

Consider the example of a large German company that urgently needed to reduce the size of its database prior to converting a chart of accounts. Because this customer had never archived before, the results were dramatic: It archived 500GB of data through the first archiving sessions, and in less than a year, it reduced the total size of its database from 2.3TB to 1.4TB. The smaller database led to less downtime for its particular conversion; for illustration's sake, using the estimate of 12 hours per 1TB, the 0.9TB reduction would have cut the customer's total shutdown time by roughly 11 hours had this been a Unicode conversion project.
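The arithmetic behind this illustration can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation: the function name is hypothetical, and the 12-hours-per-1TB figure is the rough estimate quoted above, which varies greatly with hardware and parallelization.

```python
# Rough estimate quoted in the article; real values depend heavily on
# hardware speed and degree of parallelization.
HOURS_PER_TB = 12.0


def estimated_downtime_hours(db_size_tb: float,
                             hours_per_tb: float = HOURS_PER_TB) -> float:
    """Return the estimated conversion downtime for a database of the given size."""
    return db_size_tb * hours_per_tb


# Figures from the customer example: 2.3TB before archiving, 1.4TB after.
before_tb = 2.3
after_tb = 1.4

downtime_before = estimated_downtime_hours(before_tb)
downtime_after = estimated_downtime_hours(after_tb)
hours_saved = downtime_before - downtime_after

print(f"Without archiving: {downtime_before:.1f} h")
print(f"With archiving:    {downtime_after:.1f} h")
print(f"Downtime saved:    {hours_saved:.1f} h")
```

Under that estimate, the saving comes out at about 10.8 hours for the 0.9TB removed from the online database.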

Note that if you're planning to optimize your system landscape following your conversion to Unicode, you will find that pre-conversion archiving reduces the downtime for that as well, simply because there is less data in the online database to convert. Although you may need to convert those archive files later, such a conversion would not require you to shut down any systems.

2. Gain More Database Space

Some estimates show that the size of the occupied space within the database may increase between 10% and 30% after a Unicode conversion, depending on which code representation is used inside the database.5 For some customers this could mean that, in a worst-case scenario, they would have to purchase additional hardware. Archiving your data prior to the conversion can help you stabilize or even reduce the database size following the conversion.
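To see how this growth range plays out, the following sketch projects the post-conversion size of a database with and without prior archiving. The function name is hypothetical, the 10% and 30% bounds are the estimates quoted above, and the 2.3TB and 1.4TB sizes are borrowed from the earlier customer example purely for illustration.

```python
from typing import Tuple


def size_after_conversion_tb(size_tb: float,
                             growth_low: float = 0.10,
                             growth_high: float = 0.30) -> Tuple[float, float]:
    """Return the (low, high) projected database size after a Unicode conversion."""
    return size_tb * (1 + growth_low), size_tb * (1 + growth_high)


# Illustrative sizes taken from the customer example in the previous section.
unarchived_low, unarchived_high = size_after_conversion_tb(2.3)
archived_low, archived_high = size_after_conversion_tb(1.4)

print(f"Without archiving: {unarchived_low:.2f} to {unarchived_high:.2f} TB")
print(f"With archiving:    {archived_low:.2f} to {archived_high:.2f} TB")
```

Even at the upper bound of 30%, the archived database in this illustration stays well below the size the unarchived database had before the conversion.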

Companies that have never archived before will see an especially considerable decrease in the size of their database following the initial archiving run (see Figure 1).

Figure 1: The effects of a Unicode conversion on the database with and without data archiving (based on the aforementioned estimate of 12 hours of downtime per 1TB of data)

Note that in some cases, much of the space freed up through data management is not reusable without a database reorganization. But 24x7 system availability requirements often make a database reorganization nearly impossible, so starting data archiving early is key. SAP recommends that customers begin setting up an archiving strategy during the system-sizing phase, before the system goes live. If you archive data before a Unicode conversion, the database is automatically reorganized as part of the conversion, so you maximize the amount of regained database space without any added effort.

3. Reduce Hardware Costs in the Long Term

Because files archived prior to the Unicode conversion do not need to be converted, they will not take up the space that non-archived data will when converted to a Unicode-based system. As a result, the disk space necessary for storing the data does not increase following the Unicode conversion, and you need not invest in additional hardware. Furthermore, through improved compression techniques, the size of the data archived after the Unicode conversion does not increase significantly as compared to its size before the conversion.

Hardware cost savings also come from continually archiving data after the Unicode conversion, as archiving will help you maintain the smaller database and therefore reduce the amount of necessary disk space.

Refresher: The 4 Elements of a Comprehensive Data Management Plan

Data management is an important part of system maintenance and a best-practice strategy employed by most large SAP customers. Its goal is to keep data volume growth in check, helping you maintain good system performance, reduce system administration costs, and use fewer system resources in operations. A comprehensive data management strategy includes:

  • Data prevention — Stop the creation of unnecessary data

  • Data aggregation — Reduce unnecessary details by summarizing certain data

  • Data deletion — Eliminate any data that is not necessary for legal purposes

  • Data archiving — Write data in the database to archive files in a compressed format and then securely delete it from the database

While this article focuses on the data archiving aspect of data management, we recommend employing all four methods to optimize system performance during landscape transitions.

To support enterprises' data management and archiving efforts, SAP offers SAP Data Volume Management (DVM), a comprehensive service portfolio that assesses a company's existing strategies and helps it implement a data management and archiving strategy. DVM's aim is to empower company experts to create and later maintain these data management strategies. See for more information.

You can also find instructions for applying the four data management methods to your most important database tables through the Data Management Guide at → Media Library → Literature and Brochures. This guide includes more than 70 tables and is updated quarterly based on customer feedback.

But What About Your Archived Data? Demystifying Frequent User Misconceptions

When it comes to data archiving, some users will undoubtedly have questions including: Will I still be able to read my archived data once the system is converted to Unicode? Will I eventually have to convert these archive files too?

Because archive files are not converted to Unicode along with the rest of the system, customers commonly believe that these files are no longer readable. This is not true; in most cases the data can be easily accessed and read, because archived data is always converted "on the fly": the system converts it during the read operation, using the same rules that were applied to the data in the database. The actual data in the file is never changed. For seldom-accessed archived data, this is by far the most efficient approach.

If your system used only one code page before the Unicode conversion, the procedure is very straightforward: the code page that was used is written to the header of each archive file.
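The idea of converting at read time while leaving the file untouched can be illustrated with a small Python sketch. This is not SAP's implementation: the `ArchiveFile` class and `read_records` function are hypothetical, and Python codec names such as `iso8859_1` stand in for SAP code pages.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ArchiveFile:
    """Hypothetical stand-in for an archive file written before the conversion."""
    header_code_page: str        # code page recorded in the file header
    records: Tuple[bytes, ...]   # raw record bytes; never rewritten


def read_records(archive: ArchiveFile) -> List[str]:
    """Decode each record on the fly, using the code page from the file header."""
    return [raw.decode(archive.header_code_page) for raw in archive.records]


# A single-code-page system: every record was written with the same code page.
archive = ArchiveFile(
    header_code_page="iso8859_1",
    records=("Müller GmbH".encode("iso8859_1"),
             "Hauptstraße 5".encode("iso8859_1")),
)

texts = read_records(archive)
print(texts)
```

The stored bytes are identical before and after the read; only the in-memory result is Unicode text.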

If your system was configured as a multi-display, multi-processing (MDMP) system, and your archive files were created in SAP R/3 4.6D or lower before the switch to Unicode, the situation is slightly more complicated because the code page that the system uses for the conversion depends on your users' logon language. In most cases, this will not affect the on-the-fly conversion, but there is a slight chance the data will not convert correctly. See the "Context-Dependent Conversions" sidebar below for an explanation of this possible logon language issue and SAP's method for overcoming it.6

Context-Dependent Conversions: How SAP Identifies the Right Code Page to Convert Archived Data into a Readable Language

On-the-fly conversions of files archived prior to the Unicode conversion allow you to view your files simply and automatically in nearly every circumstance. But if you're working on an MDMP system, it is more difficult for the system to find the right conversion rule. This is because the MDMP system uses different code pages simultaneously (in most archived databases, archive file headers each contain the code page that the system used when it archived the data). In other words, the system assigns a code page to a user according to the logon language, so the code page that the system writes to the archive file header is the code page assigned to the logon language of the administrator who started the archiving sessions. But the file will also include data entered by users working in another language, so this data cannot be read using the code page from the file header.7

Before a system is switched to Unicode, this code page disparity does not have any influence on how the archived data is displayed — the user can still log on to the MDMP system and view the archive files in the same language in which the data was written to the archive. But after the Unicode conversion, the user must convert the archived data to the valid Unicode code page at the time of reading. This is where the single code page entry causes problems. Because the code page refers only to one language, it cannot determine language-dependent code pages, so it's impossible to know which code page to convert the data to.
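A short Python sketch shows why a single code page in the file header is not enough for mixed-language data. The sample strings and the use of Python codecs `iso8859_1` and `iso8859_2` as stand-ins for two MDMP code pages are assumptions made for illustration only.

```python
# Code page of the administrator's logon language, as stored in the file header.
header_code_page = "iso8859_1"

# Two records in the same archive file, entered under different code pages.
german_record = "Müller".encode("iso8859_1")
polish_record = "Łódź".encode("iso8859_2")

# The German record matches the header code page and decodes correctly.
german_text = german_record.decode(header_code_page)

# The Polish record decodes without an error, but to the wrong characters.
polish_wrong = polish_record.decode(header_code_page)

# Only the code page that was active for the Polish user recovers the text.
polish_right = polish_record.decode("iso8859_2")

print(german_text, polish_wrong, polish_right)
```

Note that the wrong conversion fails silently: the bytes are valid in both code pages, so nothing signals that the result is incorrect.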

SAP has implemented a solution to ensure that the user trying to access the archive file can correctly convert the data, regardless of the language in which the administrator entered it. Rather than using the code page stored in the archive file for converting archived data, the system (as of SAP Web Application Server 6.20) uses the logon language of the user and attempts to identify its code page assignment before the Unicode conversion. If the system cannot determine this code page (for example, because the code page information was not updated in the history of the system during the Unicode conversion), it uses the code page stored in the archive file.

With this procedure, any archive file should be easily readable after a Unicode conversion. In the rare case in which the system history does not contain the necessary information, you can assign a code page to a language in the control table ARCH_CP (using transaction SM30). If a code page is stored in this table, the system overrides both the code page revealed by the system history and the code page stored in the archive file. Depending on the logon language of the user, the system then uses the code page from the control table to convert the data during the archive file read.8
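The order of precedence described in this sidebar can be summarized as a small lookup function. This is a sketch of the logic only: the function, its parameters, and the dictionary representation of the ARCH_CP table and the system history are hypothetical, and Python codec names stand in for SAP code pages.

```python
from typing import Mapping


def resolve_code_page(logon_language: str,
                      arch_cp: Mapping[str, str],
                      system_history: Mapping[str, str],
                      file_header_code_page: str) -> str:
    """Pick the code page for reading an archive file, in order of precedence."""
    # 1. An entry in the control table (ARCH_CP in the article) overrides everything.
    if logon_language in arch_cp:
        return arch_cp[logon_language]
    # 2. Otherwise use the code page the logon language had before the conversion.
    if logon_language in system_history:
        return system_history[logon_language]
    # 3. Fall back to the code page stored in the archive file header.
    return file_header_code_page


# Hypothetical sample data: language keys mapped to codec names.
history = {"DE": "iso8859_1", "PL": "iso8859_2"}
override = {"PL": "cp1250"}

print(resolve_code_page("PL", override, history, "iso8859_1"))
print(resolve_code_page("DE", override, history, "iso8859_1"))
print(resolve_code_page("RU", override, history, "iso8859_1"))
```

Because the lookup is keyed on the logon language, it needs a logged-on user; footnote 8 explains why this is a problem for batch reloads.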

Conclusion: Make the Most of Your Unicode Conversion

Unicode has become the default standard for character encoding to support complex systems and system landscapes in a multinational and multilingual environment, and SAP customers are making the switch, if they haven't already. By employing data archiving procedures prior to a conversion to Unicode, perhaps in combination with an upgrade to mySAP ERP 2005, you can uncover significant benefits, including reduced hardware costs and faster, more efficient systems.

SAP has taken steps to ensure that archived information is as accessible after a Unicode conversion as it was before. Archived data is converted on the fly when users attempt to access it; in the exceptional cases when that does not work, SAP has implemented several workarounds to identify the language of the archived data's code pages and translate that data into a readable language. Through SAP's various methodologies, virtually any archived data can be read after the switch to Unicode.

For more information on SAP's Unicode strategy, please visit com/unicode.

1 For more information about System Landscape Optimization, visit

2 Non-Unicode upgrades and system copies of single code-page systems will still be possible.

3 As of mySAP ERP 2005, SAP strongly discourages the use of multi-display, multi-processing (MDMP) code pages. Although MDMP allows the application server to dynamically switch between code pages according to logon language and language keys, it has limitations compared to Unicode; for instance, an individual user in an MDMP system can only use characters belonging to one code page at a time.

4 This estimate depends heavily on the hardware and settings of your system and can therefore vary greatly. The main factors are speed of hardware and degree of parallelization (i.e., utilization of the available hardware).

5 Oracle and IBM DB2 UDB, for example, use UTF-8, which typically leads to only a 10% increase.

6 Also see SAP note 449918.

7 The single code page is helpful for dealing with technical platform changes, such as changes from EBCDIC to ASCII.

8 The reloading of data archived before the Unicode conversion is not allowed under any circumstances following a Unicode conversion. SAP cannot accept any liability for any problems that occur if this type of reloading is performed after a Unicode conversion for the following reasons: If data is reloaded as a batch process, the SAP system cannot determine the code page because it does not know the logon language, and the data can be converted only by using one code page. If the reloaded archive file contains data in more than one language, all but one language will be reloaded incorrectly.

Additional Resources

SAP notes:

  • 705447 — Size of archive files

  • 449918 — Reading archived data in Unicode systems

SAP Insider articles:

  • "Data Archiving: The Fastest Access to Your Business Data?" by Dr. Bernhard Brinkmöller and Helmut Stefani (July-September 2006)

  • "Unicode: Overhead or Necessity?" by Dr. Franz-Josef Fritz (April-June 2006)

  • "Data Archiving Improves Performance — Myth or Reality?" by Dr. Bernhard Brinkmöller and Georg Fischer (October-December 2005)

  • "Consider These 3 Questions When Deciding to Upgrade to mySAP ERP" by Jason Fox (October-December 2005)

Georg Fischer has been with SAP AG since 1998, serving as product manager for the Performance, Data Management & Scalability group since 2003. After studying IT at the Darmstadt University of Technology, Germany, he worked at Digital Equipment Corporation (DEC) in document management, optical archiving, and data archiving. Before he moved to product management, he managed SAP solution implementations in Europe and the US. You can reach him at

Tanja Kaufmann joined SAP AG in 2002 and is part of the data archiving product management team. Prior to SAP, she spent several years working in Mexico City as coordinator of brokerage firm Acciones y Valores de Mexico SA de CV Casa de Bolsa's financial and economic markets magazine. She holds a master's degree in translation from the Monterey Institute of International Studies in California. You can reach her at
