Unicode: Overhead or Necessity?

by Dr. Franz-Josef Fritz | SAPinsider

April 1, 2006

by Dr. Franz-Josef Fritz, SAP AG SAPinsider - 2006 (Volume 7), April (Issue 2)
 





Unicode — the international character-encoding standard that allows your systems to handle text data from multiple languages simultaneously and consistently — has been around for some time now.1 In fact, more than 5,000 SAP customer installations are already purely Unicode-based, and the relative share of pure Unicode installations is growing rapidly.

If your company hasn't already made a full-fledged conversion to Unicode, there are very good reasons to look into Unicode as an essential standard for your IT landscape now:

  • SAP technology provides full Unicode support starting from SAP Web Application Server (SAP Web AS) 6.20. In addition, SAP R/3 Enterprise (based on SAP Web AS 6.20), as well as all current releases of SAP NetWeaver and mySAP Business Suite, can be run Unicode-enabled. As a result, running multiple code pages in a non-Unicode system — an approach that was sometimes necessary in the "pre-Unicode world" but that always had to be considered as a workaround — is now strongly discouraged from mySAP ERP 2005 onward.

  • Many companies are adopting Web services to gain benefits such as greater openness that extends processes to customers and business partners.2 Service-oriented architectures (SOAs), including SAP's Enterprise Services Architecture, rely on a set of standards that enable global interoperability across systems, programming languages, and application services. One of these required standards is Unicode.

  • Your future software choices will require Unicode — if they don't already. Much of the enterprise software out there is already completely Unicode-ready: Everything in the Java space and everything based on XML is Unicode by definition. New system installations will have to be Unicode-only in future releases, and new SAP products will only be offered in Unicode. In fact, SAP NetWeaver Portal and SAP NetWeaver Exchange Infrastructure (XI) are already Unicode-only.

  • Tools from SAP now make it faster and more straightforward — and even automated in many cases — to convert your systems to Unicode.

So what do you need to know to get started? And what are the considerations as you prepare your systems and solution environment for the inevitable switch to Unicode? If your company has been reluctant to convert to Unicode, this article is designed to dispel some of the misgivings that companies may have about the effort of conversion. You will also gain a better understanding of the costs and benefits of moving to a Unicode-only landscape, especially for those of you looking at the business and cost implications of the transition to a Unicode environment.

Refresher: What Is Unicode?

Unicode is the international character-encoding standard that allows text data from different languages to be stored in one repository. Unicode enables a single set of source code to be written to process data in virtually all languages. It signifies a step away from traditional 8-bit character sets, where the same character number can represent different characters in different alphabets, to a system that assigns one unique number to each character in each of the major languages of the world.3
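To make the contrast concrete, here is a minimal Python sketch (not taken from any SAP tooling; the function name `interpretations` is illustrative). It decodes one and the same byte value under three legacy 8-bit code pages and prints the Unicode number of each resulting character.

```python
# Illustrative only: the same byte means different characters in different
# 8-bit code pages, while each resulting character has its own Unicode number.
CODE_PAGES = ("latin-1", "iso8859-5", "iso8859-7")  # Western, Cyrillic, Greek


def interpretations(byte_value):
    """Decode a single byte under each legacy code page."""
    raw = bytes([byte_value])
    return {code_page: raw.decode(code_page) for code_page in CODE_PAGES}


for code_page, char in interpretations(0xE4).items():
    print(f"{code_page:10} 0xE4 -> {char} (U+{ord(char):04X})")
```

Under these three code pages the byte 0xE4 stands for ä, ф, and δ respectively. In Unicode the three characters have the distinct numbers U+00E4, U+0444, and U+03B4, so no context is needed to tell them apart.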

As a result, in a Unicode system, users can enter and display any character from any script, no matter which logon language they use, and can print text data in multiple languages. Unicode also simplifies the addition of new language support to an e-business application, since character processing and storage remain unchanged.

Unicode is a prerequisite for all new technologies and any new and future code pages and characters. Here are some prominent examples:

  • Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, and WML

  • New characters like the Euro sign (€) are missing from many older code pages, such as Latin 1 (ISO 8859-1), but are fully covered by Unicode

  • The Unicode standard has been widely adopted by industry leaders including Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, and many others4

SAP Support for Unicode Conversion

The good news is that SAP provides a range of tools to convert existing non-Unicode systems to Unicode. These Unicode conversion tools have matured considerably, so that no matter what approach you are currently using to manage your multi-language system support (see sidebar below), the conversion process is now much simpler:

  • If you're using single code page systems, the conversion to Unicode is straightforward and, in fact, mostly automated. The same applies to blended code page systems.

  • If you're converting from multi-code page systems (MDMP) to Unicode, you can combine the conversion with an upgrade from R/3 4.6C to mySAP ERP 2005. This is especially important, since use of MDMP in mySAP ERP 2005 and beyond is strongly discouraged.

Note!
When evaluating an upgrade, planning for the Unicode conversion is crucial — and doing it in one step saves you effort and downtime.

Global Business Before Unicode:
Longstanding SAP Support for Multiple Character Sets

SAP has a long tradition of supporting global companies, systems, and applications in multiple languages and code pages — something that started long before Unicode became the gold standard for multilingual technology. If you are evaluating the implications of a conversion effort within your own company, it's important to understand what approach your enterprise is already using to handle multiple character sets and code pages:

Single Code Page System
In a single code page system, all application servers and the database use one standard system code page. This may be a 1-byte code page like Latin 1 (for Western Europe) or Latin 2 (for Eastern Europe), or a multi-byte Japanese code page. If your system landscape goes beyond one of these regions, however, this single code page system will no longer be sufficient.

Blended Code Page System (R/3 3.0D – R/3 4.6D)
From R/3 3.0D on, SAP application servers could run multi-byte blended code pages, which contain characters from several standard code pages. Blended code pages are not standard code pages, but SAP-customized pages devised to support an increased number of possible language combinations in a single code page. But such an approach covers only a fixed set of language combinations and does not allow any flexibility regarding additional code pages.

MDMP System Configuration (R/3 3.1I – SAP NetWeaver 2004)
The most recent pre-Unicode innovation from SAP is Multiple Display/Multiple Processing (MDMP). MDMP systems deploy more than one system code page on the application server. This allows multiple languages to be used together in one system, even though the characters of those languages are not covered by the same code page. This kind of setup has a number of challenges and risks, however, because the relevant code page always needs to be determined from the context, such as the logon language. The database layer is not aware of the different code pages at all. If one user now logs on in the "wrong" language or tries to work with data in a different code page from the one that relates to his logon language, the data will be interpreted incorrectly, the display of the data will not work, and — worse — data entry will lead to corrupted data in the database, something SAP cannot prevent in such a setting. Strict organizational measures must be in place to deal with these risks.
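The MDMP risk described above can be reproduced in a few lines of Python. This is a hedged sketch, not SAP code: the helper names `write_as` and `read_as` are hypothetical, and the standard codecs ISO 8859-2 and Latin-1 merely stand in for two system code pages. The point is that the stored bytes carry no record of the code page that produced them.

```python
# Illustrative only: ISO 8859-2 and Latin-1 stand in for two MDMP code pages.
def write_as(text, logon_code_page):
    """Encode text the way a session using this code page would store it."""
    return text.encode(logon_code_page)


def read_as(stored_bytes, logon_code_page):
    """Decode stored bytes the way a session using this code page would."""
    return stored_bytes.decode(logon_code_page)


# A session working in an Eastern European code page saves a Polish city name.
stored = write_as("Łódź", "iso8859-2")

# The bytes themselves do not say which code page they belong to.
print(stored)

# A session with the matching code page reads the name back correctly ...
print(read_as(stored, "iso8859-2"))

# ... while a session with a Western European code page sees different characters.
print(read_as(stored, "latin-1"))
```

In this sketch the second session displays the name as £ód¼. If that user then saves the record, the wrong characters are written back, which is the kind of silent corruption that organizational measures are meant to prevent.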

4 More Arguments for Unicode Conversion

As you have seen already, there are a number of very good reasons to convert to Unicode — but there is more to it than just complying with software and technology requirements. For those companies that are struggling to justify the cost of additional hardware resources and the one-time effort of conversion (see the section "The Big Questions When Planning for Unicode" for details on these costs), here are some compelling business reasons to make the switch.

Reason 1: Running Global Systems

Doing business globally doesn't just affect the business systems being used by your employees. Imagine that your business offers a Web service that allows your customers to enter their own contact data. To open your systems to the Web in this way, your global master data system must be able to contain multiple local language characters.

Or what if you want to enable collaborative business? Third-party products may be running on any possible code page. Wherever your business is running processes that are truly global — including HR systems, global master data management systems, or any customer database where you have to manage address data in multiple countries — standardizing on Unicode offers a single, standard, and flexible solution for the language challenges of global business information.

Reason 2: Standardizing IT Infrastructure

Unicode defines the character set for efficient text processing in any language and for maintaining text data integrity across the system (see sidebar above). Unicode systems integrate more easily in existing system landscapes (SAP and non-SAP systems), since they do not require any restrictions regarding supported languages and code pages. They also provide all language keys for ISO 639-2 (the relevant international standard for the representation of language names) and 86 additional country-specific language keys, for a total of 560 technically supported language keys!

This means streamlined support for users, wherever they are located and whatever language they speak. In a Unicode system, users can:

  • Enter and display any character from any script, no matter which logon language they use

  • Print text data in multiple languages

The result of such standardization is ultimately less maintenance for your IT team and lower risk to your systems, since it eliminates code-page conversions where data crosses system borders.

Reason 3: Running New Technologies

Using Internet standards also means bringing Unicode to the table. The Java language (and all technology that is built on it) requires Unicode, as do any Web services based on XML. To use either of these technologies, Unicode is a must. And for SAP customers, these technologies are crucial: To take advantage of the open integration of mySAP Business Suite and the SAP NetWeaver platform, you will be relying on Java and XML, and therefore on Unicode, as well.

Reason 4: Minimizing Risk

While it is technically possible to connect Unicode systems and non-Unicode systems, this presents many restrictions, challenges, and — perhaps most importantly — risks of losing information in the process.

This is also true for the connectivity inside an SAP system where a Java application (running in Unicode) delivers data to an ABAP application (if it still runs in a non-Unicode code page). Without appropriate governance, there is always the risk of losing or corrupting data in such a situation. And for those using SAP NetWeaver Portal, Unicode-based data from the portal generally cannot be fully reflected to non-Unicode backend systems.

Running multiple code pages in one system without Unicode presents increasing technical challenges and risks. Therefore mySAP ERP 2005 customers are strongly discouraged from using MDMP.5
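The information loss described in this section can be illustrated with a short Python sketch. It is an analogy under stated assumptions, not a description of how SAP interfaces behave: the function `to_legacy` is hypothetical, Latin-1 stands in for the code page of a non-Unicode backend, and substituting a question mark is just one possible way a converter might handle characters it cannot represent.

```python
# Illustrative only: Latin-1 stands in for a non-Unicode backend code page.
def to_legacy(text, code_page):
    """Convert Unicode text to a legacy code page, replacing what does not fit."""
    return text.encode(code_page, errors="replace")


unicode_name = "Müller 東京"           # German and Japanese in one field
legacy_bytes = to_legacy(unicode_name, "latin-1")
round_trip = legacy_bytes.decode("latin-1")

print(round_trip)                      # the Japanese characters are gone
print(round_trip == unicode_name)      # the original can no longer be recovered
```

Characters that exist in the target code page survive the trip, while everything else is replaced and cannot be restored afterward.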

The Big Questions When Planning for Unicode

Here are some common questions about the costs — in terms of additional hardware and effort — that go into a Unicode conversion.

What Is the Overhead of Running Unicode?

It is true: compared to a 1-byte single code page system, Unicode systems do need more main memory and disk space. But it's important to understand that you're balancing some short-term hardware costs (which, as the technology advances, might be less than you'd think) against reduced complexity and reliable support for global business across your enterprise. As you evaluate technology overhead, consider that:

  1. CPU overhead is less than 30% on average compared to single-byte systems. If the current system runs MDMP or double-byte code pages, the relative overhead is much less. Also, newer CPU models are more optimized toward handling Unicode, so the difference will shrink further.

  2. Main memory consumption can increase by up to 50% compared to single-byte systems because inside application servers all characters go from a 1-byte representation to a 2-byte representation. Be sure to factor in, though, that the memory required for numerals (which remain at 1-byte representation) will not change. In addition, all the new state-of-the-art servers come with 64-bit addressing and ample memory, so this is becoming less of a concern.

  3. Network load is only minimally affected. Network protocols and XML representations normally use a UTF-8 representation, which only needs 1 byte for the most frequently used characters.

  4. The database size increases 10% to 30%, depending on which code representation is used inside the database. Oracle and IBM DB2 UDB use UTF-8, which typically leads to only a 10% increase. Since a Unicode conversion implies a system copy operation with a re-build of the database tables, most customers actually experience an initial reduction in database size.
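The size effects in the list above follow from how the Unicode encodings work, which a small Python sketch can show. Two assumptions are made here: the sample strings are invented, and the 2-byte representation used inside application servers is modeled with UTF-16. The function name `encoded_sizes` is illustrative.

```python
# Illustrative only: sample strings are invented; UTF-16 models the
# 2-byte in-memory representation mentioned in the text.
def encoded_sizes(text):
    """Return the character count and the byte counts in UTF-8 and UTF-16."""
    return {
        "chars": len(text),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # no byte-order mark
    }


samples = {
    "English": "Customer",
    "German": "Straße",
    "Russian": "Москва",
    "Japanese": "東京都",
}

for language, text in samples.items():
    print(language, encoded_sizes(text))
```

Plain ASCII text such as "Customer" stays at 1 byte per character in UTF-8 but doubles in UTF-16, which is why network and UTF-8 database overhead is small for largely Western data while main memory grows more noticeably.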

What Is the Effort and Cost of Conversion to Unicode?

You can break down the process of converting to Unicode into three basic steps (see Figure 1):

1. Make your custom applications Unicode-ready

2. Run the actual conversion of the database

3. Conduct some post-conversion testing and verification

In the following sections, we will concentrate on the first two steps, the steps that are unique to the Unicode-conversion process.

Figure 1

Planning Your Unicode Conversion Process

How Do We Unicode-Enable Our Custom Applications?

To prepare your custom applications for the transition to Unicode, you have to run check tools and mark your applications accordingly as either Unicode-compliant or not yet compliant. In the end, you must ensure that all your ABAP coding is Unicode-compliant, especially since some special programming techniques may have created dependencies on the specific code page you used.6

SAP Web Application Server (from 6.10 onward) comes with a check tool called UCCHECK for precisely this purpose (see Figure 2). SAP has used UCCHECK on its own code while preparing all of its applications for Unicode-readiness, and the tool is designed for the Unicode conversion of custom applications as well. With this check tool, you can:

  1. Remove errors in existing ABAP code that will hinder Unicode conversion

  2. Inspect places in your custom code that cannot be checked automatically for Unicode compliance, such as untyped field symbols, calculations with field lengths that are byte-oriented rather than character-oriented, and generic access to database tables.

Figure 2

Checking and Removing Unicode Errors in Custom Applications with UCCHECK


For more on how to use UCCHECK, please see the corresponding SAP documentation at help.sap.com.
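Why byte-oriented length calculations break in a Unicode system can be shown with a language-neutral analogy. The sketch below is Python rather than ABAP, it does not reproduce anything UCCHECK does, and both function names are hypothetical. It contrasts cutting a field after a number of bytes with cutting it after a number of characters.

```python
# Illustrative analogy in Python, not ABAP: byte offsets versus character offsets.
def byte_oriented_prefix(text, n, encoding="utf-8"):
    """Take the first n bytes of the encoded text (the pattern to avoid)."""
    return text.encode(encoding)[:n].decode(encoding, errors="replace")


def char_oriented_prefix(text, n):
    """Take the first n characters, independent of the encoding."""
    return text[:n]


name = "Müller"
print(len(name), len(name.encode("utf-8")))   # character count vs. byte count
print(byte_oriented_prefix(name, 2))          # cuts through the middle of "ü"
print(char_oriented_prefix(name, 2))          # keeps whole characters
```

Code that assumes one byte per character works in a single-byte code page and then silently cuts characters apart once the system runs in Unicode, which is why such places need manual inspection.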

How Long Does It Take to Convert a Database to Unicode?

The time needed for conversion of a database to Unicode depends on a number of factors:

  • Is the source system a single code page or an MDMP system? MDMP requires more pre-conversion tasks and post-conversion handling in the Unicode system.

  • What are your biggest tables? Once you have identified these, you can optimize the process by setting up parallel export/import processes for those tables.

  • How much time is needed for processing cluster tables? This depends on the sizes of cluster tables (compared to transparent tables).

  • What hardware is used for the conversion? For example, what is the number and speed of CPUs? What is the performance of disks? And are there separate servers available for the target Unicode systems?

Note that the more you invest in optimization upfront, the shorter the downtime for the conversion can be.
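As a rough illustration of why identifying the biggest tables matters, the following Python sketch distributes tables across parallel export/import processes. Everything in it is an assumption made for illustration: the table names and sizes are invented, the function `plan_parallel_export` is hypothetical, and the simple "largest table first, to the least-loaded process" rule is a generic scheduling heuristic, not a description of how SAP's conversion tools work.

```python
import heapq


# Illustrative only: table names, sizes, and the heuristic are assumptions.
def plan_parallel_export(table_sizes_gb, processes):
    """Assign tables, largest first, to whichever process has the least work."""
    loads = [(0.0, index) for index in range(processes)]
    heapq.heapify(loads)
    plan = {index: [] for index in range(processes)}
    by_size = sorted(table_sizes_gb.items(), key=lambda item: item[1], reverse=True)
    for table, size in by_size:
        load, index = heapq.heappop(loads)      # least-loaded process so far
        plan[index].append(table)
        heapq.heappush(loads, (load + size, index))
    return plan


sizes = {"TABLE_A": 120, "TABLE_B": 80, "TABLE_C": 60, "TABLE_D": 40, "TABLE_E": 20}
for process, tables in plan_parallel_export(sizes, processes=2).items():
    print(process, tables, sum(sizes[t] for t in tables), "GB")
```

With these invented figures both processes end up with 160 GB of work, whereas handling the tables one after another would take as long as all 320 GB combined. Real run times also depend on the factors listed above, such as cluster tables and disk performance.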

SAP's Current and Future Plans for Unicode-Based Offerings

SAP has already taken some key steps to ensure that its software and its customers are Unicode-ready:

  • New products (like SAP NetWeaver XI) have been Unicode-only from the beginning

  • On new hardware platforms, only Unicode systems are supported

  • Unicode is the recommended option for all new installations (starting with SAP NetWeaver 2004)

  • The Unicode recommendation is shown explicitly in the installation procedure (starting from SAP NetWeaver 2004s)

  • An upgrade to mySAP ERP 2005 can be combined with a conversion to Unicode, minimizing downtime

Clearly, Unicode is increasingly the default option of running SAP systems — a fact that reflects both industry trends and the need for open integration. There is also more focus on Unicode to come; in future SAP releases, new system installations will have to be Unicode-only. Look for formal announcements in this direction in the near future. Eventually, you can expect that Unicode will become the default target for any upgrade to a new release.

If your company has not yet made the switch, you can learn a great deal from the many customers who have already moved to Unicode or who are well on their way. For example, there is a very lively Unicode working group inside the Americas' SAP Users' Group (ASUG). In each of the recent ASUG conferences, customers have reported on their Unicode conversion projects and shared experience and best practices. Cooperation among worldwide user groups around the issue of global business — for example, between ASUG and DSAG (the organization of German-speaking SAP users) — has also brought the Unicode issue to the forefront and means that there will be ongoing Unicode support from a variety of sources, including SAP itself.7

For more information, see the "Resources" sidebar. For specifics on Unicode or other resources for global business, please send any questions to globalization@sap.com.

Resources


1 See SAP Insider articles from as far back as 2001 touting SAP's move to Unicode, listed in the "Resources" sidebar at the end of this article.

2 For more on Web services and SAP's Enterprise Services Architecture, please see articles such as "Getting Started with Enterprise Services Architecture: Beyond Integrating Systems to Enabling Growth and Innovation" in the October-December 2005 issue of SAP Insider (www.SAPinsider.com).

3 For a complete primer on how Unicode works, see Michael Redford's article "Looking Forward to the Unicode Advantage: Internationalization and Integration" in the January/February 2002 issue of SAP Professional Journal (www.sappro.com).

4 Visit www.unicode.org for more information on the Unicode standard.

5 For a detailed roadmap and a disclaimer on converting from MDMP to mySAP ERP 2005 Unicode, visit service.sap.com/unicode.

6 It's a good idea to remove these dependencies anyway, simply as a matter of good application development housekeeping.

7 See the DSAG Directions column "New DSAG Working Group Confronts the Challenges of Global SAP Implementations" in this issue of SAP Insider (www.SAPinsider.com).


Franz J. Fritz has a Ph.D. in mathematics and 30 years of experience in all areas of IT. Workflow and business process management have been particular areas of interest for much of his life. He has worked at SAP since 1993 as Program Director and Vice President with responsibility for the Business Process Technology and Internet-Business Framework departments. Since 2003, he has been responsible for several areas within SAP NetWeaver Product Management.
