Unicode — the international
character-encoding standard that allows your
systems to handle text data from multiple languages
simultaneously and consistently — has
been around for some time now.1 In fact, more
than 5,000 SAP customer installations are already
purely Unicode-based, and the relative share
of pure Unicode installations is growing rapidly.
If your company hasn't already made a full-fledged
conversion to Unicode, there are very good reasons
to look into Unicode as an essential standard for your
IT landscape now:
- SAP technology provides full Unicode
support starting from SAP Web Application
Server (SAP Web AS) 6.20. In addition, SAP
R/3 Enterprise (based on SAP Web AS 6.20),
as well as all current releases of SAP NetWeaver
and mySAP Business Suite, can be run Unicode-enabled.
As a result, running multiple code pages in a
non-Unicode system — an approach that
was sometimes necessary in the "pre-Unicode
world" but that always had to be considered
as a workaround — is now strongly discouraged
from mySAP ERP 2005 onward.
- Many companies are adopting Web services
to gain benefits such as greater openness
that extends processes to customers and business
partners.2 Service-oriented architectures (SOAs), including SAP's
Enterprise Services Architecture, rely on
a set of standards that enable global interoperability
across systems, programming languages, and
application services. One of these required
standards is Unicode.
- Your future software choices will require
Unicode — if
they don't already. Much of the enterprise
software out there is already completely Unicode-ready:
Everything in the Java space and everything based
on XML is Unicode by definition. New system installations
will have to be Unicode-only in future releases,
and new SAP products will only be offered in Unicode.
In fact, SAP NetWeaver Portal and SAP NetWeaver
Exchange Infrastructure (XI) are already Unicode-only.
- Tools from SAP now make it faster and
more straightforward — and
even automated in many cases — to convert
your systems to Unicode.
So what do you need to know to get started? And what
are the considerations as you prepare your systems
and solution environment for the inevitable switch
to Unicode? If your company has been reluctant to convert
to Unicode, this article is designed to dispel some
of the misgivings that companies may have about the
effort of conversion. You will also gain a better
understanding of the costs and benefits of moving
to a Unicode-only landscape, especially for those
of you looking at the business and cost implications
of the transition to a Unicode environment.
What Is Unicode?
Unicode is the international character-encoding
standard that allows text data from
different languages to be stored in
one repository. Unicode enables a single
set of source code to be written to
process data in virtually all languages.
It signifies a step away from traditional
8-bit characters, where the same character
number can represent different characters
in different alphabets, to a system
that assigns one unique number to
every character in each of the major
languages of the world.3
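The ambiguity of 8-bit character numbers can be seen in a minimal Python sketch. It decodes one byte value under three standard ISO 8859 code pages; the choice of byte (0xE4) and of code pages is only an illustrative assumption, not something taken from an SAP system.

```python
# The same byte value, interpreted under three different 8-bit code pages.
raw = bytes([0xE4])

as_latin1 = raw.decode("latin-1")      # Western European (ISO 8859-1)
as_cyrillic = raw.decode("iso8859_5")  # Cyrillic (ISO 8859-5)
as_greek = raw.decode("iso8859_7")     # Greek (ISO 8859-7)

for label, ch in [("Latin 1", as_latin1), ("Cyrillic", as_cyrillic), ("Greek", as_greek)]:
    # ord() returns the Unicode code point, which is unique per character.
    print(f"{label:8s} byte 0xE4 -> {ch!r} (U+{ord(ch):04X})")
```

One byte yields three different characters, while each of those characters has its own distinct Unicode number.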
As a result, in a Unicode system,
users can enter and display any character
from any script, no matter which logon
language they use, and can print text
data in multiple languages. Unicode
also simplifies the addition of new
language support to an e-business application,
since character processing and storage
Unicode is a prerequisite for all
new technologies and any new and future
code pages and characters. Here are
some prominent examples:
- Unicode is required by modern
standards such as XML, Java, ECMAScript
- New characters like the Euro sign
(€) are only represented in
- The Unicode standard has been widely
adopted by industry leaders including
Apple, HP, IBM, JustSystem, Microsoft,
Oracle, SAP, Sun, Sybase, Unisys,
and many others4
SAP Support for Unicode Conversion
The good news is that SAP provides a range of tools to
convert existing non-Unicode systems to Unicode. These
Unicode conversion tools have matured recently, so that
no matter what approach you are currently using
to manage your multi-language system support (see sidebar
below), the conversion process has been simplified:
- If you're using single code page systems,
the conversion to Unicode is straightforward and,
in fact, mostly automated. The same applies to
blended code page systems.
- If you're converting from multi-code page
systems (MDMP) to Unicode, you can combine
the conversion with an upgrade from R/3 4.6C to
mySAP ERP 2005. This is especially important, since
use of MDMP in mySAP ERP 2005 and beyond is strongly
discouraged.
When evaluating an upgrade, planning
for the Unicode conversion is crucial — and
doing it in one step saves you effort and downtime.
Global Business Before Unicode:
Longstanding SAP Support for Multiple Character Sets
SAP has a long tradition of supporting
global companies, systems, and applications
in multiple languages and code pages — something
that started long before Unicode became the
gold standard for multilingual technology.
If you are evaluating the implications of
a conversion effort within your own company,
it's important to understand what approach
your enterprise is already using to handle
multiple character sets and code pages:
Single Code Page System
In a single code page system, all application
servers and the database use one standard
system code page. This may be a 1-byte code
page like Latin 1 (for Western Europe) or
Latin 2 (for Eastern Europe), or a multi-byte
Japanese code page. If your system landscape
goes beyond one of these regions, however,
this single code page system will no longer be sufficient.
Blended Code Page System (R/3
3.0D – R/3 4.6D)
From R/3 3.0D on, SAP application servers could
run multi-byte blended code pages, which contain
characters from several standard code pages.
Blended code pages are not standard code pages,
but SAP-customized pages devised to support
an increased number of possible language combinations
in a single code page. But such an approach
covers only a fixed set of language combinations
and does not allow any flexibility regarding
additional code pages.
MDMP System Configuration (R/3
3.1I – SAP NetWeaver 2004)
The most recent pre-Unicode innovation from
SAP is Multiple Display/Multiple Processing
(MDMP). MDMP systems deploy more than one system
code page on the application server. This allows
multiple languages to be used together in one
system, even though the characters of those
languages are not covered by the same code
page. This kind of setup has a number of challenges
and risks, however, because the relevant code
page always needs to be determined from the
context, such as the logon language. The database
layer is not aware of the different code pages
at all. If one user now logs on in the "wrong" language
or tries to work with data in a different code
page from the one that relates to his logon
language, the data will be interpreted incorrectly,
the display of the data will not work, and — worse — data
entry will lead to corrupted data in the database,
something SAP cannot prevent in such a setting.
Strict organizational measures must be in place
to deal with these risks.
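The following toy Python sketch illustrates the kind of misinterpretation described above. It is not SAP code: the mapping from logon language to code page and the `store`/`load` helpers are hypothetical stand-ins for a database layer that keeps raw bytes without recording which code page produced them.

```python
# Hypothetical mapping of logon language to code page (illustration only).
CODE_PAGE_BY_LOGON_LANGUAGE = {"DE": "latin-1", "RU": "iso8859_5"}

def store(text: str, logon_language: str) -> bytes:
    """Encode text with the code page implied by the writer's logon language.
    The stored bytes carry no record of which code page was used."""
    return text.encode(CODE_PAGE_BY_LOGON_LANGUAGE[logon_language])

def load(raw: bytes, logon_language: str) -> str:
    """Decode stored bytes with the code page implied by the reader's logon language."""
    return raw.decode(CODE_PAGE_BY_LOGON_LANGUAGE[logon_language])

stored = store("Müller", "DE")
right = load(stored, "DE")  # same code page: the text is intact
wrong = load(stored, "RU")  # different code page: same bytes, different characters

print(right, "|", wrong)
```

Nothing in the stored bytes tells the reader that the second interpretation is wrong, which is why such a setup has to rely on organizational measures.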
4 More Arguments for Unicode Conversion
As you have seen already, there are a number
of very good reasons to convert to Unicode — but
there is more to it than just complying with
software and technology requirements. For
those companies that are struggling to justify
the cost of additional hardware resources
and the one-time effort of conversion (see
the section "The Big Questions When
Planning for Unicode" for details on
these costs), here are some compelling business
reasons to make the switch.
Reason 1: Running Global Systems
Doing business globally doesn't just
affect the business systems being used by your
employees. Imagine that your business offers
a Web service that allows your customers to
enter their own contact data. To open your
systems to the Web in this way, your global
master data system must be able to contain
multiple local language characters.
Or what if you want to enable collaborative business?
Third-party products may be running on any possible
code page. Wherever your business is running processes
that are truly global — including HR systems,
global master data management systems, or any customer
database where you have to manage address data
in multiple countries — standardizing
on Unicode offers a single, standard, and flexible
solution for the language challenges of global business.
Reason 2: Standardizing IT Infrastructure
Unicode defines the character set for efficient text
processing in any language and for maintaining text
data integrity across the system (see sidebar above).
Unicode systems integrate more easily in existing
system landscapes (SAP and non-SAP systems), since
they do not require any restrictions regarding
supported languages and code pages. They also provide
all language keys for ISO 639-2 (the relevant international
standard for the representation of language names)
and 86 additional country-specific language keys,
for a total of 560 technically supported language keys.
This means streamlined support for users, wherever
they are located and whatever language they speak.
In a Unicode system, users can:
- Enter and display any character from
any script, no matter which logon language they use
- Print text data in multiple languages
The result of such standardization is ultimately
less maintenance for your IT team and lower risk
to your systems, since it eliminates code-page
conversions where data crosses system borders.
Reason 3: Running New Technologies
Using Internet standards also means bringing Unicode
to the table. The Java language (and all technology
that is built on it) requires Unicode, as do any
Web services based on XML. To use either of these
technologies, Unicode is a must. And for SAP customers,
these technologies are crucial: To take advantage
of the open integration of mySAP Business Suite and
the SAP NetWeaver platform, you will be relying on
Java and XML, and therefore on Unicode, as well.
Reason 4: Minimizing Risk
While it is technically possible to connect Unicode
systems and non-Unicode systems, this presents many
restrictions, challenges, and, perhaps most
importantly, risks of losing information.
This is also true for the connectivity inside an
SAP system where a Java application (running in Unicode)
delivers data to an ABAP application (if it still
runs in a non-Unicode code page). Without appropriate
governance, there is always the risk of losing or
corrupting data in such a situation. And for those
using SAP NetWeaver Portal, Unicode-based data
from the portal generally cannot be fully reflected
to non-Unicode backend systems.
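A short Python sketch can make the information loss concrete. It assumes, purely for illustration, a non-Unicode receiver that uses the Latin 1 code page; the sample text is invented.

```python
unicode_text = "Müller, Łódź, 東京"

# errors="replace" substitutes "?" for every character the target code page lacks.
as_latin1 = unicode_text.encode("latin-1", errors="replace")
round_tripped = as_latin1.decode("latin-1")

# Characters that have no representation in Latin 1 at all.
lost = [ch for ch in unicode_text if ch.encode("latin-1", errors="ignore") == b""]

print(round_tripped)
print("lost:", lost)
```

Once the replacement has happened, the original characters cannot be recovered from the receiving system's data.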
Running multiple code pages in one system without
Unicode presents increasing technical challenges
and risks. Therefore mySAP ERP 2005 customers are
strongly discouraged from using MDMP.5
The Big Questions When Planning for Unicode
Here are some common questions
about the costs, in terms of additional hardware
and effort, that go into a Unicode conversion.
What Is the Overhead of Running Unicode?
It is true: compared to a 1-byte single code page
system, Unicode systems do need more main memory
and disk space. But it's
important to understand that you're balancing
some short-term hardware costs (which, as the technology
advances, might be less than you'd think) against
reduced complexity and reliable support for global
business across your enterprise. As you evaluate
technology overhead, consider that:
- CPU overhead is less than 30% on average
compared to single-byte systems. If the current
system runs MDMP or double-byte code pages, the
relative overhead is much less. Also, newer CPU
models are more optimized toward handling Unicode,
so the difference will shrink further.
- Main memory
consumption can increase by up to 50% compared
to single-byte systems because inside application
servers all characters go from a 1-byte representation
to 2 bytes. Be sure to factor in, though, that
the memory required for numerals (which remain
at 1-byte representation) will not change.
In addition, all the new state-of-the-art servers
come with 64-bit addressing and ample memory,
so this is becoming less of a concern.
- Network load is only minimally affected. Network protocols
and XML representations normally use a UTF-8
representation, which only needs 1 byte for
the most frequently used characters.
- The database
size increases 10% to 30%, depending on which
code representation is used inside the database.
Oracle and IBM DB2 UDB use UTF-8, which typically
leads to only a 10% increase. Since a Unicode
conversion implies a system copy operation with
a re-build of the database tables, most customers
actually experience an initial reduction in database size.
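To get a feel for why the overhead depends on both the data and the representation, the Python sketch below compares the byte length of a few invented sample strings in UTF-8 and in a 2-byte UTF-16 form. It looks only at raw string encodings and does not model SAP's internal memory layout or reproduce the percentages quoted above.

```python
samples = {
    "ASCII text": "Purchase order 4711",
    "German name": "Müller",
    "Japanese": "東京都",
}

def encoded_sizes(text: str) -> dict:
    """Return the character count and the byte length in UTF-8 and UTF-16 (no byte-order mark)."""
    return {
        "chars": len(text),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

for label, text in samples.items():
    print(f"{label:12s} {encoded_sizes(text)}")
```

Plain ASCII text stays at 1 byte per character in UTF-8 and doubles in UTF-16, whereas the Japanese sample is smaller in UTF-16 than in UTF-8.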
What Is the Effort and Cost of Conversion to Unicode?
You can break down the process of converting to Unicode
into three basic steps (see Figure 1):
1. Make your custom applications Unicode-ready
2. Run the actual conversion of the database
3. Conduct some post-conversion testing and verification
In the following sections, we will concentrate on
the first two steps, the steps that are unique to
the Unicode-conversion process.
Figure 1: Your Unicode Conversion Process
How Do We Unicode-Enable Our Custom Applications?
To prepare your custom applications for the transition
to Unicode, you have to run check tools and mark
your applications accordingly as either Unicode-compliant or not
yet compliant. In the end, you must ensure
that all your ABAP coding is Unicode-compliant,
especially since some special programming techniques
may have created dependencies on the specific code
page you used.6
SAP Web Application Server (from 6.10 onward) comes
with a check tool called UCCHECK for precisely this
purpose (see Figure 2). UCCHECK has been used in
the past for SAP code during the preparation of
all applications for Unicode-readiness, and is
designed specifically for the Unicode conversion
of custom applications. With this check tool, you can:
- Remove errors in existing ABAP code that
will hinder Unicode conversion
- Inspect places
in your custom code that cannot be checked automatically
for Unicode compliance, such as untyped field
symbols, calculations with field lengths that
are byte-oriented rather than character-oriented,
and generic access to database tables.
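The difference between byte-oriented and character-oriented length handling can be sketched outside ABAP. The Python example below is only an analogy: it uses UTF-8 as the multi-byte encoding, and the two helper functions are hypothetical names introduced for illustration.

```python
def truncate_by_chars(text: str, max_chars: int) -> str:
    """Character-oriented truncation: the result is always valid text."""
    return text[:max_chars]

def truncate_by_bytes(text: str, max_bytes: int, encoding: str = "utf-8") -> bytes:
    """Byte-oriented truncation: may cut a multi-byte character in half."""
    return text.encode(encoding)[:max_bytes]

name = "Müller"
by_chars = truncate_by_chars(name, 2)  # keeps two whole characters
by_bytes = truncate_by_bytes(name, 2)  # keeps "M" plus only the first byte of "ü"

try:
    by_bytes.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False

print(by_chars, by_bytes, "valid:", valid)
```

Code that assumed one byte per character keeps working on single-byte data and fails only once multi-byte characters appear, which is why such places need manual inspection.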
and Removing Unicode Errors in Custom Applications
For more on how to use UCCHECK, please see the corresponding
SAP documentation at help.sap.com.
How Long Does It Take to Convert a Database to Unicode?
The time needed for conversion of a database to Unicode
depends on a number of factors:
- Is the source system a single code page
or an MDMP system? MDMP requires more pre-conversion
tasks and post-conversion handling in the Unicode system.
- What are your biggest tables? Once you
have identified these, you can optimize the
process by setting up parallel export/import
processes for those tables.
- How much time is
needed for processing cluster tables? This
depends on the sizes of cluster tables (compared
to transparent tables).
- What is the employed
hardware for the conversion? For example, what
is the number and speed of CPUs? What is the
performance of disks? And are there separate
servers available for the target Unicode systems?
Note that the more you invest in optimization upfront,
the shorter the downtime for the conversion can be.
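One way to reason about parallel export/import of the biggest tables is as a load-balancing problem. The Python sketch below uses a simple greedy "largest table first" assignment; it does not describe how SAP's conversion tools distribute work, and the table names, sizes, and the `assign_tables` helper are all hypothetical.

```python
import heapq

def assign_tables(table_sizes_gb: dict, workers: int) -> list:
    """Greedy largest-first assignment of tables to parallel export/import workers.
    Returns one (total_gb, [table names]) tuple per worker, ordered by worker id."""
    heap = [(0.0, worker_id, []) for worker_id in range(workers)]
    heapq.heapify(heap)
    by_size = sorted(table_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        # Give the next-largest table to the worker with the smallest load so far.
        load, worker_id, tables = heapq.heappop(heap)
        tables.append(name)
        heapq.heappush(heap, (load + size, worker_id, tables))
    return [(load, tables) for load, _, tables in sorted(heap, key=lambda w: w[1])]

# Hypothetical table names and sizes in GB.
sizes = {"TABLE_A": 120.0, "TABLE_B": 80.0, "TABLE_C": 60.0, "TABLE_D": 40.0, "TABLE_E": 20.0}
plan = assign_tables(sizes, workers=2)

for worker_id, (load, tables) in enumerate(plan):
    print(f"worker {worker_id}: {load} GB -> {tables}")
```

The most heavily loaded worker determines the overall runtime, so identifying the largest tables early matters more than fine-tuning the small ones.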
SAP's Current and Future Plans for Unicode-Based Offerings
SAP has already taken some key steps to ensure that
its software and its customers are Unicode-ready:
- New products (like SAP NetWeaver XI) have
been Unicode-only from the beginning
- New hardware platforms support Unicode only
- Unicode is the recommended option for all
new installations (starting with SAP NetWeaver
- The Unicode recommendation is shown explicitly
in the installation procedure (starting
from SAP NetWeaver 2004s)
- An upgrade to mySAP ERP 2005 can be combined
with a conversion to Unicode, minimizing effort and downtime
Clearly, Unicode is increasingly the default option
of running SAP systems — a fact that reflects
both industry trends and the need for open integration.
There is also more focus on Unicode to come; in future
SAP releases, new system installations will have
to be Unicode-only. Look for formal announcements
in this direction in the near future. Eventually,
you can expect that Unicode will become the default
target for any upgrade to a new release.
If your company has not yet made the switch, you
can learn a great deal from the many customers who
have already moved to Unicode or who are well on
their way. For example, there is a very lively Unicode
working group inside the Americas' SAP
Users' Group (ASUG). In each of the recent
ASUG conferences, customers have reported on their
Unicode conversion projects and shared experience
and best practices. Cooperation among worldwide user
groups around the issue of global business — for
example, between ASUG and DSAG (the organization
of German-speaking SAP users) — has also brought
the Unicode issue to the forefront and means that
there will be ongoing Unicode support from a variety
of sources, including SAP itself.7
For more information, see the "Resources" sidebar.
For specifics on Unicode or other resources for global
business, please send any questions to email@example.com.
1 See SAP
Insider articles from as far back as 2001
touting SAP's move to Unicode, listed in the "Resources" sidebar
at the end of this article.
2 For more on Web services and
Enterprise Services Architecture, please see
articles such as "Getting Started with
Enterprise Services Architecture: Beyond Integrating
Systems to Enabling Growth and Innovation" in
the October-December 2005 issue of SAP Insider (www.SAPinsider.com).
3 For a complete primer on
how Unicode works, see Michael Redford's article "Looking
Forward to the Unicode Advantage: Internationalization
and Integration" in the January/February
2002 issue of SAP Professional Journal (www.sappro.com).
4 Visit www.unicode.org for more information
on the Unicode standard.
5 For a detailed
roadmap and a disclaimer on converting from
MDMP to mySAP ERP 2005 Unicode, visit service.sap.com/unicode.
6 It is a good idea to remove these dependencies
anyway, simply as a matter of good application design.
7 See the DSAG Directions column "New
DSAG Working Group Confronts the Challenges
of Global SAP Implementations" in this
issue of SAP Insider (www.SAPinsider.com).
Franz J. Fritz has a Ph.D. in mathematics and
30 years of experience in all areas of IT.
Workflow and business process management have
been particular areas of interest for much
of his life. He has worked at SAP since 1993
as Program Director and Vice President with
responsibility for the Business Process Technology
and Internet-Business Framework departments.
Since 2003, he has been responsible for several
areas within SAP NetWeaver Product Management.