Multilingual Architecture Primer

Abstract

This article explains the issues of software localization, proposes three concrete approaches and explains the differences between them.

Why Multilingual Architectures?

Multilingual architecture addresses the issues of running a single software application in several markets or environments at the same time. Such environments can be characterized by:

Language
Country specific dialects or language variants
Time, date and currency formats
Character sets
Address formats (ZIP code formats, telephone number digits, ...)
Hot key assignments (Ctrl-O opens a file in an English application, Ctrl-A in a Spanish one)
Legal differences (privacy regulations in Europe, VAT calculations, accounting rules, ...)
Other issues of globalization, such as culture specific elements that may be perceived differently in other countries.
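
To make the list above more tangible, here is a minimal Python sketch of how a few of these characteristics (language, date, number and currency formats) could be bundled per environment. The names LocaleProfile, PROFILES and the format functions are hypothetical and only serve as illustration; a real product would normally rely on an established locale library rather than a hand-written table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocaleProfile:
    """Hypothetical bundle of settings that differ between environments."""
    language: str
    date_pattern: str      # strftime pattern
    decimal_sep: str
    thousands_sep: str
    currency_pattern: str  # "{amount}" is replaced by the formatted number


# Illustrative profiles only; a real table would cover many more settings.
PROFILES = {
    "en_US": LocaleProfile("en", "%m/%d/%Y", ".", ",", "${amount}"),
    "de_DE": LocaleProfile("de", "%d.%m.%Y", ",", ".", "{amount} EUR"),
}


def format_date(value: date, profile: LocaleProfile) -> str:
    return value.strftime(profile.date_pattern)


def format_number(value: float, profile: LocaleProfile) -> str:
    # Format with US-style separators first, then swap in the locale's own.
    text = f"{value:,.2f}"
    return (text.replace(",", "\0")
                .replace(".", profile.decimal_sep)
                .replace("\0", profile.thousands_sep))


def format_currency(value: float, profile: LocaleProfile) -> str:
    return profile.currency_pattern.format(amount=format_number(value, profile))


if __name__ == "__main__":
    for name, profile in PROFILES.items():
        print(name,
              format_date(date(2024, 12, 31), profile),
              format_currency(1234.5, profile))
```

The point of the sketch is that the application code calls the same functions everywhere and only the profile changes per market.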

Most software applications are created as monolingual products and face the challenge of operating in an international environment later in their lifecycle (see CFO's view). In this situation, a company has to learn a lot about foreign markets, mentalities etc., so the idea is to keep the hassle of localizing the software product to a minimum. Typically, such a company will contact one of the leaders of the localization industry to investigate options.

Localization of Windows Executables

The solution that most "localization companies" will offer is to pass your .EXE and .DLL files through some special "localization tools" (see the systems page). These tools are able to extract resources from Windows binaries into a translator friendly format and to compile executable target binaries with the translated text.

These localization tools are very popular with localization providers because:

They are easy to use, even for technically less sophisticated translators (who represent the vast majority).
They maintain the context of the translatable items, allowing the translators to choose the precise expression and tone for every translatable item.
They can be integrated with the translation memories of the translators, to keep the translation consistent with online help and other documentation.
They normally deliver running target executables.

It is important to understand that the "translation" of software is a surprisingly complex task, supported by its own set of software tools and restricted to a small elite of translation companies who have the tools, the tech skills and the right people. These localization tools reduce the complexity of localization projects and the skills required of the translators.

Changing to a Multilingual Architecture

Having learned about the basics of localization and the advantages of localization tools, it may come as a surprise that most large software companies deal with localization in-house, without the localization tools (an approach then frequently called internationalization or I18N). There are several reasons for this:

Working with "localization tools" ties you to a single provider, who will take advantage of the lock-in effect and charge you a premium price.
Extracting the English text from the source files isn't really that complicated.
Localization tools do not help you with the internationalization (I18N) of other software components, such as date and number formats etc. (see above). You have to adapt your software anyway.
Keeping the language specific text in separate files allows you to deliver a single product that can run in several countries, instead of delivering different binaries for each country.

The last point in particular gains a lot of weight if you have many countries and short software release cycles or patch intervals. Imagine if Microsoft had to deliver new binaries every two weeks for 50 countries for several hundred products...
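
The idea of one product plus separate language files can be sketched in a few lines of Python. This is only an illustration under assumptions: the JSON file format, the one-file-per-language naming (en.json, es.json) and the function names load_catalogs and translate are hypothetical and not taken from any of the systems discussed here.

```python
import json
import tempfile
from pathlib import Path


def load_catalogs(directory):
    """Read every <language>.json file in `directory` into one lookup table."""
    catalogs = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            catalogs[path.stem] = json.load(handle)
    return catalogs


def translate(catalogs, language, key):
    """Return the text for `key` in `language`, or the key itself if unknown."""
    return catalogs.get(language, {}).get(key, key)


if __name__ == "__main__":
    # Stand-in for the resource directory shipped next to the single binary.
    with tempfile.TemporaryDirectory() as resource_dir:
        Path(resource_dir, "en.json").write_text(
            json.dumps({"file.open": "Open"}), encoding="utf-8")
        Path(resource_dir, "es.json").write_text(
            json.dumps({"file.open": "Abrir"}), encoding="utf-8")
        catalogs = load_catalogs(resource_dir)
        print(translate(catalogs, "es", "file.open"))
```

With such a layout, adding a language or patching a translation means shipping one more resource file, while the program itself stays unchanged.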

So sooner or later you will have to implement your multilingual architecture. The question is really about the best timing. But before we drill into that, let us check what happens if a company starts with a multilingual architecture without being familiar with internationalization.

Traps and Issues with Multilingual Architectures

Software architects typically make several important errors when designing multilingual architectures:

Most prominently, architects don't design the system to provide context to the translators.
A translator is lost if he has to translate, for example, the English word "Account" for a German ERP software, where there are over 20 different terms in German, depending on the specific type and use of the account.
Duplicated text strings can cause problems if they have to be translated differently in different pages of a program.
Software is not translated "once and forever".
New releases and patches require a tight integration with the same translator during the lifetime of the application. "Tight" because the translation company needs to be able to react fast and "same translator" because you can't have a patchwork of personal styles and expressions throughout your application.
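
The first two traps (missing context and duplicated strings) can be illustrated with a small Python sketch. It assumes, hypothetically, that every translatable item gets its own key per place of use plus a note for the translator; the Message class, the key names and the two German terms are illustrative choices, not part of any system described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    key: str      # unique per place of use, not per source text
    source: str   # English source text
    context: str  # note shown to the translator


# The same English word appears twice, but under different keys and contexts.
MESSAGES = [
    Message("ledger.account.label", "Account",
            "General ledger account in the accounting module"),
    Message("login.account.label", "Account",
            "User account on the login page"),
]

# Illustrative German translations that differ by context.
GERMAN = {
    "ledger.account.label": "Konto",
    "login.account.label": "Benutzerkonto",
}


def export_for_translator(messages):
    """Render the rows a translator would see: key, source text and context."""
    return [f"{m.key}\t{m.source}\t{m.context}" for m in messages]


def lookup(translations, key, messages):
    """Return the translation for `key`, or the English source if missing."""
    source_by_key = {m.key: m.source for m in messages}
    return translations.get(key, source_by_key[key])


if __name__ == "__main__":
    for row in export_for_translator(MESSAGES):
        print(row)
    print(lookup(GERMAN, "login.account.label", MESSAGES))
```

Keying by place of use costs some duplicated translation work, but it keeps the option open to translate identical English strings differently.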

Multilingual Architecture Requirements

These issues translate into a number of requirements that a multilingual architecture has to deal with, beyond the simple separation of source text from the program code:

Localization Fallback Handling:
Errors are going to occur even if you are using an automated translation workflow. Typical cases include missing translations or missing localized GIFs (that is, any image that includes text). In these cases you want to provide the user with reasonable default values (such as English text) and you want to inform your Translation Workflow about the error ASAP.
Translation Workflow:
Imagine that you have just added one new field to a screen of your program and that you want to deliver the modified application to your clients ASAP. But before you can do so, the new field has to be translated by N translators (N being the number of your target languages), and N EMail replies have to be checked for completeness and manually integrated into your resource files. Finally, N final tests have to be run.
So what you need is a workflow application that takes care of the organizational aspects and helps you to reduce errors. Such a workflow application notifies translators when a change has occurred in the source text and tells you when the last translator has returned his or her results.
Translation Cost:
You can be sure that your CFO is going to ask you after your second software release why the translation costs are so high. This is because translation is a manual process and good translators charge daily rates comparable to software developers. So you will have to add cost saving features to your translation workflow such as computing the "diff" between the current version and the last version. Or you have to close a special deal with your translation agency.
Translation Context:
Another issue that is going to occur is that your translators are going to complain about missing "context" for their translations. Missing context leads to wrong translations and can have very funny results. The question is whether your clients also think that it is funny... So you need to provide your translators with the context of their phrases.
Terminology Consistency:
Finally you have to make sure that your translators use the same words for the same concept throughout your GUI and your documentation. This issue becomes even more important if you are working with several translators on the same project or if you are changing translators. So you have to make sure that your translators use a Translation Memory and a Terminology Maintenance Application at home. Unfortunately these tools are very expensive and they only work together with Microsoft Word, which unfortunately is not the file format of your resource files.
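
The "diff" mentioned under Translation Cost can be sketched as follows in Python. The sketch assumes, for illustration only, that each release's source texts are available as a simple key-to-text mapping; the function names and the word count as a rough cost measure are hypothetical.

```python
def strings_to_translate(previous, current):
    """Return only the items that are new or whose source text has changed."""
    return {key: text for key, text in current.items()
            if previous.get(key) != text}


def obsolete_keys(previous, current):
    """Return keys that existed in the last release but are gone now."""
    return sorted(set(previous) - set(current))


def word_count(items):
    """Rough size of a translation job, counted in source words."""
    return sum(len(text.split()) for text in items.values())


if __name__ == "__main__":
    release_1 = {"order.title": "Order", "order.save": "Save",
                 "order.fax": "Fax number"}
    release_2 = {"order.title": "Order", "order.save": "Save order",
                 "order.vat": "VAT number"}

    job = strings_to_translate(release_1, release_2)
    print(job)                              # items to send to the translators
    print(obsolete_keys(release_1, release_2))
    print(word_count(job))
```

Only the changed and new items go out to the N translators, so the cost of a patch depends on the size of the change rather than on the size of the application.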

Two MLA Examples

I want to present two extreme examples of multilingual architectures: The Car Configurator application represents a typical object oriented Java architecture. The Competitiveness Marketplace represents a typical Web Application with a database backend.

	Car Configurator	Competitiveness Marketplace
System Architecture	Java Servlets and Business Objects with local state	Stateless Tcl architecture with central Oracle database
Resource File Format	Flat File	Database
Translation Lookup	Centralized "Localization Subsystem" cached in memory	Centralized translation lookup using a single database lookup table
Translation Workflow	EMail exchange of MS-Word resource files	Online workflow per translation item
Fallback strategy	Explicit fallback rules + logging of fallback events for the translation workflow	Return of English text + error events to the translation workflow
Templating	No templates	Design templates with separate components for all localizable items
Performance	Translation strings cached in RAM because of slow Java-DB access speed	Database access for all localization strings because of fast DB connection
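
The "Translation Lookup" and "Fallback strategy" rows can be illustrated with a short Python sketch. It is not the code of either system: the Localizer class, the fallback order (full locale, then base language, then English) and the event tuples are assumptions made for this example, and the list of events merely stands in for whatever the translation workflow would actually receive.

```python
class Localizer:
    """In-memory translation lookup with explicit fallback rules."""

    def __init__(self, catalogs, default_language="en"):
        self.catalogs = catalogs            # {locale: {key: text}}, held in RAM
        self.default_language = default_language
        self.fallback_events = []           # stand-in for workflow notifications

    def fallback_chain(self, locale):
        """Return the locales to try in order, e.g. de_AT, then de, then en."""
        chain = [locale]
        if "_" in locale:
            chain.append(locale.split("_", 1)[0])
        if self.default_language not in chain:
            chain.append(self.default_language)
        return chain

    def text(self, key, locale):
        for candidate in self.fallback_chain(locale):
            value = self.catalogs.get(candidate, {}).get(key)
            if value is not None:
                if candidate != locale:
                    # Record that the requested locale had no translation.
                    self.fallback_events.append((locale, key, candidate))
                return value
        # Nothing found anywhere: show the key and record the failure.
        self.fallback_events.append((locale, key, None))
        return key


if __name__ == "__main__":
    localizer = Localizer({
        "en": {"save": "Save", "cancel": "Cancel"},
        "de": {"save": "Speichern"},
        "de_AT": {},
    })
    print(localizer.text("save", "de_AT"))
    print(localizer.text("cancel", "de"))
    print(localizer.fallback_events)
```

The user always sees some text, and every fallback leaves a record that can be turned into a translation task.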

Both multilingual architectures have proved to work well in practice. However, the way of maintaining the systems is very different:

The Car Configurator maintains all translatable items in Microsoft Word resource files structured as tables. This makes the work very easy for the translators who can use the Trados translation memory and MultiTerm terminology maintenance component. Also, this made it easy to communicate with translators (by EMail) who are normally not very technology savvy. The organizational overhead is limited because there is only a single resource file per language.
The Competitiveness.com Marketplace includes its own online Translation Workflow module that allows translators to work online. Consistency and error control are automatic. However, translators were complaining about missing context and expensive online time. Translation is now done by in-house translators in the Competitiveness.com offices.

Conclusion

Both multilingual architectures present viable options. However, I recommend clearly defining the linguistic and organizational aspects before implementing an MLA.

Please contact me if you have doubts, questions or comments. Tell me if you want me to put up your banner ad. Also, I am available as a freelance consultant.