Automating Translations

Motivation and Background

JavaRosa uses two different localization formats for different tasks. The application's localization files are straightforward static translations in the format:

string.key=The String Value
#this is what a comment looks like
string.anotherkey=Value again!
fancy.key=This is an argument ${0} inline!
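
To make the format concrete, here is a minimal Python sketch of reading such a file and filling in an inline argument. It is not JavaRosa's own loader: the function names `parse_localization` and `fill_arguments` are hypothetical, and the handling of unmatched `${n}` placeholders (left untouched) is an assumption made for illustration.

```python
import re

def parse_localization(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            entries[key.strip()] = value
    return entries

def fill_arguments(template, args):
    """Replace ${0}, ${1}, ... with the matching positional argument."""
    def substitute(match):
        index = int(match.group(1))
        # Assumption: a placeholder with no matching argument is left as-is.
        return str(args[index]) if index < len(args) else match.group(0)
    return re.sub(r"\$\{(\d+)\}", substitute, template)

SAMPLE = """string.key=The String Value
#this is what a comment looks like
string.anotherkey=Value again!
fancy.key=This is an argument ${0} inline!
"""

strings = parse_localization(SAMPLE)
message = fill_arguments(strings["fancy.key"], ["42"])
print(message)
```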

The XForm uses the more complex itext system to manage localizations. itext enables much more powerful behaviors, such as dynamic referencing and different text forms. An itext block looks like:

<itext>
     <translation lang="English">
          <text id="key">
               <value>String value!</value>
          </text>
     </translation>
</itext>
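
As a rough illustration of the structure, the following Python sketch loads a well-formed version of the fragment above into a nested dictionary keyed by language and text id. It assumes the itext elements carry no XML namespace, as in the fragment; in a real XForm they may sit in a namespace, in which case the tag names would need qualifying. The function name `load_itext` is hypothetical.

```python
import xml.etree.ElementTree as ET

ITEXT = """<itext>
     <translation lang="English">
          <text id="key">
               <value>String value!</value>
          </text>
     </translation>
</itext>"""

def load_itext(xml_text):
    """Return {language: {text id: value}} for an itext block."""
    root = ET.fromstring(xml_text)
    table = {}
    for translation in root.iter("translation"):
        texts = table.setdefault(translation.get("lang"), {})
        for text in translation.iter("text"):
            value = text.find("value")
            # A missing or empty <value> is stored as an empty string.
            texts[text.get("id")] = (value.text or "") if value is not None else ""
    return table

itext = load_itext(ITEXT)
print(itext["English"]["key"])
```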

The goal of these localization formats is to keep text in a single location so that translation can easily be automated. However, translators don't always have the means to work through an enormous XML file making changes.

This document introduces a small harness that helps automate dumping itext blocks into a more easily shared format, and then turning them back into XML that can be inserted into XForms.

Getting Started

The harness itself is in the JavaRosa library, in the "schema-gen" project under the util directory. The build.xml file in that project should run without trouble if JavaRosa itself is compiling and running. Running it generates a jar file named form_translate.jar in the build folder. This jar is used on the command line to manage the localizations.

Using the Harness

The general invocation for going from an XForm to a CSV file is:

java -jar form_translate.jar csvdump < form.xml > output.csv

This command locates all of the translations in the form and dumps them to a CSV table that can be easily amended or changed.

Rebuilding itext

Once a CSV file has been translated, the itext translations for that file can be rebuilt with the invocation

java -jar form_translate.jar csvimport < translations.csv > itext.xml

or

java -jar form_translate.jar csvimport UTF-8 < translations.csv > itext.xml

where UTF-8 can be replaced with any other encoding that is valid in the current Java environment.

Because it is difficult to insert the generated block into an XML file while preserving that file's whitespace, the harness does not do this for you. Open the generated itext file manually and re-insert its XML block into the XForm by hand.
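
If the manual step becomes tedious, a text-level splice is one possible workaround. The Python sketch below is not part of form_translate.jar. It assumes the form contains exactly one unprefixed `<itext>...</itext>` block and that the generated file holds only the replacement block; the function name `replace_itext` is hypothetical. Because it works on raw text rather than a parsed tree, everything outside the block keeps its original whitespace.

```python
import re

# Non-greedy match of the first <itext ...> ... </itext> span, across lines.
ITEXT_BLOCK = re.compile(r"<itext\b.*?</itext>", re.DOTALL)

def replace_itext(xform_text, new_itext):
    """Swap the first itext block, leaving all other text untouched."""
    new_block = new_itext.strip()
    # A function replacement avoids backslash escapes being interpreted.
    result, count = ITEXT_BLOCK.subn(lambda m: new_block, xform_text, count=1)
    if count == 0:
        raise ValueError("no <itext> block found in the form")
    return result

form = """<model>
  <itext>
    <translation lang="English"/>
  </itext>
  <instance/>
</model>
"""
new_itext = """<itext>
    <translation lang="Swahili"/>
  </itext>
"""
updated = replace_itext(form, new_itext)
print(updated)
```

Check the result in an XML-aware editor before using the form, since a regex splice cannot confirm that the output is well-formed.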

Common Tools

Since CSV by itself isn't the most useful format for editing large bodies of text, tools like Microsoft Excel are often used to manage translations. This section covers how to import and export translation files into Excel so that they can be shared.

Importing to Excel

Exporting from Excel (Excel 2007)

Now that the file has been translated, we'll cover the steps of taking an Excel workbook full of translation files and rebuilding the necessary CSV.

First, open your Excel workbook and find the appropriate sheet for the translation you are rebuilding.

  • Go to the Menu (the fancy button) and choose Save As....
  • Choose the save-as type "Unicode Text (*.txt)"

This will save the file as a tab-separated Unicode file. Now you'll need to fire up your favorite regex tool, such as sed or Vim, and replace the tabs with commas. (Working on fixing this part currently)

In Vim:

  • Open the file. Its Unicode contents (if any) will probably look garbled. Don't worry.
  • Press "gg" to go to the top of the file
  • Press Shift+v to enter line-wise visual (selection) mode
  • Press Shift+g to extend the selection to the bottom of the file
  • Press : (Shift+;) to enter command mode for the selected lines
  • Enter the substitution s/\t/,/ and note the lack of the "g" flag, which would make more than one replacement per line
  • Repeat the above step as many times as there are translations. Avoid using s/\t/,/g, since the Unicode translation text might contain extraneous tabs that should not be turned into commas

The file should now be comma separated. Notepad++ gives a clear view of the Unicode text itself, which you can use to verify the result.
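
The Vim steps above can also be scripted. The following Python sketch replaces only the first few tabs on each line, mirroring the repeated s/\t/,/ substitution. It rests on some assumptions: that Excel's "Unicode Text" export is UTF-16 (adjust `src_encoding` if yours differs), that the output should be UTF-8 to match the csvimport invocation, and that no value contains a comma. The function names and file names are hypothetical.

```python
def tabs_to_commas(line, replacements):
    # Replace only the first `replacements` tabs, leaving any later tabs
    # inside the translated text alone (same effect as repeating the
    # substitution without the "g" flag).
    return line.replace("\t", ",", replacements)

def convert(src_path, dst_path, replacements, src_encoding="utf-16"):
    # Assumption: the Excel "Unicode Text" export is UTF-16 encoded.
    # newline="" keeps the original line endings unchanged.
    with open(src_path, encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(tabs_to_commas(line, replacements))
```

For example, calling convert("translations.txt", "yourtranslations.csv", 2) on a sheet with one key column and two translation columns would turn the first two tabs of every line into commas. A value that itself contains a comma would still need quoting, which this sketch does not handle.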

Now run your file through the translator with the command

java -jar form_translate.jar csvimport UTF-8 < yourtranslations.csv > itext.xml

This should result in an itext file containing the translations from the worksheet. You'll need to repeat the process for each file.
