Tagging source language text with unique identifiers

Which "Hot" am I translating?

Problem

App source code contains the strings that get translated, and the app displays those strings. A string on screen can usually be matched back to the source only if that string is unique, which is frequently not the case. Duplicate strings such as "OK", "Yes", "No", "Cancel", and "Hot" can appear many times in the source code, and each occurrence needs its own translation chosen for its context, even if some of those translations turn out to be identical. For example, "Hot" in English has 31 different Spanish translations depending on the context.

Or suppose your app has a translation for "I Agree" that, unknown to you, is used in three different places. Your boss asks you to reword one of them from "I Agree" to "I Do Not Agree", so you change the translation, and now the two rarely seen uses elsewhere also show "I Do Not Agree" and confuse customers. It is better for each use of a string or phrase to have its own translation, even if all of those translations start out identical.

In addition, strings composed from live data, such as "Temperature is 90°C", won't appear in the source code. The source code will instead contain something like LocalizeString("Temperature is @1"), where @1 is replaced at runtime with "90°C" or perhaps "194°F". The connection between a string in the source code and what appears on screen is therefore not always obvious.
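A minimal sketch of the substitution step, in Python, to show why the on-screen text cannot simply be searched for in the source. `localize_string` is a hypothetical stand-in for the LocalizeString call named above; the 1-based `@N` placeholder convention is taken from the example, and a real localization API may behave differently.

```python
import re


def localize_string(template: str, *args: str) -> str:
    """Hypothetical stand-in for LocalizeString: replace @1, @2, ... with args (1-based)."""
    def substitute(match: re.Match) -> str:
        return args[int(match.group(1)) - 1]

    return re.sub(r"@(\d+)", substitute, template)


source_template = "Temperature is @1"  # the only text a search of the source finds

displayed = localize_string(source_template, "90°C")
print(displayed)                                   # Temperature is 90°C
print(localize_string(source_template, "194°F"))   # Temperature is 194°F

# Searching the source for what is on screen fails: the live value is not there.
print(displayed in source_template)                # False
```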

Solution

The source code is copied into a temporary directory, and each string to be translated is "tagged" by prepending or appending invisible Unicode characters. The app is then compiled and run in the Simulator to capture strings and screens. Because the tag characters are zero-width, they are not visible on screen, but the code that captures the screens still sees them. Each tag character can be treated as a numerical digit, so every string carries a unique "number" that ties it back to its location in the source code.
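A minimal sketch of the tagging idea in Python, under stated assumptions. The source does not say which invisible characters are used or how a tag is delimited, so here three zero-width characters (U+200B, U+200C, U+200D) act as base-3 digits and U+2060 WORD JOINER brackets each tag. The functions `encode_id`, `decode_ids`, and `strip_tags`, and the `occurrences` registry with its file names and line numbers, are hypothetical.

```python
import re

# Assumption: the digit and marker characters below are illustrative choices.
DIGITS = "\u200b\u200c\u200d"  # ZERO WIDTH SPACE, ZWNJ, ZWJ -> digits 0, 1, 2
MARK = "\u2060"                # WORD JOINER: brackets each tag
BASE = len(DIGITS)
TAG_RE = re.compile(MARK + "([" + DIGITS + "]+)" + MARK)


def encode_id(n: int) -> str:
    """Encode a non-negative integer as a bracketed run of zero-width digits."""
    if n < 0:
        raise ValueError("id must be non-negative")
    digits = []
    while True:
        n, r = divmod(n, BASE)
        digits.append(DIGITS[r])
        if n == 0:
            break
    return MARK + "".join(reversed(digits)) + MARK


def decode_ids(text: str) -> list[int]:
    """Return every tag id found in captured text, in order of appearance."""
    ids = []
    for match in TAG_RE.finditer(text):
        n = 0
        for ch in match.group(1):
            n = n * BASE + DIGITS.index(ch)
        ids.append(n)
    return ids


def strip_tags(text: str) -> str:
    """Remove all tags, leaving only the text a user would read."""
    return TAG_RE.sub("", text)


# Hypothetical registry: each occurrence of a string in the source gets its own id.
occurrences = [
    ("SettingsView.swift", 12, "Hot"),
    ("MenuView.swift", 40, "Hot"),
]
tagged = {i: text + encode_id(i) for i, (_, _, text) in enumerate(occurrences)}

screen_text = tagged[1]  # what the screen-capture code would read back
(found,) = decode_ids(screen_text)
print(strip_tags(screen_text), "->", occurrences[found][:2])
# Hot -> ('MenuView.swift', 40)
```

Both occurrences read "Hot" on screen, yet the decoded id points to exactly one source location. Bracketing each tag with a marker lets the capture step separate tags belonging to strings that end up next to each other on screen. Whether a given character stays invisible depends on the text engine and font, so any digit set would need to be checked on the target platform.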

Impact

This unambiguously ties each string that appears on screen to its string in the code. It resolves the ambiguity of duplicate strings, giving comprehensive support for translating each occurrence separately, and it also supports translations that contain string formatting placeholders.

Status

US Patent Application Number 62409380
Filed 18-OCT-2017