Best way to design for localization of strings

Date: 2022-10-20 18:22:05

This is a fairly general question, open to opinions. I've been trying to come up with a good way to design localization of string resources for a Windows MFC application and related utilities. My wishlist is:

  • Must preserve string literals in code (as opposed to replacing them with #define'd resource IDs), so that the messages are still readable inline

  • Must allow localized string resources (duh)

  • Must not impose additional run-time environment restrictions (e.g. a dependency on .NET)

  • Should intrude minimally on existing code (the less modification the better)

  • Should be debuggable

  • Should generate resource files which are editable by common tools (i.e. a common format)

  • Should not use copy/paste comment blocks to preserve literal strings in code, or anything else which creates the potential for de-synchronization

  • Would be nice to allow static (compile-time) checking that every "notated" string is in the resource file(s)

  • Would be nice to allow cross-language resource string pooling (for components in various languages, e.g. native C++ and .NET)

I have an approach that fulfills my whole wishlist to some extent, except for static checking, but I have had to develop a bit of custom code to achieve it (and it has limitations). I'm wondering if anyone has solved this problem in a particularly good way.

Edit: The solution I currently have looks like this:

ShowMessage( RESTRING( _T("Some string") ) );
ShowMessage( RESTRING( _T("Some string with variable %1"), sNonTranslatedStringVariable ) );

I then have a custom utility that parses the strings out of the RESTRING blocks and puts them into a .resx file for localization, and a separate C# COM object that loads them from localized resource files with fallback. If the C# object is not available (or cannot load), I fall back to the string in the code. The macro expands to a template class which calls the COM object and does the formatting, etc.
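To make the fallback behaviour concrete, here is a minimal sketch of the lookup half of such a macro. It is not the code described above: StringCatalog and activeCatalog are hypothetical names, the catalog is a plain in-memory map standing in for the COM object, std::string is used instead of _T() strings for portability, and formatting of extra arguments is left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical in-memory catalog: maps the source-language literal to its
// translation. A real loader (COM object, .resx reader, ...) is out of scope.
class StringCatalog {
public:
    void add(std::string source, std::string translated) {
        table_[std::move(source)] = std::move(translated);
    }

    // Returns the translation if one exists, otherwise the literal itself,
    // so a missing catalog or missing entry degrades to the in-code string.
    std::string lookup(const std::string& source) const {
        auto it = table_.find(source);
        return it != table_.end() ? it->second : source;
    }

private:
    std::unordered_map<std::string, std::string> table_;
};

// Hypothetical process-wide catalog used by the macro below.
inline StringCatalog& activeCatalog() {
    static StringCatalog catalog;
    return catalog;
}

// The literal stays readable at the call site and doubles as the lookup key.
#define RESTRING(literal) (activeCatalog().lookup(literal))
```

With this in place, ShowMessage(RESTRING("Some string")) shows the translation when the catalog has one and the original literal otherwise, which is also what keeps the code debuggable without any resource files present.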

Anyway, I thought it would be useful to add what I have now for reference.

7 answers

#1


3  

We use the English string as the ID.

If the lookup in the international resource object (loaded from the installed I18N dll) fails, we default to the ID string.

Code looks like:

doAction(I18N.get("Press OK to continue"));

As part of the build process we have a Perl script that parses all source for string constants. It builds a temp file of all strings in the application and then compares these against the resource strings in each locale to see if they exist. Any missing strings generate an e-mail to the appropriate translation team.
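The check that script performs could be sketched roughly as follows. This is an illustration in C++ rather than the Perl script itself; extractKeys and missingKeys are hypothetical names, only I18N.get("...") calls are recognised, and escaped quotes inside a literal are not handled.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Collects the literal argument of every I18N.get("...") call in `source`.
// Simplification: escaped quotes inside the literal are not handled.
std::vector<std::string> extractKeys(const std::string& source) {
    static const std::regex call(R"re(I18N\.get\(\s*"([^"]*)"\s*\))re");
    std::vector<std::string> keys;
    for (std::sregex_iterator it(source.begin(), source.end(), call), end;
         it != end; ++it) {
        keys.push_back((*it)[1].str());
    }
    return keys;
}

// Returns the keys used in code that the given locale's catalog lacks.
std::vector<std::string> missingKeys(const std::vector<std::string>& used,
                                     const std::set<std::string>& catalog) {
    std::vector<std::string> missing;
    for (const auto& key : used) {
        if (catalog.count(key) == 0) missing.push_back(key);
    }
    return missing;
}
```

Running missingKeys once per locale catalog gives the list that would be mailed to that locale's translators.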

We can have multiple dlls for each locale. The name of the dll is based on RFC 3066:
language[_territory][.codeset][@modifier]

We try to extract the locale from the machine and be as specific as possible when loading the I18N dll, but fall back to less specific locale variations if the more specific version is not present.

Example:

In the UK: if the locale was en_GB.UTF-8
(I use the term dll loosely, not in the specific Windows sense.)

First look for the I18N.en_GB.UTF-8 dll. If this dll does not exist, fall back to I18N.en_GB. If that does not exist, fall back to I18N.en. If that does not exist, fall back to I18N.default.

The only exception to this rule is Simplified Chinese (zh_CN), where the fallback is US English (en_US). If the machine does not support Simplified Chinese then it is unlikely to support full Chinese.
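The general fallback order described above can be sketched as a small function that strips one component at a time. fallbackChain is a hypothetical name, and the zh_CN exception is deliberately left out to keep the sketch short.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the lookup order for a locale of the form
// language[_territory][.codeset][@modifier], most specific first.
std::vector<std::string> fallbackChain(const std::string& locale) {
    std::vector<std::string> chain;
    std::string current = locale;
    while (!current.empty()) {
        chain.push_back("I18N." + current);
        // Strip one component: @modifier first, then .codeset, then _territory.
        std::string::size_type cut = current.find('@');
        if (cut == std::string::npos) cut = current.find('.');
        if (cut == std::string::npos) cut = current.find('_');
        if (cut == std::string::npos) break;  // only the language is left
        current.erase(cut);
    }
    chain.push_back("I18N.default");
    return chain;
}
```

For en_GB.UTF-8 this yields I18N.en_GB.UTF-8, I18N.en_GB, I18N.en and I18N.default, in that order; the loader would try each name until one exists.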

#2


1  

I don't know much about how this is normally done on Windows, but the way localized strings are handled in Apple's Cocoa framework works pretty well. They have a very basic text-format file that you can send to a translator, and some preprocessor macros to retrieve the values from the files.

In your code, you'll see the strings in your native language, rather than as opaque IDs.

#3


1  

Your solution is quite similar to the Unix/Linux "gettext" solution. In fact, you would not need to write the extraction routines.

I'm not sure why you want the RESTRING macro to handle multiple arguments. My code (using wxWidgets' support for gettext) looks like this: MyString.Format(_("Some string with variable %ls"), _("variable"));. That is to say, String::Format(...) gets two individually translated arguments. In hindsight, Boost.Format would have been better, but it too would allow boost::format(_("Some string with variable %1%")) % _("variable");

(We use the _() macro for brevity)

#4


1  

The simple way is to use only string IDs in your code, with no literal strings. You can then produce a different version of the .rc file for each language and either create resource-only DLLs or simply make different language builds.

There are a couple of shareware utilities to help localise the .rc file; they handle resizing dialog elements for languages with longer words and warn about missing translations.

A more complicated problem is word order: if you have several values in a printf, they may have to appear in a different order to suit another language's grammar. There are some extended printf classes on CodeProject that let you specify things like printf("word %1s and %2s",var1,var2) so you can switch %1s and %2s if necessary.
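As an illustration of position-indexed placeholders, here is a minimal sketch. It is not one of the CodeProject classes; formatPositional is a hypothetical helper that uses bare %1..%9 markers, because standard printf would read %1s as a field width of 1 rather than as an argument position.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Replaces %1..%9 in `format` with the matching entry of `args` (1-based).
// "%%" yields a literal '%'; unknown or out-of-range markers are kept as-is.
std::string formatPositional(const std::string& format,
                             const std::vector<std::string>& args) {
    std::string out;
    for (std::string::size_type i = 0; i < format.size(); ++i) {
        const char c = format[i];
        if (c != '%' || i + 1 == format.size()) {
            out += c;
            continue;
        }
        const char next = format[i + 1];
        if (next == '%') {
            out += '%';
            ++i;  // consume the second '%'
        } else if (next >= '1' && next <= '9' &&
                   static_cast<std::string::size_type>(next - '1') < args.size()) {
            out += args[next - '1'];
            ++i;  // consume the digit
        } else {
            out += c;  // leave the marker untouched
        }
    }
    return out;
}
```

A translator can then reorder the markers freely: with the arguments {"red", "apple"}, the format "a %1 %2" and a translated "une %2 %1" both work without touching the calling code.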

#5


0  

On one project I had localized into 10+ languages, I put everything that was to be localized into a single resource-only dll. At install time, the user selected which dll got installed with their application.

I only had to deliver the English dll to the localization team. They returned a localized dll to me for each language which I included in the build.

I know it's not perfect, but it worked.

#6


0  

Since it is open for opinions, here is how I do it.

My localized text file is a simple tab-delimited text file that can be loaded in Excel and edited. The first column is for the define and each column to the right is a subsequent language, for example:

ID              ENGLISH      FRENCH    GERMAN
STRING_YES      YES          OUI       JA
STRING_NO       NO           NON       NEIN

Then in my makefile there is a custom build step that generates a strings.h file and a strings.dat. In my case it builds an enum list for the string IDs and then a binary file with offsets for the text. Since in my app the user can change the language at any time, I have them all in memory, but you could easily have your pre-processor generate a different output file for each language if necessary.

The thing that I like about this design is that if any strings are missing I get a compile error, whereas if strings were looked up at runtime you might not know about a missing string in a seldom-used part of the code until later.
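A rough idea of what the generated header could look like, as a sketch only: the names StringId, Language, kStrings and getString are assumptions, and the text lives in an in-source table here rather than in the binary strings.dat the answer describes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// What a generated strings.h might contain (hypothetical output of the
// custom build step; the real one also writes a binary strings.dat).
enum StringId { STRING_YES, STRING_NO, STRING_COUNT };
enum Language { LANG_ENGLISH, LANG_FRENCH, LANG_GERMAN, LANG_COUNT };

constexpr const char* kStrings[][LANG_COUNT] = {
    /* STRING_YES */ {"YES", "OUI", "JA"},
    /* STRING_NO  */ {"NO", "NON", "NEIN"},
};

// Fails the build if an ID is added to the enum but not to the table.
static_assert(sizeof(kStrings) / sizeof(kStrings[0]) ==
                  static_cast<std::size_t>(STRING_COUNT),
              "string table and StringId enum are out of sync");

// Looks up the text for one ID in one language.
std::string getString(StringId id, Language lang) {
    return kStrings[id][lang];
}
```

Code that refers to an ID missing from the text file, say STRING_MAYBE, simply does not compile, which is the compile-time check described above. A missing translation within a row would still need a check in the generator itself.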

#7


0  

You want an advanced utility that I've always wanted to write but never had the time to. If you don't find such a tool, you may want to fall back on my CMsg() and CFMsg() wrapper classes, which make it very easy to pull strings from the resource table. (CFMsg even provides a one-liner wrapper around FormatMessage.) And yes, in the absence of the tool you're looking for, keeping a copy of the string in a comment is a good solution. Regarding desynchronisation of the comment, remember that string literals are very rarely changed.

http://www.codeproject.com/KB/string/stringtable.aspx

BTW, native Win32 programs and .NET programs manage resource storage in totally different ways. You'll have a hard time finding a common solution for both.
