Converting UTF-8 to WIN1252 using C++Builder 5

Time: 2023-01-06 13:40:03

I have to import some UTF-8 encoded text files into my C++Builder 5 program. Are there any components or code samples to accomplish that?

4 solutions

#1


You are best off reading the other questions on SO that are tagged unicode and c++. For starters, look at this one and see whether the library in the accepted answer (UTF8-CPP) works for you.

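I would, however, first think about what you're trying to achieve, as there is no way you can just import UTF-8-encoded strings into "Ansi" as they are (whatever you mean by that; maybe something like ISO8859_1 or WIN1252 encoding?). UTF-8 can encode characters that a single-byte code page cannot represent, so the conversion may lose information.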
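
To make the ISO8859_1 versus WIN1252 distinction concrete, here is a minimal sketch in portable C++ that maps one Unicode code point to a byte in either encoding. The names `toLatin1`, `toWin1252` and `kWin1252High` are made up for this illustration. The sketch also assumes that the five byte values Windows-1252 leaves undefined should be reported as unmappable, which is stricter than what some converters do.

```cpp
#include <cstddef>

// Unicode code points that Windows-1252 assigns to bytes 0x80..0x9F.
// 0 marks the five byte values the code page leaves undefined.
static const unsigned long kWin1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

// ISO 8859-1: code points 0..255 map to the byte with the same value.
bool toLatin1(unsigned long cp, unsigned char& out)
{
    if (cp > 0xFF) return false;
    out = static_cast<unsigned char>(cp);
    return true;
}

// Windows-1252: identical to Latin-1 except for bytes 0x80..0x9F.
bool toWin1252(unsigned long cp, unsigned char& out)
{
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
        out = static_cast<unsigned char>(cp);
        return true;
    }
    for (std::size_t i = 0; i < 32; ++i) {
        if (kWin1252High[i] != 0 && kWin1252High[i] == cp) {
            out = static_cast<unsigned char>(0x80 + i);
            return true;
        }
    }
    return false;   // assumption: everything else is treated as unmappable
}
```

With this sketch the euro sign (U+20AC) maps to byte 0x80 in Windows-1252 but has no Latin-1 byte at all, while U+00E9 maps to 0xE9 in both.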

#2


Here is a more VCL-centric approach for you:

UTF8String utf8 = "...";
WideString utf16;
AnsiString latin1;

// UTF-8 -> UTF-16: the first call returns the required length, the second converts.
int len = ::MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), utf8.Length(), NULL, 0);
utf16.SetLength(len);
::MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), utf8.Length(), utf16.c_bstr(), len);

// UTF-16 -> code page 1252, using the same two-call pattern.
len = ::WideCharToMultiByte(1252, 0, utf16.c_bstr(), utf16.Length(), NULL, 0, NULL, NULL);
latin1.SetLength(len);
::WideCharToMultiByte(1252, 0, utf16.c_bstr(), utf16.Length(), latin1.c_str(), len, NULL, NULL);

If you upgrade to C++Builder 2009, you can simplify it to this:

UTF8String utf8 = "...";
AnsiStringT<1252> latin1 = utf8;   // the code-page-aware template is AnsiStringT

#3


As no one is working on weekends, I have to answer it myself :)

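#include <string.h>

// Decodes UTF-8 from aData into aValue, which must hold at least strlen(aData)+1 bytes.
// Code points up to 255 are stored as single bytes. Strictly speaking that is Latin-1:
// Windows-1252 differs in the 0x80-0x9F range. Higher code points become '?'.
String Utf8ToWinLatin1(const char* aData, char* aValue)
{
    const unsigned char* s = (const unsigned char*)aData;  // avoid signed-char arithmetic
    size_t n = strlen(aData);
    size_t i = 0;
    for (size_t j = 0; j < n; )
    {
        unsigned char c = s[j];
        unsigned long cp;
        int extra;                                  // number of continuation bytes
        if (c <= 127)                  { cp = c;       extra = 0; }
        else if (c >= 192 && c <= 223) { cp = c - 192; extra = 1; }
        else if (c >= 224 && c <= 239) { cp = c - 224; extra = 2; }
        else if (c >= 240 && c <= 247) { cp = c - 240; extra = 3; }
        else { j += 1; continue; }                  // stray continuation or invalid lead byte
        if (j + extra >= n)
            break;                                  // truncated sequence at end of input
        for (int k = 1; k <= extra; k++)
            cp = cp * 64 + (s[j + k] & 0x3F);       // append 6 payload bits per byte
        aValue[i++] = (cp <= 255) ? (char)cp : '?';
        j += extra + 1;
    }
    aValue[i] = '\0';                               // terminate the output string
    return aValue;
}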

#4


Your question doesn't say specifically which character set you want to convert to. If you only want the basic 7-bit ASCII charset, discarding every byte with a value higher than 127 will work.
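
A minimal sketch of that ASCII-only approach, in portable C++ with `std::string` rather than VCL strings (the function name `stripNonAscii` is made up for this illustration). It relies on the fact that every byte of a multi-byte UTF-8 sequence has its high bit set, so dropping bytes above 127 removes whole non-ASCII characters and never leaves half of one behind.

```cpp
#include <string>

// Returns a copy of `utf8` with every byte above 127 removed.
std::string stripNonAscii(const std::string& utf8)
{
    std::string ascii;
    ascii.reserve(utf8.size());
    for (std::string::size_type i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c <= 127)
            ascii += static_cast<char>(c);
    }
    return ascii;
}
```

Note that this silently loses text: "café" comes out as "caf", so it is only suitable when the input is expected to be ASCII anyway.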

If you want to convert to an 8-bit character set, such as latin1, you'll have to do it the hard way: decode each multi-byte sequence and map the resulting code point to the target character set.
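
The decoding half of the hard way could look roughly like the sketch below, written in portable C++ (the name `decodeUtf8` is made up for this illustration). It assumes that malformed or truncated input should yield U+FFFD, and it does not reject overlong encodings or surrogates, so treat it as a starting point, not a validating decoder.

```cpp
#include <string>
#include <vector>

// Decodes UTF-8 into Unicode code points. A malformed or truncated sequence
// yields U+FFFD and decoding resumes at the next byte.
std::vector<unsigned long> decodeUtf8(const std::string& in)
{
    std::vector<unsigned long> out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        int extra;                                   // continuation bytes expected
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }

        bool ok = (i + extra < in.size());           // enough bytes left?
        for (int k = 1; ok && k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) ok = false;     // not a continuation byte
            else cp = (cp << 6) | (cc & 0x3F);       // append 6 payload bits
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Each resulting code point then still has to be mapped to a byte of the target character set, with some policy for the ones that have no equivalent.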
