How do I convert a text file from ANSI to UTF-8 with Delphi 7?

Time: 2023-01-06 11:06:44

I have written a program in Delphi 7 that searches for *.srt files on a hard drive. The program lists the path and name of each file in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded.

5 answers

#1


9  

The UTF8Encode function takes a WideString as its parameter and returns a UTF-8 string.

Sample:

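procedure ConvertANSIFileToUTF8File(AInputFileName, AOutputFileName: TFileName);
var
  Strings: TStrings;
begin
  Strings := TStringList.Create;
  try
    // In Delphi 7 this reads the file's raw bytes into an AnsiString.
    Strings.LoadFromFile(AInputFileName);
    // The AnsiString is implicitly widened using the system ANSI code page,
    // then UTF8Encode returns the UTF-8 bytes.
    Strings.Text := UTF8Encode(Strings.Text);
    // Writes the UTF-8 bytes as they are; no byte order mark (BOM) is added.
    Strings.SaveToFile(AOutputFileName);
  finally
    Strings.Free;
  end;
end;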
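
Since the question mentions that the file paths are already listed in a memo, the procedure above could be driven by a loop like the following. This is only a sketch: ConvertListedFiles and the '.utf8.srt' output naming are made up for illustration, and it assumes each memo line holds one full path.

```pascal
// Needs SysUtils (Trim, FileExists, ChangeFileExt) and Classes (TStrings)
// in the uses clause, plus the ConvertANSIFileToUTF8File procedure above.
// ConvertListedFiles is a hypothetical helper name.
procedure ConvertListedFiles(AFileNames: TStrings);
var
  I: Integer;
  Source: string;
begin
  for I := 0 to AFileNames.Count - 1 do
  begin
    Source := Trim(AFileNames[I]);
    if FileExists(Source) then
      // Write next to the original instead of overwriting it, because
      // converting an already converted file would double-encode it.
      ConvertANSIFileToUTF8File(Source, ChangeFileExt(Source, '.utf8.srt'));
  end;
end;
```

It would be called as ConvertListedFiles(Memo1.Lines), where Memo1 is assumed to be the name of the memo on the form.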

#2


1  

Take a look at GpTextStream, which looks like it works with Delphi 7. It can read and write Unicode files in older versions of Delphi (although it also works with Delphi 2009) and should help with your conversion.

#3


0  

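// Note: TEncoding was introduced in Delphi 2009, so this does not compile in Delphi 7.
// Code page 28591 is ISO-8859-1 (Latin-1), so this saves the list as Latin-1, not UTF-8.
var
  Latin1Encoding: TEncoding;
begin
  Latin1Encoding := TEncoding.GetEncoding(28591);
  try
    MyTStringList.SaveToFile('some file.txt', Latin1Encoding);
  finally
    Latin1Encoding.Free;
  end;
end;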

#4


0  

Please read the whole answer before you start coding.


The proper answer to the question (and it is not the easy one) basically consists of three steps:

  1. Determine the ANSI code page used on your computer. You can do this with the GetACP() function from the Windows API. (Important: retrieve the code page as soon as possible after retrieving the file names, because the user can change it.)
  2. Convert your ANSI string to Unicode by calling the MultiByteToWideChar() Windows API function with the correct CodePage parameter (retrieved in the previous step). After this step you have a UTF-16 string (in practice a WideString) containing the file name list.
  3. Convert the Unicode string to UTF-8 using UTF8Encode() or the WideCharToMultiByte() Windows API function. This returns the UTF-8 string you need.
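
The three steps could look roughly like this in Delphi 7. Treat it as a sketch rather than a finished implementation: AnsiToWide, WideToUtf8 and UTF8_CODEPAGE are names invented here, and the sample string in the main block is only a placeholder.

```pascal
program AnsiToUtf8Sketch;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  UTF8_CODEPAGE = 65001; // same value as the Windows CP_UTF8 constant

// Step 2: ANSI bytes -> UTF-16, using an explicit code page.
function AnsiToWide(const S: AnsiString; CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // The first call only asks for the required length in wide characters.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(Result), Len);
end;

// Step 3: UTF-16 -> UTF-8 bytes, returned in an AnsiString.
function WideToUtf8(const W: WideString): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if W = '' then
    Exit;
  // The first call only asks for the required length in bytes.
  Len := WideCharToMultiByte(UTF8_CODEPAGE, 0, PWideChar(W), Length(W),
    nil, 0, nil, nil);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
  WideCharToMultiByte(UTF8_CODEPAGE, 0, PWideChar(W), Length(W),
    PAnsiChar(Result), Len, nil, nil);
end;

var
  CodePage: UINT;
  Utf8: AnsiString;
begin
  CodePage := GetACP; // Step 1: the current system ANSI code page
  Utf8 := WideToUtf8(AnsiToWide('placeholder text', CodePage));
  Writeln('UTF-8 length in bytes: ', Length(Utf8));
end.
```

Both helpers call the API twice, first to get the required buffer size and then to do the conversion, which avoids guessing how large the output will be.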

However, although this solution returns a UTF-8 string containing the input ANSI string, it is probably not the best way to solve your problem: the file names may already have been corrupted by the time the ANSI functions returned them, so correct file names are not guaranteed.


The proper solution to your problem is far more complicated:

If you want to be sure that your file name list is completely clean, you have to make sure it never gets converted to ANSI at all. You can do this by explicitly using the "W" versions of the file handling APIs. In that case, of course, you cannot use TFileStream and other ANSI file handling objects; you have to call the Windows API directly.
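
As an illustration of the "W" route, enumerating *.srt files with FindFirstFileW and FindNextFileW might look like the sketch below. ListSrtFiles and the folder C:\Subtitles are placeholders made up for this example, and the search is not recursive.

```pascal
program ListSrtWide;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: prints the UTF-8 encoded path of every *.srt file
// found directly inside ADir (subfolders are not searched).
procedure ListSrtFiles(const ADir: WideString);
var
  FindData: TWin32FindDataW;
  H: THandle;
  Mask, Name: WideString;
begin
  Mask := ADir + '\*.srt';
  H := FindFirstFileW(PWideChar(Mask), FindData);
  if H = INVALID_HANDLE_VALUE then
    Exit; // nothing found, or the folder does not exist
  try
    repeat
      if (FindData.dwFileAttributes and FILE_ATTRIBUTE_DIRECTORY) = 0 then
      begin
        // cFileName is a zero-terminated UTF-16 buffer; it never passes
        // through the ANSI code page.
        Name := PWideChar(@FindData.cFileName[0]);
        Writeln(UTF8Encode(ADir + '\' + Name));
      end;
    until not FindNextFileW(H, FindData);
  finally
    Windows.FindClose(H);
  end;
end;

begin
  ListSrtFiles('C:\Subtitles'); // placeholder folder
end.
```

Keeping the names in WideString variables until the final UTF8Encode call is the point of the exercise: no step in between can replace characters that the ANSI code page cannot represent.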

It is not that hard, but if you already have a complex framework built on, e.g., TFileStream, it could be a bit of a pain in the @ss. In that case the best solution is to create a TStream descendant that uses the appropriate APIs.
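
One possible shape for such a descendant is a THandleStream that opens its handle with CreateFileW. This is a sketch under assumptions, not the answerer's code: TWideFileStream is an invented class name, the access and sharing flags are a simple guess at sensible defaults, and the file path is a placeholder.

```pascal
program WideStreamSketch;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

type
  // Hypothetical class; it is not part of the Delphi 7 RTL.
  TWideFileStream = class(THandleStream)
  public
    constructor Create(const AFileName: WideString; ACreateNew: Boolean);
    destructor Destroy; override;
  end;

constructor TWideFileStream.Create(const AFileName: WideString;
  ACreateNew: Boolean);
var
  H: THandle;
begin
  if ACreateNew then
    H := CreateFileW(PWideChar(AFileName), GENERIC_READ or GENERIC_WRITE, 0,
      nil, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0)
  else
    H := CreateFileW(PWideChar(AFileName), GENERIC_READ, FILE_SHARE_READ,
      nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if H = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  // In Delphi 7, THandleStream.Create takes the handle as an Integer.
  inherited Create(Integer(H));
end;

destructor TWideFileStream.Destroy;
begin
  // Handle is still 0 if the constructor failed before inherited Create.
  if (Handle <> 0) and (THandle(Handle) <> INVALID_HANDLE_VALUE) then
    CloseHandle(THandle(Handle));
  inherited Destroy;
end;

var
  Stream: TWideFileStream;
begin
  Stream := TWideFileStream.Create('C:\Subtitles\example.srt', False);
  try
    Writeln('Size in bytes: ', Stream.Size);
  finally
    Stream.Free;
  end;
end.
```

Because it descends from THandleStream, code that already accepts a TStream (for example TStrings.LoadFromStream) can use it without changes.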

I hope my answer helps you or anyone who has to deal with the same problem. (I had to not so long ago.)

#5


-1  

Did you mean ASCII?

UTF-8 is backward compatible with ASCII: a file that contains only ASCII characters is already valid UTF-8. http://en.wikipedia.org/wiki/UTF-8
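
To check whether that applies to a given file's contents, a test along these lines could be used. IsPureAscii is a name made up for this sketch; it only inspects byte values and does not detect which ANSI code page a non-ASCII file uses.

```pascal
program AsciiCheckSketch;
{$APPTYPE CONSOLE}

// Hypothetical helper: True when every byte is in the 7-bit ASCII range,
// in which case the data is the same in ANSI and in UTF-8.
function IsPureAscii(const S: AnsiString): Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := 1 to Length(S) do
    if Ord(S[I]) > 127 then
    begin
      Result := False;
      Exit;
    end;
end;

begin
  if IsPureAscii('plain subtitle text') then
    Writeln('No conversion needed')
  else
    Writeln('Contains non-ASCII bytes; conversion needed');
end.
```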
