How do I send Unicode text from MATLAB to a Word document through the ActiveX interface?

Date: 2022-10-30 14:21:41

I'm using MATLAB to programmatically create a Microsoft Word document on Windows. In general this approach works fine, but it has trouble with non-ASCII text. For example, take this code:

wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
selection = wordApplication.Selection;
umbrella = char(9730);
disp(umbrella)
selection.TypeText(umbrella)

The Command Window displays the umbrella character correctly, but the character in the Word document is the "question mark in a box" missing character symbol. I can cut-and-paste the character from the Command Window into Word, so that character is indeed available in that font.

The TypeText method must be assuming ASCII. There are resources on how to set Unicode flags for similar operations from other languages, but I don't know how to translate them into the syntax I have available in MATLAB.

Clarification: My use case is sending an unknown Unicode string (char array), not just a single character. It would be ideal to be able to send it all at once. Here is better sample code:

% Define a string to send with a non-ASCII character.
umbrella = char(9730);
toSend = ['Have you seen my ' umbrella '?'];
disp(toSend)

% Open a new Word document.
wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;

% Send the text.
selection = wordApplication.Selection;
selection.TypeText(toSend)

I was hoping I could simply set the encoding of the document itself, but this doesn't seem to help:

wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
disp(wordApplication.ActiveDocument.TextEncoding)
wordApplication.ActiveDocument.TextEncoding = 65001;
disp(wordApplication.ActiveDocument.TextEncoding)
selection = wordApplication.Selection;
toSend = sprintf('Have you seen my \x2602?');
selection.TypeText(toSend)

2 Answers

#1


Method 1. Valid for a single character (original question)

Taken from here:

umbrella = 9730; %// Unicode number of the desired character
selection.InsertSymbol(umbrella, '', true); %// true means use Unicode

The second argument specifies the font (so you could use 'Arial' etc), and '' apparently means use current font. The third argument 'true' means use Unicode.
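
For a whole string rather than a single character, one possible extension of Method 1 is to call InsertSymbol once per character. This is only a sketch: it assumes Word is installed, it has not been checked against every Word version, and the variable name `codes` is illustrative. MATLAB stores characters outside the Basic Multilingual Plane as two UTF-16 code units, and this loop passes them to Word one at a time without any special handling.

```matlab
% Open a new Word document (same setup as in the question).
wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
selection = wordApplication.Selection;

% Build the string and convert it to numeric character codes.
toSend = ['Have you seen my ' char(9730) '?'];
codes = double(toSend);

% Insert the characters one by one; '' keeps the current font,
% true tells Word to treat the number as a Unicode value.
for k = 1:numel(codes)
    selection.InsertSymbol(codes(k), '', true);
end
```

This makes one COM call per character, so it is likely to be slow for long strings.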

Method 2. Valid for a single character (original question)

A less direct way, taken from here:

umbrella = 9730; %// Unicode number of the desired character
selection.TypeText(dec2hex(umbrella));
selection.ToggleCharacterCode;
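
To make Method 2 less opaque: TypeText only ever receives plain ASCII hex digits here, and ToggleCharacterCode then asks Word to swap those digits for the character they encode (as far as I can tell, the same thing Alt+X does in the Word UI). The snippet below only illustrates the conversion step and does not need Word; `hexCode` and `roundTrip` are names chosen for the example.

```matlab
umbrella = 9730;                 % decimal Unicode number of the umbrella
hexCode = dec2hex(umbrella);     % character vector '2602', this is what TypeText sends
roundTrip = hex2dec(hexCode);    % converting back gives 9730 again

fprintf('U+%s\n', hexCode);      % prints U+2602
```

Because the hex digits are typed as ordinary text, the result may depend on what precedes the cursor: if the preceding text itself ends in hex digits, Word might read those as part of the code.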

Method 3. Valid for a string (edited question)

You can send a whole string at once if you don't mind using the clipboard:

umbrella = char(9730);
toSend = ['Have you seen my ' umbrella '?'];
clipboard('copy', toSend); %// copy the Unicode string contained in variable `toSend`
selection.Paste %// paste it onto the Word document
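
Since Method 3 overwrites whatever the user had on the clipboard, a slightly more polite variant could save the clipboard text first and put it back afterwards. This is a sketch under the assumption that Word is installed; `previousText` is an illustrative name, and only text can be restored this way (an image or other non-text content on the clipboard is still lost).

```matlab
toSend = ['Have you seen my ' char(9730) '?'];

% Open a new Word document (same setup as in the question).
wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
selection = wordApplication.Selection;

% Remember the current clipboard text ('' if it holds no text).
previousText = clipboard('paste');

% Copy the Unicode string and paste it into the document.
clipboard('copy', toSend);
selection.Paste

% Put the earlier text back, if there was any.
if ~isempty(previousText)
    clipboard('copy', previousText);
end
```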

#2


I tried this as well, and got the same issue you reported (I tested with MATLAB R2015a and Office 2013)...

I think something in the COM layer between MATLAB and Word is messing up the text encoding.

To confirm this is indeed a bug in MATLAB, I tried the same in Python, and it worked fine:

#!/usr/bin/env python

import os
import win32com.client

word = win32com.client.Dispatch("Word.Application")
word.Visible = True

doc = word.Documents.Add()

text = "Have you seen my " + chr(9730) + "?"  # chr() replaces Python 2's unichr()
word.Selection.TypeText(text)

fname = os.path.join(os.getcwd(), "out.docx")
doc.SaveAs2(fname)
doc.Close()

word.Quit()

I came up with two workarounds for MATLAB:

Method 1 (preferred):

The idea is to create a .NET assembly that uses Office Interop. It receives any Unicode string and writes it to a specified Word document. This assembly can then be loaded in MATLAB and used as a wrapper around MS Office.

Example in C#:

MSWord.cs

using System;
using Microsoft.Office.Interop.Word;

namespace MyOfficeInterop
{
    public class MSWord
    {
        // this is very basic, but you can expose anything you want!
        public void AppendTextToDocument(string filename, string str)
        {
            Application app = null;
            Document doc = null;
            try
            {
                app = new Application();
                doc = app.Documents.Open(filename);

                app.Selection.TypeText(str);
                app.Selection.TypeParagraph();

                doc.Save();
            }
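            finally
            {
                // close only what was actually opened, so a failure in
                // new Application() or Open() does not cause a
                // NullReferenceException here and hide the real error
                if (doc != null) doc.Close();
                if (app != null) app.Quit();
            }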
        }
    }
}

We compile it first:

csc.exe /nologo /target:library /out:MyOfficeInterop.dll /reference:"C:\Program Files (x86)\Microsoft Visual Studio 12.0\Visual Studio Tools for Office\PIA\Office15\Microsoft.Office.Interop.Word.dll" MSWord.cs

Then we test it from MATLAB:

%// load assembly
NET.addAssembly('C:\path\to\MyOfficeInterop.dll')

%// the document file must already exist, so create an empty one here
fname = fullfile(pwd,'test.docx');
fclose(fopen(fname,'w'));

%// some text
str = ['Have you seen my ' char(9730) '?'];

%// add text to Word document
word = MyOfficeInterop.MSWord();
word.AppendTextToDocument(fname, str);

Method 2:

This is more of a hack! We write the text from MATLAB directly to a correctly encoded text file, then use the COM/ActiveX interface to open it in MS Word and re-save it as a proper .docx Word document.

Example:

%// params
fnameTXT = fullfile(pwd,'test.txt');
fnameDOCX = fullfile(pwd,'test.docx');
str = ['Have you seen my ' char(9730) '?'];

%// create UTF-8 encoded text file
bytes = unicode2native(str, 'UTF-8');
fid = fopen(fnameTXT, 'wb');
fwrite(fid, bytes);
fclose(fid);

%// some office interop constants (extracted using IL DASM)
msoEncodingUTF8 = int32(hex2dec('0000FDE9'));         % MsoEncoding
wdOpenFormatUnicodeText = int32(hex2dec('00000005')); % WdOpenFormat
wdFormatDocumentDefault = int32(hex2dec('00000010')); % WdSaveFormat
wdDoNotSaveChanges = int32(hex2dec('00000000'));      % WdSaveOptions

%// start MS Word 
Word = actxserver('Word.Application');
%Word.Visible = true;

%// open text file in MS Word
doc = Word.Documents.Open(...
    fnameTXT, ...                % FileName
    [], ...                      % ConfirmConversions
    [], ...                      % ReadOnly
    [], ...                      % AddToRecentFiles
    [], ...                      % PasswordDocument
    [], ...                      % PasswordTemplate
    [], ...                      % Revert
    [], ...                      % WritePasswordDocument
    [], ...                      % WritePasswordTemplate
    wdOpenFormatUnicodeText, ... % Format
    msoEncodingUTF8, ...         % Encoding
    [], ...                      % Visible
    [], ...                      % OpenAndRepair
    [], ...                      % DocumentDirection
    [], ...                      % NoEncodingDialog
    []);                         % XMLTransform

%// save it as docx
doc.SaveAs2(...
    fnameDOCX, ...               % FileName
    wdFormatDocumentDefault, ... % FileFormat
    [], ...                      % LockComments
    [], ...                      % Password
    [], ...                      % AddToRecentFiles
    [], ...                      % WritePassword
    [], ...                      % ReadOnlyRecommended
    [], ...                      % EmbedTrueTypeFonts
    [], ...                      % SaveNativePictureFormat
    [], ...                      % SaveFormsData
    [], ...                      % SaveAsAOCELetter
    msoEncodingUTF8, ...         % Encoding
    [], ...                      % InsertLineBreaks
    [], ...                      % AllowSubstitutions
    [], ...                      % LineEnding
    [], ...                      % AddBiDiMarks
    []);                         % CompatibilityMode

%// close doc, quit, and cleanup
doc.Close(wdDoNotSaveChanges, [], [])
Word.Quit()
clear doc Word
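
To see what the text file in Method 2 actually contains, the encoding step can be checked on its own, without Word. The snippet below is only an illustration; `umbrellaBytes` and `restored` are names made up for the example. It also shows that the `msoEncodingUTF8` constant is the same code page number, 65001, that the question tried to assign to `TextEncoding`.

```matlab
str = ['Have you seen my ' char(9730) '?'];

% Encode the whole string: 18 ASCII characters take one byte each,
% the umbrella takes three, so 21 bytes in total.
bytes = unicode2native(str, 'UTF-8');

% The umbrella (U+2602) on its own becomes the bytes E2 98 82.
umbrellaBytes = unicode2native(char(9730), 'UTF-8');
disp(dec2hex(double(umbrellaBytes)))

% Decoding the bytes gives back the original string.
restored = native2unicode(bytes, 'UTF-8');

% 0xFDE9 is code page 65001 (UTF-8).
utf8CodePage = hex2dec('0000FDE9');
```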
