How to convert an MS Word file with images and equations to HTML in C# with the help of MemoryStream

Time: 2022-10-30 15:26:59

I am using the code below and it works fine: it converts a Word file into an HTML file with images.

The problem is with equations: I am unable to convert the equations in the MS Word file to HTML.

Can anybody help?

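// Required namespaces: System.IO, System.Drawing.Imaging, System.Xml.Linq,
// DocumentFormat.OpenXml.Packaging, OpenXmlPowerTools.
// Assumption: the uploaded file is the source document. The original post does not
// show where sourceDocumentFileName and fileInfo are defined.
string sourceDocumentFileName = Server.MapPath(FileUpload1.FileName);
FileInfo fileInfo = new FileInfo(sourceDocumentFileName);
FileUpload1.SaveAs(sourceDocumentFileName);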

string imageDirectoryName = FileUpload1.FileName + "_files";
DirectoryInfo dirInfo = new DirectoryInfo(Server.MapPath(imageDirectoryName));

if (dirInfo.Exists)
{
        // Delete the directory and everything in it.
        dirInfo.Delete(true);
}

int imageCounter = 0;

byte[] byteArray = File.ReadAllBytes(sourceDocumentFileName);

using (MemoryStream memoryStream = new MemoryStream())
{
        memoryStream.Write(byteArray, 0, byteArray.Length);
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open(memoryStream, true))
        {
            HtmlConverterSettings settings = new HtmlConverterSettings()
            {
                //PageTitle = "Test Title",
                //ConvertFormatting = false,
            };
            XElement html = HtmlConverter.ConvertToHtml(doc, settings,
                imageInfo =>
                {
                    DirectoryInfo localDirInfo = new DirectoryInfo(Server.MapPath(imageDirectoryName));
                    if (!localDirInfo.Exists)
                        localDirInfo.Create();
                    ++imageCounter;
                    string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                    ImageFormat imageFormat = null;
                    if (extension == "png")
                    {
                        // Convert the .png file to a .jpeg file.
                        extension = "jpeg";
                        imageFormat = ImageFormat.Jpeg;
                    }
                    else if (extension == "bmp")
                        imageFormat = ImageFormat.Bmp;
                    else if (extension == "jpeg")
                        imageFormat = ImageFormat.Jpeg;
                    else if (extension == "tiff")
                        imageFormat = ImageFormat.Tiff;
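                    else if (extension == "wmf")
                    {
                        // The image is saved as JPEG data, so use a matching file extension.
                        extension = "jpeg";
                        imageFormat = ImageFormat.Jpeg;
                    }
                    // A second "png" branch was removed here: it was unreachable because
                    // "png" is already handled by the first condition.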


                    // If the image format is not one that you expect, ignore it,
                    // and do not return markup for the link.
                    if (imageFormat == null)
                        return null;

                    string imageFileName = imageDirectoryName + "/image" +
                        imageCounter.ToString() + "." + extension;
                    try
                    {
                        imageInfo.Bitmap.Save(Server.MapPath(imageFileName), imageFormat);
                    }
                    catch (System.Runtime.InteropServices.ExternalException)
                    {
                        return null;
                    }
                    XElement img = new XElement(Xhtml.img,
                        new XAttribute(NoNamespace.src, imageFileName),
                        imageInfo.ImgStyleAttribute,
                        imageInfo.AltText != null ?
                            new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                    return img;
                });
            // Write the HTML next to the source document, with an .html extension.
            File.WriteAllText(Path.ChangeExtension(fileInfo.FullName, ".html"),
                html.ToStringNewLineOnAttributes());
        }
}
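
The code above copies the file bytes into an empty MemoryStream instead of wrapping the array directly. One likely reason (an assumption; the post does not say so) is that a stream created with new MemoryStream(byte[]) has a fixed capacity, while WordprocessingDocument.Open(stream, true) opens the package for editing and may need to grow it. A minimal sketch of the difference, using only System.IO; the ExpandableStreamDemo class and its method names are hypothetical:

```csharp
using System;
using System.IO;

public static class ExpandableStreamDemo
{
    // Copies the bytes into a resizable MemoryStream and rewinds it to the start.
    public static MemoryStream ToExpandableStream(byte[] bytes)
    {
        var stream = new MemoryStream();
        stream.Write(bytes, 0, bytes.Length);
        stream.Position = 0;
        return stream;
    }

    // Returns true if the stream can grow past its current length.
    // The length is restored afterwards, so the content is left unchanged.
    public static bool CanGrow(MemoryStream stream)
    {
        long original = stream.Length;
        try
        {
            stream.SetLength(original + 1);
            stream.SetLength(original);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown by streams created over a fixed byte array.
            return false;
        }
    }
}
```

Calling CanGrow(ToExpandableStream(bytes)) should report a growable stream, while CanGrow(new MemoryStream(bytes)) should not, which is why the write-then-open pattern is the safer choice when the document is opened for editing.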

1 Answer

#1



Step 1 - Go here to understand how to get the Math object from a Word file.

Step 2 - Loop through the paragraphs of the Word file and select the OfficeMath objects in them. Transform each one to MathML (see step 1); you can then transform it to LaTeX if you want (I think LaTeX is friendlier to use in HTML).

Note: The transform to LaTeX is similar to the OMML2MML transform in step 1; see here to get the file.

Step 3 - Before or after each object from step 2, insert a text object whose content is the MathML/LaTeX produced in step 2. This step is needed because HtmlConverter.ConvertToHtml skips the math objects in the Word content; the text you insert before or after the math object does appear in the HTML.

This is my code:

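// Required namespaces: System.IO, System.Linq, System.Text, System.Xml, System.Xml.Xsl,
// DocumentFormat.OpenXml.Packaging, DocumentFormat.OpenXml.Wordprocessing.
// officeMathMLSchemaFilePath, officeMLFormulas and MathML2Latex are not shown in this answer.

// Load the OfficeMathML-to-MathML stylesheet once instead of once per equation.
XslCompiledTransform xslTransform = new XslCompiledTransform();
xslTransform.Load(officeMathMLSchemaFilePath);

// The document is opened for editing, so the inserted runs are saved when it is disposed.
using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, true))
{
    foreach (var paragraph in doc.MainDocumentPart.RootElement.Descendants<Paragraph>())
    {
        // Snapshot the equations so that inserting runs does not disturb the enumeration.
        foreach (var ele in paragraph.Descendants<DocumentFormat.OpenXml.Math.OfficeMath>().ToList())
        {
            string wordDocXml = ele.OuterXml;
            var result = "";

            // Read the XML of this equation only.
            using (TextReader tr = new StringReader(wordDocXml))
            using (XmlReader reader = XmlReader.Create(tr))
            using (MemoryStream ms = new MemoryStream())
            {
                XmlWriterSettings settings = xslTransform.OutputSettings.Clone();

                // Configure the XML writer to emit a fragment without an XML declaration.
                settings.ConformanceLevel = ConformanceLevel.Fragment;
                settings.OmitXmlDeclaration = true;

                XmlWriter xw = XmlWriter.Create(ms, settings);

                // Transform the OfficeMathML to MathML.
                xslTransform.Transform(reader, xw);

                // Flush buffered output before reading the stream back.
                xw.Flush();
                ms.Seek(0, SeekOrigin.Begin);

                using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
                {
                    result = MathML2Latex(sr.ReadToEnd());
                    officeMLFormulas.Add(result);
                }
            }

            // Insert the LaTeX as plain text just before the equation.
            Run run = new Run();
            run.Append(new Text(result));
            ele.InsertBeforeSelf(run);
        }
    }
}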
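
The transform step in the code above can be isolated into a small helper that returns the result as a string. This is only a sketch under stated assumptions: XsltFragmentDemo, LoadStylesheet, TransformFragment and DemoXslt are hypothetical names, and DemoXslt is a tiny stand-in stylesheet (it renames an in element to out) used in place of the real OfficeMathML-to-MathML stylesheet, which is not reproduced here. Writing to a StringWriter instead of a MemoryStream avoids the seek and decode steps.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltFragmentDemo
{
    // Hypothetical stand-in stylesheet: turns <in>text</in> into <out>text</out>.
    public const string DemoXslt =
@"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/in'><out><xsl:value-of select='.'/></out></xsl:template>
</xsl:stylesheet>";

    // Compiles a stylesheet from its text; in real use, load the file once and reuse it.
    public static XslCompiledTransform LoadStylesheet(string xsltText)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(xsltText)))
        {
            transform.Load(reader);
        }
        return transform;
    }

    // Applies the stylesheet to one XML fragment and returns the output as a string.
    public static string TransformFragment(XslCompiledTransform transform, string xml)
    {
        XmlWriterSettings settings = transform.OutputSettings.Clone();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.OmitXmlDeclaration = true;

        var output = new StringWriter();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            transform.Transform(reader, writer);
        } // Disposing the writer flushes everything into the StringWriter.
        return output.ToString();
    }
}
```

In the answer's loop, the equivalent call would be TransformFragment(xslTransform, ele.OuterXml), with the returned MathML then passed on to the LaTeX conversion.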
