How to convert a docx file to an html file using Open XML while keeping the formatting?

Date: 2021-08-17 06:20:21

I know there are a lot of questions with the same title, but I am still stuck and none of them showed me the right way to go.

I am using the Open XML SDK 2.5 together with PowerTools for Open XML to convert a .docx file to an .html file; the conversion is done by the HtmlConverter class.

I can successfully convert the docx file into an HTML file, but the problem is that the HTML file doesn't retain the original formatting of the document. For example, font size, color, underline, and bold are not reflected in the HTML file.

Here is my existing code:

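using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public void ConvertDocxToHtml(string fileName)
{
    // Copy the file into an expandable in-memory stream so the file on disk is not modified.
    byte[] byteArray = File.ReadAllBytes(fileName);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        memoryStream.Write(byteArray, 0, byteArray.Length);
        // Open the in-memory copy in editable mode.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
        {
            HtmlConverterSettings settings = new HtmlConverterSettings()
            {
                PageTitle = "My Page Title"
            };
            // Convert the document to an XHTML tree and write it to a fixed output path.
            XElement html = HtmlConverter.ConvertToHtml(doc, settings);
            File.WriteAllText(@"E:\Test.html", html.ToStringNewLineOnAttributes());
        }
    }
}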

So I just want to know whether there is any way to retain the formatting in the converted HTML file.

I know about some third-party APIs that do the same thing, but I would prefer a way to do this using Open XML or some other open-source library.

4 answers

#1


6  

PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See http://bit.ly/1bclyg9

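As a rough illustration of how the CSS-based converter described above might be called, here is a minimal sketch. It assumes a PowerTools for Open XML release that exposes HtmlConverterSettings with the FabricateCssClasses, CssClassPrefix, RestrictToSupportedLanguages, and RestrictToSupportedNumberingFormats properties (these names follow the sample code shipped with the library as I recall it; check them against the version you have installed). The class name DocxToHtmlSketch, the method name, and both path parameters are hypothetical. No ImageHandler is set here, so images are likely to be omitted from the output.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Hypothetical helper class; requires references to DocumentFormat.OpenXml and OpenXmlPowerTools.
public static class DocxToHtmlSketch
{
    public static void Convert(string docxPath, string htmlPath)
    {
        // Work on an in-memory copy so the source file on disk is never modified.
        byte[] bytes = File.ReadAllBytes(docxPath);
        using (var stream = new MemoryStream())
        {
            stream.Write(bytes, 0, bytes.Length);
            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                var settings = new HtmlConverterSettings
                {
                    PageTitle = Path.GetFileNameWithoutExtension(docxPath),
                    // Assumption: these settings ask the converter to emit CSS classes
                    // (prefixed with "pt-") that carry the formatting.
                    FabricateCssClasses = true,
                    CssClassPrefix = "pt-",
                    RestrictToSupportedLanguages = false,
                    RestrictToSupportedNumberingFormats = false
                };

                XElement html = HtmlConverter.ConvertToHtml(doc, settings);

                // Write the markup without re-indenting it, encoded as UTF-8.
                File.WriteAllText(htmlPath,
                    html.ToString(SaveOptions.DisableFormatting),
                    Encoding.UTF8);
            }
        }
    }
}
```

A call such as DocxToHtmlSketch.Convert(@"E:\Test.docx", @"E:\Test.html") would then write the converted page next to the source document; both paths are placeholders.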

#2


1  

You might want to find an external tool to help you do this, like Aspose.Words.

#3


0  

Your end result will not look exactly like your Word document, but this link might help.

#4


0  

You can use the OpenXML Viewer extension for Firefox (http://openxmlviewer.codeplex.com) to convert a document with its formatting. This works for me. Hope this helps.
