Library for converting Word document text to HTML

Time: 2022-10-30 13:06:27

Is there an open-source .NET library that converts a Word document to HTML for display inside a web page?

I know several tools that convert Word documents to HTML files, but my requirement is to convert the document (either from the file or from just its extracted text) to HTML on the fly in an ASP.NET application.

I found that the PHP library from "Converting a Word document into usable HTML in PHP" does the same thing. Is there a similar tool in .NET?

2 solutions

#1

You just want to convert a *.doc file to HTML? Is saving it as an HTML file an option?

Word's object model has the standard SaveAs method, which can save a document as HTML:

wdFormatHTML: Saves all text and formatting with HTML tags so that the resulting document can be viewed in a Web browser.

Source: MSDN, SaveAs Method

You can find an example tutorial on using this method to convert .doc files to other formats here: How to convert DOC into other formats using C#.
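A minimal sketch of the SaveAs approach, written in Python with pywin32 (win32com); the same Documents.Open, SaveAs and Close calls are available from C# through Microsoft.Office.Interop.Word. It assumes Windows with Microsoft Word installed. The values 8 (wdFormatHTML) and 10 (wdFormatFilteredHTML) come from Word's WdSaveFormat enumeration; the helper names save_doc_as_html and convert_with_word are hypothetical. Note that Microsoft does not recommend or support automating Office from server-side code such as an ASP.NET application, so this fits desktop or batch conversion better than per-request use.

```python
import os

# Values from Word's WdSaveFormat enumeration.
WD_FORMAT_HTML = 8            # wdFormatHTML: full HTML, keeps Office-specific markup
WD_FORMAT_FILTERED_HTML = 10  # wdFormatFilteredHTML: removes Office-specific tags


def save_doc_as_html(word_app, src_path, dst_path, file_format=WD_FORMAT_HTML):
    """Open src_path in an existing Word instance and save it as HTML (hypothetical helper).

    word_app is any object exposing Word's COM interface (Documents.Open, ...),
    so the logic can be exercised with a stand-in object when Word is absent.
    """
    doc = word_app.Documents.Open(src_path)
    try:
        # SaveAs(FileName, FileFormat); positional arguments suit late-bound COM.
        doc.SaveAs(dst_path, file_format)
    finally:
        doc.Close(False)  # False: do not write changes back to the source file
    return dst_path


def convert_with_word(src_path, dst_path, file_format=WD_FORMAT_HTML):
    """Start a hidden Word instance, convert one file, then quit Word.

    Requires Windows, Microsoft Word and pywin32. Absolute paths are passed
    so the result does not depend on Word's current folder.
    """
    import win32com.client  # imported lazily: only needed for a real conversion

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        return save_doc_as_html(word, os.path.abspath(src_path),
                                os.path.abspath(dst_path), file_format)
    finally:
        word.Quit()
```

For embedding in a web page, passing WD_FORMAT_FILTERED_HTML usually gives leaner markup than wdFormatHTML.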

If you have *.docx files instead of *.doc files, it is even easier, because you can use the Open XML API, as explained on MSDN here: Manipulating Word 2007 Files with the Open XML Format API (Part 1 of 3). Once you have the XML of the Word file, you can transform it into any output format you want, including HTML.
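To illustrate the "get the XML, then emit HTML" idea: a .docx file is a ZIP package, and the main body text lives in word/document.xml as WordprocessingML (w:p paragraphs containing w:r runs containing w:t text). The sketch below uses only Python's standard library instead of the Open XML SDK; docx_to_html is a hypothetical helper that handles plain paragraphs plus bold and italic, and ignores tables, lists, images, styles and text boxes. In .NET the same approach works with System.IO.Compression.ZipArchive and System.Xml.Linq, or with the Open XML SDK.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(prop):
    """True if a toggle property such as <w:b/> is present and not switched off."""
    return prop is not None and prop.get(W + "val", "true") not in ("0", "false", "off")


def _run_to_html(run):
    """Escaped text of one <w:r> run, wrapped in <strong>/<em> if bold/italic."""
    text = "".join(t.text or "" for t in run.iter(W + "t"))
    if not text:
        return ""
    out = html.escape(text)
    props = run.find(W + "rPr")  # run properties (formatting), if any
    if props is not None:
        if _is_on(props.find(W + "i")):
            out = f"<em>{out}</em>"
        if _is_on(props.find(W + "b")):
            out = f"<strong>{out}</strong>"
    return out


def docx_to_html(path_or_file):
    """Return an HTML fragment with one <p> per paragraph of the document body."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        inner = "".join(_run_to_html(run) for run in para.iter(W + "r"))
        paragraphs.append(f"<p>{inner}</p>")
    return "\n".join(paragraphs)
```

Because docx_to_html accepts either a path or a file-like object, it can convert an uploaded file in memory, which matches the "on the fly" requirement in the question.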

#2

First, convert your .doc files to PDF with JODConverter and OpenOffice.

(See "How to convert ppt to images in Ruby?" for reference.)

Then use pdftohtml (http://pdftohtml.sourceforge.net), a utility that converts PDF files into HTML.

In my experience, this pipeline gives very good results.
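A sketch of the two-step pipeline, assuming both tools are installed and on the PATH. It substitutes LibreOffice's own command line (soffice --headless --convert-to pdf) for JODConverter, which is a Java library that drives OpenOffice/LibreOffice for the same purpose; the helper names are hypothetical. pdftohtml's -noframes flag asks for a single HTML file instead of a frameset; check the flags against your pdftohtml version (most Linux distributions now ship it as part of poppler-utils).

```python
import shutil
import subprocess
from pathlib import Path


def office_to_pdf_cmd(doc_path, out_dir, soffice="soffice"):
    """Headless LibreOffice command that writes <stem>.pdf into out_dir."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(doc_path)]


def pdf_to_html_cmd(pdf_path, html_base, pdftohtml="pdftohtml"):
    """pdftohtml command; -noframes writes one HTML file instead of a frameset."""
    return [pdftohtml, "-noframes", str(pdf_path), str(html_base)]


def convert_doc_to_html(doc_path, work_dir):
    """Run doc -> pdf -> html and return the path where the HTML is expected."""
    for tool in ("soffice", "pdftohtml"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    doc_path, work_dir = Path(doc_path), Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: .doc -> .pdf (LibreOffice keeps the file stem, changes the extension)
    subprocess.run(office_to_pdf_cmd(doc_path, work_dir), check=True)
    pdf_path = work_dir / (doc_path.stem + ".pdf")
    # Step 2: .pdf -> .html (pdftohtml appends ".html" to the base name)
    html_base = work_dir / doc_path.stem
    subprocess.run(pdf_to_html_cmd(pdf_path, html_base), check=True)
    return work_dir / (doc_path.stem + ".html")
```

From ASP.NET, the same two commands can be started with System.Diagnostics.Process.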
