How to convert HTML to a document format? [duplicate]

Time: 2022-04-09 08:03:19

This question already has an answer here:

I'd like to be able to convert HTML to either docx or RTF. There are plenty of Ruby gems for creating docx and RTF documents, but they only create an empty document that you then add content to programmatically.

The problem with those gems is that there is no way to convert a webpage's formatting so that it looks the same, or at least similar, on a printable page. HTML tags add a lot of complexity, as does the positioning those tags receive from their CSS attributes.

Given the gems I currently know of for RTF and Word creation, I'd have to write an HTML parser and convert each HTML tag to a similar OpenXML element (bold, italic, and so on), then position content based on the CSS. Because of position: relative/absolute, rendering a document page that way would be extremely difficult.
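
To give a sense of what the hand-written approach involves, here is a minimal sketch in plain Ruby (stdlib only) that maps a tiny subset of HTML (bold, italic, underline, paragraphs, line breaks) to RTF control words. The helper `html_to_rtf` and the `RTF_TAGS` table are hypothetical names for illustration, not part of any gem. Every other tag, and all CSS and positioning, is simply dropped, which is exactly the hard part the question is about.

```ruby
require "cgi"

# Hypothetical mapping: HTML tag name => [RTF text on open, RTF text on close].
# Tags not listed here (div, span, tables, anything styled by CSS) are ignored.
RTF_TAGS = {
  "b"      => ['{\b ', '}'],
  "strong" => ['{\b ', '}'],
  "i"      => ['{\i ', '}'],
  "em"     => ['{\i ', '}'],
  "u"      => ['{\ul ', '}'],
  "p"      => ['', "\\par\n"],
  "br"     => ["\\line ", ''],
}.freeze

# Escape RTF special characters; non-ASCII characters become \uN? escapes,
# one per UTF-16 code unit (signed 16-bit, as RTF expects).
def rtf_escape(text)
  text.each_char.map do |ch|
    if "\\{}".include?(ch)
      "\\#{ch}"
    elsif ch.ascii_only?
      ch
    else
      ch.encode("UTF-16BE").unpack("n*").map do |unit|
        signed = unit > 32_767 ? unit - 65_536 : unit
        "\\u#{signed}?"
      end.join
    end
  end.join
end

# Convert a small HTML fragment into a complete (minimal) RTF document string.
def html_to_rtf(html)
  body = +""
  # Each match is either a tag (optional slash + name) or a run of text.
  html.scan(%r{<(/?)(\w+)[^>]*>|([^<]+)}) do |slash, name, text|
    if text
      # Collapse whitespace roughly like a browser, then decode entities.
      body << rtf_escape(CGI.unescapeHTML(text.gsub(/\s+/, " ")))
    elsif (pair = RTF_TAGS[name.downcase])
      body << (slash.empty? ? pair[0] : pair[1])
    end
  end
  "{\\rtf1\\ansi\\deff0 {\\fonttbl {\\f0 Times New Roman;}}\n#{body}}"
end
```

For example, `html_to_rtf("<p>Hello <b>bold</b></p>")` yields an RTF body of `Hello {\b bold}\par`. A real converter would need a proper HTML parser and a CSS model on top of this, which is why the question asks for an existing tool.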

I'm wondering if there are any recent developments, or if there is some soon-to-be-released gem, service, or tool that can handle this conversion.

There is a gem that is supposed to convert Word to and from HTML, but it has no documentation and can only be found at https://www.ruby-toolbox.com/gems/word_parsing and on RubyGems. I've also been unable to install it on my local machine because of dependency issues, and with no documentation there is no guidance on how to fix them.

There are services that convert PDF to "Word", and converting HTML to PDF has already been solved by several people and gems. This service, http://www.pdftoword.com/, converts PDF to RTF and even separates out the images in the resulting document. The problem is that it runs on a Windows server; I need something cross-platform, because the app I'm working on is a Ruby on Rails app running on Unix-based servers.

2 answers

#1

I've published a small gem that generates docx files from HTML templates.

https://github.com/docxtor/docxtor

It can insert page numbers and headers/footers built from the contents of given <div>s, and it translates <h1> headings into document headings.
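
docxtor's own internals aren't shown in this answer. As a rough, hypothetical illustration of what "translating <h1> headings to document headings" means at the XML level, the Ruby sketch below turns `<h1>`–`<h6>` and `<p>` elements into WordprocessingML `<w:p>` paragraphs for `word/document.xml`, giving headings a `w:pStyle` of `Heading1`, `Heading2`, and so on. The helper name `html_block_to_wordml` is made up; it strips inline markup, and it does not build the rest of the docx package (the ZIP container, `[Content_Types].xml`, relationships, `styles.xml`).

```ruby
require "cgi"

# Hypothetical helper (not docxtor's code): convert block-level <h1>..<h6>
# and <p> elements into WordprocessingML paragraph elements.
# "HeadingN" is a style ID; it only renders as a heading if the package's
# word/styles.xml defines a style with that ID.
def html_block_to_wordml(html)
  html.scan(%r{<(h[1-6]|p)\b[^>]*>(.*?)</\1>}mi).map do |tag, inner|
    # Drop inline tags (bold, links, ...) and decode entities to plain text.
    text = CGI.unescapeHTML(inner.gsub(/<[^>]+>/, "")).strip
    style = tag.downcase.start_with?("h") ? "Heading#{tag[1]}" : nil
    ppr = style ? %(<w:pPr><w:pStyle w:val="#{style}"/></w:pPr>) : ""
    # Re-escape the text so the XML stays well-formed.
    %(<w:p>#{ppr}<w:r><w:t xml:space="preserve">#{CGI.escapeHTML(text)}</w:t></w:r></w:p>)
  end.join("\n")
end
```

Because each word processor is picky about different parts of the package, output like this is exactly where the cross-application compatibility problems described below tend to show up.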

The catch is that each word processor parses the docx format a little differently, so the resulting files open fine in LibreOffice on Mac but won't open in Google Docs.

Any help and/or feedback on the gem is much appreciated!

#2

I'm also looking for this kind of solution. I think https://github.com/bagilevi/docx_builder is worth a look, although I haven't tried it yet. This article is also worth reading: http://rubythings.blogspot.com/2011/05/creating-word-documents-in-rails.html

If someone could come up with a better solution, we would all be thankful :)