How do I convert HTML to Textile?

Date: 2022-10-30 13:54:12

I'm scraping a static HTML site and moving the content into a database-backed CMS. I'd like to use Textile in the CMS.

Is there a tool out there that converts HTML into Textile, so I can scrape the existing site, convert the HTML to Textile, and insert that data into the database?

5 answers

#1 (score: 1)

I know this is an old question, but I found myself trying to do this the other day and not finding anything useful, until I found Pandoc. It can convert loads of other markup formats as well - it's quite brilliant.
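For anyone wiring this into a scraper, here is a minimal sketch of calling Pandoc from Java. It assumes pandoc is installed and on the PATH; the class name, method names, and file names are made up for illustration. The flags used are pandoc's standard ones: -f html (input format), -t textile (output format), and -o (output file).

```java
import java.io.IOException;
import java.util.List;

public class PandocHtmlToTextile {
    // Builds the pandoc command line: read HTML (-f html), write Textile (-t textile).
    static List<String> buildCommand(String inputHtml, String outputTextile) {
        return List.of("pandoc", "-f", "html", "-t", "textile", "-o", outputTextile, inputHtml);
    }

    // Runs pandoc and returns its exit code. Assumes pandoc is installed and on the PATH.
    static int convert(String inputHtml, String outputTextile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(inputHtml, outputTextile))
                .inheritIO() // show pandoc's warnings and errors on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // Usage: java PandocHtmlToTextile page.html page.textile
            System.out.println("pandoc exited with code " + convert(args[0], args[1]));
        } else {
            // Without arguments, just show the command that would be run
            // (page.html and page.textile are hypothetical file names).
            System.out.println(String.join(" ", buildCommand("page.html", "page.textile")));
        }
    }
}
```

Calling convert once per scraped page keeps the conversion step separate from the scraping and the database insert, so any one of them can be swapped out later.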

#2 (score: 0)

Here is a C# library that converts HTML to Textile. Note that it produces Textile with the authors' own additions, not pure Textile.

#3 (score: 0)

Since there was no JavaScript implementation, I wrote one: https://github.com/cmroanirgo/to-textile

It's a little primitive at the moment, as it's a blind port of the 'to-markdown' equivalent, but should get the job done.

#4 (score: -1)

Try this simple Java code; I hope it works for you.

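import java.net.*;
import java.io.*;

// Downloads one page and saves the raw HTML to a file.
// No Textile conversion happens here.
class Crawle
{
    public static void main(String[] args) throws Exception
    {
        // The part after '#' is a URL fragment and is never sent to the server.
        URL url = new URL("https://www.google.co.in/#q=i+am+happy");

        // Make sure the output directory exists before opening the file.
        new File("crawler").mkdirs();

        // try-with-resources closes the reader and the writer even if an error occurs.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
             PrintWriter pr = new PrintWriter(new FileOutputStream("crawler/file.txt"), true))
        {
            String data;
            while ((data = br.readLine()) != null)
            {
                pr.println(data);
                System.out.println(data);
            }
        }
    }
}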

#5 (score: -2)

This is a simple markup replacement, nothing a good regex could not fix.

I recommend Perl, LWP::Simple and some regexes to do the whole thing (spidering, stripping design and menus, converting to Textile, and then posting to the database).
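To give an idea of what the regex approach looks like, here is a rough sketch of the conversion step only. It is written in Java rather than Perl to match the other code in this thread, and the class name and the handful of tags it covers are my own choices for illustration. It assumes simple, well-formed markup; nested or malformed HTML will trip it up, and a real parser is the safer route there.

```java
public class NaiveHtmlToTextile {
    // Converts a small subset of HTML to Textile with plain regex replacements.
    static String convert(String html) {
        String t = html;
        // Headings: <h2>Title</h2> becomes "h2. Title" followed by a blank line.
        t = t.replaceAll("(?is)<h([1-6])[^>]*>(.*?)</h\\1>", "h$1. $2\n\n");
        // Bold and italic: <strong>/<b> become *text*, <em>/<i> become _text_.
        t = t.replaceAll("(?is)<(strong|b)>(.*?)</\\1>", "*$2*");
        t = t.replaceAll("(?is)<(em|i)>(.*?)</\\1>", "_$2_");
        // Links: <a href="url">text</a> becomes "text":url
        t = t.replaceAll("(?is)<a\\s[^>]*href=\"([^\"]*)\"[^>]*>(.*?)</a>", "\"$2\":$1");
        // Paragraphs become text followed by a blank line; <br> becomes a newline.
        t = t.replaceAll("(?is)<p[^>]*>(.*?)</p>", "$1\n\n");
        t = t.replaceAll("(?i)<br\\s*/?>", "\n");
        // Drop any tags that are left (design, menus, and so on).
        t = t.replaceAll("(?s)<[^>]+>", "");
        return t.trim();
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><p>Some <strong>bold</strong> and <em>italic</em> text.</p>";
        System.out.println(convert(html));
    }
}
```

Lists, tables, images, and HTML entities are not handled at all here, which is where the regex approach starts to need a lot more work.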
