Remove bold and italic formatting from Wikipedia XML text?

Time: 2023-01-14 23:58:40

This page, http://en.wikipedia.org/wiki/Help:Wiki_markup#Text_formatting%20first%20point, states that bold or italic text is enclosed in apostrophes: 2 apostrophes for ''italic text'', 3 apostrophes for '''bold text''', and 5 apostrophes for '''''bold italics'''''. I want a function that takes a String containing this kind of formatting as input and removes the markup, returning clean text. What kind of regex should I write in Java to achieve this? I am new to regexes and have no clue how to do this. Sample content:

Input

ranked him #'''89''' of the top 500 singles wrestlers

Output

ranked him #89 of the top 500 singles wrestlers

2 Solutions

#1



Try replaceAll():

    String sample = "ranked him #'''89''' of the top 500 singles wrestlers";
    // Replace every apostrophe with the empty string.
    System.out.println(sample.replaceAll("'", ""));

Output:

ranked him #89 of the top 500 singles wrestlers
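
One caveat worth checking before relying on this: the pattern "'" matches every apostrophe, not only the doubled or tripled ones used for markup. The sketch below wraps the same call in a hypothetical helper (the class name, method name, and sample sentence are made up for illustration) to show what happens to ordinary apostrophes in the text.

```java
public class ApostropheCaveat {
    // Same call as in the answer above: removes every apostrophe in the string.
    static String stripAllApostrophes(String s) {
        return s.replaceAll("'", "");
    }

    public static void main(String[] args) {
        // Illustrative input mixing wiki italics with ordinary apostrophes.
        String sample = "the band's album ''Greatest Hits'' didn't chart";
        System.out.println(stripAllApostrophes(sample));
        // prints: the bands album Greatest Hits didnt chart
    }
}
```

The italic markers are gone, but so are the apostrophes in "band's" and "didn't", so this is only safe when the input contains no apostrophes outside the markup.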

#2



You can quickly replace groups of 2-3 apostrophes with the following regex:

[']{2,3}

Search for that pattern and replace with nothing. This should work since you're not trying to extract matches.
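
Since the question asks for Java, here is a minimal sketch of how that pattern might be applied. The class and method names are hypothetical; only the regex itself comes from the answer above.

```java
import java.util.regex.Pattern;

public class WikiQuoteStripper {
    // The pattern from the answer above: a run of two or three apostrophes.
    private static final Pattern QUOTE_RUN = Pattern.compile("[']{2,3}");

    // Removes bold/italic apostrophe markup and leaves single apostrophes alone.
    static String stripQuoteMarkup(String text) {
        return QUOTE_RUN.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripQuoteMarkup("ranked him #'''89''' of the top 500 singles wrestlers"));
        // prints: ranked him #89 of the top 500 singles wrestlers
        System.out.println(stripQuoteMarkup("it's '''''bold italics''''' here"));
        // prints: it's bold italics here
    }
}
```

A run of five apostrophes is consumed as a match of three followed by a match of two, so bold italics are handled as well, while a lone apostrophe such as the one in "it's" never matches because the quantifier requires at least two.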
