I have a string that looks like this:
<a href="http://forum.tibia.com/forum/?action=board&boardid=476">Amera</a><br><font class="ff_info">This board is for general discussions related to the game world Amera.</font>
How can I ignore/remove everything after the </a> and then get only the URL (http://forum.tibia.com/forum/?action=board&boardid=476) and the value Amera?
So afterwards, I want 2 variables with their values, like:
string url = "http://forum.tibia.com/forum/?action=board&boardid=476";
and
string value = "Amera";
I tried this to get the value:
string value = System.Text.RegularExpressions.Regex.Replace(MYSTRING, "(<[a|A][^>]*>|)", "");
But it returns:
Amera</a><br><font class="ff_info">This board is for general discussions related to the game world Amera.</font>
3 solutions
#1
0
To get the URL, try a regex pattern like: href="(.*?)" (the lazy .*? stops at the first closing quote; a greedy .* would run on to the last quote in the string, inside the <font> tag)
...and to get the text between > and </a> (Amera here), use a pattern like: >(.+?)</a>
...although this is far from perfect.
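A minimal C# sketch of the two patterns from this answer, assuming the input is the single-link string from the question. The names AnchorPatterns, ExtractUrl and ExtractText are made up for illustration; only Regex.Match and Groups come from .NET. The href pattern is written lazily so that it stops at the first closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AnchorPatterns
{
    // Lazy (.*?) stops at the first closing quote after href=".
    public static string ExtractUrl(string html)
    {
        Match m = Regex.Match(html, "href=\"(.*?)\"");
        return m.Success ? m.Groups[1].Value : null;
    }

    // Captures the text between the first '>' and the next "</a>".
    public static string ExtractText(string html)
    {
        Match m = Regex.Match(html, ">(.+?)</a>");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Calling AnchorPatterns.ExtractUrl(MYSTRING) and AnchorPatterns.ExtractText(MYSTRING) on the string from the question should give the URL and Amera respectively. Both methods return null when nothing matches, so check for that before using the result.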
#2
0
If the <a> tag won't contain any other attributes, you can use just this for the URL:
\bhref="(.*?)"
And a slightly more complex pattern for both the URL and the text:
<a\b[^>]*?\bhref="([^"]*?)"[^>]*?>(.*?)<\/a>
So in C# code (quotation marks need to be escaped!):
var html = "<a href=\"http://forum.tibia.com/forum/?action=board&boardid=476\">Amera</a><br><font class=\"ff_info\">This board is for general discussions related to the game world Amera.</font>";
var match = Regex.Match(html, "<a\\b[^>]*?\\bhref=\"([^\"]*?)\"[^>]*?>(.*?)<\\/a>", RegexOptions.IgnoreCase);
if (match.Success) {
var url = match.Groups[1].Value;  // the href value
var text = match.Groups[2].Value; // the link text, "Amera"
}
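If the string might hold more than one link, the pattern from this answer can be run with Regex.Matches instead of Regex.Match. The following is only a sketch under that assumption: LinkExtractor and Extract are hypothetical names, the named groups url and text are added for readability, and RegexOptions.Singleline is an extra assumption so that link text spanning several lines is still matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class LinkExtractor
{
    // Pattern from this answer, with named groups instead of numbered ones.
    private static readonly Regex AnchorRegex = new Regex(
        "<a\\b[^>]*?\\bhref=\"(?<url>[^\"]*?)\"[^>]*?>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one (url, text) pair per <a href="..."> element found.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorRegex.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["text"].Value));
        }
        return links;
    }
}
```

For the string in the question, LinkExtractor.Extract(html) should return a single pair whose Key is the URL and whose Value is Amera. An empty list means no link was found.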
#3
0
Try this:
HtmlDocument dc = new HtmlAgilityPack.HtmlDocument();
dc.LoadHtml("<a href='http://forum.tibia.com/forum/?action=board&boardid=476'>Amera</a><br><font class='ff_info'>This board is for general discussions related to the game world Amera.</font>");
foreach (HtmlNode link in dc.DocumentNode.SelectNodes("//a")) // "//a" finds <a> at any depth; SelectNodes returns null if there are none
{
string url = link.Attributes["href"].Value; // http://forum.tibia.com/forum/?action=board&boardid=476
string value = link.InnerText; // Amera
}