Regex to extract yearid="10287" from a website's source code

Date: 2023-01-19 00:16:37

I have the source code of a website and I want to select yearid="10287" from it using regex. I know I can easily do it with JSoup, but I don't want to add the library to my project for this sole purpose.


Facts about yearid="10287"


  • yearid is constant, i.e. the letters never change.


  • The value 10287 varies: it might be 84748 or 746, but it is always a number.


  • yearid="10287" appears more than once in the source code, but I only need a single occurrence.

Currently I am trying this:


\s*[yearid]0-9 

but it doesn't seem to work.


Sample HTML

//Skipped the meta and header because I don't need it.
    ...
    <body class="sin" yearid="10287" ezaw='580' ezar='400' style='min-height:200px>
    <div class="ks">
        <div class="wrap">

            <div class="content-right-sidebar-wrap">
                <main class="content">

                    //A lot of unneeded tags

                    <article class="post-1989009 post type-post post" itemscope="" itemtype="http://schema.org/CreativeWork">
                        <header class="post-header">
                            <h1 class="post-title" itemprop="headline">Tyh RGB  Marco to habits gtr</h1>
                            <img src="https://ohniee.com/wp-content/uploads/avatars/1/djsy8933e89ufio8389e8-author-img.jpg" class="avatar user-1-avatar avatar-40 photo" width="40" height="40" alt="Profile photo of Johnnie Adams">

                            <div class="entry-meta" style="padding-top:3px; margin-left: 50px">
                            " Written by "<a href="/authors/johnnie"><span class="entry-author" itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Johnnie Adams</span></span></a> <script>
                            document.write(" on April 23rd, 2002 11:28 PM")</script>" on April 23rd, 2002 11:28 PM  .  "<span class="entry-comments-link"><a href="https://johniee.com/2002/04/thalo-in-American-film-industryk.html#comments">1 Comment</a></span>
                            </div>
                        </header>

                        //A lot of unneeded tags

                       ...

2 Answers

#1



Try this regex: /yearid="[0-9]+"/


http://regexr.com/3dn2j
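
A minimal sketch of how that pattern could be applied in plain Java, without JSoup. The class name YearIdExtractor, the method firstYearId, and the capture group around the digits are assumptions added for illustration; they are not part of the answer above. Matcher.find() stops at the first occurrence, which fits the requirement of needing only one match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearIdExtractor {
    // Same pattern as the answer, with a capture group (an addition) around the digits.
    private static final Pattern YEAR_ID = Pattern.compile("yearid=\"([0-9]+)\"");

    /** Returns the digits of the first yearid attribute, or empty if there is none. */
    public static Optional<String> firstYearId(String html) {
        Matcher m = YEAR_ID.matcher(html);
        // find() locates the first match only; later duplicates are never visited.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the page source, with the attribute appearing twice.
        String html = "<body class=\"sin\" yearid=\"10287\" ezaw='580'>"
                    + "<div yearid=\"10287\"></div>";
        System.out.println(firstYearId(html).orElse("not found")); // prints 10287
    }
}
```

Use m.group() instead of m.group(1) if the whole attribute text, yearid="10287", is wanted rather than just the number.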

#2



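Your attempt isn't working because the [yearid] part matches a single character, one of {y, e, a, r, i, d}, and the 0-9 part matches the literal sequence 0-9 (\d or [0-9] is what you're after there). Something like \byearid=\"[0-9]+\" should work. Don't put a \b after the closing quote: in the sample the quote is followed by a space, and there is no word boundary between two non-word characters, so the match would fail.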
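
The diagnosis can be checked with a short Java sketch. The class name AttemptDiagnosis and the helper found are hypothetical, and the tag string is a shortened copy of the sample's body tag. It shows that the original attempt only matches a string that literally contains 0-9, that the leading-\b pattern matches the sample, and that a variant with \b after the closing quote does not match when a space follows the quote.

```java
import java.util.regex.Pattern;

public class AttemptDiagnosis {
    /** True if the regex matches anywhere in the input. */
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String tag = "<body class=\"sin\" yearid=\"10287\" ezaw='580'>";

        // Original attempt: one character from {y,e,a,r,i,d}, then the literal text "0-9".
        System.out.println(found("\\s*[yearid]0-9", tag));    // false
        System.out.println(found("\\s*[yearid]0-9", "d0-9")); // true

        // Literal attribute name, then one or more digits inside quotes.
        System.out.println(found("\\byearid=\"[0-9]+\"", tag)); // true

        // With \b after the closing quote: quote and space are both non-word characters.
        System.out.println(found("\\byearid=\"[0-9]+\"\\b", tag)); // false
    }
}
```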

