Extract the largest numeric sequence from a string (regex, or something else?)

Date: 2022-09-13 11:06:30

I have strings similar to the following:

 4123499-TESCO45-123
 every99999994_54

And I want to extract the largest numeric sequence in each string, respectively:

 4123499
 99999994

I have previously tried regex (I am using VB6):

 Set rx = New RegExp
 rx.Pattern = "[^\d]"
 rx.Global = True

 StringText = rx.Replace(StringText, "")

Which gets me partway there, but it only removes the non-numeric values, and I end up with the first string looking like:

412349945123

Can I find a regex that will give me what I require, or will I have to try another method? Essentially, my pattern would have to match anything that isn't the longest numeric sequence, but I'm not sure that is even a reasonable pattern. Could anyone with a better handle on regex tell me if I am going down a rabbit hole? I appreciate any help!

4 answers

#1


3  

You cannot get the result by just a regex. You will have to extract all numeric chunks and get the longest one using other programming means.

Here is an example:

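' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5".
Dim strPattern As String: strPattern = "\d+"
Dim strInput As String: strInput = "4123499-TESCO45-123"
Dim regEx As New RegExp
Dim matches As MatchCollection
Dim m As Match
Dim result As String

With regEx
     .Global = True        ' return every digit run, not just the first
     .MultiLine = False
     .IgnoreCase = False
     .Pattern = strPattern
End With

Set matches = regEx.Execute(strInput)
For Each m In matches
  ' Compare lengths; on a tie the earlier run is kept
  If Len(result) < Len(m.Value) Then result = m.Value
Next

Debug.Print result   ' prints 4123499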

The \d+ with RegExp.Global=True will find all digit chunks and then only the longest will be printed after all matches are processed in a loop.

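The same idea can be wrapped in a reusable function. This is only a minimal sketch: the name LongestDigitRun is hypothetical, and it assumes late binding through CreateObject("VBScript.RegExp") so that no project reference is needed.

```vb
' Hypothetical helper: returns the longest run of digits in sText,
' or an empty string if sText contains no digits.
Public Function LongestDigitRun(ByVal sText As String) As String
    Dim rx As Object        ' late-bound VBScript.RegExp (assumption: no project reference)
    Dim m As Object         ' each Match returned by Execute
    Dim sBest As String     ' longest run seen so far

    Set rx = CreateObject("VBScript.RegExp")
    rx.Pattern = "\d+"
    rx.Global = True        ' return every digit run, not just the first

    For Each m In rx.Execute(sText)
        ' Strict ">" keeps the earliest run when two runs have equal length
        If Len(m.Value) > Len(sBest) Then sBest = m.Value
    Next

    LongestDigitRun = sBest
End Function
```

For example, Debug.Print LongestDigitRun("4123499-TESCO45-123") should print 4123499.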

#2


2  

That's not solvable with an RE on its own.

Instead you can simply walk along the string tracking the longest consecutive digit group:

For i = 1 To Len(StringText)
    If IsNumeric(Mid$(StringText, i, 1)) Then
        a = a & Mid$(StringText, i, 1)
    Else
        a = ""
    End If
    If Len(a) > Len(longest) Then longest = a
Next

MsgBox longest 

(first result wins a tie)

#3


1  

If the two examples you gave are representative, and the strings only ever come in these two formats:

  1. <long_number>-<some_other_data>-<short_number>
  2. <text><long_number>_<short_number>

then there are some regex-only solutions.

However, if you are searching any string in any format for the longest number, these will not work.

Solution 1

([0-9]+)[_-].*

In the first capture group, you should have the longest number for those 2 formats.

Note: This assumes that the longest number is the first number immediately followed by an underscore or a hyphen, as in the two examples given.

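To read the first capture group from VB6, the match's SubMatches collection can be used. The sketch below is illustrative only: the Sub name is hypothetical, and late binding via CreateObject("VBScript.RegExp") is an assumption made to keep it self-contained.

```vb
' Hypothetical demo: read capture group 1 of the Solution 1 pattern.
Public Sub DemoFirstCaptureGroup()
    Dim rx As Object        ' late-bound VBScript.RegExp
    Dim matches As Object   ' MatchCollection returned by Execute

    Set rx = CreateObject("VBScript.RegExp")
    rx.Pattern = "([0-9]+)[_-].*"
    rx.Global = False       ' only the first match is needed

    Set matches = rx.Execute("every99999994_54")
    If matches.Count > 0 Then
        ' SubMatches is 0-based, so index 0 is the first capture group
        Debug.Print matches(0).SubMatches(0)   ' prints 99999994
    End If
End Sub
```

With an input such as "ev_1_ery99999994_54" the same pattern captures 1, which illustrates why this only works for the two formats listed.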

Solution 2

\d{6,}

Note: This assumes that every other number in the string is at most 5 digits long, and that the longest number is at least 6 digits long.

#4


1  

Please try this.
Pure VB. No external libs or objects.
No brain-breaking regexp patterns.
No string manipulation, so it's fast. Super fast. ~30 times faster than regexp :)
Easy to adapt to various needs.
For example, concatenating all digits from the source string into a single string.

Moreover, if the target string is only an intermediate step,
it's possible to work with the numbers only.

Public Sub sb_BigNmb()
Dim sSrc$, sTgt$
' Indices and lengths are Long so that strings over 255 characters do not overflow
Dim taSrc() As Byte, taTgt() As Byte, tLB As Long, tUB As Long
Dim s As Long, t As Long, tLenMin As Long

    tLenMin = 4
    sSrc = "every99999994_54"

    sTgt = vbNullString

    taSrc = StrConv(sSrc, vbFromUnicode)
    tLB = LBound(taSrc)
    tUB = UBound(taSrc)

    ReDim taTgt(tLB To tUB)

    t = 0
    For s = tLB To tUB
        Select Case taSrc(s)
            Case 48 To 57
                taTgt(t) = taSrc(s)
                t = t + 1
            Case Else
                If CBool(t) Then Exit For   ' *** EXIT FOR ***
        End Select
    Next

    If (t > tLenMin) Then
        ReDim Preserve taTgt(tLB To (t - 1))
        sTgt = StrConv(taTgt, vbUnicode)
    End If

    Debug.Print "'" & sTgt & "'"
    Stop

End Sub

Handling sSrc = "ev_1_ery99999994_54" is left as an exercise for you :)

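One possible way to handle that case with the same byte-scanning idea is to keep going after the first run and remember the longest one. This is only a sketch, not the answerer's code: the function name and all variable names are hypothetical, and it assumes only ASCII digits (byte values 48 to 57) count as numeric.

```vb
' Hypothetical helper: longest run of ASCII digits, found in one pass over the bytes.
Public Function LongestDigitRunBytes(ByVal sSrc As String) As String
    Dim taSrc() As Byte, taTgt() As Byte
    Dim i As Long
    Dim runStart As Long, runLen As Long     ' run currently being scanned
    Dim bestStart As Long, bestLen As Long   ' longest run seen so far

    If Len(sSrc) = 0 Then Exit Function      ' nothing to scan, returns ""

    taSrc = StrConv(sSrc, vbFromUnicode)     ' convert to ANSI bytes

    For i = LBound(taSrc) To UBound(taSrc)
        Select Case taSrc(i)
            Case 48 To 57                    ' "0" to "9"
                If runLen = 0 Then runStart = i
                runLen = runLen + 1
                ' Strict ">" keeps the earliest run on a tie
                If runLen > bestLen Then
                    bestLen = runLen
                    bestStart = runStart
                End If
            Case Else
                runLen = 0                   ' a non-digit ends the current run
        End Select
    Next

    If bestLen > 0 Then
        ' Copy the winning bytes and convert them back to a string
        ReDim taTgt(0 To bestLen - 1)
        For i = 0 To bestLen - 1
            taTgt(i) = taSrc(bestStart + i)
        Next
        LongestDigitRunBytes = StrConv(taTgt, vbUnicode)
    End If
End Function
```

For example, Debug.Print LongestDigitRunBytes("ev_1_ery99999994_54") should print 99999994.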
