如何使用excel VBA脚本删除某些字符?

时间:2022-10-10 22:20:30

The following VBA script gets rid of unwanted characters but unfortunately only NUMBERS.

下面的VBA脚本删除不需要的字符,但不幸的是只删除数字。

Could you please assist me, It needs to rid letters too as in the table example(bolded) below.

你能帮我一下吗,它也需要去掉下面表格中的字母(粗体)。

the Range could be anywhere from 0 to 15000+ cells

范围可以是0到15000+细胞。

.....................................................

.....................................................

a new a york a times a

a new a york a times a

b new b york b times b

b纽约b纽约b乘以b

c new c york c watertown c ny c

c纽约c水城c纽约c

6 ave 6 new 6 york 6 city 6

6大街6号纽约6城6号

......................................................

......................................................

The VBA script:

VBA脚本:

Sub Remove()

Application.ScreenUpdating = False
Dim R As RegExp, C As Range
For Each C In Range("A1:A" & Cells(Rows.Count, 1).End(xlUp).Row)
If R Is Nothing Then
Set R = New RegExp
R.Global = True
R.Pattern = "\D"
C.Offset(0, 1) = R.Replace(C, "")
R.Pattern = "\d"
C = R.Replace(C, "")
End If
Set R = Nothing
Next C
Application.ScreenUpdating = True
End Sub

EDIT1

EDIT1

Sub Remove()
Call BackMeUp

Dim cell As Range
Dim RE As Object
Dim Whitecell As Range
Dim strFind As String, strReplace As String
Dim lLoop As Long
Dim Loop1 As Long

    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Application.EnableEvents = False
Range("A3:L3").Select
Selection.Delete Shift:=xlUp
'--------------------------------------------------Remove JUNK
Range("A3:A" & Cells(Rows.Count, 1).End(xlUp).Row).Select
On Error Resume Next
For lLoop = 1 To 100
    strFind = Choose(lLoop, "~?»", "~®", "~.", "~!", "~ï", "~-", "~§", "~$", "~%", "~&", "~/", "~\", "~,", "~(", "~)", "~=", "~www", "~WWW", "~.com", "~.net", "~.org", "~{", "~}", "~[", "~]", "~ï", "~¿", "~½", "~:", "~;", "~_", "~µ", "~@", "~#", "~'", "~|", "~€", "~ä", "~ö", "~ü", "~Ä", "~Ü", "~Ö", "~+", "~<", "~>", "~nbsp", "~â", "~¦", "~©", "~Â", "~–", "~¼", "~?")
    strReplace = Choose(lLoop, " ")

    Selection.Replace What:=strFind, Replacement:=strReplace, LookAt:=xlPart, _
    SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
    ReplaceFormat:=False

Next lLoop
'--------------------------------------------------Remove Numbers
Range("A3:A" & Cells(Rows.Count, 1).End(xlUp).Row).Select
On Error Resume Next
For Loop1 = 1 To 40
    strFind = Choose(lLoop, "~1", "~2", "~3", "~4", "~5", "~6", "~7", "~8", "~9", "~0")
    strReplace = Choose(Loop1, " ")

    Selection.Replace What:=strFind, Replacement:=strReplace, LookAt:=xlPart, _
    SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
    ReplaceFormat:=False

Next Loop1
'--------------------------------------------------Remove Single Letters
Set RE = CreateObject("vbscript.regexp")
RE.Global = True
RE.MultiLine = True
RE.Pattern = "^[a-z]\b | \b[a-z]\b"

For Each cell In Range("A3:A" & Cells(Rows.Count, 1).End(xlUp).Row)
    cell.Value = RE.Replace(cell.Value, "")

Next

'--------------------------------------------------Remove WHITE SPACES

For Each Whitecell In Range("A3:A" & Cells(Rows.Count, 1).End(xlUp).Row)
    Whitecell = WorksheetFunction.Trim(Whitecell)
Next Whitecell

'--------------------------------------------------Remove DUPES

ActiveSheet.Range("A3:A" & Cells(Rows.Count, 1).End(xlUp).Row).RemoveDuplicates Columns:=1, Header:=xlYes
ActiveWorkbook.Worksheets(ActiveSheet.Name).Sort.SortFields.Clear

'--------------------------------------------------Copy to B - REPLACE ALL WHITE IN B

Range("A3:A" & Cells(Rows.Count, 1).End(xlUp).Row).Select
    Selection.Copy
    Range("B3:B" & Cells(Rows.Count, 1).End(xlUp).Row).Select
    ActiveSheet.Paste
    Selection.Replace What:=" ", Replacement:="", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
        Columns("A:L").EntireColumn.AutoFit

'--------------------------------------------------END
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    Application.EnableEvents = True
    Range("a1").Select
End Sub

3 个解决方案

#1


4  

EDIT (deleted original answer as it was not applicable after recieving more info on what you wanted, but leaving advice)

编辑(删除原始答案,因为在收到更多关于你想要的信息后不适用,但留下建议)

  • You are creating/destroying the RE object every cell, which is expensive/unnessessary
    • If other users will use the function, create the object inside the code instead of adding references
    • 如果其他用户使用该函数,则在代码中创建对象而不是添加引用。
    • There is no need to set the regex object to nothing at the end - variables are released from memory at the end of the function automatically
    • 不需要在结束时将regex对象设置为空——在函数的末尾会自动从内存中释放变量
    • Improving your variable naming and using proper indentation could help improve readability and make it easier to edit
    • 改进变量命名和使用适当的缩进可以帮助提高可读性并使编辑更容易
    • Add the multiline option in case your cells have line breaks inside them.
    • 添加多行选项,以防止您的单元格内部出现换行。
    • You might want to use a variant array if working with a lot of cells
    • 如果使用大量单元格,您可能需要使用一个变体数组
  • 您正在创建/销毁每个单元格的RE对象,这是昂贵的/ unnessary如果其他用户使用这个函数,创建对象的代码而不是内添加引用不需要设置正则表达式对象没有结束时,变量是释放内存的函数自动提高你的变量命名和使用适当的缩进可以帮助改善可读性和便于编辑添加多行选项,以防你的细胞有换行符。如果使用大量单元格,您可能需要使用一个变体数组

UDPATE 2

UDPATE 2

Based one the comments below, here is how to get only occurances of two or more lowercase characters and the single spaces in-between. Instead of replacing what you DON'T want, I personally think a good way is to extract what you DO want. I have shared the below function quite a bit on this site as it's really useful. Here's an example of how to call it on the contents of Column A and put the results in Column B.

基于下面的评论,这里介绍了如何只获得两个或多个小写字符和中间的单个空格的出现。我个人认为一个好的方法是提取你想要的东西,而不是替换你不想要的东西。我在这个网站上分享了下面的功能,因为它真的很有用。这里有一个如何在A列的内容上调用它并将结果放在B列的例子。

Sub test()

' Show how to run this on cells in A and transpose result in B
Dim varray As Variant
Dim i As Long

Application.ScreenUpdating = False
varray = Range("A1:A15000").Value

For i = 1 To UBound(varray, 1)
    varray(i, 1) = RegexExtract(varray(i, 1), "([a-z]{2,})", " ")
Next

Range("B1").Resize(UBound(varray, 1)).Value = _
Application.WorksheetFunction.Transpose(varray)

Application.ScreenUpdating = True

End Sub

And make sure this is in the module:

确保这是在模块中:

Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String, _
                      Optional seperator As String = "") As String

Dim i As Long
Dim j As Long
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")

RE.Pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)

For i = 0 To allMatches.Count - 1
    For j = 0 To allMatches.Item(i).submatches.Count - 1
        result = result & seperator & allMatches.Item(i).submatches.Item(j)
    Next
Next

If Len(result) <> 0 Then
    result = Right$(result, Len(result) - Len(seperator))
End If

RegexExtract = result

End Function

#2


3  

Your "R.Pattern = "\d" is the only line you need to change. The "\d" is a regular expression describing a "digit".

你的“R。模式= "\d"是唯一需要更改的行。“\d”是描述“数字”的正则表达式。

I would suggest changing "\d" to "^[a-z0-9] | [a-z0-9]\b" as a starting point.

我建议改变“\ d”到“^[a-z0-9]| a-z0-9 \ b”作为一个起点。

#3


3  

I rewrote your code below so that

我在下面重写了你的代码

  • The RegExp is only created once. Your current code creates a new object then destroys it for each cell being tested as it is inside your loop
  • RegExp只创建一次。您当前的代码创建一个新对象,然后将其销毁,因为每个单元都在您的循环中进行测试。
  • The code below uses a variant array to minimise process time when manipulating each cell value. The constant VbNullString is slightly quicker than "".
  • 下面的代码使用了一个变体数组来最小化操作每个单元格值时的处理时间。恒定的VbNullString比“”稍快一些。
  • you case use the simpler \w in a regex to match any a-z0-9
  • 在regex中,使用更简单的\w来匹配任何a-z0-9。
  • late binding on the RegExp object avoids the need to ak a third party to set a reference, setting ignore case to true makes your replacement case insenstive

    在RegExp对象上的后期绑定避免了需要ak第三方来设置引用,将忽略大小写设置为true会使替换大小写变得无意义

         Sub Remove()
         Dim R As Object
         Dim C As Range
         Dim lngrow As Long
         Dim rng1 As Range
         Dim X
         Set R = CreateObject("vbscript.regexp")
         With R
           .Global = True
           .Pattern = "^\w\s|\b\w\b"
           .ignoreCase = True
         End With
         Application.ScreenUpdating = False
         Set rng1 = Range("A1:A" & Cells(Rows.Count, 1).End(xlUp).Row)
         X = rng1.Value2
         For lngrow = 1 To UBound(X, 1)
           X(lngrow, 1) = R.Replace(X(lngrow, 1), vbNullString)
         Next lngrow
         rng1.Value2 = X
         Application.ScreenUpdating = True
          End Sub
    

#1


4  

EDIT (deleted original answer as it was not applicable after recieving more info on what you wanted, but leaving advice)

编辑(删除原始答案,因为在收到更多关于你想要的信息后不适用,但留下建议)

  • You are creating/destroying the RE object every cell, which is expensive/unnessessary
    • If other users will use the function, create the object inside the code instead of adding references
    • 如果其他用户使用该函数,则在代码中创建对象而不是添加引用。
    • There is no need to set the regex object to nothing at the end - variables are released from memory at the end of the function automatically
    • 不需要在结束时将regex对象设置为空——在函数的末尾会自动从内存中释放变量
    • Improving your variable naming and using proper indentation could help improve readability and make it easier to edit
    • 改进变量命名和使用适当的缩进可以帮助提高可读性并使编辑更容易
    • Add the multiline option in case your cells have line breaks inside them.
    • 添加多行选项,以防止您的单元格内部出现换行。
    • You might want to use a variant array if working with a lot of cells
    • 如果使用大量单元格,您可能需要使用一个变体数组
  • 您正在创建/销毁每个单元格的RE对象,这是昂贵的/ unnessary如果其他用户使用这个函数,创建对象的代码而不是内添加引用不需要设置正则表达式对象没有结束时,变量是释放内存的函数自动提高你的变量命名和使用适当的缩进可以帮助改善可读性和便于编辑添加多行选项,以防你的细胞有换行符。如果使用大量单元格,您可能需要使用一个变体数组

UDPATE 2

UDPATE 2

Based one the comments below, here is how to get only occurances of two or more lowercase characters and the single spaces in-between. Instead of replacing what you DON'T want, I personally think a good way is to extract what you DO want. I have shared the below function quite a bit on this site as it's really useful. Here's an example of how to call it on the contents of Column A and put the results in Column B.

基于下面的评论,这里介绍了如何只获得两个或多个小写字符和中间的单个空格的出现。我个人认为一个好的方法是提取你想要的东西,而不是替换你不想要的东西。我在这个网站上分享了下面的功能,因为它真的很有用。这里有一个如何在A列的内容上调用它并将结果放在B列的例子。

Sub test()

' Show how to run this on cells in A and transpose result in B
Dim varray As Variant
Dim i As Long

Application.ScreenUpdating = False
varray = Range("A1:A15000").Value

For i = 1 To UBound(varray, 1)
    varray(i, 1) = RegexExtract(varray(i, 1), "([a-z]{2,})", " ")
Next

Range("B1").Resize(UBound(varray, 1)).Value = _
Application.WorksheetFunction.Transpose(varray)

Application.ScreenUpdating = True

End Sub

And make sure this is in the module:

确保这是在模块中:

Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String, _
                      Optional seperator As String = "") As String

Dim i As Long
Dim j As Long
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")

RE.Pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)

For i = 0 To allMatches.Count - 1
    For j = 0 To allMatches.Item(i).submatches.Count - 1
        result = result & seperator & allMatches.Item(i).submatches.Item(j)
    Next
Next

If Len(result) <> 0 Then
    result = Right$(result, Len(result) - Len(seperator))
End If

RegexExtract = result

End Function

#2


3  

Your "R.Pattern = "\d" is the only line you need to change. The "\d" is a regular expression describing a "digit".

你的“R。模式= "\d"是唯一需要更改的行。“\d”是描述“数字”的正则表达式。

I would suggest changing "\d" to "^[a-z0-9] | [a-z0-9]\b" as a starting point.

我建议改变“\ d”到“^[a-z0-9]| a-z0-9 \ b”作为一个起点。

#3


3  

I rewrote your code below so that

我在下面重写了你的代码

  • The RegExp is only created once. Your current code creates a new object then destroys it for each cell being tested as it is inside your loop
  • RegExp只创建一次。您当前的代码创建一个新对象,然后将其销毁,因为每个单元都在您的循环中进行测试。
  • The code below uses a variant array to minimise process time when manipulating each cell value. The constant VbNullString is slightly quicker than "".
  • 下面的代码使用了一个变体数组来最小化操作每个单元格值时的处理时间。恒定的VbNullString比“”稍快一些。
  • you case use the simpler \w in a regex to match any a-z0-9
  • 在regex中,使用更简单的\w来匹配任何a-z0-9。
  • late binding on the RegExp object avoids the need to ak a third party to set a reference, setting ignore case to true makes your replacement case insenstive

    在RegExp对象上的后期绑定避免了需要ak第三方来设置引用,将忽略大小写设置为true会使替换大小写变得无意义

         Sub Remove()
         Dim R As Object
         Dim C As Range
         Dim lngrow As Long
         Dim rng1 As Range
         Dim X
         Set R = CreateObject("vbscript.regexp")
         With R
           .Global = True
           .Pattern = "^\w\s|\b\w\b"
           .ignoreCase = True
         End With
         Application.ScreenUpdating = False
         Set rng1 = Range("A1:A" & Cells(Rows.Count, 1).End(xlUp).Row)
         X = rng1.Value2
         For lngrow = 1 To UBound(X, 1)
           X(lngrow, 1) = R.Replace(X(lngrow, 1), vbNullString)
         Next lngrow
         rng1.Value2 = X
         Application.ScreenUpdating = True
          End Sub