使用RegEx和Replace在MS-Access中使用USPS缩写更新地址字段

时间:2021-10-22 02:58:10

I'm trying to write a VBA function in Access that replaces words in an address field with the standard United States Postal Abbreviations. I realize this is never going to be perfect, but I want to at least make simple abbreviations (without having to purchase address formatting software), e.g.

我正在尝试在Access中编写一个VBA函数,用标准的美国邮政缩写替换地址字段中的单词。我意识到这永远不会是完美的,但我想至少制作简单的缩写(无需购买地址格式化软件),例如

 input      output
 -------    -------------
 North   -> N
 Street  -> ST
 Drive   -> DR
 Lane    -> LN

I thought about using a simple table to store the string and the replacement string, and then looping through that table/recordset to perform a simple search and replace using the Replace() function, e.g. using the immediate window:

我想过使用一个简单的表来存储字符串和替换字符串,然后循环遍历该表/记录集以执行简单的搜索并使用Replace()函数替换,例如,使用即时窗口:

 ?Replace("123 North 3rd St", "North", "N", compare:=vbTextCompare)
 123 N 3rd St

However, this method can potentially cause errors, e.g.

但是,这种方法可能会导致错误,例如:

 ?Replace("123 Northampton St", "North", "N", compare:=vbTextCompare)
 123 Nampton St

My original strategy was to create a replacement table with regular expression patterns and replacement strings, then loop through that table to do a more precise search and replace.

我最初的策略是使用正则表达式模式和替换字符串创建替换表,然后遍历该表以进行更精确的搜索和替换。

pattern                 abbrev
-------------------     ------------
{pattern for North}     N
{pattern for Street}    ST

I realized that RegEx might be overkill here, especially since I'm going to be looping through address fields over and over in a database, but couldn't think of an easier way just using the Replace() function (Update: see responses from @mwolfe02 and @Cylian, and a hybrid solution).

我意识到RegEx在这里可能有些过分,特别是因为我将在数据库中反复遍历地址字段,但是想到使用Replace()函数时更容易想到更新(更新:请参阅@ mwolfe02和@Cylian,以及混合解决方案)。

In the above example, I want to search for the words North and Street when they are either as a exist as word in a string (thus separated by two white spaces) or at the end of the string or beginning of a string. This covers most of the situations that warrant an abbreviation. e.g.

在上面的例子中,我想搜索单词North和Street,它们或者作为字符串中的单词存在(因此由两个空格分隔)或者在字符串的结尾或字符串的开头。这涵盖了大多数需要缩写的情况。例如

address                       formatted
----------------------        --------------------------
123 North 3rd St           -> 123 N 3RD ST
123 ABC Street North       -> 123 ABC ST N
North 3rd Street           -> N 3RD ST
123 North Northampton St   -> 123 N NORTHAMPTON ST

As in these examples, I want to replace all instances of the pattern in the string. I also am converting everything to upper case (I can use UCase() on the final result no problem).

在这些示例中,我想替换字符串中模式的所有实例。我也将一切都转换为大写(我可以使用UCase()对最终结果没问题)。

Does anyone know of an existing module that does this sort of thing? Can anyone help with the pattern matching as in the above examples? For extra credit, I'm curious also about creating rule in the table to format post office boxes, e.g.

有没有人知道这样做的现有模块?任何人都可以帮助模式匹配,如上例所示?为了额外的功劳,我很好奇还要在表格中创建规则以格式化邮政信箱,例如:

address                   formatted
----------------------    --------------------------
P.O. Box 345           -> PO BOX 345
PO Box 345             -> PO BOX 345
Post Office Box 345    -> PO BOX 345
PO. Box 345            -> PO BOX 345
P. O. Box 345          -> PO BOX 345

This stack overflow post gives the following pattern to recognize some PO boxes "^\s*P.?\s?O.?\sB[Oo][Xx]." (admittedly not the third example above). Again, I'm not as comfortable with matching and replacement sets to figure out how to write this more precise replace function. Is there a RegEx/Access expert who can help?

这个堆栈溢出帖子给出了以下模式来识别一些PO框“^ \ s * P。?\ s?O。?\ sB [Oo] [Xx]。” (诚​​然,不是上面的第三个例子)。同样,我对匹配和替换设置不太熟悉,以找出如何编写这种更精确的替换功能。是否有RegEx / Access专家可以提供帮助?

4 个解决方案

#1


5  

Try this function

试试这个功能

Public Function FormatPO(inputString$)
'This example uses **Microsoft VBScript Regular Expressions 5.5**
Dim re As New RegExp, result$
With re
    .Pattern = "\bP(?:[. ]+|ost +)?O(?:ff\.?(?:ice))?[. ]+B(?:ox|\.) +(\d+)\b"
    .Global = True
    .IgnoreCase = True
    If .test(inputString) Then
        FormatPO = .Replace(inputString, "PO BOX $1")
    Else
        MsgBox "Data doesn't matched!"
    End If
End With

and could be called as (from immediate window)

并且可以被称为(从即时窗口)

?FormatPO("P.O. Box 563")

gives result

给出结果

PO BOX 563

Matching pattern for Street names with addresses need more time to built. But you could visit here and build your RegEx online.

街道名称与地址的匹配模式需要更多时间来构建。但您可以访问此处并在线构建您的RegEx。

Hope this helps.

希望这可以帮助。

#2


2  

@Cylian has a good answer for the second part of your question. I'll try to address the first. If your only concern is that you replace whole words in the address then the following function will do what you need:

@Cylian对你问题的第二部分有一个很好的答案。我会尝试解决第一个问题。如果您唯一关心的是替换地址中的整个单词,那么以下函数将执行您所需的操作:

Function AddressReplace(AddressLine As String, _
                        FullName As String, _
                        Abbrev As String)
    AddressReplace = Trim(Replace(" " & AddressLine & " ", _
                                  " " & FullName & " ", _
                                  " " & Abbrev & " "))
End Function

It encloses the address line in an opening and closing space, so that you can require an opening and closing space on each word you are trying to replace. It finishes up with a trim to get rid of those temporary spaces.

它将地址行包含在打开和关闭空间中,因此您可以在要替换的每个单词上需要打开和关闭空间。它完成修剪以摆脱那些临时空间。

The following procedure tests the code and produces the output you are looking for:

以下过程测试代码并生成您要查找的输出:

Sub TestAddressReplace()
    Debug.Print AddressReplace("123 North 3rd St", "North", "N")
    Debug.Print AddressReplace("123 Northampton St", "North", "N")
End Sub

#3


2  

The USPS has a free lookup API to validate and standardize addresses. You will need to register for the service (quick), and then use your id/password in the API to bounce against their site. Does all the work for you, and has sample code. Canadian postal service has same thing (not sure it's free though).

USPS有一个免费的查找API来验证和标准化地址。您需要注册该服务(快速),然后在API中使用您的ID /密码来反弹他们的网站。是否所有工作都适合您,并提供示例代码。加拿大邮政服务有同样的事情(不确定它是免费的)。

https://www.usps.com/business/web-tools-apis/welcome.htm

https://www.usps.com/business/web-tools-apis/welcome.htm

B. Sevier

B.塞维尔

#4


0  

I created a very simple reference table ref_USPS_abbrev from the USPS Abbreviation list online. Here's the entries that correspond to the example originally given:

我从USPS在线缩写列表中创建了一个非常简单的参考表ref_USPS_abbrev。这是与最初给出的示例相对应的条目:

WORD          ABBREV
------------  -------------
NORTH         N
STREET        ST

Then, incorporating responses to my original post, I created two helper functions.

然后,结合对原始帖子的回复,我创建了两个辅助函数。

From @Cylian:

来自@Cylian:

    ' ----------------------------------------------------------------------'
    '  Formats string containing P.O. Box to USPS Approved PO BOX format    '
    ' ----------------------------------------------------------------------'
    '  Requires Microsoft VBScript Regular Expressions 5.5

    Public Function FormatPO(inputString As String) As String

        Static rePO As Object
        If rePO Is Nothing Then
            Set rePO = CreateObject("vbscript.regexp")
        With rePO
        .Pattern = "\bP(?:[. ]+|ost +)?O(?:ff\.?(?:ice))" & _
                   "?[. ]+B(?:ox|\.) +(\d+)\b"
        .Global = True
        .IgnoreCase = True
        End With
        End If

        With rePO
           If .Test(inputString) Then
              FormatPO = .Replace(inputString, "PO BOX $1")
           Else
              FormatPO = inputString
           End If
        End With
    End Function

And, using @mwolfe02's excellent idea:

并且,使用@ mwolfe02的优秀想法:

    ' ----------------------------------------------------------------------'
    '  Replaces whole word only with an abbreviation in address string      '
    ' ----------------------------------------------------------------------'

    Public Function AddressReplace(AddressLine As String, _
                    FullName As String, _
                    Abbrev As String)

    'Enclose address line in an opening and closing space, so that you 
    'can require an opening and closing space on each word you are trying 
    'to replace. Finish up with a trim to get rid of those temporary spaces.

    AddressReplace = Trim(Replace(" " & AddressLine & " ", _
                              " " & FullName & " ", _
                              " " & Abbrev & " "))
    End Function

Then, incorporating these helper functions, I wrote this function:

然后,结合这些辅助函数,我写了这个函数:

' ----------------------------------------------------------------------'
'  Format address using abbreviations stored in table ref_USPS_abbrev   '
' ----------------------------------------------------------------------'  
'  Requires Microsoft DAO 3.6 Object Library
'  Table ref_USPS_abbrev has two fields: WORD (containing the word to match) 
'  and ABBREV containing the desired abbreviated substitution.
'  United States Postal Services abbreviations are available at:
'  https://www.usps.com/ship/official-abbreviations.htm

Public Function SubstituteUSPS(address As String) As String

Static dba As DAO.Database
Static rst_abbrev As DAO.Recordset

    If IsNull(address) Then Exit Function

    'Initialize the objects 

    If dba Is Nothing Then
        Set dba = CurrentDb
    End If

    'Create the rst_abbrev recordset once from ref_USPS_abbrev. If additional
    'entries are added to the source ref_USPS_abbrev table after the recordset 
    'is created, since it is an dbOpenTable (by default), the recordset will 
    'be updated dynamically. If you use dbOpenSnapshot it will not update 
    'dynamically.

    If rst_abbrev Is Nothing Then
        Set rst_abbrev = dba.OpenRecordset("ref_USPS_abbrev",  _
                                           Type:=dbOpenTable)
    End If

    'Since rst_abbrev is a static object, in the event the function is called 
    'in succession (e.g. while looping through a recordset to update values), 
    'move to the first entry in the recordset each time the function is 
    'called.

    rst_abbrev.MoveFirst

    'Only call the FormatPO helper function if the address has the 
    'string "ox" in it.    

    If InStr(address, "ox") > 0 Then
        address = FormatPO(address)
    End If

    'Loop through the recordset containing the abbreviations
    'and use the AddressReplace helper function to substitute 
    'abbreviations for whole words only.

    Do Until rst_abbrev.EOF
        address = AddressReplace(address, rst_abbrev![WORD],  _
                                 rst_abbrev![ABBREV])
        rst_abbrev.MoveNext
    Loop

    'Convert the address to upper case and trim white spaces and return result
    'You can also add code here to trim out punctuation in the address, too.

    SubstituteUSPS = Trim(UCase(address))

End Function

To create the ref_USPS_abbrev table for testing:

要创建用于测试的ref_USPS_abbrev表:

Sub CreateUSPSTable()

Dim dbs As Database
Set dbs = CurrentDb

With dbs
    .Execute "CREATE TABLE ref_USPS_abbrev " _
        & "(WORD CHAR, ABBREV CHAR);"
    .Execute " INSERT INTO ref_USPS_abbrev " _
        & "(WORD, ABBREV) VALUES " _
        & "('NORTH', 'N');"
    .Execute " INSERT INTO ref_USPS_abbrev " _
        & "(WORD, ABBREV) VALUES " _
        & "('STREET', 'ST');"
    .Close
End With
End Sub

Finally, testing this function from the immediate window:

最后,从即时窗口测试此函数:

 CreateUSPSTable
 ?SubstituteUSPS("Post Office Box 345 123 North Northampton Street")
 PO BOX 345 123 N NORTHAMPTON ST

I'm not a programmer professionally, so I'd welcome suggestions for cleaning up my code even further, but for now this works great. Thanks, everyone.

我不是专业的程序员,所以我欢迎进一步清理我的代码的建议,但是现在这很好用。感谢大家。

Stack Overflow yet again FTW!

FTW再次堆栈溢出!

#1


5  

Try this function

试试这个功能

Public Function FormatPO(inputString$)
'This example uses **Microsoft VBScript Regular Expressions 5.5**
Dim re As New RegExp, result$
With re
    .Pattern = "\bP(?:[. ]+|ost +)?O(?:ff\.?(?:ice))?[. ]+B(?:ox|\.) +(\d+)\b"
    .Global = True
    .IgnoreCase = True
    If .test(inputString) Then
        FormatPO = .Replace(inputString, "PO BOX $1")
    Else
        MsgBox "Data doesn't matched!"
    End If
End With

and could be called as (from immediate window)

并且可以被称为(从即时窗口)

?FormatPO("P.O. Box 563")

gives result

给出结果

PO BOX 563

Matching pattern for Street names with addresses need more time to built. But you could visit here and build your RegEx online.

街道名称与地址的匹配模式需要更多时间来构建。但您可以访问此处并在线构建您的RegEx。

Hope this helps.

希望这可以帮助。

#2


2  

@Cylian has a good answer for the second part of your question. I'll try to address the first. If your only concern is that you replace whole words in the address then the following function will do what you need:

@Cylian对你问题的第二部分有一个很好的答案。我会尝试解决第一个问题。如果您唯一关心的是替换地址中的整个单词,那么以下函数将执行您所需的操作:

Function AddressReplace(AddressLine As String, _
                        FullName As String, _
                        Abbrev As String)
    AddressReplace = Trim(Replace(" " & AddressLine & " ", _
                                  " " & FullName & " ", _
                                  " " & Abbrev & " "))
End Function

It encloses the address line in an opening and closing space, so that you can require an opening and closing space on each word you are trying to replace. It finishes up with a trim to get rid of those temporary spaces.

它将地址行包含在打开和关闭空间中,因此您可以在要替换的每个单词上需要打开和关闭空间。它完成修剪以摆脱那些临时空间。

The following procedure tests the code and produces the output you are looking for:

以下过程测试代码并生成您要查找的输出:

Sub TestAddressReplace()
    Debug.Print AddressReplace("123 North 3rd St", "North", "N")
    Debug.Print AddressReplace("123 Northampton St", "North", "N")
End Sub

#3


2  

The USPS has a free lookup API to validate and standardize addresses. You will need to register for the service (quick), and then use your id/password in the API to bounce against their site. Does all the work for you, and has sample code. Canadian postal service has same thing (not sure it's free though).

USPS有一个免费的查找API来验证和标准化地址。您需要注册该服务(快速),然后在API中使用您的ID /密码来反弹他们的网站。是否所有工作都适合您,并提供示例代码。加拿大邮政服务有同样的事情(不确定它是免费的)。

https://www.usps.com/business/web-tools-apis/welcome.htm

https://www.usps.com/business/web-tools-apis/welcome.htm

B. Sevier

B.塞维尔

#4


0  

I created a very simple reference table ref_USPS_abbrev from the USPS Abbreviation list online. Here's the entries that correspond to the example originally given:

我从USPS在线缩写列表中创建了一个非常简单的参考表ref_USPS_abbrev。这是与最初给出的示例相对应的条目:

WORD          ABBREV
------------  -------------
NORTH         N
STREET        ST

Then, incorporating responses to my original post, I created two helper functions.

然后,结合对原始帖子的回复,我创建了两个辅助函数。

From @Cylian:

来自@Cylian:

    ' ----------------------------------------------------------------------'
    '  Formats string containing P.O. Box to USPS Approved PO BOX format    '
    ' ----------------------------------------------------------------------'
    '  Requires Microsoft VBScript Regular Expressions 5.5

    Public Function FormatPO(inputString As String) As String

        Static rePO As Object
        If rePO Is Nothing Then
            Set rePO = CreateObject("vbscript.regexp")
        With rePO
        .Pattern = "\bP(?:[. ]+|ost +)?O(?:ff\.?(?:ice))" & _
                   "?[. ]+B(?:ox|\.) +(\d+)\b"
        .Global = True
        .IgnoreCase = True
        End With
        End If

        With rePO
           If .Test(inputString) Then
              FormatPO = .Replace(inputString, "PO BOX $1")
           Else
              FormatPO = inputString
           End If
        End With
    End Function

And, using @mwolfe02's excellent idea:

并且,使用@ mwolfe02的优秀想法:

    ' ----------------------------------------------------------------------'
    '  Replaces whole word only with an abbreviation in address string      '
    ' ----------------------------------------------------------------------'

    Public Function AddressReplace(AddressLine As String, _
                    FullName As String, _
                    Abbrev As String)

    'Enclose address line in an opening and closing space, so that you 
    'can require an opening and closing space on each word you are trying 
    'to replace. Finish up with a trim to get rid of those temporary spaces.

    AddressReplace = Trim(Replace(" " & AddressLine & " ", _
                              " " & FullName & " ", _
                              " " & Abbrev & " "))
    End Function

Then, incorporating these helper functions, I wrote this function:

然后,结合这些辅助函数,我写了这个函数:

' ----------------------------------------------------------------------'
'  Format address using abbreviations stored in table ref_USPS_abbrev   '
' ----------------------------------------------------------------------'  
'  Requires Microsoft DAO 3.6 Object Library
'  Table ref_USPS_abbrev has two fields: WORD (containing the word to match) 
'  and ABBREV containing the desired abbreviated substitution.
'  United States Postal Services abbreviations are available at:
'  https://www.usps.com/ship/official-abbreviations.htm

Public Function SubstituteUSPS(address As String) As String

Static dba As DAO.Database
Static rst_abbrev As DAO.Recordset

    If IsNull(address) Then Exit Function

    'Initialize the objects 

    If dba Is Nothing Then
        Set dba = CurrentDb
    End If

    'Create the rst_abbrev recordset once from ref_USPS_abbrev. If additional
    'entries are added to the source ref_USPS_abbrev table after the recordset 
    'is created, since it is an dbOpenTable (by default), the recordset will 
    'be updated dynamically. If you use dbOpenSnapshot it will not update 
    'dynamically.

    If rst_abbrev Is Nothing Then
        Set rst_abbrev = dba.OpenRecordset("ref_USPS_abbrev",  _
                                           Type:=dbOpenTable)
    End If

    'Since rst_abbrev is a static object, in the event the function is called 
    'in succession (e.g. while looping through a recordset to update values), 
    'move to the first entry in the recordset each time the function is 
    'called.

    rst_abbrev.MoveFirst

    'Only call the FormatPO helper function if the address has the 
    'string "ox" in it.    

    If InStr(address, "ox") > 0 Then
        address = FormatPO(address)
    End If

    'Loop through the recordset containing the abbreviations
    'and use the AddressReplace helper function to substitute 
    'abbreviations for whole words only.

    Do Until rst_abbrev.EOF
        address = AddressReplace(address, rst_abbrev![WORD],  _
                                 rst_abbrev![ABBREV])
        rst_abbrev.MoveNext
    Loop

    'Convert the address to upper case and trim white spaces and return result
    'You can also add code here to trim out punctuation in the address, too.

    SubstituteUSPS = Trim(UCase(address))

End Function

To create the ref_USPS_abbrev table for testing:

要创建用于测试的ref_USPS_abbrev表:

Sub CreateUSPSTable()

Dim dbs As Database
Set dbs = CurrentDb

With dbs
    .Execute "CREATE TABLE ref_USPS_abbrev " _
        & "(WORD CHAR, ABBREV CHAR);"
    .Execute " INSERT INTO ref_USPS_abbrev " _
        & "(WORD, ABBREV) VALUES " _
        & "('NORTH', 'N');"
    .Execute " INSERT INTO ref_USPS_abbrev " _
        & "(WORD, ABBREV) VALUES " _
        & "('STREET', 'ST');"
    .Close
End With
End Sub

Finally, testing this function from the immediate window:

最后,从即时窗口测试此函数:

 CreateUSPSTable
 ?SubstituteUSPS("Post Office Box 345 123 North Northampton Street")
 PO BOX 345 123 N NORTHAMPTON ST

I'm not a programmer professionally, so I'd welcome suggestions for cleaning up my code even further, but for now this works great. Thanks, everyone.

我不是专业的程序员,所以我欢迎进一步清理我的代码的建议,但是现在这很好用。感谢大家。

Stack Overflow yet again FTW!

FTW再次堆栈溢出!