How to force a string to match the format of another string in Python

Time: 2021-12-19 23:58:21

I have written a few Python scripts for the Assessor's office where I work. Most of them ask for a parcel ID number as input (it is then used to fetch certain data through ODBC). The users are not very consistent about how they enter parcel IDs.

So here is my problem: they enter a parcel ID in one of three ways:

1: '1005191000060'

2: '001005191000060'

3: '0010-05-19-100-006-0'

The third way is the correct one, so I need to make sure the input is always converted to that format. Of course, they would rather type the ID in one of the first two ways. Parcel numbers must always be 15 digits long (20 characters with dashes).
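As a rough illustration of the rule above, the accepted inputs can be written down as a single validation pattern. This is a sketch, assuming a raw entry is 1 to 15 ASCII digits with no separators (leading zeros get added later) and that a dashed entry must already follow the 4-2-2-3-3-1 layout; `ACCEPTED_PID` and `is_accepted_pid` are hypothetical names, not part of the original scripts.

```python
import re

# Assumption: raw entries are 1-15 digits; dashed entries use the
# 4-2-2-3-3-1 layout (20 characters in total).
ACCEPTED_PID = re.compile(
    r'[0-9]{1,15}|[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{3}-[0-9]{3}-[0-9]'
)

def is_accepted_pid(text):
    """Return True if text is one of the three input forms described above."""
    # fullmatch requires the whole string to match one of the alternatives.
    return ACCEPTED_PID.fullmatch(text) is not None

for sample in ('1005191000060', '001005191000060',
               '0010-05-19-100-006-0', '0010-0519-100-006-0'):
    print(sample, is_accepted_pid(sample))
```

The last sample has a dash in the wrong place, so it is rejected rather than silently re-sliced.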

I currently have a working method for fixing the parcel ID, but it is very ugly. I am wondering if anyone knows a better (or more "Pythonic") way. I have a function that usually gets imported into all of these scripts. Here is what I have:

import re

def FormatPID(in_pid):
    pid_format = re.compile(r'\d{4}-\d{2}-\d{2}-\d{3}-\d{3}-\d{1}')
    pid = in_pid.zfill(15) 
    if not pid_format.match(pid):
        fixed_pid = '-'.join([pid[:4],pid[4:6],pid[6:8],pid[8:11],pid[11:-1],pid[-1]])
        return fixed_pid
    else:
        return pid

if __name__ == '__main__':

    pid = '1005191000060'
##    pid = '001005191000060'
##    pid = '0010-05-19-100-006-0'

    # test
    t = FormatPID(pid)
    print(t)

This works just fine, but this ugly code has bothered me for a while, and I think there has to be a better way than slicing. I am hoping there is a way to "force" the string into the format described by the pid_format variable. Any ideas? I couldn't find anything in the regular expressions module that does this.
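As far as I know, the re module has no function that reshapes a string to fit an arbitrary pattern, but capture groups come close: one pattern can both validate the 15 digits and split them into the six fields. A minimal sketch, assuming hyphens are the only separator users type; `_PID_GROUPS` and `format_pid_re` are illustrative names.

```python
import re

# One capture group per block of the 4-2-2-3-3-1 parcel layout.
_PID_GROUPS = re.compile(r'([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{3})([0-9]{3})([0-9])')

def format_pid_re(in_pid):
    """Return in_pid in 0010-05-19-100-006-0 form, or raise ValueError."""
    # Drop any existing hyphens, then left-pad with zeros to 15 characters.
    digits = in_pid.replace('-', '').zfill(15)
    # fullmatch only succeeds on exactly 15 ASCII digits.
    m = _PID_GROUPS.fullmatch(digits)
    if m is None:
        raise ValueError('Invalid parcel ID: {!r}'.format(in_pid))
    # Each captured block becomes one dash-separated field.
    return '-'.join(m.groups())

for raw in ('1005191000060', '001005191000060', '0010-05-19-100-006-0'):
    print(format_pid_re(raw))
```

Because the hyphens are stripped first, an already-formatted ID goes through the same path, so no separate pid_format check is needed.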

2 solutions

#1

Instead of manual slicing, you can use itertools.islice:

import re
from itertools import islice
groups = (4, 2, 2, 3, 3, 1)
def FormatPID(in_pid):
    pid_format = re.compile(r'\d{4}-\d{2}-\d{2}-\d{3}-\d{3}-\d{1}')
    in_pid = in_pid.zfill(15)
    if not pid_format.match(in_pid):
        it = iter(in_pid)
        return '-'.join(''.join(islice(it, i)) for i in groups)
    return in_pid

print(FormatPID('1005191000060'))
print(FormatPID('001005191000060'))
print(FormatPID('0010-05-19-100-006-0'))

Output:

0010-05-19-100-006-0
0010-05-19-100-006-0
0010-05-19-100-006-0
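The islice version consumes one shared iterator, so each group picks up where the previous one stopped. If explicit indices feel clearer, the same group sizes can be turned into slice boundaries with itertools.accumulate (Python 3.2+). A sketch; `split_by_groups` is a hypothetical helper, and the group sizes are taken from the answer above.

```python
from itertools import accumulate

# Group sizes from the answer above: 4-2-2-3-3-1 digits.
GROUPS = (4, 2, 2, 3, 3, 1)

def split_by_groups(digits, groups=GROUPS):
    """Cut digits into consecutive pieces whose lengths are given by groups."""
    ends = list(accumulate(groups))   # running totals: [4, 6, 8, 11, 14, 15]
    starts = [0] + ends[:-1]          # each piece starts where the last ended
    return [digits[s:e] for s, e in zip(starts, ends)]

print('-'.join(split_by_groups('001005191000060')))
```

Keeping the sizes in one tuple means a change to the parcel layout touches a single line rather than every slice.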

#2

I wouldn't bother using regexes. You just want to get all the digits, ignoring hyphens, left-pad with 0s, then insert the hyphens in the right places, right? So:

def format_pid(pid):
    p = pid.replace('-', '')
    # Reject non-digits, and anything longer than the 15-digit parcel format.
    if not p.isdigit() or len(p) > 15:
        raise ValueError('Invalid format: {}'.format(pid))
    p = p.zfill(15)
    # You can use your `join` call instead of the following if you prefer.
    # Or Ashwini's islice call.
    return '{}-{}-{}-{}-{}-{}'.format(p[:4], p[4:6], p[6:8], p[8:11], p[11:14], p[14:])
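If users also type other separators such as dots or spaces (an assumption; the question only shows hyphens), the same strip, pad, and re-hyphenate approach can remove a whole class of characters with re.sub. This sketch validates with a fullmatch on 1 to 15 ASCII digits, since str.isdigit() also accepts other Unicode digit characters such as '²'; `format_pid_loose`, `_SEPARATORS`, and `_DIGITS` are hypothetical names.

```python
import re

# Assumption: hyphens, dots and whitespace are all acceptable separators.
_SEPARATORS = re.compile(r'[\s.\-]+')
# 1 to 15 ASCII digits; zfill supplies any missing leading zeros.
_DIGITS = re.compile(r'[0-9]{1,15}')

def format_pid_loose(pid):
    """Strip separators, validate the digits, pad to 15 and re-hyphenate."""
    p = _SEPARATORS.sub('', pid)
    if _DIGITS.fullmatch(p) is None:
        raise ValueError('Invalid format: {}'.format(pid))
    p = p.zfill(15)
    return '{}-{}-{}-{}-{}-{}'.format(p[:4], p[4:6], p[6:8], p[8:11], p[11:14], p[14:])

print(format_pid_loose(' 0010.05.19.100.006.0 '))
```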
