Regex: how do I remove all non-alphanumeric characters except the colon in a 12/24-hour timestamp?

Time: 2022-09-06 11:13:36

I have a string like:

Today, 3:30pm - Group Meeting to discuss "big idea"

How do you construct a regex such that after parsing it would return:

Today 3:30pm Group Meeting to discuss big idea

I would like it to remove all non-alphanumeric characters except for those that appear in a 12 or 24 hour time stamp.

6 solutions

#1


7  

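# Keeps the colon in D:DD and DD:DD times, with or without am/pm (12- or 24-hour)
# (named "pattern" so that it does not shadow Python's re module)
pattern = r':(?=..(?<!\d:\d\d))|[^a-zA-Z0-9 ](?<!:)'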

A colon must be preceded by at least one digit and followed by at least two digits: then it's a time. All other colons will be considered textual colons.

How it works

:              // match a colon
(?=..          // look ahead over the next two characters (nothing is consumed)
  (?<!         // negative look-behind, evaluated after those two characters
    \d:\d\d    // digit, colon, two digits: a time stamp
  )            // end of look-behind: fail if the colon is part of a time stamp
)              // end of look-ahead
|              // or
[^a-zA-Z0-9 ]  // match any character that is not a letter, a digit or a space
(?<!:)         // as long as that character is not a colon

Then when applied to this silly text:

Today, 3:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 "big idea" on 03:33pm or 16:47 is also good

...changes it into:

Today 3:30pm  Group 123 Meeting to di4scuss3 234 big idea on 03:33pm or 16:47 is also good
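To make the behaviour easy to check, here is a minimal Python sketch that applies the pattern from this answer to the sample text. The names `TIME_SAFE_STRIP` and `strip_non_alnum` are made up for illustration and are not part of the original answer.

```python
import re

# Pattern copied from the answer; the constant name is hypothetical.
TIME_SAFE_STRIP = re.compile(r':(?=..(?<!\d:\d\d))|[^a-zA-Z0-9 ](?<!:)')


def strip_non_alnum(text):
    """Delete punctuation, keeping colons that sit between a digit and two digits."""
    return TIME_SAFE_STRIP.sub('', text)


sample = ('Today, 3:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 '
          '"big idea" on 03:33pm or 16:47 is also good')
result = strip_non_alnum(sample)
print(result)
```

One edge case worth knowing about: the look-ahead needs two characters after the colon, so a colon in the last two positions of the string (for example `'note:'`) appears to be left in place. The pattern also does not validate hour or minute ranges.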

#2


2  

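Python.

import string
import time  # needed for time.strptime

punct = string.punctuation
s = 'Today, 3:30pm - Group Meeting:am to discuss "big idea" by our madam'
words = []
for item in s.split():
    try:
        # keep the token untouched if it parses as a time such as 3:30pm
        time.strptime(item, "%H:%M%p")
    except ValueError:
        # not a time: strip every punctuation character from the token
        item = ''.join(ch for ch in item if ch not in punct)
    words.append(item)
print(' '.join(words))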

output

$ ./python.py
Today 3:30pm  Group Meetingam to discuss big idea by our madam

# change to s='Today, 15:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 "big idea" on 03:33pm or 16:47 is also good'

$ ./python.py
Today 15:30pm  Group 123 Meeting to di4scuss3 234 big idea on 03:33pm or 1647 is also good

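NB: The method should be improved so that it only checks for a valid time when necessary (by imposing conditions), but I will leave it as is for now. Note that 16:47 loses its colon above because the format "%H:%M%p" requires an am/pm suffix.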
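One possible way to carry out the improvement hinted at in the note, sketched under a few assumptions: only tokens that contain a colon are tested, and two formats are tried (12-hour with am/pm, then plain 24-hour). The helper names `is_time` and `strip_punct_keep_times` are hypothetical, and `%p` matching depends on the locale (an English or C locale is assumed here).

```python
import string
import time

# Assumed formats: 12-hour with am/pm, then 24-hour without a suffix.
TIME_FORMATS = ("%I:%M%p", "%H:%M")


def is_time(token):
    """Return True if the token parses with any of the assumed formats."""
    for fmt in TIME_FORMATS:
        try:
            time.strptime(token, fmt)
            return True
        except ValueError:
            continue
    return False


def strip_punct_keep_times(text):
    """Strip punctuation from every whitespace-separated token except times."""
    cleaned = []
    for token in text.split():
        # Only tokens containing a colon can be times, so skip the parse otherwise.
        if ":" in token and is_time(token):
            cleaned.append(token)
            continue
        stripped = "".join(ch for ch in token if ch not in string.punctuation)
        if stripped:  # drop tokens that were pure punctuation, such as "-"
            cleaned.append(stripped)
    return " ".join(cleaned)


print(strip_punct_keep_times('Today, 3:30pm - Group Meeting to discuss "big idea" or 16:47'))
```

Because the whole token must parse, a time followed directly by punctuation (such as `3:30pm,`) is still treated as ordinary text in this sketch and loses its colon.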

#3


1  

I assume you'd like to keep spaces as well. This implementation is in Python, but the pattern is only a plain character class, so it should be portable to other regex flavors.

import re
x = u'Today, 3:30pm - Group Meeting to discuss "big idea"'
re.sub(r'[^a-zA-Z0-9: ]', '', x)

Output: 'Today 3:30pm  Group Meeting to discuss big idea'

For a slightly cleaner answer (no double spaces):

import re
x = u'Today, 3:30pm - Group Meeting to discuss "big idea"'
tmp = re.sub(r'[^a-zA-Z0-9: ]', '', x)
re.sub(r'[ ]+', ' ', tmp)

Output: 'Today 3:30pm Group Meeting to discuss big idea'
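As a quick check of what this approach does with colons outside a time stamp, the two substitutions can be wrapped in a small function. This is only a sketch; the function name and the sample sentence are invented for illustration.

```python
import re


def strip_keep_all_colons(text):
    # Same two substitutions as above: drop everything except letters, digits,
    # colons and spaces, then collapse runs of spaces into one.
    tmp = re.sub(r'[^a-zA-Z0-9: ]', '', text)
    return re.sub(r'[ ]+', ' ', tmp)


print(strip_keep_all_colons('Agenda: meet at 3:30pm - room B'))
# prints: Agenda: meet at 3:30pm room B
```

The colon after "Agenda" survives, so this version keeps every colon rather than only those inside a time stamp.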

#4


1  

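You can try this in JavaScript. Note that it replaces each match with a space rather than deleting it, and it only protects a colon that is followed by two digits and am/pm, so a 24-hour time such as 16:47 loses its colon: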

var re = /(\W+(?!\d{2}[ap]m))/gi;
var input = 'Today, 3:30pm - Group Meeting to discuss "big idea"';
alert(input.replace(re, " "))

#5


0  

A correct regex for this would be:

'(?<!\d):|:(?!\d\d)|[^a-zA-Z0-9 :]'
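A short Python sketch of how this pattern could be used. The three alternatives remove a colon that is not preceded by a digit, a colon that is not followed by two digits, and any other character that is not a letter, digit, space or colon. The names `STRIP_PATTERN` and `clean` are hypothetical.

```python
import re

# Pattern copied from the answer; the constant name is hypothetical.
STRIP_PATTERN = re.compile(r'(?<!\d):|:(?!\d\d)|[^a-zA-Z0-9 :]')


def clean(text):
    """Remove punctuation, keeping only colons of the form digit ':' digit digit."""
    return STRIP_PATTERN.sub('', text)


sample = ('Today, 3:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 '
          '"big idea" on 03:33pm or 16:47 is also good')
print(clean(sample))
print(clean('note: call at 16:47'))
```

Since the checks on the colon are plain look-arounds, a colon at the very end of the string is removed as well. The pattern does not validate ranges, so something like `99:99` would also keep its colon.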

#6


-1  

import re

s = "Call me, my dear, at 3:30"

re.sub(r'[^\w :]','',s)

'Call me my dear at 3:30'
