Regular expression for parsing a directory and filename

Time: 2021-12-11 01:42:58

I'm trying to write a regex that will parse out the directory and filename of a fully qualified path using matching groups.

so...

/var/log/xyz/10032008.log

would recognize group 1 to be "/var/log/xyz" and group 2 to be "10032008.log"

Seems simple but I can't get the matching groups to work for the life of me.

NOTE: As pointed out by some of the respondents this is probably not a good use of regular expressions. Generally I'd prefer to use the file API of the language I was using. What I'm actually trying to do is a little more complicated than this but would have been much more difficult to explain, so I chose a domain that everyone would be familiar with in order to most succinctly describe the root problem.

8 answers

#1


25  

Try this:

^(.+)/([^/]+)$
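
A minimal sketch of how this pattern behaves, assuming Python's re module (the question does not name a language, and the split_path helper is hypothetical). Because .+ needs at least one character before the last slash, a file directly under the root does not match:

```python
import re

# Pattern from the answer above: greedy directory part, then the last path component.
PATTERN = re.compile(r"^(.+)/([^/]+)$")

def split_path(path):
    """Return (directory, filename), or None when the pattern does not match."""
    m = PATTERN.match(path)
    return m.groups() if m else None

print(split_path("/var/log/xyz/10032008.log"))  # ('/var/log/xyz', '10032008.log')
print(split_path("/10032008.log"))              # None: nothing precedes the only slash
```

Unlike some of the patterns in later answers, group 1 here does not include the trailing slash.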

#2


12  

In languages that support regular expressions with non-capturing groups:

((?:[^/]*/)*)(.*)

I'll explain the gnarly regex by exploding it...

(
  (?:
    [^/]*
    /
  )
  *
)
(.*)

What the parts mean:

(  -- capture group 1 starts
  (?:  -- non-capturing group starts
    [^/]*  -- greedily match as many non-directory separators as possible
    /  -- match a single directory-separator character
  )  -- non-capturing group ends
  *  -- repeat the non-capturing group zero-or-more times
)  -- capture group 1 ends
(.*)  -- capture all remaining characters in group 2

Example

To test the regular expression, I used the following Perl script...

#!/usr/bin/perl

use strict;
use warnings;

# Print both capture groups for one test path.
sub test {
  my ($str, $testname) = @_;

  # Every part of the pattern may match the empty string, so the match
  # always succeeds at position 0 and $1 and $2 are always defined.
  $str =~ m#((?:[^/]*/)*)(.*)#;

  print "$str -- $testname\n";
  print "  1: $1\n";
  print "  2: $2\n\n";
}

test('/var/log/xyz/10032008.log', 'absolute path');
test('var/log/xyz/10032008.log', 'relative path');
test('10032008.log', 'filename-only');
test('/10032008.log', 'file directly under root');

The output of the script...

/var/log/xyz/10032008.log -- absolute path
  1: /var/log/xyz/
  2: 10032008.log

var/log/xyz/10032008.log -- relative path
  1: var/log/xyz/
  2: 10032008.log

10032008.log -- filename-only
  1:
  2: 10032008.log

/10032008.log -- file directly under root
  1: /
  2: 10032008.log

#3


8  

Most languages have path parsing functions that will give you this already. If you can, I'd recommend using what you get for free out of the box.

Assuming / is the path delimiter...

^(.*/)([^/]*)$

The first group will be whatever the directory/path info is, the second will be the filename. For example:

  • /foo/bar/baz.log: "/foo/bar/" is the path, "baz.log" is the file
  • foo/bar.log: "foo/" is the path, "bar.log" is the file
  • /foo/bar: "/foo/" is the path, "bar" is the file
  • /foo/bar/: "/foo/bar/" is the path and there is no file.
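
The cases listed above can be checked with a short sketch, assuming Python's re module (the split_path helper is hypothetical). It also shows one case the list does not cover: a bare filename with no slash does not match at all, because group 1 requires at least one slash.

```python
import re

# Pattern from this answer: group 1 keeps the trailing slash, group 2 may be empty.
PATTERN = re.compile(r"^(.*/)([^/]*)$")

def split_path(path):
    """Return (path_part, file_part), or None when the pattern does not match."""
    m = PATTERN.match(path)
    return m.groups() if m else None

for p in ["/foo/bar/baz.log", "foo/bar.log", "/foo/bar", "/foo/bar/", "baz.log"]:
    print(p, "->", split_path(p))
```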

#4


4  

What language? And why use regex for this simple task?

If you must:

^(.*)/([^/]*)$

gives you the two parts you wanted. You might need to escape the parentheses:

^\(.*\)/\([^/]*\)$

depending on your preferred language syntax.

But I suggest you just use your language's string search function that finds the last "/" character, and split the string on that index.
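
A minimal sketch of that suggestion, assuming Python (str.rfind is a real method; the split_at_last_slash helper is hypothetical):

```python
def split_at_last_slash(path):
    """Split on the last '/' without a regex; the directory is '' when there is no slash."""
    index = path.rfind("/")  # -1 when no '/' is present
    if index == -1:
        return "", path
    return path[:index], path[index + 1:]

print(split_at_last_slash("/var/log/xyz/10032008.log"))
print(split_at_last_slash("10032008.log"))
```

One caveat of this naive split: for a file directly under the root, such as /10032008.log, the directory part comes back empty rather than as "/".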

#5


1  

What about this?

[/]{0,1}([^/]+[/])*([^/]*)

Deterministic:

((/)|())([^/]+/)*([^/]*)

Strict:

^[/]{0,1}([^/]+[/])*([^/]*)$
^((/)|())([^/]+/)*([^/]*)$
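
One thing to watch with these patterns, sketched here under the assumption of Python's re module: a capturing group inside a repetition keeps only its last iteration, so the directory group holds just the final path component rather than the whole directory. The second pattern below is a hypothetical variant, not part of the answer, that captures the full prefix instead.

```python
import re

path = "/var/log/xyz/10032008.log"

# First pattern from this answer: the directory group sits inside a repetition.
repeated = re.fullmatch(r"[/]{0,1}([^/]+[/])*([^/]*)", path)
print(repeated.groups())  # ('xyz/', '10032008.log')

# Hypothetical variant: repeat a non-capturing group and capture the whole prefix.
whole = re.fullmatch(r"(/?(?:[^/]+/)*)([^/]*)", path)
print(whole.groups())     # ('/var/log/xyz/', '10032008.log')
```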

#6


0  

Try this:

/^(\/([^/]+\/)*)(.*)$/

It will leave the trailing slash on the path, though.
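
A minimal sketch of this pattern, assuming Python's re module and assuming the surrounding /.../ are regex-literal delimiters (as in JavaScript or Perl) rather than part of the pattern. Note that the filename lands in group 3, since group 2 is the inner repeated group, and that a relative path does not match because the leading slash is required.

```python
import re

# The answer's pattern without the /.../ delimiters and without escaping the slashes.
PATTERN = re.compile(r"^(/([^/]+/)*)(.*)$")

absolute = PATTERN.match("/var/log/xyz/10032008.log")
print(absolute.group(1), absolute.group(3))

relative = PATTERN.match("var/log/xyz/10032008.log")
print(relative)  # None: the pattern requires a leading slash
```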

#7


0  

A very late answer, but I hope this helps.

^(.+?)/([\w]+\.log)$

This uses a lazy quantifier before the /; I just modified the accepted answer.

http://regex101.com/r/gV2xB7/1
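
A small comparison, assuming Python's re module (the greedy variant is hypothetical, added only for contrast). Because group 2 cannot contain a slash and the pattern is anchored at the end, the lazy and greedy forms capture the same groups here. The \w class also rejects filenames containing characters such as a hyphen.

```python
import re

lazy = re.compile(r"^(.+?)/([\w]+\.log)$")   # pattern from this answer
greedy = re.compile(r"^(.+)/([\w]+\.log)$")  # hypothetical greedy variant

path = "/var/log/xyz/10032008.log"
print(lazy.match(path).groups())
print(greedy.match(path).groups())

# A hyphen is not matched by \w, so this path is rejected entirely.
print(lazy.match("/var/log/xyz/app-1.log"))
```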

#8


-4  

I would avoid doing that with regex. I would use your language's built-in facilities for parsing path names, and reserve regex for searches that genuinely require it.
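
As one illustration of such a facility, assuming Python: the standard library's posixpath.split splits on the last "/" on any platform (os.path.split does the same using the host's separator rules).

```python
import posixpath

# Split into (head, tail) on the last '/'.
directory, filename = posixpath.split("/var/log/xyz/10032008.log")
print(directory, filename)

# Edge cases: root-level file, bare filename, trailing slash.
for p in ["/10032008.log", "10032008.log", "/foo/bar/"]:
    print(p, "->", posixpath.split(p))
```

Unlike several of the regexes in this thread, the directory part carries no trailing slash unless it is the root itself.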
