How to extract parts of a string using a regular expression

Date: 2022-09-13 11:10:28

I have the following string: SL2.40ch12:53884872-53885197.

I would like to assign SL2.40ch12 to $chromosome, 53884872 to $start and 53885197 to $end. What's an efficient way to do this with a regular expression?

Here's what I tried, but my regex is off.

my $string = SL2.40ch12:53884872-53885197
my $chromosome =~ /^*\:$/
my $start =~ /^+d\-$/
my $end =~ /^-+d\/

thanks

3 Solutions

#1


2  

For that particular string, you can do something simple like this:

my $string = "SL2.40ch12:53884872-53885197";
my ($chr, $start, $end) = split /[:-]/, $string, 3; 

If you want it a little stricter, split in two steps:

my ($chr, $range) = split /:/, $string, 2;
my ($start, $end) = split /-/, $range;

This is, of course, assuming that you will not have colons or dashes appearing elsewhere in your data.
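To illustrate that caveat, here is a small sketch using a made-up input whose chromosome name contains a dash (the string is an assumption for demonstration, not data from the question). It compares the one-step split with the two-step split:

```perl
use strict;
use warnings;

# Hypothetical input: the chromosome name itself contains a dash.
my $string = "SL2-40ch12:53884872-53885197";

# One-step split: the first dash is treated as a separator,
# so the name is cut in two and the range stays unsplit.
my @one_step = split /[:-]/, $string, 3;
print join(" | ", @one_step), "\n";

# Two-step split: colon first, then the dash inside the range only.
my ($chr, $range) = split /:/, $string, 2;
my ($start, $end) = split /-/, $range;
print "$chr | $start | $end\n";
```

The two-step version still assumes the name contains no colon and the range contains exactly one dash.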

#2


1  

Here is a regex that may do what you want:

my ($chromosome, $begin, $end) = $string =~ /^(.*):(.*)-(.*)$/;
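A somewhat stricter variant is sketched below. It assumes the chromosome name never contains a colon and that the coordinates are plain digits; those assumptions go beyond what the question states. It also stops with an error when the input does not have the expected shape:

```perl
use strict;
use warnings;

my $string = "SL2.40ch12:53884872-53885197";

# [^:]+  chromosome name: everything up to the first colon
# \d+    start and end: digits only
# The list assignment counts zero captures on a failed match,
# so the "or die" branch runs for malformed input.
my ($chromosome, $start, $end) = $string =~ /^([^:]+):(\d+)-(\d+)$/
    or die "Unexpected format: $string\n";

print "$chromosome $start $end\n";
```

Using `\d+` rather than `.*` means a string such as "chr:abc-def" is rejected instead of silently captured.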

#3


0  

I'm not really familiar with Perl, but if it uses common regex syntax, then your $chromosome and $start lines have a mistake: '$' means end of line, so those patterns will try to find the colon or dash at the end of the line.
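The effect of the '$' anchor can be checked directly in Perl. The following is only a sketch on the string from the question; the variable names `$anchored` and `$unanchored` are made up for illustration:

```perl
use strict;
use warnings;

my $string = "SL2.40ch12:53884872-53885197";

# ':$' requires the colon to be the last character, so it fails here.
my $anchored = ($string =~ /:$/) ? 1 : 0;

# Without the anchor, the colon is found wherever it occurs.
my $unanchored = ($string =~ /:/) ? 1 : 0;

# Capturing what precedes the first colon yields the chromosome name.
my ($chromosome) = $string =~ /^([^:]+):/;

print "$anchored $unanchored $chromosome\n";
```

Note also that a variable has to be bound to the string being searched (`$string =~ /.../`), and that a digit is written `\d`, not `d`.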
