Perl regular expression: inserting/substituting a string at specific positions

Date: 2022-09-13 16:28:01

Given a URL, the following regular expression can insert/substitute words at certain points in the URL.

Code:

#!/usr/bin/perl

use strict;
use warnings;
#use diagnostics;

my @insert_words = qw/HELLO GOODBYE/;

while (<DATA>) {
    chomp;
    foreach my $word (@insert_words)
    {
        my $repeat = 1;
        while ((my $match=$_) =~ s|(?<![/])(?:[/](?![/])[^/]*){$repeat}[^/]*\K|$word|)
        {
            print "$match\n";
            $repeat++;
        }

        print "\n";
    }
}

__DATA__
http://www.*.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
10.15.16.17/dog/cat/rabbit/

The output (for the first example URL in __DATA__, with the word HELLO):

http://www.*.com/dogHELLO/cat/rabbit/
http://www.*.com/dog/catHELLO/rabbit/
http://www.*.com/dog/cat/rabbitHELLO/
http://www.*.com/dog/cat/rabbit/HELLO
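
Read piece by piece, the pattern above may be easier to follow when written with the /x flag. The sketch below is an annotated equivalent under stated assumptions: the helper name insert_after_segment is hypothetical, the second [^/]* of the original is left out because the greedy one inside the group has already consumed the segment, and \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: append $word to the Nth path segment of $url.
# Returns nothing when the URL has fewer than $n segments.
sub insert_after_segment {
    my ($url, $word, $n) = @_;
    (my $copy = $url) =~ s{
        (?<!/)          # begin at a slash that has no slash before it
        (?: / (?!/)     # one slash that is not the first of a pair
            [^/]*       # and the path segment that follows it
        ){$n}           # repeated once per segment to skip
        \K              # keep the text matched so far out of the replacement
    }{$word}x or return;
    return $copy;
}

print insert_after_segment(
    'http://www.superuser.co.uk/dog/cat/rabbit/hamster/', 'HELLO', 2
), "\n";
# prints http://www.superuser.co.uk/dog/catHELLO/rabbit/hamster/
```

The two lookarounds are what skip the "//" after the scheme, so the first slash that can start a match is the one that begins the path.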

Where I am now stuck:

I would now like to alter the regular expression so that the output looks like this:

http://www.*.com/dogHELLO/cat/rabbit/
http://www.*.com/dog/catHELLO/rabbit/
http://www.*.com/dog/cat/rabbitHELLO/
http://www.*.com/dog/cat/rabbit/HELLO
# above is what it already does at the moment
# below is what I also want it to be able to do
http://www.*.com/HELLOdog/cat/rabbit/  #<-puts the word at the start of the string
http://www.*.com/dog/HELLOcat/rabbit/
http://www.*.com/dog/cat/HELLOrabbit/
http://www.*.com/dog/cat/rabbit/HELLO
http://www.*.com/HELLO/cat/rabbit/  #<- now also replaces the string with the word
http://www.*.com/dog/HELLO/rabbit/
http://www.*.com/dog/cat/HELLO/
http://www.*.com/dog/cat/rabbit/HELLO

But I am having trouble getting it to do all of this automatically within a single regular expression.

Any help with this matter would be highly appreciated. Many thanks.

3 Solutions

#1

One solution:

use strict;
use warnings;

use URI qw( );

my @insert_words = qw( HELLO );

while (<DATA>) {
   chomp;
   my $url = URI->new($_);
   my $path = $url->path();

   for (@insert_words) {
      # Use package vars to communicate with /(?{})/ blocks.
      local our $insert_word = $_;
      local our @paths;
      $path =~ m{
         ^(.*/)([^/]*)((?:/.*)?)\z
         (?{
            push @paths, "$1$insert_word$2$3";
            if (length($2)) {
               push @paths, "$1$insert_word$3";
               push @paths, "$1$2$insert_word$3";
            }
         })
         (?!)
      }x;

      for (@paths) {
         $url->path($_);
         print "$url\n";
      }
   }
}

__DATA__
http://www.*.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
http://10.15.16.17/dog/cat/rabbit/
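
The (?{ ... }) block followed by (?!) in the code above relies on backtracking: the block runs each time the engine reaches it, and (?!) then always fails, which forces the engine to try the next way of matching. A minimal sketch of the same idea on a smaller problem, assuming a hypothetical helper all_splits that lists every way to cut a string in two:

```perl
use strict;
use warnings;

# Hypothetical helper: return every "left|right" split of $string,
# collected by a code block that runs once per candidate match.
sub all_splits {
    my ($string) = @_;
    local our @splits;    # package variable shared with the code block
    $string =~ m{
        ^ (.*) (.*) \z
        (?{ push @splits, join '|', $1, $2 })
        (?!)
    }x;
    return @splits;
}

print "$_\n" for all_splits('abc');
```

Because the first group is greedy, the longest left-hand part is reported first, so this should print abc|, ab|c, a|bc and |abc in that order. The overall match always fails, so only the side effect of the code block matters.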

#2

Without crazy regexes:

use strict;
use warnings;

use URI qw( );

my @insert_words = qw( HELLO );

while (<DATA>) {
   chomp;
   my $url = URI->new($_);
   my $path = $url->path();

   for my $insert_word (@insert_words) {
      my @parts = $path =~ m{/([^/]*)}g;
      my @paths;
      for my $part_idx (0..$#parts) {
         my $orig_part = $parts[$part_idx];
         local $parts[$part_idx];
         {
            $parts[$part_idx] = $insert_word . $orig_part;
            push @paths, join '', map "/$_", @parts;
         }
         if (length($orig_part)) {
            {
               $parts[$part_idx] = $insert_word;
               push @paths, join '', map "/$_", @parts;
            }
            {
               $parts[$part_idx] = $orig_part . $insert_word;
               push @paths, join '', map "/$_", @parts;
            }
         }
      }

      for (@paths) {
         $url->path($_);
         print "$url\n";
      }
   }
}

__DATA__
http://www.*.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
http://10.15.16.17/dog/cat/rabbit/
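
The three edits applied to each segment above (prefix, replace, suffix) can also be written as a small table of transforms, which avoids the need for local. This is only a sketch using core Perl: the helper name path_variants is hypothetical, and it works on a bare path string rather than a URI object.

```perl
use strict;
use warnings;

# Hypothetical helper: return every variant of $path in which $word is
# prefixed to, substituted for, or appended to exactly one segment.
sub path_variants {
    my ($path, $word) = @_;
    my @parts = $path =~ m{/([^/]*)}g;
    my @transforms = (
        sub { $word . $_[0] },    # prefix the segment
        sub { $word },            # replace the segment
        sub { $_[0] . $word },    # append to the segment
    );
    my @variants;
    for my $idx (0 .. $#parts) {
        # An empty segment (after a trailing slash) gives the same
        # result for all three transforms, so apply only the first.
        my @todo = length($parts[$idx]) ? @transforms : ($transforms[0]);
        for my $transform (@todo) {
            my @copy = @parts;
            $copy[$idx] = $transform->($parts[$idx]);
            push @variants, join '', map { "/$_" } @copy;
        }
    }
    return @variants;
}

print "$_\n" for path_variants('/dog/cat/', 'HELLO');
```

For '/dog/cat/' this yields seven paths: three for dog, three for cat, and /dog/cat/HELLO for the empty segment after the trailing slash.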

#3

One more solution:

#!/usr/bin/perl

use strict;
use warnings;

my @insert_words = qw/HELLO GOODBYE/;

while (<DATA>) {
    chomp;
    /(?<![\/])(?:[\/](?![\/])[^\/]*)/p;
    my $begin_part = ${^PREMATCH};
    my $tail = ${^MATCH} . ${^POSTMATCH};
    my @tail_chunks = split /\//, $tail; 

    foreach my $word (@insert_words) {                      
        for my $index (1..$#tail_chunks) {
            my @new_tail = @tail_chunks;

            $new_tail[$index] = $word . $tail_chunks[$index];
            my $str = $begin_part . join "/", @new_tail;
            print $str, "\n";

            $new_tail[$index] = $tail_chunks[$index] . $word;
            $str = $begin_part . join "/", @new_tail;  
            print $str, "\n";
        }

        print "\n";
    }
}

__DATA__
http://www.*.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
10.15.16.17/dog/cat/rabbit/
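
One detail of the code above: split drops trailing empty fields by default, so the trailing slash of each input URL does not survive the split/join round trip. A minimal sketch of the difference, assuming the slash should be kept, is to pass a negative limit to split:

```perl
use strict;
use warnings;

my $tail = '/dog/cat/rabbit/';

# Default: the trailing empty field is discarded.
my @default = split /\//, $tail;        # ('', 'dog', 'cat', 'rabbit')

# Negative limit: trailing empty fields are kept.
my @keep = split /\//, $tail, -1;       # ('', 'dog', 'cat', 'rabbit', '')

print join('/', @default), "\n";        # prints /dog/cat/rabbit
print join('/', @keep), "\n";           # prints /dog/cat/rabbit/
```

With the negative limit the last chunk is an empty string, so a loop that both prefixes and suffixes each chunk would emit the same line twice for it and might want to skip one of the two.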
