How do I parse a person's full name into a username in Perl?

Time: 2022-06-01 20:18:50

I need to convert a name in the format Parisi, Kenneth into the format kparisi.

Does anyone know how to do this in Perl?

Here is some sample data that is abnormal:

Zelleb, Charles F.,,IV
Eilt, John,, IV
Woods, Charles R.,,III
Welkt, Craig P.,,Jr.

These specific names should end up as czelleb, jeilt, cwoods, cwelkt, etc.

I have one more condition that is ruining my name builder:

O'Neil, Paul

So far, Vinko Vrsalovic's answer works best when weird or corrupt names are in the mix, but the example above would come out as "pneil". I'll be damned if I can't get that "o" between the "p" and the "n".

7 answers

#1


vinko@parrot:~$ cat genlogname.pl
use strict;
use warnings;

my @list;
push @list, "Zelleb, Charles F.,,IV";
push @list, "Eilt, John,, IV";
push @list, "Woods, Charles R.,,III";
push @list, "Welkt, Craig P.,,Jr.";

for my $name (@list) {
        print gen_logname($name)."\n";
}

sub gen_logname {
        my $n = shift;
        #Filter out unneeded characters
        $n =~ s/['-]//g;
        #This regex grabs the last name, a comma, an optional space (the
        #optional space is my addition), and the first character of the
        #first name, which seems to satisfy your condition
        $n =~ m/(\w+), ?(.)/;
        return lc($2.$1);
}
vinko@parrot:~$ perl genlogname.pl
czelleb
jeilt
cwoods
cwelkt

#2


I would start by filtering the abnormal data so you only have regular names. Then something like this should do the trick:

$t = "Parisi, Kenneth";
$t =~ s/(.+),\s*(.).*/\l$2\l$1/;
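The filtering step is not spelled out in this answer, so here is one possible sketch of it. It assumes the first two comma-separated fields are always the last name and the first name; `normalize_name` is a made-up helper name, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical pre-filter: reduce a raw record to "Last, First".
sub normalize_name {
    my ($raw) = @_;
    # Fields after the first two (empty field, IV, Jr., ...) are discarded.
    my ( $last, $first ) = split /\s*,\s*/, $raw;
    return unless defined $last && defined $first && length $first;
    $last  =~ s/['-]//g;    # O'Neil -> ONeil
    $first =~ s/\s.*//;     # "Charles F." -> "Charles"
    return "$last, $first";
}

my $t = normalize_name("Zelleb, Charles F.,,IV");
$t =~ s/(.+),\s*(.).*/\l$2\l$1/;    # the substitution from this answer
# \l lowercases only the next character, so a final lc is added here
# to cover names such as "ONeil".
print lc($t), "\n";
```

Records without a comma make `normalize_name` return nothing, so the caller can decide whether to skip or report them.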

#3


Try:

$name =~ s/(\w+),\s(\w).*/$2$1/;
$name = lc $name;

\w here matches a word character (a letter, a digit, or an underscore). If you want to be more specific, you could use [a-z] instead and pass the i flag (case-insensitive):

$name =~ s/([a-z]+),\s([a-z]).*/$2$1/i;

#4


Here's a one-line solution, assuming you store all the names in a file called "names" (one per line) and will handle duplicate-name detection somehow later.

cat names | perl -e 'while(<>) {/^\s*(\S*)?,\s*(\S)/; print lc "$2$1\n";}' | sed s/\'//g
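The duplicate detection mentioned above is left open, so the following is only one way it might look. It assumes that appending a counter (jsmith, jsmith2, jsmith3) is an acceptable policy, which the question never states; `unique_username` and the `%seen` hash are names invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: return $user unchanged the first time it is seen,
# and $user followed by 2, 3, ... on later occurrences.
# $seen is a hash reference that carries the counts between calls.
sub unique_username {
    my ( $seen, $user ) = @_;
    my $count = ++$seen->{$user};
    return $count == 1 ? $user : $user . $count;
}

my %seen;
print unique_username( \%seen, $_ ), "\n" for qw(jsmith kparisi jsmith jsmith);
```

Since the generated usernames contain only letters, a suffixed name such as jsmith2 cannot clash with an unsuffixed one under this scheme.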

#5


It looks like your input data is comma-separated. To me, the clearest way to do this would be to split it into components and then generate the login names from them:

while (<>) {
    chomp;
    my ($last, $first) = split /,/, lc $_;
    $last =~ s/[^a-z]//g;  # strip out nonletters
    $first =~ s/[^a-z]//g; # strip out nonletters
    my $logname = substr($first, 0, 1) . $last;
    print $logname, "\n";
}

#6


    $rowfetch =~ s/['-]//g; #All chars inside the [ ] will be filtered out.
    $rowfetch =~ m/(\w+), ?(.)/;
    $rowfetch = lc($2.$1);

This is how I ended up using Vinko Vrsalovic's solution. It's inside a while loop that goes through a SQL query result. Thanks again, Vinko.
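One thing to watch in a loop like this: when the match fails, $1 and $2 are not set by it, so the row would get a wrong or empty username. A minimal sketch of a guard, with a plain array standing in for the query results (the sample rows and the `@usernames` array are assumptions for illustration):

```perl
use strict;
use warnings;

# Stand-in for the rows fetched from the SQL query (assumed: one name per row).
my @rows = ( "Eilt, John,, IV", "O'Neil, Paul", "", "Madonna" );

my @usernames;
for my $rowfetch (@rows) {
    $rowfetch =~ s/['-]//g;    # same filter as above
    # Only build a username when the pattern actually matched.
    unless ( $rowfetch =~ m/(\w+), ?(.)/ ) {
        warn "skipping unparseable name: '$rowfetch'\n";
        next;
    }
    push @usernames, lc( $2 . $1 );
}

print "$_\n" for @usernames;
```

Rows that are empty or have no comma are reported on STDERR and skipped instead of producing a bogus login.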

#7


This should do what you need

use strict;
use warnings;
use 5.010;

while ( <DATA> ) {
    say abbreviate($_);
}


sub abbreviate {
    for ( @_ ) {
        s/[-']+//g;
        tr/A-Z/a-z/;
        tr/a-z/ /c;
        return "$2$1" if /([a-z]+)\s+([a-z])/;
    }
}


__DATA__
Zelleb, Charles F.,,IV
Eilt, John,, IV
Woods, Charles R.,,III
Welkt, Craig P.,,Jr.
O'Neil, Paul

output

czelleb
jeilt
cwoods
cwelkt
poneil
