Does anyone know of a good library for mapping a person's name to his or her sex?

Time: 2022-09-13 10:54:34

I am looking for a library or database that can guess whether a person is male or female based on his or her name or nickname. Something like

    john => "M", mary => "F", alex => "A"  # ambiguous

I am looking for something that supports names other than English names (such as Japanese, Indian, etc.).

Before I get another answer along the lines of "you are going to offend people by assuming their sex/gender", let me be clear: my application does not interact with anyone. It does not send emails or contact anyone in any way. There are no users to ask. In many cases, the person in question is dead, and the only information I have is name, birth date, and date of death. The reason I want to know the sex of the individual is to make the grammar of the output nicer and to aid in possible searches that may come later.

33 answers

#1


67  

The gender of a name cannot be inferred programmatically in the general case. You need a name database. Here is a free name database from the US Census Bureau.

EDIT: The link for the 2010 names is dead, but there are working links and a library in the comments.
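For what it's worth, once you have a name database the lookup itself is small. Here is a minimal sketch, assuming the data has been reduced to "name,sex,count" rows. That layout, the sample numbers, the 0.9 threshold and the function names are all made up for illustration and are not the Census Bureau's actual file format.

```python
from collections import defaultdict

# Hypothetical sample rows in "name,sex,count" form. A real table would be
# loaded from a census or similar file, whose layout may well differ.
SAMPLE_ROWS = """\
john,M,1000
john,F,2
mary,F,900
mary,M,3
alex,M,60
alex,F,40
"""

def load_counts(text):
    """Parse "name,sex,count" rows into {name: {"M": n, "F": n}}."""
    counts = defaultdict(lambda: {"M": 0, "F": 0})
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sex, count = line.split(",")
        counts[name.strip().lower()][sex.strip().upper()] += int(count)
    return counts

def guess_sex(name, counts, threshold=0.9):
    """Return "M", "F", "A" (ambiguous) or "?" (name not in the table)."""
    entry = counts.get(name.strip().lower())
    if entry is None:
        return "?"
    total = entry["M"] + entry["F"]
    if total == 0:
        return "?"
    if entry["M"] / total >= threshold:
        return "M"
    if entry["F"] / total >= threshold:
        return "F"
    return "A"

counts = load_counts(SAMPLE_ROWS)
print(guess_sex("John", counts), guess_sex("mary", counts), guess_sex("Alex", counts))
```

The threshold decides how lopsided the counts must be before a name stops being reported as ambiguous, so it is worth tuning against whatever data you end up with.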

#2


68  

gender.c is an open source C program that does a good job. It comes with data for 44568 first names from all around the world. There is good documentation and a description of the file format (basically plain text), so it should not be too difficult to read it from your own application.

Here is what the author says:

A few words on quality of data

The dictionary of first names has been prepared with utmost care. For example, the Turkish, Indian and Korean names in this dictionary have all been independently classified by several native speakers. I also took special care to list only those names which can currently be found.

The lesson from this?

Any modifications should be done very cautiously (and they must also adhere to the sorting required by the search algorithm). For example, knowing that "Sascha" is a boy's name in Germany, the author never assumed the English "Sasha" to be a girl's name. Knowing that "Jan" is a boy's name in Germany, I never assumed it to be also an English short form of "Janet". Another case in point is the name "Esra". This is a boy's name in Germany, but a girl's name in Turkey.

The program calculates a probability for the name being male or female. It can do so with the name alone as input, or with the name and country of origin, which gives significantly better results.
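To illustrate why the country helps, here is a rough sketch of a country-aware lookup with a fallback. It is not how gender.c is implemented and it does not read gender.c's data file. The two dictionaries, the country codes and the function name are hypothetical, and the only entries are the "Esra" case quoted above.

```python
# Hypothetical in-memory table keyed by (name, country code). The entries
# mirror the "Esra" example from the quoted text, nothing more.
BY_COUNTRY = {
    ("esra", "de"): "M",
    ("esra", "tr"): "F",
}

# Fallback used when no country is given or the (name, country) pair is unknown.
WITHOUT_COUNTRY = {
    "esra": "A",  # ambiguous without knowing the country
}

def guess_sex_by_country(name, country=None):
    """Prefer the country-specific answer, else fall back to the global one."""
    key = name.strip().lower()
    if country is not None:
        hit = BY_COUNTRY.get((key, country.strip().lower()))
        if hit is not None:
            return hit
    return WITHOUT_COUNTRY.get(key, "?")

print(guess_sex_by_country("Esra", "DE"))  # M
print(guess_sex_by_country("Esra", "TR"))  # F
print(guess_sex_by_country("Esra"))        # A
```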

You can download it from the website of the German computer magazine c't: 40 000 Namen. The article is in German, but don't worry, all documentation is in English. Here is the direct FTP link, 0717-182.zip, if you are not interested in the article. The zip file contains the source code, a Windows executable, the database and the documentation.

#3


34  

"I tell ya, life ain't easy for a boy named 'Sue.'"

...So, why make it any harder? If you need to know the sex, just ask... Otherwise, don't worry about it.

#4


26  

I've built a free API that gives a probabilistic guess of the gender based on a first name. Instead of using any of the above-mentioned approaches, I use a huge dataset of profiles from social networks to provide a probabilistic guess along with a certainty factor. It also supports optional filtering by country or language IDs. It's getting better by the day as more profiles are added to the dataset.

It's free to use at http://genderize.io

One thing you should consider is using a tool that takes demographics into account, as naming conventions rely heavily on this.

Example

    http://api.genderize.io?name=kim
    {"name":"kim","gender":"female","probability":"0.89","count":1440}

    http://api.genderize.io?name=kim&country_id=dk
    {"name":"kim","gender":"male","probability":"0.95","count":44,"country_id":"dk"}
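In case it helps, here is a small sketch of calling the API from Python using only the standard library. The endpoint, the `name` and `country_id` parameters and the response fields are taken from the example above. The helper names are made up, and the service may have changed since this was posted (rate limits, HTTPS, response types), so treat it as a starting point only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://api.genderize.io"  # endpoint as given in the example above

def build_url(name, country_id=None):
    """Build the request URL; country_id is the optional filter from the example."""
    params = {"name": name}
    if country_id:
        params["country_id"] = country_id
    return API_URL + "?" + urlencode(params)

def parse_response(body):
    """Extract (gender, probability, count) from a JSON response body."""
    data = json.loads(body)
    # The sample response carries probability as a string; float() accepts
    # both a string and a number, so either form is handled.
    return data["gender"], float(data["probability"]), data["count"]

def fetch_guess(name, country_id=None, timeout=10):
    """Perform the HTTP request. Needs network access."""
    with urlopen(build_url(name, country_id), timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))

# Offline check against the sample response shown above.
sample = '{"name":"kim","gender":"female","probability":"0.89","count":1440}'
print(build_url("kim", "dk"))
print(parse_response(sample))
```

Keeping URL building and parsing separate from the network call means both can be checked without hitting the service.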

#5


21  

Here are two oddball approaches that may not even work, and likely wouldn't work en masse without violating the terms of a license:

  1. Use the Facebook API (which I know virtually nothing about; it may not even be possible) to perform two searches: one for male FB users with that first name, and one for female users. Use the two numbers to decide the probability of gender.

  2. Much looser but more scalable: use the Google API to search for the name plus gender-specific pronouns, and compare the numbers. For instance, there are 592,000,000 results for "Richard his" (not as a phrase), but only 179,000,000 for "Richard her".
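Either approach boils down to turning two counts into a share. A tiny sketch of that arithmetic, using the "Richard" numbers from above (the function name is made up, and no search API is called here):

```python
def male_share(male_hits, female_hits):
    """Fraction of hits that point to "male", or None if there are no hits."""
    total = male_hits + female_hits
    if total == 0:
        return None
    return male_hits / total

# "Richard his" vs "Richard her" result counts quoted above.
share = male_share(592_000_000, 179_000_000)
print(f"{share:.3f}")  # 0.768
```

So by this crude measure "Richard" comes out at roughly 77% male, which also shows how noisy search hit counts are as a signal.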

#6


6  

Given your stated constraints, your best option is to rephrase whatever you're writing to be gender-neutral, unless you know what gender they want to be called in each instance.

If writing in English, remember that singular “they” is grammatically fine as a gender-neutral third-person singular pronoun.

A good example is the title of this question. As it currently stands:

    … mapping a person's name to his or her sex?

That would be less awkward if written:

    … mapping a person's name to their sex?

#7


4  

It's also poor practice to assume that users must be male or female. There are a small but significant number of "intersex" people, most of whom are heartily sick of not having a box to tick.

bignose: interesting on the "singular they". I didn't realize it had such a long history.

#8


3  

The only thing you'll get from trying to automate it is a bunch of unhappy users. From that census data:

JAMES, JOHN, ROBERT, MICHAEL, WILLIAM, DAVID, RICHARD, CHARLES, JOSEPH, THOMAS, CHRISTOPHER, DANIEL, PAUL, MARK, DONALD, GEORGE, KENNETH, STEVEN, EDWARD, BRIAN, RONALD, ANTHONY, KEVIN, JASON, MATTHEW, GARY, TIMOTHY, JOSE, LARRY, JEFFREY, FRANK, SCOTT, ERIC, STEPHEN, ANDREW, RAYMOND, GREGORY, JOSHUA, JERRY, DENNIS, WALTER, PATRICK, PETER, HAROLD, HENRY, CARL, ARTHUR, RYAN, JOE, JUAN, JACK, ALBERT, JUSTIN, TERRY, GERALD, KEITH, SAMUEL, WILLIE, LAWRENCE, ROY, BRANDON, ADAM, FRED, BILLY, LOUIS, JEREMY, AARON, RANDY, EUGENE, CARLOS, RUSSELL, BOBBY, VICTOR, MARTIN, JESSE, SHAWN, CLARENCE, SEAN, CHRIS, JOHNNY, JIMMY, ANTONIO, TONY, LUIS, MIKE, DALE, CURTIS, NORMAN, ALLEN, GLENN, TRAVIS, LEE, MELVIN, KYLE, FRANCIS, JESUS, RAY, JOEL, EDDIE, TROY, ALEXANDER, MARIO, FRANCISCO, MICHEAL, OSCAR, JAY, ALEX, JON, RONNIE, TOMMY, LEON, LEO, WESLEY, DEAN, DAN, LEWIS, COREY, MAURICE, VERNON, ROBERTO, CLYDE, SHANE, SAM, LESTER, CHARLIE, TYLER, GENE, BRETT, ANGEL, LESLIE, CECIL, ANDRE, ELMER, GABRIEL, MITCHELL, ADRIAN, KARL, CORY, CLAUDE, JAMIE, JESSIE, CHRISTIAN, LONNIE, CODY, JULIO, KELLY, JIMMIE, JORDAN, JAIME, CASEY, JOHNNIE, SIDNEY, JULIAN, DARYL, VIRGIL, MARSHALL, PERRY, MARION, TRACY, RENE, FREDDIE, AUSTIN, JACKIE, JOEY, EVAN, DANA, DONNIE, SHANNON, ANGELO, SHAUN, LYNN, CAMERON, BLAKE, KERRY, JEAN, IRA, RUDY, BENNIE, ROBIN, LOREN, NOEL, DEVIN, KIM, GUADALUPE, CARROLL, SAMMY, MARTY, TAYLOR, ELLIS, DALLAS, LAURENCE, DREW, JODY, FRANKIE, PAT, MERLE, TERRELL, DARNELL, TOMMIE, TOBY, VAN, COURTNEY, JAN, CARY, SANTOS, AUBREY, MORGAN, LOUIE, STACY, MICAH, BILLIE, LOGAN, DEMETRIUS, ROBBIE, KENDALL, ROYCE, MICKEY, DEVON, ASHLEY, CAREY, SON, MARLIN, ALI, SAMMIE, MICHEL, RORY, KRIS, AVERY, ALEXIS, GERRY, STACEY, CARMEN, SHELBY, RICKIE, BOBBIE, OLLIE, DENNY, DION, ODELL, MARY, COLBY, HOLLIS, KIRBY, CRUZ, MERRILL, LANE, CLEO, BLAIR, NUMBERS, CLAIR, BERNIE, JOAN, DOMINIQUE, TRISTAN, JAME, GALE, LAVERNE, ALVA, STEVIE, ERIN, AUGUSTINE, YOUNG, JOHNIE, ARIEL, DUSTY, LINDSEY, TRACEY, SCOTTIE, SANDY, 
SYDNEY, GAIL, DORIAN, LAVERN, REFUGIO, IVORY, ANDREA, SANG, DEON, CAROL, YONG, BERRY, TRINIDAD, SHIRLEY, MARIA, CHANG, ROSARIO, DANNIE, FRANCES, THANH, CONNIE, TORY, LUPE, DEE, SUNG, CHI, QUINN, MINH, THEO, LOU, CHUNG, VALENTINE, JAMEY, WHITNEY, SOL, CHONG, PARIS, OTHA, LACY, DONG, ANTONIA, KELLEY, CARROL, SHAYNE, VAL, JUDE, BRITT, HONG, LEIGH, GAYLE, JAE, NICKY, LESLEY, MAN, KASEY, JEWELL, PATRICIA, LAUREN, ELISHA, MICHAL, LINDSAY, and JEWEL

are all names that work for both males and females. If a girl's name is Robert and everyone, including your software, keeps on calling her a man, she'd be rather pissed.

#9


3  

Although databases are probably the most practical solution, if you want to have some fun you could try writing a neural net (or using a neural net library) that takes in the name and outputs one of those 3 options (F, M, A).

You could train it using the datasets in the databases suggested by other answers, as well as any other data you have.

This solution would allow you to handle names not specifically categorised previously, and also handle different languages. You might want to pass the language (if you know it) as an input to the neural net as well.

I can't say whether neural nets (or any other machine learning) would do a good job of categorising, though.
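To give a feel for the "learn it from data" idea without pulling in a neural net library, here is a much simpler stand-in: a naive Bayes classifier over character bigrams, written with the standard library only. It is not a neural net. The class name and the eight training names are made up for illustration, and a real attempt would train on one of the databases mentioned in the other answers.

```python
import math
from collections import Counter, defaultdict

def features(name):
    """Character bigrams of the padded, lowercased name: '^m', 'ma', ..., 'a$'."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

class BigramNameClassifier:
    """Naive Bayes over character bigrams with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        for name, label in examples:
            self.label_counts[label] += 1
            for feature in features(name):
                self.feature_counts[label][feature] += 1
                self.vocabulary.add(feature)

    def predict(self, name):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(count / total)
            denominator = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for feature in features(name):
                score += math.log((self.feature_counts[label][feature] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, for illustration only.
classifier = BigramNameClassifier()
classifier.train([
    ("maria", "F"), ("anna", "F"), ("sara", "F"), ("petra", "F"),
    ("john", "M"), ("mark", "M"), ("peter", "M"), ("david", "M"),
])
print(classifier.predict("lara"))  # a name that was not in the training data
```

Because it works on letter patterns rather than whole names, it can produce a guess for names it has never seen, which is the property this answer is after. Whether the guesses are any good depends entirely on the training data.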

#10


3  

It's culture/region dependent: take Andrea, which is only masculine for Italians, while in Sweden it is a female name and Andreas is for men; Shawn is ambiguous in English. If a language has declension, like Latin or Russian, the final letters will change according to grammatical rules.

Another source of ambiguity is family names identical to personal names.

In my opinion it's impossible to solve in general.

#11


3  

The idea will clearly not work in most languages.

However, if you could tell the nationality beforehand you could have more luck. In most Slavic languages (e.g. Russian, Polish, Bulgarian) you could safely assume that all surnames ending with -va, -cha, -ska (in general, -a) are feminine, while -v, -ch, -shi are masculine.

In fact any surname has a feminine and a masculine form depending on the ending. The same names used in other countries (e.g. the US) might use only the masculine form, though.

The same could be said for first names (-a, -ya are feminine), but it is not 100% accurate.

But in general you would hardly get a library that is sufficiently accurate.

#12


3  

The Python package SexMachine will do that for you. Given any first name, it returns whether it's male, female or unisex. It relies on the data from the gender.c program by Jorg Michael.

#13


2  

I haven't used it, but IBM has a Global Name Analytics library (for a price!) that seems pretty comprehensive.

#14


2  

It's not a service, but a little app with a database:
http://www.codeproject.com/KB/cpp/genderizer.aspx

And this tool is in German:
http://www.faq-o-matic.net/2011/06/01/zu-einem-vornamen-das-geschlecht-finden/

And another one in VB:
http://www.vbarchiv.net/tipps/tipp_1925-geschlecht-anhand-des-vornamens-ermitteln.html

I think in combination with some "most used first names in 2011" lists you should be able to build something decent.

#15


2  

The Z Directory (at vettrasoft.com) has a C-language function that works something like this:

    #include <iostream>
    // plus the Z Directory header that declares z_guess_sex_byfirstname()

    void func()
    {
        char c = z_guess_sex_byfirstname("Lon");
        switch (c)
        {
        case 'M': std::cout << "It's a boy!\n"; break;
        case 'F': std::cout << "It's a girl!\n"; break;
        case 'B': std::cout << "this name is for both sexes\n"; break;
        case '?': std::cout << "sex unknown sorry\n"; break;
        }
    }

It's database driven; the table has something like 10,000+ names, I think, but you need to download and install the Z Directory (it includes many other topo items like countries, geographical landmarks, airports, states, area codes, postal/zip codes, etc., along with C++ functions and objects to access the data). However, the names are very English-language oriented. The table is a work in progress and is gradually updated.

#16


1  

Name-gender maps can work, but in multicultural countries it's more like guessing. I can give you one example: Marian in Polish is a typical masculine name, whereas the same name in Great Britain is a female name. In an era of people migrating all over the world, I'm not sure such a database would be very accurate. Good luck!

#17


1  

Some cultures have unisex names - like mine. What do you do then? I think the answer is plain and simple - don't assume - you could cause offence. Just ask if it's needed; otherwise, use gender-neutral wording.

#18


1  

Well, not anymore. IBM patented that idea a while ago.

So if you're looking for any level of flexibility (something other than a list of names), you'll either have to (gasp!) ask the user, or simply pay IBM for the rights :)

In any case, such autodetection is annoying for many people who have gender-ambiguous names, or even just mean parents. Let's not make this any harder for them.

#19


1  

It's not free, but this is a nice library that I have used before:

NetGender for .NET allows you to quickly and easily build Name Verification, Parsing and Gender Determination into your custom applications. Accurately verify whether a particular field contains a valid individual or company. NetGender uses a 100,000+, ethnically diverse, Name Dictionary in combination with an 8,000+ Company Name Dictionary to ensure precise gender determination.

http://www.softwarecompany.com/dotnet/netgender.htm

#20


1  

It's interesting that you say you have the birth date. That could help. I've seen databases of histories of name popularity.

In the film Splash (1984), it was funny that Daryl Hannah's character chooses the name "Madison" from a Madison Avenue street sign, because obviously "Madison" is not a girl's name.

24 years later, Madison is the 4th most popular name for baby girls!

Name history from the gov't. (Check out Mary's sad decline through the last 100 years.)

When I wrote to the White House as a child, Richard Nixon (or perhaps a secretary) responded to me with some photos of the historic place, addressed to "Miss Rhett Anderson." "Miss Rhett?" It doesn't even make sense! Can we REALLY not tell the difference between Clark Gable's Rhett (with a mustache, in Gone With The Wind!) and Vivien Leigh's Scarlett? I shall never forgive him, despite Neil Young's assurance that "even Richard Nixon has got soul."

#21


1  

I'm pretty sure no such service could exist with an acceptable level of accuracy. Here are the problems which I think are insurmountable:

  • There are plenty of names which are used for both men and women.
  • There are a lot of different names in this world, even if you only consider one country.
  • There is the "A Boy Named Sue" issue, raised so eloquently by Johnny Cash :-)

#22


1  

Check out http://genderchecker.com/

#23


1  

You can have a look at my Python gender detection project: https://github.com/muatik/genderizer

It tries to detect authors' genders by looking at their names and/or sample text (for example, tweets) written by them.

It also supports MongoDB and memcached for performance.

#24


0  

This is not really a programming problem - it comes down to getting a probability table.

AFAIK there are no public databases in distilled form. You could either build this from census data, or buy the data from someone.

For example, this is someone who sells the probability table for Canada.

#25


0  

IMHO, it is generally a bad idea to determine sex from an individual's name. A lot of names are intersexual (good grief, is this even a word ?? :-), and they may also belong to one sex in one culture and to the other in another.

A few stupid examples, just a few that came to mind (from my part of the world, CE):

Vanja - female; in countries east of here, mostly male
Alex - intersex (short for Sandra, female, and Sandro, male)
Robin - in western cultures, can be both

In some parts of the world, a person's sex can be determined by looking at how the name ends. For example, Marija, Sandra, Ivana, Petra, Sara, Lucija, Ana - you can see that most of these female names end in "ja" or "ra". There are other examples as well.

Still, I think it's better just to ask the user for their sex.

#26


0  

Got this from a Hacker News discussion about this

#27


0  

I know of no such service. You can perhaps find the data you are looking for, however. The US government publishes data about the prevalence of names and the gender of the people they're attached to. The Social Security Administration has such a page, and the Census Bureau may as well, but I haven't taken the time to look. Perhaps other governments do similar things.

#28


0  

I know of no such service, however ..

  • you could start with a raw list of person names or
  • guess gender according to some rules (e.g. -o => male, -ela, -a => female)

In some countries (e.g. Germany) the name a person can be given is limited by law - maybe there are some publications concerning that matter which could be harvested (but I don't know of any at the moment).
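The rule-based option from the list above could look roughly like this. The three suffix rules are the ones given in the bullet. The function name and the test names are mine, and rules like these only make sense for languages where endings actually carry that information.

```python
# Rules from the list above. Longer suffixes come first so that "-ela"
# is tried before the more general "-a".
SUFFIX_RULES = [
    ("ela", "F"),
    ("a", "F"),
    ("o", "M"),
]

def guess_by_suffix(name):
    """Return "M" or "F" for the first matching suffix, "?" if none matches."""
    lowered = name.strip().lower()
    for suffix, sex in SUFFIX_RULES:
        if lowered.endswith(suffix):
            return sex
    return "?"

for example in ("Marco", "Daniela", "Ana", "John"):
    print(example, guess_by_suffix(example))
```

Returning "?" when no rule matches keeps the guess honest and makes it easy to combine with a name list: look the name up first and fall back to the rules only for names that are missing.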

#29


0  

What I would do is make a hack which takes the name and searches for it using the Facebook API, then looks at the resulting users and counts how many of them are female or male. You can then return a percentage. Not so insurmountable anymore. :)

#30


-2  

Just ask people, and if they are nice they will give you their 'M's or 'F's, and if they are not then give 'em an 'A'.

#1


67  

The gender of a name is something that cannot be inferred programmatically in the general case. You need a name database. Here is a free name database from the US Census Bureau.

名称的性别是在一般情况下不能以编程方式推断的。您需要一个名称数据库。这是美国人口普查局的免费姓名数据库。

EDIT: The link for the 2010 name is dead but there are working links and a libraries in the comments.

编辑:2010年名称的链接已经死亡,但是在评论中有工作链接和一个库。

#2


68  

gender.c is an open source C program that does a good job.It comes with data for 44568 first names from all around the world.There is good documentation and a description of the file format (basically plain text)so it should not be to difficult to read it from your own application.

性别。c是一个很好的开源c程序。它提供了来自世界各地的44568个名字的数据。有很好的文档和文件格式(基本上是纯文本)的描述,所以从您自己的应用程序中读取它应该不会很困难。

Here is what the author says:

作者是这样说的:

A few words on quality of data

关于数据质量的几句话

The dictionary of first names has been prepared with utmost care. For example, the Turkish, Indian and Korean names in this dictionary have all been independently classified by several native speakers. I also took special care to list only those names which can currently be found.

那本人名字典已经非常小心地准备好了。例如,本词典中土耳其语、印度语和韩语的名称都是由几个母语人士独立分类的。我还特别注意只列出那些目前可以找到的名字。

The lesson from this?

的教训呢?

Any modifications should be done very cautiously (and they must also adhere to the sorting required by the search algorithm). For example, knowing that "Sascha" is a boy's name in Germany, the author never assumed the English "Sasha" to be a girl's name. Knowing that "Jan" is a boy's name in Germany, I never assumed it to be also a English short form of "Janet". Another case in point is the name "Esra". This is a boy's name in Germany, but a girl's name in Turkey.

任何修改都应该非常小心(而且它们还必须遵守搜索算法所需的排序)。例如,知道“Sascha”是一个男孩在德国的名字,作者从来没有假设英语“Sasha”是女孩的名字。我知道“Jan”在德国是一个男孩的名字,我从来没有想过它也会是“Janet”的英文缩写。另一个例子是“Esra”。这是一个男孩的名字在德国,但一个女孩的名字在土耳其。

The program calculates a probability for the name being male of female.It can do so with the name as input alone or with the name and country of origin,which gives significantly better results.

这个程序会计算一个女性名字为男性的概率。它可以只使用名称作为输入,也可以使用名称和原产国,这样可以得到更好的结果。

You can download it from the website of the German computer magazine c't40 000 Namen.The article is in German but don't worry, all documentation is English.Here is the direct ftp link 0717-182.zip if you are not interested in the article.The zip-File contains the source code, an windows executable, the databaseand the documentation.

你可以从德国电脑杂志c' t40000 Namen的网站上下载。这篇文章是用德语写的,但不用担心,所有的文件都是英文的。这里是直接ftp链接0717-182。如果你对这篇文章不感兴趣,那就闭嘴。zip文件包含源代码、windows可执行文件、数据库和文档。

#3


34  

"I tell ya, life ain't easy for a boy named 'Sue.'"

“我告诉你,对一个叫‘苏’的男孩来说,生活并不容易。”

...So, why make it any harder? If you need to know the sex, just ask... Otherwise, don't worry about it.

…那么,为什么要让它变得更困难呢?如果你想知道性别,只要问……否则,别担心。

#4


26  

I've builded a free API that gives a probabilistic guess on the gender based on a first name. Instead of using any of the above mentioned approaches, i instead use a huge dataset of profiles from social networks to provide a probabilistic guess along with a certainty factor. It also supports optional filtering through country or language id's. It's getting better by the day as more profiles are added to the dataset.

我已经构建了一个免费的API,它根据名字给出性别的概率猜测。我没有使用上面提到的任何方法,而是使用来自社交网络的庞大数据集来提供概率猜测和确定性因素。它还支持通过国家或语言id进行可选过滤。随着越来越多的个人资料被添加到数据集中,情况越来越好。

It's free to use at http://genderize.io

可以在http://genderize.io免费使用

ONE thing you should consider is using a tool that takes demographics into account, as naming conventions will rely heavily on this.

您应该考虑的一件事是使用一个考虑到人口统计信息的工具,因为命名约定将严重依赖于此。

Example

例子

http://api.genderize.io?name=kim{"name":"kim","gender":"female","probability":"0.89","count":1440}http://api.genderize.io?name=kim&country_id=dk{"name":"kim","gender":"male","probability":"0.95","count":44,"country_id":"dk"}

#5


21  

Here are two oddball approaches that may not even work, and likely wouldn't work en masse without violating the terms of a license:

以下是两种奇怪的方法,它们甚至可能根本不管用,而且在不违反许可证条款的情况下可能不会一起工作:

  1. Use the Facebook API (which I know virtually nothing about, it may not even be possible) to perform two searches: one for FB male users with that first name, and one for female. Use the two numbers to decide the probability of gender.

    使用Facebook API(我对它几乎一无所知,甚至可能都不可能)来执行两个搜索:一个是FB男用户的名字,另一个是女用户的名字。用这两个数字来决定性别的概率。

  2. Much looser but more scalable, use the Google API and search for the name plus the gender-specific pronouns, and compare the numbers. For instance, there are 592,000,000 results for searching for "Richard his" (not as a phrase), but only 179,000,000 for "Richard her".

    使用谷歌API,搜索名称和特定性别的代词,并比较数字。例如,搜索“Richard his”(不是短语)的结果为5.92亿,而搜索“Richard her”的结果为1.79亿。

#6


6  

Given your stated constraints, your best option is to re-phrase whatever it is you're writing to be gender-neutral unless you know what gender they want to be called in each instance.

考虑到你所陈述的约束条件,你最好的选择是将你正在写的东西重新措辞成中性的,除非你知道在每个实例中他们想要被叫做什么性别。

If writing in English, remember that singular “they” is grammatically fine as a gender-neutral third-person singular pronoun.

如果用英语写作,记住单数“they”作为中性第三人称单数代词在语法上是可以的。

A good example is the title of this question. As is currently:

这个问题的题目就是一个很好的例子。目前:

    … mapping a person's name to his or her sex?

That would be less awkward if written:

如果写上:

    … mapping a person's name to their sex?

#7


4  

It's also poor practice to assume that users must be male or female. There are a small but significant number of "intersex" people, most of whom are heartily sick of not having a box to tick..
bignose: interesting on the "singular they". I didn't realize it had such a long history.

假定用户必须是男性或女性也是不恰当的。“双性恋”的人虽然不多,但数量可观,他们中的大多数人都非常讨厌没有盒子。bignose:“单数他们”很有趣。我不知道它有这么长的历史。

#8


3  

The only thing you'll get from trying to automate it is a bunch of unhappy users. From that census data:

在自动化过程中,你唯一能做的就是一群不快乐的用户。从普查数据:

JAMES, JOHN, ROBERT, MICHAEL, WILLIAM, DAVID, RICHARD, CHARLES, JOSEPH, THOMAS, CHRISTOPHER, DANIEL, PAUL, MARK, DONALD, GEORGE, KENNETH, STEVEN, EDWARD, BRIAN, RONALD, ANTHONY, KEVIN, JASON, MATTHEW, GARY, TIMOTHY, JOSE, LARRY, JEFFREY, FRANK, SCOTT, ERIC, STEPHEN, ANDREW, RAYMOND, GREGORY, JOSHUA, JERRY, DENNIS, WALTER, PATRICK, PETER, HAROLD, HENRY, CARL, ARTHUR, RYAN, JOE, JUAN, JACK, ALBERT, JUSTIN, TERRY, GERALD, KEITH, SAMUEL, WILLIE, LAWRENCE, ROY, BRANDON, ADAM, FRED, BILLY, LOUIS, JEREMY, AARON, RANDY, EUGENE, CARLOS, RUSSELL, BOBBY, VICTOR, MARTIN, JESSE, SHAWN, CLARENCE, SEAN, CHRIS, JOHNNY, JIMMY, ANTONIO, TONY, LUIS, MIKE, DALE, CURTIS, NORMAN, ALLEN, GLENN, TRAVIS, LEE, MELVIN, KYLE, FRANCIS, JESUS, RAY, JOEL, EDDIE, TROY, ALEXANDER, MARIO, FRANCISCO, MICHEAL, OSCAR, JAY, ALEX, JON, RONNIE, TOMMY, LEON, LEO, WESLEY, DEAN, DAN, LEWIS, COREY, MAURICE, VERNON, ROBERTO, CLYDE, SHANE, SAM, LESTER, CHARLIE, TYLER, GENE, BRETT, ANGEL, LESLIE, CECIL, ANDRE, ELMER, GABRIEL, MITCHELL, ADRIAN, KARL, CORY, CLAUDE, JAMIE, JESSIE, CHRISTIAN, LONNIE, CODY, JULIO, KELLY, JIMMIE, JORDAN, JAIME, CASEY, JOHNNIE, SIDNEY, JULIAN, DARYL, VIRGIL, MARSHALL, PERRY, MARION, TRACY, RENE, FREDDIE, AUSTIN, JACKIE, JOEY, EVAN, DANA, DONNIE, SHANNON, ANGELO, SHAUN, LYNN, CAMERON, BLAKE, KERRY, JEAN, IRA, RUDY, BENNIE, ROBIN, LOREN, NOEL, DEVIN, KIM, GUADALUPE, CARROLL, SAMMY, MARTY, TAYLOR, ELLIS, DALLAS, LAURENCE, DREW, JODY, FRANKIE, PAT, MERLE, TERRELL, DARNELL, TOMMIE, TOBY, VAN, COURTNEY, JAN, CARY, SANTOS, AUBREY, MORGAN, LOUIE, STACY, MICAH, BILLIE, LOGAN, DEMETRIUS, ROBBIE, KENDALL, ROYCE, MICKEY, DEVON, ASHLEY, CAREY, SON, MARLIN, ALI, SAMMIE, MICHEL, RORY, KRIS, AVERY, ALEXIS, GERRY, STACEY, CARMEN, SHELBY, RICKIE, BOBBIE, OLLIE, DENNY, DION, ODELL, MARY, COLBY, HOLLIS, KIRBY, CRUZ, MERRILL, LANE, CLEO, BLAIR, NUMBERS, CLAIR, BERNIE, JOAN, DOMINIQUE, TRISTAN, JAME, GALE, LAVERNE, ALVA, STEVIE, ERIN, AUGUSTINE, YOUNG, JOHNIE, ARIEL, DUSTY, LINDSEY, TRACEY, SCOTTIE, SANDY, 
SYDNEY, GAIL, DORIAN, LAVERN, REFUGIO, IVORY, ANDREA, SANG, DEON, CAROL, YONG, BERRY, TRINIDAD, SHIRLEY, MARIA, CHANG, ROSARIO, DANNIE, FRANCES, THANH, CONNIE, TORY, LUPE, DEE, SUNG, CHI, QUINN, MINH, THEO, LOU, CHUNG, VALENTINE, JAMEY, WHITNEY, SOL, CHONG, PARIS, OTHA, LACY, DONG, ANTONIA, KELLEY, CARROL, SHAYNE, VAL, JUDE, BRITT, HONG, LEIGH, GAYLE, JAE, NICKY, LESLEY, MAN, KASEY, JEWELL, PATRICIA, LAUREN, ELISHA, MICHAL, LINDSAY, and JEWEL

are all names that work for both males and females. If a girl's name is Robert and everyone, including your software, keeps on calling her a man, she'd be rather pissed.

#9


3  

Although databases are probably the most practical solution, if you want to have some fun maybe you could try writing a neural net (or using a neural net library) that takes in the name and outputs one of those 3 options (F,M,A).

You could train it using the datasets that exist in the databases suggested by other answers, as well as with any other data you have.

This solution would allow you to handle names not specifically categorised previously, and also handle different languages. You might want to pass the language (if you know it) as an input to the neural net as well.

I don't know that I can say neural nets (or any other machine learning) would do a good job of categorising though.
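To make the machine-learning idea concrete, here is a minimal sketch. It swaps the neural net for a much simpler naive Bayes classifier over character features (last letters and bigrams), purely because that fits in a few lines of standard-library Python. The class name, the feature set, the `margin` parameter and the tiny training list are all assumptions for illustration; real training data would come from one of the name databases mentioned in other answers.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up training set, only to show the mechanics.
SAMPLE_NAMES = [
    ("marco", "M"), ("paolo", "M"), ("pedro", "M"), ("diego", "M"),
    ("maria", "F"), ("paola", "F"), ("petra", "F"), ("diana", "F"),
]

def features(name):
    """Character features: last letter, last two letters, and all bigrams."""
    n = name.lower()
    padded = f"^{n}$"
    feats = [f"last1={n[-1:]}", f"last2={n[-2:]}"]
    feats += [f"bi={padded[i:i + 2]}" for i in range(len(padded) - 1)]
    return feats

class NameClassifier:
    """Naive Bayes over character features; answers "M", "F" or "A"."""

    def __init__(self, margin=1.0):
        self.margin = margin  # log-score gap below which we answer "A"
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, labelled_names):
        # Expects examples of both "M" and "F" before predict() is called.
        for name, label in labelled_names:
            self.label_counts[label] += 1
            for feat in features(name):
                self.feat_counts[label][feat] += 1
                self.vocab.add(feat)

    def _log_score(self, name, label):
        total = sum(self.feat_counts[label].values())
        prior = self.label_counts[label] / sum(self.label_counts.values())
        score = math.log(prior)
        for feat in features(name):
            # Laplace smoothing so unseen features do not zero out the score.
            count = self.feat_counts[label][feat]
            score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, name):
        m = self._log_score(name, "M")
        f = self._log_score(name, "F")
        if abs(m - f) < self.margin:
            return "A"
        return "M" if m > f else "F"
```

After `clf = NameClassifier(); clf.train(SAMPLE_NAMES)`, a call such as `clf.predict("carla")` leans on the learned "-a" ending, while a name sharing no features with the training data falls back to "A". Whether this generalises across languages is exactly the open question raised above.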

#10


3  

It's culture/region dependent: take Andrea, which for Italians is only masculine, while in Sweden it is a female name and Andreas is for men; Shawn is ambiguous in English. If a language has declension, like Latin or Russian, the final letters will change according to grammatical rules.

Another source of ambiguity is family names that are identical to personal names.

In my opinion it's impossible to solve in general.

#11


3  

The idea will clearly not work in most languages.

However, if you could tell the nationality beforehand you could have more luck. In most Slavic languages (e.g. Russian, Polish, Bulgarian) you could safely assume that surnames ending in -va, -cha, -ska (-a in general) are feminine, while those ending in -v, -ch, -ski are masculine.

In fact, any surname has a feminine and a masculine form, depending on the ending. The same names used in other countries (e.g. the US) might use only the masculine form, though.

The same could be said for first names (-a -ya are feminine) but it is not 100% accurate.

But in general you would hardly get a library that is sufficiently accurate.
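The surname-ending rule described above could be sketched roughly like this. The suffix lists, the language codes and the function name are assumptions made up for the example, not a complete or authoritative rule set, and the function deliberately refuses to guess unless the caller says the name comes from one of the listed languages.

```python
# Hypothetical suffix rules; a real system would need per-language
# exception lists and proper transliteration handling.
SLAVIC_LANGUAGES = {"ru", "pl", "bg"}
FEMININE_SUFFIXES = ("ova", "eva", "ina", "ska", "cka", "aya")
MASCULINE_SUFFIXES = ("ov", "ev", "in", "ski", "cki", "sky")

def guess_from_surname(surname, language):
    """Return "F", "M" or "A" (ambiguous/unknown) from a surname ending."""
    if language not in SLAVIC_LANGUAGES:
        return "A"  # the rule only makes sense when the origin is known
    s = surname.strip().lower()
    # Check feminine endings first: "ova" would otherwise never be reached
    # for names that a looser masculine rule might also match.
    if s.endswith(FEMININE_SUFFIXES):
        return "F"
    if s.endswith(MASCULINE_SUFFIXES):
        return "M"
    return "A"
```

For instance, `guess_from_surname("Kowalska", "pl")` gives "F" and `guess_from_surname("Kowalski", "pl")` gives "M", while the same surname tagged with a non-Slavic language comes back as "A", which matches the caveat about emigrant families keeping only the masculine form.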

#12


3  

The Python package SexMachine will do that for you. Given any first name, it returns whether it is male, female or unisex. It relies on the data from the gender.c program by Jorg Michael.

#13


2  

I haven't used it, but IBM has a Global Name Analytics library (for a price!) that seems pretty comprehensive.

#14


2  

It's not a service, but a little app with a database:
http://www.codeproject.com/KB/cpp/genderizer.aspx

And this tool is in German:
http://www.faq-o-matic.net/2011/06/01/zu-einem-vornamen-das-geschlecht-finden/

And another one in VB:
http://www.vbarchiv.net/tipps/tipp_1925-geschlecht-anhand-des-vornamens-ermitteln.html

I think that, in combination with some "most used first names in 2011" lists, you should be able to build something decent.

#15


2  

The Z Directory (at vettrasoft.com) has a C-language function that works something like this:

#include <iostream>
// z_guess_sex_byfirstname() comes from the Z Directory; include its header as described in its documentation.

void func()
{
    char c = z_guess_sex_byfirstname("Lon");
    switch (c)
    {
    case 'M': std::cout << "It's a boy!\n"; break;
    case 'F': std::cout << "It's a girl!\n"; break;
    case 'B': std::cout << "this name is for both sexes\n"; break;
    case '?': std::cout << "sex unknown sorry\n"; break;
    }
}

It's database driven; the table has something like 10,000+ names, I think, but you need to download and install the Z Directory (which includes many other items such as countries, geographical landmarks, airports, states, area codes, postal/zip codes, etc., along with C++ functions and objects to access the data). However, the names are very English-language oriented. The table is a work in progress and is gradually updated.

#16


1  

Name-gender maps can work, but in multicultural countries it's more like guessing. I can give you one example: Marian in Polish is a typical masculine name, whereas the same name in Great Britain is a female name. In an era of people migrating all over the world, I'm not sure such a database would be very accurate. Good luck!

#17


1  

Some cultures have unisex names, like mine. What do you do then? I think the answer is plain and simple: don't assume, because you could cause offence. Just ask if it's needed; otherwise stay gender-neutral.

#18


1  

Well, not anymore. IBM patented that idea a while ago.

So if you're looking for any level of flexibility (something other than a list of names), you'll either have to (gasp!) ask the user, or simply pay IBM for the rights :)

In any case, such autodetection is annoying for many people who have gender-ambiguous names, or even just mean parents. Let's not make this any harder for them.

#19


1  

It's not free, but this is a nice library that I have used before:

NetGender for .NET allows you to quickly and easily build Name Verification, Parsing and Gender Determination into your custom applications. Accurately verify whether a particular field contains a valid individual or company. NetGender uses a 100,000+, ethnically diverse, Name Dictionary in combination with an 8,000+ Company Name Dictionary to ensure precise gender determination.

http://www.softwarecompany.com/dotnet/netgender.htm

#20


1  

It's interesting that you say you have the birth date. That could help. I've seen databases of histories of name popularity.

In the film Splash (1984), it was funny that Daryl Hannah's character chooses the name "Madison" from a Madison Avenue street sign, because obviously "Madison" is not a girl's name.

24 years later, Madison is the 4th most popular name for girl babies!

Name history from the gov't. (Check out Mary's sad decline through the last 100 years.)
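As a rough sketch of how the birth date could be used: if you have per-year name counts (the US Social Security Administration publishes baby-name files with `name,sex,count` rows, one file per year), you can look up the split for the year the person was born rather than for all years combined. The rows below are made up for illustration and have the year prepended as a first column; the function names are likewise hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up counts, NOT real statistics: year,name,sex,count
SAMPLE_ROWS = """\
1950,Madison,M,20
1950,Mary,F,65000
2008,Madison,F,17000
2008,Madison,M,50
"""

def load_counts(text):
    """Parse year,name,sex,count rows into {(year, name): {"M": n, "F": n}}."""
    counts = defaultdict(lambda: {"M": 0, "F": 0})
    for year, name, sex, count in csv.reader(io.StringIO(text)):
        counts[(int(year), name.lower())][sex] += int(count)
    return counts

def share_female(counts, name, birth_year):
    """Fraction of babies with this name recorded as female that year.

    Returns None when the name does not appear for that year.
    """
    key = (birth_year, name.lower())
    if key not in counts:
        return None
    entry = counts[key]
    total = entry["M"] + entry["F"]
    return entry["F"] / total if total else None

COUNTS = load_counts(SAMPLE_ROWS)
```

With these invented numbers, `share_female(COUNTS, "Madison", 1950)` and `share_female(COUNTS, "Madison", 2008)` give opposite answers for the same name, which is the whole point of keying on the year. In practice you would probably pool a few neighbouring years to smooth out rare names.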

When I wrote to the White House as a child, Richard Nixon (or, perhaps, a secretary) responded to me with some photos of the historic place, addressed to "Miss Rhett Anderson." "Miss Rhett?" It doesn't even make sense! Can we REALLY not tell the difference between Clark Gable's Rhett (with a mustache, in Gone With The Wind!) and Vivien Leigh's Scarlett? I shall never forgive him, despite Neil Young's assurance that "even Richard Nixon has got soul."

#21


1  

I'm pretty sure no such service could exist with an acceptable level of accuracy. Here are the problems which I think are insurmountable:

  • There are plenty of names which are for both men and women.
  • There are a lot of different names in this world, even if you only consider one country.
  • There is the "A Boy Named Sue" issue, raised so eloquently by Johnny Cash :-)

#22


1  

Check out http://genderchecker.com/

#23


1  

You can have a look at my Python gender detection project: https://github.com/muatik/genderizer

It tries to detect authors' genders by looking at their names and/or samples of their text (for example, tweets).

It also supports MongoDB and memcached for performance.

#24


0  

This is not really a programming problem - it comes down to getting a probability table.

AFAIK there are no public databases in distilled forms. You could either build this from census data, or buy the data from someone.

For example, this is someone who sells the probability table for Canada.
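To illustrate what "getting a probability table" amounts to, here is a small sketch that turns raw `(name, sex, count)` rows into a per-name probability and then into the M/F/A answer the question asks for. The function names, the 0.9 threshold and the "?" code for unknown names are assumptions chosen for the example; the right cut-off depends on how costly a wrong guess is for you.

```python
def build_table(rows):
    """rows: iterable of (name, sex, count). Returns {name: P(female)}."""
    totals = {}
    for name, sex, count in rows:
        key = name.lower()
        male, female = totals.get(key, (0, 0))
        if sex == "F":
            female += count
        else:
            male += count
        totals[key] = (male, female)
    return {n: f / (m + f) for n, (m, f) in totals.items() if m + f > 0}

def classify(table, name, threshold=0.9):
    """Return "F", "M", "A" (ambiguous) or "?" (name not in the table)."""
    p_female = table.get(name.lower())
    if p_female is None:
        return "?"
    if p_female >= threshold:
        return "F"
    if p_female <= 1 - threshold:
        return "M"
    return "A"
```

Fed with counts such as 990 male and 10 female Johns, `classify` answers "M"; a name split 600/400 lands in "A". The table itself is just a dictionary, so it could equally be loaded from census data or from a purchased list.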

#25


0  

IMHO, it is generally a bad idea to determine sex from an individual's name. A lot of names are intersexual (good grief, is this even a word ?? :-), and they may also be one sex in one culture and another in another.

A few stupid examples, just a few that came to mind (from my part of the world, CE)

Vanja - female, in eastern countries from here, mostly male
Alex - intersex (short for Sandra, female, and Sandro, male)
Robin - in western cultures, can be both

In some parts of the world, a person's sex can be determined by looking at how the name ends. For example, Marija, Sandra, Ivana, Petra, Sara, Lucija, Ana: you can see that most of these female names end in "ja" or "ra". There are other examples as well.

Still, I think it's better just to ask the user for sex.

#26


0  

Got this from a Hacker News discussion about this

#27


0  

I know of no such service. You can perhaps find the data you are looking for, however. The US government publishes data about the prevalence of names and the gender of the person they're attached to. The Social Security Administration has such a page, and the census may as well, but I haven't taken the time to look. Perhaps other world governments do similar things.

#28


0  

I know of no such service, however ..

  • you could start with a raw list of person names, or
  • guess gender according to some rules (e.g. -o => male, -ela, -a => female)

In some countries (e.g. Germany) the name a person can be given is limited by law. Maybe there are some publications concerning that matter which could be harvested (but I don't know of any at the moment).

#29


0  

What I would do is make a hack which takes the name and searches for it against the Facebook API, then looks at the resulting users and counts how many of them are female or male. You can then return a percentage. Not so insurmountable anymore. :)

#30


-2  

Just ask people, and if they are nice they will give you their 'M's or 'F's, and if they are not then give 'em an 'A'.