Fri, Aug 22, 2003
I discovered The Gender Genie via LaughingMeme, which led me to Moshe Koppel and Shlomo Argamon's algorithm, described in Nature and the New York Times Magazine. The Koppel-Argamon algorithm analyzes a text and guesses the author's gender.
The algorithm was very simple, so I implemented it as a Perl module -- Lingua::EN::Gender. I just registered for a PAUSE ID and will upload Lingua::EN::Gender to CPAN as soon as my registration is confirmed.
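For readers curious what "very simple" might look like in practice, here is a minimal sketch, assuming the approach boils down to weighted keyword counting: each marker word adds its weight to a feminine or masculine score, and the larger score wins. This is not the code of Lingua::EN::Gender, and it is in Python rather than Perl. The function name guess_gender and both word lists are hypothetical, and the weights are made up for illustration; they are not the published Koppel-Argamon values.

```python
import re
from collections import Counter

# Hypothetical marker words and weights, invented for illustration only.
# These are NOT the published Koppel-Argamon values.
FEMININE_WEIGHTS = {"with": 2, "she": 3, "her": 3}
MASCULINE_WEIGHTS = {"the": 1, "a": 1, "it": 2}


def guess_gender(text, feminine=FEMININE_WEIGHTS, masculine=MASCULINE_WEIGHTS):
    """Return (guess, feminine_score, masculine_score) for a text."""
    # Lowercase the text and count each word-like token.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    # Each score is the sum of weight * occurrences over its marker words.
    f_score = sum(weight * counts[word] for word, weight in feminine.items())
    m_score = sum(weight * counts[word] for word, weight in masculine.items())
    if f_score == m_score:
        return "unknown", f_score, m_score
    return ("female" if f_score > m_score else "male"), f_score, m_score


if __name__ == "__main__":
    print(guess_gender("She went to the market with her sister."))
```

With real weights, a scorer like this would be run over a whole chapter or blog entry rather than a single sentence, since short texts contain too few marker words to say much.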
I went to Project Gutenberg for some test data and chose the first chapters of Charles Dickens's A Tale of Two Cities and Herman Melville's Moby Dick, and the second chapter of Charlotte Brontë's Jane Eyre. The module correctly guessed the genders of these authors.
Then, I decided to have a little fun. I tried the module on the preface of George W. Sands's Mazelli, and Other Poems. Sands, of course, was the pen name of Aurore Dupin, a woman. Sure enough, the module correctly identified the author as a woman.
I wanted to try the reverse test (a man writing as a woman), so I searched for Mark Twain's short story "Eve's Diary." Sadly, Project Gutenberg did not have it. However, it did have Twain's "Extracts from Adam's Diary," so for kicks, I tried that. The module incorrectly guessed that this was authored by a woman!
Now that my euphoria was officially dead, I decided to test my old blog entries. The module claimed that 23 out of my 34 entries were written by females! Scientific proof that I'm a heckuva sensitive fella.
Lest you think I have too much time on my hands, here's the work rationale for playing with this algorithm. Last January, I met Freada Kapor Klein, who is interested in diversity in the workplace and who heads up the Level Playing Field Institute. She encouraged me to consider diversity in online communities as an area of research. One difficulty is that the only way to gather demographic data such as race or gender is via surveys. A tool that could accurately determine gender based on the author's prose would be very handy.
/tech/perl | Posted at 3:56pm