The 25 April 2003 issue of Science had this interesting article about software that learns to detect the gender of authors. I wonder what it would have made of the works of George Sand?
That headline contains the tip-off: This was written by a woman. The clues? It’s in the present tense, contains pronouns, and addresses the audience directly, says computer scientist Shlomo Argamon. Argamon and his co-workers at Bar-Ilan University in Ramat Gan, Israel, have put together a computer program, called Winnow, that they claim can figure out an author’s sex by his or her writing style.
Winnow has taught itself, through extensive reading, to recognize linguistic patterns more commonly used by one or the other sex and has formed rules based on patterns of word usage and sentence structure. Women use words such as “for,” “with,” and “and” more often than men, signifying their more communal tendencies, says Argamon. Men are more quantitative and use more “determiners,” such as “an,” “a,” and “no.” The program’s overall success rate, published last year in Literary and Linguistic Computing, was 80% in identifying the sex of authors of British works including fiction and writing in the arts, sciences, and social sciences. In a total of 264 fictional works, the authors of six were misidentified. A.S. Byatt was the only woman who wrote like a man; five male authors, including Michael Frayn, sound like women, according to Winnow.
Even on 30 science texts, with their formal technical style, it scored 74%. “If I had asked you before you saw this, my bet would be that you would have thought there’d be no [gender] difference in nonfiction,” says Dan Roth, a computer scientist at the University of Illinois, Urbana-Champaign. The researchers will lay out further results in the August issue of Text.
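For anyone curious how a program can "teach itself" rules from word usage, here is a minimal sketch of the textbook Winnow update rule (Littlestone's Winnow2 variant), which shares its name with the program in the article. The article does not say what features or settings the researchers used, so everything specific here is an assumption for illustration: the `Winnow` class, the `active_features` and `train` helpers, the simple "word is present" features, the threshold of half the feature count, and the four invented sentences with their arbitrary 1/0 class labels. Only the six feature words come from the article.

```python
# Minimal sketch of the textbook Winnow2 update rule.
# Assumptions: binary "word is present" features, threshold = n/2, alpha = 2,
# and invented toy sentences; none of this is the researchers' actual setup.

FEATURES = ["for", "with", "and", "a", "an", "no"]  # words named in the article


def active_features(text):
    """Return the indices of FEATURES that occur as words in the text."""
    words = set(text.lower().split())
    return [i for i, w in enumerate(FEATURES) if w in words]


class Winnow:
    def __init__(self, n_features, alpha=2.0):
        self.weights = [1.0] * n_features   # every feature starts at weight 1
        self.alpha = alpha                  # multiplicative update factor
        self.threshold = n_features / 2.0   # assumed decision threshold

    def predict(self, active):
        """Predict 1 if the summed weight of active features reaches the threshold."""
        total = sum(self.weights[i] for i in active)
        return 1 if total >= self.threshold else 0

    def update(self, active, label):
        """Adjust weights only on a mistake; return True if a mistake occurred."""
        if self.predict(active) == label:
            return False
        # Missed a 1: promote active features. False 1: demote them.
        factor = self.alpha if label == 1 else 1.0 / self.alpha
        for i in active:
            self.weights[i] *= factor
        return True


def train(model, examples, max_epochs=20):
    """Repeat passes over the data until one full pass makes no mistakes."""
    for _ in range(max_epochs):
        mistakes = sum(model.update(active_features(t), y) for t, y in examples)
        if mistakes == 0:
            break
    return model


# Invented toy data: labels 1 and 0 stand for two arbitrary author classes.
TOY_DATA = [
    ("tea for two with milk and sugar", 1),
    ("a car has no engine", 0),
    ("walk with me and talk", 1),
    ("an apple is a fruit", 0),
]

model = train(Winnow(len(FEATURES)), TOY_DATA)
print(dict(zip(FEATURES, model.weights)))
print(model.predict(active_features("bread with butter and jam")))
```

The appeal of this kind of learner is that weights change only when the program guesses wrong, and they change multiplicatively, so words that do not help separate the classes tend to stay near their starting weight. A real system would need far richer features than word presence, since nearly every long text contains all six of these words.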