Stylometry Methods and Practices

A review of various stylometry methods and programs for the digital humanities.

Signature

Signature is a stylometric program designed in 2003 by Peter Millican of Oxford University, previously of Leeds University. It is free to download and has a relatively simple user interface. Corpora can be loaded from plain-text or HTML files.
Advantages of Signature
  • Generates frequency data based on word lengths, sentence lengths, paragraph lengths, letters, and punctuation.
  • Easy-to-use graphical interface – good for those unfamiliar with coding or statistics
  • Produces graphic visualizations quickly
  • Data and images can be exported
  • Offers customization such as splitting texts into pairs (which helps with textual frequency analysis)
  • Can generate word frequency lists
  • Performs basic statistical probability analysis, such as Chi-squared
  • Has an instructional user PowerPoint
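
To make the kind of analysis listed above more concrete, here is a minimal Python sketch of a word-length frequency count followed by a Chi-squared comparison of two texts. It is not Signature's own code: the function names, the tokenizing rule, and the choice to pool long words into a final bin are all assumptions made for illustration.

```python
from collections import Counter
import re


def word_length_counts(text, max_len=10):
    """Count words by length; words longer than max_len share the last bin.

    Assumption: a "word" is a run of ASCII letters, optionally with one
    internal apostrophe, and only the letters count toward its length.
    """
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)?", text)
    lengths = (sum(ch.isalpha() for ch in w) for w in words)
    counts = Counter(min(n, max_len) for n in lengths)
    return [counts.get(n, 0) for n in range(1, max_len + 1)]


def chi_squared(counts_a, counts_b):
    """Pearson Chi-squared statistic for a 2 x k table of counts.

    Bins that are empty in both texts are skipped.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("both texts must contain at least one word")
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue
        expected_a = total_a * column / grand
        expected_b = total_b * column / grand
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat
```

A larger statistic means the two word-length profiles differ more. Turning it into a probability requires the Chi-squared distribution with (number of non-empty bins − 1) degrees of freedom, which this sketch leaves out.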

Disadvantages of Signature

  • Only offers five analytical options and cannot combine them
  • Does not include any additional help files and does not have an active research community
  • Has limited options for visualizations
  • There is no sign of a Signature 2.0 release, so the platform is unlikely to be maintained long term

Suggestion

  • Signature may be a very useful classroom tool, due to its ease of use and graphical capabilities.

JGAAP

JGAAP (Java Graphical Authorship Attribution Program) was designed by Patrick Juola of Duquesne University. It is a free Java-based program for textual analysis, text categorization, and authorship attribution.
Advantages of JGAAP
  • Provides extensive analytical customization through Canonicizers (which normalize texts), Culling (which controls what is removed from the data), Analytical Events (features such as n-grams and word length), and Analysis methods (Burrows's Delta, Chi-squared, etc.).
  • Can process multiple texts and perform many different kinds of analytics at once.
  • Runs on Java; no additional software or coding knowledge is required.
  • GUI provides prompts and guidance related to different statistical options such as culling, analysis methods, and frequency options.
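
The four stages named above (canonicize, extract events, cull, analyze) can be sketched in a few lines of Python. This is an illustrative outline only, not JGAAP's implementation or API: every function name here is hypothetical, the canonicizer handles ASCII letters only, and the analysis step is a simplified form of Burrows's Delta (mean absolute difference of z-scored word frequencies).

```python
from collections import Counter
import re
import statistics


def canonicize(text):
    """Normalize a text: lowercase it and replace non-letters with spaces."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def word_events(text):
    """Split a canonicized text into word events."""
    return text.split()


def cull_most_frequent(docs_events, n):
    """Keep only the n most frequent events across all known documents."""
    totals = Counter()
    for events in docs_events.values():
        totals.update(events)
    return [word for word, _ in totals.most_common(n)]


def relative_freqs(events, vocab):
    """Relative frequency of each vocabulary item in one document."""
    if not events:
        raise ValueError("document contains no events")
    counts = Counter(events)
    return [counts[word] / len(events) for word in vocab]


def burrows_delta(known, unknown_text, n_features=50):
    """Return {label: delta} comparing unknown_text with each known text.

    A lower delta means the word-frequency profiles are more similar.
    """
    if len(known) < 2:
        raise ValueError("need at least two known texts")
    docs = {label: word_events(canonicize(t)) for label, t in known.items()}
    vocab = cull_most_frequent(docs, n_features)
    freqs = {label: relative_freqs(ev, vocab) for label, ev in docs.items()}

    # Mean and standard deviation of each feature across the known texts.
    columns = list(zip(*freqs.values()))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    # Features that never vary carry no information and cannot be z-scored.
    keep = [i for i, s in enumerate(stdevs) if s > 0]
    if not keep:
        raise ValueError("no feature varies across the known texts")

    def z_scores(vec):
        return [(vec[i] - means[i]) / stdevs[i] for i in keep]

    target = z_scores(relative_freqs(word_events(canonicize(unknown_text)), vocab))
    return {
        label: statistics.mean([abs(a - b) for a, b in zip(z_scores(vec), target)])
        for label, vec in freqs.items()
    }
```

For example, `burrows_delta({"A": text_a, "B": text_b}, disputed_text)` returns one score per candidate, and the candidate with the lowest score is the closest stylistic match. Real studies use far longer texts and more candidates than a toy call like this suggests.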

Disadvantages of JGAAP

  • Does not generate visualizations of data; only generates raw statistical scores.
  • Though the User Guide is very comprehensive, it has not been updated since version 5.1 (2013)
Fun Fact: Juola and JGAAP are best known for helping to identify J.K. Rowling as the author of The Cuckoo's Calling, which was published under the pseudonym Robert Galbraith.

R-stylo

R and RStudio are free, open-source software for statistical computing and graphics. R is a widely used platform for textual analysis, with built-in visualization tools. Stylo is a package that can be installed from CRAN, R's package repository. R-stylo reads plain text, XML, or HTML files and currently supports nine languages.

Advantages of R-stylo

  • Provides the most comprehensive and most customizable analytical options for stylometry.
  • Employs a GUI for ease of use, but can also accommodate more advanced functions.
  • Can generate detailed graphs and visuals that are easily downloadable in different formats.

Disadvantages of R-stylo

  • Requires some coding knowledge, so there is a learning curve.

R-stylo Resources

Both R and the R-stylo package have very active coding communities. The resources below offer example projects and open-source code.

GitHub

  • Provides open source code.

Computational Stylistics Group

  • Founded by Maciej Eder, Jan Rybicki, and Mike Kestemont, this online group contains numerous R-stylo resources and projects by some of the most active R coders and stylometric researchers.

Text Analysis in R for Students of Literature

  • This textbook is a short, targeted guide with clear step-by-step lessons for those wishing to learn to use R for literary textual analysis. The Digital Scholarship Center at Paley Library occasionally offers workshops based on this book, so check its schedule.