Identifying intertextual relationships in large-scale digital text collections

As the Mellon Postdoctoral Fellow in Digital Humanities, I will spend the next two years at Oxford building upon my previous research on the digital Encyclopédie and, more generally, on new computational techniques for exploring and understanding the French Enlightenment and the transnational “Republic of Letters.” Drawing on Oxford's well-established expertise in this domain – specifically the work carried out by Electronic Enlightenment (EE) and the Voltaire Foundation – I will apply various machine learning and data mining approaches to these collections in tandem with those established by ARTFL (American and French Research on the Treasury of the French Language, University of Chicago) in order to explore the intercultural and intertextual exchange of knowledge in the 18th century. In particular, I will examine the Electronic Enlightenment data housed at Oxford using the sequence alignment approach outlined in the recently published paper “Something Borrowed: Sequence Alignment and the Identification of Similar Passages in Large Text Collections” (Digital Studies / Le Champ numérique, 2011). Using this technique, passages that EE shares with the ARTFL Encyclopédie and other French data sets can be systematically identified and examined, providing researchers with an invaluable tool for tracing influence and exchange between the French philosophes and their European counterparts. This research project will represent a significant step forward in understanding the international scope and breadth of the circulation and translation of ideas in the age of Enlightenment. Moreover, by moving out from the Encyclopédie to larger data sets and collections, this project will also represent, in many ways, the culmination of several years of collaborative investigation into machine learning and text mining techniques for identifying intertextual relationships in large-scale digital text collections.
Moving forward, these methods will no doubt prove increasingly fruitful for humanities and social science scholarship as mass digitization efforts continue to change the scholarly landscape of the 21st century.

Glenn Roe
