
Center for Digital Humanities


Digital Humanities Projects


PARAGON is a software system capable of intelligent collation and difference detection among materials from multiple repositories, digitized according to varying standards with a range of methods and equipment.

We created a prototype of this system to address methodological problems specific to editing printed materials from early modern Europe. In developing the prototype, we propose to explore ways of extending its application to print materials from later periods and to assess its potential value to other projects, including the English Broadside Ballad Archive at UCSB.

Collation of early modern printed materials is crucial to the disciplines that study them because manuscript copy seldom survives. Print typically offers the only access to what an author wrote—and yet extant copies differ in many places, while in others they are manifestly erroneous. Early editors combined erudition with sometimes-inspired guesswork in emending texts, but they did so haphazardly; systematic collation began at the turn of the twentieth century with the rise of the "New Bibliography." Modern editorial theory, based on a rigorous empirical method informed by careful study of the practices and conditions in print shops, developed out of the New Bibliography when D. F. McKenzie demonstrated that seemingly innocuous commonsense assumptions woven through the deliberations of earlier bibliographers had in fact given rise to fantastical creatures—"Printers of the Mind," as the title of one seminal essay put it.

Textual collation provides one kind of evidence indispensable to modern bibliographic method: the systematic comparison of multiple copies from a single print run in order to specify differences among them. This evidence is indispensable because proofreading in early modern print shops was done on the fly, with corrections made by stopping the presses to remove and unlock the chase containing blocks of type, a practice that occasionally introduced new errors in the process of correcting old ones. Since paper was too expensive to waste, sheets representing different states of the text were gathered and bound together at random. For any given publication surviving from this period, it is therefore possible that no two copies from the same print run will be identical. Collation isolates differences so that the bibliographer can use them to build a detailed genealogy of the text's production. It is on the basis of such a genealogy that editorial decisions are made.

Collation is indispensable, but time-consuming and expensive. Textual scholars have contrived ingenious methods and machines to assist in this labor, but it still requires travel to distant repositories for the painstaking visual examination of multiple original copies. The result is that editorial best practices recognized for decades have seldom been implemented on a large scale for any but the most canonical texts held in major collections. As scanning costs come down and rare printed materials are preserved in digital form, we have the opportunity to change the way textual editing is done. All we need is the software.

Efforts to computerize textual collation have so far depended on first transcribing the texts to be compared or running them through Optical Character Recognition software—either way introducing a layer of labor, expense, and error. PARAGON circumvents this step by automating the first stages of collation directly from scanned images of the original text. It also affords new functionalities for coping with variations in the quality and rendering of digital materials captured in different ways at different times, enabling comparison across dozens of archives and collections holding rare originals. Its technical solutions to the challenges of collation are, moreover, extensible to other projects, as we will demonstrate in two ways: by partnering with other investigators to test the system against their project needs, and by hosting a workshop in which computational scientists active in digital humanities can assess the system and explore its potential uses.
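To make the idea of image-based difference detection concrete, the following is a minimal sketch and not PARAGON's actual method, which is not specified here. It assumes two grayscale page images of identical size that have already been aligned, represented as nested lists of pixel values. The function names `normalize` and `diff_cells` and the `threshold` parameter are hypothetical. Each image is rescaled to zero mean and unit variance, so that scans made at different exposures can be compared, and the positions where the two copies still disagree are reported.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# Hypothetical helper names; an illustration only, not PARAGON's implementation.

def normalize(img: List[List[float]]) -> List[List[float]]:
    """Rescale pixels to zero mean and unit variance to offset exposure differences."""
    pixels = [p for row in img for p in row]
    mu = mean(pixels)
    sigma = pstdev(pixels) or 1.0  # avoid dividing by zero on a blank page
    return [[(p - mu) / sigma for p in row] for row in img]

def diff_cells(a: List[List[float]], b: List[List[float]],
               threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Return (row, col) positions where two aligned, same-size images disagree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    na, nb = normalize(a), normalize(b)
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(na, nb))
        for c, (pa, pb) in enumerate(zip(row_a, row_b))
        if abs(pa - pb) > threshold
    ]

# Copy A: dark ink (0) on paper (200).
copy_a = [
    [200, 200, 200, 200],
    [200,   0,   0, 200],
    [200,   0, 200, 200],
    [200, 200, 200, 200],
]
# Copy B: a brighter scan of a variant state with one extra inked pixel at (2, 2).
copy_b = [[p + 50 for p in row] for row in copy_a]
copy_b[2][2] = 50

variants = diff_cells(copy_a, copy_b)
```

In this toy case the uniform brightness offset between the two scans is absorbed by the normalization, and only the genuinely variant position is flagged. Real page images would also need registration (alignment) and tolerance for skew, bleed-through, and resolution differences, none of which this sketch attempts.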

Related Publications

Miller, D. L. and Salvi, D. “PARAGON: Intelligent Collation and Difference Detection.” Presentation at the Renaissance Society of America, New York City, NY, March 27-29, 2014.