XQuerying the medieval Dubrovnik

Neven Jovanović

neven.jovanovic@ffzg.hr

Faculty of Humanities and Social Sciences, University of Zagreb

Hrvatska / Croatia

Introduction

Since we are in Rome, a city so huge in so many ways, I cannot resist quoting Jean Bodin, who in 1576 called the Republic of Dubrovnik ("Ragusia" in the Latin version of his De republica) "the smallest among all European states" and compared it to an ant, while the Ottoman, Mongol, and Spanish Empires were the elephants. In many ways, then, Dubrovnik would seem the opposite of Rome; and yet the two cities are also alike in many ways. Today history matters greatly to both; both are crowded with tourists (as I saw at the Colosseum the day before yesterday, around 11 a.m.); and in both cities the past is neither immediately accessible nor easily comprehensible.

The project that I am here to present aims to make the past of medieval Dubrovnik, if not more comprehensible (that is the task of historians), then at least more accessible to historians. We make a large body of facts (or records, or data) accessible by encoding the archival series of decisions and deliberations made by the three administrative councils of Dubrovnik. The encoding is in XML, and it is designed with a view to querying the XML with the XQuery language. My intention today is to explain why we chose this approach (why XML, and why XQuery), to show how we did it, and to say what we plan to do with it. So I will first talk about the problems, then present the goals, describe the solutions we have found, and finally confess what we have not done yet.

But first, please note that most of the material I will be presenting today is available online: at the web address shown on the screen behind me, on the page describing the encoding, and in our Bitbucket repository. So if you can spare a moment to go there and test what we have done (and, if possible, break it), please do. I would be grateful if you got in touch with me afterwards to tell us what you found.

The material

As Fernand Braudel noted in 1949, in a remark I quoted in the abstract of this paper, the Acta consiliorum, the records of decisions made by the three administrative councils of Dubrovnik, are huge: they consist of hundreds of handwritten volumes, predominantly in Latin, spanning the five centuries from 1301 to 1808. The Acta, also called the Reformationes, have never been published in their entirety. It took the efforts of two learned societies, the Yugoslav (later Croatian) Academy of Sciences and Arts and the Serbian Academy of Sciences and Arts, more than 130 years (from 1879 to 2011) to publish, in the series Monumenta historica Ragusina (MHR), the first thirty volumes of the Reformationes, covering the years from 1301 to 1395. The project is obviously progressing very, very slowly. Moreover, over those 130 years some editors chose a different approach to editing the Reformationes: the volumes from the first surviving century record the decisions of all three councils together, year by year, but in an edition from the 1950s they were reorganized by council, abandoning the actual order of the manuscript. Finally, the printed book form forced the editors to introduce abbreviations and to omit certain material (such as counter-proposals that were rejected in the vote) as well as certain information (such as changes of scribal hand, or the actual image of the manuscript page).

Solution

We (that is, the Department of Classical Philology of the University of Zagreb, together with the Dubrovnik Institute of Historical Sciences of the Croatian Academy of Sciences and Arts) propose to publish both the existing and the forthcoming volumes of the Monumenta historica Ragusina not only in book form, but also in TEI XML, as "MHR in XML". To this end, we have undertaken a pilot project of encoding Volume 6 of the MHR. The volume contains the council records from the years 1390–1392; it was edited in 2005 by Nella Lonza and Zdravko Šundrica.

Four goals of our XML edition

With the MHR in XML pilot project, we intend:
1. to demonstrate to the wider public that new knowledge can be generated from the combination of markup and original text;
2. to provide groundwork for a systematic digital publication of the MHR series, and to make it easier for anyone interested to produce a similar XML edition or to join our undertaking;
3. to demonstrate the powers of TEI XML, which are not only archival but also analytic (at the same time, of course, we ourselves have to learn how this can be achieved);
4. finally, to open the MHR texts to both linguistic and historiographic exploration (and to encourage reuse of the texts and their integration into larger collections).

To achieve these goals, we have published the pilot volume of the MHR in XML as a Bitbucket repository, and we are in the process of documenting the TEI markup solutions we have adopted. Apart from the textual structure (sections, headings, issues discussed and voted on), we are also marking up names, dates, and measures (especially prices). At the moment, the documentation describes the elements, attributes, and values we use in free prose.
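Our own queries are written in XQuery; purely to illustrate what this kind of markup makes possible, here is a small Python sketch using the standard xml.etree.ElementTree module to pull dates, names, and prices out of a TEI fragment. The fragment, including the name "ser Marino" and the price, is invented for the example, and the encoding choices (date with @when, persName, measure with @quantity and @unit, a div of type "decision") are assumptions based on common TEI practice, not a copy of the MHR in XML documentation.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace. The element and attribute choices below are illustrative
# assumptions, not the documented MHR in XML encoding.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# An invented one-decision fragment (not taken from MHR vol. 6).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="decision">
  <date when="1390-05-12">die XII maii</date>
  <p>Prima pars est de dando <persName>ser Marino</persName>
  <measure quantity="20" unit="yperperi">yperperos XX</measure> pro salario.</p>
</div>
</body></text></TEI>"""


def extract_entities(xml_text):
    """Collect normalized dates, personal names, and measures from TEI markup."""
    root = ET.fromstring(xml_text)
    # Dates: read the machine-readable @when value, not the Latin wording.
    dates = [d.get("when") for d in root.iterfind(".//tei:date", TEI)]
    # Names: join all text inside persName (it may contain nested elements).
    names = ["".join(p.itertext()).strip()
             for p in root.iterfind(".//tei:persName", TEI)]
    # Measures: the normalized quantity and unit stored in attributes.
    measures = [(m.get("quantity"), m.get("unit"))
                for m in root.iterfind(".//tei:measure", TEI)]
    return dates, names, measures


if __name__ == "__main__":
    print(extract_entities(SAMPLE))
```

Because dates and prices are read from attributes rather than from the Latin text, the same extraction works whether the scribe wrote a number in words or in Roman numerals; that normalization is what the markup adds to the original text.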

Challenges

Dubrovnik was a small city with a relatively closed but well-documented society. This combination of small size and excellent documentation has already attracted researchers. In 1960, Irmgard Mahnken analyzed the marriage relations of the Dubrovnik nobility; in 1999, Mahnken's results were compiled into a genealogy file in GEDCOM format, which has since also been converted to XML and is currently available on the internet in both formats. However, these records deal only with births, marriages, deaths, and the relationships of parents to children across generations; they contain no political or governmental history.

In 2000, David Rheubottom used archival records to examine the relationship between kinship, marriage, and political change in Dubrovnik's elite over the period from 1440 to 1490. But Rheubottom relied on a "classical" relational database, so he had to extract his records from the original text, abstracting data from words; as far as I know, Rheubottom's database remains unpublished, and only his interpretations were published, in monograph form.

So here lies the innovation of our approach. It is, of course, not innovative in TEI terms (ours is a very middle-of-the-road project, and intentionally so), but it aims to be somewhat novel from the historian's perspective in three respects. First, we want to enable interpretation not only of the recorded facts but also of their linguistic expression, which allows us to study, for example, the formulaic language of medieval city administration. Second, we want to enable exploration of different sets of historiographic problems. Finally, we want to publish (and, in my opinion, to open access to) both the XML-encoded files and their documentation, as well as the XQueries we found useful or interesting. This last point, publishing the XQueries, makes our research at the same time didactically accessible, repeatable, and reproducible.
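To make the first point more concrete: once each decision is marked up as a unit, the wording of the decisions can be counted alongside their content. The Python sketch below is not one of our published XQueries; it assumes, hypothetically, that each decision is encoded as a div of type "decision", and it runs on an invented three-decision sample. It tallies the opening words of each decision as a crude signal of formulaic phrasing.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented decisions (not taken from MHR vol. 6); div/@type="decision" is an
# assumed convention for marking individual council decisions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="decision"><p>Prima pars est de dando ser Marino yperperos XX.</p></div>
<div type="decision"><p>Prima pars est de faciendo murum novum.</p></div>
<div type="decision"><p>Captum fuit de mittendo nuncium.</p></div>
</body></text></TEI>"""


def opening_formulas(xml_text, n_words=4):
    """Count the first n_words of each decision, lower-cased, as a rough formula signature."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for div in root.iterfind(".//tei:div[@type='decision']", TEI):
        first_p = div.find("tei:p", TEI)
        if first_p is None:
            continue  # skip decisions that have no paragraph text
        words = "".join(first_p.itertext()).split()
        if words:
            counts[" ".join(words[:n_words]).lower()] += 1
    return counts


if __name__ == "__main__":
    for formula, n in opening_formulas(SAMPLE).most_common():
        print(n, formula)
```

Counting fixed-length openings is deliberately naive; a real study of formulaic language would need to handle abbreviations, spelling variation, and formulas that do not start a decision.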

In the remaining time I want to demonstrate this reproducibility of XQueries, and to show how selected queries can be used not only to explore the XML file or to test our results, but also to help researchers build their own queries.

[Demonstration]

XQuery – how hard can it be?

XQuery, first published as a W3C Recommendation in 2007, is a powerful and expressive programming language. At the same time, XQuery is not something that computer users normally see. More often they are served "canned" queries, or query templates, and the XQuery layer remains hidden. Mastering XQuery can indeed seem a daunting task, especially for humanist scholars, and all the more so for those who know nothing about XML and TEI. But we should not underestimate the power of motivation, and the power of examples. Let us not forget that historians who plan to explore the records of medieval Dubrovnik in their existing form must not only somehow get to Dubrovnik and its unpublished records, but also master medieval Latin and medieval palaeography to use them. On the other hand, a resource such as The Programming Historian, a collaborative textbook, shows how far into computing some historians are prepared to go in order to pose interesting questions to their material.

The ideal user of the MHR in XML is an algorithmically literate medievalist, one who understands XML and XQuery and uses these tools to formulate interesting problems. Perhaps the MHR in XML can, besides helping us share the heritage of Dubrovnik, help produce, that is, educate, such digital humanists.