Islandora at IDCC 2015

I rounded off my time at the 2015 International Digital Curation Conference this week with a workshop on Islandora. The Digital Humanities site, through which items from the University’s Special Collections are made available online, uses Islandora 6, and we have plans to upgrade it to Islandora 7, the current version.

The open-source Islandora framework is based principally on three widely used and well-established open-source applications: Drupal, Fedora Commons and Solr.

The Islandora extensions to Drupal make it much easier to interact with the rock-solid, secure but somewhat user-unfriendly Fedora software and make use of Solr to ease the discovery of information. A range of “solution packs” allow users to work effectively with a range of data types, both when ingesting them into the repository (e.g. capture of metadata, creation of derivatives) and when viewing them. For example, when a TIFF image is imported via the Image Solution Pack, copies in other formats, such as JPEG, are stored alongside the original TIFF. This is because TIFFs are excellent for archival purposes but not as well suited for display on the web as JPEGs. The Image Solution Pack also allows users to do things like annotate the image and interact with it in a zoomable image viewer.

The Islandora framework is managed by the Islandora Foundation, but much of the work, particularly on the core components, is carried out by Discovery Garden, who provide commercial services around Islandora. Alan Stanley of Discovery Garden was one of the original developers of Islandora and he led the workshop after giving an introductory demonstration on the first day of the conference. Alan will, I believe, be making his slides available through the conference web pages but they are not there for me to link to at the time of writing.

Because I inherited an Islandora instance that was already in place when I took up my position here, and had not worked with Islandora before, I found the overview and the explanation of the architecture very useful. My work on Islandora has been focused on certain key areas, principally ensuring that content will display correctly following an upgrade, so coverage of other features, such as the Form Builder and the way in which roles can be used within the system, was particularly useful.

Unfortunately, upgrading from 6 to 7 has thus far proved more difficult than I anticipated. I have previously forced content to work in the new version using techniques that I wasn’t entirely happy with (e.g. directly editing XML in the repository) and sought some further guidance here. (The documentation can sometimes be less than helpful – see the rather unhelpful comment at the end of the Overview of the Book Solution Pack. Some detail on the migration script would be useful.) While I now have some understanding of the (admittedly good) reasons why the upgrade is tricky, there was not adequate time to get into any detail on it. Alan has agreed to help me out if I get in touch with him, however, and I will be taking him up on that offer.

We have some other projects in the pipeline that would seem to be a good fit for Islandora, so we are considering moving to a multi-site setup, with several Drupal instances drawing on a single Fedora Commons instance, as described by a poster from the University of Toronto Libraries at the conference. The workshop included some useful tips on how to go about doing that and it continues to look like a good way forward.

Discussion on standardising and sharing data among Arabic projects

I attended the 2013 Deutscher Orientalistentag in Münster to co-present a talk on the Arab Cultural Semantics in Transition (ARSEM) project, along with the principal investigator, Kirill Dmitriev. My half of the presentation, available here, focused on the technical aspects of the project.

Our presentation clashed with the start of the Digital Humanities (New methods of text analysis in Arabic and Islamic studies) panel, but I was still able to make it to most of the talks and the discussion afterwards.

The presenters in the Digital Humanities panel were:

The discussion, led by Andreas Kaplony, focused on the need for the diverse range of projects represented in the room to agree on standards. He told the story of how Hellenistic scholars had already done this, by sitting together for an afternoon.

Examples of what can and should be agreed upon are when centuries start and stop (did the 20th century end on 31 December 1999 or 31 December 2000?), citation standards and more domain-specific issues such as transliteration from Arabic to Latin script (DIN 31635, Hans Wehr, Buckwalter etc.).

These aren’t problems which affect humans – if accuracy to the year is important, it can often be inferred from the context, and we are fairly forgiving when it comes to citation styles. And from what I understand, scholars can often read many different Romanised forms of Arabic.

But computers are more simple-minded, and need to be told explicitly which date range to search within or which transliteration method has been used.

It was suggested that many of these types of issue could be resolved by using a schema to describe the standards used. Academics wouldn’t have to all use the same standards, as long as they documented the standards that they did use. As has been said, “the nice thing about standards is that you have so many to choose from”.

Less equivocal meanings could then be obtained for fuzzy phrases such as “eighth century”. Crosswalks could translate data from one standard to another.
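To make the idea concrete, here is a minimal sketch of how a declared convention could turn a fuzzy phrase such as “eighth century” into an explicit date range. The function name and the two convention labels are my own illustrative choices, not part of any schema discussed at the panel.

```python
def century_to_years(century: int, convention: str = "ordinal") -> tuple[int, int]:
    """Return (first_year, last_year) of a CE century under a declared convention.

    "ordinal": the Nth century runs from year 100*(N-1)+1 to 100*N,
               so the 20th century ends on 31 December 2000.
    "popular": the Nth century runs from year 100*(N-1) to 100*N-1,
               so the 20th century ends on 31 December 1999.
    Both labels are hypothetical names chosen for this example.
    """
    if century < 1:
        raise ValueError("century must be 1 or greater")
    if convention == "ordinal":
        return 100 * (century - 1) + 1, 100 * century
    if convention == "popular":
        # There is no year 0 in the CE calendar, so clamp the first century.
        return max(1, 100 * (century - 1)), 100 * century - 1
    raise ValueError(f"unknown convention: {convention!r}")


print(century_to_years(8, "ordinal"))  # (701, 800)
print(century_to_years(8, "popular"))  # (700, 799)
```

A project would only need to record which convention it uses; a search tool could then expand the phrase into the right range for that project.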

This solves some issues, but a crosswalk won’t always be available for converting transliterated text between methods (without access to the original vocalised Arabic). Some Romanisation methods are lossy, and not all lose the same information.
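The difference between a reversible crosswalk and a lossy one can be sketched in a few lines. The table below covers only six consonants, mapped between Buckwalter and DIN 31635 as I understand the two schemes; the diacritic-stripping function is a stand-in for a lossy Romanisation and does not represent any real standard. The input is an arbitrary string of consonants, not a word.

```python
import unicodedata

# A deliberately tiny subset of consonants; real schemes cover the whole alphabet.
BUCKWALTER_TO_DIN = {
    "H": "\u1e25",  # h with dot below
    "h": "h",
    "S": "\u1e63",  # s with dot below
    "s": "s",
    "T": "\u1e6d",  # t with dot below
    "t": "t",
}
DIN_TO_BUCKWALTER = {din: bw for bw, din in BUCKWALTER_TO_DIN.items()}


def crosswalk(text: str, table: dict[str, str]) -> str:
    """Map each character through the table, leaving unknown characters unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def strip_diacritics(text: str) -> str:
    """Simulate a lossy ASCII-only Romanisation by dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


din = crosswalk("HsT", BUCKWALTER_TO_DIN)
round_trip = crosswalk(din, DIN_TO_BUCKWALTER)
lossy = strip_diacritics(din)

print(round_trip)  # HsT: the one-to-one crosswalk is reversible
print(lossy)       # hst: the dotted and plain letters can no longer be told apart
```

Once the dots are gone there is no table that can restore them, which is why a crosswalk from a lossy scheme needs the original Arabic.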

One of the main motivations for agreeing on standards is to facilitate the sharing of data. Although most people in the room were in favour of opening their research data, there wasn’t agreement on when in a project’s life cycle this should be done. To some, one’s data represented hard work and reputation, and wasn’t to be given away before the rewards had been obtained. Others were more confident that attribution could be preserved and a wide dissemination and use of one’s data would not be a waste of one’s effort.

Guidelines which apply to data accompanying published articles don’t necessarily apply to a long research project. Which, if any, milestones in a research project are analogous to publication and might mandate the opening of one’s research data?

Currently, email is used to share data on an ad hoc basis between researchers. Trust is a key component here. Another suggested alternative to simply offering up one’s data for download was to make it available via an API. This would control the access and also mean that the data that one made available was worked or processed and not raw data.

One thing which would be possible if researchers opened their data up via an API would be federated searches across multiple databases. Many projects are producing Arabic dictionaries focusing on different uses of Arabic in different periods of time and in different places, e.g. merchants’ shopping lists from the middle ages, Arabic words used in Occitan medical books or pre-Islamic poetry. With an API, it would be possible to search across projects using a single interface.

Each project, however, focuses on different features of the language, and has different strengths and weaknesses. But hopefully a subset of data will be present across projects, to make search federation a useful facility.
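As a rough illustration of search federation, the sketch below treats each project's API as a plain function and merges the hits. Everything here is hypothetical: the project names, the shared "lemma" field and the sample entries are invented placeholders, and a real implementation would call remote services rather than local lists.

```python
from typing import Callable

# Each "project" stands in for a remote API; here it is just a function over local data.
Record = dict[str, str]
SearchFn = Callable[[str], list[Record]]


def make_source(entries: list[Record]) -> SearchFn:
    """Build a search function that matches a query against the shared 'lemma' field."""
    def search(query: str) -> list[Record]:
        return [entry for entry in entries if query in entry["lemma"]]
    return search


def federated_search(query: str, sources: dict[str, SearchFn]) -> list[Record]:
    """Query every source and tag each hit with the project it came from."""
    results = []
    for project, search in sources.items():
        for hit in search(query):
            results.append({"project": project, **hit})
    return results


# Invented sample data: the only field both projects share is 'lemma'.
sources = {
    "project_a": make_source([{"lemma": "kitab", "period": "pre-Islamic"}]),
    "project_b": make_source([
        {"lemma": "kitab", "genre": "medical"},
        {"lemma": "qalam", "genre": "medical"},
    ]),
}

for hit in federated_search("kitab", sources):
    print(hit)
```

The shared field is the important design point: federation only works on the subset of data that every project exposes in a comparable form, while project-specific fields simply travel along with each hit.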

In addition to the projects linked to above, Kirill has been compiling a list of related projects.

As the number of projects with a large Arabic component on which I’m working increases, I found it very useful to learn more about the subject.

The Trajan’s Column Project

[Photo: 052.8-053.42]

Trajan’s Column, a 35m-tall structure in Rome, is an important historical object that is decorated from top to bottom with a series of elaborate engravings. The Trajan’s Column project aimed to create a digital representation of the column with a searchable online database. Other digital databases of the column already exist but the novel aspect of this project is the exhaustive cataloguing of each individual figure. Study of the column necessitates access to detailed, high-quality images and the project meets this need by making a catalogue of over 1,700 images available. Similarly, precise analysis of individual figures is made possible by the provision of academic information on each figure. The finished website therefore contains a mixture of images (divided into diagrams, cast images and shaft images) and textual descriptions of the figures.

What is Trajan’s Column?

Trajan’s Column has emerged as an object of intense academic interest partly because of the level of detail in its engravings. Other examples of architecture from the Roman Empire have survived but the column is remarkable for the precision and intricacy its figures exhibit. Furthermore, the column has a balcony from which visitors can survey Rome from above.  As such, the column is an important historical artefact which is appealing as a subject for further study.

Positioned within the Forum Traiani, the column is an epic monument to the two Dacian wars of 101-102 and 105-106 AD.  The carvings document the expansion of the Roman Empire into Eastern Europe under the command of the emperor Trajan. The two wars are given approximately the same amount of space, although there is more fighting within the first half and more travelling within the second half. The column serves to glorify the emperor and publicise his military successes. By contrast, the Dacian king Decebalus is represented as a nemesis who is eventually beheaded.

There is a quite dazzling array of figures depicted on the column, including both Roman and Dacian soldiers, civilians, deities, standard bearers, musicians, ceremonial attendants and archers. A great variety of different styles of armour, clothing and weaponry are evident amongst the figures and it has been suggested that variations can be attributed to the individual techniques of different sculptors.

Casts of the column, made in the 1860s, have helped to preserve the detail of the engravings in a different material. Comparing the casts with the original is necessary for fruitful academic study. As a result, the project catalogued images of both the casts and the ‘shaft’ (the original column), clearly distinguishing the two types of image. These images were primarily photos of the column taken by the project’s Principal Investigator, Dr Jon Coulston of the School of Classics.

Other digital resources of the column already exist, including that of the German Archaeological Institute (GAI). The GAI archive is particularly useful because it features photographs of the column, categorised by scene, rather than drawings or artistic representations. The St Andrews project differs from others in that it is primarily composed of full-colour photographs. The most distinctive aspect, however, is the categorisation of figures within scenes.

The Process: The Database

This raw data next had to be put into a database format. The University’s image database was used as a system in which to input data. The image database conforms to the Visual Resources Association’s (VRA) Core 4 standard. Specifically, it differentiates between works and images of works and specifies the relationship between images. Individual scenes/figures or groups of scenes/figures were designated as works. Diagrams were demarked as ‘parent images’ to other diagrams and a link formed between them.

There were two basic types of entry: those relating to images and those relating to figures. Images typically showed multiple figures or scenes and were divided into five categories: figure, figures, scene, scenes and detail. ‘Scene’ was used for images which contained all the figures from within a scene and ‘scenes’ for images which featured figures from more than one scene. ‘Detail’ was used for images which did not show a figure in its entirety but which focused on, say, a particular helmet, shield or building. Figure entries were always designated as ‘figure’ and contained the figure description. Images may be uploaded for each individual figure in the future, but these entries currently remain without images.

The Process: Types of Data

The column has traditionally been divided into 155 scenes so that engravings can be simply referenced. The project maintained these scene divisions but added the further specification of numbering the individual figures within these scenes. The figures were numbered from left to right in each scene so that the leftmost figure in scene 1 was designated as figure 1. Each figure was given a five-digit reference code, made up of a three-digit scene number and a two-digit figure number, so that scene 1 figure 1 was 001.01. Some scenes (such as scene 40) contained as many as 72 figures whereas others (such as scenes 3 and 78) only contained one. It is hoped that this unique system of designating figures will enable scholars to refer to specific figures of interest with increased clarity.
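The reference code format is simple enough to express in code. This is only a sketch of the scheme as described above, not the project's own tooling; the function names are mine, and the upper limit of 99 figures per scene is simply what two digits allow.

```python
import re

# Three digits for the scene, a full stop, then two digits for the figure.
REF_PATTERN = re.compile(r"(\d{3})\.(\d{2})")


def format_ref(scene: int, figure: int) -> str:
    """Build a reference code such as 001.01 from scene and figure numbers."""
    if not 1 <= scene <= 155:
        raise ValueError("the column is divided into 155 scenes")
    if not 1 <= figure <= 99:
        raise ValueError("a two-digit figure number must be between 1 and 99")
    return f"{scene:03d}.{figure:02d}"


def parse_ref(code: str) -> tuple[int, int]:
    """Split a reference code back into (scene, figure)."""
    match = REF_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid reference code: {code!r}")
    return int(match.group(1)), int(match.group(2))


print(format_ref(1, 1))     # 001.01
print(format_ref(40, 72))   # 040.72
print(parse_ref("053.01"))  # (53, 1)
```

Zero-padding both parts means the codes sort correctly as plain text, which is convenient for file names and search indexes alike.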

Three types of data were involved, and together they formed the basis for the searchable database. The first type of data was images, divided into three categories: diagrams, cast photos and shaft photos. The diagrams were simple black and white drawings produced for each scene, with each figure clearly numbered within the diagram. Each photo was labelled with a file name that indicated the scene(s) and figure(s) it showed and whether it was a cast photo or a shaft photo. The difference between the two photo types is important, so accuracy was paramount when inputting the data, and the correct designation had to be clarified whenever it was unclear. Other information, such as the date when the photo was taken, would have helped the records comply more fully with the VRA Core 4 standard, but it was necessary to recognise the limitations of both the data that was supplied and the exhaustiveness of the project.

Secondly, data was supplied by Dr Coulston relating to each individual figure, describing the figure type, armour details and so on. This information is presented on the website as links: if users see that figure 1 from scene 5 has lorica hamata armour, clicking on this armour type displays the other figures which have the same type of armour. Thirdly, image coordinates were generated by a program that analysed the SVG images of the diagrams. The coordinates create a link between the diagram of the column and the diagrams of the individual scenes and between the diagrams of scenes and entries relating to individual figures. This feature meant that scenes could be clearly located within the column as a whole and that there was a visual element to navigating through the website.
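I don't have the original program to hand, so the following is only a sketch of how clickable regions might be derived from an SVG diagram. It assumes, purely for illustration, that each figure is outlined by a rect element whose id carries its reference code; the sample SVG and the id format are invented.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# A made-up diagram: each figure is outlined by a <rect> identified by its reference code.
SAMPLE_SVG = """
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
  <rect id="fig-053.01" x="10" y="20" width="30" height="60"/>
  <rect id="fig-053.02" x="50" y="15" width="25" height="70"/>
</svg>
"""


def figure_regions(svg_text: str) -> dict[str, tuple[float, float, float, float]]:
    """Return {id: (left, top, right, bottom)} for every <rect> that has an id."""
    root = ET.fromstring(svg_text.strip())
    regions = {}
    for rect in root.iter(f"{SVG_NS}rect"):
        ref = rect.get("id")
        if ref is None:
            continue  # unlabelled shapes cannot be linked to an entry
        x, y = float(rect.get("x", "0")), float(rect.get("y", "0"))
        width, height = float(rect.get("width", "0")), float(rect.get("height", "0"))
        regions[ref] = (x, y, x + width, y + height)
    return regions


for ref, box in figure_regions(SAMPLE_SVG).items():
    print(ref, box)
```

Coordinates in this left, top, right, bottom form are what an HTML image map expects for a rectangular area, so each region can be turned into a link to the matching figure entry.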

The Outcome: The Trajan’s Column Project Website

The website was designed by Mary Woodock-Kroble and Swithun Crowe. A PHP script pulls XML data out of the image database, transforms the data and adds it to a Solr index. This index is the basis for search queries. Emma Lewsley was charged with checking the entries for any errors or inconsistencies. The finished website allows users to search the column by scene or figure. There is also the option to ‘zoom in’ or ‘zoom out’ through scenes and figures.
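The transformation step can be sketched as follows. The project's script is written in PHP; this Python version only illustrates the general shape of the work, and the XML element names and field names are invented placeholders rather than the image database's real schema.

```python
import json
import xml.etree.ElementTree as ET

# An invented record; the element names are placeholders, not the real export format.
SAMPLE_RECORD = """
<record>
  <ref>005.01</ref>
  <type>figure</type>
  <armour>lorica hamata</armour>
</record>
"""


def record_to_solr_doc(xml_text: str) -> dict[str, str]:
    """Flatten one XML record into a dictionary of field name -> text value."""
    root = ET.fromstring(xml_text.strip())
    doc = {child.tag: (child.text or "").strip() for child in root}
    # Derive a scene field from the reference code so that searching by scene is cheap.
    doc["scene"] = doc["ref"].split(".")[0]
    return doc


docs = [record_to_solr_doc(SAMPLE_RECORD)]
# A JSON list of flat documents like this is a form that Solr's update handler can accept.
print(json.dumps(docs, indent=2))
```

Indexing a field such as armour as a plain value is what makes the "click on an armour type to see other figures" behaviour a simple query against the index.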

The website may be accessed at the following address: http://arts.st-andrews.ac.uk/trajans-column/. I would like to thank Dr Jon Coulston for allowing me to reproduce his photo here and for contributing information about the column.

How to Search the Column

Figure 1: Users may search the column in two main ways. Firstly, they may identify a scene of interest from a diagram of the column in its entirety that is labelled in Roman numerals. For example, hovering over ‘LIII’ reveals a link to Scene 53.

Figure 2: A diagram of the relevant scene is included, allowing the user to select an individual figure. Again, links are included in the diagram so that clicking on ‘1’ provides access to the entry for scene 53 figure 1.

Figure 3: Each figure entry contains information relating to the figure. Clicking on ‘zoom out’ shows the user the images in which figure 1 features.

Figure 4: Alternatively, users can search by typing a scene/figure number into the search boxes on the left.

Figure 5: Here a search for ‘053’ brings up all the figure entries for that scene and all the images that relate to scene 53.

Figure 6: Each image entry contains information on the material (cast or shaft) and an image type (figure, figures, scene, scenes or detail).

Online Research Database Service (ORDS)

It is not uncommon for us to get requests from researchers for the setup of databases. Especially when such requests relate to unfunded research, we have often had to decline them. As a result, a number of research datasets have remained unpublished.

The problem of not having the resources to provide bespoke technical solutions to every research project that applies for our help is not unique to St Andrews. But then, not every project requires a bespoke solution. Over the past few years Oxford University IT Services undertook the VIDaaS (Virtual Infrastructure with Database as a Service) project to develop a technical solution that allows researchers to build and publish online databases quickly and without the need for programming skills.

VIDaaS runs on the DaaS (Database as a Service) software, also developed by Oxford. DaaS is an online solution that provides users with a WYSIWYG (what you see is what you get) interface. DaaS allows for version control and, if necessary, permits users to make changes to the database schema via a drag-and-drop mechanism. There is no need to modify the scripts or user interfaces by hand, as would normally be necessary after a change to the database schema; the DaaS software makes these changes automatically when the schema is changed.

Oxford has been working on DaaS to support different types of database, including relational databases, XML databases and document databases. Plans also included the ability to upload MS Access databases and have DaaS convert these into online PostgreSQL databases.

DaaS supports various levels of access restriction to the data. The ability to support institutional single sign-on mechanisms is being developed. Several members of research groups can be given permission to modify the project database.

The VIDaaS project came to an end in 2012, and since then Oxford University IT Services has migrated the DaaS software to a more secure technical environment.

The Online Research Database Service (ORDS) that is currently under development uses and builds on the outcome of the VIDaaS project. Oxford University IT Services is currently undertaking an ORDS maturity project with a view to both supporting researchers at Oxford and making the DaaS software available to other institutions. As part of the ORDS maturity project the Research Computing Service will be looking at the software and its features with a view to investigating the feasibility of providing the ORDS locally to researchers within the University of St Andrews.

St Andrews Digital Collections project

We are currently working with the University Library on setting up an Islandora repository for the St Andrews Digital Collections project. The project came into being as a result of efforts from the Digital Humanities Research Librarian to promote the Library’s Special Collections and to increase access to rare materials within Special Collections’ holdings by making a growing proportion of materials available digitally to University staff and students as well as to the general public.

According to the map of the Islandora community, the Digital Collections repository will be the second Islandora installation in the UK and the first example of a UK university library adopting the software. The availability of various solution packs and the emerging support for the Digital Humanities within the software made Islandora a particularly attractive option for this project.

The repository runs on existing Research Computing infrastructure and is now at the stage where content is being added. More information on this resource and when it will become available to the public will follow in due course.