CKAN for Research Data Management

CKAN is an open source data management system that has been developed by the Open Knowledge Foundation for the past six years. CKAN provides tools to streamline the processes of publishing, sharing, finding and using data. Initially, CKAN was aimed at data publishers such as national and regional governments, companies and organisations that want to make their data publicly available. For example, the UK Government’s data portal (data.gov.uk) runs on CKAN, and in Australia, where CKAN is widely used by government agencies at both national and regional levels, it has become accepted as a de facto standard for data management.

As part of the JISC Managing Research Data programme, two projects, Orbital (University of Lincoln) and data.bris (University of Bristol), have adopted CKAN as a component of their institutional RDM solutions. The experience of both projects in using CKAN was discussed at the workshop “CKAN for Research Data Management”, held in London on 18th February 2013. In addition to representatives from both projects, members of the Open Knowledge Foundation and staff from other UK universities took part in the workshop. Summaries of the workshop can be found on the data.bris and Orbital project blogs.

During the workshop, a requirements gathering exercise was started to investigate the wider needs of the RDM community and to match these needs against the functionality of CKAN. For this exercise, the following RDM roles were decided on: researcher, developer, curator/manager, re-user, IT support, and data subjects.

The requirements were gathered through expressions in the following format: “As a [RDM role], I want [what, requirement], so that [why, reason]”.
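
To make the format concrete, the short Python sketch below fills in the template for one requirement. The class and field names are our own illustration, and the example story is paraphrased from the usage metrics requirement in the list below rather than quoted from the workshop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """One requirement expressed in the workshop's three-part format."""

    role: str         # the RDM role, e.g. "researcher" or "curator/manager"
    requirement: str  # the "what"
    reason: str       # the "why"

    def render(self) -> str:
        # Mirrors the template: "As a [RDM role], I want [what], so that [why]"
        return f"As a {self.role}, I want {self.requirement}, so that {self.reason}"


# Illustrative story, paraphrased from the usage metrics requirement.
story = UserStory(
    role="curator/manager",
    requirement="usage metrics to be available",
    reason="information on research impact can be gathered",
)
print(story.render())
```

Keeping the three parts as separate fields, rather than as one free-text sentence, makes it straightforward to group or filter the gathered requirements by role.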

Below is a list of the requirements we would hope an RDM solution to meet, which we have fed back into the requirements gathering exercise. CKAN already meets a number of these requirements.

| No. | RDM role | Requirement | Reason |
|---|---|---|---|
| 1 | Curator / manager | Digital objects not to be stored within a database | So that, where necessary / desirable, preservation tools (e.g. JHOVE, DROID) can be used to continuously validate file integrity |
| 2 | Curator / manager | Ability to integrate with Research Information Systems (e.g. Pure) | To allow for efficiency of institutional processes, e.g. in relation to REF |
| 3 | Curator / manager | Usage metrics to be available | To support gathering of information on research impact, e.g. for REF |
| 4 | Curator / manager | The availability of records management functionality, e.g. for the administration of retention / life cycle management periods | To support institutional life cycle management processes |
| 5 | Curator / manager | Ability to link publications (in the publications repository) to associated datasets (in CKAN) | To allow for ease of access to publications and related data, to support transparency and openness |
| 6 | Curator / manager | Ability to draw metadata from Pure and/or to export metadata into Pure | To assist the integration of RDM solutions with the Research Information System |
| 7 | Curator / manager, researcher | Support of a variety of academic subject-specific metadata standards | To ensure CKAN is useful and adaptable to a wide range of academic disciplines |
| 8 | Curator / manager, researcher | To keep track of versions of data | To allow for ease of tracking modifications made to individual files |
| 9 | Developer | CKAN to support a range of accepted protocols for metadata harvesting (e.g. OAI-PMH) | So that catalogues for a number of different data stores can be integrated and searched via a single point of access |
| 10 | Developer | To harvest metadata from Fedora Commons, possibly via OAI-PMH | CKAN can present data that is kept in other repositories |
| 11 | Developer | To use Fedora Commons as a FileStore (http://docs.ckan.org/en/ckan-1.8/filestore.html) | CKAN can access digital objects in Fedora (as an alternative to harvesting) |
| 12 | Developer | APIs to support the development of alternative methods of data ingest into CKAN and the development of tools for data analysis | Subject-specific RDM needs can be met |
| 13 | IT Support | CKAN to be able to support a number of different data and database structures | Subject-specific RDM needs can be supported and existing department-level RDM solutions can be integrated into an institutional CKAN RDM solution |
| 14 | IT Support | CKAN to support Shibboleth and other single sign on protocols | Institutional sign on mechanisms can be used to authenticate to CKAN |
| 15 | IT Support | CKAN to be able to integrate with institutional identity management (IDM) systems | Existing institutional IDM can be used to define roles and levels of access to individual datasets within CKAN |
| 16 | IT Support | The availability of a documented mechanism of running several customisable instances of CKAN from the same codebase | Institutional support for potentially a multitude of CKAN instances to cater for various subject-specific needs can be done efficiently |
| 17 | IT Support | The availability of maintenance agreements | So that institutions adopting CKAN can get expert technical support when needed |
| 18 | IT Support | Commitment from CKAN developers to keeping code base up-to-date | So that business continuity can be ensured |
| 19 | IT Support | Any necessary security fixes to be developed quickly | So that system security can be ensured |
| 20 | IT Support | Metadata extraction upon ingest (including subject-specific metadata standards, e.g. TEI & VRA) | To avoid, where possible, manual metadata entry (reduction of typos, efficient use of staff time) |
| 21 | IT Support | The ability to run admin reports (e.g. storage space used in individual collections; file types contained in individual collections; access to / usage of (parts of) individual collections) | To allow for efficient support provision (e.g. planning of storage requirements, charging, etc.) |
| 22 | IT Support | CKAN to be designed in a modular fashion | So that it is possible to select individual components of the software and to integrate these with existing systems and technical infrastructure |
| 23 | IT Support | The availability of relevant and accessible user documentation that can be modified for local use | To reduce the number of CKAN users contacting the IT Service Desk for advice on how to use the system |
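
Requirements 9 and 10 name OAI-PMH as a possible harvesting protocol. As a rough illustration of what that involves on the protocol side, the Python sketch below builds an OAI-PMH `ListRecords` request URL. The base URL is a hypothetical placeholder and the function name is our own; this is not CKAN code, and it says nothing about how a CKAN harvester would be implemented.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    if resumption_token is not None:
        # When continuing a partial list, resumptionToken is the only
        # argument allowed alongside the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # First request: ask for records in the given metadata format
        # (oai_dc is the Dublin Core format every OAI-PMH repository supports).
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used here for illustration only.
first_page = list_records_url("https://repository.example.org/oai")
next_page = list_records_url("https://repository.example.org/oai",
                             resumption_token="abc123")
print(first_page)
print(next_page)
```

A harvester would fetch the first URL, parse the XML response, and repeat with the returned resumption token until the repository stops issuing one.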

All requirements gathered will be summarised by the Orbital project and will feed into wider discussions on how CKAN can be developed further to support institutional RDM processes better.
