CKAN is an open source data management system that the Open Knowledge Foundation has developed over the past six years. CKAN provides tools to streamline the processes of publishing, sharing, finding and using data. Initially CKAN was aimed at data publishers such as national and regional governments, companies and organisations that want to make their data publicly available. For example, the UK Government’s data portal (data.gov.uk) runs on CKAN, and in Australia, where CKAN is widely used by government agencies at both national and regional levels, it has become accepted as a de facto standard for data management.
As part of the JISC Managing Research Data programme, two projects, Orbital (University of Lincoln) and data.bris (University of Bristol), have adopted CKAN as a component of their institutional RDM solutions. The experience of both projects in using CKAN was discussed at the workshop “CKAN for Research Data Management”, held in London on 18th February 2013. In addition to representatives from both projects, members of the Open Knowledge Foundation and staff from other UK universities took part in the workshop. Summaries of the workshop can be found on the data.bris and Orbital project blogs.
During the workshop, a requirements gathering exercise was started to investigate the wider needs of the RDM community and to match these needs against the functionality of CKAN. For this exercise, the following RDM roles were agreed: researcher, developer, curator/manager, re-user, IT support, and data subjects.
The requirements were gathered through expressions in the following format: “As a [RDM role], I want [what, requirement], so that [why, reason]”.
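To make the format concrete, here is a minimal Python sketch that renders one requirement as a sentence in this template. The `Requirement` class and its field names are hypothetical and only serve as illustration; they are not part of CKAN or of the workshop materials.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """One requirement in the workshop's three-part format (hypothetical helper)."""
    role: str  # RDM role, e.g. "researcher"
    what: str  # the requirement itself
    why: str   # the reason behind it

    def as_story(self) -> str:
        # Use "an" before a role starting with a vowel; a first-letter check is enough here.
        article = "an" if self.role[0].lower() in "aeiou" else "a"
        return f"As {article} {self.role}, I want {self.what}, so that {self.why}."


example = Requirement(
    role="curator/manager",
    what="usage metrics to be available",
    why="information on research impact can be gathered",
)
print(example.as_story())
```

Keeping role, requirement and reason as separate fields mirrors the three parts of the template, which makes it easy to group the collected requirements by role later.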
Below is a list of the requirements we would hope an RDM solution to meet, which we have fed back into the requirements gathering exercise. CKAN already meets a number of these requirements.
|No.||RDM role||I want (requirement)||So that (reason)|
|1||Curator / manager||Digital objects not to be stored within a database||So that where necessary / desirable preservation tools (e.g. JHOVE, DROID) can be used to continuously validate file integrity|
|2||Curator / manager||Ability to integrate with Research Information Systems (e.g. Pure)||To allow for efficiency of institutional processes, e.g. in relation to REF|
|3||Curator / manager||Usage metrics to be available||To support gathering of information on research impact, e.g. for REF|
|4||Curator / manager||The availability of records management functionality, e.g. for the administration of retention / life cycle management periods||To support institutional life cycle management processes|
|5||Curator / manager||Ability to link publications (in the publications repository) to associated datasets (in CKAN)||To allow for ease of access to publications and related data, to support transparency and openness|
|6||Curator / manager||Ability to draw metadata from Pure and/or to export metadata into Pure||To assist the integration of RDM solutions with the Research Information System|
|7||Curator / manager, researcher||Support for a variety of academic subject-specific metadata standards||To ensure CKAN is useful and adaptable to a wide range of academic disciplines|
|8||Curator / manager, researcher||To keep track of versions of data||To allow for ease of tracking modifications made to individual files|
|9||Developer||CKAN to support a range of accepted protocols for metadata harvesting (e.g. OAI-PMH)||So that catalogues for a number of different data stores can be integrated and searched via a single point of access.|
|10||Developer||To harvest metadata from Fedora Commons, possibly via OAI-PMH||So that CKAN can present data that is kept in other repositories|
|11||Developer||To use Fedora Commons as a FileStore (http://docs.ckan.org/en/ckan-1.8/filestore.html)||So that CKAN can access digital objects in Fedora (as an alternative to harvesting)|
|12||Developer||APIs to support the development of alternative methods of data ingest into CKAN and the development of tools for data analysis||Subject-specific RDM needs can be met|
|13||IT Support||CKAN to be able to support a number of different data and database structures||Subject-specific RDM needs can be supported and existing department-level RDM solutions can be integrated into an institutional CKAN RDM solution|
|14||IT Support||CKAN to support Shibboleth and other single sign-on protocols||Institutional sign-on mechanisms can be used to authenticate to CKAN|
|15||IT Support||CKAN to be able to integrate with institutional identity management (IDM) systems||Existing institutional IDM can be used to define roles and levels of access to individual datasets within CKAN.|
|16||IT Support||The availability of a documented mechanism for running several customisable instances of CKAN from the same codebase||So that an institution can efficiently support potentially many CKAN instances catering for various subject-specific needs|
|17||IT Support||The availability of maintenance agreements||So that institutions adopting CKAN can get expert technical support when needed|
|18||IT Support||Commitment from CKAN developers to keeping the codebase up to date||So that business continuity can be ensured.|
|19||IT Support||Any necessary security fixes to be developed quickly||So that system security can be ensured.|
|20||IT Support||Metadata extraction upon ingest (including subject-specific metadata standards, e.g. TEI & VRA)||To avoid, where possible, manual metadata entry (reduction of typos, efficient use of staff time)|
|21||IT Support||The ability to run admin reports (e.g. storage space used in individual collections; file types contained in individual collections, access to / usage of (parts of) individual collections)||To allow for efficient support provision (e.g. planning of storage requirements, charging, etc.)|
|22||IT Support||CKAN to be designed in a modular fashion||So that it is possible to select individual components of the software and to integrate these with existing systems and technical infrastructure|
|23||IT Support||The availability of relevant and accessible user documentation that can be modified for local use||To reduce the number of CKAN users contacting the IT Service Desk for advice on how to use the system|
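Requirements 9 and 10 refer to metadata harvesting via OAI-PMH. As a rough illustration of what a harvesting request looks like, the sketch below builds a `ListRecords` request URL using the `verb`, `metadataPrefix` and `set` arguments defined by the OAI-PMH standard. The endpoint address and the `list_records_url` helper are assumptions made for this example; the sketch does not show how CKAN itself would implement harvesting.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # "verb" and "metadataPrefix" are required for a first ListRecords request;
    # "oai_dc" (unqualified Dublin Core) must be supported by every OAI-PMH repository.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting of a single set (e.g. one collection).
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# The endpoint below is a placeholder, not a real repository.
url = list_records_url("https://repository.example.ac.uk/oai")
print(url)
```

A harvester would fetch this URL, parse the XML response and follow any `resumptionToken` it contains to retrieve further batches of records.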
All requirements gathered will be summarised by the Orbital project and will feed into wider discussions on how CKAN can be developed further to support institutional RDM processes better.