How much (more) research data do we have, and where do we store it?

In January and February 2013 we asked Computing Officers across the University for feedback on the research data storage requirements of their respective schools.

15 out of 19 schools and 3 research centres responded to the survey. The format of the responses varied, from a single summary per school or research centre to a compilation of forms completed by individual researchers. For a number of schools, Computing Officers pointed out that participation from researchers was low, for example 5 researchers out of 61 in one school, or 11 out of 30 in another. Given this response rate, the survey outcomes presented here cannot provide a full picture of the University’s research data storage need.

After we received the feedback from Computing Officers we compared it to the outcome of the Data Asset Framework audits carried out in 2012, and where we had identified additional needs, these were added. As a final step we looked at the requests for research data storage that reached IT Services over the last year and at data that is already stored on existing Research Computing infrastructure. Where we found gaps, we added the figures to the storage requirements for each school.

The survey identified a storage need of 1.36 PB for active data, with an annual increase of approximately 10%, and a long-term storage need of approximately 600 TB, with an annual increase of 20%. The feedback that we received also suggests that most researchers do not currently differentiate between active data and long-term storage, which will almost certainly have affected the figures presented here. The 1.36 PB of active data storage includes a requirement of 600 TB for working copies of data produced by a High Performance Computing (HPC) cluster in the School of Mathematics and Statistics and 150 TB of scratch space for HPC users.

The table below breaks down current data storage by storage option. Only a very small proportion of the University’s research data (0.14%) is stored on centrally provided systems. Worryingly, approximately 19% of research data is currently stored on staff computers, and a further 15% is on external storage media. Only 5% of research data is stored in the public cloud. None of the respondents used an external organisation or data centre to look after their data. Only one respondent indicated that some of their research data is stored on a privately owned computer at home. It is to be expected that the actual amount of research data stored on home computers is much higher than the few GB that the survey identified in the category “other”.

                                               Active data                  Long-term storage
Storage option                           Current need Annual increase    Current need Annual increase
ITS Central Filestore                           369.6            89.6           232.0             0.0
ITS web servers                               2,066.0         1,219.5             0.0         1,000.0
Networked file store (non-ITS)              938,721.1        37,040.1       153,000.0        27,600.2
non-ITS web servers                          39,510.0        19,542.0        50,004.5        16,500.0
staff computers                             208,253.0        28,993.0       164,030.0           732.5
external storage media                      114,635.0        38,292.0       181,361.0        47,812.5
external organisation or datacentre               0.0             0.0             0.0             0.0
use of external cloud services               54,593.0         1,273.8        50,000.0             0.0
other                                            32.0             6.2             0.2             0.0
Total (in GB)                             1,358,179.7       126,456.2       598,627.7        93,645.2
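
The percentages quoted above can be checked against the table with a few lines of Python. This is only a sketch under two assumptions that are not stated in the survey itself: that "centrally provided systems" means the two ITS rows, and that each share is taken relative to all current data (active plus long-term). Under those assumptions the quoted shares are reproduced. The names `STORAGE_GB`, `column_totals` and `share_of_current` are made up for this illustration.

```python
# Figures in GB, copied from the table above. Each tuple holds:
# (active current, active annual increase,
#  long-term current, long-term annual increase)
STORAGE_GB = {
    "ITS Central Filestore": (369.6, 89.6, 232.0, 0.0),
    "ITS web servers": (2066.0, 1219.5, 0.0, 1000.0),
    "Networked file store (non-ITS)": (938721.1, 37040.1, 153000.0, 27600.2),
    "non-ITS web servers": (39510.0, 19542.0, 50004.5, 16500.0),
    "staff computers": (208253.0, 28993.0, 164030.0, 732.5),
    "external storage media": (114635.0, 38292.0, 181361.0, 47812.5),
    "external organisation or datacentre": (0.0, 0.0, 0.0, 0.0),
    "use of external cloud services": (54593.0, 1273.8, 50000.0, 0.0),
    "other": (32.0, 6.2, 0.2, 0.0),
}


def column_totals(table):
    """Sum each of the four columns over all storage options."""
    return tuple(sum(row[i] for row in table.values()) for i in range(4))


def share_of_current(table, names):
    """Percentage of all current data (active + long-term) held in `names`.

    Assumption: shares are relative to the combined current need.
    """
    active, _, long_term, _ = column_totals(table)
    held = sum(table[name][0] + table[name][2] for name in names)
    return 100.0 * held / (active + long_term)


active, active_inc, long_term, long_term_inc = column_totals(STORAGE_GB)

# Assumption: "centrally provided systems" = the two ITS rows.
central = share_of_current(STORAGE_GB, ["ITS Central Filestore", "ITS web servers"])
staff = share_of_current(STORAGE_GB, ["staff computers"])
media = share_of_current(STORAGE_GB, ["external storage media"])
cloud = share_of_current(STORAGE_GB, ["use of external cloud services"])

# Annual increase as a percentage of the current need, from the totals row.
active_growth = 100.0 * active_inc / active
long_term_growth = 100.0 * long_term_inc / long_term

print(f"central {central:.2f}%  staff {staff:.0f}%  media {media:.0f}%  cloud {cloud:.0f}%")
print(f"growth: active {active_growth:.1f}%, long-term {long_term_growth:.1f}%")
```

Computed this way, the totals row implies annual increases of about 9.3% for active data and 15.6% for long-term storage, which the text gives as approximately 10% and 20%.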

The charts that follow provide a breakdown of the various types of data storage across the different schools and research centres. It is clear from those charts that there are some rather significant gaps, and that the results presented here only provide an incomplete overview of the University’s research data storage requirement.

Some of the charts suggest that, compared to other schools and research centres, the School of Biology and CREEM carry out especially data-intensive research. While this is likely to be one factor, it is worth noting that responses from Biology and CREEM were much fuller than those received from other parts of the University. It is likely, therefore, that a higher proportion of their research data has been identified than was the case in other schools and research centres.

IT Services central file store

ITS web servers

Networked file store (non-ITS)

The category for non-ITS networked file storage contains 600 TB of storage for working copies of HPC data (School of Mathematics and Statistics) and 150 TB of scratch space for HPC users (School of Chemistry). A further 62 TB for active data and 100 TB for long-term storage were added as a result of requests received by IT Services.
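
To see how much of this category those few items account for, the sketch below sets them against the category totals from the table. It rests on two assumptions that the survey does not confirm: that the table uses decimal units (1 TB = 1000 GB), and that the 62 TB and 100 TB from requests to IT Services are already included in the category totals. The variable and function names are made up for this illustration.

```python
GB_PER_TB = 1000  # assumption: decimal units, not confirmed by the survey

# Category totals for "Networked file store (non-ITS)" from the table (GB).
NETWORKED_ACTIVE_GB = 938721.1
NETWORKED_LONG_TERM_GB = 153000.0

# Items named in the text, in TB.
named_active_tb = {
    "HPC working copies (Mathematics and Statistics)": 600,
    "HPC scratch space (Chemistry)": 150,
    "requests received by IT Services": 62,
}
named_long_term_tb = {
    "requests received by IT Services": 100,
}


def named_share(named_tb, category_total_gb):
    """Percentage of a category total covered by the named items."""
    named_gb = sum(named_tb.values()) * GB_PER_TB
    return 100.0 * named_gb / category_total_gb


active_share = named_share(named_active_tb, NETWORKED_ACTIVE_GB)
long_term_share = named_share(named_long_term_tb, NETWORKED_LONG_TERM_GB)

print(f"named items cover {active_share:.1f}% of active data in this category")
print(f"named items cover {long_term_share:.1f}% of long-term storage in this category")
```

If these assumptions hold, the named items make up roughly 86% of the active data and about 65% of the long-term storage recorded for non-ITS networked file stores.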

Non-ITS web servers

Staff computers

External storage media

External cloud services