The ScottisH Informatics Programme (SHIP) conference was held at the University of St Andrews on 28-30 August 2013. The conference provided a forum to discuss the use of linked electronic healthcare datasets. Attendees came predominantly from Scotland and Australia (with even some Scots working in Australia, to confuse us all!). The cover of the book of abstracts was a clever ‘word cloud’ created from all the submitted abstracts: ‘Data’, ‘Linkage’ and ‘Health’ featured very prominently, with ‘Electronic’, ‘Administrative’, ‘Care’, ‘Research’ and ‘Governance’ following closely. The conference was vast, with 7 parallel sessions to choose from. I tried to attend as diverse a range of sessions as I could to get a complete picture of linked healthcare data, and this may be reflected in my summary of the whole event.
The opening address, entitled “Big Data meets Healthcare: the case for comparability and consistency”, was given by Christopher Chute, a professor of Biomedical Informatics at the Mayo Clinic. He talked about the “Chasm of Semantic Despair” and how standards can often feel like “lead in shoes”. Following on from this I attended a session entitled “Data transparency, access and public engagement…” Throughout the session the key message became obvious: ‘data is there, but researchers can’t always gain access to it in a timely manner…’ One speaker explained that her Canadian study showed a funding lifecycle can be about 2 years, yet it can take 1-18 months to obtain the data required to complete the study. She strongly questioned the ‘risks’ involved in building a career using administrative health data. Speakers spoke of “the process” involved in obtaining data: ‘the application’, where the data requested is matched against the research questions; ‘the review’, where ethics, risks, privacy and legislation are all carefully examined (“adjudication & grant approval”); and finally ‘data preparation’, with the complexities involved in data extraction, linkage etc. Interestingly (or perhaps not), one theme emerged time and time again, not only during this session but throughout the entire conference: the technical components are not causing the headache, but red tape and extensive, overlapping ‘review’ processes are. One speaker from Australia shared that 30 individual approvals were required for their Population Health Research Network. As a technical person I am always cognisant that emerging technologies are providing useful tools for researchers, enabling them to focus on their research rather than being restricted by having to work around technology limitations. However, this was not the take-home message. Has too much time been spent focusing on the exciting, challenging technical intricacies, and has data governance been ignored as a result?
Is the fear of getting it wrong, and the resulting belt-and-braces approach, stifling research creativity and discovery?
Access platforms were discussed, as were the complexities of web-based delivery of such datasets. Emerging platforms included SURE: Secure Unified Research Environments in Australia and SAIL: Secure Anonymised Information Linkage Databank at Swansea University. Both offered (albeit with a little healthy competitiveness) to provide secure computing environments that researchers can log into remotely to access linked datasets.
SAIL & SURE were repeatedly mentioned as real solutions to the problem of datasets being anonymised at source but becoming re-identifiable through data linkage. Coming from a spatial background, I found one clever feature of SURE caught my attention: the ability to use spatial data, de-identified and randomised, to obtain new data within SURE, where the results are reconnected to the initial data without any breaches of confidentiality. Essentially, SURE carries out all the linkages behind the scenes, resulting in an enhanced spatial dataset without any of the complexities of data linkage.
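To make the idea of linkage ‘behind the scenes’ a little more concrete, here is a minimal sketch in Python of one generic way a trusted service could join two datasets without handing real identifiers to the researcher. It is purely illustrative and is not a description of how SURE or SAIL is actually implemented: the keyed-hash approach, the secret key, the field names (`person_id`, `diagnosis`, `area_code`) and the records are all assumptions made up for this example.

```python
import hashlib
import hmac

# Hypothetical secret held only by the trusted linkage service, never by researchers.
LINKAGE_KEY = b"example-secret-held-by-the-linkage-service"


def pseudonymise(identifier: str, key: bytes = LINKAGE_KEY) -> str:
    """Replace a real identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def link(health_rows, spatial_rows, key: bytes = LINKAGE_KEY):
    """Join two datasets on a pseudonym and leave the real identifier behind."""
    spatial_by_pseudonym = {
        pseudonymise(row["person_id"], key): row for row in spatial_rows
    }
    linked = []
    for row in health_rows:
        pseudonym = pseudonymise(row["person_id"], key)
        match = spatial_by_pseudonym.get(pseudonym)
        if match is None:
            continue  # no spatial record for this person
        # Only the pseudonym and the research variables are passed on.
        linked.append({
            "pseudonym": pseudonym,
            "diagnosis": row["diagnosis"],
            "area_code": match["area_code"],
        })
    return linked


# Made-up example records (illustration only).
health = [
    {"person_id": "A123", "diagnosis": "asthma"},
    {"person_id": "B456", "diagnosis": "diabetes"},
]
spatial = [
    {"person_id": "A123", "area_code": "zone-7"},
    {"person_id": "C789", "area_code": "zone-2"},
]

linked_records = link(health, spatial)
```

In this sketch the researcher would only ever see `linked_records`, which carry a pseudonym in place of `person_id`. Note that this does nothing on its own about the re-identification risk mentioned above: a combination of linked variables can still single a person out, which is one reason such data is kept inside a controlled environment.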
SeRP: Secure e-Research Platform (also from Swansea University) was described in detail by Simon Thompson as SAIL, only bigger and more generic. It is due to go live in March 2014 and has a very impressive (self-provisioning) menu on offer, including SQL Server 2012, Hadoop, R Cluster, LSM Server, dataset management, free text handling, service management etc. When questioned on the costs, no clear figures were offered, just a reference to the platform being publicly funded and a comment that it will be a great shame if the setup is not used. But I get the sense that this service will be in huge demand once its capabilities are more widely advertised.
Other presentations explored the work involved in data sharing and reuse. Among the researchers there was a sense of fear that data quality will decline unless research data outputs are seen as assets and financially supported. Funders are currently asking researchers to share their data, but without any financial support or incentives. There was some discussion of the Wellcome Trust delving into this further, which was met with interest.
One very inspirational keynote speaker was Professor Ian Deary from the University of Edinburgh. His talk was entitled “Reusing historical data: the Scottish Mental Surveys of 1932 and 1947”. These surveys were carried out nationally to assess the mental ability of Scottish children. He talked about rediscovering the paper ledgers in storage in the basement of the Scottish Council for Research in Education in Glasgow with his colleague Professor Lawrence Whalley, and how cohorts were subsequently set up to reassess some of the participants, now in their 80s and 90s. It was all very exciting: so many influential studies have been made possible as a direct result of the rediscovery and reuse of the data that he was only able to focus on the top 10 peer-reviewed outputs.
The final session I attended was entitled “The role of computer science in e-Health research”. This was a special session and I hoped to walk away with a clear vision of the future of the two disciplines, but this was not the case. Nobody can doubt the role computer science has to play in deciphering all this administrative healthcare data. However, the biggest limiting factor with regard to data is governance.
If I were to create my own ‘word cloud’ for the diverse sessions I attended, I think I would make ‘Governance’ ten times bigger than any other word. One thing was very clear: there are some real cutting-edge and important studies being carried out using linked healthcare data, both nationally and internationally. These studies demand, and will continue to demand, access to elastic computation capabilities including storage, assistance with anonymisation and encryption, secure data transportation, access control and reliable record linkage. Most importantly, they require immediate verification of compliance with data governance.
Is enough time being devoted to ‘cryptographic security’, addressing the standards around legal, political and ethical protocols on using administrative datasets in research? Are funds being appropriately spent on enhancing these studies? Has too much been invested in fixing data programmatically once it has been collected? Should those inputting the data be trained and made aware of how the data they input may be used in the future? Are those who consent to such datasets being gathered fully informed of the intended usage? For researchers to be able to harness the benefits of the available datasets more fully, it is important that these questions, and the challenges they represent, are addressed.