Guest Post: The Power of Open Data
Dr. Kirstie Whitaker is a Research Associate in the Department of Psychiatry and Brain Mapping Unit at the University of Cambridge with a focus on brain development during the teenage years. Here she talks about her visit to The Data Dialogue: Time to Share and what she took away.
I’m a researcher in the Brain Mapping Unit at the University of Cambridge and I spend my days crunching gigabytes of data from magnetic resonance images to see how brain structure changes during adolescence.
In the past, human neuroimaging has been really expensive. This has meant we’ve had small groups of participants, and these have led to really rather limited statistical testing. It’s not that we think the brain is super simple! But when it costs £1000 per scan there’s not much we can do with our budget!
Except maybe there is. We could all come together and share our data! We could pool our resources and see what we can find together. It would be so much more efficient and we’d be more likely to find real meaning in our analyses.
Events like The Data Dialogue make me feel so excited for my future as a scientist. I left feeling inspired and hopeful that we are building a critical mass of young (and older) researchers who will make sure that transparency, collaboration, efficiency and innovation are at the front and centre of research in the years to come.
The one-day event, held at the University of Cambridge, focused on navigating the boundaries & benefits of sharing data. For me, the greatest highlights were the passion in the room – from both the speakers and the audience members – and the variety of data that is curated and ready to use.
We had speakers from almost every field you could imagine. There were people from clinical medicine, psychology, biological sciences, political science, chemistry, engineering, computer science, and a multitude of government services.
There were key themes that popped up again and again. Many of the speakers mentioned the FAIR principles: data should be Findable, Accessible, Interoperable and Reusable. The UK Data Service in particular have put a huge amount of effort into building good data sharing habits early on, and smoothing the logistical challenges that come with making your data easy to share.
I’m willing to bet that everyone in the room had ideas for new ways of investigating or integrating the data we were exposed to. But it is only by making access to the data easy that we will see the outcomes of these exciting hypotheses. We also need to ensure that it is easy for early career researchers to implement their ideas. I loved Nicole Janz’s keynote on how she empowers students to actually perform statistical analyses by replicating published work. It’s one thing to learn from a textbook but quite another to deal with real data.
Another theme was the importance of working transparently and reproducibly. I’m always terrified by how difficult it is to get information from the authors of published papers. I was so delighted to hear how often the speakers mentioned the benefits of keeping track of everything you’ve done.
My research focuses on brain development during the teenage years. My colleagues and I recently used data that we’d collected as part of the Wellcome Trust-funded Neuroscience in Psychiatry Network to measure how the structure of the brain changes during adolescence. We related our findings to openly available data on gene expression from the Allen Institute for Brain Science. We found that genes related to the risk of schizophrenia are over-represented in the regions that continue to change all the way into your 20s. That finding links the non-invasive technique of magnetic resonance imaging with the heroic work that the Allen Institute have performed measuring the expression of 20,000 genes at hundreds of regions across the brain. It is a result that simply couldn’t have happened if the Allen data weren’t available for open use.
We replicated our findings in two independent cohorts, and we’ve made our MRI data available online. I’m so excited to see the innovations that will occur as a result of researchers taking a look at what we did (you can follow along with every analysis using the code provided) and extending it with their own ideas.
My work is just one example of the power of openly sharing our data. If we can nurture the incredible projects that were featured at the Data Dialogue, and support the early career researchers who participated in the discussions all day long, I’m confident that the future will be very bright indeed.