
dtoolAI: Reproducibility for Deep Learning

16 Pages
Posted: 6 Apr 2020
Publication Status: Published


Abstract

Science has made use of machine learning techniques for data analysis since their first development in the 1950s. More recently, Deep Learning, a set of new approaches using artificial neural networks, has generated rapid advancements in machine learning. Deep Learning has already brought these advancements to several scientific domains and will likely improve further over time.

Deep Learning does, however, have the potential to reduce the reproducibility of scientific results. Deep Learning models are often "black boxes" that produce output without a clear understanding of how results arise. Model outputs are critically dependent on the data and processing approach used to generate the model initially, but this information is usually lost during model training.

To avoid a future reproducibility crisis, we need to improve our Deep Learning model management now. At present there is little practical instruction to help solve this problem. The FAIR principles for data stewardship give excellent high-level guidance on ensuring effective reuse of data. However, these need to be translated into practical, domain-specific guidance.

We suggest some principles for the generation and use of Deep Learning models in science. We then present "dtoolAI", a Python package that we have developed to implement these principles in our own work and which we hope will be useful to others.
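The abstract's central technical point is that the data and settings used to train a model are usually lost once training finishes. The abstract does not describe dtoolAI's interface, so the sketch below does not use it. It is a minimal, hypothetical illustration, using only the Python standard library, of capturing that provenance alongside the saved model. The helper names (`sha256_of_file`, `build_provenance`, `save_with_provenance`) and the `model.bin` / `provenance.json` layout are assumptions made for illustration only.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance(data_files: Iterable[Path], params: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical helper: record which data files and settings produced a model."""
    return {
        # Checksums let a later reader confirm the exact training data used.
        "training_data": {p.name: sha256_of_file(p) for p in sorted(data_files)},
        # Hyperparameters and other processing choices, stored verbatim.
        "training_params": params,
    }


def save_with_provenance(model_bytes: bytes, provenance: Dict[str, object], out_dir: Path) -> None:
    """Hypothetical layout: write the model and its provenance side by side."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "model.bin").write_bytes(model_bytes)
    (out_dir / "provenance.json").write_text(json.dumps(provenance, indent=2, sort_keys=True))


if __name__ == "__main__":
    # Usage example with a throwaway dataset and placeholder model weights.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        data = tmp_path / "train.csv"
        data.write_text("x,y\n1,2\n")
        prov = build_provenance([data], {"epochs": 10, "learning_rate": 0.01})
        save_with_provenance(b"weights", prov, tmp_path / "model")
        print(json.loads((tmp_path / "model" / "provenance.json").read_text()))
```

Keeping the provenance record in a separate, human-readable file next to the model means the record survives even if the model format changes, which is one way of addressing the loss of information the abstract describes.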

Keywords: Data, Data Management, AI, artificial intelligence, Deep Learning, machine learning, Reproducibility, Provenance, FAIR Data

Suggested Citation

Hartley, Matthew and Olsson, Tjelvar S. G., dtoolAI: Reproducibility for Deep Learning. Available at SSRN: https://ssrn.com/abstract=3565984 or http://dx.doi.org/10.2139/ssrn.3565984
This version of the paper has not been formally peer reviewed.

Tjelvar S. G. Olsson

John Innes Centre

Norwich Research Park
Norwich, NR4 7UH
United Kingdom

