dtoolAI: Reproducibility for Deep Learning
16 pages. Posted: 6 Apr 2020. Publication status: Published.
Abstract
Science has made use of machine learning techniques for data analysis since their first development in the 1950s. More recently, Deep Learning, a set of approaches based on artificial neural networks, has driven rapid advances in machine learning. Deep Learning has already brought these advances to several scientific domains and will likely improve further over time.

Deep Learning does, however, have the potential to reduce the reproducibility of scientific results. Deep Learning models are often "black boxes" that produce output without a clear understanding of how results arise. Model outputs depend critically on the data and processing approach used to generate the model, but this information is usually lost during model training.

To avoid a future reproducibility crisis, we need to improve our Deep Learning model management now. At present there is little practical instruction to help solve this problem. The FAIR principles for data stewardship give excellent high-level guidance on ensuring effective reuse of data, but these need to be translated into practical, domain-specific guidance.

We suggest some principles for the generation and use of Deep Learning models in science. We then present "dtoolAI", a Python package that we have developed to implement these principles in our own work and which we hope will be useful to others.
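The core idea above — that a model should travel with the metadata needed to reproduce it — can be sketched in plain Python. This is a generic illustration, not the dtoolAI API; the helper name, file names, and provenance fields are hypothetical:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def save_model_with_provenance(weights, provenance, out_dir):
    """Write model weights alongside a provenance record.

    Hypothetical helper illustrating the principle that model outputs
    depend on the training data and parameters, so that information
    should be stored with the model artefact itself.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    weights_path = out_dir / "weights.json"
    weights_path.write_text(json.dumps(weights))

    # Hash the serialised weights so the provenance record can later be
    # checked against the artefact it describes.
    digest = hashlib.sha256(weights_path.read_bytes()).hexdigest()
    record = dict(provenance, weights_sha256=digest)
    (out_dir / "provenance.json").write_text(json.dumps(record, indent=2))
    return record

# Example: a toy "model" plus the information needed to retrain it.
with tempfile.TemporaryDirectory() as d:
    record = save_model_with_provenance(
        weights=[0.1, -0.4, 2.3],
        provenance={
            "training_data_uri": "file:///data/example-dataset",  # hypothetical
            "hyperparameters": {"lr": 0.01, "epochs": 5},
        },
        out_dir=d,
    )
    print(sorted(record))
```

Without a record like this, only the weights survive training; with it, a reader can locate the training data and parameters that produced them, which is the reproducibility gap the paper addresses.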
Keywords: Data, Data Management, AI, Artificial Intelligence, Deep Learning, Machine Learning, Reproducibility, Provenance, FAIR Data