Navigating the development challenges in creating complex data systems / Dittmer, S.; Roberts, M.; Gilbey, J.; Biguri, A.; Selby, I.; Breger, A.; Thorpe, M.; Weir-McCall, J. R.; Gkrania-Klotsas, E.; Korhonen, A.; Jefferson, E.; Langs, G.; Yang, G.; Prosch, H.; Stanczuk, J.; Tang, J.; Babar, J.; Escudero Sanchez, L.; Teare, P.; Patel, M.; Wassin, M.; Holzer, M.; Walton, N.; Lio, P.; Shadbahr, T.; Sala, E.; Preller, J.; Rudd, J. H. F.; Aston, J. A. D.; Schonlieb, C.-B. - In: NATURE MACHINE INTELLIGENCE. - ISSN 2522-5839. - 5:7(2023), pp. 681-686. [10.1038/s42256-023-00665-x]

Navigating the development challenges in creating complex data systems

Lio P.;
2023

Abstract

Data science systems (DSSs) are a fundamental tool in many areas of research and are now being developed by people with a myriad of backgrounds. This is coupled with a crisis in the reproducibility of such DSSs, despite the wide availability of powerful tools for data science and machine learning over the past decade. We believe that perverse incentives and a lack of widespread software engineering skills are among the many causes of this crisis and analyse why software engineering and building large complex systems is, in general, hard. Based on these insights, we identify how software engineering addresses those difficulties and how one might apply and generalize software engineering methods to make DSSs more fit for purpose. We advocate two key development philosophies: one should incrementally grow—not plan then build—DSSs, and one should use two types of feedback loop during development—one that tests the code’s correctness and another that evaluates the code’s efficacy.
Codes (symbols); Data Science; Philosophical aspects
01 Journal publication::01a Journal article
Files attached to this item

Dittmer_Navigating_2023.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 3.77 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1728761
Citations
  • PMC: N/A
  • Scopus: 3
  • ISI: 4