Machine learning is increasingly being used to identify diagnostic markers in high-dimensional molecular data. However, the experimental setup and distributional shifts between development and deployment populations can impede the generalization of these diagnostics to real-world clinical settings. The authors discuss how taking a causal perspective and modelling the underlying data-generating process can help to better understand these challenges and to improve the reliability of the resulting models. The article is available at https://rdcu.be/dBMcE.