Data Science Team Training

Author

Stephen D. Turner, Ph.D.

Published

July 2, 2026

Preface

Public health has always been a data-driven enterprise, but the tools available to practitioners have changed dramatically. Spreadsheets that once required weeks of manual work can now be analyzed in seconds. Reports that used to mean emailing static PDFs can now execute code, update automatically when data changes, and be produced in multiple formats from a single source file. And workflows that depended on one person’s institutional knowledge can be documented, versioned, and shared across teams. The goal of this book is to help public health professionals take advantage of these tools.

This book grew out of the CSTE Data Science Team Training (DSTT) program, where I serve as a project coach working with public health agencies across the country to build data science capacity in the public health workforce. The material here includes code, resources, workshop notes, and practical guidance accumulated and refined over several years of coaching teams at local and state health departments.

The chapters cover the foundational practices that make data science work sustainable and collaborative in a public health setting. Technical chapters address organizing and validating data, connecting to and querying relational databases, pulling data from public APIs and open data portals, migrating legacy SAS and Excel workflows, writing clean and well-documented code, managing reproducible environments and package dependencies, building R packages to share functions across projects, producing accessible reproducible reports and dashboards, automating and scheduling recurring work, and working effectively with AI coding assistants. Nontechnical chapters cover building and sustaining a data science team, project management, peer review of analytical work, navigating the data governance and IT relationships that shape what public health teams can actually do with their data, and communicating findings clearly to audiences who did not run the analysis. These are the practical skills that separate ad hoc analyses from reproducible, maintainable work.

While this material was created with DSTT participants in mind, it is intended to be broadly useful to anyone working at the intersection of data science and public health, whether you are a current or former DSTT participant, a public health practitioner looking to strengthen your analytical skills, or someone new to data science in a public health context.

No single book can cover everything, and this one does not try to. Appendix A — Additional Resources points to additional reading and training for topics that go deeper than what is covered here.

Cite this book:

Turner, Stephen D. (2026). Data Science Team Training. Retrieved from https://dstt.stephenturner.us/.

@book{turner2026dstt,
  title = {Data {{Science Team Training}}},
  author = {Turner, Stephen D.},
  date = {2026},
  url = {https://dstt.stephenturner.us/}
}

This work is licensed under CC BY-NC-SA 4.0.