Data Science Team Training
Preface
Public health has always been a data-driven enterprise, but the tools available to practitioners have changed dramatically. Analyses that once took weeks of manual spreadsheet work can now run in seconds. Reports that used to mean emailing static PDFs can be replaced by live, interactive dashboards. And workflows that depended on one person’s institutional knowledge can be documented, versioned, and shared across teams. The goal of this book is to help public health professionals take advantage of these tools.
This book grew out of the CSTE Data Science Team Training (DSTT) program, where project coach Dr. Stephen D. Turner works with public health agencies across the country to build data science capacity in the public health workforce. The material here includes code, resources, workshop notes, and practical guidance accumulated and refined over several years of coaching teams at local and state health departments.
The chapters cover the foundational practices that make data science work sustainable and collaborative in a public health setting: organizing your data well, managing projects and workflows, using version control to track your work, building dashboards to communicate findings, and using AI tools to write and debug code more efficiently. These are not cutting-edge research topics. They are the practical skills that separate ad hoc analyses from reproducible, maintainable work.
While this material was created with DSTT participants in mind, it is intended to be broadly useful to anyone working at the intersection of data science and public health, whether you are a current or former DSTT participant, a public health practitioner looking to strengthen your analytical skills, or someone new to data science in a public health context.
No single book can cover everything, and this one does not try to. Appendix A (Other resources) points to additional reading and training that go deeper than what is covered here.