A Hands-On Introduction to Data Science with Python

2nd edition
Chirag Shah
Coming soon
Expected online publication date: January 2026
Print publication: 22 January 2026
- Textbook
Students will develop a practical understanding of data science with this hands-on textbook for introductory courses. This new edition is fully revised and updated, with numerous exercises and examples in the popular data science tool Python, a new chapter on using Python for statistical analysis, and a new chapter that demonstrates how to use Python within a range of cloud platforms. The many practice examples, drawn from real-life applications, range from small to big data and come to life in a new end-to-end project in Chapter 11. New 'Data Science in Practice' boxes highlight how the concepts introduced apply in an industry context, and many chapters include new sections on AI and Generative AI. A suite of online material for instructors provides a strong supplement to the book, including lecture slides, solutions, additional assessment material and curriculum suggestions. Datasets and code are available for students online. This entry-level textbook is ideal for readers from a range of disciplines wishing to build a practical, working knowledge of data science.

A Hands-On Introduction to Data Science with R

2nd edition
Chirag Shah
Coming soon
Expected online publication date: January 2026
Print publication: 22 January 2026
- Textbook
Students will develop a practical understanding of data science with this hands-on textbook for introductory courses. This new edition is fully revised and updated, with numerous exercises and examples in the popular data science tool R, a new chapter on using R for statistical analysis, and a new chapter that demonstrates how to use R within a range of cloud platforms. The many practice examples, drawn from real-life applications, range from small to big data and come to life in a new end-to-end project in Chapter 11. New 'Data Science in Practice' boxes highlight how the concepts introduced apply in an industry context, and many chapters include new sections on AI and Generative AI. A suite of online material for instructors provides a strong supplement to the book, including lecture slides, solutions, additional assessment material and curriculum suggestions. Datasets and code are available for students online. This entry-level textbook is ideal for readers from a range of disciplines wishing to build a practical, working knowledge of data science.

Chapter 6 - Using Clinical Data
from Section 2 - Tools and Methodologies
By Robert Stewart, Marcos Del Pozo Banos
Edited by Dawn N. Albertson, University of New Hampshire, Derek K. Tracy, South London and Maudsley NHS Foundation Trust, Dan W. Joyce, University of Liverpool, Sukhwinder S. Shergill, Kent and Medway Medical School
Book: Research Methods in Mental Health
Published online: 31 October 2025
Print publication: 20 November 2025, pp 81-94
- Chapter
Summary

The traditional case register involved assembling records of people with a given condition in order to support cohort studies to describe and investigate the course of their condition and other outcomes. This old design has been resurrected and revolutionised following the widespread implementation of fully electronic healthcare records over the past few decades, providing ‘big data’ resources that are both large and very detailed. These, in turn, are being further enhanced through linkages with complementary administrative data (both health and non-health) and through natural language processing generating structured meta-data from source text fields. This chapter provides an overview of this rapidly developing research infrastructure, considering and advising on some of the challenges faced by researchers planning studies using clinical data and by those considering future resource development.

Mathematical Methods in Data Science

Bridging Theory and Applications with Python
Sébastien Roch
Published online: 04 November 2025
Print publication: 30 October 2025
- Textbook
Bridge the gap between theoretical concepts and their practical applications with this rigorous introduction to the mathematics underpinning data science. It covers essential topics in linear algebra, calculus and optimization, and probability and statistics, demonstrating their relevance in the context of data analysis. Key application topics include clustering, regression, classification, dimensionality reduction, network analysis, and neural networks. What sets this text apart is its focus on hands-on learning. Each chapter combines mathematical insights with practical examples, using Python to implement algorithms and solve problems. Self-assessment quizzes, warm-up exercises and theoretical problems foster both mathematical understanding and computational skills. Designed for advanced undergraduate students and beginning graduate students, this textbook serves as both an invitation to data science for mathematics majors and as a deeper excursion into mathematics for data science students.

Introduction
Soroush Saghafian, Harvard University, Massachusetts
Book: Insight-Driven Problem Solving
Published online: 21 October 2025
Print publication: 30 October 2025, pp 1-11
- Chapter
Summary

This chapter discusses the broader role and impact of analytics science in improving various aspects of society. It outlines what the book is about and what the reader should expect to learn from it. It also discusses the analytics revolution in the private and public sectors, and introduces a key element of the book, insight-driven problem solving, by highlighting its vital role in addressing various societal problems.

7 - Data Analysis
Soroush Saghafian, Harvard University, Massachusetts
Book: Insight-Driven Problem Solving
Published online: 21 October 2025
Print publication: 30 October 2025, pp 207-249
- Chapter
Summary

This chapter is devoted to data analysis and its critical role in analytics science. The reader is introduced to the science of inference from observations and experiments and learns about the main ideas in data analysis that have been influential in addressing societal problems. Real-world examples are used throughout to convey the main ideas and illustrate why data analyses performed without sufficient care can yield wrong insights. Successful examples of insight-driven problem-solving approaches in data analysis are contrasted with those that can yield wrong insights, and the reader is taken on an engaging and educational journey that shows how and why successful insight-driven problem-solving approaches using data can have significant public impact.

1 - The Big Picture of Analytics Science
Soroush Saghafian, Harvard University, Massachusetts
Book: Insight-Driven Problem Solving
Published online: 21 October 2025
Print publication: 30 October 2025, pp 12-52
- Chapter
Summary

This chapter introduces the reader to the big picture of what analytics science is. What is analytics science? What types of analytics science are there, and what is its scope? How can analytics science be used to improve various tasks that society needs to carry out? Is analytics science all about using data? Or can it work without data? What is the role of data versus models? How can one develop and rely on a model to answer essential questions when the model can be wrong due to its assumptions? What is ambiguity in analytics science? Is that different from risk? And how do analytics scientists address ambiguity? What is the role of simulation in analytics science? These are some of the questions that the chapter addresses. Finally, the chapter discusses the notion of "centaurs" and how successful use of analytics science often requires combining human intuition with the power of strong analytical models.

Foundations of MATLAB Programming for Behavioral Sciences

With Applications
Maxwell Mansolf
Published online: 28 October 2025
Print publication: 07 August 2025
- Textbook
This textbook introduces the fundamentals of MATLAB for behavioral sciences in a concise and accessible way. Written for those with or without computer programming experience, it works progressively from fundamentals to applied topics, culminating in in-depth projects. Part I covers programming basics, ensuring a firm foundation of knowledge moving forward. Difficult topics, such as data structures and program flow, are then explained with examples from the behavioral sciences. Part II introduces projects for students to apply their learning directly to real-world problems in computational modelling, data analysis, and experiment design, with an exploration of Psychtoolbox. Accompanied by online code and datasets, extension materials, and additional projects, with test banks, lecture slides, and a manual for instructors, this textbook represents a complete toolbox for both students and instructors.

Empowering professionals: An intensive short course on fundamentals of clinical data science
Richard F. Ittenbach, Brian McCourt, Maurizio Macaluso
Journal: Journal of Clinical and Translational Science / Volume 9 / Issue 1 / 2025
Published online by Cambridge University Press: 10 October 2025, e244
- Article
- Open access
Clinical data science, like the broader discipline of data science, has quickly grown from obscurity only a few decades ago to one of the fastest-growing specialties in biomedical research today. Yet the education and training of the workforce have not kept pace with the growth of the field, the complexity of the science, or the needs of the profession. The purpose of this paper is to provide a template for an intensive short course on fundamentals of clinical data science that meets the needs of working professionals in academic, industry, and government research settings. Care will be taken to introduce students to essential roles, responsibilities, and practice patterns within the field, the foundational components from which they come, and many of the soft skills needed for professional practice and advancement in the field today. The course is designed as an evidence-based, immersive learning experience delivered over a 5-day period on a university campus, using principles of best educational practice and multiple modalities to ensure optimal interaction and engagement throughout the week. This template may be reproduced by any institution interested in and capable of offering such a program.

Key takeaways from Stanford’s symposium on AI for Data Science
Manisha Desai, John Auerbach, Laurence Baker, Jade Benjamin-Chung, Melissa Bondy, Mary Boulos, Bryan J. Bunning, Ni Deng, Steven N. Goodman, Ivor Horn, Eleni Linos, Mark A. Musen, Lee Sanders, Nigam Shah, Sara Singer, Michelle Williams, James Zou, Michael Pencina
Journal: Journal of Clinical and Translational Science / Volume 9 / Issue 1 / 2025
Published online by Cambridge University Press: 25 September 2025, e237
- Article
- Open access
Numerous symposia and conferences have been held to discuss the promise of Artificial Intelligence (AI). Many center on its potential to transform fields like health and medicine, law, education, business, and more. Further, while many AI-focused events include the data scientists involved in developing foundational models, to our knowledge, there has been little attention to AI’s role in data science and for the data scientist. At the inaugural event of a new symposium series titled AI for Data Science, held in December 2024, thought leaders convened to discuss both the promises and challenges of integrating AI into the workflows of data scientists. A keynote address by Michael Pencina from Duke University, together with contributions from three panels, covered a wide range of topics including rigor, reproducibility, the training of current and future data scientists, and the potential of AI’s integration in public health.

Hands-On Network Machine Learning with Python

Eric W. Bridgeford, Alexander R. Loftus, Joshua T. Vogelstein
Published online: 23 September 2025
Print publication: 18 September 2025
- Book
Bridging theory and practice in network data analysis, this guide offers an intuitive approach to understanding and analyzing complex networks. It covers foundational concepts, practical tools, and real-world applications using Python frameworks including NumPy, SciPy, scikit-learn, graspologic, and NetworkX. Readers will learn to apply network machine learning techniques to real-world problems, transform complex network structures into meaningful representations, leverage Python libraries for efficient network analysis, and interpret network data and results. The book explores methods for extracting valuable insights across various domains such as social networks, ecological systems, and brain connectivity. Hands-on tutorials and concrete examples develop intuition through visualization and mathematical reasoning. The book will equip data scientists, students, and researchers working with network data with the skills to tackle network machine learning projects confidently, providing a robust toolkit for data science applications involving network-structured data.

13 - Data Analysis
from Part II - Applications of MATLAB Programming in Behavioral Sciences
Maxwell Mansolf, Northwestern University, Illinois
Book: Foundations of MATLAB Programming for Behavioral Sciences
Published online: 28 October 2025
Print publication: 07 August 2025, pp 163-178
- Chapter
Summary

Chapter 13 presents the second application of MATLAB to behavioral sciences: data analysis. Students review previously learned data structures often encountered in practice before applying their programming knowledge from Chapters 1 to 11 to manage each. Starting with tabular data, tables from Chapter 8 are reviewed, with students learning common data science tasks for managing one or more tabular data sets, before applying their knowledge to real experimental data. Next, hierarchical data are reviewed, connecting students’ knowledge of structure arrays from Chapter 8 to a popular internet-based data format (JSON), with students applying their newfound knowledge to analyze data on the behavior of European monarchs.

The pipelines of deep learning-based plant image processing
Kaiyue Hong, Yun Zhou, Han Han
Journal: Quantitative Plant Biology / Volume 6 / 2025
Published online by Cambridge University Press: 25 July 2025, e23
- Article
- Open access
Recent advancements in data science and artificial intelligence have significantly transformed plant sciences, particularly through the integration of image recognition and deep learning technologies. These innovations have profoundly impacted various aspects of plant research, including species identification, disease detection, cellular signaling analysis, and growth monitoring. This review summarizes the latest computational tools and methodologies used in these areas. We emphasize the importance of data acquisition and preprocessing, discussing techniques such as high-resolution imaging and unmanned aerial vehicle (UAV) photography, along with image enhancement methods like cropping and scaling. Additionally, we review feature extraction techniques like colour histograms and texture analysis, which are essential for plant identification and health assessment. Finally, we discuss emerging trends, challenges, and future directions, offering insights into the applications of these technologies in advancing plant science research and practical implementations.

A framework for scalable ambient air pollution concentration estimation
Liam J. Berrisford, Lucy S. Neal, Helen J. Buttery, Benjamin R. Evans, Ronaldo Menezes
Journal: Environmental Data Science / Volume 4 / 2025
Published online by Cambridge University Press: 03 March 2025, e17
- Article
- Open access
Ambient air pollution remains a global challenge, with adverse impacts on health and the environment. Addressing air pollution requires reliable data on pollutant concentrations, which form the foundation for interventions aimed at improving air quality. However, in many regions, including the United Kingdom, air pollution monitoring networks are characterized by spatial sparsity, heterogeneous placement, and frequent temporal data gaps, often due to issues such as power outages. We introduce a scalable, data-driven supervised machine learning framework designed to address these temporal and spatial data gaps by filling in missing measurements within the United Kingdom. The framework uses LightGBM, a gradient boosting algorithm based on decision trees, for efficient and scalable modeling. This approach provides a comprehensive dataset for England throughout 2018 at 1 km² spatial and hourly temporal resolution. Leveraging machine learning techniques and real-world data from the sparsely distributed monitoring stations, we generate 355,827 synthetic monitoring stations across the study area. Validation was conducted to assess the model’s performance in forecasting, estimating missing locations, and capturing peak concentrations. The resulting dataset is of particular interest to a diverse range of stakeholders engaged in downstream assessments supported by outdoor air pollution concentration data for nitrogen dioxide (NO₂), ozone (O₃), particulate matter with a diameter of 10 μm or less (PM10), particulate matter with a diameter of 2.5 μm or less (PM2.5), and sulphur dioxide (SO₂), at a higher resolution than was previously possible.

Data science and artificial intelligence in biology, health, and healthcare
Peter L. Elkin, Christopher Lindsell, Julio Facelli, Manisha Desai, Chunhua Weng, Heidi Spratt, Shari Messinger, Lemuel Russell Waitman, JaMor Hairston, Ruth O’Hara, Jareen Meinzen-Derr
Journal: Journal of Clinical and Translational Science / Volume 9 / Issue 1 / 2025
Published online by Cambridge University Press: 14 February 2025, e56
- Article
- Open access

Chapter 16 - Cultures of Data
By Edward Whitley
Edited by Russ Castronovo, University of Wisconsin, Madison, Robert S. Levine, University of Maryland, Baltimore
Book: The New Nineteenth-Century American Literary Studies
Published online: 02 January 2025
Print publication: 23 January 2025, pp 233-248
- Chapter
Summary

The nineteenth century was the first era of “big data” in the modern world, and American literary texts published during this time, such as Herman Melville’s Moby-Dick (1851), offer an aesthetic reframing of how individuals and institutions within a culture of data use information at scale to claim authority over knowledge and, by extension, power over people. Moby-Dick also gestures toward the ways that African and African American bodies were subjected to the most brutal regimes of quantification that the nineteenth century had to offer in the form of the transatlantic and intra-American slave trade. One of the major problems facing American literary studies and digital humanities today is the question of how to excavate and explicate the quantitative turn of earlier centuries as we seek to better understand the cultures of data we live in today. The best initial response to this problem is not to begin with a specific digital tool per se, but to build a set of guiding principles for how to critically approach data, media, and power from within a context that recognizes the distinctive contributions of literary texts as aesthetic objects. This essay models one such approach.

The challenge of land in a neural network ocean model
Rachel Furner, Peter Haynes, Dani C. Jones, Dave Munday, Brooks Paige, Emily Shuckburgh
Journal: Environmental Data Science / Volume 3 / 2024
Published online by Cambridge University Press: 02 January 2025, e40
- Article
- Open access
Machine learning (ML) techniques have emerged as a powerful tool for predicting weather and climate systems. However, much of the progress to date focuses on predicting the short-term evolution of the atmosphere. Here, we look at the potential for ML methodology to predict the evolution of the ocean. The presence of land in the domain is a key difference between ocean modeling and previous work looking at atmospheric modeling. We train a convolutional neural network (CNN) to emulate a process-based General Circulation Model (GCM) of the ocean, in a configuration which contains land. We assess performance on predictions over the entire domain and near land (coastal points). Our results show that the CNN replicates the underlying GCM well when assessed over the entire domain. RMS errors over the test dataset are low in comparison to the signal being predicted, and the CNN model gives an order of magnitude improvement over a persistence forecast. When we partition the domain into the near-land region and the ocean interior and assess performance over these two regions, we see that the model performs notably worse over the near-land region. Near land, RMS scores are comparable to those from a simple persistence forecast. Our results indicate that ocean interaction with land is something the network struggles with and highlight that this may be an area where advanced ML techniques specifically designed for, or adapted for, the geosciences could bring further benefits.

Understanding to intervene: The codesign of text classifiers with peace practitioners
Part of: Data for Peace Technology
Julie Hawke, Helena Puig Larrauri, Andrew Sutjahjo, Benjamin Cerigo
Journal: Data & Policy / Volume 6 / 2024
Published online by Cambridge University Press: 27 November 2024, e54
- Article
- Open access
Originating from a unique partnership between data scientists (datavaluepeople) and peacebuilders (Build Up), this commentary explores an innovative methodology to overcome key challenges in social media analysis by developing customized text classifiers through a participatory design approach, engaging both peace practitioners and data scientists. It advocates for researchers to focus on developing frameworks that prioritize being usable and participatory in field settings, rather than perfect in simulation. Focusing on a case study investigating polarization within online Christian communities in the United States, we outline a testing process with a dataset consisting of 8,954 tweets and 10,034 Facebook posts to experiment with active learning methodologies aimed at enhancing the efficiency and accuracy of text classification. This commentary demonstrates that the inclusion of domain expertise from peace practitioners significantly refines the design and performance of text classifiers, enabling a deeper comprehension of digital conflicts. This collaborative framework seeks to transition from a data-rich, analysis-poor scenario to one where data-driven insights robustly inform peacebuilding interventions.

Developing a data analytics toolbox for data-driven product planning: a review and survey methodology
Melina Panzner, Sebastian von Enzberg, Roman Dumitrescu
Journal: AI EDAM / Volume 38 / 2024
Published online by Cambridge University Press: 18 November 2024, e18
- Article
- Open access
The application of data analytics to product usage data has the potential to enhance engineering and decision-making in product planning. To achieve this effectively for cyber-physical systems (CPS), it is necessary to possess specialized expertise in technical products, innovation processes, and data analytics. An understanding of the process from domain knowledge to data analysis is of critical importance for the successful completion of projects, even for those without expertise in these areas. In this paper, we set out the foundation for a toolbox for data analytics, which will enable the creation of domain-specific pipelines for product planning. The toolbox includes a morphological box that covers the necessary pipeline components, based on a thorough analysis of literature and practitioner surveys. This comprehensive overview is unique. The toolbox based on it promises to support and enable domain experts and citizen data scientists, enhancing efficiency in product design, speeding up time to market, and shortening innovation cycles.

Behavioral Network Science

Language, Mind, and Society
Thomas T. Hills
Published online: 08 November 2024
Print publication: 19 December 2024
- Book
Behavioral Network Science explains how and why structure matters in the behavioral sciences. Exploring open questions in language evolution, child language learning, memory search, age-related cognitive decline, creativity, group problem solving, opinion dynamics, conspiracies, and conflict, readers will learn essential behavioral science theory alongside novel network science applications. This book also contains an introductory guide to network science, demonstrating how to turn data into networks, quantify network structure across scales, and hone one's intuition for how structure arises and evolves. Online R code allows readers to explore the data and reproduce all the visualizations and simulations for themselves, empowering them to make contributions of their own. For data scientists interested in gaining a professional understanding of how the behavioral sciences inform network science, or behavioral scientists interested in learning how to apply network science from the ground up, this book is an essential guide.
