MSc Data Science (Statistics)

University of Leeds

Get started today

Request more information to find out more about this degree programme.

Master of Science

Offered by the University of Leeds

Taught in English

Build in-demand statistical skills and expertise applicable to a range of areas of practice such as health, business, retail and government.

24 months

180 credits in total and 15 credits per course

100% online

Earn a prestigious degree from a world-renowned institution, without the impact of travel or having to be there in person

Master in-demand data science and statistical skills to advance your new career

As an MSc Data Science (Statistics) student, you’ll put the advanced theories you’re learning into practice by solving real-world problems with support from the School of Mathematics and the Leeds Institute for Data Analytics (LIDA).

Your coursework and project work will include examples of data analysis that can be presented to potential employers, demonstrating your skills and suitability for data-driven senior roles in business, government and non-profit sectors. 

Along the way, you’ll learn to apply your data science and statistical knowledge to current and future statistical challenges at local, national and international levels.

Study at your own pace with a flexible online schedule
Courses are grouped into carousels. You choose your own courses, working through one carousel at a time and completing it before moving to the next.
Prepare for advanced opportunities with senior-level skills
Develop senior-level evaluative skills for managing project work, engaging critically with sources and methods, and effective storytelling with data.
Access a wealth of mathematics expertise and research
Learn from and develop professional connections with expert researchers from the School of Mathematics, LIDA and other institutes such as the Alan Turing Institute.
Build techniques to tackle diverse tasks in data analytics
Graduate with a toolbox of techniques ready to tackle diverse tasks in data analytics, with an emphasis on the statistical side of data science.

Applications are now closed for March 2025!

Watch our latest webinar recording to find out more about the application process and meet the Programme Director, Dr Graham Murphy.

If you have a question please contact the University of Leeds via onlineadmissions@leeds.ac.uk.

Try a course and take your data science learning to the next level

Want to learn more about the degree before you commit? Enrol in one of the University of Leeds' open courses to get a preview of the topics, materials and instructors in the MSc Data Science (Statistics) programme.

Programming for Data Science: Explore the basics of programming and familiarise yourself with Python.

Exploratory Data Analysis: Learn how to analyse and investigate data sets and explore ways to visualise data.

Statistical Methods: Understand the role of statistics in data analysis and gain experience using RStudio for creating numerical and graphical summaries.

Curriculum

This course will introduce students to basic techniques, which can be used to perform a preliminary investigation of data sets. Exploring data involves visualising the variables and relationships to help determine outliers, identify trends, suggest suitable statistical models and inform future data gathering.

Indicative content for this module includes:

  • Data types: Categorical, discrete, continuous. Data cleaning.
  • Graphical summary: Boxplots, histograms, kernel density estimates (KDE).
  • Numerical summary: Location, variability, quantiles. Data manipulation.
  • Discrete distributions: Binomial, geometric, Poisson.
  • Continuous distributions: Normal, exponential, uniform.
  • Bivariate data: Scatterplots, correlation. Linear regression.
  • Logistic regression and classification. PCA and dimension reduction.
  • Use of statistical software to import data and perform simple visualisation, exploration and summary.
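
As an illustration of the numerical summaries listed above, here is a minimal sketch using only Python's standard library. The sample values and the function name `numerical_summary` are made up for this example, Python is an assumption (the page does not name the software used in this module), and the 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import statistics

def numerical_summary(values):
    """Return location, variability and quartile summaries for a numeric sample."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "quartiles": (q1, q2, q3),
        # Flag points beyond 1.5 * IQR from the quartiles (a convention, not a law).
        "outliers": [v for v in values if v < lower or v > upper],
    }

sample = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 9.8]  # hypothetical data
summary = numerical_summary(sample)
print(summary)
```

The single large value pulls the mean well above the median, which is the kind of pattern exploratory analysis is meant to surface before any model is fitted.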

The module provides a general introduction to statistical thinking and data analysis, including probability rules and distributions and methods of estimation and hypothesis testing, and presents the basics of Bayesian inference.

Indicative content for this module includes:

  • The role of statistical models.
  • Probability rules and distributions.
  • Statistical estimators, bias, mean squared error (MSE).
  • Standard examples of estimators (e.g. sample mean, sample variance).
  • Statistical tests, types of error and error probabilities.
  • Examples of tests (such as z-test and t-test).
  • Computing estimates and performing tests in R.
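
To make the idea of a test statistic concrete, the following sketch computes a one-sample t statistic by hand. The module itself works in R; Python is used here purely for illustration, and the measurements, the null value `mu0` and the function name `one_sample_t` are hypothetical.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(values)
    sample_mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation (n - 1 divisor).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (sample_mean - mu0) / standard_error, n - 1

measurements = [5.1, 4.9, 5.3, 5.0, 5.2]  # hypothetical data
t_stat, df = one_sample_t(measurements, mu0=5.0)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic would then be compared with a t distribution on `df` degrees of freedom to obtain a p-value, a step left out here because the standard library has no t distribution.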

This module introduces the fundamental skills of programming in Python. The aim is for students to develop the skills and experience to independently translate a broad range of data science problems into functioning computer programs and communicate the results.

Indicative content for this module includes:

  • Computer programming in Python: control structures, data-types, data structures, functions and classes, importing and using libraries/packages, implementing simple algorithms.
  • Use of a Python development platform.
  • Use of specific libraries/APIs providing data access and analysis functionality, such as: accessing information from the web or from databases, statistical analysis, ML algorithms, graphical display of data.

Students will undertake a sequence of programming exercises starting with the fundamentals of programming and building up to a system that performs significant data analysis on real data:

  • Basic algorithms for representing and processing information.
  • Importing, manipulating and displaying data.
  • Implementation of a data analysis ‘pipeline’ in which data is extracted from some source, processed, analysed and visualised.
  • Use of example data science software tools.
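
The extract, process, analyse and visualise stages of such a pipeline can be sketched in a few lines. Everything below is an assumption made for illustration: the inline CSV text stands in for a real data source, the column names are invented, and the text bar chart is a stand-in for a proper plotting library.

```python
import csv
import io
import statistics

# Hypothetical raw data; one row has a missing temperature.
RAW = """city,temp_c
Leeds,11.5
Leeds,13.0
York,10.0
York,
York,12.0
"""

def extract(text):
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def process(rows):
    """Drop rows with a missing temperature and convert the rest to floats."""
    return [(row["city"], float(row["temp_c"])) for row in rows if row["temp_c"]]

def analyse(records):
    """Compute the mean temperature per city."""
    groups = {}
    for city, temp in records:
        groups.setdefault(city, []).append(temp)
    return {city: statistics.mean(temps) for city, temps in groups.items()}

def visualise(means):
    """Render a crude text bar chart, one line per city."""
    return "\n".join(
        f"{city:<6} {'#' * round(mean)} {mean:.2f}"
        for city, mean in sorted(means.items())
    )

means = analyse(process(extract(RAW)))
print(visualise(means))
```

Keeping each stage as its own function makes it possible to swap the data source or the chart without touching the analysis.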

The objective of this course is to equip students with the skills necessary to undertake project work as a data scientist. Project planning, reviewing existing methodologies and the presentation of outputs in different forms all form part of this. This module will also include ethical considerations of data usage.

On completion of this module students will be able to:

  • Communicate the key concepts, skills and attitudes for developing a proposal to deliver data science project work.
  • Explain how to critically appraise different methodological choices when planning project work in data science.
  • Understand how to present data science projects effectively in both oral and written formats, including communicating effectively with a non-technical audience.
  • Engage with professional ethics for working in data science.

Machine learning is a rapidly developing research area which takes an algorithmic approach to identifying patterns and statistical regularities in data with little or no human intervention, often with the aim of supporting decision making. In this module you will learn to apply a number of machine learning techniques that are widely used in industry, government, and other large organisations. You will learn how the different approaches relate to and are motivated by statistics and will gain practical experience in the application of these techniques on real and simulated datasets.

Indicative content for this module includes:

  • Neural networks, decision trees, support vector machines, Bayesian learning, instance-based learning, linear regression, clustering, reinforcement learning, recent developments in machine learning. Examples will be drawn from simple problems that arise in data analytics and related areas.

In big data with multiple variables, it is vital to discover patterns and infer valuable information from the data. This module introduces basic techniques from multivariate statistics, with the aim of discovering, describing and exploiting dependencies between variables in complex datasets.

Indicative content for this module includes:

  • Introduction to multivariate analysis
  • Statistical dependence, covariance matrix
  • High dimensional problems, the "curse of dimensionality"
  • Principal Component Analysis (PCA), dimension reduction
  • Clustering, K-means method, distances between/within clusters
  • Multidimensional Scaling (MDS)
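
As a rough sketch of the K-means method mentioned above, the following plain-Python version alternates between assigning points to their nearest centroid and recomputing each centroid as the mean of its cluster. The points and starting centroids are invented, and passing the starting centroids explicitly (rather than choosing them at random) is a simplification that keeps the example deterministic.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: return final centroids and the clusters around them."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: math.dist(point, centroids[k]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[k]  # keep an empty cluster's centroid in place
            for k, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
final_centroids, final_clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(final_centroids)
```

In practice the result depends on the starting centroids, which is why library implementations usually run several random restarts.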

This course will equip students with an understanding of the theory of linear models and the ability to fit multiple linear regression models to data and interpret the results. The content will develop an appreciation of the limitations of linear models and the use of link functions to generalise the linear regression model. In particular, the module will explore logistic regression and log-linear models.

On completion of this module students should be able to:

  • fit multiple linear regression models to data, and interpret the models;
  • apply methods of robust regression;
  • carry out regression analysis with generalised linear models including the use of link functions;
  • understand and employ methods for model selection.

Indicative content for this module includes:

  • Linear regression
  • Robustness
  • Generalised Linear Models
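
The simplest case of the linear model, a straight line with a single predictor, can be fitted with the closed-form least squares formulas. This is a deliberately reduced sketch: the module covers multiple regression and generalised linear models, which need matrix methods or a statistics package, and the data and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope is the ratio of the x-y cross deviations to the x squared deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

xs = [1, 2, 3, 4, 5]            # hypothetical predictor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical responses
intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Inspecting the residuals is the usual first check on whether a straight line is an adequate description of the data.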

This course introduces key concepts and techniques in statistical learning which are relevant to a number of practical applications. These techniques include statistical machine learning for classification and regression.

On completion of this module students should be able to:

  • Explain the classification and regression problem;
  • Assess the error of a fitted model and explain the fitting algorithm;
  • Understand the statistical foundations of different classification and regression methods;
  • Understand the importance of uncertainty and evaluate the uncertainty in simple model predictions; and
  • Perform classification and regression tasks using real-world data.

Data scientists work in a wide range of fields of application. This module gives an insight into some general principles of the work of a data scientist and some of the underpinnings of artificial intelligence and statistics in the practice of data science.

Indicative content for this module includes:

  • Core skills of a data scientist: problem-solving; statistics; business acumen; communication and business understanding
  • Data science scope: a day in the life of a data scientist, workflows, and the boundaries of data science
  • Data understanding and visualisation, data acquisition, data preparation and data wrangling
  • Classification, similarity and clustering
  • Model-fitting and evaluation
  • Anomaly detection
  • Association Analysis
  • Big data considerations, tools and techniques
  • Practical applications using case studies drawn from different application domains

The module aims to equip students with the ability to apply standard methods for random number generation and a range of Monte Carlo methods, and to develop an understanding of the principles and methods of stochastic simulation. The module will also teach students how to implement statistical algorithms for a given problem and develop familiarity with software for advanced statistical computing.

On completion of this module students should be able to:

  • Be aware of how computers generate random numbers using different methods.
  • Understand and implement Monte Carlo methods.
  • Understand and implement Markov Chain Monte Carlo (MCMC) methods.
  • Implement resampling methods.

Indicative content for this module includes:

  • Random number generation
  • Monte Carlo methods
  • Markov Chain Monte Carlo (MCMC) methods
  • Resampling methods
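
A classic small example of a Monte Carlo method is estimating π from random points in the unit square. The sketch below is illustrative only: the function name, the sample size and the seed are arbitrary choices, and the seeded `random.Random` generator is used so that repeated runs give the same estimate.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi as 4 times the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)  # seeded pseudo-random generator for reproducibility
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

pi_hat = estimate_pi(100_000)
print(pi_hat)
```

The error of such an estimate shrinks roughly in proportion to one over the square root of the number of samples, so each extra digit of accuracy costs about a hundred times more simulation.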

The objective of this course is to introduce Bayesian statistical methods through the consideration of philosophical differences with traditional statistical procedures and the application of Bayesian techniques. This module also introduces the ideas of quantitative decision theory and rational decision making.

On completion of this module students should be able to:

  • Discuss the differences between Bayesian and traditional statistical methods;
  • Derive prior, posterior and predictive distributions for standard Bayesian models;
  • Employ hierarchical analyses using sampling methods;
  • Produce network representations of joint distributions and perform updates on small networks; and
  • Define utility in the context of decision making and apply decision analysis methods to simple finite dimensional problems.
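
Deriving a posterior for a standard model can be illustrated with the Beta-Binomial case: a Beta(a, b) prior on a success probability, combined with binomial data, gives a Beta posterior whose parameters are the prior parameters plus the observed counts. The prior, the data and the function names below are assumptions chosen for this sketch.

```python
def beta_binomial_update(a, b, successes, trials):
    """Return the Beta posterior parameters after observing binomial data."""
    return a + successes, b + (trials - successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Uniform Beta(1, 1) prior and hypothetical data: 7 successes in 10 trials.
post_a, post_b = beta_binomial_update(1, 1, successes=7, trials=10)
posterior_mean = beta_mean(post_a, post_b)
print(f"posterior: Beta({post_a}, {post_b}), mean = {posterior_mean:.3f}")
```

In this model the posterior mean is also the predictive probability that the next trial is a success, which links the posterior to the predictive distribution mentioned above.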

The objective of this course is for students to plan, carry out and present the results of a short project in data science. The project will be presented in a professional format that could serve as an exemplar of their work for a future employer or client.

On completion of this module students will be able to:

  • Identify a specific problem in data science and formulate a relevant project proposal to address this.
  • Select, justify, and apply an appropriate method to investigate an identified problem in data science.
  • Deliver a short presentation including analysis and evaluation of results from a project in data science.
  • Conduct a project in data science in a collaborative, interdisciplinary fashion, in accordance with professional and ethical codes of practice.

Applications are now open for March 2025!

Apply now to secure your spot