9 Python Libraries for Data Science

Written by Coursera Staff • Updated on

Learn about nine Python libraries for data science and how to install them.


Python is an object-oriented programming language with easy syntax and powerful tools for application development, machine learning, and data science. One reason Python is so useful for data science is its open-source nature: anyone can develop and publish a library, and libraries are easy to find, download, and install. Python has an extensive list of free, open-source libraries and an active community willing to help troubleshoot and provide guidance to users at any level.

According to the TIOBE Index, which uses search engine results to rank programming languages, Python ranked at the top of the list in April 2024, making it one of the most popular programming languages in the world [1]. Explore nine Python libraries you can use for data science, how to install them, and how to start using them.

What are Python libraries for data science?

Python libraries give data scientists access to a range of tools to help them manipulate, analyse, mine, and visualise data in a simple, straightforward manner. Each Python library contains sets of code, classes, values, and templates you can download to add functionality to Python and make analysing data more efficient. Nine Python libraries you can use for data science include:

  • NumPy

  • pandas

  • SciPy

  • Matplotlib

  • Seaborn

  • Pillow

  • Plotly

  • Scrapy

  • AutoViz

9 Python libraries for data science

Whilst hundreds of thousands of Python libraries are available, each library in this list has its own set of tools, and the libraries often work together to perform high-level data science computations. Read on to explore how data scientists use each library and the commands that install them.

1. NumPy

Install commands: conda install numpy or pip install numpy

NumPy is a scientific computing package for producing and computing multidimensional arrays, matrices, Fourier transformations, statistics, linear algebra, and more. NumPy’s tools allow you to manipulate and compute large data sets efficiently and at a high level.
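As a rough illustration of the features listed above, the following minimal sketch builds a small array and applies a statistic, a linear algebra routine, and a Fourier transform. The values and variable names are made up for the example.

```python
import numpy as np

# A small 2-D array (matrix) of illustrative values
matrix = np.array([[2.0, 1.0],
                   [1.0, 3.0]])

# Statistics: mean of each column
column_means = matrix.mean(axis=0)

# Linear algebra: solve matrix @ x = b for x
b = np.array([3.0, 5.0])
solution = np.linalg.solve(matrix, b)

# Fourier transformation of a short signal
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

print(column_means)
print(solution)
print(spectrum)
```

Each operation works on the whole array at once, which is what makes NumPy efficient on large data sets.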

2. pandas

Install commands: conda install -c conda-forge pandas or pip install pandas

pandas uses expressive data structures to make working with labelled data more efficient. It simplifies data analysis in Python by representing missing data explicitly, allowing you to insert or delete data, and organising data into data frames that you can merge, join, and concatenate. It also has useful input/output (I/O) functionality, which you can use to import data directly from CSV files, Excel files, and databases.
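The sketch below shows one way these pieces might fit together: reading CSV data, spotting a missing value, merging with a second table, and filling the gap. The city names, temperatures, and column names are invented for illustration, and the CSV is read from an in-memory string rather than a file.

```python
import io

import pandas as pd

# Illustrative CSV text; York has no temperature recorded
csv_text = "city,temp_c\nLeeds,11.5\nYork,\nBath,14.0\n"
temps = pd.read_csv(io.StringIO(csv_text))

# A second, hypothetical table to merge on the shared "city" column
regions = pd.DataFrame({
    "city": ["Leeds", "York", "Bath"],
    "region": ["North", "North", "South"],
})
merged = temps.merge(regions, on="city")

# Replace the missing temperature with the mean of the known values
filled = merged.fillna({"temp_c": merged["temp_c"].mean()})

print(temps.isna().sum())
print(filled)
```

Filling with the column mean is only one possible choice here; dropping incomplete rows with dropna() is another common option.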

3. SciPy

Install commands: conda install scipy or python -m pip install scipy

SciPy is a scientific computing package with high-level algorithms for optimisation, integration, differential equations, eigenvectors, algebra, and statistics. It builds on NumPy arrays and adds further data structures, such as sparse matrices, as objects for holding data. This gives you an even wider range of ways to analyse and compute data.
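To give a flavour of these algorithms, here is a minimal sketch that integrates a function, minimises another, and stores a matrix in a sparse format. The functions and numbers are arbitrary examples chosen so the results are easy to check by hand.

```python
import numpy as np
from scipy import integrate, optimize, sparse

# Integration: area under x^2 between 0 and 3 (exact answer is 9)
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 3.0)

# Optimisation: minimum of (x - 2)^2 + 1, which lies at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)

# A sparse matrix stores only the non-zero entries
identity = sparse.csr_matrix(np.eye(3))

print(area)
print(result.x)
print(identity.nnz)
```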

4. Matplotlib

Install commands: conda install matplotlib or python -m pip install -U matplotlib 

Matplotlib is an essential tool for data science visualisations because it creates various data plots and graphs in print-ready formats. It can plot pairwise data, statistical distributions, gridded data, irregularly gridded data, and 3D volumes. Matplotlib works with Python scripts, Jupyter Notebook, web applications, and other graphical user interfaces (GUIs) to generate plots, which makes it a versatile visualisation tool for data scientists.
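A minimal sketch of a Matplotlib script might look like the following. It selects the non-interactive "Agg" backend so it can run without a display, and it writes the PNG to an in-memory buffer instead of a file. The data points and labels are illustrative only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Illustrative pairwise data
x = [0, 1, 2, 3, 4]
y = [value ** 2 for value in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Example line plot")
ax.legend()

# Render at a print-friendly resolution into memory
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
plt.close(fig)
png_bytes = buffer.getvalue()
```

In a Jupyter Notebook you would usually skip the backend line and let the figure display inline; to save to disk, pass a file name such as "plot.png" to savefig instead of the buffer.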

5. Seaborn

Install commands: conda install seaborn or pip install seaborn

Seaborn is a library built on top of Matplotlib that makes statistical graphics more straightforward. It works with pandas data structures, maps columns of your data to visual properties such as colour, creates a legend automatically, and can compute statistical estimates, such as averages and regression fits, as it plots. This makes it an important tool if you want to create high-quality plots and statistical summaries at the same time.
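As a small sketch of how Seaborn works with a pandas data frame, the example below colours points by a category column and lets Seaborn add the legend. The data frame contents and column names are invented for the example, and the "Agg" backend is an assumption that lets the script run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd
import seaborn as sns

# Hypothetical data: study hours, test scores, and a group label
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 64, 70, 75],
    "group": ["a", "b", "a", "b", "a", "b"],
})

# Column names become axis labels; "group" sets the colour and legend
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```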

6. Pillow

Install commands: conda install anaconda::pillow or python3 -m pip install --upgrade Pillow

Pillow is the actively maintained fork of the older Python Imaging Library (PIL). It lets you manipulate image pixels directly and can exchange image data with NumPy and SciPy for computations. Pillow is a useful image-processing tool that you can use directly within a Python interpreter. Like other image-processing applications, it can convert files, resize images, create thumbnails, perform colour space conversions, and compute statistics on images.
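The following minimal sketch runs through several of these operations. To stay self-contained it generates a plain red image in memory rather than opening a file; with a real image you would start from Image.open() and a path of your own.

```python
from PIL import Image, ImageStat

# A 200x100 solid red image, created in memory for illustration
image = Image.new("RGB", (200, 100), color=(255, 0, 0))

# Resize to exact dimensions
resized = image.resize((100, 50))

# thumbnail() shrinks in place and preserves the aspect ratio
thumb = image.copy()
thumb.thumbnail((64, 64))

# Colour space conversion: RGB to 8-bit greyscale
grey = image.convert("L")

# Simple image statistics: mean value of each band (R, G, B)
band_means = ImageStat.Stat(image).mean

print(resized.size, thumb.size, grey.mode, band_means)
```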

7. Plotly

Install commands: conda install -c plotly plotly=5.20.0 or pip install plotly==5.20.0

Similar to Matplotlib, Plotly produces high-quality graphs, charts, plots, polar graphs, and more. This library also helps you create interactive and print-ready plots. You can display Plotly visualisations directly in Jupyter Notebook or Dash, or export them as standalone HTML files.

8. Scrapy

Install commands: conda install -c conda-forge scrapy or pip install Scrapy

Scrapy is a web scraping and extraction tool for data mining. Its use extends beyond just scraping websites; you can also use it as a web crawler and to extract data from APIs, HTML, and XML sources. You can export scraped data as JSON, CSV, or XML files and store them on a local disk or send them through file transfer protocol (FTP).

9. AutoViz

Install commands: conda install conda-forge::autoviz or pip install autoviz

AutoViz helps data scientists find patterns in their data through automated exploratory data analysis. It can help beginner data scientists learn to see important patterns, or if you are more advanced, it can help ensure that you don’t miss anything crucial. It makes plotting easy, speeds up plot generation with less code, works with any size data set, and even gives a quality assessment of the data. It can analyse CSV or JSON files, and it can work with a pandas data frame.

How to get started in Python libraries for data science

To start using Python libraries for data science, install some or all of the Python libraries above and use them on your own data. The following steps give you an overview of how to install Python libraries.

1. Download Python or Anaconda.

If Python is not already on your computer, one of the simplest ways to install Python and its libraries is the Anaconda distribution, which bundles Python with conda, a package and environment manager. It makes installing packages simpler by letting you use either a command line prompt or a graphical user interface to create environments and install Python libraries.

Alternatively, if you already have Python installed or want to install packages as you need them, you can just use Python and the Python Package Index (PyPI).

2. Create a virtual environment.

To install packages into a particular virtual environment, create a new conda environment or use an existing conda environment for Anaconda. Or, if you are using regular Python, you can use venv.

Using Anaconda

To create a new conda virtual environment in the terminal:

  • conda create --name conda-env python

To activate the conda environment:

  • conda activate conda-env

Using regular Python

To create a Python venv in the terminal:

  • Mac/Linux: python -m venv name-env

  • Windows: python -m venv name-env

To activate that venv in the terminal:

  • Mac/Linux: source name-env/bin/activate

  • Windows: name-env\Scripts\activate
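If you prefer to script this step, Python's standard library exposes the same functionality as the python -m venv command through the venv module. The sketch below is an illustration rather than a required step: it creates an environment called name-env inside a temporary folder, which is deleted at the end, and it skips installing pip (with_pip=False) purely to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "name-env"

    # Equivalent in spirit to: python -m venv name-env
    # (the command line installs pip by default; this sketch does not)
    venv.create(env_dir, with_pip=False)

    # Every venv contains a pyvenv.cfg file at its top level
    has_config = (env_dir / "pyvenv.cfg").exists()
    print(has_config)
```

Activation still happens in the terminal with the commands above, because it changes the settings of your current shell session.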

3. Install Python libraries.

Once your virtual environment is active, find and install the Python libraries you want to use within that environment. Each library has its own installation command, as listed in the sections above. In general, you install libraries with conda commands in Anaconda environments or with pip commands in venv environments.

Using Anaconda

Ensure your conda virtual environment is active using the steps above. Using NumPy as an example, you can install a library using its specific commands:

  • conda install numpy

Using regular Python environments

Ensure your venv virtual environment is active using the steps above. Using NumPy as an example, you can install a library using its commands:

  • pip install numpy
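After installing, you may want to confirm which libraries are available in the active environment. One possible approach, sketched below with only the standard library, looks up the installed version of each package. The helper name installed_version and the list of package names are choices made for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names as used with pip install
for name in ["numpy", "pandas", "scipy", "matplotlib"]:
    version = installed_version(name)
    print(name, version if version else "not installed")
```

Run the script with the environment active; any library reported as "not installed" can be added with the conda or pip command for that library.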

4. Deactivate the virtual environment.

After you install your packages, deactivate the virtual environment using either:

  • conda deactivate

Or, if you’re using regular Python:

  • deactivate

Now, you can start using Python libraries for data science.

Learn more with Coursera. 

Python libraries are powerful tools for data science users to mine, analyse, and visualise data to find patterns. To begin developing in-demand skills using Python, try the Python for Everybody Specialisation from the University of Michigan. You can also expand your data science skills through a course like the IBM Data Science Professional Certificate. Both are available on Coursera.

Article sources

  1. TIOBE. “TIOBE Index for April 2024,” https://www.tiobe.com/tiobe-index/. Accessed 5 May 2024.

