Common data analysis terms to know for certification prep, interviewing, and resume writing.
Data analysis is the process of working with data to derive useful information, which can then be used to make data-informed decisions. Data analysis is generally a six-step process: ask a question, prepare your raw data sets, process your data for analysis, analyze your data, share your results, and act in accordance with your data.
Data analysts are data professionals who gather, clean, study, or interpret data in order to solve business problems. They tend to work alongside other data analytics professionals, such as data scientists and data engineers.
This beginner-friendly data analysis glossary can be a useful reference if you are launching a new career in data or looking to enhance your data skills.
You’ll find common data analysis terms in the glossary below.
When working in a spreadsheet or database, an attribute is a common descriptor used to label a column. Labeling columns clearly and precisely can enable you to keep your data organized and ready for analysis.
A changelog is a list documenting all of the steps you took when working with your data. This can be helpful in the event that you need to return to your original data or recall how you prepared your data for analysis.
Clean data is data that is accurate, complete, and ready for analysis. Data cleaning, an important step in the data analysis process, involves checking your data for inaccuracies, inconsistencies, irregularities, and biases.
A CSV file is a text file that separates pieces of data with commas. This is a common file type when downloading data files for analysis, as it tends to be compatible with common spreadsheet and database software.
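As a minimal sketch of what reading a CSV file can look like, the Python example below parses a small comma-separated text with the standard `csv` module. The column names and values (`product`, `units_sold`, `price`) are made up for illustration; a real file would be opened from disk instead of an in-memory string.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a downloaded file.
raw_csv = "product,units_sold,price\nnotebook,12,3.50\npen,40,1.25\n"

# DictReader treats the first row as the column names (attributes).
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Every value is read as text, so numeric columns are converted explicitly.
total_revenue = sum(int(r["units_sold"]) * float(r["price"]) for r in rows)

print(rows[0]["product"], total_revenue)
```

Because CSV stores everything as plain text, deciding which columns are numbers or dates is left to the person or tool reading the file.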
A dashboard is a tool used to monitor and display live data. Dashboards are typically connected to databases and feature visualizations that automatically update to reflect the most current data in the database.
Data analytics is the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making. Data analytics encompasses data analysis (the process of deriving information from data), data science (using data to theorize and forecast), and data engineering (building data systems). Data analysts, data scientists, and data engineers are all data analytics professionals.
There are four key types of data analytics:
Descriptive analytics tell us what happened
Diagnostic analytics tell us why something happened
Predictive analytics tell us what will likely happen in the future
Prescriptive analytics tell us how to act
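To make two of these types concrete, here is a small Python sketch that assumes a made-up series of monthly sales figures. It computes a descriptive summary (the average) and a very simple predictive estimate (a straight-line trend fitted by least squares). The data and variable names are hypothetical, and real forecasting usually calls for more careful modeling.

```python
# Hypothetical monthly sales figures for four consecutive months.
monthly_sales = [100, 110, 120, 130]
months = list(range(len(monthly_sales)))  # 0, 1, 2, 3

# Descriptive: what happened? Summarize the past with an average.
average_sales = sum(monthly_sales) / len(monthly_sales)

# Predictive: what will likely happen? Fit a straight line by least squares.
mean_x = sum(months) / len(months)
mean_y = average_sales
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, monthly_sales)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x

# Extend the line one month past the observed data.
next_month_forecast = intercept + slope * len(monthly_sales)

print(average_sales, next_month_forecast)
```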
Data architecture, also called data design, is the plan for an organization’s data management system. This can include all touchpoints in the data lifecycle, including how the data is gathered, organized, utilized, and discarded. Data architects design the blueprints that organizations use for their data management systems.
Data cleaning, cleansing, or scrubbing is the process of preparing raw data for analysis. When cleaning your data, you verify that your data is accurate, complete, consistent, and unbiased. It’s important to make sure you have clean data prior to analysis because unclean or dirty data can lead to inaccurate conclusions and misguided business decisions.
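The checks described above can be sketched in a few lines of Python. This is only an illustration under assumed rules: the records, the field names, and the plausible age range of 1 to 119 are all hypothetical. Checking for bias usually requires human judgment and is not covered here.

```python
# Hypothetical raw records showing typical problems.
raw_records = [
    {"customer": " Ana ", "city": "lisbon", "age": "34"},  # stray spaces, inconsistent case
    {"customer": "Ben", "city": "Porto", "age": ""},       # missing age
    {"customer": "Ana", "city": "Lisbon", "age": "34"},    # duplicate once tidied
    {"customer": "Cy", "city": "Faro", "age": "210"},      # implausible age
]

def clean(records):
    """Return (cleaned, rejected); exact duplicates are dropped silently."""
    seen = set()
    cleaned, rejected = [], []
    for rec in records:
        name = rec["customer"].strip()
        city = rec["city"].strip().title()
        age_text = rec["age"].strip()
        # Completeness and accuracy: age must be present and plausible.
        if not age_text.isdigit() or not 0 < int(age_text) < 120:
            rejected.append(rec)
            continue
        # Consistency: compare records only after formatting is standardized.
        key = (name, city, int(age_text))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"customer": name, "city": city, "age": int(age_text)})
    return cleaned, rejected

cleaned, rejected = clean(raw_records)
print(len(cleaned), len(rejected))
```

Keeping the rejected records, instead of deleting them, makes it easier to review why each one failed.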
Data engineering is the process of making data accessible for analysis. Data engineers build systems that collect, manage, and convert raw data into usable information. Some common tasks include developing algorithms to transform data into a more useful form, building database pipeline architectures, and creating new data analysis tools.
Data enrichment is the process of adding data to your existing dataset. You’d typically enrich your data during the data transformation process, as you are getting ready to begin your analysis, if you realize you need additional data in order to answer your business question.
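As a rough illustration of enrichment, the Python sketch below adds a `region` field to each order by looking it up in a second dataset. The orders, the ZIP-to-region lookup table, and the `"Unknown"` fallback are all hypothetical choices made for this example.

```python
# Hypothetical existing dataset: orders that only record a ZIP code.
orders = [
    {"order_id": 1, "zip": "10001", "amount": 25.0},
    {"order_id": 2, "zip": "94105", "amount": 40.0},
    {"order_id": 3, "zip": "00000", "amount": 15.0},
]

# Hypothetical additional data used for enrichment.
region_by_zip = {"10001": "East", "94105": "West"}

# Build new records instead of changing the originals, and flag missing
# lookups explicitly so they are not dropped without notice.
enriched = [
    {**order, "region": region_by_zip.get(order["zip"], "Unknown")}
    for order in orders
]

print(enriched[0]["region"], enriched[2]["region"])
```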
Data governance is the formal plan for the way an organization manages company data. Data governance encompasses rules for the way data is accessed and used, and can include accountability and compliance rules.
Data integrity encompasses the accuracy, reliability, and consistency of data over time. It involves maintaining the quality and reliability of data by implementing safeguards against unauthorized modifications, errors, or data loss.
Data mining is the process of closely examining data to identify patterns and glean insights. Data mining is a central aspect of data analytics; the insights you find during the mining process will inform your business recommendations.
Data science is the scientific study of data. Data scientists ask questions and find ways to answer those questions with data. They may work on capturing data, transforming raw data into a usable form, analyzing data, and creating predictive models.
A data source is the origin of a specific set of information. As businesses generate more data each year, data analysts rely on a variety of data sources to measure business success and offer strategic recommendations.
Data visualization is the representation of information and data using charts, graphs, maps, and other visual tools. With strong data visualizations, you can foster storytelling, make your data accessible to a wider audience, identify patterns and relationships, and explore your data further.
Data wrangling, also called data munging or data remediation, is the process of converting raw data into a usable form. There are four stages of the munging process: discovery, data transformation, data validation, and publishing. The data transformation stage can be broken down further into tasks like data structuring, data normalization or denormalization, data cleaning, and data enrichment.
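The four stages can be sketched end to end in Python. This is a deliberately tiny example under stated assumptions: the input records, the field names, and the 0 to 100 score range are hypothetical, and "publishing" here only means writing the result out as CSV text.

```python
import csv
import io

# Hypothetical raw data with inconsistent naming and text-typed numbers.
raw = [
    {"Name": "ana", "Score": "81"},
    {"Name": "ben", "Score": "92"},
]

# 1. Discovery: find out which fields the raw data actually contains.
fields = sorted({key for row in raw for key in row})

# 2. Transformation: restructure, tidy, and convert types.
transformed = [{"name": r["Name"].title(), "score": int(r["Score"])} for r in raw]

# 3. Validation: confirm the transformed data obeys the expected rules.
assert all(0 <= r["score"] <= 100 for r in transformed), "score out of range"

# 4. Publishing: write the result in a form others can use.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "score"], lineterminator="\n")
writer.writeheader()
writer.writerows(transformed)
published = buffer.getvalue()

print(published)
```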
A database is an organized collection of information that can be searched, sorted, and updated. This data is often stored electronically and managed with software called a database management system (DBMS). Oftentimes, you’ll need to use a query language, such as Structured Query Language (SQL), to interact with your database.
Metadata is data about data. It describes various characteristics of your data, such as how it was collected, where it’s stored, its file type, or creation date. Metadata can be particularly useful for verification and tracking purposes.
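For a file on disk, some of this metadata can be read directly from the operating system. The Python sketch below creates a temporary file (the name `sales.csv` and its contents are hypothetical) and collects its name, type, size, and last-modified time. Details such as how the data was collected are not stored by the file system and would need to be recorded separately.

```python
import os
import tempfile
from datetime import datetime, timezone

with tempfile.TemporaryDirectory() as folder:
    # Create a small hypothetical data file so there is something to inspect.
    path = os.path.join(folder, "sales.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("product,units\npen,40\n")

    info = os.stat(path)  # file-system metadata, not the file's contents
    metadata = {
        "file_name": os.path.basename(path),
        "file_type": os.path.splitext(path)[1],
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }

print(metadata)
```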
Open data, also called public data, is data that is available for anyone to use. Exploring and analyzing open datasets is one way to practice data analysis skills.
Qualitative data is data that describes qualities or characteristics. It’s generally non-numeric and can be subjective. Examples include eye color and emotions.
Quantitative data is objective data with a specific numeric value. It’s generally something you can count or measure, such as height or speed.
A query is a request for information. It’s essentially the question you ask a database in order to return the data you want to retrieve. In data analytics, you’ll formulate your database queries using a query language, such as Structured Query Language (SQL).
A relational database is a database that contains several tables with related information. Even though data is stored in separate tables, you can access related data across several tables with a single query. For example, a relational database may have one table for inventory and another table for customer orders. When you look up a specific product in your relational database, you can retrieve both inventory and customer order information at the same time.
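The inventory and orders example can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are assumptions made for illustration. The point is that a single SQL query with a `JOIN` returns related data from both tables.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE inventory (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        in_stock INTEGER
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES inventory(product_id),
        quantity INTEGER
    );
    INSERT INTO inventory VALUES (1, 'notebook', 30), (2, 'pen', 100);
    INSERT INTO orders VALUES (10, 1, 2), (11, 1, 5), (12, 2, 1);
    """
)

# One query pulls stock levels from inventory and order totals from orders.
query = """
    SELECT i.name, i.in_stock, SUM(o.quantity) AS units_ordered
    FROM inventory AS i
    JOIN orders AS o ON o.product_id = i.product_id
    WHERE i.name = ?
    GROUP BY i.product_id
"""
result = conn.execute(query, ("notebook",)).fetchone()
conn.close()

print(result)
```

The shared `product_id` column is what relates the two tables. The `?` placeholder passes the product name as a parameter, so it is not pasted into the SQL text.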
Structured data is formatted data, for example data that is organized into rows and columns. Structured data is more readily analyzed than unstructured data because of its tidy formatting.
Structured Query Language, or SQL (pronounced “sequel”), is a computer programming language used to manage relational databases. It’s among the most common languages for database management.
Unstructured data is data that is not organized in any apparent way. In order to analyze unstructured data, you’ll typically need to implement some type of organization.
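One simple way to impose organization on free text is to extract fields with a pattern. The Python sketch below assumes some hypothetical shipping notes and a regular expression written for their wording. Text that does not match is left out, which is a reminder that real unstructured data rarely fits a single pattern.

```python
import re

# Hypothetical free-text notes with no fixed structure.
notes = [
    "Order 1042 shipped to Lisbon on 2024-03-05",
    "Customer called about a late delivery",
    "Order 1043 shipped to Porto on 2024-03-07",
]

# A pattern that assumes the wording used in the shipping notes above.
pattern = re.compile(r"Order (\d+) shipped to (\w+) on (\d{4}-\d{2}-\d{2})")

structured = []
for note in notes:
    match = pattern.search(note)
    if match:
        # Each match becomes a row with named columns.
        structured.append(
            {
                "order_id": int(match.group(1)),
                "city": match.group(2),
                "ship_date": match.group(3),
            }
        )

print(structured)
```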