Learner Reviews & Feedback for Getting and Cleaning Data by Johns Hopkins University

4.5 stars (8,062 ratings)

About the Course

Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speeds up downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data....

Top reviews

HS

May 2, 2020

This course provides an introduction to some important concepts and tools in a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.

DH

Feb 1, 2016

Easy, mostly instructive course. The assignments and quizzes are quite good and illustrate the lessons very well.

Watch the videos for the general presentation, but put your energy into the exercises.

826 - 850 of 1,311 Reviews for Getting and Cleaning Data

By Edgardo G

Sep 2, 2023

The content of the course is very good, and the explanations are clear and well organized. However, it has the same problem as other Johns Hopkins courses: in some cases it is impossible to find the data files used in the course, because the websites they were downloaded from are no longer available or the structure of the data has changed. It would be good practice to make the data files available in the lessons.

By Miguell M

Jun 30, 2018

This course was pretty useful for learning the various ways to acquire, clean, and manipulate data, which I think is an awesome real-world skill. The course project in week 4 was a good way to exercise some of these skills, but I have some qualms about the delivery of the course project, primarily the instructions. The course project involves getting and cleaning a dataset, but the instructions are rather vague in some key areas, which I believe could lead to a wide variety of submissions. I'm not sure whether the vagueness in the instructions was intentional (perhaps to mimic real-world scenarios?), but it certainly led to a lot of confusion in interpreting them, a sentiment reflected in the discussion forums. That being said, the course was useful!

By Haonan J

Feb 4, 2018

The content about getting data is too difficult for me, as I'm a student who just completed the R Programming course. It's hard for me to learn to pull data from APIs, websites, and Excel in only one week, so I don't recommend this course for beginners like me.

However, the content on cleaning data is great. The dplyr package is more convenient than what I learned in the last course, and the mentor is still great.

All in all, this is a nice course and it helped me a lot. Many thanks to the mentor. Maybe someday, when I have a better foundation in programming, I will review the knowledge and skills in this course again.

By Christian B

Nov 4, 2016

The course content is important. I found the final assignment quite hard, and I struggled a lot with R on it. Interestingly enough, when looking at other solutions during the peer reviews, they seemed to have found much easier solutions than I had. I am not sure why. I got the same result, but my code looks far more complicated. Also, the description of the final assignment was a bit unclear. For example, were we supposed to rename the features or not? Were we supposed to calculate the mean per activity, per subject, or per activity-subject combination? Were we supposed to select mean() only, or also FreqMean()? Etc.

By Cristobal M

Jul 5, 2022

Good course. English is essential, because the lectures fall short on content and you need to consult forums and external sites to solve the quizzes and projects.

The course and some of its data sources need updating. I also think some topics are not well explained; I'm thinking of web scraping, which I ultimately had to look up in external resources to understand at the level the tests required.

All that aside, it's a good course. I think I learned to load most of the common data types, along with some very useful tools for data manipulation.

By Rouholamin R

Jan 14, 2019

I passed two courses of this specialization before this one. First of all, I think this one was a little harder and packed with more content. Usually, whatever the professors mention, I research it on my own and learn it. But this course covered a lot of material, and unfortunately I have already started to forget regex patterns and so on.

I liked the project on many levels, except that the main dataset wasn't well documented. After finding out what the dataset was about, I did the project, and I think it helped me regain my confidence.

By Amr E

Jul 15, 2020

I very much liked this course. It is challenging and takes you one step deeper into data science. The final project is a real-world project of the kind you may face in your professional career. The course is well organized in many respects. However, here is what I didn't like about it:

1- Many of the functions used are deprecated as of 2020 and haven't been updated.

2- Some lectures took longer than they should have (the Reading HDF5 lecture, for example).

3- Unlike in the first two courses, the discussion forums are not as rich (especially in weeks 3 and 4).

By Karin R

May 24, 2020

This course is wonderful for those who are already equipped with coding experience. For the rest, it's extremely difficult, and I found myself wishing that there were better resources available for those who aren't already there. I have taken each course in succession and have purchased books to help guide me—and also have a very patient brother with advanced computer engineering expertise who had to answer all of my questions. I absolutely love the package in R that allows you to do tutorials. All of my stars are for that.

By Jason J

Jan 29, 2016

Thank you to the professors who made this course possible and especially to Dr. Peng who was willing to spend some time with us face to face via video. I found the course very challenging but at the same time I did learn quite a bit about R. Working through the course assignments and the final project was the best part. My rating is 4 stars because the course lectures are not engaging. The lecture style is basically just reading the slides to us and they don't take much time to explain what is going on.

By Rick H

Mar 8, 2016

It is a great course so far, with a lot of applicable topics covered. However, I feel that some of the questions are structured poorly to achieve their goals.

It is fine to have difficult questions where students are expected to do a lot of extra research, but it should be done in a way that lets students know what they are getting into ahead of time, and questions should not include code that does not run without explaining what is supposed to be done.

It's just not a good teaching technique for learning.

By Jason B

Jun 4, 2016

Good course, though I have to say that the final project was a bit confusing, and I am not sure that the people who did the final project really understood the course and how to create a tidy dataset, as the ones I looked at did not meet all the principles of tidy data that were outlined. What concerns me is that they all had similar issues and are all peer-reviewing each other, which means there is no one who can make sure their answers are really tidy...

By Dzmitry B

Nov 7, 2020

The principles of tidy data are well delivered and, overall, the course structure is great. Many great packages were covered (some may be a little outdated). Personally, I felt that data.table and dplyr/plyr should have been mentioned rather than covered in such depth. The main reason is that they are constantly updated and quite often deprecate functions or parameters. I believe learning R dialects should be an individual's choice and is not required for data processing in R.

By Alex B

Nov 3, 2016

This is an interesting, helpful class. It was challenging and exposed me to a very wide variety of topics outside R for data analysis, including databases, XML, and APIs for getting data. I would have found swirl-type exercises for those topics helpful, as the additional practice really reinforced the lecture material and homework/quiz problems. I also would have found some worked examples or discussion of the homework problems after they were submitted helpful.

By Matthew D

Jun 1, 2020

Pretty good. I liked the instructor; he explained things better than other instructors and didn't just read off the slides. Some of the quizzes were a bit off, in my opinion. Judging from my code and the consistent answers I got, I think the quizzes are not up to date with the data. The data is hosted outside the course and is updated fairly frequently, one dataset as recently as December 2019, and I think the quiz answers were not updated accordingly.

By Tamir L

Jul 25, 2016

This course could be a little difficult for people with no programming experience whatsoever, even if they took the previous R Programming course in this series. The examples are often a little too terse, and not all of the material is as practically useful as the best of it.

However, Jeff Leek teaches some excellent data tidying, cleaning, and extraction techniques with modern tools and libraries that I find very useful in my everyday work with data.

By Lee Y L R

Jun 13, 2017

This is a tough but important course. I learned how to get data from web sources in addition to reading files of various formats, how to manipulate and group the data, and how to prepare a tidy data set for further analysis. There is ample practice for each of these. While the discussion forum is a great platform for addressing our queries, it would be good if there were greater clarity on some of the tools used, especially those in Week 2.

By Beñat G

Aug 11, 2016

I liked the initial approach of this course, as well as the resources provided. However, I feel that the difficulty of the exercises wasn't really linked to the concepts: many things were not explained in the lectures, and I had to find the complementary information on various sites. So, rather than evaluating our understanding of the concepts covered in the lectures, the exercises assessed our skill at looking up further information.

By Varun B

Jan 2, 2018

The course is really great. It starts off a bit slowly but then really picks up pace with new concepts and R packages as you move into weeks 3 and 4. It really helps you strengthen the basics of what exactly tidy and clean data are before you move on to more advanced concepts. The course material needs updating, though: some links did not work, and text in the downloadable presentations is not easy to select.

By Steven Y

Feb 5, 2021

Great course. Just one suggestion: thousands of students take this course, they have different internet environments, and the videos were recorded several years ago. Some of them may not be able to download a file from a URL. It would be better if the course provided the files directly, so that students who fail to download them can still continue practicing other skills.

By Stefan H

Apr 9, 2019

Pretty good examples, good guidance. However, again, it would be more helpful to start learning from a PROBLEM statement, move on to an EXAMPLE of how to solve it, and then explain how the new information helps with this in THEORY. It makes learning so much easier, and I don't understand teachers who don't follow this human problem-solving approach for better understanding and learning.

By Rok B

May 15, 2019

The course has valuable content, but there is not enough emphasis on how to create a tidy data set. You sort of learn what a tidy data set is (although the definition is vague), but you would need to see examples of messy data sets and how to convert them into tidy data sets. There is one swirl exercise called tidyr that addresses this, but it would be nice to also have videos on this topic.

By Ingrid M V

Dec 23, 2020

Compared with other courses in the same series I observed several problems:

1. The explanations were not good; I had many doubts that I had to clarify in other forums. The APIs lecture was too easy compared to what was required to solve the quiz. The dplyr section taught by Professor Roger Peng was the best explained.

2. Links don't work.

3. The questions are not answered by the teachers.

By Juha R

May 18, 2018

I like the specialization quite a bit, as it contains real-world data and sufficiently difficult exercises. This particular course is perhaps not as good as the other courses I have taken (1, 2, 5), as the instructions sometimes lack clarity. However, the peer-reviewed assignments are quite tricky and an excellent opportunity for learning. It took me some serious work to get this course done.

By Abdul S

Apr 2, 2020

The first thing I noticed about the course is that the learning objective was clearer, and the content tied back to it while also leaving room for independent research and study. The project instructions could be a bit clearer, but perhaps the discussion forum is meant to fill that gap and foster curiosity and community interaction. Overall, it was a worthwhile course.

By Lalit O

Jan 17, 2018

All Coursera data science courses have been designed very carefully. I found this course very beneficial as it explains the concepts and also tests the knowledge of the learner through tests.

In this course I learned the basics of fetching data from different sources such as APIs, text files, and web pages. I also learned to clean data using various techniques.