Learner Reviews & Feedback for Getting and Cleaning Data by Johns Hopkins University

4.5 stars (8,065 ratings)

About the Course

Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data....

Top reviews

HS

May 2, 2020

This course provides an introduction to some important concepts and tools for a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.

DH

Feb 1, 2016

Easy, mostly instructive course. The assignments and quizzes are quite good, and illustrate the lessons very well.

See the videos for general presentation, but use the energy on the exercises.

1101 - 1125 of 1,311 Reviews for Getting and Cleaning Data

By Constantin S

Feb 20, 2016

Some weeks offer only about an hour of input, and several of the topics have already been covered in R Programming. That's very little value for money.

The final course project again feels like it was done in a rush and without another review: The submitted dataset should be automatically checked. It's simply impossible to derive from it whether the student did everything right, but it could be easily done programmatically. Some of the questions have wording and grammar issues that make them hard to understand. Also, there are slightly contradictory instructions between the task and review descriptions.

By Angela W

Jul 14, 2017

I did learn a lot, but I thought the first half of the course (getting the data) was very challenging.

What does annoy me though is that links aren't clickable, sometimes they're wrong, there are typos on the slides etc. The response to these complaints in the forums is that these lectures were recorded a while ago and it takes time to change things and so on - but for $50 a month, I don't think it's too much to ask that the course materials be kept up to date!

So honestly, I feel like I'm being ripped off a bit here.

I did really enjoy the course project though.

By Carla P

Oct 25, 2021

In my opinion, the lessons are just a basic overview of some concepts and do not give you the competences you need to pass the quiz and the peer-graded assignment. Therefore, for most of the questions in the assignments, you need to look for the tools you need somewhere else on the web! On the one hand, without the lessons, you would probably not know what to look for on Google; on the other hand, the lessons are not enough to achieve a good grade in the assignments! Also, the peer-graded assignment takes too long to receive its evaluation!

By Luis P

Jan 25, 2018

The most challenging so far of the 9 courses on the Data Scientist track. Would like to see some errors removed from slides. Some parts of the lectures seemed rushed. Would like to see some of the non-self-evident usage of some functions to be described a little better in more detail. I found myself having to look at multiple online areas to really understand some of the functions that were glossed over. Otherwise, this was a very helpful course that should be taught to all disciplines involving any amount or type of data.

By Raymond B

Sep 27, 2020

The "Reading from..." lessons from week 1 and week 2 were extremely frustrating, since we did not get much info on where we would see them most often or the benefit of using one over the others. Instead, we simply sat for hours listening to lectures moving from one type of document to the next before being handed the quiz. The dplyr and data manipulation lectures were great and I really anticipate using them frequently in the future. I think regular expressions deserved more lecture time/practice.

By Chanchal D

Jul 8, 2020

The course design is good; however, I give it three stars for the following reasons.

The sound quality is very poor. I have to turn my speakers up to full volume to make it at least clear and audible, which causes other PC programs to be very loud at the same volume.

Many topics in the course, like factors, were not clear in the tutorial videos, and I had to go out of my way to find their meaning and uses.

The rest of the course is top quality. Thank you for the course.

By Paul R

Mar 11, 2019

This is really R part 2, getting into file/API handling, data frames, regular expressions etc. The specialization focuses on data frames, though there is little coverage of the data tables needed for the capstone. Some of the ordering of the materials was confusing, e.g. this course revisits date/time handling, which was started in the previous course. Assignments are interesting and Swirl exercises are useful. All in all, the combination of these R courses gets you up to speed.

By Lawrence G D

Nov 29, 2020

Very challenging but rewarding. The first two weeks of material were a bit condensed, I think: it was hard to follow how to import some obscure data types into R, and the topics were too complex to be covered in a 5-minute video. The material could have been spread out more, or some topics that are probably not practically useful could have been omitted. The quizzes and the final project were difficult to navigate using only the material provided in the lectures, and I had to rely a lot on Googling stuff.

By Kai P

Aug 8, 2018

The quality of this course is much better than the earlier two. Although this course still has the problem of feeling like a disjointed series of topics on singular functions, there is much more of a cohesive overall theme and structure so it feels a bit more like you're building towards an overarching goal. The final project directly relates to the lectures and felt like a solid way to connect most of the ideas to a project on real data.

By Daniel H

May 18, 2020

I suggest changing the quizzes and assignment questions more often because they're all over the internet for this course and the rest of the courses in the specialization. I understand that students who are cheating are mostly hurting themselves, but it also affects the value and credibility of the certificates you're giving out.

In terms of course modeling and content, it's very nice. I really enjoyed it. The swirl package is genius. Thanks.

By Mary S

Apr 20, 2016

There were a lot of good nuggets in here, but overall this course felt somewhat disjointed compared to the others. It would be nice to have more practice with some of the different formats (e.g., JSON) and for exercises to loop back to some of the early content. I did like that the final exercise required a fair amount of investigation into understanding the documentation and relationship between the files before undertaking to code.

By Mark B

Apr 7, 2020

The data downloads for two quizzes appear to have been updated, meaning that there is no way to come up with the right answer. The course project could use some minor clarifications. It was very difficult to determine what was wanted, and this led to my having to re-submit twice. The deliverables seemed to be confusing to both me and the graders. The course was difficult because of this kind of confusion, not because of the material.

By Jake T T

May 30, 2017

Difficult course, I had to complete it over two sessions. I came into the Data Science track with no knowledge of computer language, which has made learning R particularly difficult; however, after the previous classes I am finally able to search for the information I need to complete the assignment. The other reviewers are correct that the final assignment is a doozy - it took me several hours to complete.

By Christoph J

Aug 9, 2017

I would have given the course 4 stars if it weren't for the last assignment, which relies on other students to review your coursework. I understand that it is difficult to find another way of grading the assignments, but the results of the process here are just too subjective and people influence your grade based on their subjective view of things, which I think is just wrong. Otherwise the course was good.

By Jo S

Jan 27, 2016

The content in this course is essential, but the delivery is patchy and the course project is hard to complete with just the learning materials provided. Read around the course and visit the data science specialisation wiki for extra information, and work through it at your own pace, rather than that suggested by the course. It's much easier to do this now it's on the new Coursera platform :o)

By Tim j

Dec 31, 2016

Decent enough, but this is a heavy subject and really it is not that interesting, although clearly necessary. I feel maybe it could have been organized better to make it more interesting. Also, from reading some of Hadley Wickham's book, he deliberately keeps this part of data science away from new learners as it can be a bit dreary, so my recommendation would be to do some of the other courses first.

By L M

Dec 8, 2020

Slides are images and cannot copy text or code, same with some of the quiz Qs - cannot copy the code.

Many issues with people not getting expected results with some quiz questions, different systems give different results.

Should be teaching tibble library, not data.table (tibble data frames can be used to pass/receive via pipes)

Audio quality is terrible - needs better recording equipment.

By Bill J

Jan 7, 2020

In weeks two and three, the course presents a list of data formats and how to read them into R. I would have preferred a better description of why tidy data sets are considered tidy, including some side-by-side comparisons and the downstream effects of untidy data. This would help me evaluate the effort and risk of introducing errors from tidying the data against the benefit of tidying it.

By Daniel P

Oct 24, 2019

The course is good. I like the videos and the assignment. There is a certain redundancy of information. Much of the "new" information was already elaborated in the previous courses of the same specialization. Additionally, the grading system is based on other students whose knowledge may not go beyond the course scope, and submitting an innovative solution can mean not passing the course.

By Edward C

Feb 15, 2017

Lectures add very little to what you get simply by looking at the slides on your own. Facilitators are expert biostatisticians, not R programmers, and sometimes their explanations of R functionality are superficial and imprecise. The assignments are rigorous and challenging, however, and if you take the time to go through all of the exercises you will gain valuable knowledge.

By Youssuf A

Apr 22, 2020

The theory is explained well and there is not much of a problem following the content. But there is a huge gap between understanding the theory and applying it practically. After finishing all the lessons, one is simply not well enough prepared to solve the assignments. The problems one faces are far too difficult to address without previous knowledge or experience.

By Alexis C

Oct 12, 2018

The first two weeks need an update, because many things in the videos didn't work easily on my computer. It's not bad to look for more information about the subject on the web, but at least make sure that the examples in the videos work when anybody runs the scripts on their own computer. The last two weeks are good: a brief summary of R and how to work with data. Love those two weeks.

By Andrew M T

Oct 25, 2017

The course fits nicely in the specialisation, and I enjoyed the Swirl exercises, which are massively useful. The structure, though, is a bit chaotic, with loads of topics touched only briefly. Perhaps less is good here. Also, I found that the Swirl exercises were repeated across Weeks, and sometimes they didn't have codes to earn extra credits.

By chris

May 31, 2016

Peer-reviewed assessment with students who are unsure of the correct answers = unsure if the solution is correct. Perhaps a formal process could be used (the same as in the previous course, where a SHA commit is submitted and the source is automatically downloaded, checked for plagiarism, and run to verify that the output columns/data meet acceptable criteria).

By Tareq R

Oct 22, 2018

I think some concepts could have been taught better with simple examples first, gradually moving to more complex ones, but using noisy data blurs the learning objective. And again, the instructors are just showing a slide. I think the power of video and illustrations could have been better utilized.