Learner Reviews & Feedback for Introduction to Data Science in Python by University of Michigan

4.5 stars, 27,081 ratings

About the Course

This course will introduce the learner to the basics of the Python programming environment, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library. The course will introduce data manipulation and cleaning techniques using the popular Python pandas data science library and introduce the abstraction of the Series and DataFrame as the central data structures for data analysis, along with tutorials on how to use functions such as groupby, merge, and pivot tables effectively. By the end of this course, students will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses. This course should be taken before any of the other Applied Data Science with Python courses: Applied Plotting, Charting & Data Representation in Python, Applied Machine Learning in Python, Applied Text Mining in Python, Applied Social Network Analysis in Python....

Top reviews

CB

Feb 6, 2023

The assessments, quizzes, and course coverage are quite good. The main points are covered, although it does not cover everything. Additionally, it provides opportunities to learn and conduct research.

PK

May 9, 2020

The course helped me understand the concepts of NumPy and pandas. The assignments were very helpful for applying these concepts, which provides an in-depth understanding of NumPy as well as pandas.

3626 - 3650 of 5,951 Reviews for Introduction to Data Science in Python

By Howard C

Nov 6, 2017

Very good content. One downside for me was that being new to Python, Pandas, Numpy, Scipy etc, I found the amount of new information being thrown at me to be a bit overwhelming. Each of these languages/packages could be a separate course even before you start talking about Data Analysis concepts. I was able to complete all the assignments, but I feel like I know "just enough to be dangerous".

Speaking of the assignments, if you're a newbie like me, give yourself plenty of time to work on them. My rule of thumb was to multiply the "estimated time" for each assignment by a factor of 4. The assignment that was supposed to take 2 hours ended up taking my whole Saturday, and the 4-hour project at the end of the course pretty much consumed an entire weekend. This might not apply if you have previous experience in this development environment or are just smarter than me ;-)

Not everything that you need to know to do the homework is provided in the lecture, so expect to spend a lot of time in StackOverflow. The discussion forums are also very useful. Sometimes a teaching assistant will offer some hints that make all the difference.

One gripe I have is with the automated grader. It's a great idea, but sometimes you can submit a fairly complicated bit of code and the only feedback you get from the grader is: "Wrong!". My suggestion: have two data sets, one for testing and another for grading. Then students could openly discuss and debug their test results in the discussion forums without violating the Honor Code. They would still have to submit a valid algorithm to pass against the test data.

By Dionyssios M

Nov 19, 2017

I am a PhD scientist and heavy user of matlab, R, Stata, bash scripting, and some more esoteric computer languages. I took this course with the idea of covering some background in python skills in a structured manner, the goal being to move many of my data science and some of my data processing code to python.

I found the exercises useful. The lectures are not bad; I just felt they were an overview that either didn't connect much with some of the minutiae of the assignments or were not always key to me given my background. E.g., I found the week 2 videos more interesting and the week 4 videos far less so, especially the video about running a t-test in Python (my statistical skillset is far more advanced).

The real point of frustration is the grader which is extremely sensitive to slight variations. I feel there should be a feedback system where users/students document such cases that could then become a FAQ. Examples:

Grader chokes on type but won't tell me: Submitting string 'True' instead of Boolean True.

Grader chokes on useless (non)significant digits: using round(*,2) at one point crashes the submitted work.

These "errors" are so slight that they are almost beyond human ability to catch. The result is that, in part, the course turns from 'learning python skills' to 'getting to understand minutiae of what the grader does', which can be really frustrating.
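The two grader cases described above can be illustrated with a minimal sketch. The review does not show how the course's grader compares answers, so `strict_check` below is a hypothetical stand-in that requires an exact match on both type and value, and the numbers are invented for illustration.

```python
# Hypothetical strict checker: a stand-in for an autograder,
# NOT the course's actual grading code.
def strict_check(submitted, expected):
    """Pass only when both the type and the value match exactly."""
    return type(submitted) is type(expected) and submitted == expected


# Case 1: the string 'True' is not the Boolean True.
string_vs_bool = strict_check("True", True)
bool_vs_bool = strict_check(True, True)

# Case 2: rounding changes the value, so an exact comparison fails.
rounded_vs_full = strict_check(round(3.14159, 2), 3.14159)

print(string_vs_bool, bool_vs_bool, rounded_vs_full)
```

Under these assumptions only the second check passes, which is why a one-word "Wrong!" from a checker like this gives so little to go on.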

In sum, I believe there is value in this course but the grader is fairly broken and needs a FAQ or similar to warn re choke points generated from trivial differences. I am subtracting stars in the review for that particular reason.

By Declan C

Sep 18, 2017

Overall I would certainly recommend this course; I've found it immediately relevant in my field of science/engineering. It is fast-paced and difficult, yes, especially for those of us with limited Python experience, but the kick it gives leaves you with some solid, immediately applicable skills.

Where it goes well: Strong content and excellent delivery.

Fosters independence. The course starts from the basics but accelerates at a fast pace, introducing you to roots of concepts but expecting you to expand on them yourself with outside resources rather than rote-learning. In this manner it leaves you VERY prepared to tackle unscripted challenges.

Concise content. The lecture videos themselves contain almost zero fluff. The lecturer conveys relevant information in a very smooth and efficient manner. Because of this, replaying specific parts to revise or solidify understanding becomes a pleasure.

Where it could be improved: Could be more polished.

Time estimates for the assignments were WAY off. I do not mind a challenging assignment, but if it advertises that it will take 3 hours, I would not expect to spend closer to 20 hours, yet this certainly was the case. Multiply the estimated time by 4-5 to get a more realistic time.

The assignment wording can sometimes be a little ambiguous. It's almost mandatory to go through the forum posts for clarification. I realise some things were noticed after the publication of the course and are covered by pinned posts in the forum, but perhaps the next installment of the course could be updated to iron out some of these wrinkles.

By Nguyen P A

Feb 8, 2020

Teaching an online course is difficult since the audience comes from various backgrounds. The lecturer in this course is very careful in choosing the materials to include so that the course covers the needs of a broad audience. The title is quite misleading though, because it is neither an introductory course nor a data science course; the true title should be: "Getting pro at data handling in Python"

The course has been pretty well designed and is truly at a level much better than many online courses. Still, I think several improvements should be made to avoid learners' frustration:

1. The lecturer speaks too fast. I feel like he was busy reading from his script without truly caring whether people were really following or not.

2. Assignments are poorly written: although the concepts introduced in the assignments are relevant to real Data Science problems, the instructions are unclear and get worse from week 1 to week 4. The most disorganized instructions are in the week 4 assignment, which seems to have been prepared by several different TAs and then hurriedly combined.

3. The prerequisites should clearly state that learners need some real programming experience to be able to debug their own code. Completing one or two courses in Python previously will not be enough to perform well in this course.

4. The instructor should also state who should not take this course; it would be a waste of time for someone with many years of programming experience.

By Mark P

Jul 1, 2017

On balance, I'd recommend the course, but use it as only one of many sources you'll use to learn data science. My main complaints: 1. the course concentrates on mechanics of Python rather than on understanding of Data Science as a discipline, 2. It's rushed.

Pros

Good information on how to use the pandas library to manipulate data.

Information on "advanced" Python functionality.

Challenging assignments. Give yourself time (a weekend or several evenings for each assignment) and you'll learn a lot.

Cons

Very little information is given in the way of conceptual understanding of data science as a discipline. Focuses on the "how" but not the "why." Not much context given for the examples, so they seem a bit contrived. (For the opposite extreme, see the Johns Hopkins Data Science specialization.)

Lectures are rushed. Looks like the prof is reading cue cards straight down without pausing for the students to digest anything. (The one or two lectures given by the TA just fly by with nary a pause for breath.) If you're trying to follow along, keep your finger on the pause button.

The Jupyter notebooks are useful, but contain minimal commenting (barely more than a few headers), so if you plan to refer to the notebooks later (for reference), you'll need to put comments in yourself.

Given code is sometimes sloppy (e.g., standard Python naming conventions are not always followed.)

The automated grader is very picky. Read through the discussion board!

By Tony K

May 14, 2020

I really did like the course overall. The content and its concept are excellent. Make sure you already have Python fundamentals or you will work a little harder (I already had Python experience under my belt). The final project was a good first hands-on approach to doing something most employers would consider to be a real project. Since an autograder is used, the data sets were already provided, but one could imagine where one would have to acquire the data themselves and then follow similar procedures to accomplish a statistical task. My only complaints are: 1 (Minor): the attitude of some of the instructors in the Discussion Forums is off-putting. I don't know if this is a language barrier problem or if they truly just lack "soft skills" in working with people, but you have to take them with a grain of salt. Not all of them, but some in particular. 2 (Major): the autograder is years behind. The APIs (mostly Pandas) that you will use in real life have features that will work for your assignments, but then fail in the Autograder. I spent 1.5 to 2x as long on this month of the specialization due to how archaic the autograder is. I will definitely be warning my employer about this if other people in my organization want to take the course. Figuring out these issues exist requires going to the Discussion Forums (the warnings are in the assignments), and there you will encounter problem #1 I pointed out.

By Eric N

Jan 8, 2020

Most of the course is great. The lecture content is interesting, well explained and backed up by examples. The notebooks for following along with the videos are very good too. The assignment problems are also interesting, applicable and at a reasonable level of difficulty. However, there is one significant problem which, while not big enough to stop me recommending the course, is very frustrating. The assignment auto-grader has so many issues and peculiarities that on the third and fourth assignments I spent more time getting the autograder to accept my answers, which were already correct, than I spent actually obtaining the answers in the first place. Just one example of such a problem is that the autograder works with an older version of pandas, so there is some syntax which will work perfectly when run in the Jupyter notebook but which will fail when the autograder tries to run it. I know it is not practical to have all assignments reviewed by a real person, but something needs to be done about this when it can take more time to get the assignment submitted than to actually complete it. It would also be good to have access to optimised assignment solutions after the assignment has been passed, because it is certainly possible to pass the assignments without having a very efficient or 'pandorable' solution.

By Robert C

Feb 20, 2021

I felt the course material was helpful to improve my skills in data preparation within Pandas. I believe the course is a bit advanced to be considered an "Introduction" but perhaps it only gets much harder from here :). The programming assignments are estimated at 3 hrs each week, but I found myself putting in much more time than this - at least 10 hrs per assignment. I do not consider this a bad thing - the assignments are where you roll up your sleeves and actually become proficient at this stuff. The biggest negative to the course (cost 1 star on my rating) is in the layout and autograder for the assignments. Most of the assignments require significant levels of work that have nothing to do with data science or the course material - e.g., doing web searches on which city a major league team is based out of, interpreting census data acronyms, etc. This should be provided in some form of a summary or cheat sheet so the student can focus their time on learning the course material. On some assignments, the autograder has known issues and will mark a correct answer as wrong. This can be frustrating and, if known by the staff, should be corrected. In summary: good course, learned a lot, could be great with better assignment planning and a more robust autograder.

By Jim

Jul 2, 2017

It was a good course, and I found I learned a good amount from it. I feel the following changes would give future participants even more value:

First, assignment grading is too rigid. There are questions I failed that looked perfectly OK to me. If a question fails, hints on how to get it to pass need to be provided. One great example had something like "incorrect output. Be sure that DataFrame.loc['Texas']['2012q3'] returns a float64" or something like that. I tried that, found out I didn't strip spaces (I had 'Texas '), sorted that out, and got the right answer. I feel spending too much time on that is "hacking the grader" vs. learning the material.
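The whitespace problem described above is easy to reproduce. What follows is a minimal sketch, assuming a small made-up DataFrame whose index label carries a trailing space; the column name `2012q3` comes from the review, while the values and the second state are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data: note the trailing space in "Texas ".
df = pd.DataFrame({"2012q3": [101.5, 98.2]}, index=["Texas ", "Ohio"])

# The padded label does not match the clean one.
found_before = "Texas" in df.index

# Strip surrounding whitespace from every index label.
df.index = df.index.str.strip()

found_after = "Texas" in df.index
value = df.loc["Texas", "2012q3"]
is_float64 = isinstance(value, np.float64)

print(found_before, found_after, value, is_float64)
```

Checking the index labels directly, as in the `in df.index` test, is one way to spot this kind of mismatch before submitting.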

Second, the course emphasized following along in the notebook, playing with it, and doing the assignments via checking Google / Stack Overflow. That works, but it doesn't lead to as "Pandorable" code as it could otherwise. Some of my solutions were slow, and I started to get into "one-trick pony" mode using .apply() everywhere. That oftentimes isn't fast enough. Pandas, by nature of optimizations under the hood, is a very "meta" language. Teaching the primitives and building to those pieces could use more emphasis.
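The ".apply() everywhere" habit mentioned above can be contrasted with a column-wise operation. This is only a sketch: the `price` and `qty` columns are hypothetical and are not taken from any course assignment.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-by-row: calls a Python function once per row.
row_wise = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Vectorized: a single operation over whole columns.
vectorized = df["price"] * df["qty"]

# Both approaches produce the same Series.
print(row_wise.equals(vectorized))
```

The results match, but the vectorized form avoids a Python-level call per row, which is typically what makes it faster on large frames.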

By Sashi B

Jun 10, 2017

Overall a good learning experience. The assignments were challenging and time consuming at times which is primarily what I am basing my experience on.

The lectures, on the other hand, fell short on substance and were not so helpful for understanding the philosophy behind using the prescribed Python tools. Most of my learning was self-taught, trying to solve the assignments, which I really enjoyed! The lectures felt rushed and crammed. The instructor focused more on Python tools for data science than on why to use Python for data science in the first place, and the pros and cons those tools bring to data science. I felt I learnt more about why to use Python for data science from the "Python for Everybody" specialization taught by Prof. Charles Severance from the same university.

The Jupyter notebook interface was great! I really like the fact that you could play with the code shown on lecture slides/videos.

This course delves deep into Python tools such as pandas. For Python newbies (like myself) I recommend taking some introductory Python course(s), which would greatly help with solving the assignments in this course.

By Nicholas B

Oct 22, 2017

The course material is relevant to real-world data science and, while not every topic needed for the assignments is covered in the class, this is also realistic as you commonly have to search documentation to discover the "right" object or method needed for your particular task/dataset.

The only reason I hold back from a five-star rating is that the assignments are often stated in vague terms and it's not always clear exactly what measure your code should return. For example, one assignment asks students to measure a decline in some value across a date period. This could be a difference, or a ratio, but it wasn't stated in the assignment. Only in the forums was it clarified that we should calculate a ratio.
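To make the ambiguity concrete, here is a small sketch with hypothetical numbers. The review does not say which ratio the forums settled on, so the relative-drop formula below is just one plausible reading and is an assumption.

```python
# Hypothetical values at the start and end of the period.
start_value = 200.0
end_value = 150.0

# Reading 1: the decline as an absolute difference.
decline_as_difference = start_value - end_value

# Reading 2 (assumed form): the decline as a ratio of the starting value.
decline_as_ratio = (start_value - end_value) / start_value

print(decline_as_difference, decline_as_ratio)
```

The same data yields 50.0 under one reading and 0.25 under the other, so a grader expecting one will reject the other.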

Finally, the expected time requirements are very underestimated. There's no way that these assignments can be completed within only 2-4 hours unless you already know pandas. Having plenty of Python experience and being new to pandas, I still had to search documentation extensively for the right methods to transform data appropriately.

By Osama L

Jul 17, 2017

First of all, I would say that anyone without a strong background, or without a strong functional programming background, will be in for a lot of hard work during this course. Secondly, I would say that this course helps you learn to search the documentation yourself instead of being spoon-fed. This is notable because, where I come from, a teacher is considered only as good as the amount he spoon-feeds you. So people in the above-mentioned category should definitely take this course, if not to become a data scientist then to learn how one learns at this level, which is definitely not by being spoon-fed. I think this is a good intro to data science in general because it is such a vast field that one always has difficulty deciding where to start.

As for the content, it was mostly basic descriptive statistics and just one thing from inferential statistics, which came in the last week's project: the t-test for the means of two independent samples.

So, all in all, if you are not strong in statistics you can still take this course easily.

By Richard D

Mar 7, 2017

Felt like this course was trying to pack a lot of information into only four weeks. I find Python to be a very frustrating language - at least that's how Pandas feels. Compared to R the syntax is dreadful. And I found myself digging through help files and FAQs to find ways to do what should be the simplest things - like accessing values in a data frame. The Pandas help documents are not written for beginners, but written in a mathematically precise manner that makes them inaccessible. (Their preference for 1-letter variable names is unhelpful.) Stack Overflow is much better but it's a crap shoot as to whether somebody has addressed the exact issue that is bothering me at any given point.

One final issue: it would be nice to see model solutions. I understand the difficulty in having online solutions available for an online course, but I found myself writing what I considered to be inefficient code fairly often. Would love to see slicker ways to do these tasks.

By Kostadin A

Jul 8, 2017

Excellent material - covers basics, shows most used approaches, laid out logically and sequentially. The language was clear. Very important - presenters did not improvise, but read from a script, which ensured that the material was delivered in a succinct and precise way, respecting the time of the student. The teaching team is solid in their grasp of the theory. Good support on the forums by the teaching staff with smart tips and tricks for a more efficient "pythonic" way of coding the algorithms. Very relevant and thought-provoking side reading material.

Downside - submissions could be graded wrong for small technicalities (despite the numbers being correct). In many cases I needed to resubmit just to satisfy some insignificant technical requirement. This passes the message that what is important are the formalities and not getting the answer right - there should have been more flexible/smart auto-grading.

Overall I recommend the course.

By 周玮晨

Jun 13, 2018

I NEARLY QUIT, but finally I made it. This was my first time taking a "learn by doing" course, and I did not feel comfortable or confident. When I was occupied with the assignments, I always felt frustrated. However, once I adapted to the course, I felt great and I loved it. After finishing the course, I had learned a lot: not only skills, but also how to teach myself. I also loved the reading work; although my English is broken, I still enjoyed the topic arguments. I liked the article 'end of theory' best, which gave me a lot of insight. I have to mention that the autograder lacks feedback: when I got a wrong answer, I didn't know where I had made mistakes, so it took me plenty of time to find them. Thanks to the enthusiastic participants in the forum, especially 'Sophie Greene', 'Barthold Albrecht' and 'Yusuf Ertas'; without you I couldn't have finished the course. But I hope the teachers can make the autograder give us more feedback.

By Maximilian W

Jun 29, 2019

Really good introduction to Data Science. Great lectures and really good exercises to enforce what you have learnt.

If you are wondering whether to do this course, and have reservations given the many reviews about the lack of spoon-feeding, I would advise you to still consider it. For some other courses, those kinds of reviews have often been spot on.

However, personally, I found the balance to be good in this course. Learning to solve the problems yourself, with the safety net of a great discussion board and a grader to tell you if you are right or wrong, you will get there. The approach really allows personal development for problems without either of those two aids. Some have been disparaging about the overdependence on Stack Overflow to help finish the exercises. I can see the frustration, but the fundamentals were taught well, and learning to bridge the gap efficiently using Stack Overflow is valuable learning in itself.

By David C

Jun 30, 2017

This was a good course. The professor (Chris Brooks) was EXCELLENT, although much of the material was presented very quickly and was difficult to follow at times. The real learning came during the exercises, which I generally found to be very difficult but also very good at ensuring the material presented in lecture was actually reinforced through hands-on programming. I would recommend making more exercises, perhaps two per week, rather than one very detailed exercise at the end. The last week's exercise, in particular, was very difficult for me and I would recommend breaking part 5 (of 6) of that exercise into several smaller chunks that could be completed and graded individually. Rather than 10-10-10-10-10-50 as the points awarded in that exercise, I would also recommend changing the scoring to 10-10-10-10-35-25 as the next-to-last part of the exercise was by far the most difficult and time consuming.

By Robert O

Apr 26, 2022

First off - Kudos to Yusuf Ertas for all his help and patience getting me through the programming assignments. I really appreciate his willingness to help!

I believe a different textbook would be helpful. I understand it is optional, and Wes McKinney has created a great reference manual, but he uses too many theoretical examples. I really need real-world, applied examples. He seems to get too far into corner cases and shows you 10 different ways to do something, with no clear direction for new students.

I would also like to see the lecture topics more closely reflected in the programming assignments. Maybe additional assignments after each lecture. The student should then be able to pull from the lecture assignments to complete the final programming assignment.

Need more direction and guidance on the programming assignments for an Intro course.

By Zhou S

Jul 12, 2020

This course is kind of difficult for those who have zero or little computer programming background. But if you have completed Python for Everybody on Coursera, you can get up to speed quickly. To take this course, make sure you have basic knowledge of Python, or you may get frustrated quickly. Also, you can refer to Stack Overflow and GitHub for solutions if you have struggled for a long time on assignments. But don't just copy the answers; search the Python/Pandas/NumPy docs for further explanation, and try your own way to optimize the solutions.

All in all, this is a very good course for exploring data science with Python in depth. One problem for me, as an international student, is that what the instructor says sometimes confused me because I can't understand an abstract concept just by listening. I hope more graphic explanations could be added to the lectures.

By Udit C

Jan 24, 2020

'Introduction' is a bit of a misnomer for the course. If you already have even a fairly elementary understanding of analytics/data science, this course adds nothing of value; if you don't have that understanding, you don't really get a great intro to what it is. If you already have at least some programming and/or Python experience, this course does a good job of showing you the kind of tools Python has specifically for dealing with data; if you don't have any programming experience (especially in Python), you may be way out of your depth. The exercises were well designed and were great learning opportunities, but were far more difficult than the course materials implied, especially for Python beginners. In spite of the limited information in the lectures, the activities helped me get a great appreciation of Python, which is the course's saving grace.

By Andrew A

Aug 12, 2017

In conjunction with learning from the Python for Data Science book, this course is a nice introduction to the topic. I would say there is too much scope to pass the course with bad code and not learn much. It will take self-discipline to not accept whatever works and to learn the general aspects taught. There is a large degree of independent learning required, which means that getting the best out of the course requires dedicated time exploring. If I didn't have the book I may have felt lost. The discussion forum contained key information a rushed student may not have picked up; however, it is a testament to the mentors that this became readily available along with their support. This course could have done with some peer review to allow code comparisons to check quality, methodology and readability without breaking the honor code.

By Christopher F

Oct 22, 2018

Generally very interesting and helpful course. Lectures could have been a bit meatier (I spent a lot more time reading through the docs to complete my homework than in most other Coursera programming classes). It's one of the few Data Science sequences that seem to be offered in Python, rather than R, which was a big motivator for taking this particular class (the transition to working with big data with, for example, pyspark should be significantly easier).

As an aside: one thing that would be helpful is if learners could see the solutions after submitting/getting a grade. Learners would stand to learn a lot by seeing the 'better'/'accepted' answers -- even though I aced the assignments, I _know_ my code isn't as "pandorable" as it could be. That remains one of (a few) big differences between a MOOC and in-person teaching...

By Shah M

Jan 30, 2022

This course is fairly difficult in regards to the assignments and the lectures cover a lot of material!

But trust me when I say that the lectures are sufficient to point you in the right direction for the assignments. For the most part I ended up using Stack Overflow when I was working on the assignments; this is nothing new, since most of my past programming assignments consisted of me scrounging through Stack Overflow posts. This course did in fact teach me a lot concerning regex and how to apply it to a pandas dataframe. I learned a lot when it comes to data cleaning, and for that I think this course is well worth it! The material and how it's presented does add to the difficulty, but honestly it's a fun course if you go over the lectures thoroughly and use Stack Overflow for general aid.

By Shantanu A

Mar 29, 2020

The course is excellent. I really enjoyed this course. But I think that the assignments are a bit too tough because sometimes the concepts required in assignments are not covered in the lectures and help is needed from the 'Discussion Forums'. Since I approached the discussion forums only when I could not think of anything else, sometimes it was very frustrating for me because I got stuck on a couple of problems for days. Thus, assignments should cover only the concepts taught, or it would be better if the lectures could be made more exhaustive so that we can learn more in the lectures themselves. Otherwise, the course is great and the person who helps us on the Discussion Forums is super. His efforts are commendable and he is very knowledgeable, cooperative and active.

By Cole H

Apr 13, 2020

The course material was good, but don't expect to have your hand held during this experience. While the lectures are informative, the assignments often leap ahead in complexity. Each assignment does say that you will have to do some research on your own, to be fair, but the amount of self teaching required is, in my opinion, too much. That said, the problems posed by the assignments do build on each other (even if they don't line up well with the lecture material) and if you honestly take the time to learn the things you have to understand in order to complete each assignment, you will walk away with some new skills and a sense of accomplishment. 4 out of 5 for teaching me some new things, minus the one star for being more of a challenge than I was expecting.