Two years’ worth of IRC logs parsed into a single graph representing name changes and joins. Quinlan Pfiffer/Flickr
A year and a half ago, I dropped out of one of the best computer science programs in Canada. I started creating my own data science master’s program using online resources. I realized that I could learn everything I needed through edX, Coursera, and Udacity instead. And I could learn it faster, more efficiently, and for a fraction of the cost.
I’m almost finished now. I’ve taken many data science-related courses and audited portions of many more. I know the options out there and what skills learners need to prepare for a data analyst or data scientist role. So I started creating a review-driven guide that recommends the best courses for each subject within data science.
Here’s a summary of all my previous guides, plus recommendations for 13 other data science topics.
For each of the five major guides in this series, I spent several hours trying to identify every online course for the subject in question, extracting key bits of information from their syllabi and reviews, and compiling their ratings. My goal was to identify the three best courses available for each subject and present them to you.
The 13 supplemental topics (like databases, big data, and general software engineering) didn’t have enough courses to justify full guides. But over the past eight months, I kept track of relevant courses as I came across them. I also scoured the internet for courses I may have missed.
For these tasks, I turned to none other than the open source Class Central community, and its database of thousands of course ratings and reviews.
Class Central’s homepage. Class Central
Since 2011, Class Central founder Dhawal Shah has kept a closer eye on online courses than arguably anyone else in the world. Dhawal personally helped me assemble this list of resources.
How we picked courses to consider
Each course within each guide must fit certain criteria. There were subject-specific criteria, then two common ones that each guide shared:
It must be on-demand or offered every few months.
It must be an interactive online course, so no books or read-only tutorials. Though these are viable ways to learn, this guide focuses on courses. Courses that are strictly videos (i.e. with no quizzes, assignments, etc.) are also excluded.
We believe we covered every notable course that fit the criteria in each guide. There is always a chance that we missed something, though. Please let us know in each guide’s comments section if we left a good course out.
How we evaluated courses
We compiled average ratings and number of reviews from Class Central and other review sites to calculate a weighted average rating for each course. We read text reviews and used this feedback to supplement the numerical ratings.
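The article does not spell out the exact weighting, so the sketch below assumes the most common approach: each review site's average rating is weighted by that site's number of reviews, i.e. the combined rating is the sum of (average × review count) divided by the total review count. The function name `weighted_average_rating` and the sample numbers are hypothetical and for illustration only.

```python
from typing import Iterable, Tuple


def weighted_average_rating(sources: Iterable[Tuple[float, int]]) -> float:
    """Combine per-site average ratings into a single rating.

    Assumption: each site's average is weighted by its number of reviews,
    so sites with more reviews pull the result harder.
    """
    weighted_sum = 0.0
    total_reviews = 0
    for avg_rating, num_reviews in sources:
        if num_reviews < 0:
            raise ValueError("review counts must be non-negative")
        weighted_sum += avg_rating * num_reviews
        total_reviews += num_reviews
    if total_reviews == 0:
        raise ValueError("no reviews to average")
    return weighted_sum / total_reviews


# Hypothetical course reviewed on two sites: (average rating, number of reviews).
sites = [(4.8, 100), (4.5, 50)]
print(round(weighted_average_rating(sites), 2))
```

Under this scheme, a course with only a handful of reviews on one site barely moves a rating built from hundreds of reviews elsewhere, which is also why ratings based on very few reviews deserve some caution.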
We made subjective syllabus judgment calls based on a variety of factors specific to each subject. The criteria in our intro to programming guide, for example:
Coverage of the fundamentals of programming.
Coverage of more advanced, but useful, topics in programming.
How much of the syllabus is relevant to data science?
Here are the best courses overall for each of these topics. Together these form a comprehensive data science curriculum.
The University of Toronto’s Learn to Program series has an excellent mix of content difficulty and scope for the beginner data scientist. Taught in Python, the series has a 4.71-star weighted average rating over 284 reviews.
The University of Toronto offers Learn to Program: The Fundamentals (LPT1) and Crafting Quality Code (LPT2). University of Toronto
Rice University’s Interactive Programming in Python series contains two of the best online courses ever. However, they skew towards games and interactive applications, which are less applicable to data science. The series has a 4.93-star weighted average rating over 6,069 reviews.
The courses in UT Austin’s Foundations of Data Analysis series are two of the few with great reviews that also teach statistics and probability with a focus on coding up examples. The series has a 4.61-star weighted average rating over 28 reviews.
Duke’s Statistics with R Specialization, which is split into five courses, has a comprehensive syllabus with full sections dedicated to probability. It has a 3.6-star weighted average rating over 5 reviews, but the course it was based upon has a 4.77-star weighted average rating over 60 reviews.
MIT’s Intro to Probability course has by far the highest ratings of the courses considered in the statistics and probability guide. It covers probability exclusively and in great detail, plus it is longer (15 weeks) and more challenging than most MOOCs. It has a 4.82-star weighted average rating over 38 reviews.
Kirill Eremenko’s Data Science A-Z excels in breadth and depth of coverage of the data science process. The instructor’s natural teaching ability is frequently praised by reviewers. It has a 4.5-star weighted average rating over 5,078 reviews.
Big Data University’s Data Science Fundamentals covers the full data science process and introduces Python, R, and several other open-source tools. There are no reviews for this course on the review sites used for this analysis.
A five-course series, UC Davis’ Data Visualization with Tableau Specialization dives deep into visualization theory. Opportunities to practice Tableau are provided through walkthroughs and a final project. It has a 4-star weighted average rating over 2 reviews.
Data Visualization with Tableau Specialization. University of California, Davis
DataCamp’s Data Visualization with ggplot2 series, endorsed by ggplot2 creator Hadley Wickham, covers a substantial amount of theory. You will leave these courses knowing R and its quirky syntax quite well. There are no reviews for these courses on the review sites used for this analysis.
An effective practical introduction, Kirill Eremenko’s Tableau 10 series focuses mostly on tool coverage (Tableau) rather than data visualization theory. Together, the two courses have a 4.6-star weighted average rating over 3,724 reviews.
Taught by the famous Andrew Ng, Google Brain founder and former chief scientist at Baidu, Stanford University’s Machine Learning covers all aspects of the machine learning workflow and several algorithms. The course uses MATLAB or Octave and has a 4.7-star weighted average rating over 422 reviews.
David Venturi is a Content Developer at Udacity. He also created his own data science master’s program. A version of this article originally appeared on Class Central.