With the pandemic, my wife grading like crazy because of an increased load owing to academic calendar changes, and my daughter gone off to college, I've had a lot of time on my hands. In a late night moment of weakness, I not only signed up for Stanford's famous Machine Learning course (videos by Andrew Ng) on Coursera, but paid for it so as to get a certificate. The next day, I regretted the expense and time commitment a bit, but paying for it has kept me honest and diligent. My repo suggests that I started it on September 19th and finished it at the end of October, passing everything with 100% (which is not too difficult because you can resubmit until your work is correct). Now it's time for a little reflection.
I took this course because I wanted to be familiar with the vocabulary -- and I also wanted to get a feel for the kinds of problems basic machine learning can address. And I've learned a lot. So with regard to those outcomes, the course is a good one. I'm pretty late to the party: based on what I've learned, I wish I had taken this some years ago when the course was new. Andrew Ng is of course one of the leading lights of productionizing machine learning at scale, having founded Google Brain and then serving for a while as Chief Scientist at Baidu.
One great aspect of the course is the way Ng has designed it so that certain concepts come back later. For instance, linear regression and gradient descent are worked up in the first lectures, and then return in a discussion of content-based recommendation systems. I love that kind of reinforcement.
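To make that concrete (to borrow Ng's word), here is a minimal sketch of the vectorized gradient-descent update for linear regression as it is developed in those first lectures. The function and variable names are my own, not the course's.

```octave
% Vectorized batch gradient descent for linear regression (file gradient_descent.m).
% X is an m-by-(n+1) design matrix whose first column is all ones, y is an
% m-by-1 vector of targets, theta is an (n+1)-by-1 parameter vector, and
% alpha is the learning rate.
function theta = gradient_descent(X, y, theta, alpha, num_iters)
  m = length(y);
  for iter = 1:num_iters
    h = X * theta;                                % predictions for every example at once
    theta = theta - (alpha / m) * (X' * (h - y)); % simultaneous update of all parameters
  end
end
```

Essentially the same update, with a different cost function plugged in, is what comes back in the recommender-system lectures.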
The course, though, is starting to show its age. It really needs a refresh. I guess Ng never went back to it because he now has a more holistic learning venture, deeplearning.ai (which is still delivered through Coursera).
The awkward stuff: Every lecture has a number of errata (example) -- part of that is because of the sharp eyes of the millions of students who have enrolled in the course. But the thing that really gives one pause is the inconsistency of the presentation. In the early weeks the slides are punctuated with "Readings" that summarize the lecture material in condensed form (example), which is great for review. Later weeks don't have that. At some point a graduate student contributed some fantastic lecture notes, but they seem a little uneven.

There are some really useful things here: For instance, each programming assignment has a tutorial to coach students over some of the bumps -- and, most critically, there is test data for each part of the assignment. I wrote code to verify that my solutions produce the expected results on the test data (see the little sketch below): This is essentially what the automatic grader does, but the grader is a black box. The reason the test data is so valuable is that sometimes your solution misses cases the automatic grader checks; the test data has good coverage, so by verifying your code against it, you can have more confidence that your code will pass the grader. And it's faster: The test data is really pared down.

Anyway, the upshot of this is that to get a full understanding, sometimes impacting the way you would do the coding assignments, you have to scramble across the assignment's PDF, the programming-assignment web page, the errata, the tutorials, and the test cases. At one point in a tutorial for Programming Exercise 4, the author says: "This tutorial outlines the process of accomplishing the goals for Programming Exercise 4. The purpose is to create a collection of all the useful yet scattered and obscure knowledge that otherwise would require hours of frustrating searches. This tutorial is targeted solely at vectorized implementations." It just isn't as unified as it might be.

Another thing that is kind of weird is that it's entirely clear from the lectures that students should strive for vectorized implementations, but the tutorials and assignments suggest that some students write loops and don't get to take advantage of the speed of Octave's libraries. It's kind of incredible to me that looping solutions are even permitted.
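About those test-data checks: here's the kind of throwaway verification I mean. Everything in it is a placeholder -- the real exercises supply their own function names, test inputs, and expected values.

```octave
% A throwaway check: run the function I'm implementing on the tutorial's test
% inputs and compare against the expected outputs, within a tolerance --
% essentially what the grader does, but visibly.
my_fn = @(z) 1 ./ (1 + exp(-z));          % stand-in for the function under test
z_test = [-1 0 1];                        % stand-in test inputs
expected = [0.26894 0.50000 0.73106];     % stand-in expected outputs
actual = my_fn(z_test);
if max(abs(actual - expected)) < 1e-4
  printf("PASS\n");
else
  printf("FAIL: got "); disp(actual);
end
```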
Ng's style is soft-spoken, even modest. At the end of the earlier lectures on linear and logistic regression, he notes to the audience that at this point one probably knows more about machine learning than a lot of people doing it in Silicon Valley, which is pretty funny and a kind of nice take-down of the Valley. He has some tics; for instance, he likes to say "concretely," which I guess is math talk for "now I'm going to get down to a case." Another recurring moment is when Ng says he's not going to prove or derive something: This is almost always followed by an appeal to intuition where he says "you know" how something is -- and I would say he's compelling at explaining without a lot of math. There are some interesting pedagogical tricks. For instance, about 2/3 of the way through a given video lecture, there will be a pause for a quick single-question quiz. These quizzes frequently check for knowledge that is only conveyed immediately afterward. In other words, you have to draw out the implications of what Ng has just taught. Some of these are hard. But the experience of failing a quiz or guessing at an answer, then seeing Ng's walkthrough, produces a solid learning memory.
The graded assignments come in two forms: 5-question quizzes and programming assignments. In my experience, the quizzes can be quite hard. Frequently they turn on drawing something out from the lecture. One extremely painful aspect of the quizzes is that there may be a question where the rubric is to "pick all answers that are correct." So with five possible answers, you may have to do some hard thinking, because one of the answers may be a bit non-obvious. In Coursera, you can re-do quizzes. A couple of times I found I had to write down the questions and then go back to the lectures to see what I had missed. On occasion the lecture provided the underpinning for an answer, but you had to figure out the implication yourself.
The programming assignments are top-notch. An early assignment is about recognizing hand-written digits, and you feel a definite jaw-drop / "oh my god" moment when it turns out to work. I know that this work was done years and years ago, but it is still really cool. Another example is about spam classification. I think that after going through the assignments, almost anyone will have thoughts about problems in his or her own domain. Generally you fill in the body of a missing function, and a provided harness then runs it over the exercise data; when you submit, your results are validated by a server at Coursera. About my only gripe here is that there's a bit of a "so what" at the end of an assignment, because it doesn't really sum up what you would do with the technique when presented with a fresh problem, starting from nothing. These programming assignments are also a treasure-trove of supporting code -- for instance, nice contour plots that would be useful in your own work to provide a view into your data.
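To give a flavor of that fill-in-the-function pattern (the function here is my own illustration; the actual exercises provide their own stubs and data), a typical completed stub looks something like this:

```octave
% sigmoid.m -- the kind of small function an exercise asks you to complete.
% The provided harness calls it on the exercise data and plots or grades the result.
function g = sigmoid(z)
  % Works element-wise, so z can be a scalar, a vector, or a matrix.
  g = 1 ./ (1 + exp(-z));
end
```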
The programming assignments use Matlab or Octave (an open-source Matlab work-alike), and vectorized solutions are preferred over writing your own for-loops to find solutions by iteration. I found it pretty easy to find vectorized solutions, but I imagine it trips up others who have been programming for a long time.
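For a contrived illustration of the difference (not taken from any assignment), here are the loop and vectorized versions of computing the hypothesis for every training example:

```octave
% Made-up data: m examples, n features, plus a bias column of ones.
m = 1000; n = 10;
X = [ones(m, 1), rand(m, n)];
theta = rand(n + 1, 1);

% Looping version: one example at a time.
h_loop = zeros(m, 1);
for i = 1:m
  h_loop(i) = X(i, :) * theta;
end

% Vectorized version: one matrix multiply, handed off to Octave's linear algebra routines.
h_vec = X * theta;
```

On assignment-sized data the vectorized form is both shorter and dramatically faster, which is the point Ng keeps making in the lectures.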
The last thing I want to say is that there is a high cognitive bar in this course, because you really have to learn the material in three ways. First there are the equations: not hard, but it is its own language (Ng doesn't turn the crank very much on the calculus and partial derivatives that make a lot of it work). Then there is the linear algebra. (So, for example, something may be expressed as a multiplication in an equation, but you have to pick up from context whether it is a dot product or an element-wise multiplication.) And, finally, there is the expression of the algorithms in code (in Octave). So you are translating between three different languages. I found this to be very difficult, and not, I think, because the last time I did linear algebra in earnest was 40 years ago. There are moments when you are translating between the linear algebra and the Octave code and find that you have to transpose a matrix or vector to get it to work. I found this pretty frustrating: There didn't seem to be much rhyme or reason to why, when working through a solution, you would find something in the form of a row vector rather than a column vector, or vice versa. The TAs know this, and advise students to display the dimensions of their variables in Octave to help with debugging, but it still feels like there's an impedance mismatch between the different languages.
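That debugging advice looks roughly like this in practice (the variables and shapes here are invented for illustration):

```octave
% When a product blows up with "nonconformant arguments", print the shapes.
X = rand(12, 5);       % pretend design matrix: 12 examples, 5 features
theta = rand(1, 5);    % oops: the parameters ended up as a row vector
size(X)                % prints 12 5
size(theta)            % prints 1 5  -- so X * theta is nonconformant
h = X * theta';        % a transpose fixes it: (12x5) * (5x1) -> 12x1

% The same ambiguity shows up between matrix and element-wise multiplication:
a = [1 2 3]; b = [4 5 6];
a * b'                 % dot product: the scalar 32
a .* b                 % element-wise: [4 10 18]
```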
I was impressed with Octave. It makes it very easy to manipulate and run operations on matrices. There are some missing data structures, though: It would be helpful to have a proper hash or dictionary. You can do dictionary operations with an associative array, but the syntax is clunky. Still, having matrix operations embedded in the syntax is a real pleasure: No need to create objects and run methods to do basic matrix arithmetic.
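Both halves of that in a few lines (my own toy example; the dictionary workarounds shown are dynamic struct fields and containers.Map, which is what I have in mind by "associative array"):

```octave
% Matrix arithmetic is just syntax -- no objects or methods needed.
A = [1 2; 3 4];
B = eye(2);
C = A * B + 2 * A';          % multiply, add, and transpose in one expression

% Dictionary-ish operations are clumsier. Dynamic struct fields:
counts = struct();
counts.("spam") = 10;
counts.("ham") = 25;
counts.("spam") += 1;        % now 11

% Or containers.Map, which reads more like method calls than syntax:
lookup = containers.Map();
lookup("spam") = 10;
value = lookup("spam");
```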
In sum: Great course but it's showing its age. I think the next course for me might be the one on beginning TensorFlow (too bad the assignments aren't in Ruby!).
Having completed the course, I do have some advice for learning the material and getting the assignments finished: