Week 4: Classical ML with a taste of Neural Networks

December 24, 2023

Week in Review

This might be my most productive winter break to date.

TLDR:

  • Wrapped up review of multi-variable calculus.
  • Watched about 1/3 of the STAT 110 lectures.
  • Watched all videos in Andrew Ng's ML specialization.
  • Watched the 3b1b Neural Networks playlist.
  • Read 3/4 of "Python for Data Analysis".
  • Read 1/4 of the required reading of "Practical Statistics for Data Scientists".

Of course, it goes without saying that iteration is at the heart of my approach. While I did "complete" the ML course, it's absolutely crucial that I now read all the labs, complete the practice exercises, and try out the ML methods in the "wild". This week was more focused on exposure to the theory side of things.

The same goes for the data wrangling tools I've been exploring lately. While the book conveniently showcases an organized assortment of library functions and strategies, the only way I can truly absorb this knowledge is to experiment with it on my own.
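As a first bite of that experimentation, here's the kind of minimal pandas sketch I have in mind (the toy data and numbers below are made up by me, not taken from the book):

```python
import pandas as pd

# Hypothetical toy dataset, invented just for practicing the book's patterns
df = pd.DataFrame({
    "city": ["Tokyo", "Osaka", "Tokyo", "Nagoya"],
    "temp_c": [21.5, 23.0, 19.8, 22.1],
})

# Group-and-aggregate: one of the core workflows the book walks through
mean_temp = df.groupby("city")["temp_c"].mean()
print(mean_temp)
```

Nothing fancy, but typing these patterns out by hand on my own data is how they'll actually stick.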

And not only is that what I intend to do, but it's also the fun part I've been awaiting this entire time! Looking forward to messing around with some data on Kaggle soon.

Progress Timeline

December 17, 2023

  • Reviewed multi-variable calculus during CLOV exam session
  • Improved blog design and published first entry
  • Worked through more than half of problem set #2

I can't count today among my most productive days, but nonetheless we are one step closer to ML!

December 18, 2023

  • Watched lectures 10 - 14 of STAT 110. This is all the content up until the midterm, so much iteration will be done over the next few days to solidify understanding.
  • Started reading the book "Python for Data Analysis" and finished chapters 1 through 5.2.
  • Found out about 3b1b's neural networks playlist which I plan on adding to the curriculum for deep learning.

December 19, 2023

  • Watched Lectures 16 and 17 and finished problem set #2.
  • Plan on spreading out problem sets 3-5 between now and Saturday.
  • Finally began Andrew Ng's ML specialization on Coursera. Completed weeks 1 and 2 of the first course.

December 20, 2023

  • Completed Week 1 of Andrew Ng's ML course 2 on neural networks. The fact that a NN can automatically figure out the relationships between features and output is mind-blowing. I do think I still need some more intuition on it, though. Nevertheless, looking forward to diving deeper into deep learning later on in stage 3.
  • Reviewed the course 1, week 1-2 labs to better understand the regression and gradient descent implementations, and to learn how to use scikit-learn.
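To make sure the lab content actually stuck, here's a minimal sketch of the kind of thing those labs cover: batch gradient descent for 1-D linear regression in plain NumPy, checked against scikit-learn's closed-form fit (the toy data here is invented by me, not from the course):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy 1-D data: y = 2x + 1 with no noise, made up for this sketch
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Batch gradient descent on mean squared error
w, b, alpha = 0.0, 0.0, 0.1
for _ in range(2000):
    err = X[:, 0] * w + b - y           # prediction error per example
    w -= alpha * (err @ X[:, 0]) / len(y)  # dJ/dw
    b -= alpha * err.mean()                # dJ/db

# scikit-learn's fit should land on (essentially) the same line
model = LinearRegression().fit(X, y)
```

Seeing the hand-rolled loop converge to the same weights as the library call is a nice confirmation that the gradient math is right.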

Unfortunately, I didn't get the chance to iterate on probability or continue with the data analysis textbook today, but hey, there's always tomorrow!

December 21, 2023

  • Watched the 3b1b playlist on neural networks and learned the calculus behind backpropagation. I plan on tracing the propagation myself on a toy neural network on paper to understand it better.
  • Spent the majority of the day finishing the (last two weeks of) 2nd course of ML specialization on neural networks, decision trees, and ML ops.
  • Thinking about trying to implement an XGBoost-style gradient-boosted tree ensemble from scratch as an exercise. It's refreshing to see probability and random sampling being applied here.
  • Could only review lectures 1 - 3, but I plan on reviewing the rest in the upcoming 2 days for the CLOV session on Saturday.
  • Looking forward to messing around with TensorFlow and doing my own exploration! This stuff is genuinely mind-blowing to me.
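Before doing the paper trace, here's roughly the computation I plan to walk through: a one-hidden-unit toy network where every chain-rule step of backpropagation is written out explicitly, with a finite-difference sanity check (all the weights and inputs are arbitrary numbers I picked):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: x -> (w1, b1) -> sigmoid -> (w2, b2) -> output, squared loss
x, y_true = 1.5, 0.8
w1, b1, w2, b2 = 0.4, 0.1, -0.3, 0.2

# Forward pass
z1 = w1 * x + b1
a1 = sigmoid(z1)
y_hat = w2 * a1 + b2
loss = 0.5 * (y_hat - y_true) ** 2

# Backward pass: one chain-rule step per line
dL_dyhat = y_hat - y_true
dL_dw2 = dL_dyhat * a1
dL_db2 = dL_dyhat
dL_da1 = dL_dyhat * w2
dL_dz1 = dL_da1 * a1 * (1 - a1)  # sigmoid'(z1) = a1 * (1 - a1)
dL_dw1 = dL_dz1 * x
dL_db1 = dL_dz1

# Finite-difference check on dL/dw1: nudge w1 and re-run the forward pass
eps = 1e-6
loss_plus = 0.5 * (w2 * sigmoid((w1 + eps) * x + b1) + b2 - y_true) ** 2
numeric_dw1 = (loss_plus - loss) / eps
```

If the analytic `dL_dw1` matches `numeric_dw1`, the chain-rule trace is correct, which is exactly the intuition I'm after.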

December 22, 2023

I can't call today especially productive, but I still completed the most important tasks.

  • Reviewed probability lectures 4 - 15 for tomorrow's review session.
  • Read chapters 6 and 9 of the data analysis textbook.

Tomorrow's plan is to:

  • Map out the big picture of how math, data wrangling, statistical analysis, ML iteration loop, and everything else connects with each other.
  • Of course, review probability content up until the midterm.
  • Explore Kaggle datasets and experiment with some hands-on data analysis.
  • Improve the website!

Small but consistent steps toward progress. Looking forward to tomorrow!

December 23, 2023

  • Read chapter 6 of the data analysis textbook.
  • Finished week 1 of course 3 of ML specialization.
  • Watched 1 lecture of STAT 110 and did a group review up until midterm content.
  • Went through the Seeing Theory probability visualizations.

Changelog

  • We chose the book "Practical Statistics for Data Scientists" as a way to gain some fundamental knowledge of EDA (exploratory data analysis) while bridging the gap in our descriptive stats understanding.
  • Rather than waiting to complete probability (STAT 110), we decided to start the deeplearning.ai ML courses on the side, as a way to refill the motivation that was starting to wane. After all, it can be quite easy to lose track of the end goal with our (somewhat) bottom-up approach to learning ML.
    • Therefore we reduced our STAT 110 pace to 1 lecture a day.
  • We decided to skip the (fairly lengthy) problem sets for STAT 110 and focus on conceptual understanding. Although we may not gain the most rigorous and thorough understanding with this approach, we believe that over the winter break, it's a better use of time to finally start ML. Moreover, we have to take probability as a course for our degree sooner or later anyway.

Goals for next week

I'm extremely eager to dive deeper into deep learning starting January 1st, 2024, approximately a week from now. Before that, I have a few goals that need to be accomplished by then, including:

  1. Read the Practical Statistics for Data Scientists book up until page 140 (the first 3 chapters). Currently, I'm on page 36 (almost finished with the first chapter), so I plan on continuing at this pace and completing this task soon.
  2. Read the rest of Python for Data Analysis. Only 4 chapters left to go!
  3. Overall, I'm not in a hurry with STAT 110, so I'd like to continue this smooth pace of 1 lecture per day. At this pace, the course should be completed roughly 2 weeks from now.
  4. Most importantly, I need to thoroughly review and practice the machine learning knowledge I've accumulated this week in the ML specialization. In other words, iteration!
  5. Lastly, I would like to revisit backpropagation and try to trace out the chain rule by myself to gain a deeper intuition for the algorithm.