Weeks 11 & 12: Multi-head attention on ISLR and NLP

Weeks in Review

TLDR this week:

ISLR chapters 7-10
CS224N:
- Lectures 7 - 12
- Full theory review up to and including transformers
- Assignments 1 - 4

After getting slightly overwhelmed by the seemingly never-ending stream of dense theory from the NLP course, I realized I was going a bit too fast. Iteration to the rescue!

Towards the end, I got some exposure to transformers and LLM underpinnings. I'm super excited for later on when we dig into multi-modal transformer models for sound, images, and videos! Quite refreshing to make this transition away from older techniques and towards the world of SOTA.

Personally, I've also struck a good balance between school, CLOV, fitness, and other commitments/hobbies. It's all about finding that optimal schedule and not overdoing things. I made a conscious effort not to let Parkinson's law take hold of my time as a college student, and setting a "clock out" time was absolutely essential for this.

You can keep expecting consistent, firm steps in our CLOV journey!

Progress Timeline

February 5, 2024

ISLR 10 pages of chapter 7 on nonlinearities
NLP lecture 7
NLP Assignment #1
Further data transformation and wrangling work for VNDB data. Considering just starting with a base NCF model and iterating from there.

Keeping the ball rolling!

February 6, 2024

ISLR 10 pages, finishing chapter 7. Smoothing splines and GAMs were completely foreign to me, and while there are better methods out there, it's still great to be familiar with the classics.
NLP Lecture 9 on Transformers & Attention. By far the densest lecture I've had to take in. The architecture is relatively complex with many moving parts, and it took me quite some time to digest in one go. I'm definitely gonna have to revisit it this week, as well as read the associated papers after the course.

This is about all the time I had for today, as I had an exam. Tomorrow, I hope to make some more progress getting a basic NCF model configured for the VN data. Looking forward to more grinding! Stay tuned.

February 7, 2024

ISLR 10 pages, learning about trees in a more formal sense
NLP lecture 10 on pre-training transformers. I think at this point, I've accumulated so much theory that I do need to pause and work through the assignments and review the notes. To focus on this a bit more, I'm also pausing my VN project due to time constraints alongside school commitments. Never rush the process!

February 8, 2024

To be quite honest, I couldn't do much at all today due to many external events and overall just a storm of a day. I wasn't feeling too well either, so I decided to treat today as more of a rest and relax kind of day. It's always good to recharge and come back stronger than ever before!

February 9, 2024

Finished up ISLR chapter 8 on random forests & BART. Looking forward to working on the problem sets for chapters 7 & 8 on Sunday.
Reviewed NLP lecture 1 notes on word2vec, negative sampling, and the nitty-gritty calculus for finding U & V.
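
For anyone following along, here is a rough sketch of the negative-sampling calculus mentioned above. It assumes the standard skip-gram negative-sampling loss, J = -log σ(u_o · v_c) - Σ_k log σ(-u_k · v_c), for one center vector v_c (a row of V), one true outside vector u_o, and K sampled negative vectors u_k (rows of U). The function names and plain-list vectors are my own choices for illustration, not the assignment's starter code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_sampling_loss(v_c, u_o, negatives):
    """Loss for one (center, outside) pair plus K sampled negative words."""
    loss = -math.log(sigmoid(dot(u_o, v_c)))
    for u_k in negatives:
        loss -= math.log(sigmoid(-dot(u_k, v_c)))
    return loss

def neg_sampling_grad_vc(v_c, u_o, negatives):
    """Gradient of the loss above with respect to the center vector v_c."""
    # True outside word: pulls v_c toward u_o.
    grad = [-(1.0 - sigmoid(dot(u_o, v_c))) * x for x in u_o]
    # Each negative sample: pushes v_c away from u_k.
    for u_k in negatives:
        coeff = 1.0 - sigmoid(-dot(u_k, v_c))  # equals sigmoid(u_k . v_c)
        grad = [g + coeff * x for g, x in zip(grad, u_k)]
    return grad
```

A quick sanity check is to compare the analytic gradient against a finite-difference estimate of the loss; if the two disagree, the derivation has a hole somewhere.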

A steady comeback! Next up: working on NLP assignment 2 and reviewing the next two lecture notes. We're soon going to be adding new features to the website, so keep a lookout for that!

February 10, 2024

10 pages ISLR chapter 9 on Support Vector Machines.
Completed the theory portion of NLP assignment 2, this time working through the cross-entropy loss derivations myself by hand, with and without negative sampling. It all adds up now, except for a few minor gaps in the math that I plan on figuring out tomorrow.
Started on my Chinese known words classifier again after a huge realization that I was just using the wrong metric to judge my model the entire time! No wonder I was so dejected after finding out the accuracy was lower than the baseline null model. With a heavily imbalanced dataset, I should have been using metrics like AUC and F1 that actually reflect performance on the minority class.
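
To make the metric point concrete, here is a small sketch with made-up numbers (not my actual classifier or data): 5 positives out of 100 examples, a null model that always predicts the majority class, and a hypothetical model that finds 4 of the 5 positives at the cost of 8 false alarms. In this toy setup the null model gets 0.95 accuracy but an F1 of 0, while the hypothetical model gets a lower accuracy of 0.91 and an F1 of about 0.47.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the rare class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0 when there are no positives predicted or present.
    return 0.0 if denom == 0 else 2 * tp / denom

# Toy, invented labels: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95
null_pred = [0] * 100                               # always predicts the majority class
model_pred = [1, 1, 1, 1, 0] + [1] * 8 + [0] * 87   # hypothetical model output

print("null :", accuracy(y_true, null_pred), f1_score(y_true, null_pred))
print("model:", accuracy(y_true, model_pred), f1_score(y_true, model_pred))
```

Accuracy rewards the null model for ignoring the rare class entirely, which is exactly why it was misleading me.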

Can't wait to produce a burst of progress on my project now! It's always the most unexpected times that roadblocks clear up, and it's up to you to grab ahold of that opportunity.

February 11, 2024

During our CLOV session today:

Worked through ISLR problem sets for chapters 7 and 8.
Re-derived assignment 2 equations and finally clarified our understanding fully.

February 12, 2024

Finished ISLR chapter 9 on SVMs! Honestly, I really wish Hastie had talked more about the linear algebra behind the kernel trick, but that's to be expected of an introductory text. ESL is definitely a reference to look forward to when it comes to these mathy bits!
Finished NLP assignment 3 on dependency parsing.
Read the lecture notes for NLP lecture on dependency parsing, just to recall the background to complete the homework.

Keeping up the daily ISLR + NLP combo while making sure school doesn't slip through my fingers!

February 13, 2024

ISLR 10 pages of the deep learning chapter. This is all super basic and very much a quick review, so unfortunately not much to gain from this chapter. But the main reason we started this book was to understand the classical ML methods better, like GLMs.
Read NLP notes on GloVe, backpropagation, and RNNs (about 45 pages total). I still feel sorta iffy on backprop as much as it hurts to say, but that's why we're going to review it thoroughly on Sunday!

Feels good to have caught up content-wise up until transformers (that's where the real fun begins). Looking forward to starting assignment 4 and absolutely locking in for transformers & attention.

February 14, 2024

ISLR 10 pages
Finished the theory portion of NLP assignment 4 to truly understand seq2seq attention. As I'm mainly trying to grok the theoretical aspects of this course, I just skimmed through the code to get a rough understanding of how it translates. The specifics can be easily looked up when we begin our capstone project (still to be decided).
Watched NLP lecture 8. Went over the basics of attention and project advice.

Can't wait to review transformers tomorrow!

February 15, 2024

ISLR 10 pages, finished chapter on deep learning. Only 3 more chapters left! Can't believe we're already near the goal post.
Went through The Illustrated Transformer article. This was a fantastic read and really solidified my understanding of the architecture. I mean, it was shocking how simply they boiled it down with colorful diagrams. Of course though, tomorrow is for diving much deeper into the equations.
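
As a note to self, the core equation of the architecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Below is a minimal single-head sketch in plain Python, under my own simplifying assumptions: no masking, no learned projection matrices, and small lists instead of tensors. Multi-head attention runs several of these in parallel on separately projected Q, K, and V and concatenates the results; that part is not shown here. The function names are just mine for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: each query gets a weighted average of the values."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy example: one query attending over two key/value pairs.
outputs, weights = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(weights[0], outputs[0])
```

The query lines up with the first key, so the first value gets the larger weight; the weights for each query always sum to 1.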

Consistent steps towards where we want to be! Nothing feels better than comparing yourself to none other than yourself.

February 16, 2024

5 pages of ISLR on survival analysis
Reviewed lecture notes from CS224N on transformers.
Reviewed pretraining slides from the lecture.

A little less emphasis on ISLR today, as I grew more fascinated with the new era of transformers. The announcement of Sora has made me really interested in diffusion as well, so I'm looking forward to part 2 of the fast.ai course!

February 17, 2024

ISLR 10 pages starting chapter on unsupervised learning. Decided it would be best to return to the highly dense survival analysis chapter after finishing the last two chapters.
NLP Lecture 12 on Question Answering with LSTMs and BERT. Lots of great information from Danqi Chen.

During our meetup today:

Reviewed RNN forward and backward prop math, and GRU + LSTM gates.
Reviewed seq2seq attention theory from assignment 4

Today was the Axxess hackathon, and we looked around to see if we could turn it into a worthwhile opportunity for our CLOV improvement journey. We ultimately came to the realization that it would come down to wrapping GPT. We believe that this won't lead us to where we want to be, but we definitely found points of interest to dig deeper into theory-wise. Vision transformers, we're coming for you soon!

Changelog

Temporarily pausing personal projects in order to fully concentrate on working through Stanford's (quite rigorous) CS224N course and ISLR.

Short-term goals

By next week, we should be finished with ISLR!
Going at a pace of 1 lecture a day, I should complete the rest of the CS224N lectures by next week as well.
For practice, we will be working on assignment 5. The final capstone project will be decided next Sunday.
Revamp this website! A few more improvements and some polishing wouldn't hurt, as we plan on sharing our roadmap with the public.