Having written a slightly more involved version of this recently myself, I think you did a great job of keeping this compact while still readable. This style of library requires some design for sure.
Supporting higher order derivatives was also something I considered, but it’s basically never needed in production models from what I’ve seen.
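For anyone curious what supporting higher-order derivatives actually involves, here's a minimal sketch in plain PyTorch (not the OP's library): you keep the graph of the first backward pass alive with create_graph=True, then differentiate the resulting gradient again.

    import torch

    x = torch.tensor(2.0, requires_grad=True)
    y = x ** 3

    # first derivative dy/dx = 3x^2; create_graph=True keeps the graph
    # so the gradient itself can be differentiated again
    (dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

    # second derivative d2y/dx2 = 6x
    (d2y_dx2,) = torch.autograd.grad(dy_dx, x)

    print(dy_dx.item(), d2y_dx2.item())  # 12.0 12.0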
Imho, we should let people experiment as much as they want. Having more examples is better than fewer. Still, thanks for the link to the course; it's a top-notch one.
Cleaner, more straightforward, more compact code, and it can be considered complete in its scope (i.e. implement backpropagation with a PyTorch-y API and train a neural network with it). MyTorch appears to be the author's self-experiment without a concrete vision/plan. That's better for the author but worse for outsiders/readers.
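To make that scope concrete, this is roughly what "backpropagation with a PyTorch-y API plus a training loop" looks like. The sketch below uses the class and method names from Karpathy's micrograd (MLP, zero_grad, parameters); MyTorch's API may differ.

    from micrograd.nn import MLP

    xs = [[2.0, 3.0], [1.0, -1.0], [0.5, 1.0]]
    ys = [1.0, -1.0, 1.0]

    model = MLP(2, [8, 8, 1])  # 2 inputs -> two hidden layers -> 1 output

    for step in range(100):
        # forward pass and mean squared error loss
        preds = [model(x) for x in xs]
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys))

        # backward pass: backpropagation through the scalar graph
        model.zero_grad()
        loss.backward()

        # plain SGD update on the learned parameters
        for p in model.parameters():
            p.data -= 0.05 * p.grad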
P.S. Course goes far beyond micrograd, to makemore (transformers), minbpe (tokenization), and nanoGPT (LLM training/loading).
Ironically, the reason Karpathy's is better is that he livecoded it, so I can be sure it's not some LLM vomit.
Unfortunately, we are now inundated with newbies posting their projects/tutorials/guides in the hopes that doing so will catch the eye of a recruiter and land them a high-paying AI job. That's not so bad in itself, except that most of these people are completely clueless and posting AI slop.
It arguably reads cleaner than Karpathy's in some respects, as he occasionally gets a little ahead of his students with his '1337 Python skillz.