TL;DR: Train an "energy" model that checks whether an output is correct (rather than just generating an output), then run gradient descent on the output to find good (low-energy) outputs. The energy model is a transformer.
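To make the "gradient descent on the output" part concrete, here's a minimal sketch that is NOT the paper's method. The energy function is a hypothetical hand-written toy (squared distance to the "correct" answer y = 2 * x) standing in for a trained transformer, and its gradient is written by hand where a real model would use autograd. The point is only the inference loop: start from a guess and repeatedly step the output downhill on the energy.

```python
from typing import List


def energy(x: List[float], y: List[float]) -> float:
    """Hypothetical toy energy: low means "y is a good output for x".

    Assumption for illustration: the correct answer is y = 2 * x.
    A real energy-based transformer would compute this scalar with a
    trained network instead.
    """
    return sum((yi - 2.0 * xi) ** 2 for xi, yi in zip(x, y))


def energy_grad_y(x: List[float], y: List[float]) -> List[float]:
    """Gradient of the toy energy with respect to the output y.

    Derived by hand here; with a neural energy model it would come
    from automatic differentiation.
    """
    return [2.0 * (yi - 2.0 * xi) for xi, yi in zip(x, y)]


def infer(x: List[float], steps: int = 100, lr: float = 0.1) -> List[float]:
    """Produce an output by gradient descent on the energy."""
    y = [0.0] * len(x)  # initial guess for the output
    for _ in range(steps):
        g = energy_grad_y(x, y)
        # Move the output (not the model weights) downhill on the energy.
        y = [yi - lr * gi for yi, gi in zip(y, g)]
    return y


if __name__ == "__main__":
    x = [1.0, -0.5, 3.0]
    y = infer(x)
    print([round(v, 4) for v in y])
    print(energy(x, y))
```

Training is the other half and is not shown: the model's weights are learned so that correct outputs get low energy. At inference time the weights are frozen and only the output is optimized, which is what lets the model spend more steps "thinking" about harder inputs.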
I've seen some of that channel's videos before, and many of them contain errors. I haven't read the Energy-Based Transformers paper yet, so I can't say for sure whether this video contains any, but be careful.
https://alexiglad.github.io/blog/2025/ebt/
Also, see:
https://www.reddit.com/r/MachineLearning/comments/1lu1ia0/r_...