TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

(github.com)

92 points | by meander_water 7 hours ago

5 comments

mishu2 43 minutes ago
Having the ability to do real-time video generation on a single workstation GPU is mind-blowing.
I'm currently hosting a video generation website, also on a single GPU (with a queue), which is also something I didn't even think possible a few years ago (my Show HN from earlier today, coincidentally: https://news.ycombinator.com/item?id=46388819). Interesting times.
codingbuddy 21 minutes ago
We are scarily close to real-time personalization of video, which, if you agree with this NeurIPS paper [1], may lead to someone inadvertently creating “digital heroin”.
[1] https://neurips.cc/virtual/2025/loc/san-diego/poster/121952
- hapticmonkey 13 minutes ago
  > We further urge the machine learning community to act proactively by establishing robust design guidelines, collaborating with public health experts, and supporting targeted policy measures to ensure responsible and ethical deployment
  We’ve seen this play out before, when social media first came to prominence. I’m too old and cynical to believe anything will happen. But I really don’t know what to do about it at a personal level. Even if I refuse to engage with this content, can identify it, and keep my family away from it, it feels like a critical mass of people in my community/city/country are going to be engaging with it. It feels hopeless.
jjcm 4 hours ago
Looks like there is some quality reduction, but nonetheless 2s to generate a 5s video on a 5090 for WAN 2.1 is absolutely crazy. Excited to see more optimizations like this moving into 2026.
- avaer 49 minutes ago
  Efficient realtime video diffusion will revolutionize the way people use computers even more so than LLMs.
  I actually think we are already there with quality, but nobody is going to wait 10 minutes to do a task with video that takes 2 seconds with text.
  If Sora/Kling/whatever ran cool locally 24/7 at 60FPS, would anyone ever build a UI? Or a (traditional) OS?
  I think it's worth watching the scaling graph.
  - IsTom 40 minutes ago
    > If Sora/Kling/whatever ran cool locally 24/7 at 60FPS, would anyone ever build a UI?
    I like my buttons to stay where I left them.
    - pavlov 23 minutes ago
      Yeah, it’s like asking “why would anyone read a book today when LLMs can generate infinite streams of text”
- villgax 4 hours ago
  That’s not the actual time if you run it; encoding and decoding are extra.
  - Lerc 1 hour ago
    Nevertheless, it does seem that generation will fairly soon become fast enough to extend a video clip in real time, autoregressively, second by second. Integrated with a multimodal input model, you would be very close to an AI avatar that would be extremely compelling.
villgax 4 hours ago
I mean, the baselines were deliberately worse and not how anyone would be using these models to begin with (except maybe noobs). And the quoted number covers only the DiT steps, not the encoding and decoding steps, which still take quite a long time. There's no use of FA4/CUTLASS-based kernels or TRT at any point.
redundantly 3 hours ago
Now if someone could release an optimization like this for the M4 Max, I would be so happy. The last time I tried generating a video, it took something like an hour for a 480p 5-second clip.