Running local LLMs offline on a ten-hour flight

(deploy.live)

63 points | by darccio 3 hours ago

12 comments

  • vladgur 1 hour ago
    That window seat with the 14” laptop seems extremely claustrophobic.

    That’s the real limitation on an economy flight - space rather than power or the internet… at least it would be for me.

    The only times I was able to get my laptop out and do some productive work were when I was either sitting in a premium economy aisle seat with room to spare or when there was an empty seat next to me.

    • bs7280 54 minutes ago
      I have a 16" M1 Max that I only got because it was $1500 cheaper than MSRP, and it sucks on planes. I have really long arms and I can barely get it out of my bag without elbowing my neighbor.

      A few years ago I saw some very interesting custom ergonomic setups optimized for traveling + flying.

      One person with a thinkpad is able to get the monitor to be 180 degrees flat w/ the keyboard, and can hang it off the seat. He also brings a split ergo keyboard with a lap mount.

      Another person did something similar with an M1 laptop, but needs an iPad to act as the external monitor (laptop stays in bag), with a split ergo keyboard he built and designed from scratch.

    • rootusrootus 1 hour ago
      I'd probably choose the window seat myself, because while it is cramped, it is predictably so. When I sit in an aisle seat, it's not as cramped but I regularly get shoulder checked by passing people or beverage carts.

      What really makes me nervous if I'm in an economy seat is the seat in front of me. Depending on how the seat is designed, if the person suddenly reclines (or hell, just flexes the seat a bunch while moving around), it can come pretty close to pinching the laptop screen. That would be bad news.

      • ryandrake 1 hour ago
        That was the first thing I thought of when I saw the image. That's a very expensive computer that you risk destroying when the 300lb guy in front of you decides to lean back.

        The ergonomics of using a laptop on an economy-class tray table are not worth it. You're sitting there like a T-rex trying to make your arms as small as possible to tap on the keys. And the vertical viewing angle to your screen sometimes prevents you from even seeing anything. I wouldn't even bring my laptop out during a flight.

        • sweetjuly 28 minutes ago
          >The ergonomics of using a laptop on an economy-class tray table are not worth it. You're sitting there like a T-rex

          The trick I've found is to pack a bluetooth keyboard. If you put your laptop on the tray table, you can put the bluetooth keyboard on your legs _under_ the tray table and have your arms fully and comfortably extended. This works especially well if you're a vim/emacs/other keyboard-driven editor user, as you very rarely need to reach up to poke the trackpad.

        • walthamstow 1 hour ago
          In the image it's on his lap, not the tray table. I agree, using the tray is not worth it. The ideal is a tray that folds in half so I can use that to hold a drink and keep the machine on my lap.

          The tradeoff of poor comfort is insane productivity, for me anyway. Being restricted in place, no wifi, inconvenient toilet breaks, not in control of meal times, all means I get a lot of work done

    • zdw 53 minutes ago
      That's a 16" (from the size of the speaker grille on each side of the keyboard), so even more claustrophobic.
    • stavros 37 minutes ago
      I got some Xreal glasses and it's made flights so much more enjoyable. I can watch movies or work on something lying back, and the "screen" looks massive.
      • JSR_FDED 35 minutes ago
        I’ve been so tempted but some of the reviews say it’s not good for reading code. What’s been your experience? What is the effective resolution of the screen you get? Is it sharp enough for coding?
        • mrbonner 2 minutes ago
          Don't listen to anyone saying it is fine for reading or writing extensively with the Xreal. I have one and it is a PITA to do that over a long period. You're better off just sticking with watching videos or playing games with it.
        • stavros 31 minutes ago
          It's a definite "it depends". The resolution is fine, but I think it's more about the specific pair of glasses you get? I got the same model three times (long story), and the first two were fine, the third has some blurring in the middle of the right eye.

          It's also uncomfortable to look at the very bottom of the screen (which is where all the chat text boxes are), so I usually resize all my windows to be a bit smaller. With that, it's very good (and you can always just increase the font size).

          I would like glasses with smaller fov, so I didn't have to look around so much, but that's probably just me, since everyone else likes them larger.

  • scastiel 1 hour ago
    Interesting, I did and documented the same kind of experiment a few months ago [1]. It looks like so much has changed since then!

    [1] https://betweentheprompts.com/40000-feet/

  • deanc 1 hour ago
    This has been exactly my experience too. I've tried multiple harnesses (pi, claude code, codex) with multiple variants of qwen3.6 and gemma4, driven by both mlx and ollama - and every single time I try to do anything meaningful I end up in a loop. On a 64GB MacBook Pro M3 Max.

    I really don't know what the hell people are doing locally, and suspect a lot of the hype around running these models locally is bullshit. Sure, you can make it do something but certainly nothing useful or substantial.

    • NitpickLawyer 1 hour ago
      > a lot of the hype around running these models locally is bullshit. Sure, you can make it do something but certainly nothing useful or substantial.

      There is certainly a lot of hype around local models. Some of it is overhype, some of it is just "people finding out" and discovering what cool stuff you can do. I suspect the post is a reply to the other one a few days ago where someone from hf posted a pic of themselves on the plane, using a local model, and saying it's really really close to Opus. That was BS.

      That being said, I've been working with local LMs since before chatgpt launched. The progress we've made from the likes of gpt-j (6B) and gpt-neoX (22B) (some of the first models you could run on regular consumer hardware) is absolutely amazing. It has gone way above my expectations. We're past "we have chatgpt at home" (as it was when launched), and now it is actually usable in a lot of tasks. Nowhere near SotA, but "good enough".

      I will push back a bit on the "substantial" part, and I will push a lot on "nothing useful". You can absolutely get useful stuff out of these models. Not in a claude-code "leave it to cook for 6 hours and get a working product" way, but with a bit of hand holding and scope reduction you can get useful stuff. When devstral came out (24B) I ran it for about a week as a "daily driver" just to see where it's at. It was ok-ish. Lots of hand holding, and I figured out I couldn't use it for planning much (looked fine at a glance, but either didn't make sense or used outdated stuff). But with a better plan, it could handle implementation fine. I coded 2 small services that have been running in prod for ~6mo without any issues. That is useful, imo. And the current models are waaay better than devstral1.

      As to substantial, eh... Your substantial can be someone else's taj mahal, and their substantial could be your toy project. It all depends. I draw the line at useful. If you can string together a couple of useful tasks, it starts to become substantial.

    • ryandrake 1 hour ago
      Same here. Every time a new local model comes out, I give it a spin with a pretty vanilla coding task ("refactor this method to take two parameters instead of one", or "fix this class of compiler warning across the ~20 file codebase") and more often than not, they get in endless loops, or fail in very unusual ways. They don't yet even approach the usefulness of SOTA models. It's obviously not a fair comparison, though. My 20GB GPU is never going to beat whatever enormous backend Google or Anthropic have.
      • 2ndorderthought 1 hour ago
        You can do this with really small models, but you have to do a bit more legwork. I wouldn't expect most trivially small models to handle anything more than 1 file reliably. The new qwen 3.6 is different though; I have heard cases where it behaves close to sonnet.

        That said, I don't see why people are so scared to touch code even if it saves them 500 euro a month. Using my IDE's find across my repo and auto-replacing 2 patterns is trivial and way faster to do by hand. I mostly use small models; it prevents a lot of the issues I've seen with large models and vibe/agentic coding medium to long term. I also write a lot of code.

    • proxysna 1 hour ago
      You need to set the sampling parameters for the LLM. I had the same issue with Qwen3.5 when I first started. You can usually grab them off the model card page.

      From Qwen3.6 page:

      Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

      Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

      Instruct (or non-thinking) mode: temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
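
      If you're hitting the model through an OpenAI-compatible endpoint (llama-server, vLLM, etc.) rather than through a harness, here's a rough sketch of passing those settings in the request body - a minimal example, assuming a server already listening on localhost:8080 and a model named "qwen3.6" (both are placeholders for your own setup):

        import requests

        # Hypothetical local endpoint; llama-server and vLLM both expose /v1/chat/completions.
        URL = "http://localhost:8080/v1/chat/completions"

        payload = {
            "model": "qwen3.6",  # placeholder; use whatever name your server reports
            "messages": [{"role": "user",
                          "content": "Refactor this function to take two parameters."}],
            # "Thinking mode for precise coding tasks" settings from the model card:
            "temperature": 0.6,
            "top_p": 0.95,
            "top_k": 20,                # not in the OpenAI spec, but llama-server/vLLM accept it
            "min_p": 0.0,
            "presence_penalty": 0.0,
            "repetition_penalty": 1.0,  # some servers call this repeat_penalty instead
        }

        resp = requests.post(URL, json=payload, timeout=600)
        print(resp.json()["choices"][0]["message"]["content"])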

      • deanc 1 hour ago
        Yes, I have tried all of these (as per the docs). Have you actually tried them? Because I have tried all 3 configurations you mentioned with agentic coding and get the same result - loops.
        • proxysna 7 minutes ago
          I've used only Qwen3.5 so far for work and was, after initial struggles, successful with a GPU setup, no mlx. Ngl, the fact that they are using `presence_penalty: 0` and no `max_tokens` is weird, since that exact setup caused my "initial struggles", but I've set up a simple docker-compose with vllm and qwen3.6 right now to test it out and it worked perfectly fine for me.

          Gist with the compose and example of an output. https://gist.github.com/meaty-popsicle/f883f4a118ff345b430c3...

    • mft_ 1 hour ago
      I’m frequently surprised how little I can find online about exactly this - different harnesses for local models and how to set them up. The documentation for opencode with local models is (IMO) pretty bad - and even Claude Opus (!) struggled to get it running. And so far I’ve not found a decent alternative to Claude Desktop.

      (I've recently discovered that you can pipe local models into Claude Code and Claude Desktop, so this is on my list to try.)

      • 2ndorderthought 55 minutes ago
        Qwen3.6 is brand new. But also, search engines are so plastered with AI slop written by tools and companies that have no interest in you using local models. Ollama makes it 1 command to run local small models, but with the newest ones there can be kinks to work out first.

        r/localllama is okay for some information, but beyond that there is so much noise and very little signal. I think it's intentional.

    • 2ndorderthought 1 hour ago
      In the article the author describes what they made. It's definitely not bullshit, but it's also not as reliable or as hands-free as the 1T models.

      For people who aren't completely vibe or agent coding, these models are better than, say, Copilot or the free models appearing after a Google search. Probably better than ChatGPT's flagships in some ways.

      I mostly use 4B to 9B models for basic inquiries and code examples from libraries I haven't used before. Many of them can solve pretty hard math problems, and these are several steps away from, say, qwen3.6.

      I would not discount running models locally. It's the best case scenario of a future with LLMs from a human rights and ecological perspective.

    • xienze 53 minutes ago
      It's probably a combination of things:

      * New models running in llama.cpp (what's under the hood of ollama et al) frequently require bug fixes.

      * The GGUF models that run in llama.cpp frequently require bug fixes (Unsloth is notorious for this -- they release GGUF models about 10 minutes after official .safetensors releases).

      * You're probably running a <Q8 quantization of the model, and there's a good chance a <BF16 quantization for the KV cache. This makes for compounding issues as context grows and tool calls multiply.

      Local models really are great, but I think a major problem is the people in groups like r/localllama who run models at absurd quantization levels in order to cram them onto their underpowered hardware and convince themselves that they're running SOTA at home.

      The best way to run these models is, frankly, a lot of VRAM and vLLM (which is what the people developing these models are almost certainly targeting).

  • mumbisChungo 42 minutes ago
    >Qwen 4.6 36B

    Did the author mean Qwen3.6-27B? Qwen3.6-35B-A3B?

  • builderminkyu 1 hour ago
    Tried doing exactly this with ollama on a cross-country flight last month. My macbook basically turned into a jet engine and the battery died in under an hour.

    Curious if you had to heavily throttle the CPU or stick to super small quants (like 4-bit phi3) to actually make it through 10 hours without a power outlet?

  • seattle_spring 14 minutes ago
    With more and more flights offering Starlink, I don't see why this would really ever be necessary.

    Also, agreed with the other commenters: just read a damn book and take a nap.

    • AntiUSAbah 4 minutes ago
      I hope Starlink will not be the default.

      Not only shouldn't we support someone like Elon Musk, but also don't you find it hypocritical to respond with 'just read a damn book' while suggesting Starlink?

  • walrus01 35 minutes ago
    As much as it's a fun gimmick to run a relatively good-sized LLM like qwen 3.6 35B locally, I would much rather have the ability to run it remotely, on a piece of hardware I control, via a VPN session. Much better on battery life and heat. If I'm on an airplane I care about having as much battery life as possible.

    Let's say you have a basic setup like llama.cpp and llama-server on a remote server (even if it's just sitting under your home office desk) running a 35GB Q8-quantized qwen 3.6 35B model. It's not difficult to make llama-server available to your laptop over just about any form of internet connection and VPN (rough client sketch below).

    Having the ability to run that same model locally if you really need to, because no internet connection whatsoever is available, is nice - but the times that you simultaneously have no internet and a serious need for something the model can output are fairly rare these days.
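
    A minimal client-side sketch, assuming llama-server is already running on the home box behind a WireGuard/Tailscale address (10.0.0.2 and the model/file names here are made up) - from the laptop it's just another OpenAI-compatible endpoint:

      from openai import OpenAI

      # Hypothetical VPN address of the box under the desk, started with something like:
      #   llama-server -m qwen3.6-35b-q8_0.gguf --host 0.0.0.0 --port 8080
      client = OpenAI(base_url="http://10.0.0.2:8080/v1", api_key="not-needed-locally")

      resp = client.chat.completions.create(
          model="qwen3.6",  # llama-server serves whatever it loaded; the field is still required
          messages=[{"role": "user",
                     "content": "Summarise the tradeoffs of remote vs. local inference."}],
      )
      print(resp.choices[0].message.content)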

  • ddarolfi 1 hour ago
    Qwen 4.6 36B? Do they mean Qwen3.6-35B-A3B?
    • trvz 1 hour ago
      Yes. The author is really sloppy if that wasn’t clear from the article.
    • Johnny_Bonk 1 hour ago
      So I have an RTX 3080 with 10GB VRAM which I've been using with Qwen2.5 Coder and Gemma 4 E2B. I'm wondering what models you have tried, with which quants.
    • mikeatlas 1 hour ago
      yes
  • tamimio 1 hour ago
    Can't wait for more people to do the same and eventually get laptops banned on board due to fear of catching fire...
  • j1000 1 hour ago
    To be honest, I think the ability to work while traveling is a con rather than a perk of current times.
    • HoldOnAMinute 51 minutes ago
      They keep removing the ability for you to have any downtime.
    • ryandrake 1 hour ago
      It hit different at different points in my life. When I was in my 20s I thought "Wow! I get to go on an international trip to a place I've never been, and work is paying for everything?!? I'll go whenever you need me to go!" Now that I'm almost 50, it's "Fuck. Another 14 hour international flight, to somewhere I'll likely only have time to see the inside of two buildings. What's the local language again? Do I drive on the left or right? Wait, how long do I need to stay? Please no."
  • bobro 1 hour ago
    Can’t you guys just read a book and take a nap?
    • 3form 1 hour ago
      I suppose the ones that do wouldn't consider such a turn of events postworthy.
    • ducttape12 21 minutes ago
      Yeah, for real. Imagine being so addicted to the AI slot machine that you can't be without it for 10 hours.
    • cpursley 44 minutes ago
      I'm jealous of people who can actually get comfortable enough to sleep on flights.
      • koolba 33 minutes ago
        With enough drinks and a long enough flight, it’s unavoidable.
        • fernie 26 minutes ago
          The keyword being "comfortable".

          Most certainly avoidable, unfortunately.

    • dude250711 16 minutes ago
      If you nap, then you might end up living in a world where someone else is making the world a better place better than you are.
    • mdni007 41 minutes ago
      But then how can I show random people how productive I am?
    • stavros 42 minutes ago
      Why would I do that when making things is so much fun?
  • bilekas 1 hour ago
    Trying an LLM in the air with a 6,200 EUR laptop... Sorry if it's not exactly relatable.