9 comments

  • singpolyma3 58 minutes ago
    Love this.

    It says MIT license, but then the README has a separate section on prohibited use that maybe adds restrictions to make it nonfree? Not sure of the legal implications here.

    • CGamesPlay 10 minutes ago
      For reference, the MIT license contains this text: "Permission is hereby granted... to deal in the Software without restriction, including without limitation the rights to use". So the README containing a "Prohibited Use" section definitely creates a conflicting statement.
    • jandrese 11 minutes ago
      The "prohibited uses" section seems to be basically "not to be used for crime", which probably doesn't have much legal weight one way or another.
    • Buttons840 15 minutes ago
      Good question.

      If a license says "you may use this, you are prohibited from using this", and I use it, did I break the license?

  • armcat 1 hour ago
    Oh this is sweet, thanks for sharing! I've been a huge fan of Kokoro and even set up my own fully-local voice assistant [1]. Will definitely give Pocket TTS a go!

    [1] https://github.com/acatovic/ova

    • gropo 50 minutes ago
      Kokoro is better for TTS by far

      For voice cloning, Pocket TTS is walled so I can't tell

      • echelon 14 minutes ago
        What are the advantages of PocketTTS over Kokoro?

        It seems like Kokoro is the smaller model, also runs on CPU in real time, and is more open and fine tunable. More scripts and extensions, etc., whereas this is new and doesn't have any fine tuning code yet.

        I couldn't tell an audio quality difference.

    • amrrs 1 hour ago
      Thanks for sharing your repo, looks super cool. I'm planning to try it out. Is it based on MLX or just HF transformers?
      • armcat 1 hour ago
        Thank you, just transformers.
  • dust42 1 hour ago
    Good quality, but unfortunately it is English only.
    • phoronixrly 1 hour ago
      I echo this. For a TTS system to be in any way useful outside the tiny population of the world that speaks exclusively English, it must be multilingual and dynamically switch between languages pretty much per word.

      Cool tech demo though!

      • Levitz 17 minutes ago
        But it wouldn't be for those who "speak exclusively English", rather, for those who speak English. Not only that but it's also common to have system language set to English, even if one's language is different.

        There are about 1.5B English speakers on the planet.

      • kamranjon 27 minutes ago
        That's a pretty crazy requirement for something to be "useful", especially something that runs so efficiently on CPU. Many content creators from non-English-speaking countries can benefit from this type of release by translating transcripts of their content to English and then running them through a model like this to dub their videos in a language that can reach many more people.
      • echelon 10 minutes ago
        English has more users than all but a few products.
  • lukebechtel 1 hour ago
    Nice!

    Just made it an MCP server so claude can tell me when it's done with something :)

    https://github.com/Marviel/speak_when_done

  • tschellenbach 1 hour ago
    It's cool how lightweight it is. Recently added support to Vision Agents for Pocket. https://github.com/GetStream/Vision-Agents/tree/main/plugins...
  • syntaxing 1 hour ago
    Is there something similar for STT? I’m using Whisper distill models and they work OK, but sometimes they get what I say completely wrong.
  • GaggiX 1 hour ago
    I love that everyone is making their own TTS model, as they are not as expensive to train as many other models. Also, there are plenty of different architectures.

    Another recent example: https://github.com/supertone-inc/supertonic

  • oybng 36 minutes ago
    >If you want access to the model with voice cloning, go to https://huggingface.co/kyutai/pocket-tts and accept the terms, then make sure you're logged in locally with `uvx hf auth login` lol
  • snvzz 1 hour ago
    Relative to AmigaOS translator.device + narrator.device, this sure seems bloated.