Ask HN: Why do people think text with em dashes indicates LLM generation?

My understanding of LLMs for text generation is that they are trained on human-authored text. Therefore, any punctuation an LLM generates (like the em dash) is likely to have appeared in the human-written content it was trained on.

I am mostly frustrated because I enjoy using em dashes in my writing, and I don't want to stop using them just because people might assume my content is AI-generated.

3 points | by ronbenton 7 hours ago

3 comments

  • tocs3 6 hours ago
  • mmastrac 7 hours ago
    As humans are squishy, biological, natural AI, we've trained on LLM vs human text and seen that em dashes tend to indicate AI text. Naturally, we also overfit on our dataset.

    If you want to humanize your text, sprinkle in some artificial spacing errors. Group the em dash with the preceding character sometimes. Occasionally switch between the British and American spellings of words.

    • ronbenton 6 hours ago
      >If you want to humanize your text, sprinkle in some artificial spacing errors. Group the em dash with the preceding character sometimes. Occasionally switch between the British and American spellings of words.

      In the words of Michael Bolton from Office Space: "No way! Why should I change? He's the one who sucks!"

      • mmastrac 6 hours ago
        Consider it performance art -- when you write in public, you are an artist playing for an audience, no?
        • ronbenton 6 hours ago
          I hadn't thought about it that way. I suppose if I am concerned with whether people think my writing is AI-generated, then I am "performing" in a sense.
  • Trasmatta 6 hours ago
    Because ChatGPT uses it in almost every single response, and most humans do not.

    It uses the full em dash (—), whereas humans usually type a plain hyphen (-) instead, because that's the character actually on your keyboard.

    It's as simple as that. It doesn't prove something was written by ChatGPT, but it's a useful heuristic.
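To make the character distinction in this last comment concrete, here is a minimal Python sketch that counts the three characters often lumped together as "dashes": the em dash (U+2014), the en dash (U+2013), and the plain hyphen-minus (U+002D) found on a standard keyboard. The function names `dash_counts` and `em_dash_rate` are hypothetical, and normalizing to "em dashes per 1,000 characters" is just one assumed way to turn a raw count into a rate. Like the heuristic described above, it can flag text but cannot prove who or what wrote it.

```python
import unicodedata

# The three dash-like characters discussed in the thread, written as escapes
# so they are unambiguous in source: em dash, en dash, hyphen-minus.
DASH_CHARS = ("\u2014", "\u2013", "-")


def dash_counts(text: str) -> dict[str, int]:
    """Count each dash-like character, keyed by its official Unicode name."""
    # unicodedata.name gives "EM DASH", "EN DASH", and "HYPHEN-MINUS".
    return {unicodedata.name(ch): text.count(ch) for ch in DASH_CHARS}


def em_dash_rate(text: str) -> float:
    """Em dashes per 1,000 characters (hypothetical metric); 0.0 for empty text."""
    if not text:
        return 0.0
    return 1000 * text.count("\u2014") / len(text)


if __name__ == "__main__":
    sample = "Well \u2014 maybe. It's a well-known trick, sort of."
    print(dash_counts(sample))
    print(round(em_dash_rate(sample), 2))
```

A high em dash rate only means the text resembles a pattern; plenty of human writers use the character deliberately, which is exactly the original poster's concern.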