Comparing manual vs. AI requirements gathering: 2 sentences vs. 127-point spec

We took a vague 2-sentence client request for a "Team Productivity Dashboard" and ran it through two different discovery processes: a traditional human analyst approach vs an AI-driven interrogation workflow.

The results were uncomfortable. The human produced a polite paragraph summarizing the "happy path." The AI produced a 127-point technical specification that highlighted every edge case, security flaw, and missing feature we usually forget until Week 8.

Here is the breakdown of the experiment and why I think "scope creep" is mostly just discovery failure.

The Problem: The "Assumption Blind Spot"

We’ve all lived through the "Week 8 Crisis." You’re 75% through a 12-week build, and suddenly the client asks, "Where is the admin panel to manage users?" The dev team assumed it was out of scope; the client assumed it was implied because "all apps have logins."

Humans have high context. When we hear "dashboard," we assume standard auth, standard errors, and standard scale. We don't write it down because it feels pedantic.

AI has zero context. It doesn't know that "auth" is implied. It doesn't know that we don't care about rate limiting for a prototype. So it asks.

The Experiment

We fed the same input to a senior human analyst and an LLM workflow acting as a technical interrogator.

Input: "We need a dashboard to track team productivity. It should pull data from Jira and GitHub and show us who is blocking who."

Path A: Human Analyst
- Output: ~5 bullet points.
- Focused on: The UI and the "business value."
- Assumed: Standard Jira/GitHub APIs, single tenant, standard security.
- Result: A clean, readable, but technically hollow summary.

Path B: AI Interrogator
- Output: 127 distinct technical requirements.
- Focused on: Failure states, data governance, and edge cases.
- Result: A massive, boring, but exhaustive document.

The Results

The volume difference (5 vs. 127) is striking, but the content difference is what matters. The AI explicitly defined requirements that the human analyst missed entirely:

- Granular RBAC: "What happens if a junior dev tries to delete a repo link?"
- API Rate Limits: "How do we handle 429 errors from GitHub during a sync?"
- Data Retention: "Do we store the Jira tickets indefinitely? Is there a purge policy?"
- Empty States: "What does the dashboard look like for a new user with 0 tickets?"
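To make the rate-limit question concrete, here is a minimal Python sketch of one possible answer: retry on HTTP 429, honoring a numeric `Retry-After` header when present and otherwise backing off exponentially with jitter. It is not tied to any GitHub client library; `fetch_with_backoff`, `retry_delay`, and the `(status, headers)` response shape are hypothetical names for illustration. GitHub can also signal rate limiting in other ways (for example a 403 status or an `x-ratelimit-reset` header), which this sketch ignores.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (status_code, headers). A stand-in for a real
# HTTP client; no particular GitHub library is assumed.
Response = Tuple[int, Mapping[str, str]]


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt.

    Prefers the server's Retry-After header (only the delay-in-seconds form is
    handled here); otherwise uses capped exponential backoff with full jitter.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)
    return random.uniform(0, min(cap, base * 2 ** attempt))


def fetch_with_backoff(fetch: Callable[[], Response], max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `fetch` until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers = fetch()
        if status != 429:
            return status, headers
        # Don't wait after the final attempt; we're about to give up anyway.
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Making `sleep` injectable lets a test, or a sync job with its own scheduler, control the waiting instead of blocking a thread. What the sync should do after the final attempt (give up, queue, or alert) is exactly the kind of decision the spec item forces someone to make.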

The human spec implied these were "implementation details." The AI treated them as requirements. In my experience, treating RBAC as an implementation detail is exactly why projects go over budget.
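A minimal sketch of writing RBAC down as a requirement rather than leaving it as an implementation detail. The roles, the actions, and the choice that a `DEVELOPER` (standing in for the "junior dev") may add but not delete repo links are all hypothetical; the point is that the permission matrix becomes explicit data a client can review and sign off on.

```python
from enum import Enum, auto


class Role(Enum):
    VIEWER = auto()
    DEVELOPER = auto()  # hypothetical stand-in for "junior dev"
    ADMIN = auto()


class Action(Enum):
    VIEW_DASHBOARD = auto()
    ADD_REPO_LINK = auto()
    DELETE_REPO_LINK = auto()
    MANAGE_USERS = auto()


# Hypothetical permission matrix. Writing it down forces the
# "can a junior dev delete a repo link?" decision to be made explicitly.
PERMISSIONS: dict[Role, frozenset[Action]] = {
    Role.VIEWER: frozenset({Action.VIEW_DASHBOARD}),
    Role.DEVELOPER: frozenset({Action.VIEW_DASHBOARD, Action.ADD_REPO_LINK}),
    Role.ADMIN: frozenset(Action),  # every defined action
}


def is_allowed(role: Role, action: Action) -> bool:
    """Deny by default: anything not listed for the role is forbidden."""
    return action in PERMISSIONS.get(role, frozenset())
```

Denying by default means a role or action nobody thought about fails closed, which surfaces the missing decision instead of silently allowing it.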

Trade-offs and Limitations

To be fair, reading a 127-point spec is miserable. There is a serious signal-to-noise problem here.

- Bloat: The AI can be overly rigid. It suggested a microservices architecture for what should be a monolith. It hallucinated complexity where none existed.
- Paralysis: Handing a developer a 127-point list for a prototype is a great way to kill morale.
- Filtering: You still need a human to look at the list and say, "We don't need multi-tenancy yet, delete points 45-60."

However, I'd rather delete 20 unnecessary points at the start of a project than discover 20 missing requirements two weeks before launch.
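The filtering step can be sketched as a simple triage pass, assuming each generated requirement has been tagged with a category (the `Requirement` type, `triage` function, and category names here are hypothetical). Instead of deleting points outright, this sketch parks them, so the "we don't need multi-tenancy yet" decision stays on record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    number: int    # position in the generated spec, e.g. 45
    category: str  # hypothetical tag, e.g. "multi-tenancy", "rbac"
    text: str


def triage(reqs: list[Requirement],
           deferred: set[str]) -> tuple[list[Requirement], list[Requirement]]:
    """Split requirements into (kept, parked) by category.

    Parked items are returned rather than dropped, so they can be revisited
    when the prototype grows into a product.
    """
    kept = [r for r in reqs if r.category not in deferred]
    parked = [r for r in reqs if r.category in deferred]
    return kept, parked
```

For example, triaging with `deferred={"multi-tenancy"}` keeps the rate-limit and empty-state items in scope while moving the tenancy items to a "later" list instead of losing them.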

Discussion

This experiment made me realize that our hatred of writing specs, and our reliance on "implied" context, is a major source of technical debt. The AI is useful not because it's smart, but because it's pedantic enough to ask the questions we think are too obvious to ask.

I’m curious how others handle this "implied requirements" problem:

1. Do you have a checklist for things like RBAC/Auth/Rate Limits that you reuse?
2. Is a 100+ point spec actually helpful, or does it just front-load the arguments?
3. How do you filter the "AI noise" from the critical missing specs?
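For question 1, one lightweight form of a reusable checklist is a map from category to keywords, plus a check that reports which categories a draft spec never mentions. This is a crude, hypothetical sketch: the categories mirror the ones above, the keywords are assumptions, and keyword matching can only flag likely gaps for a human to review, not prove coverage.

```python
import re

# Hypothetical reusable checklist: category -> keywords suggesting the spec
# addresses it. Tune the keywords per team; these are only examples.
CHECKLIST: dict[str, tuple[str, ...]] = {
    "auth": ("login", "sso", "oauth", "authentication"),
    "rbac": ("role", "permission", "rbac"),
    "rate limits": ("rate limit", "429", "backoff"),
    "data retention": ("retention", "purge", "delete after"),
    "empty states": ("empty state", "no data", "0 tickets"),
}


def missing_topics(spec: str,
                   checklist: dict[str, tuple[str, ...]] = CHECKLIST) -> list[str]:
    """Return checklist categories with no keyword match in the spec.

    Matching is case-insensitive and anchored only at the start of a word,
    so "role" matches "roles" but not "parole".
    """
    text = spec.lower()
    return [
        topic
        for topic, words in checklist.items()
        if not any(re.search(rf"\b{re.escape(w)}", text) for w in words)
    ]
```

Run against a draft like "Users log in via OAuth. Admins manage roles. Retry on 429 with backoff.", it would flag data retention and empty states as topics nobody has written down yet.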

If anyone wants to see the specific prompts we used to trigger this "interrogator" mode, I'm happy to share them in the comments.

2 points | by thesssaism 11 hours ago

3 comments

  • muzani 2 hours ago
    Movie screenwriters seem to have found the perfect balance.

    A character description looks like this: "APOLLO CREED. Creed is twenty-eight years old. He is a tall, smooth-muscled Black with barely a scar on his light coffee-colored face."

    It includes everything important for casting this guy. It doesn't say what he's wearing, his hairstyle, or details like a wedding ring. Once they cast an actor, the actor fills in the rest.

    Fight scenes are designed to be as fast to read as the action itself. Poor writing is something like, "Terrorists B and C fire RPGs at the van. The van makes evasive maneuvers. After the third rocket, the van flips off road." It makes neither the story nor the scene clearer.

    The better script: "The van takes evasive maneuvers to dodge the RPGs. BLAM! BOOM! BANG! The van flips off road."

    Maybe when filming, they realize two rockets make more sense. Leave the implementation details to the experts.

    However, dialogue forms a large part of these scripts. Dialogue is engineered by writers, right down to the syllables. (Funnily enough, AI screenwriters often forget syllables exist, and you can tell because their lines are difficult to actually speak.)

    What's the purpose of the spec? Instructions? To iron out risks and roadblocks? The document should aim for the bare minimum for that. What's your "dialogue" part - the thing that you need analysts to plan out precisely?

  • mackatsol 8 hours ago
    Both are bad IMHO... a senior human analyst who accepts a 2-sentence product description? What kind of weird reality is that coming from? I've had a client do that too, but it's the analyst's job to ask all the pointed follow-up questions so they end up with a proper requirements list. That ends up being multiple pages long... feed that to the AI! I agree the AI as described above has bigger coverage, but it's not doing a better job; it's being just as lazy and adding a ton of filler to a lousy prompt. Yeah, this set me off. Great topic! Looking forward to reading the discussion. :)
  • allinonetools_ 7 hours ago
    I have seen this play out on real projects. The missing edge cases are usually what cause delays, not the main features. Using AI as a checklist and then trimming it down with human judgment seems to work better than relying on assumptions alone.