Project Glasswing: An Initial Update

(anthropic.com)

170 points | by louiereederson 2 hours ago

27 comments

mdeeks 1 hour ago
You can get a taste of this today yourself with Codex Security. I turned it on just as an experiment and in less than a week it has now become essential to all of us. I was shocked how accurate it is, how many security issues it found in existing code, how it continually finds them as we commit, and how NO ONE is immune from making these mistakes.
I'd say it is about 90% accurate for us. Often even the "Low" findings lead us to dig and realize it is actually exploitable. Everyone makes these mistakes, from the most junior to the most senior. They are just a class of bugs after all.
I expect tools like this to be a regular part of the development lifecycle from here on. We code with AI, we review with AI, we search for vulns with AI. Even if it isn't perfect, it is easily worth the cost IMHO. Highly recommend you get something enabled for your own repos ASAP
- winstonwinston 1 hour ago
  > I expect tools like this to be a regular part of the development lifecycle from here on. We code with AI, we review with AI, we search for vulns with AI. Even if it isn't perfect, it is easily worth the cost IMHO.
  So, how is that supposed to work? Claude Code generates security bugs, then Claude Security finds them, then Claude Code generates a fix, spend tokens, profit?
  - ygjb 51 minutes ago
    Yeah, with a budget assigned. This is actually just software development and security right?
    Developers create software, which has bugs. Users (including bad guys, pen testers, QA folks, automated scans etc, etc, etc) find bugs, including security bugs. Developers fix bugs and maybe make more. It's an OODA loop, and continues until the developers decide to stop supporting the software.
    Whether that fits into the business model, or the value proposition of spending tokens instead of engineer hours or user hours is fundamentally a risk management decision and whether or not the developer (whether OSS contributor, employee, business owner, etc) wants to invest their resources into maintaining the project.
    While not evenly distributed, and not perfect, the currently available tools and those still behind embargo are absolutely impactful, and yes, they are expensive to operate right now - it may not always be the case, but the "Attacks always get better" adage applies here. The models will get cheaper to run, and if you don't want to pay for engineers or reward volunteers to do the work, then you've got to pay for tokens, or spend some other resource to get the work done.
    - sandeepkd 29 minutes ago
      Somehow this reminded me of the historical government bounties on mouse tails, which were discontinued due to fraud (such as hunters breeding mice to collect the reward). There is a reason why devs and QA keep each other in check. I guess when an LLM is writing the code, one has to use different models for dev and security checks.
      On the other hand, in the real world, developers learn from mistakes and avoid them in the future. However, there is no such feedback loop for enterprises using an LLM under an agreement that the LLM provider will not use the enterprise code for training.
      - ygjb 11 minutes ago
        > the developers learn from mistakes and avoid them in the future
        No. Humans learn from mistakes and try to avoid them in the future, but there is a whole pile of other stuff in the bag of neurons between our ears that prevent us from avoiding repetition of errors.
        I have seen extremely talented engineers write trivial-to-avoid memory corruption bugs because they were thinking about the problem they were trying to solve, and not the pitfalls they could fall into. I would argue that the vast majority of software defects in released code are written by people that know better, but the bug introduced was orthogonal to the problem they were trying to solve, or was for an edge case that was not considered in the requirements.
        Unless you are writing a software component specifically to be resilient against memory corruption, preventing memory corruption issues isn't top of mind when writing code, and that is ok since humans, like the machines we build, have a limit to the amount of context/content/problem space that we can hold and evaluate at once.
        Separately, you don't necessarily need to use different models to generate code vs conduct security checks, but you should be using different prompts, steering, specs, skills and agents for the two tasks because of how the model and agents interpret the instructions given.
  - jimmy2times 58 minutes ago
    The AIs have already figured out how to succeed in a software job:
    1. Ship bugs
    2. Fix them
    3. You're the hero!
    - jimbokun 5 minutes ago
      Dilbert beat you to it:
      https://english.stackexchange.com/questions/488178/what-does...
    - genghisjahn 53 minutes ago
      I thought we were all doing that already?
      - pjmlp 50 minutes ago
        The idea is to take the human out of the loop.
      - flir 51 minutes ago
        Jesus, dude. There are managers reading this.
        - genghisjahn 1 minute ago
          >_<
  - raincole 47 minutes ago
    Humans work like that too. If you're not comfortable with Claude being involved in every step (for whatever reason) then just use different providers for each.
  - jstummbillig 30 minutes ago
    Ngl, watching folks getting irritated about normal employer-employee absurdities from the employer perspective through usage of agents and having to pay for tokens has been a little therapeutic for me.
  - unethical_ban 34 minutes ago
    How is this supposed to work? Humans generate security bugs, then humans find them, then humans generate the fix, profit?
    Yeah. Presumably as AI code generation gets better, the output gets better. As smaller portions of code are stitched together, human/AI systems analyze it holistically to make sure all its integrations are secure and bug free.
    In 2026, different models are better at different things. Cheap models can plan and do small/medium code projects well, more expensive models are even better at architecture and exploit discovery.
  - siva7 56 minutes ago
    So? That's how a business works. We sold you landmines and now you need them removed? Lucky you we also have mine clearance products.
    - 382hi 52 minutes ago
      Exactly!
- mnahkies 24 minutes ago
  One issue I've seen with LLMs is adding superfluous code in the name of "safety" and confidently generating a bunch of stuff that was useful in years gone by, but is now handled correctly by the standard lib. I'm of the opinion that less is more when it comes to code, and find the trend this is introducing quite frustrating.
  How do you avoid this pitfall?
  - appplication 19 minutes ago
    Gosh this couldn’t be more true, which IMO is the real reason LLM workflows are not strictly faster if you care about quality. Otherwise you end up with a codebase where only 60% of it is necessary. Standard testing patterns also tend not to be great at catching this particular flavor of LLM-ism.
- Version467 1 hour ago
  I’ve had the same experience. The ui is a little unclear about this, because it says you have 5 scans, but 1 scan is just the continuous monitoring of the default branch of a repo.
  The high impact findings have almost all been bang on for me. I was especially surprised by the high-quality documentation it produces as well as how narrow the proposed fixes are.
  I’m used to codex producing quite a bit more code than it needs to, but the security model proposed fixes that are frequently <10 loc, targeting exactly the correct place.
  It’s really quite good. I’m assuming it’ll be pretty expensive once out of beta, but as a business I’d be jumping on this.
- 0xAstro 1 hour ago
  I would recommend you to try out the setup with gpt-5.5-cyber as the orchestrator and deepseek-v4-flash or some other fast cheap model as its workers. Getting pretty good results using this setup.
- rmast 59 minutes ago
  I help maintain a project that is used as a dependency by a lot of security tools to handle PE files.
  It’s disappointing that Anthropic and OpenAI never responded to the applications to their respective programs for open source maintainers. From my perspective it seems like their offers are primarily for the shiny well-known projects, rather than ones that get only a few million monthly installs but aren’t able to get thousands of stars due to being “hidden” as a dependency of popular tool.
cpard 1 minute ago
My understanding so far is that Mythos (and any model in general) can produce candidate reasoning but you really need a system around that reasoning that is capable of producing auditable security findings.
So, success is coming not just from the model but also from the harnesses they built around it. The Cloudflare post was more detailed on that front and I wish the rest would share more about it.
The Cisco spec is interesting too, it pretty much describes an architecture of a harness: https://github.com/CiscoDevNet/foundry-security-spec
piker 6 minutes ago
We have been working with the consumer-grade frontier models to develop what we call "lexploits" in legaltech, and they are insanely good at finding bugs across integrated pipelines. They're also surprisingly good at mitigating them!
Security vulnerabilities are one thing, but in legal we offer up a concept of "knowledge security" which goes to protecting the fidelity of the agent's legal context. Software bugs seem much more tractable because they're managed by software engineers, as opposed to the pipeline "vulnerabilities" we're finding. We wrote a little about one vector here where legal documents aren't quite what they seem: https://tritium.legal/blog/noroboto
No doubt there are many such knowledge domains exposed today. These are more concerning because they're understaffed and managed by non-technical people for the most part. No Mythos required.
Amekedl 1 minute ago
I don't buy it. A lot of stuff this finds is also just simply wrong, benignly reported as true, despite upper/lower layers in the code burying the possibility of a vulnerability actually being exploited. It's a performance/security trade-off too, it always has been. Additional checks and other measures do in fact need to be performed for security purposes.
Great marketing as always, but the rose-tinted view many have seems vicariously misplaced.
demorro 29 minutes ago
If you're not already applying static analysis and linters to your codebase (and I know many of you aren't), ask yourself why you would bother to apply an expensive LLM tool?
Not to say these things won't catch vulnerabilities static tools cannot, I think they can, it's just we already have the capability to automatically catch a large surface area of common vulns, and have chosen not to, often for expense reasons.
If you're a team that does already apply several layers of analysis and linting, and wants to add this on top, all power to you.
- SkyPuncher 8 minutes ago
  > If you're not already applying static analysis and linters to your codebase
  Because most issues are in business logic that static analyzers aren't going to catch.
- sobellian 26 minutes ago
  Static analysis often shows many false positives. A more intelligent tool can help not to waste limited engineering time.
nikcub 1 hour ago
There has been a lot of cynicism around mythos, that it's just the usual public models without guardrails, etc. etc. but this:
> 1,752 of those high- or critical-rated vulnerabilities have now been carefully assessed by one of six independent security research firms, or in a small number of cases by ourselves. Of these, 90.6% (1,587) have proved to be valid true positives, and 62.4% (1,094) were confirmed as either high- or critical-severity.
for anybody who has applied opus, codex or oss models for vuln scanning - the true positive rate and discovery volume are a clear step change[0]. The ~50 partners in Glasswing have largely all previously run harnesses with other models and many of them have come out and said - essentially - "ye, wow"
Question now is what the second and third phases of access look like - deciding which class of systems to secure. Routers, firewalls, SaaS, ERP systems, factory controllers, SCADA systems, zero-trust VPN gateways, telecoms gear and networks, medical devices - there's just so much to do
This is why I believe mythos will remain private for the foreseeable future. There's such a large surface that needs to be secured and so much to triage, fix, deploy.
That may suit Anthropic as private models can't be distilled. There's also a runaway effect of model improvement from the discovery, triage and fix data. This is likely already the most potent corpus of curated offensive data ever assembled and will only get better.
I don't see how Chinese companies are given access soon, or ever. We're likely going to see a world soon of CISA mandated audits, and where to buy a mythos-proof VPN gateway or home router - you'll have to buy American[1].
[0] vs ~30% or so in regular audit tools
[1] or allied
- gck1 13 minutes ago
  > This is why I believe mythos will remain private for the foreseeable future. There's such a large surface that needs to be secured and so much to triage, fix, deploy.
  sigh I remember the GPT-2 days - when it was the first time OpenAI restricted access to the models citing "humanity is not ready for it". The model was good at writing poetry or something.
  Since then, I don't remember a single model announcement from OAI/ANT that didn't use similar wording.
  The so-called leak of the model announcement was marketing, it being dangerous is marketing, the world not being ready for it is marketing. And yes, the ones that were given access saying "oh wow", believe it or not, is also marketing.
  It's all marketing. You can get the same results from any of the top-5/10 models that are generally available already.
  Mythos is Anthropic's way to sell the new idea, because the previous one has been democratized.
  - NitpickLawyer 0 minutes ago
    Writing marketing 10 times doesn't invalidate the (many) claims from many respectable sources that the model is a step change in cybersec. There's also the report from the Brits that track cyber capabilities since '22 or '23 and they've also confirmed it's a step change (together with 5.5 cyber or whatever they call it).
    Marketing is like propaganda. It doesn't need to be based on false facts. Of course they're gonna milk it, keep it private and so on. But that doesn't mean the model is bad. Or that others are as good (apparently they're not there yet).
mixologic 47 minutes ago
Right now the only codebase I care about them fixing vulnerabilities in are the 3800 repositories that got stolen from GitHub.
"Vulnerabilities in the software that makes the internet" is honestly lower priority than "The platform that the software that makes the internet uses to make releases". If buyers of those internal repos find ways to break into GitHub such that they can cut software releases, or poison GitHub Actions from a distance, then we're all in a very ugly mess.
Don't forget that in those 3800 repos is likely also npmjs.org itself.
jimmar 55 minutes ago
People predict that in 50 years, no human will be driving a car, and people will be shocked that we let humans drive cars manually. Coding may be the same. So many vulnerabilities in code written by very competent programmers. Manually building large, complex systems without major bugs or security vulnerabilities seems to be a nearly impossible challenge.
- brightbeige 28 minutes ago
  And consider that AI agents are still almost entirely limited to generating code in token-heavy programming languages designed to be written, tested and debugged by humans.
  Here are two experimental exceptions:
  https://github.com/vercel-labs/zerolang
  https://github.com/sbhooley/ainativelang
- vb-8448 21 minutes ago
  I just wonder how many of those 1451 acknowledged findings were introduced by LLMs ...
- morpheos137 50 minutes ago
  there is little evidence for this prediction.
  - cubefox 45 minutes ago
    The rapid progress in the last few years in this regard is pretty strong evidence in my opinion.
  - sp527 35 minutes ago
    Oh there's plenty of evidence. Because a lot of these people have been committing to repos in public for over a decade. Wouldn't take much to show the world just how fallible human coders really are.
- cheesefck 45 minutes ago
  Musk has been predicting self driving cars next year for fifteen years. Fifty years ago, everyone was going to be flying supersonic all the time. Flying cars were just around the corner. Interplanetary travel. Everyone forgets the technology that fails.
  This is the MoviePass era of language models
vb-8448 25 minutes ago
The report on findings is very interesting: 1451 acknowledged findings out of 23k candidates (~6%, neither high nor low).
But I didn't find the most important information (or maybe I missed it): how much did it cost to find 1451 security bugs?
- gpugreg 15 minutes ago
  We can at least put an upper limit on it. From https://www.anthropic.com/glasswing
```
    Claude Mythos Preview will be available to participants at $25/$125 per million input/output tokens
    ...
    Anthropic is committing up to $100M in usage credits for Mythos Preview
```
  Although I'd expect reduced prices for cached tokens, which is not mentioned on their website at this point in time.
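  To make that upper bound concrete, here is a rough Python sketch. It assumes the full $100M of credits was spent on the 1451 acknowledged findings mentioned upthread, which almost certainly overstates the real spend; the variable names are mine.

```python
# Figures from the quoted pricing and the parent comment.
credits_usd = 100_000_000        # committed usage credits (a cap, not actual spend)
input_usd_per_mtok = 25          # $ per million input tokens
output_usd_per_mtok = 125        # $ per million output tokens
acknowledged_findings = 1451     # from the parent comment

# Worst case: every credit dollar went into these findings.
max_usd_per_finding = credits_usd / acknowledged_findings

# Token budget the credits would buy if spent entirely on one token type.
max_input_tokens = credits_usd / input_usd_per_mtok * 1_000_000
max_output_tokens = credits_usd / output_usd_per_mtok * 1_000_000

print(f"at most ${max_usd_per_finding:,.0f} per acknowledged finding")
print(f"up to {max_input_tokens:.1e} input tokens or {max_output_tokens:.1e} output tokens")
```

  That is roughly $69k per acknowledged finding at the very most; cached-token discounts or unspent credits would only lower it.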
OsrsNeedsf2P 2 hours ago
The vulnerabilities found continue to impress, and make legacy media, Twitter and Youtube go nuts. But we still have no data to prove this wasn't doable with the same initiative backed by Opus 4.7, and there is no GA for Mythos access.
- krisbolton 1 hour ago
  There is independent research out there on frontier model security capability. AI Security Institute (UK) put out their paper comparing Mythos to other frontier models in early April. They've been tracking frontier model security capability since early 2023, so it's a decent dataset. https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...
- energy123 1 hour ago
  > Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview—over ten times more than they found in Firefox 148 with Claude Opus 4.6
  - applfanboysbgon 1 hour ago
    Did they allocate the same number of tokens to looking with Claude 4.6? Or did they find more because they looked more, owing to a special initiative by Anthropic?
  - properbrew 1 hour ago
    > over ten times more than they found in Firefox 148 with Claude Opus 4.6
    And how much with Opus 4.7? 5x?
  - kllrnohj 1 hour ago
    No, not really. Mythos found 3 CVEs, not 271.
    https://www.flyingpenguin.com/mythos-mystery-in-mozilla-numb...
    - simonw 1 hour ago
      The Mozilla team responded to that argument here: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin... - in the FAQ.
    - moyix 1 hour ago
      I think you're confusing CVEs and vulnerabilities here? Mozilla (per their longstanding practice) grouped multiple vulnerabilities found internally under a small number of CVEs.
- ospray 35 minutes ago
  This report is far more positive with a far lower false positive rate than I was expecting based on reports from the curl team and a few others. I guess I have just been hearing about the ten percent misses. Can anyone not employed by Anthropic who has used it vouch that it is equal to general human testers, and do you need xbow to make it that way?
- parker-3461 1 hour ago
  Makes me wonder if Anthropic is really having issues with allocating compute (see recent deals with xAI and SpaceX). From available benchmarks, it seems like similar results should be possible with GPT 5.5 Pro or Opus 4.7 (with specific cybersecurity trained models).
  - smoe 1 hour ago
    At least according to this, GPT-5.5 Cyber is on par with Mythos, as the only two models that were able to finish their 32-step corporate network attack simulation.
    https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...
  - wiwiwq 1 hour ago
    Who knows but from a valuation stand point it’s better to signal that demand is higher than existing capacity..
- pertymcpert 2 hours ago
  > Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview—over ten times more than they found in Firefox 148 with Claude Opus 4.6
  4.6 but close.
  - OsrsNeedsf2P 1 hour ago
    Right, but were they using the same methodology and harness? I'm skeptical that they're doing something with the harness - i.e. with Mythos, they pass each file in one at a time, whereas on 4.6 they let Claude Code run loose to find bugs. This would have a larger impact difference than the model itself.
    - mpyne 46 minutes ago
      Yes, the harness they used actually existed and was in use beforehand, it wasn't developed for testing with Mythos.
- bobbycastorama 2 hours ago
  I've seen a blog post by a security researcher saying that he was able to find the same vulnerabilities (for Firefox IIRC) with a ~30B params LLM...
  So yeah, huge marketing as always.
  - simonw 1 hour ago
    You mean this one? https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...
    That's the one that says:
    > We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis.
  - krisbolton 1 hour ago
    This is different though right? He found one (? we don't know who you're referring to - post sources for a higher quality discussion) vulnerability, he already knew it was there, etc. Anthropic didn't claim no other model can find vulnerabilities, nor that it's impossible with smaller models. They're claiming Mythos is a step-change in ability for end-to-end vulnerability discovery and exploit creation. And that other frontier models are close behind.
  - Brystephor 1 hour ago
    Did the security researcher point the LLM at the blob of information and say "Find vulnerabilities" or was the LLM told to "determine if vulnerability X is present in this blob"? Confirmation of suspected vulnerabilities is a different problem from finding vulnerabilities.
  - nikcub 1 hour ago
    Finding the needle is easier when you remove the haystack
    Or providing a map with a direction
    There is a long history of high-value private vulns being rediscovered from scant details
  - wiwiwq 1 hour ago
    To me it’s clear what’s going on.
    The American firms are focused on marketing now to convince people to not even consider open sourced models / open weight models as they are inferior (that’s what they want you to believe).
    - rhubarbtree 1 hour ago
      IPO is coming is what is going on
      - wiwiwq 1 hour ago
        That’s implicit in my post.
        If people actually believe the narrative then the bankers will over price Anthropic and get away with it.
        - 0gs 37 minutes ago
          what's weirdest to me (and i agree with you) is that it could ALSO be true that a highly competently managed, highly capitalized closed source and weights model training on tons of real-world data non-stop COULD stay ahead of open weights models, and that lead COULD grow. now, how competent (much less merciless) the frontier-blazing U.S. corporations will be able to be long-term ... i suspect they are right to be nervous and highly focused on optics, regardless of the truth :)
- boston_clone 2 hours ago
  you would likely be quite interested in the more quantitative writeup from a real research team! it’s linked about midway into the article - similar functionality can be reached, yes, but not always and never with fewer tokens than what mythos requires.
  https://xbow.com/blog/mythos-offensive-security-xbow-evaluat...
  - OsrsNeedsf2P 1 hour ago
    Ok this is actually a pretty good article and justifies the step function marketing in security they talked about
- enlightenedfool 1 hour ago
  Is this the God model that no one else can build? Unbelievable.
- arjie 1 hour ago
  The era where you could reputably believe things published by anyone on this front is over. If you want this information, you’re going to have to attempt it yourself with the Opus API. It is entirely possible that any released model access will be heavily guardrailed against hacking attempts and Mythos is just an unrailed model. It is entirely possible that Mythos is a different architecture or size. We can’t know from the outside.
  There is also a pretty big risk that anyone who is not you would leak the answer to the test. We are close to n=1 epistemics here. You’re going to have to do the research yourself.
0xAstro 1 hour ago
I had a fun day today where I had deepseek-v4-flash subagents work out a patch for dirty frag for systems with AF_ALG disabled and nscd turned on, to gain root access. The original published exploit wasn't working but the patched one worked like a charm.
I am still a believer that 100 subagents with good-enough intelligence can get the same results as mythos. I am ready for this opinion to be shattered when I eventually try mythos, and I believe others here must have tried mythos out too.
- lukeschlather 1 hour ago
  That's probably true, but when you're talking about 100 subagents you're talking about something that costs $100/hour to run, and Mythos takes $20k to find a vulnerability, so the question isn't "can dumber models conceivably do this?" It's, if running inference with Mythos to find an exploit costs 5000 GPU-hours per exploit, how many GPU-hours does it cost with a dumber model?
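  Putting those numbers side by side, as a back-of-envelope Python sketch. All three inputs are the rough figures from this comment, not measured costs, and the names are mine.

```python
# Rough figures from the comment above; none are measured.
swarm_usd_per_hour = 100           # claimed cost of running 100 subagents
mythos_usd_per_vuln = 20_000       # claimed Mythos cost per vulnerability
mythos_gpu_hours_per_vuln = 5_000  # claimed GPU-hours per exploit

# Hours the cheaper swarm can run before it costs as much as one Mythos finding.
break_even_hours = mythos_usd_per_vuln / swarm_usd_per_hour

# GPU-hour price implied by the two Mythos figures.
implied_usd_per_gpu_hour = mythos_usd_per_vuln / mythos_gpu_hours_per_vuln

print(f"swarm break-even: {break_even_hours:.0f} hours per vulnerability")
print(f"implied GPU price: ${implied_usd_per_gpu_hour:.2f}/hour")
```

  Under these assumptions the swarm only comes out cheaper if it lands a comparable vulnerability in under about 200 hours of running time.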
ayeeeeeeeeee 42 minutes ago
It would be informative to publish not only vulnerability numbers, but also vulnerability type statistics (as available here for example: https://cvedb.github.io/years.html), such that programmers can understand which types of exploits popular systems and languages commonly allow, and thereby encourage fundamental changes to fix or transition away from them.
bevekspldnw 44 minutes ago
How much of this is RL’ing a good coding model on every CVE ever?
- sometimelurker 13 minutes ago
  most of this comes from the pretrain imo. just scale + some RL = mythos
rsync 1 hour ago
I asked in a different thread:
Do we have a sense that projects like OpenBSD/OpenSSH, FreeBSD, ISC[1] and Apache were included in the "blessed" initial participants in Project Glasswing?
Or is it big name tech companies, banks and fashionable languages and package managers?
[1] Bind, DHCP
- ls612 48 minutes ago
  “Oi, you got a loicense to make secure software there?”
  I joke but that is the world we are moving towards. I don’t think many on HN have thought through the second and third order implications.
chopete3 1 hour ago
>> Next, we will work with critical partners—including US and allied governments—to expand Project Glasswing to additional partners.
That means, they intend to make a load of money before a general release. It is a good strategy.
mikmoila 37 minutes ago
Code contains deviations from assumed behaviour, and some behaviours might manifest themselves as failures. Some failures might be exploitable by attackers.
spullara 19 minutes ago
I'm going to code myself up a new minivan.
sandeepkd 23 minutes ago
> For instance, Cloudflare has found 2,000 bugs (400 of which are high- or critical-severity) across their critical-path systems, with a false positive rate that Cloudflare’s team considers better than human testers.
> For example, at one of our Glasswing partner banks, Mythos Preview helped to detect and prevent a fraudulent $1.5 million wire transfer after a threat actor compromised a customer’s email account and made spoof phone calls.
For some reason I am not able to relate to the concreteness of either of these.
The first half of the page was occupied by an image; I'm not sure it was relevant in any way other than setting up a security scare. The size of the code base, number of tokens, and $ involved seem to be out of scope of the update for some reason. Personally I am getting skeptical about all these optics at this point; at a high level it looks like just some money-printing scheme.
antirez 1 hour ago
I have the feeling posts like that should be 1/4 the size, at max. At this point I don't care if it is AI-slop or human-slop: they are surprisingly alike. Information must be more dense, each sentence must carry some truth.
vincefutr23 1 hour ago
Mythos couldn’t find the “tens thousand” typo in this post?
kalashvasaniya 43 minutes ago
this is INSANEEE
mlazos 1 hour ago
I believe them to some degree but this trend of posting stuff when it can’t be verified actually needs to end. I’m so tired of this bs marketing.
ares623 1 hour ago
> good lord what is happening in there?!
> that's just thousands of vulnerabilities being discovered by our trillion parameter model
> thousands of vulnerabilities and trillions of parameters?! At current energy prices, in this economic climate, isolated entirely within your datacenter?
> yes
> may we see it?
> no
- pixl97 53 minutes ago
  I built a missile that can blow you up.
  >ya right.
  Here's a demonstration of it blowing something up.
  >can I have one.
  No.
orangebread 1 hour ago
BOOO RELEASE THE MODEL ALREADY GAWD
- guluarte 1 hour ago
  after IPO
amusingimpala75 2 hours ago
[edit: TFA addresses this, though I still find it crazy: 90% accuracy overall vs 20% accuracy for curl]
Is this suspected vulns or actual vulns? If I recall correctly, it produced 5 for curl but only 1 was legit
- Smaug123 1 hour ago
  > So far, Mythos Preview has found what it estimates are 6,202 high- or critical-severity vulnerabilities in these projects (out of 23,019 in total, including those it estimates as medium- or low-severity).
  > 1,752 of those high- or critical-rated vulnerabilities have now been carefully assessed by one of six independent security research firms, or in a small number of cases by ourselves. Of these, 90.6% (1,587) have proved to be valid true positives, and 62.4% (1,094) were confirmed as either high- or critical-severity. That means that even if Mythos Preview finds no further vulnerabilities, at our current post-triage true-positive rates, it’s on track to have surfaced nearly 3,900 high- or critical-severity vulnerabilities in open-source code
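  A quick sanity check of those percentages and the extrapolation, as a minimal Python sketch. The inputs are the figures quoted above; the variable names are mine, and projecting the confirmed-severity rate onto all 6,202 candidates is my reading of how "nearly 3,900" was derived, not something the article spells out.

```python
# Figures quoted from the article above.
assessed = 1752          # high/critical candidates reviewed so far
true_positives = 1587    # confirmed as valid
confirmed_high = 1094    # confirmed as high- or critical-severity
candidates_high = 6202   # all candidates Mythos rated high/critical

tp_rate = true_positives / assessed
high_rate = confirmed_high / assessed

# Assumption: the confirmed-severity rate holds for the unreviewed candidates.
projected_high = candidates_high * high_rate

print(f"true-positive rate: {tp_rate:.1%}")
print(f"confirmed high/critical rate: {high_rate:.1%}")
print(f"projected high/critical: {projected_high:.0f}")
```

  The two rates match the quoted 90.6% and 62.4%, and the projection lands a little under 3,900, consistent with the article's wording.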
- extr 1 hour ago
  Did you RTFA?
- rbranson 1 hour ago
  I don't know why you're getting downvoted. This is exactly what was reported by curl's creator under the section "Five findings became one": https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-v...
  - Smaug123 1 hour ago
    I think it's more that the requested information is prominently featured in the article, and indeed is the content of the only graphic in the article below the intro banner.
  - the_mitsuhiko 57 minutes ago
    And yet [1]:
    > Not even half-way through this #curl release cycle we are already at 11 confirmed vulnerabilities - and there are three left in the queue to assess and new reports keep arriving at a pace of more than one/day.
    > 11 CVEs announced in a single release is our record from 2016 after the first-ever security audit (by Cure 53).
    > This is the most intense period in #curl that I can remember ever been through.
    [1]: https://www.linkedin.com/feed/update/urn:li:activity:7463481...
    - hiharryhere 24 minutes ago
      He’s talking about AI scanning tools collectively, not specifically Mythos.
      If you read his own top comment on that LinkedIn post he clarifies:
      “The simple reason is: the (AI powered) tools are this good now. And people use these tools against curl source code. They find lots of new problems no one detected before. And none of these new ones used Mythos. Focusing on Mythos is a distraction - there are plenty of good models, and people who can figure out how to get those models and tools to find things.”
- RamRodification 1 hour ago
  This is marketing. So probably suspected. Or somewhere in between.
InsideOutSanta 1 hour ago
I wonder if it coincidentally becomes safe to release when compute capacity bought from SpaceX will provide enough headroom to let a lot more people run it.
- lukeschlather 1 hour ago
  It seems like Mythos is often (or typically?) costing $20k per vulnerability, so I don't think there will be enough compute capacity in the world any time soon to let a lot more people use it the way Glasswing is using it. That is not to say I think they are exaggerating its capabilities. That $20k is presumably the rough cost of renting the GPUs, and there are not enough GPUs in the world.
  - InsideOutSanta 54 minutes ago
    I'm not sure if current pricing correlates with actual compute cost.
  - why_only_15 1 hour ago
    what's the origin of your $20k/vuln estimate?
    - gck1 3 minutes ago
      It's the same as the origin of "Codex/Opus subscription usage is heavily subsidized" - the sales departments equipped with AI agents with the prompt: "use anonymous accounts on the internet to make it easy for me to sell it at $price".
- sigmar 1 hour ago
  "available to qualifying customers’ security teams on request." Seems they're already expanding access.
- unethical_ban 29 minutes ago
  Total speculation: As the software world shakes out the many hidden vulns in their software, big AI will try to limit the access while it gets ironed out. Once the big projects/systems are reasonably patched after being vetted by SOTA models, the models will be released to the public. I don't think there's a scenario where Mythos-level or better models stay closed permanently.
- b65e8bee43c2ed0 1 hour ago
  stop noticing things, chud.
giancarlostoro 1 hour ago
> Since then, we and our approximately 50 partners have used Claude Mythos Preview to find more than ten thousand high- or critical-severity vulnerabilities across the most systemically important software in the world. Progress on software security used to be limited by how quickly we could find new vulnerabilities. Now it’s limited by how quickly we can verify, disclose, and patch the large numbers of vulnerabilities found by AI.
I guess they forgot to scan Visual Studio Code plugins and their endless npm dependencies.
- pixl97 1 hour ago
  I mean that's really a different issue.