Beyond Semantics: Unreasonable Effectiveness of Reasonless Intermediate Tokens

(arxiv.org)

138 points | by nyrikki 39 days ago

12 comments

valine 39 days ago
I think it’s helpful to remember that language models are not producing tokens, they are producing a distribution of possible next tokens. Just because your sampler picks a sequence of tokens that contain incorrect reasoning doesn't mean a useful reasoning trace isn’t also contained within the latent space.
It’s a misconception that transformers reason in token space. Tokens don’t attend to other tokens. High dimensional latents attend to other high dimensional latents. The final layer of a decoder only transformer has full access to entire latent space of all previous latents, the same latents you can project into a distribution of next tokens.
[-]
- woadwarrior01 39 days ago
  > Just because your sampler picks a sequence of tokens that contain incorrect reasoning doesn't mean a useful reasoning trace isn’t also contained within the latent space.
  That's essentially the core idea in Coconut[1][2], to keep the reasoning traces in a continuous space.
  [1]: https://arxiv.org/abs/2412.06769
  [2]: https://github.com/facebookresearch/coconut
- x_flynn 38 days ago
  What the model is doing in latent space is auxilliary to anthropomorphic interpretations of the tokens, though. And if the latent reasoning matches a ground-truth procedure (A*), then we'd expect it to be projectable to semantic tokens, but it isn't. So it seems the model has learned an alternative method for solving these problems.
  [-]
  - valine 38 days ago
    You’re thinking about this like the final layer of the model is all that exists. It’s highly likely reasoning is happening at a lower layer, in a different latent space that can’t natively be projected into logits.
  - refulgentis 38 days ago
    It is worth pointing out that "latent space" is meaningless.
    There's a lot of stuff that makes this hard to discuss, ex. "projectable to semantic tokens" you mean "able to be written down"...right?
    Something I do to make an idea really stretch its legs is reword it in Fat Tony, the Taleb character.
    Setting that aside, why do we think this path finding can't be written down?
    Is Claude/Gemini Plays Pokemon just an iterated A* search?
- aiiizzz 38 days ago
  Is that really true? E.g. anthropic said that the model can make decisions about all the tokens, before a single token is produced.
  [-]
  - valine 38 days ago
    That’s true yeah. The model can do that because calculating latents is independent of next token prediction. You do a forward pass for each token in your sequence without the final projection to logits.
- jacob019 39 days ago
  So you're saying that the reasoning trace represents sequential connections between the full distribution rather than the sampled tokens from that distribution?
  [-]
  - valine 39 days ago
    The lower dimensional logits are discarded, the original high dimensional latents are not.
    But yeah, the LLM doesn’t even know the sampler exists. I used the last layer as an example, but it’s likely that reasoning traces exist in the latent space of every layer not just the final one, with the most complex reasoning concentrated in the middle layers.
    [-]
    - jacob019 39 days ago
      I don't think that's accurate. The logits actually have high dimensionality, and they are intermediate outputs used to sample tokens. The latent representations contain contextual information and are also high-dimensional, but they serve a different role--they feed into the logits.
      [-]
      - valine 39 days ago
        The dimensionality I suppose depends on the vocab size and your hidden dimension size, but that’s not really relevant. It’s a single linear projection to go from latents to logits.
        Reasoning is definitely not happening in the linear projection to logits if that’s what you mean.
        [-]
        pyinstallwoes 38 days ago
        Where does it happen ?
        [-]
        valine 38 days ago
        My personal theory is that it’s an emergent property of many attention heads working together. If each attention head is a bird, reasoning would be the movement of the flock.
    - bcoates 39 days ago
      Either I'm wildly misunderstanding or that can't possibly be true--if you sample at high temperature and it chooses a very-low probability token, it continues consistent with the chosen token, not with the more likely ones
      [-]
      - valine 39 days ago
        Attention computes a weighted average of all previous latents. So yes, it’s a new token as input to the forward pass, but after it feeds through an attention head it contains a little bit of every previous latent.
ngruhn 39 days ago
Man that "Unreasonable Effectiveness of ..." pattern is getting a bit overused. With the original paper [1] you could still say that there really is some deeply philosophical mystery. But they now slap that on everything.
[1] https://en.m.wikipedia.org/wiki/The_Unreasonable_Effectivene...
[-]
- gipp 39 days ago
  Engineering blogger's love of parroting the titles of famous papers/articles (unreasonable effectiveness..., all you need is..., ... Considered harmful, etc) has always been lightly annoying to me
  [-]
  - jorvi 39 days ago
    With software engineering, every single thing in the 2010s had "syntactic sugar" and "sane defaults". I still get a slight blood pressure spike whenever someone uses either of those terms.
    [-]
    - nine_k 39 days ago
      Fine. "The unreasonable effectiveness of syntactic sugar considered harmful: all you need is sane defaults."
      Now, in comparison, nothing in this thread is going to annoy you!
    - joe_the_user 39 days ago
      I guess those are overused but at least they have some meaning. "Unreasonable Effectiveness..." is essentially pure meaninglessness.
      [-]
      - tim333 39 days ago
        It was meaningful in the original paper.
    - ruuda 39 days ago
      "modern"
  - airza 39 days ago
    It’s just not that common for the same person to have serious engineering chops and writing abilities.
    [-]
  - layer8 39 days ago
    All you need is for the unreasonable effectiveness of snowclones to be considered harmful.
- jvanderbot 39 days ago
  In this case it's probably more a pun (intentional or not I guess) about "reasonless" or "unreason"
- EGreg 39 days ago
  Would you go further, and say that Unreasonable Effectiveness… is considered harmful?
  [-]
  - kevindamm 39 days ago
    Indeed, considering unreasonable effectiveness harmful is all you need
    [-]
    - lareter77 39 days ago
      Thank you, I have had a few beers but that really made me laugh good.
- dkga 39 days ago
  TIL. I am not from an engineering/physics background so for me the original Unreasonable Effectiveness paper was Karpathy’s blog post about RNNs.
  [-]
  - godelski 39 days ago
    (Karpathy's might be more a call back to Halevy, Norvig, and Pereira's "The Unreasonable Effectiveness of Data"[0].)
    But I think is a good example that fits the OP's critique (I don't think the critique fits to the arXiv paper. Even though I expected the main results, see my main comment).
    The "unreasonableness" in Karpathy's post[1] is using sequencing to process non-sequential data. But the reason this isn't unreasonable is that we explicitly expect non-sequential processes to be able to be reformulated as sequential ones.
    The SVHN (hose numbers) he shows is actually a great example of this. We humans don't process that all at once. Our eyes similarly dart around, even if very fast. Or we might think about how to draw a picture. We don't do everything at once, but we work in sections, building up, and have layers that end up being ordered even though this technically isn't a requirement. I'm actually struggling to think of things that cannot be broken down into sequences. He says as much here
```
  | an important point to realize is that even if your inputs/outputs are fixed vectors, it is still possible to use this powerful formalism to process them in a sequential manner.
```
    So really the question is: what part of this was unreasonable? Or what part was unexpected? Honestly, we should be expecting this as the nature of neural nets is itself sequential, data being processed layer by layer. Hell, every computer program has a trace, which is sequential. I can give tons of examples. So it is quite reasonable that sequential processing should work.
    [0] https://static.googleusercontent.com/media/research.google.c...
    [1] https://karpathy.github.io/2015/05/21/rnn-effectiveness/
    [-]
- mensetmanusman 38 days ago
  All you need is unreasonable effectiveness.
- MeteorMarc 39 days ago
  What is not unreasonable about intermediate tokens without reason? See the abstract.
  [-]
  - godelski 39 days ago
    It's not "unreasonable" if you weren't anthropomorphizing COT, equating it to thinking or "internal dialogue." The results aren't surprising to people in this camp, but I also wouldn't say that makes the work less impactful.
    But it would also be more unreasonable to dismiss the fact that a significant portion of the research community (and even greater portion of the public) was operating under these beliefs: that COT was akin to thinking (it's literally there in the name...). It is possible to disagree with something but also not believe someone is being unreasonable by coming to different conclusions.
- godelski 39 days ago
  It's also worth pointing out that Winger's (position) paper[0] is really about something that would sound silly today. He's arguing that we should use math to drive physics. Today, many people think these are indistinguishable things and you'll get into arguments about math being invented or discovered. But Winger is talking about how mathematics provides us with a framework where we can drive physics forward through theory instead of relying purely upon experimentation to poke and prod the universe.
  It is rather "unreasonable" to think we can explore the world simply through pen and paper, from the comfort of a chair. You'd think you'd need to go out and touch grass, but incredibly this is not necessary.
```
  | The first point is that the enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and that there is no rational explanation for it. Second, it is just this uncanny usefulness of mathematical concepts that raises the question of the uniqueness of our physical theories. 
```
  Which is exactly why a lot of these other things are overused. Hamming's seems like an extension or corollary[1] and I even think Norvig's (Halevy's) is highly appropriate[2]. It is "unreasonable" to think these things would be effective.
```
  -------------------------------------
```
  With this paper?
  I think is fine. It is being used in a similar way to Winger, with similar context.
  I can see two camps. One has always interpreted the COT as analogous to a model's internal dialogue. While the other has always thought there's a much larger gap between the manipulations within latent representations and what has been decoded, not necessarily needing be strongly aligned.[3] To the former, the results here would be shocking, while to the latter it is "yes, and?" Clearly they're addressing the former camp. There were plenty of people that Winger did not need to convince.
  I'm of the latter camp[4], and I'm happy people are not just asserting and are demonstrating. Honestly, I'm even frequently upset when works get dismissed because they "demonstrate something we already knew" but no one had ever actually demonstrated. The proofs and evidencing is more important than the answer. Quite often we're highly certain about results but they are difficult to even evidence (let alone prove). I mean it would be quite silly to dismiss a proof that P != NP, even though the vast majority of us have long been convinced that this is the relationship we'll end up with. Yet, no one's done it.
```
  -------------------------------------
```
  [0] https://web.archive.org/web/20210212111540/http://www.dartmo...
  [1] https://math.dartmouth.edu/~matc/MathDrama/reading/Hamming.h...
  [2] https://static.googleusercontent.com/media/research.google.c...
  [3] Both camps can be further broken down too. Lots of nuances and opinions here and the lines really get fuzzy as we try to make it more accurate. I don't want to pretend there's a hard defining line, but the distinction helps the discussion and I think is reasonably accurate enough. Let me know if you think it is a gross mischaracterization.
  [4] I can expand more why this side seems "obvious" to me. But a warning, you can probably guess I'm not good at being terse.
  [Note]: I'd even go so far as say we should revisit Winger's argument around AI. I'm certain mathematics can be and will be "unreasonably effective." But not enough time has been dedicated to formulate the right type of math to use. We really do have to invent a new kind here. This may sound weird to non-mathematicians, but even physics uses multiple kinds of mathematics. The operations, fields, and algebras you use in one part may not be appropriate in another part. That's okay. But we don't have a TOE yet either, and that's a critical part of finding a TOE, is bringing all this together.
  [-]
  - tim333 39 days ago
    >It's also worth pointing out that Winger's (position) paper[0] is really about something that would sound silly today. He's arguing that we should use math to drive physics.
    I think you misinterpret what it's about. He's pointing out how remarkable it is that the universe obeys laws like E=MC^2 exactly as far as we can tell which is not something you would probably expect just from looking at the world. The pre scientific understanding of the world was it was driven my gods and spirits. The mathematical laws were only discovered by scientific investigation.
    Or as he puts it:
    >The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve.
    If he was just saying use maths it would be boring and not a famous paper 65 years on.
    [-]
    - godelski 39 days ago
      > I think you misinterpret what it's about. He's pointing out how remarkable it is that the universe obeys laws like...
      I apologize for not being clear. But we are talking about the same thing.
      > The pre scientific understanding of the world was it was driven my gods and spirits.
      Winger's paper was written in 1960. I do not think such claims need have been said. Those arguments were prolific and had been made for centuries. He did not need to convince anyone in the scientific community that the laws of nature were not driven by gods and spirits. By the 1960s the scientific age was already mature and it was well established in the community that the laws of nature are not the domain of gospel.
      | "Well, now you are pushing your joke too far," said the classmate, "surely the population has nothing to do with the circumference of the circle."
      The point is made here. It is surprising that math describes reality. It is surprising that a circle has anything to do with a population.
      I really did mean it when I said "about something that would sound silly today". We take this for granted now, with 60 years of working under this framework, but this wasn't always so. It seems silly now because much of the math we learn is in science classes and even outside we have a particular focus of teaching math most relating to science, but this is a small portion of a much larger field. Fwiw, I am not saying this as a complete outsider, I have a degree in physics.
      It is also worth paying attention to the fact that Wigner helped create Mathematical Physics[0]. "Mathematical Physics" is not a pleonasm.
      Don't take it just on my word! The Wiki page says something extremely similar!
      | In it, Wigner observes that a theoretical physics's mathematical structure often points the way to further advances in that theory and to empirical predictions. Mathematical theories often have predictive power in describing nature. [1] | Wigner argues that mathematical concepts have applicability far beyond the context in which they were originally developed[1] > The mathematical laws were only discovered by scientific investigation.
      I should make sure this is clear though (unsure which interpretation you intend). Math and science aren't interchangeable. Physics uses the language of math as its main method for developing theories and logic. But it is also important to stress that it doesn't use the same mathematical language throughout. The frameworks that those working in relativity use are not useful for those that work in quantum mechanics. If the math was uniform, we would not be dedicating so much time to bridge these. Nor is math absolute here, as it is a map, and we still rely heavily on the language of experimental evidence.
      Yes, he was saying "use maths". Yes, it sounds silly today, but so do a lot of things that needed be said in the past. I see no reason that the (now) obvious claim by Copernicus would make him any less famous.
      [0] https://en.wikipedia.org/wiki/Mathematical_physics
      [1] https://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness...
      [-]
      - tim333 38 days ago
        I think we agree on all the facts but I disagree on what his message was and whether it sounds silly these days, which is a matter of opinion I guess.
        You have "use maths" which sounds silly, I take it as "the appropriateness of maths is a miracle we don't understand" which is deeper and still largely true.
        [-]
        godelski 38 days ago
        To change your mind, what would I need to demonstrate? I'm providing third party sources, I have personal experience, I can quote from the original source. What is missing that results in being unconvincing? I want to make sure, there are ways to change your opinion, right?
        I really encourage you to read that wiki page.
        | The quantum theory of the Lamb shift, as conceived by Bethe and established by Schwinger, is a purely mathematical theory and the only direct contribution of experiment was to show the existence of a measurable effect. The agreement with calculation is better than one part in a thousand."
        I think you're missing a lot of context in that physics was highly non-mathematical in the past. Physicists called Einstein a mathematician. It isn't too hard to see when he asserted that his theories were correct and didn't need experimental confirmation.
        | Hamming argues that Albert Einstein's pioneering work on special relativity was largely "scholastic" in its approach. He knew from the outset what the theory should look like (although he only knew this because of the Michelson–Morley experiment), and explored candidate theories with mathematical tools, not actual experiments. Hamming alleges that Einstein was so confident that his relativity theories were correct that the outcomes of observations designed to test them did not much interest him. If the observations were inconsistent with his theories, it would be the observations that were at fault.
        Hell, go read Ian Hacking, any metaphysics, or ask ChatGPT. They will confirm what I'm saying. Even some of this is discussed in An Opinionated History of Mathematics[0], though much more focused on math. I'm more mentioning it because it is good and helps provide some of that historical context.
        It is kinda crazy that a thing we created, without the specific intent of modeling the world, ended up being so great at modeling the world. That's the unreasonable effectiveness.
        In fairness, to change my opinion, you would need to show me some chain of reasoning or a conversation Wigner is clearly responding to that involves religion. Because this is what I see, but around math not being physics, and is what drives my interpretation.
        [0] https://intellectualmathematics.com/opinionated-history-of-m...
modeless 39 days ago
> we then train models on noisy, corrupted traces which have no relation to the specific problem each is paired with, and find that not only does performance remain largely consistent with models trained on correct data, but in some cases can improve upon it
This is the interesting part. We've probably all had the experience where the model is going off the rails during the thinking process but somehow spits out the right answer at the end. Apparently the reasoning doesn't even need to be correct during training?
I guess it suggests to me that the reason CoT helps is that the model gets more compute to think internally, not that the words it produces are meaningful. I'm surprised nobody has come up with a good scheme for adaptive compute per token yet. Maybe we can skip CoT entirely.
[-]
- thomastjeffery 39 days ago
  That sounds to me more like evidence that an LLM is never reasoning at all, even when it looks like it is.
  The mock conversation that is written between think tags is not a conversation. It's the collection of tokens that are most likely to be written after a prompt to a model that was trained on example conversations.
  Why is that different? In a real conversation, participants use logic to choose what is worth saying next. The next statement is already determined in the speaker's mind to be logically sound. In a mock conversation (the LLM's CoT), there is no logic. The next statement is only determined to be statistically familiar, then written immediately.
  The end result of a desirable CoT interaction is text that would have been written by a thoughtful/logical conversationalist. Whether or not the mock conversation itself is logically consistent with the mock conclusion is irrelevant, because the LLM is only concerned with how familiar that mock conclusion is to the prompt, its mock conversation, and its training.
  The overall vibe of how something is written behaves as a replacement for actual logic. Logical deduction is replaced with measures of confidence, conversations turns, etc. in writing style. It all works out in the end, because we are so consistent with the style in which we write real logical deductions, we have ended up providing an invisible semantics for the LLM to follow.
  There is something meaningful that we are entirely blind to. Unfortunately, it doesn't follow rules the way logic does, so it's not a trustworthy replacement. Fortunately, it's useful for more general exploration.
- trehalose 39 days ago
  > We've probably all had the experience where the model is going off the rails during the thinking process but somehow spits out the right answer at the end. Apparently the reasoning doesn't even need to be correct during training?
  How do we know if the reasoning was correct or not? Do we have more information about what the model was thinking besides just what it says it was thinking?
  [-]
  - rickyhatespeas 39 days ago
    It's definitely not explicitly writing out everything it's "thinking" if you are considering all dimensions of the latent space that are connected, that can't really be exhibited with a sentence.
    CoT builds on existing prompt engineering techniques by adding it to reinforcement learning to force the models to build their own CoT prompt essentially. So it's not what it's thinking but all indications are that it does guide the reasoning abilities of LLMs through the output distribution.
  - modeless 38 days ago
    During fine tuning the model does not produce reasoning traces, it consumes them. And the researchers presented it with traces deliberately constructed to be wrong except for the answer at the end. That's easy enough to do.
- AlexCoventry 39 days ago
  No, the words are meaningful to it. It's effectively using the CoT text as a "scratch space" for intermediate steps it can't calculate on one iteration through the transformer. These papers give examples of how it works:
  - https://physics.allen-zhu.com/part-2-grade-school-math/part-...
  - https://physics.allen-zhu.com/part-3-knowledge/part-3-3
  [-]
  - modeless 39 days ago
    I mean, this theory is directly contradicted by the paper under discussion. If you want to assert this then you need to be arguing why the paper is wrong.
- kelseyfrog 39 days ago
  > I'm surprised nobody has come up with a good scheme for adaptive compute per token yet.
  I have one, I just don't have the time or money to research it :(
  [-]
  - golol 39 days ago
    Post it let's go.
- x_flynn 38 days ago
  I like to think of the intermediate tokens as low-dimensional hidden states. Also see the Coconut paper/citation
- istjohn 39 days ago
  Uh... hmmm... uhhh... ummm...
nullc 39 days ago
Even when you train AI on human language, the tokens can have "subtext" that is only legible to the AI. And, unfortunately, it's not even legible to the AI in ways that it could ever explain it to us.
It's no different than how in English we can signal that a statement is related to a kind of politics or that it's about sex through particular word and phrase choice.
Training for reasoning should be expected to amplify the subtext, since any random noise in the selection that by chance is correlated with the right results will get amplified.
Perhaps you could try to dampen this by training two distinct models for a while, then swap their reasoning for a while before going back-- but sadly distinct models may still end up with similar subtexts due to correlations in their training data. Maybe ones with very distinct tokenization would be less likely to do so.
[-]
- nihakue 39 days ago
  This is such a bonkers line of thinking, I'm so intrigued. So a particular model will have an entire 'culture' only available or understandable to itself. Seems kind of lonely. Like some symbols might activate together for reasons that are totally incomprehensible to us, but make perfect sense to the model. I wonder if an approach like the one in https://www.anthropic.com/research/tracing-thoughts-language... could ever give us insight into any 'inside jokes' present in the model.
  I hope that research into understanding LLM qualia eventually allow us to understand e.g. what it's like to [be a bat](https://en.wikipedia.org/wiki/What_Is_It_Like_to_Be_a_Bat%3F)
  [-]
  - nullc 39 days ago
    In some sense it's more human than a model trained with no RL and which has absolutely no exposure to its own output.
    We have our own personal 'culture' too-- it's just less obvious because its tied up with our own hidden state. If you go back and read old essays that you wrote you might notice some of it-- that ideas and feelings (maybe smells?) that are absolutely not explicitly in the text immediately come back to you, stuff that no one or maybe only a spouse or very close friend might think.
    I think it may be very hard to explore hidden subtext because the signals may be almost arbitrarily weak and context dependent. The bare model may need only a little nudge to get to the right answer and the you have this big wall of "reasoning" where each token could carry very small amounts of subtext that cumulatively add up to a lot and push things in the right direction.
- candiddevmike 39 days ago
  IMO this is why natural language will always be a terrible _interface_--because English is a terrible _language_ where words can have wildly different meanings that change over time. There's no ambiguity with intentions with traditional UX (or even programming languages).
  [-]
  - nullc 39 days ago
    It can happen more or less no matter what language the model uses, so long as its reinforcement trained. It's just in English we have an illusion of thinking we understand the meaning.
    An example of this is toki pona, a minimalist constructed human language that is designed to only express "positive thinking". Yet it is extremely easy to insult people in toki pona: e.g. sina toki li pona pona pona pona. (you are speaking very very very very well).
    To be free of a potential subtext sidechannel there can be essentially no equivalent outputs.
    [-]
    - pona-a 39 days ago
      Can't you just say "sina toki ike suli a." (you are speaking very bad <exclamation>)? Just because it doesn't have official swearwords like most natural languages doesn't mean you can only express "positive thinking".
      [-]
      - nullc 39 days ago
        My mistake, in the future I'll refrain from using Toki pona for making a rhetorical point. :)
timhigins 39 days ago
This paper seems to focus on highly algorithmic/puzzle-like problems, which are not the typical application domain of LLMs, using a <500M parameter model. So my hunch is "reasoning" works much better for math, coding, factual recall, and writing tasks that most LLMs actually deal with.
imtringued 37 days ago
What this paper tells us is that the token itself is meaningless. When the model is "thinking", it is storing hidden information in the logits. Most of that information is discarded as a result of token sampling. This is the same reason why knowledge distillation (training on the logits) works, but training on LLM produced text does not.
So the solution to this is to build a multi modal LLM that accepts both text and raw logits as input.
You query a model to produce the reasoning trace, then you feed the reasoning trace's logits into a second model query to produce the answer.
These steps have to be done separately, because there is no training data that contains the logits. You can only parallelize training when it is done on known input + output data.
theptip 39 days ago
So is the interpretation here something like “CoT tokens are actually neuraleese”? They do boost performance, so the model must be stashing some intermediate reasoning outputs there. But perhaps not using the literal human meaning of those tokens?
[-]
- x_flynn 38 days ago
  Exactly, the traces lack semantics and shouldn't be anthropomorphized. (I'm one of the students in the lab that wrote this, but not one of the authors)
  [-]
  - theptip 38 days ago
    Thanks! So, how does this impact Deliberative Alignment[1], where IIUC the intermediate tokens are assessed (eg for referencing the appropriate policy fragment)?
    Does you see your result as putting that paradigm in question, or does the explicit reasoning assessment perhaps ameliorate the issue?
    [1]: https://arxiv.org/html/2412.16339v2
r0ze-at-hn 38 days ago
Very much related to this is "Chain-of-draft"
https://arxiv.org/abs/2502.18600
Similar level of results in a fraction of the tokens resulting in similar quality for less cost for longer runs.
But also when interacting and needing to read the token responses I can read shorter responses way faster so my own speed is faster.
naasking 39 days ago
I wonder if this finding would hold for something like Meta's Large Concept Models.
throwawaymaths 39 days ago
why is it unreasonable that giving the llm a spot to think and collate long range attention and summarize without the pressure of building a meaningful next token so quickly would result in higher effectiveness?
[-]
- x_flynn 38 days ago
  It's more about the lack of semantic meaning in the intermediate tokens, not that they aren't effective (even when the intermediates are wrong)
meltyness 38 days ago
Brought to you by lightspeed briefs