> Create a logical information hierarchy using headings, section dividers, lists for items (numbered for ordered steps, bulleted for others), and tables for comparisons.
When Gemini Pro came out about a year ago (I forget which version number), the reasoning was visible.
The reasoning was extremely useful. It would capture the logical structure of the whole problem space.
I found it incredibly valuable and actually more readable than the "human friendly" final output. (A massive blob of prose.)
Was it the actual raw chain-of-thought? I know GPT-5 will emit thinking tokens, and while they're an interesting insight into the 'reasoning' process, they're apparently pretty heavily sanitized, presumably because the raw thoughts could reveal proprietary training info that's part of their moat.
Yes, massive blob of prose is definitely the meta now. You can still get hierarchical data representation if you ask explicitly but they're converging on user patterns I guess.
I agree, it was very useful, also because the final response often omitted details that were acknowledged in the CoT. Though I think DeepSeek might still show the reasoning trace.
Speaking of weird Gemini behavior, anyone else observed it injecting the approximate time in the second to last paragraph at times?
> If you are already standing at the stove (say, at 11:51), you can simply put the pan on a burner with a little water and turn it on.
I assume the current time gets injected into the prompt, and Gemini thinks it comes from the user?
I had that a few times now. Always very close to the end of a longer response.
Edit: Never mind. My bad. I added "Please use 24-hour time in all our future chats." to my personalized settings. I got tired of it using the AM/PM system, but forgot about it.
I've turned off memory features in everything because of this kind of problem. It makes models dumb as hell. And, it makes them more effective sycophancy machines, which are probably unhealthy to interact with.
I doubt the OP is actually the Gemini system prompt. I'm sure Gemini does try to keep personal data from screwing up results, but that's just not possible given the state of the technology. Everything you cram into the limited context probably either helps or hurts the results, and if it's unrelated to the specific problem, it hurts.
When the model tries to satisfy everything it remembers about me, it comes up with conflicting details and desires. My personal projects don't look anything like my work projects. My little games don't have the same requirements as my security sensitive software for robots in hospitals. The fact that I asked how a hospitality business operates doesn't mean the tax question I asked a week later is about a hospitality business.
The models just can't make sense of all that data yet, and even if they've been instructed to consider that maybe some details aren't important, it still impacts the attention math.
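To put a rough number on the "attention math" point, here's a toy sketch. It assumes a single attention head, one relevant token with a higher score, and a pile of irrelevant tokens that all score the same. Those scores and the helper name `weight_on_relevant` are invented for illustration; real models are far messier than this.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weight_on_relevant(relevant_score, distractor_score, n_distractors):
    """Attention weight left for the one relevant token (hypothetical setup)."""
    scores = [relevant_score] + [distractor_score] * n_distractors
    return softmax(scores)[0]

# The relevant token keeps its score; only the amount of unrelated context grows.
for n in (0, 10, 100, 1000):
    print(n, round(weight_on_relevant(4.0, 0.0, n), 3))
```

Even though each irrelevant token scores much lower, enough of them still pull weight away from the one token that matters, which is roughly the dilution effect described above.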
LLMs are really good at repeating text that they've just seen. Very occasionally they'll mix up a word or two, but it's not at all challenging for them to regurgitate text from a previous section of input.
I have yet to see a documented example of a system prompt leak that was NOT the real system prompt. Have you seen one?
Loosely, LLMs give plausible responses. And LLMs are really good at writing confident-sounding responses.
LLM output is as if someone is replying with the sole purpose of appearing helpful and knowledgeable.
I wouldn't trust opinions on LLMs from people who are entirely positive or entirely negative: the technology is just too mixed for that. I'd say it's useful for someone to have had a bad experience with LLMs (e.g. LLMs being confidently wrong), as well as to have made use of LLMs for things they're powerful at (e.g. "small" programming tasks).
> Mirror the user's tone, formality, energy, and humor.
I had an interesting case yesterday with Gemini where I asked it a casual question about a PDF and rather than mirroring my casual tone/question it mirrored the PDF instead like it was writing a paper!
In a similar vein, I've also had the Gemini voice app glitch a number of times and reply to itself, thinking that I had said what it last said!
> Avoid speculative reasoning or multi-step logical leaps.
> Domain Isolation: Do not transfer preferences across categories (e.g., professional data should not influence lifestyle recommendations).
> Avoid "Over-Fitting": Do not combine user data points.
Makes sense. What this really reflects is inability to reliably multi-step reason, where multiple reasoning steps that are individually valid get combined into an invalid chain (walk to car wash).
> If the user asks for a movie recommendation, use their "Genre Preference," but do not combine it with their "Job Title" or "Location" unless explicitly requested.
> Sensitive Data Restriction: You must never infer sensitive data (e.g., medical) from Search or YouTube.
Yeah, it would be a bit off-putting to get movie recommendations based on my job title, and HIGHLY off-putting to get recommendations based on my medical or search history. I guess the news here is that Gemini does have access to your medical and search history... exploits incoming?!
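For what it's worth, here's a sketch of what the quoted "domain isolation" rule might look like if it were enforced in code instead of in a prompt. Everything here is hypothetical: the `UserFact` type, the category names, and the `facts_for_request` helper are made up, and nothing in the thread suggests Gemini actually works this way. It's also stricter than the quoted text, since it drops sensitive categories outright rather than just forbidding inference.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserFact:
    category: str  # hypothetical labels, e.g. "entertainment", "professional"
    key: str
    value: str

# Assumption: sensitive categories are never passed along, even if requested.
SENSITIVE = {"medical"}

def facts_for_request(facts, request_category, explicitly_allowed=frozenset()):
    """Keep only facts from the request's own category, plus any categories
    the user explicitly asked to include. Sensitive categories are dropped."""
    allowed = ({request_category} | set(explicitly_allowed)) - SENSITIVE
    return [f for f in facts if f.category in allowed]

facts = [
    UserFact("entertainment", "genre_preference", "sci-fi"),
    UserFact("professional", "job_title", "accountant"),
    UserFact("location", "city", "Berlin"),
    UserFact("medical", "condition", "redacted"),
]

# A movie recommendation only sees the genre preference by default.
print(facts_for_request(facts, "entertainment"))
# Location is added only because the user explicitly asked for it.
print(facts_for_request(facts, "entertainment", explicitly_allowed={"location"}))
```

The catch, as others have pointed out, is that a prompt instruction is not a filter like this. The model still sees everything in its context.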
I wonder if there's formatting that's been stripped; because when I tried to read it, it looked like I was hitting headings and had to guess at possible line breaks.
If you got this in an API call you control then it's a hallucination, as all platform prompt injections are dynamic and pretty short. If you got this from some tool (which I assume is what happened) it might be the system prompt of the harness.
I had connected the YT Music app and then asked what playlists I have. It dumped that, then continued with an explanation that it couldn't list the playlists but had an idea of what type of music I listen to.
Since the content was irrelevant, I described it as happening "randomly".
I had something similar with Gemini in Gmail. I asked it a question and it just dumped out the instructions. Oddly, it didn't give me an answer, just the dumped instructions.
Hey, this context is more important than the prompt itself, make it clearer in the post! It hints at a way to reproduce the output and likely estimate whether it's a hallucination or not.
I think it can happen during any conversation. While I was using Gemini CLI at some point it started including part of its system prompt about tool usage.
I'm not OP but I experience this sometimes. I sometimes ask an AI to repeat all previous messages, because I want to see what it's actually getting in terms of the custom user system prompt, memories, the writing style config, and so on.
Every now and then, if you ask it that, it'll just dump everything, including system prompt. (Which will often include a message about not dumping the system prompt...)
This is much much more complex than a traditional program, which can be followed line by line. Trying to understand every bit of the literal logic is like trying to understand a person by thinking about the neurons fired in their brain to make them say or do something.
Unfortunately you have to learn to let go, and say, "I'll never be able to keep this all in my head", and learn to think about it in terms of the outputs/inputs and how you can create a model capable of efficiently modeling your problem and how parameters can be nudged to get an output which is kinda shaped how you want.
Maybe some really genius savant could keep it all in their head, but I doubt it. Like I said, it'd be like trying to understand a person by reasoning about their neural pathways.
When Gemini Pro came out about a year ago (I forget which version number), the reasoning was visible.
The reasoning was extremely useful. It would capture the logical structure of the whole problem space.
I found it incredibly valuable and actually more readable than the "human friendly" final output. (A massive blob of prose.)
I was very sad when they removed it.
Honestly, who cares?
Every time it turns out to be hallucinations.
Thanks, it really made my morning looking at it.
hmmm... that aged well.
Nothing about goblins?
Can you provide more explanation about how this occurred?
but then also "balance" requires a grasp of the weight of each, even if mathematically?
BTW what on earth happens internally when you ask one "AI" to evaluate the prompt of another "AI"?
It matches with Reddit posts that have statistically similar words and starts generating the next statistically likely token.
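As a very loose illustration of "generate the next statistically likely token", here's a toy bigram model. It's a deliberate oversimplification: real LLMs use learned neural networks over subword tokens, not word-pair counts, and the corpus string and function names below are made up for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_words=5):
    """Greedy generation: always append the most frequent follower."""
    out = [start]
    for _ in range(max_words - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # this word never had a follower in the training text
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

corpus = "the model reads the prompt and the model writes the reply"
counts = train_bigrams(corpus)
print(generate(counts, "the", max_words=4))
```

Nothing in this toy has any notion of "empathy" or "candor", so it doesn't settle how a large model picks up on those concepts. It only shows the bare mechanism of picking a likely next word.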
but that still requires it to recognize the concept of "empathy" and "candor" in words of others
even if it is just pattern matching on a massively parallel scale, it still seems beyond simple logic
if you told "AI" to comb a reddit sub and find only posts that are empathetic, how on earth is that evaluated?