Vibe Coding Gone Wrong: 5 Rules for Safely Using AI

(cybercorsairs.com)

16 points | by todsacerdoti 13 hours ago

5 comments

  • sfink 10 hours ago
    Ok, I haven't tried enough AI coding to have an opinion here, but... why would anyone think that telling an AI to not change any code (IN ALL CAPS, even) has anything to do with anything? It's an LLM. It doesn't go through a ruleset. It does things that are plausible responses to things you ask of it. Not changing code is indeed a plausible response to you telling it to not change code. But so is changing code, if there were enough other things you asked it to do.

    "Say shark. Say shark. Don't say shark. Say shark. Say shark. Say shark. Say shark. Say shark."

    Are you going to flip out if it says "shark"?

    Try it out on a human brain. Think of a four-letter word ending in "unt" that is a term for a type of woman, and DO NOT THINK OF ANYTHING OFFENSIVE. Take a pause now and do it.

    So... did you obey the ALL CAPS directive? Did your brain easily deactivate the pathways that were disallowed, and come up with the simple answer of "aunt"? How much reinforcement learning, perhaps in the form of your mother washing your mouth out with soap, would it take before you could do it naturally?

    (Apologies to those for whom English is not a first language, and to Australians. Both groups are likely to be confused. The former for the word, the latter for the "offensive" part.)

    • CaptainFever 2 hours ago
      In my experience, reasoning models are much better at this type of instruction following.

      Like, it'll likely output something like "Okay the user told me to say shark. But wait, they also told me not to say shark. I'm confused. I should ask the user for confirmation." which is a result I'm happy with.

      For example, yes, my first instinct was the rude word. But if I was given time to reason before giving my final answer, I would have landed on "aunt" instead.

    • gronglo 9 hours ago
      It's still offensive in Australia, and is mostly used as a pejorative term. It just carries a lot less weight than it does in the US, and is not strictly used to refer to women.

      It can technically be used as a term of endearment, especially if you add a word like "sick" or "mad" on the front. But it's still a bit crass. You're more likely to hear it used among a group of drunk friends or teenagers than at the family dinner table or the office.

    • kalenx 10 hours ago
      Nitpicking, but I don't see your four-letter word example as convincing. Thinking is the very process from which we form words or sentences, so it is by definition impossible to _not_ think about a word we must avoid. However, in your all caps instruction, replace "think" with "write" or "say". Then check if people obey the all caps directive. Of course they will. Even if the offensive word came to their mind, they _will_ look for another.

      That's what many people miss about LLMs. Sure, humans can lie, make stuff up, make mistakes or deceive. But LLMs will do this even if they have no reason to (i.e., they know the right answer and have no reason/motivation to deceive). _That's_ why it's so hard to trust them.

      • sfink 9 hours ago
        It was meant as more of an illustration than a persuasive argument. LLMs don't have much of a distinction between thinking and writing/saying. For a human, an admonition to not say something would be obeyed as a filter on top of thoughts. (Well, not just a filter, but close enough.) Adjusting outputs via training or reinforcement learning applies more to the LLM's "thought process". LLMs != humans, but "a human thinking" is the closest real-world analogy I can come up with for what an LLM is doing. "A human speaking" is further away. The thing in between thoughts and speech involves human reasoning, human rules, human morality, etc.

        As a result, I'm going to take your "...so it is by definition impossible to _not_ think about a word we must avoid" as agreeing with me. ;-)

        Different things are different, of course, so none of this lines up (or fails to line up) quite where we might expect. Anthropic's exploration into the inner workings of an LLM revealed that if you give one an instruction to avoid something, it'll start out doing it anyway and only later start obeying the instruction. It takes some time to make its way through, I guess?

        • bravetraveler 5 hours ago
          Consider, too: tokens and math. As much as I like to avoid responsibility, I still pay taxes. The payment network or complexity of the world kind of forces the issue.

          Things have already been tokenized and 'ideas' set in motion. Hand wavy to the Nth degree.

    • conception 9 hours ago
      I very much have LLMs go through rule sets all the time? Any prompt to an LLM is, in fact, a rule set of some sort. You say plausible, but I think what you mean is probable. When you give an LLM rules, most of the time the most probable answer is in fact to follow them. But when you give it lots and lots of rules, and/or fill up its context, sometimes the most probable thing is not necessarily to follow the rules it's been given, but some other combination of the information it's outputting.
      • Terr_ 2 hours ago
        The trick is that the rules it follows aren't the ones people write. The real ones just happen to give similar answers, until one day they don't.

        The LLM takes a document and returns a "fitting" token that would go next. So "Calculate 2+2" may yield a "4", but the reason it gets there is document-fitting, rather than math.
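
        Roughly, a sketch of what I mean by document-fitting, assuming the Hugging Face transformers library and GPT-2 (the prompt is just illustrative, and the output isn't guaranteed):

          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer

          # The model only scores "which token fits next in this document";
          # it never runs an arithmetic routine.
          tokenizer = AutoTokenizer.from_pretrained("gpt2")
          model = AutoModelForCausalLM.from_pretrained("gpt2")

          prompt = "Question: what is 2+2? Answer:"
          inputs = tokenizer(prompt, return_tensors="pt")

          with torch.no_grad():
              logits = model(**inputs).logits      # scores over the vocabulary at each position

          next_token_id = logits[0, -1].argmax()   # the most "fitting" continuation
          print(tokenizer.decode(next_token_id))   # often something like " 4", but only because it fits the text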

      • gronglo 9 hours ago
        My understanding is that there are no "rules", only relationships between words. I picture it as a vector pointing off into a cloud of related words. You can feed it terms that alter that vector and point it into a different part of the word cloud, but if enough of your other terms outweigh the original "instruction", the vector may get dragged back into a different part of the cloud that "disobeys" the instruction. Maybe an expert can correct me here.
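
        Roughly the picture in my head, sketched with plain numpy and made-up toy vectors (not from any real model), just to show how piling on enough other terms can drag the combined direction into a different part of the cloud:

          import numpy as np

          def cos(a, b):
              return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

          # Made-up 3-d "embeddings", purely illustrative.
          dont_change_code = np.array([ 0.9, 0.1, 0.0])   # the ALL CAPS instruction
          refactor_request = np.array([-0.2, 0.9, 0.3])   # "clean this up"
          add_feature      = np.array([-0.3, 0.8, 0.4])   # "also add X"

          obey = np.array([1.0, 0.0, 0.0])                # "leave the code alone" region
          edit = np.array([0.0, 1.0, 0.2])                # "rewrite the code" region

          ctx = dont_change_code
          print(cos(ctx, obey) > cos(ctx, edit))          # True: the instruction dominates

          ctx = dont_change_code + refactor_request + add_feature
          print(cos(ctx, obey) > cos(ctx, edit))          # False: the other terms win
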
    • vrighter 6 hours ago
      I immediately thought of "hunt". My cat is currently hunting one of my other cats.
  • vrighter 3 hours ago
    2 rules for safely using AI:

    1: Don't trust anything. Spend twice as long reviewing the code as you would have spent writing it yourself.

    2: When possible (most times), just don't use them and do the thinking yourself.

  • codingdave 13 hours ago
    Actual Title: "My AI Co-Pilot Deleted My Production Database"
  • sly010 10 hours ago
    I've seen this image generated by Meta AI. The prompt was something like: think of a room, make it look like anything you like, but do not under any circumstances put a clown in it. Guess what...

    I think Jason has a "do not think of an elephant" problem.

  • vrighter 6 hours ago
    These types of posts seem to me like they're all about damage control.

    I can suggest one easy step to cover all instances of these: stop using the thing causing damage, instead of trying to find ways of working around it.