If all general LLMs are eventually exposed to the same data, and a lot of the same use cases, will they converge in their responses over time?
Even if they are of different architectures? Or are the architectures companies currently use for their big LLMs already close enough to each other?
Given that AI models are randomly initialized with noise, and that the goal of training is to avoid overfitting, there will always be variance between the weights of models, even ones trained on the same data, because of those initial conditions and chaos theory.
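For a toy version of that claim, here's a minimal sketch (plain NumPy, hypothetical setup, nothing like a real training loop): the same tiny one-hidden-layer net is trained on the same data from two different seeds, and the learned weights end up far apart even though the two fitted functions stay close.

    # Sketch: same architecture, same data, different random seeds.
    import numpy as np

    def train_mlp(seed, X, y, hidden=16, lr=0.1, steps=5000):
        rng = np.random.default_rng(seed)
        W1 = rng.normal(0, 0.5, (1, hidden)); b1 = np.zeros(hidden)
        W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
        for _ in range(steps):
            h = np.tanh(X @ W1 + b1)              # hidden activations
            pred = h @ W2 + b2
            err = pred - y                        # d(MSE)/d(pred), up to a constant
            dW2 = h.T @ err / len(X); db2 = err.mean(axis=0)
            dh = err @ W2.T * (1 - h ** 2)        # backprop through tanh
            dW1 = X.T @ dh / len(X); db1 = dh.mean(axis=0)
            W1 -= lr * dW1; b1 -= lr * db1
            W2 -= lr * dW2; b2 -= lr * db2
        return (W1, b1, W2, b2), pred

    X = np.linspace(-3, 3, 200).reshape(-1, 1)
    y = np.sin(X)
    params_a, pred_a = train_mlp(0, X, y)
    params_b, pred_b = train_mlp(1, X, y)

    flatten = lambda p: np.concatenate([w.ravel() for w in p])
    print("weight-space distance:", np.linalg.norm(flatten(params_a) - flatten(params_b)))
    print("prediction distance:  ", np.linalg.norm(pred_a - pred_b))
    # Typically the weight vectors differ a lot (hidden units come out permuted
    # and rescaled) while the two fitted curves stay close to each other.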
And all of the above applies within a single model architecture. I expect you could do some principal component analysis and come up with a transform to map between models, again if they were overfit to zero error. (After all, at that point it would be a compression engine instead of an AI.)
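In the same spirit, here's a rough sketch of fitting a transform between two representation spaces. I'm using a plain least-squares fit rather than PCA, and the "activations" are synthetic stand-ins built from a shared latent, not real model internals; the point is only that one space can be mapped onto the other when both encode the same underlying factors.

    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs, latent_dim = 500, 8

    # Pretend both models encode the same underlying factors, each through its
    # own arbitrary linear "readout" plus a little noise.
    latent = rng.normal(size=(n_inputs, latent_dim))
    acts_a = latent @ rng.normal(size=(latent_dim, 32)) + 0.01 * rng.normal(size=(n_inputs, 32))
    acts_b = latent @ rng.normal(size=(latent_dim, 24)) + 0.01 * rng.normal(size=(n_inputs, 24))

    # Fit M minimizing ||acts_a @ M - acts_b||^2 (ordinary least squares).
    M, *_ = np.linalg.lstsq(acts_a, acts_b, rcond=None)
    residual = np.linalg.norm(acts_a @ M - acts_b) / np.linalg.norm(acts_b)
    print(f"relative residual mapping model A -> model B: {residual:.4f}")
    # A small residual means B's representation is roughly a linear change of
    # basis away from A's.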
Upon reflection, it seems to me that free Stanford AI course I took a decade ago actually stuck. 8)
Intelligence is a model of reality and the future. They'll converge into the same system as a reflection of the laws of physics and human psychology.
And then, when they are used as weapons, they'll perhaps try to diverge, and it will become an arms race to create models of the adversaries' models.
Another way to look at it is our own history. Intelligent apes all "converged" into our one species, Homo sapiens.
However, the RL and especially the RLHF do a lot to reshape the responses, and that's potentially a lot more varied. For the training that wasn't just cribbed from ChatGPT, anyway.
Lastly, it's unlikely that you'll get the _exact same_ responses; there are too many variables at inference time alone. And as for training, we can fingerprint models by their vocabulary to a certain extent (sketched below). So in practical terms there are probably always going to be some differences.
This assumes our current training approaches don't change too drastically, of course.
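A crude sketch of that vocabulary-fingerprinting idea (the two "model outputs" here are invented, and a real fingerprint would aggregate over many prompts):

    from collections import Counter
    import math

    def freq_profile(text):
        words = text.lower().split()
        total = len(words)
        return {w: c / total for w, c in Counter(words).items()}

    def cosine_similarity(p, q):
        dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in set(p) | set(q))
        norm_p = math.sqrt(sum(v * v for v in p.values()))
        norm_q = math.sqrt(sum(v * v for v in q.values()))
        return dot / (norm_p * norm_q)

    output_model_a = "certainly here is a concise overview of the topic you asked about"
    output_model_b = "sure thing let me break down the topic you asked about real quick"

    sim = cosine_similarity(freq_profile(output_model_a), freq_profile(output_model_b))
    print(f"vocabulary similarity: {sim:.3f}")
    # Characteristic word choices ("certainly" vs "sure thing") act as a weak
    # fingerprint once you average over enough generations.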
I could be wrong, but I have not heard a convincing argument for what you propose.
There are a zillion questions you can ask where the model ends up with a probability distribution in which multiple tokens have roughly the same probability (a flat distribution). It then has to pick one at random, and from there you can get large variation.
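To make that concrete, a toy sampling sketch (made-up tokens and probabilities):

    import numpy as np

    tokens = ["dog", "cat", "bird", "fish", "horse"]
    flat = np.array([0.21, 0.20, 0.20, 0.20, 0.19])    # nearly uniform
    peaked = np.array([0.96, 0.01, 0.01, 0.01, 0.01])  # one clear winner

    rng = np.random.default_rng()
    print([str(rng.choice(tokens, p=flat)) for _ in range(10)])    # varies run to run
    print([str(rng.choice(tokens, p=peaked)) for _ in range(10)])  # almost always "dog"
    # With temperature > 0, every near-tie like the first case is a branch point
    # where two otherwise similar models (or two runs of the same model) diverge.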
Just treat it like a commodity (like cloud infrastructure) and build cool shit using it.
If the provider can roll that feature into their offerings then you’re not actually adding any value to the world.
https://arxiv.org/abs/2405.07987
We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
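As a loose illustration of what "measuring distance between datapoints in an alike way" can mean in code: below is linear centered kernel alignment (CKA), a standard representational-similarity measure, not necessarily the metric the paper itself uses, applied to synthetic placeholder activations.

    import numpy as np

    def linear_cka(X, Y):
        # Linear CKA between two (n_samples, dim) activation matrices.
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        return np.linalg.norm(X.T @ Y, "fro") ** 2 / (
            np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

    rng = np.random.default_rng(0)
    shared = rng.normal(size=(300, 10))            # shared structure in the inputs
    reps_a = shared @ rng.normal(size=(10, 64))    # "model A" activations
    reps_b = shared @ rng.normal(size=(10, 48))    # "model B" activations
    unrelated = rng.normal(size=(300, 48))         # representation with no shared structure

    print("models with shared structure:", round(linear_cka(reps_a, reps_b), 3))    # high
    print("unrelated representations:   ", round(linear_cka(reps_a, unrelated), 3)) # near zero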