CRDT and SQLite: Local-First Value Synchronization

(marcobambini.substack.com)

72 points | by marcobambini 4 days ago

5 comments

philsnow 16 hours ago
We shouldn't be surprised, because the writer works with both SQLite and AI, but:
> Here’s a polished section you can insert into your article (it fits naturally after the Sync Phase section):
- marcobambini 9 hours ago
  I sincerely apologize for that. I am not a native English speaker, so I always use an LLM to polish my articles before publishing.
hahn-kev 12 hours ago
My problem with this kind of design is that you can't really use any relational constraints, or constraints between columns in a given table, because each column is merged independently.
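A minimal sketch of this concern, assuming each column is a last-writer-wins register stored as a (value, clock) pair. The row layout, the `start <= end` invariant, and the function names are hypothetical and not taken from the article:

```python
# Hypothetical per-column merge: every cell is (value, clock), highest clock wins.

def merge_row(a, b):
    """Merge two versions of the same row one column at a time."""
    return {col: max(a[col], b[col], key=lambda cell: cell[1]) for col in a}

def is_valid(row):
    """Cross-column invariant the application cares about: start <= end."""
    return row["start"][0] <= row["end"][0]

# Both replicas start from start=10, end=20 (clock 1).
# Replica A raises 'start' to 18; replica B lowers 'end' to 12.
replica_a = {"start": (18, 2), "end": (20, 1)}
replica_b = {"start": (10, 1), "end": (12, 2)}

merged = merge_row(replica_a, replica_b)

print(is_valid(replica_a), is_valid(replica_b), is_valid(merged))
```

Each local edit respects the invariant, but the merged row takes `start` from one replica and `end` from the other, so a constraint that spans both columns can no longer be assumed to hold.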
- canadiantim 8 hours ago
  I wonder if a columnar database like DuckDB might be better suited for CRDT Local-first solutions, using batched writes to mitigate
briandw 17 hours ago
For a primer on CRDTs, Martin Kleppmann has a number of good videos: https://www.youtube.com/watch?v=x7drE24geUw
withinboredom 17 hours ago
This works assuming everyone has the same clock or performs changes causally distant from each other. It fails to work if, say, 1000 people all make a change around the same time. This also applies to Lamport timestamps.
- p1necone 15 hours ago
  If a thousand people all made a change at the same time in a totally deterministic, always-online system, a single one of those writes would arbitrarily win in exactly the same way.
  In practice "1000 people edit same thing at same time" is not a problem that needs to be solved via software; the users are just doing silly things and getting silly results.
  - withinboredom 15 hours ago
    If it isn’t handled correctly, you’ll eventually end up with parallel histories on different devices. Even if it isn’t 1000 people, people will share documents with entire classrooms, offices, etc., which increases the probability of this situation tremendously.
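A minimal sketch of how such parallel histories can arise, assuming a naive rule where a write replaces the current value only when its clock is strictly higher. The rule and all names here are illustrative assumptions, not the article's algorithm:

```python
# Hypothetical naive rule: on a clock tie, whichever write arrived first stays.

def apply_naive(state, write):
    """Accept the incoming write only if its clock is strictly higher."""
    if state is None or write["clock"] > state["clock"]:
        return write
    return state

# Two concurrent writes that happen to carry the same clock value.
w1 = {"clock": 7, "node": "a", "value": "red"}
w2 = {"clock": 7, "node": "b", "value": "blue"}

# The two devices receive the same writes in a different order.
device_x = apply_naive(apply_naive(None, w1), w2)
device_y = apply_naive(apply_naive(None, w2), w1)

print(device_x["value"], device_y["value"])
```

Both devices have seen the same set of writes yet disagree on the final value, because the tie is settled by arrival order rather than by something every device can compute identically.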
    - jchanimal 13 hours ago
      We handle this in Fireproof with a deterministic default algorithm, in addition to having a hash-based tamperproof ledger of changes. Fireproof is not SQL-based; it is more like CouchDB or MongoDB, but with cryptographic integrity. Apache 2.0 https://use-fireproof.com
      In practice during CouchDB's heyday, with lots of heavy users, the conflict management API almost never mattered, as most people can make do with deterministic merges.
    - ncruces 14 hours ago
      CRDTs only care that the end result is eventually the same.
      It doesn't need to make sense, or be the most recent change, only that given the same inputs, everyone independently agrees on the same output.
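A minimal sketch of that property, using a deliberately arbitrary merge rule (keep the lexicographically larger string) that is an assumption chosen for illustration. The point is only that the rule is commutative, associative, and idempotent, so every delivery order ends in the same place:

```python
from functools import reduce
from itertools import permutations

def merge(a, b):
    """Arbitrary but deterministic: keep the lexicographically larger value."""
    return max(a, b)

# Three concurrent values for the same field (illustrative data).
values = ["draft", "final", "edited"]

# Replay every possible delivery order and collect the final states.
outcomes = {reduce(merge, order) for order in permutations(values)}

print(outcomes)
```

The surviving value is not the "most recent" in any meaningful sense; it is simply the one that every replica computes from the same inputs.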
      - withinboredom 14 hours ago
        We are saying the same thing. I was pointing out that the article missed one of the hardest parts of actually implementing this, where your algorithm architecture can totally fuck you over if you didn’t plan for it. I just think it’s interesting that they missed pointing it out. Either they got it right on the first try or they haven’t realized the issue with the schema they’re proposing.
- tombert 11 hours ago
  Yeah, I implemented a vector clock a few years ago, and I never really found an elegant way to deal with conflicts like this. My very-much-inelegant solution was that every item had an epoch time in milliseconds attached, which was used as a tiebreaker, and if both timestamps were the same I would hash something and choose the smaller one of those.
  It seems wrong to rely on NTP for a distributed system like this, but I couldn't really figure out a better way at the time.
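A rough sketch of the scheme described above, under several assumptions the comment does not spell out: the later wall-clock timestamp wins among concurrent items, the "something" that gets hashed is the item's value, and the field names (`vc`, `ts_ms`, `value`) are hypothetical:

```python
import hashlib

def dominates(a, b):
    """True if vector clock a is causally after vector clock b."""
    keys = a.keys() | b.keys()
    at_least = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
    strictly = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    return at_least and strictly

def digest(item):
    """Assumed hash input: the item's value."""
    return hashlib.sha256(item["value"].encode("utf-8")).hexdigest()

def pick(a, b):
    """Choose a winner: causality first, then timestamp, then smaller hash."""
    if dominates(a["vc"], b["vc"]):
        return a
    if dominates(b["vc"], a["vc"]):
        return b
    # Concurrent edits: fall back to wall-clock time (assumed: later wins).
    if a["ts_ms"] != b["ts_ms"]:
        return a if a["ts_ms"] > b["ts_ms"] else b
    # Same millisecond: the smaller hash wins, independent of argument order.
    return min(a, b, key=digest)

left = {"vc": {"n1": 2, "n2": 1}, "ts_ms": 1700000000000, "value": "left edit"}
right = {"vc": {"n1": 1, "n2": 2}, "ts_ms": 1700000000000, "value": "right edit"}
winner = pick(left, right)
```

The timestamp step is the part that depends on reasonably synchronized clocks; the hash step only guarantees that both sides pick the same item, not that the pick is meaningful.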
  - withinboredom 6 hours ago
    The most elegant solution is to look at Lamport’s other papers, like Paxos or its derivatives. Tie-breaking doesn’t actually happen at the clock level, but at the conflict resolution level, which is a bit higher. IIRC, Paxos traditionally uses the node id as the tie-breaker, making leadership deterministic in the face of conflicts.
    Though in all honesty, NTP is mostly fine for datacenter deployments where clocks are usually within nanoseconds of each other, so you can use a timestamp with microsecond precision and probably be fine.
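A minimal sketch of a Lamport clock whose stamps carry a node id, so that equal counters still compare deterministically. This is a generic illustration with hypothetical names, assuming node ids are unique; it is not Paxos and not the article's implementation:

```python
from dataclasses import dataclass

@dataclass
class LamportClock:
    node_id: str
    counter: int = 0

    def tick(self):
        """Advance for a local event and return its (counter, node_id) stamp."""
        self.counter += 1
        return (self.counter, self.node_id)

    def observe(self, stamp):
        """Standard Lamport receive rule: jump past the counter we just saw."""
        self.counter = max(self.counter, stamp[0]) + 1

a = LamportClock("a")
b = LamportClock("b")

# Two concurrent local events end up with the same counter value.
stamp_a = a.tick()
stamp_b = b.tick()

# Tuples compare by counter first, then node id, so the tie is still ordered.
winner = max(stamp_a, stamp_b)

# After b hears about a's event, anything b does next is ordered after both.
b.observe(stamp_a)
later = b.tick()
```

Keeping the node id outside the counter means the clock itself stays a plain integer, while the conflict-resolution step gets a total order to work with.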
- marcobambini 16 hours ago
  The algorithm has a way to resolve conflicts even if, by any chance, the Lamport clock has the same value in all peers.
  - withinboredom 16 hours ago
    Yeah, but the fact that they didn’t even mention it in their post is why I brought it up.
what 8 hours ago
Isn’t this just vlcn’s crsql?