That said, I think most coders just can't deal with it. For reasons I won't go into, I came to fdb already fully aware of the compromises that software transactional memories have, and fdb roughly matches the semantics of those: retry on failure, a maximum transaction size, a maximum transaction time, and so on. For those who haven't used it, start here: https://apple.github.io/foundationdb/developer-guide.html ; especially the section on transactions.
These constraints are _very_ inconvenient for many kinds of applications, so, ok, you'd like a wrapper library that handles them gracefully and hides the details (for example, counting the keys in a range).
This seems like it should be easy to do - after all, the expectation is that _application developers_ do it directly - but in practice it isn't, and it introduces a layering violation into the data modeling if any part of your application does direct key access. I recommend people try it. It can surely be done, but that layer is now as critical as the DB itself, and that has interesting risks.
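To make the range-count example concrete, here's a minimal sketch of the kind of chunked wrapper you end up writing (using the Python bindings; batch size and structure are my own choices). Note the count is no longer atomic as a whole, which is exactly the compromise in question:

```python
import fdb
fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def _chunk(tr, begin, end, batch):
    # snapshot read: a pure count doesn't need conflict ranges
    return list(tr.snapshot.get_range(begin, end, limit=batch))

def count_range(db, begin, end, batch=10_000):
    # Count keys in [begin, end) across many small transactions so that
    # no single transaction hits FDB's ~5 second / ~10 MB limits.
    total, cursor = 0, begin
    while True:
        kvs = _chunk(db, cursor, end, batch)
        total += len(kvs)
        if len(kvs) < batch:
            return total  # NOTE: not atomic - the range may change mid-count
        cursor = fdb.KeySelector.first_greater_than(kvs[-1].key)
```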
At heart, the problem is that the limits are low enough that normal applications can and do run into them, and they are annoying. It would be really nice if the FDB team built this next layer themselves, with the same degree of testing, but they have not, and I think it's pretty clear that in practice a small-transaction KV store is not enough on its own to build complex layers.
Emphasis on the tested part - it's all well and good for fdb to be rock solid, but what matters is that the actual interface used by 90% of applications is rock solid, and once you exceed basic small keys or the time limit, that isn't really true.
- If you’re a developer wanting to build an application, you should really use a well designed layer between yourself and FDB. A few are out there.
- If you’re a dev thinking you want to build a database from scratch you probably should just use FDB as the storage engine and work on the other parts. To start, at very least!
(One last thing that I think is a bit overlooked with FDB is how easy it is to build basically any data structure you can build in memory in FDB instead. Not that it solves the transaction timeout stuff, etc., but if you want to build a skip list, or a quad tree, or a vector store, or whatever else, you can quite easily use FDB to build a durable, distributed, transactional, multi-user version of the same. You don't have to stick to just boring tables.)
We did use fdb as the backend for more complex data structures (a B+ tree and a kind of skip list) and it's very cool. fdb basically presents the model of a software transactional memory and it's kind of wonderful, but it's not wonderful enough.
Another issue that I forgot to mention is that comprehensibility of keys is your own problem. Keys and values are just bytes - if you don't start day one with a shared envelope for all writers, you _will_ be in pain eventually. This can get kind of ugly if you somehow end up leaking keys that you can't identify.
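(For what it's worth, FDB's bundled tuple layer is the usual starting point for such a shared envelope. A minimal sketch, where the namespace scheme is my own invention:)

```python
import fdb
fdb.api_version(710)
db = fdb.open()

# Shared envelope: all writers agree that keys are tuples starting with
# (app, schema_version, table), so any leaked key can be decoded later.
def user_key(user_id):
    return fdb.tuple.pack(('myapp', 1, 'users', user_id))

@fdb.transactional
def set_email(tr, user_id, email):
    tr[user_key(user_id)] = email.encode()

# fdb.tuple.unpack(raw_key) recovers ('myapp', 1, 'users', <id>)
# from any key found in the database.
```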
All I could find is https://github.com/FoundationDB/awesome-foundationdb#layers - not sure how complete and up-to-date that list is.
fdb transactions are real transactions. These aren't batches.
They just aren't the same thing. It's like comparing a binary tree to JSON. If you squint you can see how they could be similar, but they really aren't.
FDB provides "layers", such as the Record layer. It helps map data to keys and values. But a more sophisticated solution that I sometimes wish would take off is this library:
It's a small(ish) open source project that implements an ORM-like API, but significantly cleaned up, and it can run on any K/V backend. There's an FDB plugin for it, so you can connect your app directly to an FDB cluster using it. With that you get built-in indexing, derived data, and triggers; you can do queries using the Java collections API; there's a CLI, an API for making GUIs, and everything else you might need for a business CRUD app. It's got a paper of its own and is quite impressive.
There are a few big gaps vs an RDBMS though:
1. There's no query planner. You write your own plans using functional maps/filters/folds etc. in regular Java (or Kotlin or Python or any other language that can run on the JVM); a sketch of what that looks like follows this list.
2. It's weak on analytics, because there's no access control and the ad-hoc query language is less convenient than SQL.
3. There's no network protocol other than FDB itself, which assumes low latency networks. So if there's a big distance between the user generating the queries and the servers, you have a problem and will need to introduce an app specific protocol (or move the code).
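The library itself is JVM-based, but the flavor of a hand-written plan looks roughly like this sketch against the raw FDB Python bindings (the record and index layouts here are hypothetical):

```python
import fdb
fdb.api_version(710)
db = fdb.open()

users = fdb.Subspace(('users',))         # hypothetical record subspace
by_city = fdb.Subspace(('idx', 'city'))  # hypothetical index subspace

@fdb.transactional
def adults_in(tr, city, min_age=18):
    # Hand-written "plan": index scan, then point lookups, then a filter.
    # With no planner, choosing index-scan vs. full-scan is your job.
    out = []
    for k, _ in tr[by_city.range((city,))]:
        user_id = by_city.unpack(k)[-1]
        rec = tr[users.pack((user_id,))]
        if rec.present():
            # hypothetical record encoding: tuple-packed (age, name)
            age, name = fdb.tuple.unpack(rec)
            if age >= min_age:
                out.append(name)
    return out
```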
* and the only previous related submission appears to be https://news.ycombinator.com/item?id=21646037.
more info: https://blog.senx.io/introducing-warp-10-3-0/
fdb is best when your workload is pretty well-defined and will stay that way for a decade or so. That is not usually the case for new products, which evolve fast. The two most famous installations of fdb are iTunes and Snowflake metadata. When you rewrite a petabyte-size database onto fdb, you transform continuous SRE/devops opex into a developer capex investment, and it comes with reduced risk of occasional data loss. For me it's mostly a financial decision, not really a technical one.
Would you mind expanding/educating me on this point? When I think of capex I think of “purchasing a thing that’s depreciated over a time window”. If you’d said “transform SRE/COGS costs into developer/R&D/opex costs” I would’ve understood, but eventually the thing leaves development and goes back into COGS.
See my other message for the developer issues, though. IMHO fdb as it is today is too hard for most developers if their use case is anything beyond redis simple keys.
- developer time is approximately fungible with money
- project delivery is building a thing, that you own, and that has value, and that you will use to produce other value...
- ...which can therefore be entered on the balance sheet.
I've just left a company a little after it floated. In the run-up to the float, we were directed to maximise our capital time logged. That meant any kind of feature delivery. Bugfixes were opex.
I believe this was done to grow the balance sheet and maximise market cap.
Would love to hear from anyone with experience in fdb whether these assumptions hold.
Worst case, if there is heavy contention on the same keys, the resolvers will eventually fail more transaction writes, but for read-only transactions most applications should be fine with a slightly "old" version.
(Yes, all this will start to break down if there is high key contention and many conflicts.)
The reason external consistency is nice is that if you change the database, as soon as you get a commit signal, you can tell any other client "hey, go check the database and see what I did" and they will see the changes. No worries about whether the changes have sync'ed yet or anything like that.
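Concretely, with the Python bindings (trivial keys, just to illustrate the guarantee):

```python
import fdb
fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def publish(tr):
    tr[b'announcement'] = b'v2 released'

@fdb.transactional
def check(tr):
    return tr[b'announcement']

publish(db)  # returns only once the commit is durable
# Anything you tell another client after this point is safe: if they
# read now, they are guaranteed to observe b'v2 released'.
print(check(db))
```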
We are trying out a version of what CockroachDB does - run transaction A, possibly on stale data, spy on the things it reads, call our flavor of ReadVersion for the "shards" the transaction interacted with, and re-run transaction A with the data versions we got from ReadVersion. It will usually just return after the first run, rarely require a second run, and in some pathological cases require unbounded re-runs.
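A toy, self-contained sketch of that scheme (all names hypothetical; the real system tracks versions per shard, not per key):

```python
class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key, tracked):
        version, value = self.data.get(key, (0, None))
        tracked[key] = version  # spy on what the transaction reads
        return value

    def versions(self, keys):
        return {k: self.data.get(k, (0, None))[0] for k in keys}

def run_transaction(store, txn_fn):
    while True:  # pathological cases can loop indefinitely
        tracked = {}
        result = txn_fn(lambda k: store.read(k, tracked))
        # "our flavor of ReadVersion" for everything this run touched:
        if store.versions(tracked) == tracked:
            return result  # usually true after the first run
```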
MTTR on failure is seconds. Really, there's no system I've used that is as robust and performant as fdb and I include s3 in that list - s3, for example, _routinely_ has operations with orders of magnitude latency variance and huge, correlated spikes.
Demystified a lot about FDB for me.
> ”Summary: FDB is probably the best k/v store for regional deployment out there.”
Why should someone use Memcache or Redis then?
Is it for the data types in Redis?
Memcache is a cache. Fdb is an ordered kv store.
Very much seems like an acceptable use now
It's very slow, but if you really want to wait for fsync before replying, it can do that.
Thanks for the correction.
Memcache on the other hand is just solid and mature. It also has some inertia as a solid k/v cache. For example: NextCloud supports, afaik, both Redis and Memcache as caching engines but doesn't have FDB support.
- it does not have persistent/disk-backed state
- it is a singleton process
- it and only it assigns the ordering; the logs don't order anything
... if the singleton sequencer crashes, I do not see from this high-level description how the system recovers, if the sequencer is the only one that knows write order but has no persistent write "log".
What am I missing?
This... does not appear to be something you run outside of a dedicated datacenter. AWS, with its awful networking and slow/silently throttling storage, would probably muck this thing up at any substantive scale?
The reason it can fail without a correctness issue is that it can just reject all transactions in flight for the clients to retry. This is something the clients need to be prepared to do anyway because of optimistic concurrency.
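For what it's worth, the official bindings bake that retry loop in; e.g. in Python the @fdb.transactional decorator re-runs the function on any retryable error (the keys here are made up):

```python
import fdb
fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def add_points(tr, user, points):
    # retried transparently on conflicts and on cluster recovery
    key = fdb.tuple.pack(('score', user))
    current = tr[key]
    total = (int.from_bytes(current, 'big') if current.present() else 0) + points
    tr[key] = total.to_bytes(8, 'big')

add_points(db, 'alice', 10)
```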
It can run fine on AWS. Upon a failure, the sequencer role is very fast to re-elect onto another machine in the cluster because there is no persistent state at all.
I am running this setup in my dev (personal) environment on AWS.
Wait a minute, I know that formula... Viewstamped Replication?? I need to read the FoundationDB paper. (I mainly read CRDT stuff, so hopefully it's understandable.)
In general I'm really impressed with the FoundationDB folks.
The talk "Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson blew my mind. TL;DR they spent the majority of their initial dev effort into making a simulation of the database, then when they were happy with that plugged in real storage, time and networks at the end. Well worth a watch for anyone interested in distributed systems & reliability.
It makes sense if you think about it: these systems follow a leader/replica model, and naturally you only need one leader to make progress
I thought "these systems follow a leader/replica model" would be the former, but "f failures with f+1 replicas" the latter.
"f failures with f+1 replicas" means a cluster of n replicas can sustain n-1 failures: n = f+1, i.e. f = n-1. If you want to be able to sustain f failures, you need a cluster size (n) of f+1.
When there is a failure, a non-failing node becomes the leader (or there's no leader change if the current leader isn't the one that failed). A cluster size of 1 has 1 leader, and can sustain 0 failures.
But if you can reliably confirm that all but one node have "failed", for a suitably robust definition of failed, that's a different scenario. It means that even though you can't communicate with a failed node in the normal way, you are able to get confirmation that the node cannot respond to normal messages to any other nodes or clients, and that something (maybe controlling the node, or software on the node itself) guarantees to prevent those responses until the node goes through a recovery and reintegration process.
Some ways this is done are using remote-controlled power, remote-controlled reboot, or reconfiguring the network switches to cut off the node. Just to ensure it can't come back and carry on responding as if nothing happened except a temporary delay. There's some subtlety to doing this robustly: Consider a response packet that got onto the network before the cut off event, but is delayed a long time inside the network due to a queue or fault.
After reliable "failure" confirmation, you can shrink the quorum size dynamically in response, even down to a single node, and then resume forward progress.
What usually happens is that a leader won't confirm an operation as successful until such operation has been applied in a quorum of replicas (see: synchronous replication).
In theory, nothing prevents a leader from accepting new writes even if it can't reach a quorum, provided it never lets clients read operations that haven't yet been replicated to enough replicas.
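A toy illustration of that rule (hypothetical classes, no failure handling):

```python
class Replica:
    def __init__(self):
        self.log = []

    def apply(self, op):
        self.log.append(op)
        return True  # a real replica could fail or time out here

def commit(op, replicas):
    # the leader confirms success only once a majority has applied the op
    acks = sum(1 for r in replicas if r.apply(op))
    return acks >= len(replicas) // 2 + 1

replicas = [Replica() for _ in range(3)]
assert commit("set x=1", replicas)
```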
They recently turned that knowledge into a product, still early/rough but holy crap it feels like dark wizardry to use it.
Plus these folks are really top shelf humans to work with.
If "failure" is a netsplit, only single partition would allow writes, because they choose CP from CAP theorem.
The general consensus (no pun intended!) is that the term availability is not really well defined, and the CAP theorem is not a useful way to think about things (see Martin Kleppmann's "The Unhelpful CAP Theorem" in DDIA).
FoundationDB does not give you Availability though, only CP.
Is it correct to assume FDB is the perfect framework for creating a queue?
There's this however: "QuiCK: A Queuing System in CloudKit": https://www.foundationdb.org/files/QuiCK.pdf. I suspect it really depends on what you expect from a queue, e.g. if you need strict FIFO or priorities, and how much effort you're willing to invest.
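As a taste of the effort involved, here's a minimal FIFO-ish queue sketch using versionstamped keys in the Python bindings (subspace name is arbitrary; note it assumes a single dequeuer, since competing consumers would conflict on the head of the queue):

```python
import fdb
fdb.api_version(710)
db = fdb.open()

queue = fdb.Subspace(('my_queue',))

@fdb.transactional
def enqueue(tr, item):
    # versionstamped keys give each item a unique, monotonically
    # increasing position without producers coordinating
    key = queue.pack_with_versionstamp((fdb.tuple.Versionstamp(),))
    tr.set_versionstamped_key(key, item)

@fdb.transactional
def dequeue(tr):
    # pop the oldest item; returns None when the queue is empty
    r = queue.range()
    for k, v in tr.get_range(r.start, r.stop, limit=1):
        del tr[k]
        return v
```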
- memcached if you don't need to persist the data
- Redis if you don't know whether you need to use Redis or FoundationDB
- FoundationDB if you learn that Redis doesn't do what you need
I don't mean this in any kind of a derogatory way but I suspect that if you need to ask then you probably don't need FDB.
The principle of keeping tech stacks boring and using well established components is less exciting as an Engineer but is usually the best choice.
Disclaimer: I am involved with RonDB
Spoiler: they even had their own custom power supplies used to test against power failures.
If that's just because I haven't noticed the others, I'd love to hear about them for comparison.
I also find it somewhat irritating that they won't take fixes or reports of problems with the storage engine because "we fixed this in Redwood" when Redwood is completely theoretical.
Curious about that data loss bug. Do you have a link? Most bugs I've seen have to do with latency spikes and cluster unavailability. I haven't seen any around data loss after a transaction has committed.
Nginx exists, why do I need to learn caddy?
Redis exists, why do I need to learn FoundationDB?
> Nginx exists, why do I need to learn caddy?
If you've learned to manage nginx, by all means use it, but for new users it's more like "caddy works with like 3 lines of configuration, including HTTPS, why would I learn nginx?"
I hadn't spun up a webserver other than Kestrel for a long time, and was absolutely looking for the easiest solution for putting a reverse proxy in front of an API. No huge traffic requirements or low latency, seemed a perfect fit for caddy.
Then I googled to make sure that the necessary featureset was there and saw that rate limiting is a plugin that's marked WIP. What's more, there seemed to be a couple to choose from.
So I went through the certbot steps (very quick + straightforward) and wrote the short nginx config based on one page of getting started docs and was up and running.
(I'm the author of ProxyKit that predated yarp)
This particular (small, internal) project was in Go, so I wanted to use the opportunity to forego any extra runtimes and just used nginx.
Unless you are running Redis only with nothing else, fdb and redis do not play in the same space.
Only in terms of transactions across multiple data centers. In every other way vertically scaled SQL performs better, especially for your dollar.
But of course, hence why I referenced multi-DC transactions.
Not to my understanding. Can you elaborate?
I think they have different use cases: fdb when you want transactions, cassandra when you want throughput.
so you know that this question does not make sense.
>non-sharded, strict serializable, fault tolerant, key-value store that supports point writes, reads and range reads.
k-v store, non-sharded, fault tolerant, reads and range reads. Redis has these.
Redis's single threaded model does this (maybe, not entirely sure).
Please help me understand: why is this comparison orthogonal, or does it really replace Redis in ways that I don't understand?
The two words that come to mind are "durable", which is the "D" in ACID, and the extensive testing of FDB's distributed robustness properties in a wide range of testing and fault scenarios.
If you want Redis to store durably, which means the data is reliably stored by the time the database replies to the client and won't magically disappear if there's a crash or power failure just after, you need to turn on its "fsync at every query" mode. This mode is very slow, so durable storage is off by default in Redis. That means by default Redis can lose the last 1 second of writes on power failure, kernel crash, or abrupt virtual machine termination.
In other words, FDB is designed for very reliable storage of every transactional write, and has been built and tested with that in mind. Redis is not; it treats losing recent data in some realistic scenarios as an acceptable default, and even with the "fsync on every query" mode turned on it does not have the same level of testing and focus on durable distributed storage as FDB.
Without the "D", things built on top which fill out the rest of ACID transactions and database indexing aren't as reliable either. For example "C", consistency (with foreign keys, indexes, etc), is impossible to maintain if some of your recent writes may be lost while others are not. I don't know what guarantees Redis offers in this area, but it does not seem to be the focus. Whereas FDB authors make it clear this sort of thing is a core focus of the product which it is architected for and also heavily tested.
It solves a problem that others have not solved. Which problem is that? And how much better does it solve it than whatever came before?
Though "super reliable distributed database presenting a single logical shard to client code" is a class of system for which I don't think there's anything else even close out there, and I suspect generally if that's something you -need- then you'll already know that.
Clearly untrue, however FoundationDB is open source, with a permissive license.
So is much of the operational tooling for it: