When starting a new project, you might want to ask yourself: “How am I gonna index the objects I store in my database?”
The fast answer is “I don’t care. Just use the default”, which is an auto-incrementing number. And this is fine in lots of cases. But if you’re here, reading this article, it’s because this is not the answer you’re looking for. UUID has been brought to your attention and you want to know more about it. So let’s dig into it a little.
What is UUID?
Well first of all, UUID stands for Universally Unique IDentifier. You might encounter the acronym GUID (for Globally Unique IDentifier) someplace, which is exactly the same thing.
It’s basically just a 16-byte number, but the way it is generated, following a standardized algorithm, is meant to make every UUID unique. Worldwide! And that, without using any kind of central authority. Actually, nothing guarantees, mathematically, that a UUID is really unique. But the probability of generating two identical UUIDs is so low that it is considered negligible. Consider yourself more likely to survive a crash on an island where a black smoke uproots trees and kills people than to generate a UUID that already exists somewhere else in the world.
A UUID is usually not displayed as a large number, but as a string. It’s called the canonical form. The 16 bytes of the UUID are represented as 32 hexadecimal digits, displayed in five groups of 8, 4, 4, 4 and 12 digits separated by hyphens. It looks like this: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where each x is a hexadecimal digit.
This is the form you will manipulate, and that is usually how it will be stored in your database.
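To put a number on “negligible”, here is a rough sketch using the standard birthday approximation. It assumes random (version 4) UUIDs, which carry 122 random bits; the function name `collision_probability` is made up for this illustration.

```python
import math

# Assumption: version 4 UUIDs, i.e. 128 bits minus 6 fixed version/variant bits.
RANDOM_BITS = 122

def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday approximation: p ~ 1 - exp(-n^2 / (2 * 2^bits))."""
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-(n * n) / (2 * 2**bits))

# Chance of at least one duplicate among one billion random UUIDs.
print(collision_probability(10**9))
```

For a billion UUIDs the result is on the order of 1e-19, which is why the risk is usually ignored in practice.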
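As a quick illustration, Python’s standard `uuid` module can generate a UUID and show both views of it: the 36-character canonical string and the 16 raw bytes behind it. This is only a sketch; most languages ship an equivalent.

```python
import uuid

u = uuid.uuid4()        # a random (version 4) UUID
canonical = str(u)      # 32 hex digits in 8-4-4-4-12 groups

print(canonical)
print(len(canonical))   # 36 characters, hyphens included
print(len(u.bytes))     # 16 bytes underneath

# The canonical string parses back to the very same value.
assert uuid.UUID(canonical) == u
```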
Why would you want to use it?
Yes, why, actually? The usual auto-incremented number works perfectly fine. Well, there are a few cases where UUID might come in handy.
One is related to security. A UUID is way more difficult to remember than a simple number. So someone passing by, having a glance at your screen, would not be able to know what file number you are working on. Hacking into a system to retrieve information is way more difficult if you don’t know what you’re looking for or where to look.
Another concerns database scaling. Imagine you’ve been writing blog articles on two self-hosted blogging platforms. And for some reason you want to merge those two blogs into one. If you had used the usual auto-incrementing IDs, you would have to re-index every blog post in the databases and update every foreign key that might point to them. But if you had used UUIDs as primary keys… No work to do!
There are surely some other benefits to using UUIDs as primary keys, just look around on the internet. But the reason that interests me today is for “local-first” applications.
Let’s say I have a collaborative application. And I want to be able to work with that application, even when I’m offline. And that means creating new content that should be added to the common database. What I expect is for the application to let me create my new content, and merge it into the central shared database when I’m back online. And I expect my coworkers to be able to do the same. In this scenario, entries in the database can’t be indexed with an auto-incrementing number. Because my coworkers and I, while offline, would be creating entries with the same IDs. And once back online, we would face numerous data merging issues. UUID, in this case, is a marvelous solution!
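The offline scenario can be sketched in a few lines. Everything here is a toy model: `create_offline` and `merge` are hypothetical helpers, and a plain dictionary stands in for each local copy of the database.

```python
import uuid

def create_offline(titles, use_uuid):
    """Simulate one offline client creating rows, keyed by their ID."""
    rows = {}
    for i, title in enumerate(titles, start=1):
        key = str(uuid.uuid4()) if use_uuid else i  # local counter restarts at 1
        rows[key] = title
    return rows

def merge(*replicas):
    """Merge local copies into one table, recording every ID clash."""
    merged, conflicts = {}, []
    for replica in replicas:
        for key, value in replica.items():
            if key in merged:
                conflicts.append(key)
            else:
                merged[key] = value
    return merged, conflicts

# Auto-incremented IDs: both clients produce IDs 1 and 2, so they clash.
_, int_conflicts = merge(create_offline(["A", "B"], False),
                         create_offline(["C", "D"], False))
# UUIDs: the four rows merge without any clash.
uuid_rows, uuid_conflicts = merge(create_offline(["A", "B"], True),
                                  create_offline(["C", "D"], True))
print(int_conflicts, uuid_conflicts, len(uuid_rows))
```

A real local-first application also has to deal with concurrent edits to the same entry, which UUIDs alone do not solve. They only remove the ID clashes.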
So what’s the catch?
OK, there is ONE catch. And it’s not a small one: performance.
First, the size. Storing a 36-character string is not as efficient as storing a 4- or 8-byte integer. I hear that MySQL 8 and MongoDB offer the possibility to store UUIDs in a compact binary format that would occupy only 16 bytes, which is significantly better.
Second, the speed. In a database, the primary key index is kept in sorted order. That is very convenient for auto-incremented values, since every new row is then inserted right after the last one. And this is how light-speed search is achieved when done on the primary key. But when using UUIDs, inserted rows will usually not land after the last one. This is slower because, for each insert, the database engine has to find the right spot somewhere in the middle of the index and make room for the new entry there, instead of simply appending at the end. And it has some side effects on the database file structure, such as fragmentation.
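To make the size difference concrete, here is a small comparison of the raw payload sizes. It ignores whatever overhead a given database engine adds on top, so take it as an order of magnitude only.

```python
import struct
import uuid

u = uuid.uuid4()

text_size = len(str(u).encode("ascii"))  # canonical string: 36 bytes
binary_size = len(u.bytes)               # compact binary form: 16 bytes
int32_size = struct.calcsize("<i")       # 4-byte integer
int64_size = struct.calcsize("<q")       # 8-byte integer

print(text_size, binary_size, int32_size, int64_size)

# The binary form loses nothing: it converts back to the same UUID.
assert uuid.UUID(bytes=u.bytes) == u
```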
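The difference in insert pattern can be observed with a toy model. In this sketch a sorted Python list stands in for the primary key index, which is a big simplification of what a real engine does, and `count_tail_inserts` is a helper written for this example.

```python
import bisect
import uuid

def count_tail_inserts(keys):
    """Insert keys one by one into a sorted list and count
    how many of them land at the very end."""
    index = []
    tail = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            tail += 1
        index.insert(pos, key)
    return tail

n = 1000
sequential_tail = count_tail_inserts(range(1, n + 1))
random_tail = count_tail_inserts(str(uuid.uuid4()) for _ in range(n))

print(sequential_tail)  # every auto-incremented key is appended at the end
print(random_tail)      # only a handful of random UUIDs are
```

With sequential keys, all the inserts are appends. With random UUIDs, almost every insert falls somewhere in the middle of the index.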
Make your choice
So, is UUID for you? Should you use it?
If your database is or will eventually be distributed, like in the case of a local-first application, or simply if your NoSQL database is scaling up and split across multiple servers, I’d say that you have almost no choice: use UUID! Just know that there are some things that you can do to improve performance. I encourage you to look into the different versions of the UUID standard.
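As one example of what another version can bring, RFC 9562 defines a time-ordered version 7, where the first 48 bits hold a Unix timestamp in milliseconds, so newer UUIDs sort after older ones. The function below, `uuid7_sketch`, is a hand-written illustration of that layout and not a library API. For real use, prefer a maintained implementation.

```python
import os
import time
import uuid

def uuid7_sketch(ts_ms=None):
    """Build a version 7 style UUID: 48-bit millisecond timestamp,
    then version and variant bits, the rest random."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")      # 80 random bits
    value = ((ts_ms & ((1 << 48) - 1)) << 80) | rand
    value = (value & ~(0xF << 76)) | (0x7 << 76)      # version field = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)      # variant field = 0b10
    return uuid.UUID(int=value)

older = uuid7_sketch(ts_ms=1_700_000_000_000)
newer = uuid7_sketch(ts_ms=1_700_000_000_001)
print(older)
print(newer)
print(str(older) < str(newer))  # the later timestamp sorts last
```

Because the values grow over time, inserts end up close to the end of the index, much like with an auto-incremented key, while the IDs can still be generated without a central authority.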
For every other case, don’t bother.
This article is based on my latest readings on the subject. Here is a sample of the more enlightening articles I’ve read on UUID. May they provide you with some more details.
What are your thoughts on UUID?
Do you have some good literature to share on the subject?