Why UUIDs are bad, and why you should consider the alternative
Universally Unique Identifiers, or UUIDs for short, are widely used as unique identifiers across systems. Their appeal lies in their simplicity and universality, but they come with significant costs.
In this article, we'll examine why UUIDs might not be the best choice for your project and what alternative you can consider.
TL;DR: Use Cuid2 instead of UUID.
What does a UUID look like?
Here's an example of a UUID:
ad464b7e-ca89-4ec5-95ea-858e7200b8ae
UUIDs are a common type of unique identifier (ID), especially in distributed environments where each node needs to generate collision-resistant identifiers independently, without coordinating with the others.
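For reference, here is a minimal Python sketch of generating one with the standard library's uuid module. The variable names are only for illustration.

```python
import uuid

# uuid4() builds a version-4 UUID from random bits.
new_id = uuid.uuid4()
text = str(new_id)

print(text)            # 32 hex digits plus 4 dashes, different on every run
print(len(text))       # 36
print(new_id.version)  # 4
```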
The Drawbacks of UUIDs
1. Takes up too much space
UUIDs are 128-bit values, typically represented as 36-character strings (32 hexadecimal digits plus 4 dashes). This size poses challenges:
- Increased storage costs.
- Reduced performance.
Compared to simpler identifiers like integers (1, 2, 3, ...), which typically take 4 or 8 bytes, UUIDs consume more space in tables and in every index that references them, especially when they are stored as text. Larger keys also mean fewer entries per index page and more memory and CPU spent comparing them, which slows queries down.
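To put rough numbers on this, the sketch below compares the size of one UUID stored as text, stored as raw bytes, and a 64-bit integer key. Real on-disk sizes depend on the database engine and column type, so treat these as lower bounds for illustration only.

```python
import struct
import uuid

sample = uuid.UUID("ad464b7e-ca89-4ec5-95ea-858e7200b8ae")

text_size = len(str(sample).encode("ascii"))  # stored as a 36-character string
binary_size = len(sample.bytes)               # stored as the raw 128 bits
bigint_size = struct.calcsize("<q")           # a 64-bit integer key

print(text_size, binary_size, bigint_size)    # 36 16 8
```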
2. UUIDs are hard for humans to read
Imagine accessing a resource with an ID of ad464b7e-ca89-4ec5-95ea-858e7200b8ae, and later finding that it requires you to look up another resource whose ID is 88d682d9-a3bb-4a77-a7e3-e8e02da9dc7b. You do this again and again, and suddenly you find yourself thinking, "did I copy the right UUID?".
UUIDs were never designed for humans to work with; you cannot easily tell two UUIDs apart at a glance. It is easy to make mistakes (we are not robots) when working with UUIDs.
3. Your database hates them
Most databases are not optimized for random UUIDs as primary keys. If you use them, you might find that your database is not performing as well as you'd like. This is because random UUIDs (such as version 4) are not sequential: their sort order has nothing to do with the order in which they were created.
Database indexes, typically B-trees, keep keys in sorted order and work best when new keys arrive in roughly increasing order, because every insert then lands at the end of the index, on a page that is already in memory. A random UUID can land anywhere in the index, so each insert may touch a different page, which causes page splits and cache misses. Lookups by key are still fast; it is the inserts, and range scans in creation order, that suffer.
In the worst case, your database slows to a crawl when you insert a lot of UUIDs in a short amount of time. Random inserts fragment the index, and once the index outgrows memory, the database has to do significantly more disk work to reach pages scattered all over it.
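The effect of key order on insert position can be seen in a toy model. The sketch below is an illustration, not a benchmark: a sorted Python list stands in for an index, and the hypothetical helper append_ratio measures how many inserts land at the very end, where a real index would have its most recently used page.

```python
import bisect
import uuid

def append_ratio(keys):
    """Fraction of inserts that land at the end of the sorted key list."""
    ordered = []
    appended = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos == len(ordered):
            appended += 1
        ordered.insert(pos, key)
    return appended / len(keys)

N = 2000
sequential_keys = list(range(N))
random_keys = [uuid.uuid4().int for _ in range(N)]

sequential_ratio = append_ratio(sequential_keys)
random_ratio = append_ratio(random_keys)

print(sequential_ratio)  # 1.0: every insert is an append
print(random_ratio)      # a small fraction: almost every insert lands mid-list
```

Sequential keys always append, while random keys almost never do. A real B-tree pays for each of those mid-index inserts with page reads and occasional page splits.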
4. UUIDs are not as unique as you think
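UUIDs are not as unique as you might think. There are different versions of UUIDs, and they get their uniqueness in different ways. For example, version 1 UUIDs combine the current time, a clock sequence, and a node ID, which is normally the MAC address of the computer generating the UUID. Two computers with different node IDs cannot produce the same version 1 UUID, but node IDs are not always unique: cloned virtual machines, containers, or generators that fall back to a random node ID can end up sharing one, and then two machines generating a UUID at the same instant might produce the same value. This is not a problem if you are generating UUIDs on a single computer, but if you are generating UUIDs on multiple computers, you might run into problems.
Another issue is that you cannot tell from a UUID alone how well it was generated. The version digit tells you which layout the generator claims to follow, not how good its randomness or clock handling is. Never assume that UUIDs from an unknown source are unique; always check how they are generated.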
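To make the version 1 case concrete, here is a small Python sketch. The node ID values and the helper make_v1 are hypothetical, chosen for illustration; the helper packs a timestamp, clock sequence, and node ID into a version 1 UUID the way the standard layout describes.

```python
import uuid

# Hypothetical 48-bit node ids; normally taken from a machine's MAC address.
NODE = 0x001122334455
OTHER_NODE = 0x66778899AABB

def make_v1(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Pack a 60-bit timestamp, 14-bit clock sequence and 48-bit node id."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = (timestamp >> 48) & 0x0FFF
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = (clock_seq >> 8) & 0x3F
    return uuid.UUID(
        fields=(time_low, time_mid, time_hi_version,
                clock_seq_hi_variant, clock_seq_low, node),
        version=1,  # sets the version and variant bits
    )

first = uuid.uuid1(node=NODE)
print(first.version)    # 1
print(hex(first.node))  # 0x1122334455

# Same timestamp, clock sequence and node id: the values are identical.
machine_a = make_v1(first.time, first.clock_seq, NODE)
machine_b = make_v1(first.time, first.clock_seq, NODE)
# Same timestamp but a different node id: the values differ.
machine_c = make_v1(first.time, first.clock_seq, OTHER_NODE)

print(machine_a == machine_b)  # True
print(machine_a == machine_c)  # False
```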
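As a quick illustration, Python's uuid module can report which version a UUID claims to be, but that is all it can report. In the sketch below, the second value is hand-written for this example and contains no randomness at all, yet it still parses as version 4.

```python
import uuid

sample = uuid.UUID("ad464b7e-ca89-4ec5-95ea-858e7200b8ae")
print(sample.version)                   # 4
print(sample.variant == uuid.RFC_4122)  # True

# A hand-written value (an assumption for this example, not a generated id).
handcrafted = uuid.UUID("00000000-0000-4000-8000-000000000001")
print(handcrafted.version)              # 4, even though nothing here is random
```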
Cuid2 – a better alternative
Cuid2 is the successor to CUID (Collision-resistant Unique Identifiers). The original CUID specification addressed many of the drawbacks of UUIDs, but it was weak on security. Cuid2 addresses this by combining many entropy sources so that its ids cannot be predicted by attackers.
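To give a feel for what "combining many entropy sources" can mean, here is a rough Python sketch. It is NOT the Cuid2 algorithm and its output is not compatible with any Cuid2 library; the function sketch_id, the choice of inputs, and the use of SHA3-512 are assumptions made for illustration. In real code, use one of the libraries listed later in this article.

```python
import hashlib
import os
import secrets
import string
import time

ALPHABET = string.digits + string.ascii_lowercase  # base-36 digits

def to_base36(number: int) -> str:
    """Encode a non-negative integer using digits 0-9 and letters a-z."""
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

# Start the counter at a random value so it reveals nothing about volume.
_counter = secrets.randbelow(2**24)

def sketch_id(length: int = 24) -> str:
    """Illustrative only: mix several entropy sources, hash, encode."""
    global _counter
    _counter += 1
    material = b"|".join([
        str(time.time_ns()).encode(),  # current time
        str(_counter).encode(),        # per-process counter
        str(os.getpid()).encode(),     # crude host/process fingerprint
        secrets.token_bytes(32),       # cryptographic randomness
    ])
    digest = hashlib.sha3_512(material).digest()
    body = to_base36(int.from_bytes(digest, "big"))
    # Skip the leading digit, which is not uniformly distributed.
    first = secrets.choice(string.ascii_lowercase)
    return first + body[1:length]

print(sketch_id())    # 24 characters, a letter followed by base-36 digits
print(sketch_id(10))  # 10 characters
```

Because the inputs are hashed before encoding, the output does not expose the timestamp or the counter.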
Cuid2 ids are a better alternative to UUIDs because they are designed to be:
- Collision-resistant – ids stay unique across systems, even when they are generated at the same time.
- Human-readable – ids use only lowercase letters and digits, with no dashes, making them easier to read and copy.
- URL-friendly – ids can be used in URLs without encoding.
- Database-friendly – ids are plain strings that fit in any text column and can be indexed like any other key.
- Compact – ids are shorter than the 36-character text form of a UUID, so they take up less space when ids are stored as text.
- Not sequential, by design – the original CUID produced roughly monotonically increasing ids, but Cuid2 deliberately drops this because ids that encode their creation order leak information and are easier to guess. Cuid2 ids are spread randomly across the key space, so the index-locality cost described earlier applies to them as well; if you need creation order, sort on a separate timestamp column.
...and we can go on and on. In a nutshell, Cuid2 is designed to be safer and friendlier to work with than UUIDs, for you and for your users.
What does a Cuid2 look like?
A Cuid2 looks like this:
cm4cj21ww000008l54h0t7xhx
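Immediately you can see that, at 25 characters, it is shorter than a 36-character UUID and has no dashes to trip over. Note that this particular sample follows the original CUID layout (a leading c followed by a base-36 timestamp and counter); ids produced by Cuid2 are 24 characters long by default and begin with a random letter.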
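The simple shape of these ids also makes them easy to sanity-check. The sketch below is a hypothetical validator, not part of any Cuid2 library: it assumes an id is a lowercase letter followed by lowercase letters and digits, and the length bounds are assumptions you should adjust to your own configuration.

```python
import re

# Assumed shape: one lowercase letter, then lowercase letters or digits.
ID_PATTERN = re.compile(r"[a-z][0-9a-z]+")

def looks_like_cuid(value: str, min_length: int = 2, max_length: int = 32) -> bool:
    """Cheap shape check; it cannot prove the id came from a good generator."""
    return (
        min_length <= len(value) <= max_length
        and ID_PATTERN.fullmatch(value) is not None
    )

print(looks_like_cuid("cm4cj21ww000008l54h0t7xhx"))             # True
print(looks_like_cuid("ad464b7e-ca89-4ec5-95ea-858e7200b8ae"))  # False
```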
Libraries
There are many libraries that can generate Cuid2 for you:
- paralleldrive/cuid2 for JavaScript.
- gordon-code/cuid2 for Python.
- obsidiaHQ/cuid2 for Dart.
- nrednav/cuid2 for Go.
- mplanchard/cuid-rust for Rust.
What's the catch?
The catch is that Cuid2 is not widely supported by database engines. Unlike with UUIDs, there is usually no built-in way to have the database generate a Cuid2 as a column default. This does not mean that you can't use Cuid2: you can generate the ids in your application instead and use them as primary keys in your database.
So you might have to do some extra work to fit them into your database, but the benefits outweigh the costs and make Cuid2 worth considering.
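Generating the id in the application and storing it as a primary key can look roughly like the Python sketch below, which uses the standard library's sqlite3 module. The function new_id is a hypothetical stand-in that returns a random string of the right shape; it is NOT a real Cuid2, and in practice you would replace it with a call to your Cuid2 library. The users table is also made up for this example.

```python
import secrets
import sqlite3
import string

_ALPHABET = string.digits + string.ascii_lowercase

def new_id(length: int = 24) -> str:
    """Hypothetical stand-in for a Cuid2 library call; NOT a real Cuid2."""
    first = secrets.choice(string.ascii_lowercase)
    rest = "".join(secrets.choice(_ALPHABET) for _ in range(length - 1))
    return first + rest

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

def create_user(name: str) -> str:
    """Generate the id in the application, then insert the row."""
    user_id = new_id()
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
    conn.commit()
    return user_id

alice_id = create_user("Alice")
row = conn.execute(
    "SELECT name FROM users WHERE id = ?", (alice_id,)
).fetchone()
print(row[0])  # Alice
```

Since the id exists before the insert, the application can reference it right away without a round trip to read back a database-generated key.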
5 Mar 2024 • cybersecurity, technique, database