Discord Unveiled Searcher

Search

What's this?

Discord Unveiled is a 2 terabyte dataset, comprising about 2 billion messages, that some idiots, er, researchers released to the public:

The dataset comprises over 2.05 billion messages from 4.74 million users across 3,167 public servers, representing approximately 10% of servers listed in Discord’s Discovery feature.
[...]
The 3,167 collected servers yielded a total of 2,052,206,308 unique messages sent by 4,735,057 distinct users. From the total number of messages, 364,447,569 (17%) originated from bots. The data spans from Discord’s launch, on May 13, 2015, to December 17, 2024, when the data collection process began.
source: https://arxiv.org/pdf/2502.00627

Every message visible to their scraping bots in those servers was scooped up and put into this dataset. The dataset is 2,099,556,854,904 bytes unzipped, which is far more than most people can handle, and most just want to know whether they're in it or not. I've processed all this data into a single, tidy SQLite3 database, containing just the user IDs in the dataset mapped to the guild IDs they were found in. Bots have been excluded from this database.

How does it work?

The "researchers" said they took "anonymization techniques to obscure sensitive information". This is, quite frankly, a complete load of horseshit.
Each ID in the dataset, except for guild and channel IDs, was sent through a SHA2-256 hash, then only the upper 48 bits of the hash were retained in the final dataset.
This is easily reversible, as Discord IDs are extremely finite (only about 10^18 IDs), and a brute force on them could (probably) be done in a few weeks or months on a single computer. However, it's much easier to know the user ID it maps to if you just input your user ID, and that's exactly what this does.
Essentially, all this site does is run the user ID you input above through a SHA2-256 hash, take the upper 48 bits, then run SELECT guilds FROM users WHERE user_id = ? on the SQLite3 backing database.
This doesn't store your user ID at all.
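
For the curious, here's a minimal sketch of that lookup in Python. It is not the site's actual code, and several details are assumptions on my part: that the ID is hashed as its decimal string in UTF-8, that the truncated hash is stored as an integer, and that the guilds column holds a text list. The helper names (truncated_hash, lookup_guilds) and the example ID are made up for illustration.

```python
import hashlib
import sqlite3


def truncated_hash(user_id: int) -> int:
    """Upper 48 bits of SHA-256 over the ID.

    Assumption: the ID is hashed as its decimal string (UTF-8).
    The real dataset may encode the ID differently.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # The first 6 bytes of the big-endian digest are its upper 48 bits.
    return int.from_bytes(digest[:6], "big")


def lookup_guilds(conn: sqlite3.Connection, user_id: int):
    """Return the stored guilds value for a user ID, or None if absent."""
    row = conn.execute(
        "SELECT guilds FROM users WHERE user_id = ?",
        (truncated_hash(user_id),),
    ).fetchone()
    return row[0] if row else None


# Tiny in-memory stand-in for the backing database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, guilds TEXT)")

example_id = 123456789012345678  # made-up ID, not a real account
conn.execute(
    "INSERT INTO users VALUES (?, ?)",
    (truncated_hash(example_id), "[111, 222]"),
)

print(lookup_guilds(conn, example_id))  # the guild list stored above
print(lookup_guilds(conn, 1))           # an ID that was never inserted
```

Note that the raw ID only ever exists in memory here: it goes into the hash, and only the truncated hash reaches the query.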

How bad is this?

I'm very biased and anything I say will very much not sound good... but that said: really bad.
At this point you shouldn't trust Discord at all. They clearly did nothing to stop the scraping of two billion messages, and not noticing it is a critical failure on many levels. That, combined with the ongoing enshittification of the platform in the run-up to an IPO, means, quite frankly: run if you can.

Questions

Can I see the messages I had in the dataset?

Not yet, but this is something I want to get done. It will require you to log in with Discord to prove you own the account as a safety measure.

How many users/servers are in here?

I'm pretty sure the "researchers" lied, as after processing the data I get a hell of a lot more users than what they said:

> sqlite3 output.sqlite "SELECT COUNT(*) FROM users"
13011531 # user count
> jq 'reduce .[] as $_ (0;.+1)' output.json
2864 # server count
    

The server count is much lower than the server count the researchers gave, as I also excluded empty guilds during processing.
However, the greatly enlarged user count can't be explained. The hashing I'm doing is indeed correct, so the researchers must have lied.

Did you collect these messages?

No. I hate every part of this dataset and wish it didn't exist, but I'd rather people know if they're in it instead of leaving them in the dark.

Can I download the database?

Yep. Poke me on fedi (scroll down for the link) and I can send it your way.

Can I have my data deleted from here?

I have no control over this dataset, and it's out there permanently. You cannot recover control of anything in here, and everything in here should be treated as public forever. That said, if you want to delete yourself from the little database I have as the backend for this (despite that serving literally no purpose), feel free to poke me on fedi.

Is the source for this site public?

Yep! https://codeberg.org/tazz4843/discord-unveiled-searcher

This site looks kinda shit.

Not a question, and see the answer above :)

Who are you?

Hi I'm Niko :3
If you're curious for more about me check out my website.
Poke me on the fediverse if something broke here.
I'm also looking for a job. If you like what I'm doing, feel free to reach out to me. You can find my contact info at https://niko.lgbt/contact.

🏳️‍⚧️ trans rights are human rights