Discord Compiler’s Road to 10k Guilds

One of my personal projects hit an important milestone recently, so I thought a timeline of its history up until now might make an interesting inaugural blog post. We’ll start from its conception and end where we are as of this writing (current stats here). I’ll also take the opportunity to share some of the technologies I’ve used and lessons I’ve learned at the end.

Origins

Discord Compiler first launched on October 23rd 2018 to fill a void within many Discord programming guilds (servers). Before the first version was implemented and released, there were no bots that were able to execute code and display a program’s output. Well, not safely anyway.

A few bots existed before as shallow attempts at achieving this, many of which piped user input through JavaScript’s eval(), effectively providing an RCE vulnerability as a service. We’ve even had a few users try to obtain our bot’s secret token, assuming we were that naïve, though far fewer than I expected.

[Image: valiant attempts]


Initial Release

Anyway, the first version of the bot launched with approximately 500 lines of JavaScript thrown together in haste to get something working out the door. We used the same strategy that we use today to handle compilation requests: instead of implementing our own process jailing system to execute arbitrary code from untrusted sources, I decided to leverage an API provided by wandbox.org (and later, godbolt.org). I chose this course because I could not trust myself with the burden of ensuring that these processes were jailed safely and, being honest with myself, I knew it’d oversaturate my noob programming brain.
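To give a rough idea of what "leveraging the API" means in practice, here is a minimal, dependency-free sketch of building a compile request body. It is not the bot's actual code: the CompileRequest type and escape_json helper are hypothetical, and the field names (compiler, code, stdin, compiler-option-raw) and the "gcc-head" compiler name reflect my understanding of Wandbox's compile.json endpoint, so check them against the Wandbox API documentation before relying on them.

```rust
/// Hypothetical request type for illustration only. The JSON keys it
/// produces are assumed to match Wandbox's `compile.json` parameters.
pub struct CompileRequest<'a> {
    pub compiler: &'a str,
    pub code: &'a str,
    pub stdin: &'a str,
    pub compiler_flags: &'a str,
}

/// Escape a string so it can sit inside a JSON string literal.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any other control character becomes a \uXXXX escape.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl CompileRequest<'_> {
    /// Serialize by hand to keep the sketch free of external crates.
    /// A real bot would more likely use a JSON library.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"compiler\":\"{}\",\"code\":\"{}\",\"stdin\":\"{}\",\"compiler-option-raw\":\"{}\"}}",
            escape_json(self.compiler),
            escape_json(self.code),
            escape_json(self.stdin),
            escape_json(self.compiler_flags),
        )
    }
}
```

The resulting string would be sent as the body of an HTTP POST to the compile endpoint, and the untrusted code never runs on the bot's own machine. That is the whole appeal of this approach: the sandboxing problem belongs to a service built for it.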

Recognizing that this project might eventually gain some momentum, I also hurriedly constructed a simple statistics API for the bot to post daily server and request counts to. Looking back, I’m grateful to have these statistics from day one, even though it was surely unnecessary at the time.

From its initial release to the year 2020, I slowly iterated on the project. The goal throughout this period was to implement every endpoint of the WandBox API, which I did, but not without accruing massive amounts of technical debt. At this point the bot was able to handle stdin, compiler flags, code execution from URLs (e.g. pastebin), and probably other bits that I’m now forgetting.

If you used this bot in its early years, you’ll recognize the original profile picture taken from Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman.

The First Rewrite

What’s a good software project if it’s never burnt to the ground at some point?

By April 5th 2020 I knew the project was starting to get off the ground. I committed a patch that cleaned up swaths of my prior messes by building nice abstractions. This was also done to maintain some level of sanity throughout the codebase for future contributors.

At this point our statistics tracking reported the following.

Development on this project began to speed up: godbolt.org was added to inspect assembly, and around this time our userbase began to increase drastically. From January 1st 2020 to October 24th 2020, our guild count increased by 300%.

We’ve Got Trouble

The library I chose for the bot was discord.js, which is excellent for small bots and novice bot developers. Unfortunately, by the time we reached ~2k guilds, discord.js’ caching strategy got in the way of the project’s reliability.

As a general disclaimer, I have no idea if discord.js has improved in this aspect since I switched away from it. Any of the problems I describe here may have been resolved, so to be safe I’ll speak in the past tense and get into the problems.

For all bots, whether you wanted it or not, discord.js would cache everything it received from Discord. Every single message, every single user, every single guild, every single channel, everything. That’s a bit of a waste if your bot does not require any state, isn’t it? Of course, Discord Compiler didn’t need any of this. It simply acted as a middleman, shipping requests back and forth from WandBox and Compiler Explorer to Discord. Unfortunately, I don’t have any juicy htop screenshots for this period.

Anyway, the caching led to an exorbitant amount of memory consumption, so much that our process would restart quite often as the VPS would exhaust all of its memory and kill node. After researching whether others had run into this, I found an issue detailing possible solutions, only to see a consensus of:

“[Any previous fix discussed] has been determined to be unreasonable for a library like discord.js that relies heavily on internal state in order to function.” [src]

At this point, it was crystal clear that we had outgrown discord.js.

The Rust Rewrite

What’s a good software project if it’s never been burnt to the ground, twice?

After six short months the project was rewritten again, this time in Rust. During this rewrite, special care was taken to avoid some of the mistakes made in the previous version. I detailed it quite well in the pull request, so I’d like to just quote myself here.

“The mentality building this rewrite was centered around the following: ‘Build the necessities with the least amount of boilerplate possible.’ If you’re familiar with our Javascript codebase, we took great care in abstracting as much of the ugly away as possible. Most of the time this meant that codepaths became obscure or difficult to follow. This proposed version trims the fat in Discord Compiler, the new codebase is easier to read, maintain, and expand going forward.” [src]

You might ask: “Wasn’t the last rewrite supposed to fix these problems?”

Well, yes. While there wasn’t a stinky pile of technical debt anymore, I still hated working on it. Everything was accessible through a single CompilerClient object which meant that logic nearly always passed through it in some way. While ownership was clear and the single-responsibility principle was respected, there was definitely too much contrived complexity to enjoy working in it.

Switching to Rust largely solved the contrived complexity problem. Since it is not object-oriented in the traditional sense, I found it forces the writer to think carefully about implementing new types, what they’re responsible for, and how they should be used.

In JavaScript I might more quickly determine that it makes sense for a parser object to own a string, house methods to handle different operations on that string, and return a result of some sort. In Rust I found myself instead following a more functional approach. Do I need a parser struct? Why would it need to own the string it’s operating on? Why would implementing a struct for parsing be better than just having a function return some sort of ParseResult?

These are the kinds of questions I asked myself while implementing the bot in Rust. To be clear, I’m not suggesting that Rust inherently prevents you from making these poor design decisions, but Rust’s emphasis on ownership and lifetimes ensures that the conscious mind is involved at every step.
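To make the parser question concrete, here is a minimal sketch of the "just a function" shape. It is illustrative only and not taken from the bot: the parse_code_block function, the fields on ParseResult, and the message format are all hypothetical. The point is that the result borrows from the input instead of a parser object owning a copy of it.

```rust
/// Result of pulling a fenced code block out of a message.
/// Both fields borrow from the input, so nothing is copied or owned.
#[derive(Debug, PartialEq)]
pub struct ParseResult<'a> {
    /// Language tag on the opening fence line, e.g. "cpp" (may be empty).
    pub language: &'a str,
    /// The code between the opening and closing fences.
    pub code: &'a str,
}

/// Find the first fenced code block in `message`.
/// Returns `None` if there is no complete (opened and closed) block.
pub fn parse_code_block(message: &str) -> Option<ParseResult<'_>> {
    // Three backticks, written with escapes to keep this listing readable.
    const FENCE: &str = "\u{60}\u{60}\u{60}";

    let start = message.find(FENCE)? + FENCE.len();
    let rest = &message[start..];
    let end = rest.find(FENCE)?;
    let body = &rest[..end];

    // The language tag runs to the end of the opening fence line.
    let (language, code) = match body.split_once('\n') {
        Some((lang, code)) => (lang.trim(), code),
        None => ("", body),
    };
    Some(ParseResult { language, code })
}
```

There is no parser struct to construct and no state to keep around. The lifetime on ParseResult ties the returned slices to the message they came from, so the compiler rejects any attempt to use the result after the message is gone.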

As we approach two years since the patch landed, the project has grown considerably. With serenity as our library for interacting with the Discord API, we’ve fared quite well. The codebase has grown to around 6k lines of Rust, we’re in over 10k guilds, and it doesn’t look like we’re slowing down.

In case you’re curious about our memory usage, we typically sit around 97 MB per shard. Not bad.

Let’s get into some of the big picture items to wrap this up.

Lessons Learned

I’ll end with a list of some of the most important lessons I’ve learned thus far working on Discord Compiler. This list includes things I believe I got right and things I definitely got wrong. I’m leaving out a ton here, but the takeaways are:


As always, it takes a village. Thank you to everyone who has helped Discord Compiler get to 10k.