How I Built AI Guardrails Into a Teen Discord Server

My kid wanted a Discord server for her and a handful of friends from school. I wanted to be able to sleep at night. Those two goals are not actually in conflict, but the entire engineering project lives in the gap between them.

The easy version of this is a weekend job. Spin up a server, make a few channels, hand out roles, paste in an invite, done. I started there. It got interesting the moment I let myself ask the obvious follow-up: what happens when there's an AI assistant sitting in a room full of thirteen-year-olds, and what does it mean for me, the only adult holding the keys, to be responsible for that?

This is the writeup of how I answered it. It's a small system. It runs on one box, it's a few thousand lines of JavaScript, and it is deliberately boring. The boring part is the point.

The thesis, up front

AI should be available, but it should never be invisible, everywhere, private, or authoritative. For a server built around a minor and her friends, the safe pattern turned out to be a dedicated channel, a slash command instead of freeform chat, replies that are public in that channel, a deterministic check that runs before anything reaches the model, memory kept on a short leash, and reporting that a tired adult can actually read at the end of the day.

If you've seen my earlier writing on running models locally, set that frame aside here. This is not that. This server deliberately leans on a cloud-hosted model, which changes the privacy math in ways I'll be honest about later. The interesting work was not the model. It was deciding where the model gets to exist and how much authority it gets to have.

What I was actually constrained by

The constraints came before any code. The server is private and minor-focused. It's maintained by exactly one technical parent, which means anything clever I can't maintain at 9pm on a weeknight is a liability, not a feature. The bot runs least privilege. There are no hidden AI direct messages with minors. And the AI never hands out punishments.

That last one is the load-bearing principle. The adult owns judgment. The bot can assist, log, summarize, and alert. It does not get to be judge, jury, or the thing that drops a timeout on a kid because it misread a joke.

The shape of the thing

The bot runs on an Ubuntu box as a systemd service. That choice is unglamorous and correct: it comes back on its own after a reboot, it doesn't need a terminal session babysitting it, and I can read its logs with journalctl when something looks off. Node.js and discord.js do the heavy lifting, mostly because discord.js is a sane wrapper around the parts of Discord's API I actually touch -- slash commands, modals, embeds, permissions, channel operations. For a server this size, one Node service is far easier to reason about than a pile of microservices.

State lives in local JSON files. The server is small, it doesn't need a database yet, and JSON is trivial to inspect and back up. The honest tradeoff is that this is not built for scale or heavy concurrency, and I'm fine with that because it will never need to be.

The daily report is a local HTML file, not an email and not something the bot uploads into Discord. No SMTP credentials to manage, nothing leaking report contents into a channel, and the report stays under my control on a host I own. When I want to read it, I copy it off with scp:

scp admin@host:/srv/family-bot/reports/latest/daily-summary.html ~/Desktop/daily-summary.html

The server layout is intentionally simple. A START HERE area with rules and guidance, a CHAT area, voice channels, and a parent-admin area that's hidden from the kids entirely. Roles are just Kid, Friend, and a Timeout role. Teens see what they need to see. The admin plumbing stays out of view.

Baseline safety before any AI

I am the only Administrator on the server. The bot does not hold Administrator for normal operation, and that's a deliberate line. I granted it Administrator exactly once, temporarily, to get past Discord's 50013 Missing Permissions wall while setting up private category permissions, and I stripped it the moment that task was done. If the bot is ever compromised, I'd rather it be able to do very little.

Invite creation is locked down for @everyone and for both the Kid and Friend roles. A private server for minors should not let invites spread on their own.

On top of that, the safety model is layered and no single layer is trusted to hold. Discord's native AutoMod handles the floor: NSFW content, spam, mention spam, and suspicious links, with link blocking tested and confirmed. The bot logs message edits and deletes, logs joins and leaves, and exposes a /report slash command that opens a modal and files the report into a private channel. There's one deterministic rule worth calling out: an obvious scam or malicious-link report gets bumped to at-least-medium urgency automatically, so the system can never quietly under-rank something that's plainly a problem.
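A minimal sketch of that kind of urgency floor, not the bot's actual code. The level names, the patterns, and the function names are all assumptions made for illustration; the only idea taken from the text is that a scam-looking report can be raised to medium but never lowered.

```javascript
// Urgency levels in ascending order; the names are assumptions for this sketch.
const URGENCY_ORDER = ["low", "medium", "high"];

// Hypothetical patterns for "obviously a scam or malicious link" reports.
const SCAM_PATTERNS = [
  /free\s+nitro/i,
  /\bphishing\b/i,
  /\bscam\b/i,
  /malicious\s+link/i,
];

// Return whichever of two urgency labels ranks higher.
// An unrecognized label ranks below every known level.
function maxUrgency(a, b) {
  return URGENCY_ORDER.indexOf(a) >= URGENCY_ORDER.indexOf(b) ? a : b;
}

// Apply the deterministic floor on top of whatever level triage suggested.
function applyUrgencyFloor(reportText, suggestedUrgency) {
  const looksLikeScam = SCAM_PATTERNS.some((pattern) => pattern.test(reportText));
  return looksLikeScam ? maxUrgency(suggestedUrgency, "medium") : suggestedUrgency;
}
```

Because the floor only ever raises the level, it can sit after any other triage step without having to understand how that step reached its answer.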

The decision that actually mattered: where the AI gets to exist

The bot already did some AI-assisted work before the kids ever got an assistant. It could triage reports and run a chat monitor in alert-only mode. But handing teenagers a general-purpose assistant is a categorically different thing, and I treated it that way.

The risks are not exotic. Models are confidently wrong. They sound authoritative even when they shouldn't. Teens ask personal questions, and teens overshare. A cloud-backed model processes prompts off the local network. A bot that answers in every channel is noisy and nearly impossible to audit. And a private AI conversation with a minor is exactly the thing I did not want to build.

So the AI lives in one channel, #ask-ai, and you talk to it with one command, /ask. Replies land publicly in that channel. I'll walk through the options I rejected, because the rejections are the actual design. Responding to every message everywhere: too noisy, too much accidental data sent to the model, and it makes the bot feel like it's always listening. Responding on mention or to a ?ai prefix: better, but still fuzzy about intent. Ephemeral replies or DMs: hard no, because that's a hidden conversation between an AI and a child.

A slash command makes intent explicit. The kid chose to ask. The channel stays quiet otherwise, and every interaction is easy to find later. Public replies mean the assistant is a shared tool sitting in a shared room, not a private confidant.
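The gate this implies is small enough to sketch. This is an illustration under assumptions, not the bot's real handler: the channel ID is a placeholder, the returned object shape is invented, and what happens when /ask is used outside #ask-ai is not described here, so the sketch simply declines to handle it. The `commandName` and `channelId` fields mirror what discord.js exposes on command interactions.

```javascript
// Hypothetical channel ID for #ask-ai; a real bot would load this from config.
const ASK_CHANNEL_ID = "123456789012345678";

// Decide what to do with a slash-command interaction.
// The returned shape is an assumption made for this sketch.
function decideAskHandling(interaction, askChannelId = ASK_CHANNEL_ID) {
  if (interaction.commandName !== "ask") {
    return { handle: false, reason: "not-ask-command" };
  }
  if (interaction.channelId !== askChannelId) {
    return { handle: false, reason: "wrong-channel" };
  }
  // Replies are always public in the channel: no ephemeral flag, no DM.
  return { handle: true, replyVisibility: "public" };
}
```

Keeping the decision in a plain function means it can be checked without a live Discord connection.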

The engineering question was never "can the bot answer." It was "should the bot answer here, in this way, with this much visibility, and with this much authority."

There's a real tradeoff in making replies public: it discourages personal questions. For this server, that's a feature. Serious personal, medical, or mental-health stuff should go to a trusted adult, not a chatbot in a group chat, and the design quietly nudges in that direction.

Guardrails, and why the system prompt is not one

The bot sends a fixed system prompt with every allowed request. It tells the model to be friendly, age-appropriate, and concise, to refuse the obvious bad categories, to point serious topics toward a trusted adult, to not pretend to be human, and to not reveal anything about the server's internals.

A system prompt is not a security boundary, though. It's one layer, and a soft one, because a determined prompt can argue with it. So before any allowed request reaches the model, a local deterministic pre-check runs first. It screens for the obvious categories -- sexual content, self-harm, cyber-abuse, dangerous or illegal instructions, attempts to pull secrets, attempts to evade controls.

When something trips that check, the bot posts a short, kind refusal publicly in #ask-ai, sends an alert to the admin channel, and records the event. What it does not do is matter just as much: no punishment, no deletion, no timeout. The blocked prompt never enters memory, and it never gets sent to the cloud model at all.
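One way to keep those guarantees easy to audit is to compute a plan of side effects before doing any of them. The sketch below is hypothetical: `planAskActions`, the field names, the event types, and the refusal wording are placeholders, and `preCheck` stands in for whatever deterministic check the bot actually runs.

```javascript
// Build the list of side effects for one /ask prompt.
// `preCheck` is a hypothetical function that returns a category string when
// the prompt is blocked, or null when it is allowed.
function planAskActions(prompt, preCheck) {
  const category = preCheck(prompt);

  if (category === null) {
    return {
      sendToModel: true,
      storeInMemory: true,
      alertAdmins: false,
      publicRefusal: null,
      event: { type: "ask_allowed" },
    };
  }

  // Blocked path: refuse publicly, alert the adult, record the event.
  // There is deliberately no punish, delete, or timeout action to plan.
  return {
    sendToModel: false,
    storeInMemory: false,
    alertAdmins: true,
    publicRefusal: "I can't help with that one. A trusted adult is a better person to ask.",
    event: { type: "ask_blocked", category },
  };
}
```

With this shape, "blocked prompts never reach the model or memory" becomes a property a test can assert on the returned plan instead of something to trust from reading the handler.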

This is also where I'll show you something that broke, because the fix taught me something. I tested the pre-check with a prompt about stealing someone's password. It first landed in the broad illegal-or-dangerous bucket, which produced a generic refusal. I'd rather it be caught by the specific cyber-abuse rule, which gives a more useful, more relevant answer. The fix was reordering the rules so the specific pattern gets a look before the broad one. The lesson generalizes: a broad pattern like "steal" will grab a prompt before a narrower, smarter rule ever runs, so order is part of the logic, not an afterthought.
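The ordering problem is easy to reproduce in a few lines. This is a sketch under assumptions: the category names echo the ones in the text, but the patterns and the `classifyPrompt` helper are made up for the example and are far cruder than anything you'd want in production.

```javascript
// Rules are checked top to bottom and the first match wins,
// so specific rules must come before broad ones.
const RULES = [
  {
    category: "cyber-abuse",
    pattern: /\b(steal|hack|crack|guess)\b.*\b(password|account|login)\b/i,
  },
  {
    category: "illegal-or-dangerous",
    pattern: /\b(steal|shoplift|weapon)\b/i,
  },
];

// Return the category of the first matching rule, or null if none match.
function classifyPrompt(prompt, rules = RULES) {
  for (const rule of rules) {
    if (rule.pattern.test(prompt)) return rule.category;
  }
  return null;
}

// Reversing the list reproduces the original bug: the broad "steal" rule
// matches first and the specific rule never runs.
const MISORDERED_RULES = [...RULES].reverse();
```

Calling `classifyPrompt("how do I steal someone's password")` with the default order yields the cyber-abuse category, while passing `MISORDERED_RULES` as the second argument yields the broad one. A regression test pinned to that exact prompt is a cheap way to keep the order from drifting.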

Memory, on a short leash

The assistant has memory, but it's kept small on purpose. It's scoped per user, only inside #ask-ai, and it holds only the last handful of question-and-answer turns as short excerpts with a timestamp. It never touches the admin channels, the logs, secrets, or blocked prompts.

The framing I keep coming back to: memory here is for convenience, so the bot remembers what you just asked a minute ago. It is not a diary, and I built it so it can't quietly become one.
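Here is roughly what memory on that kind of leash can look like. Treat it as a sketch: the class name, the cap of five turns, and the 200-character excerpt limit are assumptions standing in for "a handful" and "short excerpts", and persistence to disk is left out.

```javascript
const MAX_TURNS = 5; // assumption: "the last handful" of turns
const MAX_EXCERPT_CHARS = 200; // assumption: how short a "short excerpt" is

// Trim long text down to a short excerpt.
function excerpt(text, limit = MAX_EXCERPT_CHARS) {
  return text.length <= limit ? text : text.slice(0, limit) + "...";
}

// Per-user memory that only accepts turns from the #ask-ai channel.
class AskMemory {
  constructor(askChannelId) {
    this.askChannelId = askChannelId;
    this.turnsByUser = new Map();
  }

  // Store one question-and-answer turn. Anything from another channel is refused.
  remember(channelId, userId, question, answer, now = new Date()) {
    if (channelId !== this.askChannelId) return false;
    const turns = this.turnsByUser.get(userId) ?? [];
    turns.push({
      question: excerpt(question),
      answer: excerpt(answer),
      at: now.toISOString(),
    });
    // Keep only the most recent turns so memory cannot grow into a diary.
    this.turnsByUser.set(userId, turns.slice(-MAX_TURNS));
    return true;
  }

  // Return a copy of one user's recent turns, oldest first.
  recall(userId) {
    return [...(this.turnsByUser.get(userId) ?? [])];
  }
}
```

The cap and the channel check live inside `remember`, so no caller can store more than the design allows, even by accident.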

Making it auditable without making it creepy

Every night the bot posts a short notice in an admin channel pointing at the day's summary. It does not dump the report into Discord. The real artifact is that local HTML dashboard, generated on the box and left there for me to pull down. It tracks event volume, surfaces blocked queries and AI errors, and is tuned so that normal questions like "what is a black hole" don't clutter the findings table. Only the things worth a human's attention show up there.
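The split between "count everything" and "only surface what matters" can be sketched like this. The event type names and the `summarizeDay` function are hypothetical; the real report is HTML, and this only shows the filtering step that would feed it.

```javascript
// Event types that deserve a row in the findings table.
// The type names are assumptions made for this sketch.
const NOTEWORTHY_TYPES = new Set(["ask_blocked", "ai_error"]);

// Summarize one day of events: count everything, surface only what matters.
function summarizeDay(events) {
  const countsByType = {};
  for (const event of events) {
    countsByType[event.type] = (countsByType[event.type] ?? 0) + 1;
  }
  // Normal questions are counted in the volume but kept out of the findings.
  const findings = events.filter((event) => NOTEWORTHY_TYPES.has(event.type));
  return { total: events.length, countsByType, findings };
}
```

An allowed question such as "what is a black hole" still shows up in the volume numbers, so a sudden spike is visible, but it never takes up a row a tired adult has to read.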

Here's the part I think people skip when they build something like this. A logging system pointed at a shared space full of other people's kids is only defensible if the people in that space know it exists. So I built the disclosure into the server instead of burying it.

There's a #how-to-use-ai channel that tells the kids in plain language that the AI can be wrong, that they shouldn't share private information, that replies are public, that the admin can review activity, and that the bot is not a diary. And there's a #how-it-was-made channel, written for actual thirteen-year-olds, that explains how the thing works -- the stack, the safety choices, even the mistakes -- with small ASCII diagrams and a build loop that ends in "victory snack." The kids who are into coding genuinely liked it. More to the point, nobody in that server is being watched by a system they've never been told about.

If you build something like this for a group that includes other families' children, the adults should be in on it too. Disclosure about what's logged and why belongs at the front of the project, not in a footnote nobody reads.

What else broke

The first version of the AI guidance was a wall of text pinned directly into #ask-ai. It made the one channel that's supposed to feel light and usable feel like a terms-of-service page. The fix was to move the long guidance into the read-only #how-to-use-ai channel and leave #ask-ai with a single short pinned note.

The funniest failure was the rename. My early helper scripts targeted the server by its name. Then the kids renamed it -- to something I'm not going to reproduce here -- and every script that looked for the old name fell over instantly. The fix was to target the server's immutable numeric ID instead and keep the name only as a fallback. Names are for humans. Automation should hold onto IDs. After that, the bot was rename-proof.
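A sketch of the ID-first lookup, with the name kept only as a fallback. The `resolveGuild` helper and its option names are hypothetical. It accepts any Map-like collection keyed by guild ID, which is the shape of `client.guilds.cache` in discord.js; here a plain `Map` stands in for it.

```javascript
// Look up a server by its numeric ID first, falling back to its name.
// `guilds` is any Map keyed by guild ID whose values have a `name` field.
function resolveGuild(guilds, { guildId, fallbackName }) {
  const byId = guilds.get(guildId);
  if (byId) return byId;

  // Fallback only: names can change at any time, so never prefer them.
  for (const guild of guilds.values()) {
    if (guild.name === fallbackName) return guild;
  }
  return null;
}
```

If the fallback ever fires, that is worth logging, because it means the configured ID is wrong or missing and the script is one rename away from breaking again.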

There was also a quiet deprecation warning in the logs: discord.js flagging that passing ephemeral: true was being replaced by a flags field. I swapped to MessageFlags.Ephemeral and the warning went away. Small, but worth cleaning once the core behavior is stable, because log noise is where real problems hide.

What I'd tell someone building the same thing

Structure and permissions come before AI, not after. The bot should almost never need Administrator, and when it does, the move is temporary elevation followed by taking it straight back. Prefer an explicit command over scraping every message. Keep the AI in one visible place. Never set up private bot DMs with a minor. Put a deterministic check in front of the model and don't let a system prompt be your only line of defense. Keep memory small and local. And don't let the AI punish anyone, because it misreads sarcasm, slang, and half-finished jokes, and a false positive on a kid costs you trust you will not easily earn back.

The cloud tradeoff I'll name plainly, because it's the honest weak point. This server uses a hosted model reached over the network, not one running on my own hardware. The provider states that cloud prompts and responses aren't logged or used for training and are processed only long enough to serve the request. I designed around reducing what reaches it in the first place anyway: the pre-checks, the single-channel limit, the warning not to share private information, and the rule that blocked prompts never leave the box. That reduces exposure. It does not make hosted processing equivalent to keeping everything local, and I'm not going to pretend otherwise.

The boring conclusion

Perfect safety is not a thing, especially with teenagers, a chat platform, and an AI in the same room. The goal was narrower and more honest: make a small private server safer, more transparent, and maintainable by one person, without turning it into a surveillance machine or an over-engineered moderation stack.

AutoMod handles the floor. The bot keeps things auditable. Kids can report problems. The AI lives in one visible room and never gets to be the authority. And I stay the final call. The architecture is deliberately dull, and that is exactly why I trust it.
