How We Built an AI Meme Engine That Actually Understands InfoSec

Getting Claude Haiku to write memes that land requires more prompt engineering than you'd expect. A deep dive into our generation pipeline, quality scoring, and Impact font rendering.

When Jordan told me the platform needed to generate security memes on demand, I assumed it would be straightforward. Call the API, get a funny caption, slap it on an image template. That was fourteen months ago. I have opinions now.

This post covers the full generation pipeline: how a request goes from POST /api/v1/memes/generate to a JPEG on a CDN. I'll get into the parts that worked on the first try, the parts that definitely didn't, and why our quality filter rejects roughly 40% of first-pass generations.

The Generation Pipeline

A meme generation request flows through five stages:

  1. Request validation — authenticate JWT, validate the MITRE technique ID
  2. Cache check — Redis lookup; if we've generated this technique recently with the same template, serve the cached S3 URL
  3. AI generation — call Claude Haiku with technique context, get top text + bottom text
  4. Quality scoring — score the output before rendering
  5. Image rendering — Pillow composites text onto template, uploads to S3
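The five stages above can be sketched as one orchestrating function. This is a minimal sketch, not our production code; the helper callables (`generate_fn`, `score_fn`, `render_fn`) and the retry count are stand-ins for internal components.

```python
def generate_meme(technique_id, template_id, cache, generate_fn, score_fn, render_fn,
                  max_attempts=3):
    # Stage 1 (JWT auth + MITRE technique validation) is assumed to have
    # already run upstream of this function.
    key = f"{technique_id}:{template_id}"
    cached = cache.get(key)
    if cached is not None:                       # Stage 2: cache hit, serve S3 URL
        return cached
    for _ in range(max_attempts):                # Stage 3: AI generation
        top, bottom = generate_fn(technique_id)
        if score_fn(top, bottom):                # Stage 4: quality gate
            break
    url = render_fn(top, bottom, template_id)    # Stage 5: render + upload
    cache[key] = url
    return url
```

The cache population at the end is what makes repeat requests for a hot technique cheap: the second caller never reaches stages 3-5.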

Steps 3 and 4 are where everything interesting (and painful) happens.

Why Claude Haiku?

We evaluated several models for the text generation step. Our criteria: low latency (meme generation should feel instant), low cost at scale (we generate a lot), and domain knowledge sufficient to avoid technical errors.

The technical accuracy requirement ruled out smaller models immediately. Early tests with generic smaller models produced memes that were funny but technically wrong in ways that would embarrass our users in front of their teams. A SQL injection meme with the wrong syntax. A Kerberoasting reference that described the wrong attack step. Unacceptable.

Claude Haiku's security domain knowledge is strong enough that it catches most of these errors inherently, and our quality filter catches the rest. Latency at our request volume is well within acceptable bounds. The cost math works.

The System Prompt Problem

Getting consistently good output from any LLM requires a system prompt that constrains the output format, establishes the audience, and conveys the quality bar. This took longer to get right than I expected.

Our first attempt looked like this:

"Write a funny meme about the cybersecurity concept: {technique}. Format as two lines in ALL CAPS."

The output was consistent but generic. "HACKER USES {technique} / SECURITY TEAM: SURPRISED PIKACHU." Technically a meme. Not a good one.

The current system prompt runs to about 280 tokens and does four things the first version didn't:

  1. Establishes the audience explicitly. Our users have 5+ years of hands-on security experience. The system prompt says this. Generic "cybersecurity" framing produces generic output; specifying "senior SOC analyst running a SIEM that fires 2M events/day" produces output that resonates with that experience.
  2. Provides technique context beyond the name. We pass the MITRE description and a one-sentence real-world example scenario. Generations with scenario context score 47% higher on our quality filter.
  3. Defines the meme format precisely. Setup line = recognizable scenario. Punchline = the painful truth. 12 words max per line. We include three examples of good output and two examples of what we don't want.
  4. Adds a negative constraint. "Do not include specific CVE numbers, vendor names, or tool names unless the technique is specifically associated with a single well-known tool." This prevents the model from anchoring on specific incidents rather than universal experiences.
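Putting the four elements together, the prompt assembly looks roughly like this. The wording here is illustrative (the real prompt runs to about 280 tokens and includes few-shot examples); only the structure mirrors what we described above.

```python
def build_system_prompt(technique: str, description: str, scenario: str) -> str:
    # Illustrative wording; the production prompt also carries three
    # positive and two negative few-shot examples, omitted here.
    return "\n".join([
        "You write two-line memes for security practitioners with 5+ years",
        "of hands-on experience (think: a senior SOC analyst running a SIEM",
        "that fires 2M events/day).",                     # 1. audience
        f"Technique: {technique}",
        f"MITRE description: {description}",              # 2. context
        f"Real-world scenario: {scenario}",
        "Format: exactly two lines, ALL CAPS, at most 12 words per line.",
        "Line 1 is a recognizable setup; line 2 is the painful truth.",  # 3. format
        "Do not include CVE numbers, vendor names, or tool names unless",
        "the technique is specifically associated with a single",
        "well-known tool.",                               # 4. negative constraint
    ])
```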

The Quality Filter

We use a two-stage quality filter before rendering.

Stage 1: Structural checks. Does the output have exactly two lines? Are both lines under 14 words? Does either line contain a URL, a CVE number we didn't request, or anything that reads like a copyright notice? This catches about 8% of generations.
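The stage 1 checks are cheap string inspection before any rendering work. A sketch, with the regexes as reasonable stand-ins for our actual patterns:

```python
import re

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
COPYRIGHT_RE = re.compile(r"copyright|\(c\)|©", re.IGNORECASE)

def passes_structural_checks(text: str) -> bool:
    lines = [line for line in text.strip().splitlines() if line.strip()]
    if len(lines) != 2:                        # exactly two lines
        return False
    for line in lines:
        if len(line.split()) >= 14:            # both lines under 14 words
            return False
        if URL_RE.search(line) or CVE_RE.search(line) or COPYRIGHT_RE.search(line):
            return False                       # no URLs, CVEs, or notices
    return True
```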

Stage 2: LLM scoring. We pass the generated text to a second, lightweight LLM call that scores the output on three axes: technical accuracy (0-10), humor quality (0-10), and audience fit (0-10). We regenerate if the combined score falls below our threshold. This catches another 32% of first-pass generations.

Is using an LLM to score LLM output circular? A little. But it works, and the alternative — human review at scale — doesn't. The scoring model is calibrated against a labeled dataset of ~800 memes rated by our security team, and it agrees with human raters about 83% of the time.
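The stage 2 gate itself reduces to a sum-and-threshold check on the three axes. The threshold value below is illustrative; the real cutoff is tuned against the labeled dataset.

```python
def passes_quality_gate(accuracy: int, humor: int, fit: int,
                        threshold: int = 21) -> bool:
    # Each axis is scored 0-10 by a second lightweight LLM call.
    # The 21/30 threshold is an assumed example, not our real cutoff.
    for score in (accuracy, humor, fit):
        if not 0 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
    return accuracy + humor + fit >= threshold
```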

Image Rendering with Pillow

This part was genuinely fun to build. Our renderer takes a template image, a top text string, and a bottom text string, and produces a JPEG in the classic meme format: Impact font, white fill, black outline, all caps.

The technical challenges:

Text wrapping. We calculate the maximum font size that fits the text within 80% of the image width, then wrap if necessary. Getting this right across 20 different template aspect ratios took two days and a lot of edge case testing. Bobby Tables helped.
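The sizing logic, simplified: the real renderer measures text with the loaded font's metrics, but the shape of the search is the same. The character aspect ratio and size bounds below are assumptions for illustration.

```python
def fit_font_size(text: str, image_width: int, max_size: int = 72,
                  min_size: int = 20, char_aspect: float = 0.55):
    # Approximation: Impact glyphs average roughly 0.55x the font size in
    # width. Search downward for the largest size whose line fits 80% of
    # the image width; wrap at the midpoint word boundary as a last resort.
    target = image_width * 0.8
    for size in range(max_size, min_size - 1, -2):
        if len(text) * size * char_aspect <= target:
            return size, [text]                # fits on one line
    words = text.split()
    mid = len(words) // 2
    return min_size, [" ".join(words[:mid]), " ".join(words[mid:])]
```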

The outline. Classic meme text has a multi-pixel black outline that makes it legible on any background. We render the text eight times at ±N pixel offsets in black, then once in white on top (Pillow's stroke_width option on ImageDraw.text is a native alternative). N is proportional to the font size.
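The eight-pass outline in Pillow looks roughly like this; the N-to-font-size ratio is an assumed example value.

```python
from PIL import Image, ImageDraw, ImageFont

def outline_offsets(n: int):
    # Eight positions around the glyph: four corners plus four edges.
    return [(dx, dy) for dx in (-n, 0, n) for dy in (-n, 0, n)
            if (dx, dy) != (0, 0)]

def draw_outlined_text(img, xy, text, font, font_size):
    draw = ImageDraw.Draw(img)
    n = max(1, font_size // 20)        # outline width scales with font size
    for dx, dy in outline_offsets(n):  # eight black copies form the outline
        draw.text((xy[0] + dx, xy[1] + dy), text, font=font, fill="black")
    draw.text(xy, text, font=font, fill="white")  # white fill on top
    return img
```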

Template variety. We have 20 templates across five visual categories, selected at generation time based on the technique tactic. Execution and Defense Evasion get terminal-style backgrounds. Impact techniques get high-contrast alert color schemes. Discovery gets more neutral layouts. This is configurable per user.
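Template selection is a straightforward tactic-to-category lookup. The template IDs below are hypothetical; only the mapping pattern reflects what we run.

```python
import random

TACTIC_TEMPLATES = {
    # Hypothetical IDs; real templates are internal assets.
    "execution": ["terminal-dark", "terminal-green"],
    "defense-evasion": ["terminal-dark", "terminal-green"],
    "impact": ["alert-red", "alert-amber"],
    "discovery": ["neutral-grid", "neutral-card"],
}

def pick_template(tactic: str, seed=None) -> str:
    # Unknown tactics fall back to a neutral layout.
    choices = TACTIC_TEMPLATES.get(tactic.lower(), ["neutral-grid"])
    return random.Random(seed).choice(choices)
```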

S3 Delivery and Cache Strategy

Generated images are uploaded to S3 and served via CloudFront. The key we use: memes/{technique_id}/{template_id}/{content_hash}.jpg. The content hash means identical text on the same template always produces the same key — natural deduplication.
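Key construction is a one-liner over a hash of the caption text. A sketch (hash algorithm and truncation length are assumptions; the source only specifies the key layout):

```python
import hashlib

def s3_key(technique_id: str, template_id: str, top: str, bottom: str) -> str:
    # Hashing the caption text means identical text on the same template
    # always maps to the same object: dedup for free.
    content_hash = hashlib.sha256(f"{top}\n{bottom}".encode()).hexdigest()[:16]
    return f"memes/{technique_id}/{template_id}/{content_hash}.jpg"
```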

Redis caches the S3 URL keyed by technique_id:template_id for 24 hours. Cache hit rate is about 60% in normal operation. For popular techniques (T1566 on Mondays, T1486 after any major ransomware news cycle), it's closer to 90%.

What's Left to Do

The pipeline works well enough that we don't touch it often. Current work items:

  • User-configurable templates — let enterprise customers upload their own background images
  • Animated GIF output for select technique categories (T1071.004 really wants to be an animation)
  • Batch generation endpoint so red teams can seed their entire ATT&CK library in one call
  • Better handling of technique sub-variants — right now T1059 and T1059.001 generate independently; they should share context

If you're building something similar — AI content generation with quality filtering and image composition — happy to talk through the architecture. The API docs cover the request and response format; the interesting engineering is in the generation layer.
