CAPP·E 2 (Chonky Animal Photoinsertion Program · Extreme 2) is a large rodent insertion engine that lives in my cellphone.
Text it a photo, and CAPP·E will humorously revise your image and automatically send back a capybara-enhanced version.

↓ CAPP·E In Action ↓
The CAPP·E Backstory
In 2022, OpenAI released DALL·E 2, their landmark image generation model. It blew me away with its versatility and distinct, dreamlike quality. That was the first time an AI had me excited about how it would follow my prompt, rather than whether it could.
Recently, Google released a new image model nicknamed "Nano Banana," notable for its groundbreaking image editing capabilities, and it knocked my socks off. I spent hours placing all manner of zany things into photographs. The results could be impressive and hilarious, but they required super-specific prompting to get right.
I wondered if I could build a service that could reason through and implement these edits end to end. A purpose-built image-editing service that would, like DALL·E 2, pique users' curiosity about what's possible.
In other words... Could I get an AI to come up with funny ideas, then use those ideas to edit an image autonomously?

Short answer: YES!
It's a Hit!
CAPP·E 2 has been a rousing success since its launch a few weeks ago. Despite a soft "friends and family" launch, the service has generated 1000+ capy'd photos for approximately 50 users. Some standout quotes:
"Easily the most fun I've had with AI yet"  ·  "10/10 use of AI"  ·  "OMFG IT'S WEARING A FISHING HAT"
"Oh my god the hands"  ·  "this has been a blessing"  ·  "I am having so much fun with this. Thank you"
It has been very fun to share CAPP·E. The most satisfying part: many friends submitted all sorts of photos just to see how CAPP·E would react to them. Paintings, online screenshots, hand-scribbled drawings, selfies, family photos, epic ski vacation pictures, you name it.
The CAPP·E Pipeline
CAPP·E 2 uses a four-step AI reasoning pipeline that begins whenever someone sends it a photo.

The user sends a photo, and CAPP·E lets them know it was received.

Once CAPP·E detects the incoming image, processing begins. Each colored step in the chart below represents a Gemini API call. The accompanying text shows truncated examples of Gemini's text outputs for each step.
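Under the hood, the four steps chain together sequentially. Here is a minimal sketch of that flow; the class and step names are invented for illustration, and each callable stands in for one of the Gemini API calls described below.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CappyPipeline:
    """Hypothetical sketch of CAPP·E's four-step flow (names invented)."""
    analyze: Callable[[bytes], str]            # Step 1: photo -> description
    ideate: Callable[[str], str]               # Step 2: description -> humor plan
    format_prompt: Callable[[str], str]        # Step 3: plan -> edit prompt
    edit_image: Callable[[bytes, str], bytes]  # Step 4: photo + prompt -> edited photo

    def run(self, photo: bytes) -> bytes:
        # Each step feeds its text output into the next one.
        description = self.analyze(photo)
        plan = self.ideate(description)
        prompt = self.format_prompt(plan)
        return self.edit_image(photo, prompt)
```

Keeping the steps as injected callables is one way to make the chain testable with stubs before wiring in the real model calls.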
→ STEP 1: VISUAL ANALYSIS
CAPP·E first analyzes the image and produces a detailed description of it.
"The image is a hand-drawn sketch on a torn piece of off-white paper, placed on a white background. The sketch is done in black ink and depicts a rectangular device with various controls on its surface" . . . "The context appears to be a conceptual drawing or a schematic of some kind of electronic gadget. There are no people, animals, or a specific environment depicted, other than the white background and the torn paper."

→ STEP 2: HUMOR MODULE
With the help of a complex "humor ideation" prompt, CAPP·E generates a highly detailed (and hopefully funny) plan to alter the image. 
"This is an unusual subject. Instead of placing a photorealistic capybara in the scene, it would be funnier and more surprising to replace the sketched object with a single, large capybara in the same sketched style" . . . "the rough, torn bottom edge of the original device will be transformed into the slightly ragged, fur-textured bottom of the capybara, blending naturally with the torn paper" . . . "replacing a functional-looking device with a passive, relaxed capybara is inherently funny and unexpected. It takes the seriousness of a schematic and injects pure, unadulterated capybara chill" . . . "a single, well-placed capybara can achieve maximum comedic impact without overwhelming the scene . . ."
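To make Step 2 concrete, here is a rough sketch of how a humor-ideation prompt could be assembled from the Step 1 description. The template wording below is entirely invented for illustration; CAPP·E's actual prompt is far more complex.

```python
# Illustrative assumption: a fixed ideation template filled with the
# Step 1 description. This is NOT CAPP·E's real prompt.
IDEATION_TEMPLATE = """You are a comedy director. Given this image description:

{description}

Propose one detailed, surprising way to add a capybara that fits the image's
style and medium. Explain placement, style matching, and why the idea is
funny. Prefer one well-placed capybara over many."""

def build_ideation_prompt(description: str) -> str:
    # Combine the template with the Step 1 output.
    return IDEATION_TEMPLATE.format(description=description)
```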

→ STEP 3: PROMPT FORMATTER
Then, CAPP·E turns that plan into a properly formatted text prompt detailing, in exact language, how to edit the original image.
"Replace the rectangular device with a single, large capybara in a relaxed 'sploot' position. The rounded top edge of the sketch should become the rounded back of the capybara. The rough, torn bottom edge of the device transforms into the slightly ragged, fur-textured bottom of the capybara" . . . "the horizontal rectangular slot becomes a contented capybara mouth" . . . "the diagonal shading on the device becomes subtle shading on the capybara's fur" . . . "the thin black lines extending from the right side become small tufts of capybara fur or stylized whiskers on its flank."
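As an illustration of Step 3, a formatter might prepend a fixed instruction that forces the model to emit only imperative edit commands. Again, the wording and function name are assumptions, not CAPP·E's real prompt.

```python
# Illustrative assumption: strip the plan's reasoning by instructing the
# model to output imperative instructions only.
FORMATTER_INSTRUCTION = (
    "Rewrite the following edit plan as direct, imperative image-editing "
    "instructions. Describe only what to change and how; omit all reasoning "
    "and commentary."
)

def build_formatter_prompt(plan: str) -> str:
    # The Step 2 plan is appended below the instruction.
    return f"{FORMATTER_INSTRUCTION}\n\nPLAN:\n{plan}"
```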

→ STEP 4: IMAGE GENERATOR
The model edits the original image according to the generated prompt, and the result is sent back to the user via iMessage!

The generated image is automatically sent to the user.
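For the return trip, one plausible sketch (matching the macOS automation in the tech list) is to drive Messages.app with AppleScript via `osascript`. The Messages scripting dictionary varies across macOS versions, so the `buddy ... of service` form below is an assumption, and `send_imessage_photo` with its `dry_run` flag is an invented helper.

```python
import subprocess

def send_imessage_photo(recipient: str, image_path: str, dry_run: bool = False) -> str:
    # Hypothetical helper: ask Messages.app to send a file over iMessage.
    # NOTE: the scripting dictionary differs between macOS versions; the
    # "buddy ... of service" form here is one common variant, not a guarantee.
    script = f'''
    tell application "Messages"
        set theBuddy to buddy "{recipient}" of service "iMessage"
        send POSIX file "{image_path}" to theBuddy
    end tell
    '''
    if not dry_run:
        # Hand the script to osascript for execution (macOS only).
        subprocess.run(["osascript", "-e", script], check=True)
    return script
```

The `dry_run` flag returns the generated script without invoking `osascript`, which keeps the helper inspectable off-macOS.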

Wrapping Up
I built CAPP·E 2 to answer one question: Can you prompt an AI to be creative and funny?
The answer is a resounding yes. CAPP·E reasons about the image's context and decides how to add capybaras in surprising, contextually appropriate ways: replacing people and pets, reshaping objects in the environment, slipping in subtle "easter egg" background swaps, or changing the environment entirely. Rather than following rigid rules, the AI makes creative decisions. And the results are pretty funny, to boot!

Note on access: CAPP·E 2 is currently deployed as an iMessage service for friends and family. Contact me for access. Public iMessage access and web interface coming soon!

Tech: Python, Flask, Gemini API (vision + generation), iMessage bridge, macOS automation, capybara maximalism
