pwshub.com

Using AI to Roast Your Photos

Chalk this up as another of my "this is probably not a good idea, but it's fun" blog posts. A few weeks back my buddy and ColdFusion Evangelist Mark Takata shared a fun little thing he did with GenAI - using it to roast himself. That immediately set me off on a quest to see just how much fun I could have with the idea. Now, to be clear, I do not like mean people. But having a disembodied set of code routines roast me? Sounds perfect.

Back in December last year, I built an experiment where I used the device camera on a mobile web app and asked Google Gemini what kind of cat breed was in the picture: Using Generative AI to Detect Cat Breeds

That experiment worked really well actually. The only real issue I ran into involved the size of the image I was sending to the API. When the Gemini API first came out, you could only do multimodal prompts by converting your image to base64 and including it in the prompt. That had a max file size limit that was easy to reach with images taken on a good mobile camera. I ended up doing client-side resizing in JavaScript to get around it.

Now, however, Gemini has a proper "Files" API. I first wrote about this back in May: "Using the Gemini File API for Prompts with Media"

I decided to take that existing front end app, remove the resize portion, and hook it up to a new back end. How did it work?

Omg really, really, well:

An example from the app showing the roast.

Brutal. And I love it. Alright, so how about the code...

The Front End

I'm not going to share all of the code on the front end as it's pretty much the same as before. I will remind folks that you can easily add camera support via the web on mobile like so:

<input type="file" capture="camera" accept="image/*" @change="gotPic" :disabled="working">

What's nice is that on desktop, that just reverts to a file picker. (And to be clear, you can ask for a web cam on desktop too, but this was all I needed for the moment.)

On the JavaScript side, I'm using Alpine, and basically I just read in the data to base64 and pass it to the server:

document.addEventListener('alpine:init', () => {
  Alpine.data('cameraRoast', () => ({
	imageSrc:null,
	working:false,
	status:'',
    async init() {
		console.log('init');
    },
	async gotPic(e) {
		let file = e.target.files[0];
		if(!file) return;
		let reader = new FileReader();
		reader.readAsDataURL(file);
		reader.onload = async e => {
			this.imageSrc = e.target.result;
			this.working = true;
			this.status = '<i>Sending image data to Google Gemini...</i>';
			let body = {
				imgdata:this.imageSrc
			}
			let resp = await fetch('/roast', {
				method:'POST', 
				body: JSON.stringify(body)
			});
			let result = await resp.json();
			console.log(result);
			this.working = false;
			this.status = result.text;
		}
	}
  }))
});

I don't have any error checking here because I like to live dangerously.

The Back End

In the previous post I made use of Cloudflare Workers, but I'm a little ticked at them currently so I avoided that and just wrote some Node.js code. I'll share a link to the entire codebase at the end, but here is, roughly, what it does:

import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from '@google/generative-ai';
import { GoogleAIFileManager } from "@google/generative-ai/server";
import mime from 'mime';
const MODEL_NAME = "gemini-1.5-pro-latest";
const API_KEY = process.env.GOOGLE_API_KEY;
const si = `
You are professional photographer and you review photos. You are incredibly mean spirited and will rarely say anything good about a photo.
`;
const genAI = new GoogleGenerativeAI(API_KEY);
const model = genAI.getGenerativeModel({ model: MODEL_NAME, 
	systemInstruction: {
		parts: [{ text:si }],
		role:"model"
	} 
});
const fileManager = new GoogleAIFileManager(API_KEY);

This first block has my imports and sets up the objects I'll use later. Of special note is the si variable which is my system instruction.

The Node.js code listens for the call to /roast and then runs this function with the data:

async function callGemini(photo) {
	/*
	I'm sent b64 data, lets to binary that thing and store it temporarily. Note
	I've hard coded a name here, it should be a UUID instead.
	*/
	photo = photo.replace(/data:.*?;base64,/, '');
	let buf = Buffer.from(photo, 'base64');
	fs.writeFileSync('./test_temp.jpg', buf);
	const uploadResult = await fileManager.uploadFile('./test_temp.jpg', {
		mimeType:'image/jpeg',
		displayName: "temp cemera content",
	});
	const file = uploadResult.file;
	const generationConfig = {
		temperature: 0.9,
		topK: 1,
		topP: 1,
		maxOutputTokens: 2048,
	};
	const safetySettings = [
		{ category: HarmCategory.HARM_CATEGORY_HARASSMENT, threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE, },
		{ category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,	threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE, },
		{ category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT, threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE, },
		{ category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT, threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE, },
	];
	let text = 'Roast the attached picture.';
	const parts = [
    	{text},
		{
			fileData: {
				fileUri:file.uri, 
				mimeType:'image/jpeg'
			}
		}
  	];
	const result = await model.generateContent({
		contents: [{ role: "user", parts }],
		generationConfig,
		safetySettings
	});
	console.log(JSON.stringify(result,null,'\t'));
	try {
		if(result.response.promptFeedback && result.response.promptFeedback.blockReason) {
			return { error: `Blocked for ${result.response.promptFeedback.blockReason}` };
		}
		const response = result.response.candidates[0].content.parts[0].text;
		return { response };
	} catch(e) {
		// better handling
		return {
			error:e.message
		}
	}
}

I convert the base64 back to binary and save it. That file is then uploaded to Gemini and then I run my prompt, which is fairly simple since most of the 'definition' for how the results should work is done in the system instruction.

You can find the full code here: https://github.com/cfjedimaster/ai-testingzone/tree/main/camera_roast

More Examples

So I played around with this a bit, of course. I shared it with my brother, Alex, who took this great pic:

A guy frowning

The result was:

"Oh honey, put the camera down. The only thing appealing in this photo is that crisp blue siding in the background. Did you roll out of bed, sweat for a few hours and think "Time for my close-up"? The forced sadness expression? Get outta here. This belongs in a 'how-not-to-selfie' textbook."

And here's another result, this from a random picture I had saved already:

A cat and a tribble

Gemini loses a point though for identifying the tribble as a cat.

Anyway, it would be interesting to see what the service says if it wasn't set to "roast" mode. I don't know enough about photography to judge how well the responses would be, but if folks want to fork my code and give it a shot, let me know what you find in a comment below.

Source: raymondcamden.com

Related stories
1 week ago - Marianna Zidaric, Senior Director of Ecommerce at Spin Master, talks about how digital might have changed the way we can reach the shopper. The post Leader Spotlight: Driving demand with the digital shelf, with Marianna Zidaric appeared...
3 weeks ago - Let's talk about how to find the right balance when implementing valuable AI into products without impeding human creativity. The post Where AI enhances UX design — and where it doesn’t appeared first on LogRocket Blog.
1 month ago - An AI selfie generator is a tool that turns ordinary photos into appealing selfies with various styles and effects. It uses advanced lighting optimization algorithms, personalized touch-ups, and facial recognition to create...
2 weeks ago - Data analysis is the systematic process of collecting, organizing, examining, and modeling data to extract valuable insights. It utilizes statistical and computational techniques to identify patterns and trends within datasets. AI-powered...
1 month ago - Neda Nia, Chief Product Officer at Stibo Systems, talks about how the company has used AI to minimize liabilities around data ecosystems. The post Leader Spotlight: Reducing liabilities through strong data, with Neda Nia appeared first on...
Other stories
2 hours ago - Vivaldi web browser has arrived on the Canonical Snap Store – officially. This closed-source, Chromium-based web browser has been available on Linux since its debut in 2015, providing an official DEB package for Ubuntu users (which adds...
6 hours ago - Four Prometheus metric types that all developers and DevOps pros should know form the building blocks of an effective observability strategy
7 hours ago - The 2024 Gartner Magic Quadrant positions AWS as a Leader, reflecting our commitment to diverse virtual desktop solutions and operational excellence - driving innovation for remote and hybrid workforces.
8 hours ago - Understanding design patterns are important for efficient software development. They offer proven solutions to common coding challenges, promote code reusability, and enhance maintainability. By mastering these patterns, developers can...
8 hours ago - APIs (Application Programming Interfaces) play an important role in enabling communication between different software systems. However, with great power comes great responsibility, and securing these APIs is necessary to protect sensitive...