pwshub.com

Using Google Gemini's File API with ColdFusion

I promise, I'm not turning this back into a ColdFusion blog, but as I prepare my presentation next week at Summit and update my Google Gemini code for some ColdFusion demos, I ran into a particularly gnarly bit that I wanted to share in a post. For the most part, I've had no issues using Gemini's REST APIs in ColdFusion, but the File API ended up being more difficult.

If you go the documentation for uploading, and use the 'Shell' language tab, you can see an example like so:

MIME_TYPE=$(file -b --mime-type "${IMG_PATH_2}")
NUM_BYTES=$(wc -c < "${IMG_PATH_2}")
DISPLAY_NAME=TEXT
tmp_header_file=upload-header.tmp
# Initial resumable request defining metadata.
# The upload url is in the response headers dump them to a file.
curl "${BASE_URL}/upload/v1beta/files?key=${GOOGLE_API_KEY}" \
  -D upload-header.tmp \
  -H "X-Goog-Upload-Protocol: resumable" \
  -H "X-Goog-Upload-Command: start" \
  -H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
  -H "Content-Type: application/json" \
  -d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null
upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"
# Upload the actual bytes.
curl "${upload_url}" \
  -H "Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Offset: 0" \
  -H "X-Goog-Upload-Command: upload, finalize" \
  --data-binary "@${IMG_PATH_2}" 2> /dev/null > file_info.json
file_uri=$(jq ".file.uri" file_info.json)
echo file_uri=$file_uri
# Now generate content using that file
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts":[
          {"text": "Can you tell me about the instruments in this photo?"},
          {"file_data":
            {"mime_type": "image/jpeg", 
            "file_uri": '$file_uri'}
        }]
        }]
       }' 2> /dev/null > response.json
cat response.json
echo
jq ".candidates[].content.parts[].text" response.json

That's... a lot. :) I don't really do bash scripting at all, but I was able to guess as to a lot of what was being done here. It's a two step process - first create the file object which in turn returns a URL you can use for uploading the actual bits.

When I began replicating all of this in ColdFusion, I started off as simple as possible, and didn't think the headers shown where that important. I was wrong - they were critical. I eventually got things working and turned it into a nice little UDF:

function uploadFile(path) {
	var mimeType = fileGetMimeType(path);
	var fileSize = getFileInfo(path).size;
	var result = "";
	var body  = {
		"file": {
			"display_name":getFileFromPath(path),
			"mimeType":mimeType
		}
	};
	cfhttp(url="https://generativelanguage.googleapis.com/upload/v1beta/files?key=#application.GEMINI_API_KEY#", method="post", result="result") {
		cfhttpparam(type="header", name="Content-Type", value="application/json");
		cfhttpparam(type="header", name="X-Goog-Upload-Protocol", value="resumable");
		cfhttpparam(type="header", name="X-Goog-Upload-Command", value="start");
		cfhttpparam(type="header", name="X-Goog-Upload-Header-Content-Length", value=fileSize);
		cfhttpparam(type="header", name="X-Goog-Upload-Header-Content-Type", value=mimeType);
		cfhttpparam(type="body", value="#serializeJSON(body)#");
	}
	cfhttp(url=result.responseheader['X-Goog-Upload-URL'], method="put", result="result") {
		cfhttpparam(type="header", name="Content-Length", value=fileSize);
		cfhttpparam(type="header", name="X-Goog-Upload-Offset", value="0");
		cfhttpparam(type="header", name="X-Goog-Upload-Command", value="upload, finalize");
		cfhttpparam(type="file", name="file", file=path);
	}
	return deserializeJSON(result.fileContent).file;
}

All you need to use this is a path to a file and the function will handle the rest. (Oh, and an API key of course. The UDF reaches out to the Application scope for it which is bad, so feel free to modify the code make that an argument.) I'll note that the Files API also allows for giving a display name to a file. For my code, I just used the filename, but you could make that an optional argument to allow for nicer, friendlier names if that makes sense.

In case you're wondering why this is necessary, and for folks who are planning on attending my session next week I ask that you stop reading now so you won't be spoiled (grin), this is how you can handle basic multimodal prompts with Gemini. For example, I can upload an image and then run prompts against it. Here's a script that does just that - scan a directory and ask Gemini to describe each image:

<cfinclude template="utils.cfm">
<cfscript>
/*
I figured this out looking at the Shell tab here, https://ai.google.dev/api/files#File
*/
function uploadFile(path) {
	var mimeType = fileGetMimeType(path);
	var fileSize = getFileInfo(path).size;
	var result = "";
	var body  = {
		"file": {
			"display_name":getFileFromPath(path),
			"mimeType":mimeType
		}
	};
	cfhttp(url="https://generativelanguage.googleapis.com/upload/v1beta/files?key=#application.GEMINI_API_KEY#", method="post", result="result") {
		cfhttpparam(type="header", name="Content-Type", value="application/json");
		cfhttpparam(type="header", name="X-Goog-Upload-Protocol", value="resumable");
		cfhttpparam(type="header", name="X-Goog-Upload-Command", value="start");
		cfhttpparam(type="header", name="X-Goog-Upload-Header-Content-Length", value=fileSize);
		cfhttpparam(type="header", name="X-Goog-Upload-Header-Content-Type", value=mimeType);
		cfhttpparam(type="body", value="#serializeJSON(body)#");
	}
	cfhttp(url=result.responseheader['X-Goog-Upload-URL'], method="put", result="result") {
		cfhttpparam(type="header", name="Content-Length", value=fileSize);
		cfhttpparam(type="header", name="X-Goog-Upload-Offset", value="0");
		cfhttpparam(type="header", name="X-Goog-Upload-Command", value="upload, finalize");
		cfhttpparam(type="file", name="file", file=path);
	}
	return deserializeJSON(result.fileContent).file;
}
function promptWithFile(prompt, file) {
	var result = "";
	var body = {
		"contents": [
			{
			"role": "user",
			"parts": [
				{
				"text": prompt
				},
				{
				"file_data": { "file_uri":file.uri }
				}
			]
			}
		],
		"generationConfig": {
			"temperature": 1,
			"topK": 64,
			"topP": 0.95,
			"maxOutputTokens": 8192,
			"responseMimeType": "text/plain"
		}
	};
	cfhttp(url="https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:generateContent?key=#application.GEMINI_API_KEY#", method="post", result="result") {
		cfhttpparam(type="header", name="Content-Type", value="application/json");
		cfhttpparam(type="body", value="#serializeJSON(body)#");
	}
	return deserializeJSON(result.fileContent);
}
sourceImages = directoryList(expandPath('../images'));
for(i=1;i<=sourceImages.len();i++) {
	fileOb = uploadFile(sourceImages[i]);
	writedump(fileOb);
	result = promptWithFile("what is this a picture of?", fileOb);
	writeDump(var=result,expand=false);
	imageBits = fileReadBinary(sourceImages[i]);
	image64 = toBase64(imageBits);
	writeoutput("<img src='data:image/jpeg;base64, #image64#'>");
	try {
		writeOutput(md2HTML(result.candidates[1].content.parts[1].text));
	} catch(any e) {
		/* The day I built this, Gemini was having "issues" - aren't we all? */
	}
}
</cfscript>

The function, promptForFile, shows how you can append the file as part of the prompt. Gemini can handle multiple files in one prompt as well, but for this demo I'm just showing how well Gemini can identify what's in a picture.

That's all for now. I'm thinking after this conference, I may try to wrap up my code in a simple component for folks to use. As always, you tell me if that would be useful for you!

Source: raymondcamden.com

Related stories
1 month ago - Chalk this up as another of my "this is probably not a good idea, but it's fun" blog posts. A few weeks back my buddy and ColdFusion Evangelist Mark Takata shared a fun little thing he did with GenAI - using it to roast himself. That...
1 month ago - I've looked at Chrome's on-device GenAI development a few times now, and as a feature it is moving pretty fast. In fact, that first post and my follow up both don't work anymore due to the API changing. I'm fine with that as I knew it was...
1 month ago - Way back in March of this year, I took a look at using Google's Gemini APIs to analyze PDF documents ("Using PDF Content with Google Gemini"). At the time, the Gemini API didn't support PDF documents, so I made use of our (Adobe) PDF...
4 days ago - SaaS (Software-as-a-Service) is a cloud-based software delivery model in which applications are available over the Internet rather than on local devices. SaaS is known for its scalability, cost-effectiveness, and flexibility. It helps...
2 weeks ago - `window.ai` integrates AI capabilities directly into the browser for more sophisticated client-side functionality without relying heavily on server-side processing. The post Integrating AI features in Vue using Chrome’s `window.ai` API...
Other stories
11 minutes ago - Introducing Serhii — an exceptional creative digital designer whose concepts and visual techniques evoke both tears and admiration from developers.
32 minutes ago - Influencing others isn’t about manipulation or getting your way at all costs. It’s about understanding people, building connections, and guiding conversations in a way that benefits everyone involved. Whether you’re at work, talking with...
32 minutes ago - The internet has come a long way since its humble beginnings, from a small network meant to share research to a global phenomenon that shapes the way we live, work, and play. Along the way, certain websites have left an indelible mark on...
2 hours ago - What is Hotjar? Hotjar is a product experience insight platform that helps businesses better understand their audience through visual behavior insights, interviews, and in-the-moment feedback. It offers 5 key features: heatmaps, session...
2 hours ago - Applicant Tracking System (ATS) frees hiring teams by automating tasks like resume parsing, data collection, interview scheduling, candidate ratings, onboarding, etc. Currently, the global market size of ATS is above $15 billion. By 2030,...