A few months back, we introduced support for running AI Inference directly from Supabase Edge Functions.
Today we are adding Mozilla Llamafile, in addition to Ollama, as a supported inference server for your functions.
Mozilla Llamafile lets you distribute and run LLMs with a single file that runs locally on most computers, with no installation! In addition to a local web UI chat server, Llamafile also provides an OpenAI API compatible server that is now integrated with Supabase Edge Functions.
Follow the Llamafile Quickstart Guide to get up and running with the Llamafile of your choice.
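Before wiring it into a function, you can sanity-check the OpenAI API compatible endpoint directly. Here is a minimal sketch in Deno, assuming the llamafile server is listening on its default port 8080 (it exposes the model under the name `LLaMA_CPP`):

```ts
// Sketch: call the llamafile server's OpenAI-compatible chat completions
// endpoint directly. Assumes the default port 8080.
const res = await fetch('http://localhost:8080/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'LLaMA_CPP',
    messages: [{ role: 'user', content: 'Say hello in one short sentence.' }],
  }),
})

const data = await res.json()
console.log(data.choices[0].message.content)
```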
Once your Llamafile is up and running, create and initialize a new Supabase project locally:
```bash
npx supabase bootstrap scratch
```
If using VS Code, when prompted `Generate VS Code settings for Deno? [y/N]` select `y` and follow the steps. Then open the project in your favorite code editor.
Supabase Edge Functions now comes with an OpenAI API compatible mode, allowing you to call a Llamafile server easily via `@supabase/functions-js`.
Set a function secret called `AI_INFERENCE_API_HOST` to point to the Llamafile server. If you don't have one already, create a new `.env` file in the `functions/` directory of your Supabase project. (Locally, Edge Functions run inside a Docker container, so `host.docker.internal` resolves to your host machine, where the Llamafile server is listening on port 8080.)
```bash
AI_INFERENCE_API_HOST=http://host.docker.internal:8080
```
Next, create a new function called `llamafile`:
```bash
npx supabase functions new llamafile
```
Then, update the `supabase/functions/llamafile/index.ts` file to look like this:
supabase/functions/llamafile/index.ts

```ts
import 'jsr:@supabase/functions-js/edge-runtime.d.ts'
const session = new Supabase.ai.Session('LLaMA_CPP')

Deno.serve(async (req: Request) => {
  const params = new URL(req.url).searchParams
  const prompt = params.get('prompt') ?? ''

  // Run the model and get the full completion (non-streaming)
  const output = await session.run(
    {
      messages: [
        {
          role: 'system',
          content:
            'You are LLAMAfile, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests.',
        },
        {
          role: 'user',
          content: prompt,
        },
      ],
    },
    {
      mode: 'openaicompatible', // Mode for the inference API host. (default: 'ollama')
      stream: false,
    }
  )

  console.log('done')
  return Response.json(output)
})
```
Since Llamafile provides an OpenAI API compatible server, you can alternatively use the OpenAI Deno SDK to call Llamafile from your Supabase Edge Functions.
For this, you will need to set the following two environment variables in your Supabase project. If you don't have one already, create a new `.env` file in the `functions/` directory of your Supabase project.
```bash
OPENAI_BASE_URL=http://host.docker.internal:8080/v1
OPENAI_API_KEY=sk-XXXXXXXX # need to set a random value for openai sdk to work
```
Now, replace the code in your `llamafile` function with the following:
supabase/functions/llamafile/index.ts

```ts
import OpenAI from 'https://deno.land/x/openai/mod.ts' // pin a specific version in production

Deno.serve(async (req) => {
  const client = new OpenAI()
  const { prompt } = await req.json()
  const stream = true

  const chatCompletion = await client.chat.completions.create({
    model: 'LLaMA_CPP',
    stream,
    messages: [
      {
        role: 'system',
        content:
          'You are LLAMAfile, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests.',
      },
      {
        role: 'user',
        content: prompt,
      },
    ],
  })

  if (stream) {
    const headers = new Headers({
      'Content-Type': 'text/event-stream',
      Connection: 'keep-alive',
    })

    // Create a stream that forwards each completion delta as plain text
    const stream = new ReadableStream({
      async start(controller) {
        const encoder = new TextEncoder()

        try {
          for await (const part of chatCompletion) {
            controller.enqueue(encoder.encode(part.choices[0]?.delta?.content || ''))
          }
        } catch (err) {
          console.error('Stream error:', err)
        } finally {
          controller.close()
        }
      },
    })

    // Return the stream to the user
    return new Response(stream, {
      headers,
    })
  }

  return Response.json(chatCompletion)
})
```
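Note that this version reads the prompt from a JSON request body rather than a URL query parameter, and streams back plain-text chunks. Here is a minimal client sketch for consuming that stream, assuming the function is served locally on port 54321 and an `ANON_KEY` environment variable holds your anon key:

```ts
// Sketch: POST a prompt and print the streamed chunks as they arrive.
// Assumes a local `supabase functions serve` on port 54321 and ANON_KEY set.
const res = await fetch('http://localhost:54321/functions/v1/llamafile', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: Deno.env.get('ANON_KEY') ?? '',
  },
  body: JSON.stringify({ prompt: 'write a haiku about Postgres' }),
})

const encoder = new TextEncoder()
for await (const chunk of res.body!.pipeThrough(new TextDecoderStream())) {
  // Each chunk is a plain-text fragment of the completion
  await Deno.stdout.write(encoder.encode(chunk))
}
```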
To serve your functions locally, you need to install the Supabase CLI as well as Docker Desktop or OrbStack.
You can now serve your functions locally by running:
```bash
supabase start
supabase functions serve --env-file supabase/functions/.env
```
Execute the function:
_10 curl --get "http://localhost:54321/functions/v1/llamafile" \ _10 --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \ _10 -H "Authorization: $ANON_KEY"
There is a great guide from the Docker team on how to containerize a Llamafile.
You can then use a service like Fly.io to deploy your dockerized Llamafile.
Update `supabase/functions/.env` so that `AI_INFERENCE_API_HOST` points to your deployed Llamafile server (for example, `https://your-llamafile.fly.dev`, a placeholder), then set the secret on your hosted Supabase project:
```bash
supabase secrets set --env-file supabase/functions/.env
```
Deploy your Supabase Edge Functions:
```bash
supabase functions deploy
```
Execute the function:
_10 curl --get "https://project-ref.supabase.co/functions/v1/llamafile" \ _10 --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \ _10 -H "Authorization: $ANON_KEY"
Access to open-source LLMs is currently invite-only while we manage demand for the GPU instances. Please get in touch if you need early access.
We plan to extend support for more models. Let us know which models you want next. We're looking to support fine-tuned models too!
- Edge Functions: supabase.com/docs/guides/functions
- Vectors: supabase.com/docs/guides/ai
- Semantic search demo
- Store and query embeddings in Postgres and use them for Retrieval Augmented Generation (RAG) and Semantic Search