Mozilla Llamafile in Supabase Edge Functions

A few months back, we introduced support for running AI Inference directly from Supabase Edge Functions.

Today we are adding Mozilla Llamafile, alongside Ollama, as a supported Inference Server for your functions.

Mozilla Llamafile lets you distribute and run LLMs with a single file that runs locally on most computers, with no installation! In addition to a local web UI chat server, Llamafile also provides an OpenAI API-compatible server, which is now integrated with Supabase Edge Functions.

Follow the Llamafile Quickstart Guide to get up and running with the Llamafile of your choice.
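As a minimal sketch (the example model is llava-v1.5-7b from the llamafile README; any Llamafile of your choice works the same way), downloading and starting a Llamafile server looks like this:

```shell
# Download an example Llamafile (pick any model from the Quickstart Guide)
curl -LO https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile/resolve/main/llava-v1.5-7b-q4.llamafile
chmod +x llava-v1.5-7b-q4.llamafile

# Start it in server mode; by default it listens on http://127.0.0.1:8080
./llava-v1.5-7b-q4.llamafile --server --nobrowser
```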

Once your Llamafile is up and running, create and initialize a new Supabase project locally:


```shell
npx supabase bootstrap scratch
```


If using VS Code, when prompted `Generate VS Code settings for Deno? [y/N]` select `y` and follow the steps. Then open the project in your favourite code editor.

Supabase Edge Functions now comes with an OpenAI API compatible mode, allowing you to call a Llamafile server easily via @supabase/functions-js.

Set a function secret called AI_INFERENCE_API_HOST to point to the Llamafile server. If you don't have one already, create a new .env file in the functions/ directory of your Supabase project.


```shell
AI_INFERENCE_API_HOST=http://host.docker.internal:8080
```


Next, create a new function called llamafile:


```shell
npx supabase functions new llamafile
```


Then, update the supabase/functions/llamafile/index.ts file to look like this:

supabase/functions/llamafile/index.ts


```typescript
import 'jsr:@supabase/functions-js/edge-runtime.d.ts'

const session = new Supabase.ai.Session('LLaMA_CPP')

Deno.serve(async (req: Request) => {
  const params = new URL(req.url).searchParams
  const prompt = params.get('prompt') ?? ''

  // Run inference against the Llamafile server
  const output = await session.run(
    {
      messages: [
        {
          role: 'system',
          content:
            'You are LLAMAfile, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests.',
        },
        {
          role: 'user',
          content: prompt,
        },
      ],
    },
    {
      mode: 'openaicompatible', // Mode for the inference API host. (default: 'ollama')
      stream: false,
    }
  )

  console.log('done')
  return Response.json(output)
})
```
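The function above reads its prompt from the query string. A caller can build that request URL with a small helper like this (a sketch; `buildPromptUrl` is a hypothetical name, and the base URL shown is the local `supabase functions serve` default):

```typescript
// Build the request URL for the llamafile function (hypothetical helper).
// baseUrl is wherever your Supabase functions are served.
function buildPromptUrl(baseUrl: string, prompt: string): string {
  const url = new URL('/functions/v1/llamafile', baseUrl)
  url.searchParams.set('prompt', prompt) // percent-encodes the prompt for us
  return url.toString()
}

const requestUrl = buildPromptUrl('http://localhost:54321', 'Say hello!')
// requestUrl: "http://localhost:54321/functions/v1/llamafile?prompt=Say+hello%21"
```

Using `URL` and `searchParams` avoids hand-rolled string concatenation, so prompts containing spaces or special characters are encoded correctly.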


Since Llamafile provides an OpenAI API compatible server, you can alternatively use the OpenAI Deno SDK to call Llamafile from your Supabase Edge Functions.

For this, you will need to set the following two environment variables in your Supabase project. If you don't have one already, create a new .env file in the functions/ directory of your Supabase project.


```shell
OPENAI_BASE_URL=http://host.docker.internal:8080/v1
OPENAI_API_KEY=sk-XXXXXXXX # need to set a random value for the OpenAI SDK to work
```


Now, replace the code in your llamafile function with the following:

supabase/functions/llamafile/index.ts


```typescript
import OpenAI from 'https://deno.land/x/[email protected]/mod.ts'

Deno.serve(async (req) => {
  const client = new OpenAI()
  const { prompt } = await req.json()
  const stream = true

  const chatCompletion = await client.chat.completions.create({
    model: 'LLaMA_CPP',
    stream,
    messages: [
      {
        role: 'system',
        content:
          'You are LLAMAfile, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests.',
      },
      {
        role: 'user',
        content: prompt,
      },
    ],
  })

  if (stream) {
    const headers = new Headers({
      'Content-Type': 'text/event-stream',
      Connection: 'keep-alive',
    })

    // Create a stream
    const stream = new ReadableStream({
      async start(controller) {
        const encoder = new TextEncoder()

        try {
          for await (const part of chatCompletion) {
            controller.enqueue(encoder.encode(part.choices[0]?.delta?.content || ''))
          }
        } catch (err) {
          console.error('Stream error:', err)
        } finally {
          controller.close()
        }
      },
    })

    // Return the stream to the user
    return new Response(stream, {
      headers,
    })
  }

  return Response.json(chatCompletion)
})
```
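When `stream` is true, the function above returns the model's text chunks as raw bytes in the response body. A client can drain such a body with a small helper like this (a sketch; `readTextStream` is a hypothetical name, and any `fetch` response body works the same way):

```typescript
// Decode a ReadableStream of UTF-8 bytes into a single string,
// e.g. the body of a fetch() response from the llamafile function.
async function readTextStream(stream: ReadableStream<Uint8Array>): Promise<string> {
  const decoder = new TextDecoder()
  const reader = stream.getReader()
  let text = ''
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    // stream: true keeps multi-byte characters split across chunks intact
    text += decoder.decode(value, { stream: true })
  }
  return text + decoder.decode() // flush any buffered bytes
}
```

With a real request this becomes `await readTextStream((await fetch(url)).body!)`; since the chunks arrive incrementally, you could also print each decoded `value` as it arrives to show tokens as the model produces them.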


To serve your functions locally, you need to install the Supabase CLI as well as Docker Desktop or OrbStack.

You can now serve your functions locally by running:


```shell
supabase start
supabase functions serve --env-file supabase/functions/.env
```


Execute the function:


```shell
curl --get "http://localhost:54321/functions/v1/llamafile" \
  --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \
  -H "Authorization: $ANON_KEY"
```


There is a great guide on how to containerize a Llamafile by the Docker team.

You can then use a service like Fly.io to deploy your dockerized Llamafile.

Set the secret on your hosted Supabase project to point to your deployed Llamafile server:
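For example, the `.env` file you pass to `supabase secrets set` would then contain something like this (the hostname is hypothetical; use your deployed server's URL):

```shell
AI_INFERENCE_API_HOST=https://your-llamafile.fly.dev
```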


```shell
supabase secrets set --env-file supabase/functions/.env
```


Deploy your Supabase Edge Functions:


```shell
supabase functions deploy
```


Execute the function:


```shell
curl --get "https://project-ref.supabase.co/functions/v1/llamafile" \
  --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \
  -H "Authorization: $ANON_KEY"
```


Access to open-source LLMs is currently invite-only while we manage demand for the GPU instances. Please get in touch if you need early access.

We plan to extend support for more models. Let us know which models you want next. We're looking to support fine-tuned models too!

  • Edge Functions: supabase.com/docs/guides/functions
  • Vectors: supabase.com/docs/guides/ai
  • Semantic search demo
  • Store and query embeddings in Postgres and use them for Retrieval Augmented Generation (RAG) and Semantic Search

Source: supabase.com
