
vec2pg: Migrate to pgvector from Pinecone and Qdrant

vec2pg is a CLI utility for migrating data from vector databases to Supabase, or any Postgres instance with pgvector.

Our goal with vec2pg (https://github.com/supabase-community/vec2pg) is to create an easy on-ramp to efficiently copy your data from various vector databases into Postgres with associated ids and metadata. The data loads into a new schema with a table name that matches the source, e.g. vec2pg.<collection_name>. That output table uses the vector type from pgvector (https://github.com/pgvector/pgvector) for the embedding/vector and the builtin json type for additional metadata.

Once loaded, the data can be manipulated using SQL to transform it into your preferred schema.

When migrating, be sure to increase your Supabase project's disk size so there is enough space for the vectors.
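
As a rough guide to how much disk space the vectors alone might need, here is a minimal sketch in Python. It assumes pgvector's documented storage cost of 4 bytes per dimension plus 8 bytes per vector; the function names and the 1 million record workload are hypothetical, and ids, metadata, row overhead, and indexes are not counted, so treat the result as a lower bound.

```python
# Rough storage estimate for a pgvector column; a sketch, not a sizing tool.
# Assumption: each `vector` value takes 4 bytes per dimension plus 8 bytes.
# Ids, metadata, row overhead, and indexes are NOT included.


def vector_bytes(dimensions: int) -> int:
    """Bytes used by one pgvector `vector` value of the given dimensionality."""
    return 4 * dimensions + 8


def estimate_vector_storage_gb(num_records: int, dimensions: int) -> float:
    """Approximate size of the raw vector data in gigabytes (10**9 bytes)."""
    return num_records * vector_bytes(dimensions) / 1e9


if __name__ == "__main__":
    # Hypothetical workload: 1 million records of 1024-dimensional vectors.
    print(f"{estimate_vector_storage_gb(1_000_000, 1024):.2f} GB")  # prints "4.10 GB"
```

Leave generous headroom on top of this figure, since metadata and any indexes you build after loading can add substantially to the total.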

At launch we support migrating to Postgres from Pinecone and Qdrant. You can vote for additional providers in the issue tracker and we'll reference that when deciding which vendor to support next.

Throughput when migrating workloads is measured in records-per-second and is dependent on a few factors:

  • the resources of the source data
  • the size of your Postgres instance
  • network speed
  • vector dimensionality
  • metadata size

When throughput is mentioned, we assume a Small Supabase Instance, a 300 Mbps network, 1024 dimensional vectors, and reasonable geographic colocation of the developer machine, the cloud hosted source DB, and the Postgres instance.

Pinecone

vec2pg copies entire Pinecone indexes without the need to manage namespaces. It will iterate through all namespaces in the specified index and has a column for the namespace in its Postgres output table.

Given the conditions noted above, expect 700-1100 records per second.

Qdrant

The qdrant subcommand supports migrating from cloud and locally hosted Qdrant instances.

Again, with the conditions mentioned above, Qdrant collections migrate at between 900 and 2500 records per second.
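
To turn those throughput ranges into a time budget, the arithmetic can be sketched as below. The records-per-second figures are the ones quoted above for Pinecone and Qdrant; the function name, the dictionary, and the 1 million record workload are hypothetical, and real throughput depends on the factors listed earlier.

```python
# Estimate migration time from a records-per-second range; illustrative only.

# Throughput ranges quoted in this post (records per second).
THROUGHPUT_RPS = {
    "pinecone": (700, 1100),
    "qdrant": (900, 2500),
}


def migration_minutes(num_records: int, low_rps: float, high_rps: float) -> tuple[float, float]:
    """Return (fastest, slowest) expected migration time in minutes."""
    if low_rps <= 0 or high_rps < low_rps:
        raise ValueError("expected 0 < low_rps <= high_rps")
    fastest = num_records / high_rps / 60
    slowest = num_records / low_rps / 60
    return fastest, slowest


if __name__ == "__main__":
    # Hypothetical workload: 1 million records.
    for source, (low, high) in THROUGHPUT_RPS.items():
        fastest, slowest = migration_minutes(1_000_000, low, high)
        print(f"{source}: {fastest:.1f} to {slowest:.1f} minutes")
```

For 1 million records this works out to roughly 15 to 24 minutes from Pinecone and 7 to 19 minutes from Qdrant under the stated conditions.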

Why Use Postgres/pgvector?

The main reasons to use Postgres for your vector workloads are the same reasons you use Postgres for all of your other data. Postgres is performant, scalable, and secure. It's a well-understood technology with a wide ecosystem of tools that support needs from early-stage startups through to large-scale enterprise.

A few game-changing capabilities that are old hat for Postgres but haven't made their way to upstart vector DBs include:

Backups

Postgres has extensive support for backups and point-in-time recovery (PITR). If your vectors are included in your Postgres instance, you get backup and restore functionality for free. Combining the data results in one fewer system to maintain. Moreover, your relational workload and your vector workload are transactionally consistent with full referential integrity, so you never get dangling records.

Row Security

Row Level Security (RLS) allows you to write a SQL expression to determine which users are allowed to insert/update/select individual rows.

For example:

    create policy "Individuals can view their own todos."
    on public.todos
    for select
    using ( ( select auth.uid() ) = user_id );

This policy allows users of Supabase APIs to view only their own records in the todos table.

Since vector is just another column type in Postgres, you can write policies to ensure e.g. each tenant in your application can only access their own records. That security is enforced at the database level so you can be confident each tenant only sees their own data without repeating that logic all over API endpoint code or in your client application.

Performance

pgvector has world-class performance in terms of raw throughput and dominates in performance per dollar. Check out some of our prior blog posts for more information on functionality and performance:

  • https://supabase.com/blog/pgvector-0-7-0
  • pgvector 0.6.0: 30x faster with parallel index builds
  • Matryoshka embeddings: faster OpenAI vector search using Adaptive Retrieval

Keep an eye out for our upcoming post directly comparing pgvector with Pinecone Serverless.

To get started, head over to the vec2pg GitHub page, or if you're comfortable with CLI help guides, you can install it using pip:

    pip install vec2pg

If your current vector database vendor isn't supported, be sure to weigh in on the vendor support issue.

Source: supabase.com
