9th February 2024
I wrote extensive annotated release notes for Datasette 1.0a8 and LLM 0.13 already. Here’s what else I’ve been up to these past three weeks.
New plugins for Datasette
datasette-proxy-url is a very simple plugin that lets you configure a path within Datasette that serves content proxied from another URL.
I built this one because I ran into a bug with Substack where Substack were denying requests to my newsletter’s RSS feed from code running in GitHub Actions! Frustrating, since the whole point of RSS is to be retrieved by bots.
I solved it by deploying a quick proxy to a Datasette instance I already had up and running, effectively treating Datasette as a cheap deployment platform for random pieces of proxying infrastructure.
datasette-homepage-table lets you configure Datasette to display a specific table as the homepage of the instance. I’ve wanted this for a while myself. Someone requested it on Datasette Discord and it turned out to be just a few minutes’ work, thanks mainly to the datasette.client.get() method.
datasette-events-db hooks into the new events mechanism in Datasette 1.0a8 and logs any events (login etc.) to a datasette_events table. I released this partly as a debugging tool and partly because I like to ensure every Datasette plugin hook has at least one released plugin that uses it.
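To give a rough idea of what logging events to a table involves, here is a minimal standalone sketch using only Python's sqlite3 module. It is not the plugin's code: the column names and the log_event() helper are assumptions made up for illustration, and the real plugin receives its events from Datasette's events mechanism rather than from a direct function call.

```python
import json
import sqlite3
from datetime import datetime, timezone


def ensure_events_table(conn):
    # Hypothetical schema: the real plugin's columns may differ.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS datasette_events (
            id INTEGER PRIMARY KEY,
            event TEXT NOT NULL,
            created TEXT NOT NULL,
            actor_id TEXT,
            properties TEXT
        )
        """
    )


def log_event(conn, name, actor_id=None, properties=None):
    # Record one event as a row, storing extra properties as JSON text.
    ensure_events_table(conn)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO datasette_events (event, created, actor_id, properties) "
            "VALUES (?, ?, ?, ?)",
            (
                name,
                datetime.now(timezone.utc).isoformat(),
                actor_id,
                json.dumps(properties or {}),
            ),
        )


conn = sqlite3.connect(":memory:")
log_event(conn, "login", actor_id="root")
rows = conn.execute("SELECT event, actor_id FROM datasette_events").fetchall()
print(rows)
```

Keeping the properties as a JSON string means new event types can be logged without changing the table, at the cost of needing SQLite's JSON functions to query inside them.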
QuickJS appears to provide a robust sandbox, including both memory and time limits! I need to write more about this plugin, it opens up some very exciting new possibilities for Datasette.
I also published some significant updates to existing plugins:
- datasette-upload-csvs got a long-overdue improvement allowing it to upload CSVs to a specified database, rather than just using the first available one. As part of this I completely re-engineered how it works in terms of threading strategies, as described in issue 38. Plus it’s now tested against the Datasette 1.0 alpha series in addition to 0.x stable.
Plugins for LLM
LLM is my command-line tool and Python library for interacting with Large Language Models. I released one new plugin for that:
- llm-embed-onnx is a thin wrapper on top of onnx_embedding_models by Benjamin Anderson which itself wraps the powerful ONNX Runtime. It makes several new embeddings models available for use with LLM, listed in the README.
I released updates for two LLM plugins as well:
I finally started hacking on an llm-rag plugin which will provide an implementation of Retrieval Augmented Generation for LLM, similar to the process I describe in Embedding paragraphs from my blog with E5-large-v2.
I’ll write more about that once it’s in an interesting state.
I dropped into the repo to add HTTP Basic authentication support and found several excellent PRs waiting to be merged, so I bundled those together into a new release.
Here are the full release notes for shot-scraper 1.4:
- New --auth-username x --auth-password y options for each shot-scraper command, allowing a username and password to be set for HTTP Basic authentication. #140
- shot-scraper URL --interactive mode now respects the -w and -h arguments setting the size of the browser viewport. Thanks, mhalle. #128
- New --scale-factor option for setting scale factors other than 2 (for retina). Thanks, Niel Thiart. #136
- New --browser-arg option for passing extra browser arguments (such as --browser-arg "--font-render-hinting=none") through to the underlying browser. Thanks, Niel Thiart. #137
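For anyone unfamiliar with HTTP Basic authentication, the username and password end up joined with a colon, base64 encoded and sent in an Authorization header. This sketch shows that encoding using only the standard library. It illustrates the scheme itself, not how shot-scraper passes the credentials to the browser, and basic_auth_header() is a name I made up for the example.

```python
import base64


def basic_auth_header(username, password):
    # HTTP Basic: base64("username:password") prefixed with "Basic "
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("x", "y")
print(headers)
```

Base64 is an encoding, not encryption, so Basic authentication is only safe to use over HTTPS.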
Miscellaneous other projects
- We had some pretty severe storms in the San Francisco Bay Area last week, which inspired me to revisit my old PG&E outage scraper. PG&E’s outage map changed and broke that a couple of years ago, but I got a new scraper up and running just in time to start capturing outages.
- I’ve been wanting a way to quickly create additional labels for my GitHub repositories for a while. I finally put together a simple system for that based on GitHub Actions, described in this TIL: Creating GitHub repository labels with an Actions workflow.
Releases
- datasette-enrichments-quickjs 0.1a0—2024-02-09
- datasette-events-db 0.1a0—2024-02-08
Log Datasette events to a database table
- datasette 1.0a8—2024-02-07
An open source multi-tool for exploring and publishing data
- shot-scraper 1.4—2024-02-05
A command-line utility for taking automated screenshots of websites
- llm-sentence-transformers 0.2—2024-02-04
LLM plugin for embeddings using sentence-transformers
- datasette-homepage-table 0.2—2024-01-31
Show a specific Datasette table on the homepage
- datasette-upload-csvs 0.9—2024-01-30
Datasette plugin for uploading CSV files and converting them to database tables
- llm-embed-onnx 0.1—2024-01-28
Run embedding models using ONNX
- llm 0.13.1—2024-01-27
Access large language models from the command-line
- llm-gpt4all 0.3—2024-01-24
Plugin for LLM adding support for the GPT4All collection of models
- datasette-granian 0.1—2024-01-23
Run Datasette using the Granian HTTP server
- datasette-proxy-url 0.1.1—2024-01-23
Proxy a URL through a Datasette instance
TILs
- Creating GitHub repository labels with an Actions workflow—2024-02-09
- Exploring ColBERT with RAGatouille—2024-01-28
- Logging OpenAI API requests and responses using HTTPX—2024-01-26