PWSHub News

3x Faster Stream Processing

Feb. 8, 2024, 3:32 p.m.

3x Faster Stream Processing in Node.js: Boosting throughput by eliminating buffering.

Hi Folks,

This newsletter comes back to a theme I love: Node.js streams. I expect more content about the fundamentals in the future, as I'm scavenging some old research to help folks prioritize work on Node.js itself.

3x Faster Stream Processing

Eight years ago, I explored how to make Node.js streams significantly faster. This research never landed in Node.js, but given the renewed interest in performance... who knows? Anyhow, I was able to 3x the throughput of a Node.js stream pipeline.

A primer on Node.js streams

Node.js streams are the main way to transfer data using constant memory. There is a Readable side (upstream) and a Writable side (downstream), and data flows from the former to the latter at the maximum throughput that downstream allows. To accommodate the difference in throughput between the two sides, there are buffers on either side.

Streams manage backpressure, which textbooks call flow control: a Readable pauses the flow of data when the Writable's own buffer has reached a threshold, which we call highWaterMark. When that happens, the Readable accumulates data in its own buffer, up to its own highWaterMark.
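
As a rough illustration of this handshake, here is a minimal sketch of a producer that respects backpressure by hand. The writeAll helper and the slowSink stream are names made up for this example; write() returning false and the 'drain' event are the standard Writable signals.

```javascript
const { Writable } = require('node:stream')
const { once } = require('node:events')

// Hypothetical helper: writes every chunk, pausing whenever the Writable
// reports that its buffer has reached highWaterMark.
async function writeAll (writable, chunks) {
  let pauses = 0
  for (const chunk of chunks) {
    // write() returns false once the buffered data reaches highWaterMark
    if (!writable.write(chunk)) {
      pauses++
      // 'drain' fires when the buffer has emptied and writing may resume
      await once(writable, 'drain')
    }
  }
  writable.end()
  await once(writable, 'finish')
  return pauses
}

// A deliberately slow destination with a tiny 2-byte threshold.
const slowSink = new Writable({
  highWaterMark: 2,
  write (chunk, encoding, callback) {
    setImmediate(callback)
  }
})

writeAll(slowSink, ['a', 'b', 'c', 'd']).then((pauses) => {
  console.log(`paused ${pauses} time(s) waiting for 'drain'`)
})
```

This is what pipe() and pipeline() do for you; writing the loop out only makes the pause and resume points visible.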

Data flowing from one Readable to one Writable is only partially useful. The power of streams comes from placing one or more Transform streams between the Readable and the Writable.

A Transform is both a Writable and a Readable at the same time, and it has both of their buffers.
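
To make the two buffers visible, the following small sketch creates a Transform with a separate threshold for each side and inspects them. The variable name and the threshold values are arbitrary choices for illustration.

```javascript
const { Readable, Writable, Transform } = require('node:stream')

const upper = new Transform({
  writableHighWaterMark: 8, // threshold of the incoming (Writable) buffer, in bytes
  readableHighWaterMark: 16, // threshold of the outgoing (Readable) buffer, in bytes
  transform (chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase())
  }
})

// A Transform passes the instanceof check for both sides...
console.log(upper instanceof Writable, upper instanceof Readable)
// ...and each side reports its own highWaterMark.
console.log(upper.writableHighWaterMark, upper.readableHighWaterMark)
```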

Here is a pipeline example:

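import { createReadStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { Transform } from 'node:stream'

// import.meta.filename requires Node.js 20.11 or later
await pipeline(
  // Readable: this very file
  createReadStream(import.meta.filename),
  // Transform: uppercase each chunk as it passes through
  new Transform({
    transform(chunk, enc, cb) {
      this.push(chunk.toString().toUpperCase())
      cb()
    }
  }),
  // Writable: the terminal
  process.stdout)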

The problem with the above code is that by adding a Transform you are adding two layers of buffering, which adds overhead and memory usage.

Those two extra layers are needed because backpressure is not enforced: a Writable will accept data even if its buffer is full. A false return value from write() is only a hint to stop, and the chunk is queued anyway.
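
A short sketch of what "not enforced" means in practice: a producer that ignores the return value of write() can keep writing, and the Writable simply buffers past its threshold. The slowSink name, the 4-byte threshold and the 10 ms delay are arbitrary values chosen for the demonstration.

```javascript
const { Writable } = require('node:stream')

// A slow destination that asks for at most 4 buffered bytes.
const slowSink = new Writable({
  highWaterMark: 4,
  write (chunk, encoding, callback) {
    setTimeout(callback, 10)
  }
})

// Ignore backpressure on purpose: write five 2-byte chunks in a row.
const accepted = []
for (let i = 0; i < 5; i++) {
  accepted.push(slowSink.write('ab'))
}

// Only the first write stays below the threshold; the rest return false.
console.log(accepted)
// All 10 bytes are held anyway, although highWaterMark is 4.
console.log(slowSink.writableLength, slowSink.writableHighWaterMark)
```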

How SyncThrough increases throughput by 3x

Making any code faster involves reducing overhead. SyncThrough implements the same API as Transform without using any buffers, by forcing the data transformation step to be synchronous. Luckily for us, this is the most common use case for Transform streams.
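
To give an intuition for the idea, here is a deliberately simplified sketch. It is not syncthrough's actual source, and the TinySyncThrough class is a hypothetical name. It assumes the essence is that a synchronous transform lets each chunk go straight to the destination, so the destination's own backpressure signal can be returned unchanged and nothing needs to be queued in between.

```javascript
const { EventEmitter } = require('node:events')
const { Writable } = require('node:stream')

// Hypothetical, simplified illustration; the real module covers far more
// of the stream API (errors, flushing, multiple events, and so on).
class TinySyncThrough extends EventEmitter {
  constructor (transform) {
    super()
    this._transform = transform
    this._destination = null
  }

  pipe (destination) {
    this._destination = destination
    // Relay downstream 'drain' so an upstream producer knows when to resume.
    destination.on('drain', () => this.emit('drain'))
    return destination
  }

  write (chunk) {
    const result = this._transform(chunk)
    if (result === undefined) return true // skip this chunk
    if (result === null) { // the transform asked to end the stream
      this.end()
      return false
    }
    // No internal buffer: hand the result over and pass the
    // destination's backpressure signal straight back to the caller.
    return this._destination.write(result)
  }

  end () {
    this._destination.end()
  }
}

const received = []
const sink = new Writable({
  write (chunk, encoding, callback) {
    received.push(chunk.toString())
    callback()
  }
})

const upper = new TinySyncThrough((chunk) => chunk.toString().toUpperCase())
upper.pipe(sink)
upper.write('hello')
upper.write('world')
upper.end()
console.log(received)
```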

Here is a pipeline example:

import { createReadStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { syncthrough } from 'syncthrough'
await pipeline(
  createReadStream(import.meta.filename),
  syncthrough(function (chunk) {
    // there is no callback here
    // you can return null to end the stream
    // returning undefined will let you skip this chunk
    return chunk.toString().toUpperCase()
  }),
  process.stdout)

Benchmarks

Here are the benchmark results (two runs of each benchmark, lower is better):

benchThrough2*10000: 1.680s
benchThrough*10000: 356.651ms
benchPassThrough*10000: 766.079ms
benchSyncThrough*10000: 260.583ms
benchThrough2*10000: 1.625s
benchThrough*10000: 336.915ms
benchPassThrough*10000: 745.687ms
benchSyncThrough*10000: 248.093ms

In both runs, benchSyncThrough completes in roughly a third of the time benchPassThrough takes (about 0.25 s versus 0.75 s).

Should you contribute to Open Source?

I don't think you should contribute to Open Source to land a job, but it's often a clear way up if you don't have many opportunities.

In the video, I share my experience.

Events

Node.js HTTP Clients Masterclass

It's a new month & that means it's time for a new Platformatic masterclass! We'll be exploring the complexities surrounding HTTP Clients, looking at:

‣ Choosing between libraries
‣ Addressing unexpected breakages
‣ Building an HTTP client with Platformatic

For the chance to win a Platformatic swag bag, sign up to the masterclass, share it on LinkedIn or Twitter and tag us (@platformatic), and attend! We will announce the winner at the end of the masterclass.

Register here.

CityJS London

I'll be in London from the 2nd to the 6th of April for the Node.js Collaborator Summit and to deliver the keynote at CityJS London: "The Alleged 'End' of Node.js is Much Ado About Nothing".

Tickets are available here.

Source: adventures.nodeland.dev
