Parallel Processing with Multithreaded Node.js


Whether you're a few days or a few years into Node.js, you may find it hard to understand how a single-threaded runtime can compete with multithreaded backends.

To identify the reasons, we have to understand what it really means when we say Node.js is single-threaded.

JavaScript itself was originally created for basic browser tasks such as validating forms and making pages interactive, and it was only in 2009 that Ryan Dahl, the creator of Node.js, made it possible to write server-side code in JavaScript.

Server-side languages that support multithreading have all manner of structures and constructs for syncing values between threads, along with other thread-oriented features.

Supporting those features would have required changing the entire language, which was never the plan. So rather than make JavaScript itself multithreaded, Dahl came up with a workaround. Let's dive in!


How does Node.js actually work?

Node.js uses two kinds of threads: a main thread that runs the event loop, and several auxiliary threads in a worker pool (four by default).

The event loop is the mechanism that takes callbacks (functions) and schedules them to be executed at a later time. It runs on the same thread as the JavaScript code itself, so when a JavaScript operation blocks that thread, the event loop is blocked too.

The worker pool is an execution model that spawns and manages separate threads. A pool thread performs its task synchronously and returns the result to the event loop, which then executes the provided callback with that result.

In practice, the worker pool handles operations for which the operating system offers no non-blocking API, mainly file-system access and DNS lookups, plus some CPU-heavy tasks; most network I/O is handled directly by the operating system's own asynchronous mechanisms. A few core modules use the worker pool out of the box, such as fs (I/O-heavy) and crypto or zlib (CPU-heavy). The worker pool is implemented in libuv, so there is a slight overhead whenever Node needs to pass data internally between JavaScript and C++, but this is hardly noticeable.
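To see the worker pool at work, here is a minimal sketch using crypto.pbkdf2(), one of the CPU-heavy crypto functions Node.js runs on libuv's worker pool. The password, salt, and iteration count are arbitrary values chosen for illustration, and the hypothetical order array only records which callback runs first.

```javascript
const crypto = require('crypto');

// Records the order in which the two callbacks run (illustration only)
const order = [];

// The asynchronous pbkdf2() runs the hashing on a worker pool thread
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err, key) => {
  if (err) throw err;
  order.push('hash ready');
  console.log('derived key starts with', key.toString('hex').slice(0, 16));
});

// Meanwhile the main thread stays free to run other callbacks
setImmediate(() => {
  order.push('event loop still responsive');
  console.log('event loop still responsive while hashing');
});
```

Because the hashing happens on a pool thread, the setImmediate() callback normally fires first, even though it was scheduled after the hash was requested.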

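The implication of both systems is that we can hand slow work to Node.js and be notified when it is done. For example, fs (Node.js's built-in file-system module) lets us read a file by passing fs.readFile() a callback to run once the contents are available.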
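A minimal sketch of that pattern, assuming a hypothetical text file in the OS temp directory (the sketch creates it first with fs.writeFileSync() so it runs on its own):

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical file in the temp directory, created so the sketch is self-contained
const file = path.join(os.tmpdir(), 'node-threads-example.txt');
fs.writeFileSync(file, 'hello from the worker pool');

const order = [];

// fs.readFile() hands the read to the worker pool and returns immediately;
// the callback runs later on the main thread, via the event loop
fs.readFile(file, 'utf8', (err, contents) => {
  if (err) throw err;
  order.push('file read');
  console.log('2. file contents:', contents);
});

// Runs first, because readFile() did not wait for the disk
order.push('readFile called');
console.log('1. readFile called, main thread keeps going');
```

The numbered log lines come out in order 1, 2: the main thread moves on immediately, and the callback only runs once the worker pool has finished the read.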

With code written this way, we don't have to wait synchronously for the result. The task of reading the file is delegated to the worker pool, and the provided function is called with the result. Since the worker pool has its own threads, the event loop can continue executing normally while the file is being read.

This seems sufficient until we need to execute a complex operation synchronously: any function that takes too long to run will block the thread. If your application has functions like that, they can significantly reduce your server's throughput or even freeze it completely. And in this scenario, there is no built-in way to delegate our own JavaScript to the worker pool.
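Here is a small sketch of that problem. blockFor() is a hypothetical stand-in for any long synchronous computation; it simply spins for a given number of milliseconds. The timer below is due immediately, but it cannot fire until the main thread is free again.

```javascript
// Hypothetical CPU-heavy function: busy-waits for `ms` milliseconds
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spinning keeps the main thread, and therefore the event loop, busy
  }
}

const scheduledAt = Date.now();
let timerDelay = null;

// Asks the event loop to run this callback as soon as possible
setTimeout(() => {
  timerDelay = Date.now() - scheduledAt;
  console.log(`timer fired after ${timerDelay} ms`);
}, 0);

// Runs synchronously, so the timer cannot fire until it returns
blockFor(500);
```

Expect the logged delay to be at least 500 ms rather than roughly 0 ms; for that whole time, a server written this way could not have handled any other request either.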

Tech fields that require complex calculations, such as AI, machine learning, or big data, couldn't really use Node.js efficiently, because those operations blocked the main thread (the only one running JavaScript) and made the server unresponsive. That was the case until Node.js v10.5.0 came out in June 2018 and added support for multiple threads, opening up new possibilities for JavaScript.


I introduce to you: worker_threads

Node.js 10.5.0 brought us worker_threads, a module that makes it possible to create multithreaded applications in JavaScript. Threads are pretty simple and, very importantly, fun.

worker_threads is a built-in module that lets us design fully functional multithreaded Node.js applications. A thread worker is a piece of code (usually loaded from a file) that runs in a separate thread, with its own event loop and JavaScript environment.

It is important to note that the terms thread worker, worker, and thread are often used interchangeably; they all refer to the same thing.


Worker threads in Node.js are useful for performing CPU-heavy JavaScript tasks. Workers make it easy to run JavaScript code in parallel, which can make such workloads much faster and more efficient, and they let us do heavy work without blocking the main thread. (For I/O-bound work, Node's built-in asynchronous APIs are usually more efficient than workers.)
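As a minimal sketch of that, the example below moves a 500 ms busy loop (a stand-in for real heavy work) into a worker. It passes the worker's code as a string using the Worker constructor's eval: true option so everything fits in one file; in a real project you would normally point the worker at a separate file. A setInterval() timer on the main thread keeps counting ticks while the worker spins.

```javascript
const { Worker } = require('worker_threads');

// Worker code as a string (eval: true); it busy-waits for 500 ms, then reports back
const workerSource = `
  const { parentPort } = require('worker_threads');
  const end = Date.now() + 500;
  while (Date.now() < end) {}
  parentPort.postMessage('heavy work done');
`;

let ticks = 0;
let finished = false;

// The main thread keeps handling timers while the worker is busy
const interval = setInterval(() => { ticks += 1; }, 50);

const worker = new Worker(workerSource, { eval: true });
worker.on('message', (msg) => {
  finished = true;
  clearInterval(interval);
  console.log(`${msg}; main thread ran ${ticks} timer callbacks meanwhile`);
});
worker.on('error', (err) => {
  clearInterval(interval);
  console.error(err);
});
```

Compare this with the blocking version: the same amount of spinning no longer stops the main thread's timers from firing.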

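Worker threads are not available in older versions of Node.js, so update Node.js before getting started. When the module first shipped in Node.js 10.5.0, it was experimental and had to be enabled with the --experimental-worker flag.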

Now create two files for implementing the thread as shown below:

Filename: worker.js

// workerData holds a copy of the value the main thread passed in,
// and parentPort is the MessagePort used to talk back to the main thread
const { workerData, parentPort } = require('worker_threads');

// log some stuff
console.log(`Write-up on how ${workerData} wants to chill with the big boys`);

// postMessage() sends this object to the main thread, where it arrives as a 'message' event
parentPort.postMessage({ filename: workerData, status: 'Done' });

Filename: index.js

const { Worker } = require('worker_threads');

// runService() starts a worker thread and returns a Promise that resolves with the
// worker's first message, or rejects if the worker errors or exits with a non-zero code.
// './worker.js' is resolved relative to the current working directory.
const runService = (workerData) => {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./worker.js', { workerData });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker Thread stopped with exit code ${code}`));
    });
  });
};

// run() calls runService() and passes the value the worker receives as workerData
const run = async () => {
  const result = await runService('Tunde Ednut');
  console.log(result);
};

run().catch(err => console.error(err));

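Output (running node index.js):

The worker should log "Write-up on how Tunde Ednut wants to chill with the big boys", and the main thread should then log the object the worker posted back: { filename: 'Tunde Ednut', status: 'Done' }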

Full working code on GitHub: github.com/ocdkerosine/multithreaded-node-js
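The same Promise-wrapping pattern used in index.js scales to several workers at once, which is where the parallel processing comes in. The sketch below is one way to do it, with hypothetical names (sumSource, runSum, parallelSum) and an illustrative workload: it splits the sum of the integers 0 to n-1 into chunks, sums each chunk on its own thread, and combines the partial results with Promise.all(). As before, eval: true keeps the worker code inline.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical worker body: sums the integers in [start, end) and posts the total back
const sumSource = `
  const { workerData, parentPort } = require('worker_threads');
  let total = 0;
  for (let i = workerData.start; i < workerData.end; i++) total += i;
  parentPort.postMessage(total);
`;

// Same Promise wrapper as in index.js, but with inline worker code
const runSum = (range) =>
  new Promise((resolve, reject) => {
    const worker = new Worker(sumSource, { eval: true, workerData: range });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });

// Splits 0..n-1 into `threads` chunks and sums them on separate threads in parallel
const parallelSum = async (n, threads) => {
  const size = Math.ceil(n / threads);
  const jobs = [];
  for (let start = 0; start < n; start += size) {
    jobs.push(runSum({ start, end: Math.min(start + size, n) }));
  }
  const partials = await Promise.all(jobs);
  return partials.reduce((a, b) => a + b, 0);
};

parallelSum(1e7, 4).then((sum) => console.log(sum)); // prints 49999995000000
```

Each worker gets its own copy of its range through workerData, so no state is shared between threads; the main thread only combines the results.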


Conclusion

By delegating heavy CPU computations to other threads, we can significantly increase our server's throughput, and worker_threads provides a fairly easy way to add multithreading support to production applications.

With official thread support, we can expect more developers and engineers from data- and compute-intensive fields to start harnessing the added power and famed speed of Node.js.

Looking to scale from local-first development to production-first coding? Join us at Kerosine Coding today!