Programmable Applications

To me, the most exciting software is the type that sparks creative use cases. Software that ends up being used in ways that its developers never anticipated.

This tends to happen primarily with software that has managed to transform itself into a platform. Rather than pretending to cover every single use case under the sun, effort is put into making the product extensible, either by users themselves or by third parties.

This is not a novel concept. Offering an API so that others can programmatically poke at it has been a common practice for a solid few decades. Initially this happened at the operating system level (through interrupts and system calls), then progressively moved up the stack to REST APIs for SaaS products running in the cloud.

While offering public APIs has become standard, some products go beyond this, even enabling extension of the application’s user interface.


I myself became interested in the topic of extensibility about a decade ago when I worked on Cloud9 IDE. Integrated Development Environments operate in a virtually infinite problem space. They attempt to integrate your entire development environment. Therefore IDEs need to cater to different technology stacks, build tools, debugging tools, profilers, deployment tools, code analysis tools, communication tools, continuous deployment tools. The list goes on and on. There’s not a chance a single company could build an IDE that covers everything. Therefore all IDEs have an extension story. And indeed, so did Cloud9.

The Cloud9 extension model as I remember it was similar to that of other IDEs like IntelliJ and Eclipse at the time (and VS Code these days, although it did not yet exist). The way it worked is that you define extension points and APIs that extensions can use. Then, you effectively load a bundle of (in our case) JavaScript extension code and hope for the best. In our case that meant that extensions could define new server-side APIs with node.js, and load additional JavaScript code in the browser as well. Very flexible, but also pretty damn scary if you care about stability and security. And you probably should.

Obviously, having to host and run extension code in our infrastructure, we could not allow users to run their own extensions this way. This would have been a huge security risk. As a result, we always had to carefully curate extensions. Either we wrote them internally, or we carefully audited third-party ones before allowing them to run in our infrastructure, with varying levels of success. It was not uncommon for extensions to unintentionally break various pieces of UI, or to cause significant performance issues on the server. We never really found a great solution to this problem.

While code editors have a long tradition of extensibility (think Emacs and vim), these days the ability to turn products into platforms is becoming more and more widespread.

Consider the area I work in today: communication tools. Communicators like Mattermost and Slack all have some sort of extension story, some deeper than others. Slack, for instance, offers extensibility through bots, slash commands and a limited set of UI features. Mattermost, by virtue of generally being self-hosted by customers (and therefore having to worry less about the security aspects of deploying third-party extension code), in addition offers plugins. Plugins allow extension to a similar degree as the IDEs mentioned before: you can extend the server side with your own (Go) code, and have virtually unlimited flexibility in extending the UI as well. This is cool, but poses an additional portability issue. Running JavaScript code that plugs new elements into the UI works fine in a browser (or a desktop app wrapping a web page), but it doesn’t really work in a native mobile app.

To summarize, running extension code alongside the rest of the application is a flexible and cheap-to-implement model, but a challenging one for a few reasons:

  1. Security: if you control the environment in which extension code runs and you trust the extension code itself, all is fine. However, if not: allowing arbitrary code execution in a hosted application, or even in a browser tab, can be dangerous.
  2. Fragility: even if you create and carefully document extension points to hook into, it’s easy to break things even unintentionally. A CSS style may apply in surprising ways, a global variable may override something unexpected. If you allow server-side execution of extension code, this code may crash and take the server process down with it. It’s also very likely extension code will rely (without your knowledge) on undocumented APIs and break once those are changed or removed.
  3. Portability: it is a multi-device world we live in. While practically all devices have a web browser these days, many people still prefer a native experience. How do you write extensions that do not rely on the internals of a particular client and run anywhere? On the server side there is a similar challenge. You may want to run your code on a massive AWS-scale cloud, or on the Raspberry Pi in your closet. How do you write code that abstracts from that?
  4. Hot reloading and unloading: if you load code into your main process, it’s often hard to undo. This can be a hassle during development, often requiring a restart of the server or client between updates.

A few years ago, I started to work on a hobby code editor project called Zed, which I ultimately abandoned after powerhouses like Microsoft entered the scene with VS Code. One of the things I wanted to address with Zed was some of these extension-related challenges.

In Zed, I took a different approach than what most applications do today: rather than allowing extensions to have access to everything, I ran them in a pretty confined sandbox.

Sandbox all the things!

Zed ran as a Chrome App (once positioned as the “future of the desktop app” by Google, but since largely decommissioned). Chrome apps offered a sandbox feature, somewhat akin to running code in a sandboxed iframe today. Extension code would be evaled in this sandbox. The code would not have access to the DOM or anything else of significance. Every interaction with the main application had to happen through carefully curated APIs (which worked by postMessage-ing JSON objects across the sandbox boundary).

Here is an example of a basic “Git blame” extension in Zed. The zed/fs and zed/session packages proxied requests to the main application. Yes, in this case this still exposed the ability to run arbitrary shell commands, but at least the hosting application (Zed in this case) had control over what the extension could and could not do.
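The original code listing is not reproduced here, but the mechanism can be sketched in a few lines. Everything below is illustrative: the handler names echo the zed/session and zed/fs packages, while hostReceive, call and the handler table are stand-ins I made up for the real postMessage plumbing, not the actual Zed API.

```typescript
// Illustrative sketch of the sandbox boundary: the host exposes a small,
// curated API table; sandboxed extension code can only reach it by passing
// JSON messages, mimicking postMessage across an iframe boundary.

type Request = { id: number; api: string; args: unknown[] };
type Response = { id: number; result?: unknown; error?: string };

// Host side: the curated API surface (handler names are made up).
const handlers: Record<string, (...args: any[]) => unknown> = {
  "session.getActivePath": () => "/project/main.ts",
  "fs.readFile": (p: string) => `contents of ${p}`,
};

function hostReceive(msg: string): string {
  const req: Request = JSON.parse(msg); // only JSON crosses the boundary
  const handler = handlers[req.api];
  const res: Response = handler
    ? { id: req.id, result: handler(...req.args) }
    : { id: req.id, error: `unknown API: ${req.api}` };
  return JSON.stringify(res);
}

// Sandbox side: a proxy that serializes every call. No DOM, no fs module,
// no globals from the host application are reachable from here.
let nextId = 0;
function call(api: string, ...args: unknown[]): unknown {
  const msg = JSON.stringify({ id: nextId++, api, args });
  const res: Response = JSON.parse(hostReceive(msg)); // stands in for postMessage
  if (res.error) throw new Error(res.error);
  return res.result;
}

// A "Git blame"-style extension can now only do what the host allows:
const activePath = call("session.getActivePath") as string;
console.log(call("fs.readFile", activePath)); // → contents of /project/main.ts
```

The key property: an extension that asks for an API the host never registered simply gets an error back, rather than silently reaching into application internals.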

Besides safety and isolation, having each extension run in its own sandbox had another nice side benefit: simply reloading the sandbox with the new extension code resulted in a very fast and safe hot-reload workflow, without having to restart the editor itself.

This approach trades flexibility for improved security and reduced fragility. By default, extension code can do nothing more than spin up the CPU and hog some memory. Everything else it can do has to be explicitly designed upfront, in two directions:

  1. Outside in: how is code in the sandbox triggered by your application?
  2. Inside out: how does code in the sandbox talk back to the application?

Do you want extensions to add items to your context menu? You need to offer a way to define those menu items, and to specify what code they should invoke once selected. The good thing is that you can now make sure these definitions make sense for all your clients, e.g. appearing in a right-click context menu on desktop, and after a long tap in a mobile app.
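As a sketch of what such a declarative extension point might look like (the types and names here are invented for illustration, not taken from any real product):

```typescript
// A declarative menu contribution: extensions describe *what* to add,
// and each client decides *how* to render it natively.
type MenuContribution = {
  label: string;   // text shown to the user
  command: string; // name of the sandboxed function to invoke when selected
};

const contributions: MenuContribution[] = [
  { label: "Share note", command: "share.note" },
];

// Each client maps the same declaration onto its own UI idiom:
function renderForClient(
  client: "desktop" | "mobile",
  items: MenuContribution[]
): string[] {
  const idiom = client === "desktop" ? "right-click menu" : "long-tap sheet";
  return items.map((i) => `${idiom}: ${i.label} -> ${i.command}`);
}

console.log(renderForClient("desktop", contributions));
// → [ 'right-click menu: Share note -> share.note' ]
```

Because the contribution is data rather than UI code, the mobile client can render the exact same extension without running any of its JavaScript in the UI layer.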

Conversely, does your extension code need access to the file system to do its job? You need to expose this ability through sandbox APIs. The opportunity here is that you can explicitly specify the paths the extension has access to, rather than allowing everything that node.js’ fs module allows.
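For instance, a path-restricted file API might look something like this. This is a sketch with invented names; a real implementation would delegate to the actual file system after the check:

```typescript
import * as path from "path";

// Grant the sandbox access to an explicit allow-list of directories,
// rather than exposing everything node.js' fs module can do.
function makeSandboxFs(allowedRoots: string[]) {
  function check(p: string): void {
    const resolved = path.resolve(p); // normalizes ".." traversal attempts
    const ok = allowedRoots.some((root) =>
      resolved.startsWith(path.resolve(root) + path.sep)
    );
    if (!ok) throw new Error(`access denied: ${p}`);
  }
  return {
    readFile(p: string): string {
      check(p); // a real implementation would now call fs.readFileSync(p)
      return `contents of ${p}`;
    },
  };
}

const sandboxFs = makeSandboxFs(["/project"]);
console.log(sandboxFs.readFile("/project/notes.md")); // allowed
try {
  sandboxFs.readFile("/project/../etc/passwd"); // traversal outside the allow-list
} catch (e) {
  console.log((e as Error).message); // → access denied: /project/../etc/passwd
}
```

Note that resolving paths before checking them is what makes the `..` traversal attempt fail; comparing raw strings would not be enough.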

Less

A few years ago I was introduced to the brave new world of serverless. At the core of it: the lambda function. The idea is simple: you write code (a function) in some language (JavaScript, Python, Go, Rust), zip it up and upload it somewhere. Then, this code is triggered on demand based on some event. You don’t need to worry about where it runs and how to scale that infrastructure, it just works. In my previous job we were very productive building applications this way for a number of years.

When I joined Mattermost, where a lot of our customers run our product on premise, I started to play with the idea of supporting this serverless model without relying on the cloud, by hosting a significantly simplified version of this infrastructure on your own server. To me, the attractive part of serverless is its programming model, not just its scaling model. The result of this exercise was Matterless. Matterless allows you to write simple applications that respond to events (such as HTTP requests, webhooks and websocket messages) by triggering JavaScript functions. Similar to how my extensions worked in Zed, by themselves these functions can do little. To do anything useful, such as storing data or triggering new events on the event bus, they need to make calls back to their host. These functions run in a (somewhat) sandboxed environment using Deno.
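The programming model can be sketched like this. The listen/publish functions and the key-value store are illustrative stand-ins, not the actual Matterless API:

```typescript
// A miniature event-bus-and-functions model: functions are registered
// against event names, and can only affect the world through host
// facilities (here: a key/value store and the bus itself).
type Handler = (event: unknown) => void;

const store = new Map<string, unknown>();       // host-provided key/value store
const listeners = new Map<string, Handler[]>(); // event bus subscriptions

function listen(event: string, fn: Handler): void {
  listeners.set(event, [...(listeners.get(event) ?? []), fn]);
}

function publish(event: string, payload: unknown): void {
  for (const fn of listeners.get(event) ?? []) fn(payload);
}

// A function reacting to an incoming webhook, storing a counter and
// triggering a follow-up event on the bus:
listen("http:POST:/hook", () => {
  const hits = ((store.get("hits") as number) ?? 0) + 1;
  store.set("hits", hits);
  publish("hits:updated", hits);
});

listen("hits:updated", (n) => console.log(`hits is now ${n}`));
publish("http:POST:/hook", {}); // → hits is now 1
publish("http:POST:/hook", {}); // → hits is now 2
```

The important constraint is the same as in Zed: the function body touches no file system, no network, no globals, only the two host facilities it was handed.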

Pretty cool, if I may say so myself. I posted a few videos on YouTube pitching the concept.


While I did not immediately connect the dots between these projects, a while ago I finally did.

Three common aspects kept recurring:

  1. Functions running in a sandbox. Extension code needs to be dynamically loadable, unloadable and invokable. Functions need to be in some portable format: these days that could be a Docker container, or, more lightweight, JavaScript code or WebAssembly. Sandboxes could be hosted anywhere, depending on the use case: in the cloud, on a server, at the CDN “edge,” in the browser, or in a mobile app. The sandbox could be very sophisticated, like AWS Lambda, which auto-scales to support the number of invocations and enforces strict CPU, memory and time limits. Or it could be as simple as a WebView or a web worker.
  2. The careful design of hooks provided by the platform: how are functions going to be triggered? How do we configure these extension points? This could be a generic event bus, where extensions specify the events to listen to. Hooks could also be highly application- and platform-specific, like the ability to add extra items to a context menu and trigger functions as a result.
  3. The sandbox APIs: how does extension code interact with its environment? What APIs do you provide to allow it to do useful things, without giving it free rein over everything?

What keeps fascinating me is the portability potential of functions. In this model, the same code could run on anything from a giant cloud server all the way down to your mobile phone. How could that be leveraged? When AWS Lambda was conceived, its only relevant sandbox was the cloud: your code would run in AWS’s massive (spoiler alert) server-powered infrastructure. More recently, they launched Lambda@Edge, where your lambda code can run closer to the customer, at the CDN level, which could be a few miles from your house. However, what about the actual client? Why not let code run on the user’s device itself? It doesn’t get much edgier than that.

Potentially, this “where should the code execute” decision can be made dynamically. Some cases benefit from the low latency of running on the device, such as user interaction: when you type a character into a textbox, you wouldn’t necessarily want each key press to be sent to the cloud just to get autocomplete results. In other cases, such as when large streams of data need to be processed, it may be better to move execution into the cloud. And in some cases privacy may be the bigger concern, so you’d still want that processing to happen on device. Wouldn’t it be great if your code didn’t have to worry much about that, and it would just be figured out?


I’ve been thinking about this problem a fair amount over the last few months. And to make progress on it I decided to, rather than put pen to paper, put keyboard to code editor.

Indeed, I’ve been writing code in my spare time again. Yes, this is also why it’s been quiet here at Zef+. I was writing TypeScript instead of prose.

It started with a proof-of-concept implementation of a tool that bundles YAML definitions specifying hooks (similar to the serverless framework’s serverless.yml) together with TypeScript/JavaScript code into a JSON file. These JSON files (“plugs”) could then be loaded and executed in a sandbox. I had sandbox implementations that would run these plugs on the server using node.js, as well as in the browser in a web worker or iframe. I also got quite far getting them to run on mobile using React Native.
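To give a flavor of the idea, here is a runnable toy version of such a “plug.” The manifest shape, the inlined code string and the loader are all invented for illustration; the real format differed:

```typescript
// A toy "plug": a manifest naming hooks, with the function code bundled
// in as a string, so the same JSON file can be shipped to any sandbox
// implementation (node.js, web worker, iframe) and executed there.
type PlugManifest = {
  name: string;
  functions: Record<string, { events: string[]; code: string }>;
};

const plug: PlugManifest = {
  name: "hello",
  functions: {
    sayHello: {
      events: ["page:click"],
      code: `return "Hello, " + event.target;`,
    },
  },
};

// A minimal loader: compile each bundled function and dispatch an event.
function invoke(p: PlugManifest, event: { name: string; target: string }): string[] {
  const results: string[] = [];
  for (const def of Object.values(p.functions)) {
    if (def.events.includes(event.name)) {
      const fn = new Function("event", def.code); // a real sandbox would isolate this
      results.push(fn(event));
    }
  }
  return results;
}

console.log(invoke(plug, { name: "page:click", target: "world" })); // → [ 'Hello, world' ]
```

Because a plug is plain JSON, the same artifact can be fetched over HTTP and evaluated wherever a sandbox happens to live, which is exactly the portability property discussed above.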

However, I paused the project for some time when I realized I didn’t have a compelling use case. What application would actually benefit from the ability to freely move code execution from the browser to the server and back again (which is where I saw a lot of the potential back then)? 🤷


Then, in my journey to find the ultimate writing and note-taking app, I discovered Obsidian. Obsidian is a knowledge management and note-taking application that runs as a locally installed desktop (and recently also mobile) app. There’s a lot to like about Obsidian. For instance, rather than relying on some proprietary file format, it stores all its notes as markdown files on disk. I like markdown.

However, its true power comes from its very rich ecosystem of extensions. There are extensions that allow you to query your note collection as a database, for instance. Or to render your Markdown note as a Kanban board. Or to pull your Kindle highlights into your notes. That’s pretty damn cool!

How does this work, technically? Obsidian is built using JavaScript and runs as an Electron app on desktop (and I’m guessing React Native on mobile, but I cannot be sure). To extend Obsidian, you write TypeScript/JavaScript code that is loaded into the main UI and uses the various hooks it exposes to extend that UI.

That same model of loading arbitrary JavaScript code again!

I spent a few hours trying to build a Ghost publishing extension for Obsidian, during which I figured out how this all works under the hood. Restarting the UI to check if my code worked. Frustrated, I checked GitHub for the Obsidian source code, but then realized it isn’t actually open source. So I thought… opportunity!

The buried lede

The result is the application I’m using right now to write this very article. Half of this post I wrote at my desk on my laptop, half of it in the garden on my iPad. I will also be publishing it to Zef+ via its Ghost extension, which, indeed, I ultimately implemented for my new app instead of Obsidian.

I use it to take notes during my 1-1s. It’s the tool I use to track todos. And to follow my team’s activity on GitHub.

And if you ever visit me at my house and push that mysterious button in the kitchen, which toggles (short press) or switches (long press) Internet radio stations streamed to the homepods in our main living area — yeah, that’s powered by it too.

And those on-call schedules that are pulled from OpsGenie and posted to our Mattermost channel every week. Yeah, they are compiled and posted by it as well.

Obviously.

The most exciting type of software is the type that sparks creative use cases. Software that ends up being used in ways that its developers never could have anticipated. Including me.

I’d tell you more about this application, but look at the time! We’re already over 2800 words in, so more on that another time.

Intrigued? Subscribe to Zef+!
