How Passenger 6 Generic Language Support is implemented
At Phusion, we recently unveiled an important feature in version 6 of the Passenger application server. Passenger 5 supports Ruby, Python and Node.js, but v6 will support all programming languages. We call this effort generic language support, or "GLS" for short.
We held a coding livestream and Q&A session on November 1st. The session not only showed the feature being implemented, but also explained Passenger's code and architecture. Questions such as the following were answered:
- What does the code that's being written actually do?
- How does it fit in the architecture?
- What are caveats (e.g. security) to look out for?
- What patterns does Passenger use to ensure high performance?
- How do certain C++ language features compare to other languages like Ruby, JavaScript, Java, etc.?
Today, we are publishing its recording, presentation files and diagrams. Scroll down in this blog post and you will find a table of contents, as well as a highlights review for the recording.
What do you think of GLS? How would you use it? Join the discussion on GitHub, discuss on Hacker News or send us a Tweet!
1. Introductory presentation
The video begins with an introductory presentation (until 15:58) which covers:
- Motivation and goal
- Passenger architecture 101
- The implementation plan
You can also find the presentation on Slideshare.
1.1. Motivation and goal
-
Why did we embark on this journey? What do we want to achieve?
See also: section 1 ("Why GLS?") of the GitHub issue.
-
Describes the UX we are trying to implement.
See also: section 3 ("User experience") of the GitHub issue.
1.2. Passenger architecture 101
-
High-level Passenger architecture overview. Shows which high-level components Passenger consists of.
-
Passenger architecture 101: agent — 5:58
Deep-dives into the Passenger agent and its internal components: Watchdog, Core and more.
-
Passenger architecture 101: core — 6:35
Deep-dives into the Passenger Core and explains the responsibilities of its internal components and how they interact with each other: Controller, ApplicationPool, SpawningKit and more.
Same diagram as above applies.
-
Passenger architecture 101: request handling interaction flow — 8:27
Explains which components a request flows through when Passenger processes it, and how those components interact with each other.
-
Implementation plan: existing spawn flow (auto-supported apps) — 10:25
Turns out a lot of the groundwork for GLS had already been laid in Passenger 5.3.0! The remaining work to be done consists of activating this groundwork.
But what is the groundwork that we need to make use of? In order to understand that, we need to compare it with how the current spawn flow works, i.e. how Ruby, Python and Node.js apps are spawned. These apps are also called "auto-supported" apps because, unlike the generic option, they work automatically, and don't require configuration in Passenger.
-
Implementation plan: generic spawn flow — 12:29
Explains how the GLS groundwork — the "generic spawn flow" — works.
2. Coding session
The coding session starts at 15:58 and consists roughly of these parts:
- Apache integration mode (plus some shared code)
- Nginx integration mode
- Standalone mode
- Core Controller, ApplicationPool and SpawningKit
- Single-app mode (for Standalone mode + builtin engine)
- Compile and test
2.1. Apache integration mode
-
Apache integration mode: configuration system — 16:10
Demonstrates how to add a new Apache config option. This is much more painful than one might think. Because Apache's module system is written in C, adding a new config option requires a ton of boilerplate code — especially so if you want to keep the codebase clean and maintainable, e.g. by doing proper separation of concerns between components.
In order to fight boilerplate, and increase development velocity, we introduce a Ruby-based code generation system (which generates C/C++ code) that we wrote for the Passenger codebase.
-
Apache integration mode: application type autodetection — 24:10
The integration mode layer is the first component in the interaction sequence that needs to know what kind of app is associated with a request. With GLS, a new type of application (namely the "generic" category) has been introduced. Here we code the required modifications in the Apache integration layer.
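As a rough illustration of what such detection can look like, here is a minimal sketch that checks for a well-known startup file and falls back to the generic category. This is not Passenger's actual code: `detectAppType`, the `AppType` enum and the `hasStartCommand` flag are hypothetical names, the startup file names are assumed from Passenger's documented conventions, and the order of the checks is an assumption as well.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical application categories; Generic stands for the new GLS category.
enum class AppType { Rack, Wsgi, Node, Generic, Unknown };

// Looks for a well-known startup file in appRoot. If none is found, the app
// is treated as generic only when a start command has been configured.
inline AppType detectAppType(const fs::path &appRoot, bool hasStartCommand) {
    std::error_code ec;  // non-throwing overloads: a missing directory is not fatal
    if (fs::exists(appRoot / "config.ru", ec))         return AppType::Rack;
    if (fs::exists(appRoot / "passenger_wsgi.py", ec)) return AppType::Wsgi;
    if (fs::exists(appRoot / "app.js", ec))            return AppType::Node;
    return hasStartCommand ? AppType::Generic : AppType::Unknown;
}
```

The point of the sketch is only that the generic category needs an extra piece of information (here, whether a start command exists), because there is no startup file to recognize it by.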
-
Apache integration mode: passing request to Core Controller — 32:50
The Apache module must somehow pass the application type information to the Core Controller. Here we show what modifications need to be made.
Special points of interest:
- Secure headers (34:15): The Apache module communicates with the Core Controller via HTTP. Secure headers are a mechanism we invented for passing private information between Apache and the Core Controller. HTTP clients cannot tamper with this data.
- StaticString (35:44): A lightweight data structure used throughout the Passenger codebase to ensure a zero-copy architecture. The video also compares it with how string management works in other languages.
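To give a feel for the zero-copy idea, here is a minimal sketch of a non-owning string type: just a pointer and a length. `StringRef` is a hypothetical stand-in written for this post, not the real StaticString class, whose interface may differ.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a StaticString-like type: a pointer plus a length.
// It never owns or copies the bytes, so the referenced buffer must outlive it.
class StringRef {
public:
    StringRef(const char *data, std::size_t size) : data_(data), size_(size) {}
    StringRef(const std::string &s) : data_(s.data()), size_(s.size()) {}

    const char *data() const { return data_; }
    std::size_t size() const { return size_; }

    // Returns a view into the same buffer; no bytes are copied.
    StringRef substr(std::size_t pos, std::size_t len) const {
        if (pos > size_) pos = size_;
        if (len > size_ - pos) len = size_ - pos;
        return StringRef(data_ + pos, len);
    }

    bool equals(const StringRef &other) const {
        return size_ == other.size_
            && std::memcmp(data_, other.data_, size_) == 0;
    }

private:
    const char *data_;
    std::size_t size_;
};
```

Taking a substring of a header line this way only adjusts a pointer and a length. The price is that the caller must guarantee the underlying buffer stays alive. C++17's `std::string_view` is built on the same idea.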
-
Shared application type autodetector core: AppTypeDetector — 41:17
The core logic for autodetecting application types actually lives in its own class, the AppTypeDetector, which is reused across multiple integration modes. The Apache integration layer "merely" makes use of this class. This part of the video shows the modifications that need to be made in this class.
Special points of interest:
- C++ header guards vs files/modules in other languages (46:50)
- Security concerns when reading arbitrary files in the Passenger Core (50:00): symlink and FIFO attacks in particular, along with possible mitigations.
- Patterns for guaranteed resource cleanup (1:03:40): C++ Resource Acquisition Is Initialization (RAII) compared to constructs from other languages (typically try-finally/begin-ensure).
- More security concerns: slurping & size limits (1:00:15)
- More security concerns: preventing leaking file contents via error messages and logs (1:07:20)
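Several of these points can be combined in one small sketch: an RAII guard for a file descriptor, plus a reader that refuses symlinks and FIFOs, enforces a size limit and reports failure without echoing file contents. This assumes a POSIX system. `FdGuard` and `readSmallFile` are hypothetical names, and the mitigations shown are common ones, not necessarily those chosen in the video.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <optional>
#include <string>

// RAII guard: the descriptor is closed when the guard leaves scope, on every
// return path, much like try-finally or begin-ensure in other languages.
class FdGuard {
public:
    explicit FdGuard(int fd) : fd_(fd) {}
    ~FdGuard() { if (fd_ != -1) ::close(fd_); }
    FdGuard(const FdGuard &) = delete;
    FdGuard &operator=(const FdGuard &) = delete;
    int get() const { return fd_; }
private:
    int fd_;
};

// Hypothetical helper: returns the contents of a regular file if it is no
// larger than maxSize bytes. Returns std::nullopt on any failure.
// (Retrying on EINTR is omitted to keep the sketch short.)
inline std::optional<std::string> readSmallFile(const char *path, std::size_t maxSize) {
    // O_NOFOLLOW refuses a symlink as the final path component;
    // O_NONBLOCK keeps open() from hanging on a FIFO that has no writer.
    FdGuard fd(::open(path, O_RDONLY | O_NOFOLLOW | O_NONBLOCK));
    if (fd.get() == -1) return std::nullopt;

    struct stat st;
    if (::fstat(fd.get(), &st) == -1 || !S_ISREG(st.st_mode)) return std::nullopt;

    std::string result;
    char buf[4096];
    while (true) {
        ssize_t n = ::read(fd.get(), buf, sizeof(buf));
        if (n == -1) return std::nullopt;
        if (n == 0) break;
        result.append(buf, static_cast<std::size_t>(n));
        if (result.size() > maxSize) return std::nullopt;  // over the limit
    }
    return result;
}
```

Note that no error path includes the bytes that were read, so a log line or error message built from the result cannot leak file contents.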
2.2. Nginx integration mode
-
Nginx integration mode: configuration system — 1:13:05
Just like in the Apache integration mode, writing Nginx module configuration is riddled with boilerplate. Here, too, we've invented a code generation system to combat that problem.
-
Nginx integration mode: application type autodetection — 01:16:00
Here we modify the Nginx module's content handler to allow it to detect generic apps.
-
Nginx integration mode: passing request to Core Controller — 1:22:09
Just like in the Apache integration mode, we pass the necessary information to the Core Controller via HTTP.
-
AppTypeDetector C bindings — 1:28:10
The AppTypeDetector class, which contains the core logic for detecting application types and is shared between multiple integration modes, is written in C++. However, Nginx modules must be written in C. Here we write a C binding so that the Nginx module can make use of the AppTypeDetector C++ class.
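The general shape of such a binding is an opaque handle plus a few functions with C linkage. The sketch below shows the pattern on a toy class. `Detector`, `DetectorHandle` and the `detector_*` functions are hypothetical names invented for this example and do not mirror the real AppTypeDetector API.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C++ class that C code wants to use.
class Detector {
public:
    explicit Detector(const std::string &defaultType) : defaultType_(defaultType) {}
    std::string detect(const std::string &startupFile) const {
        if (startupFile == "config.ru") return "rack";
        if (startupFile == "app.js") return "node";
        return defaultType_;
    }
private:
    std::string defaultType_;
};

// The C-visible API: an opaque handle plus free functions with C linkage.
// In a real project the declarations would live in a header that a C
// compiler can parse, wrapped in #ifdef __cplusplus.
extern "C" {
    typedef struct DetectorHandle DetectorHandle;

    DetectorHandle *detector_new(const char *defaultType) {
        try {
            return reinterpret_cast<DetectorHandle *>(new Detector(defaultType));
        } catch (...) {
            return nullptr;  // exceptions must not cross into C code
        }
    }

    // Copies the detected type into buf; returns 1 on success, 0 on failure.
    int detector_detect(const DetectorHandle *h, const char *startupFile,
                        char *buf, std::size_t bufSize) {
        try {
            const Detector *d = reinterpret_cast<const Detector *>(h);
            std::string type = d->detect(startupFile);
            if (type.size() + 1 > bufSize) return 0;
            std::memcpy(buf, type.c_str(), type.size() + 1);
            return 1;
        } catch (...) {
            return 0;
        }
    }

    void detector_free(DetectorHandle *h) {
        delete reinterpret_cast<Detector *>(h);
    }
}
```

The C side never sees `std::string` or the class layout. It only passes the handle around and receives results through plain buffers and integer status codes.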
-
Nginx integration mode: passing request to Core Controller (cont'd) — 1:31:40
We continue passing the necessary information to the Core Controller via HTTP.
2.3. Standalone mode
At position 1:35:02 in the video, we modify Passenger Standalone with the necessary changes. This mostly consists of adding the necessary configuration options.
Recall that Passenger Standalone (when using the default 'nginx' engine) just runs Passenger+Nginx under the hood. So this part demonstrates how to edit the internal Nginx config template.
This part also shows how to modify the Passenger Core CLI arguments so that the Core receives the new config options when Passenger is used with the 'builtin' engine.
2.4. Core Controller, ApplicationPool and SpawningKit
With all the necessary changes made in the integration mode components, it's time to modify more fundamental parts of Passenger.
-
Core controller: processing info from web server / performing our own autodetection — 1:44:32
Recall that Nginx and Apache pass the request to the Core Controller, along with the application type information. The Core Controller must process that info somehow, so that's what we will implement in this part of the video.
Special points of interest:
- Pool options caching (1:48:35): How we cache config option values for performance.
- Secure headers and how they are represented (1:51:10)
- Linked strings (1:52:30): A special data structure (https://www.rubyraptor.org/pointer-tagging-linked-string-hash-tables-turbocaching-and-other-raptor-optimizations/#linked_strings) for reducing memory usage and ensuring a zero-copy architecture.
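The linked string idea can be sketched as a chain of non-owning parts, each pointing into a buffer that already exists. The names `StrPart` and `LinkedStr` are hypothetical and the real data structure in Passenger is more elaborate. This only illustrates why appending does not require copying.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: one part of a string, pointing into an existing
// buffer (for example a network read buffer). It does not own the bytes.
struct StrPart {
    const char *data;
    std::size_t size;
    StrPart *next;
};

// A string stored as a chain of parts.
struct LinkedStr {
    StrPart *head = nullptr;
    StrPart *tail = nullptr;
    std::size_t totalSize = 0;

    // Links an existing part onto the end of the chain in O(1), copying nothing.
    void append(StrPart *part) {
        part->next = nullptr;
        if (tail == nullptr) head = part; else tail->next = part;
        tail = part;
        totalSize += part->size;
    }

    // Bytes are copied only when contiguous storage is really needed.
    std::string flatten() const {
        std::string out;
        out.reserve(totalSize);
        for (const StrPart *p = head; p != nullptr; p = p->next) {
            out.append(p->data, p->size);
        }
        return out;
    }
};
```

A value that arrives split across two read buffers can thus be represented as two linked parts instead of being concatenated into a freshly allocated string.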
-
SpawningKit configuration — 1:59:28
One caveat will be discussed: temporary C++ strings and the dangers of dangling pointers (2:01:45)
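The caveat is easy to reproduce in a few lines. In the sketch below, `SpawnConfig`, `buildStartCommand` and `OwnedSpawnConfig` are hypothetical names, not SpawningKit's real types. The dangerous line is left as a comment, followed by one way to keep the pointer valid.

```cpp
#include <string>

// Hypothetical config struct that stores a raw, non-owning pointer,
// as zero-copy designs often do.
struct SpawnConfig {
    const char *startCommand = nullptr;
};

inline std::string buildStartCommand(const std::string &app) {
    return "./" + app + " --port=8080";
}

// DANGEROUS (left as a comment on purpose):
//   config.startCommand = buildStartCommand("myapp").c_str();
// The temporary std::string is destroyed at the end of that statement,
// so startCommand would point at freed memory.

// Safer: keep the owning std::string alive for as long as the pointer is used.
struct OwnedSpawnConfig {
    std::string startCommandStorage;  // owns the bytes
    SpawnConfig config;               // points into startCommandStorage

    explicit OwnedSpawnConfig(const std::string &app)
        : startCommandStorage(buildStartCommand(app)) {
        config.startCommand = startCommandStorage.c_str();
    }
    // Copying or moving would leave config pointing at another object's storage.
    OwnedSpawnConfig(const OwnedSpawnConfig &) = delete;
    OwnedSpawnConfig &operator=(const OwnedSpawnConfig &) = delete;
};
```

Languages with garbage collection hide this problem because the string stays alive as long as anything references it. In C++ the lifetime has to be arranged explicitly.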
-
Terminate application process when no longer needed — 2:09:15
2.5. Single-app mode
At position 2:10:16 of the video, we describe the single-app mode, which is activated when Passenger Standalone is configured to use the 'builtin' engine.
Special points of interest:
-
The ConfigKit framework (2:10:50)
An internal framework for per-component configuration & schema composition.
Passenger is architected as a collection of loosely-coupled components. Each component is designed to be as simple as possible, and could be relatively easily extracted to be used outside of the Passenger codebase. Complex behavior is obtained by composing multiple components.
The set of configuration options supported in Passenger is thus a composition of the configuration options supported by each component. ConfigKit makes composing configuration options possible and robust, and introduces typechecking, compositional translation and validation.
If you've used something like React, then you may know the pains of prop drilling. This is also one of the problems ConfigKit addresses, though in a totally different way than e.g. Redux.
-
The single-app-mode initialization function & initial autodetection (2:24:55)
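To make the ConfigKit description above a little more tangible, here is a toy sketch of schema composition with key translation and validation. It is purely illustrative: `Schema`, `ValueType`, `addSubSchema` and the key prefixes are hypothetical and do not reflect ConfigKit's real API.

```cpp
#include <map>
#include <string>
#include <vector>

enum class ValueType { String, Integer };

inline bool isInteger(const std::string &s) {
    if (s.empty()) return false;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
    }
    return true;
}

// Hypothetical per-component schema: maps option names to expected types.
struct Schema {
    std::map<std::string, ValueType> entries;

    void add(const std::string &key, ValueType type) { entries[key] = type; }

    // Composition with translation: import a component's schema under a
    // prefix, so the parent exposes "pool_max" for the child's "max".
    void addSubSchema(const Schema &sub, const std::string &prefix) {
        for (const auto &kv : sub.entries) {
            entries[prefix + kv.first] = kv.second;
        }
    }

    // Validation: every supplied key must be known and have the right type.
    // Error messages name the key only, never the value.
    std::vector<std::string> validate(
        const std::map<std::string, std::string> &values) const
    {
        std::vector<std::string> errors;
        for (const auto &kv : values) {
            auto it = entries.find(kv.first);
            if (it == entries.end()) {
                errors.push_back("unknown option: " + kv.first);
            } else if (it->second == ValueType::Integer && !isInteger(kv.second)) {
                errors.push_back("not an integer: " + kv.first);
            }
        }
        return errors;
    }
};
```

Each component declares only its own options, and the top-level schema is assembled from the parts. That is the sense in which no option has to be threaded by hand through every layer.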
2.6. Compile and test
Finally, at position 2:34:55 of the video, we compile Passenger and test the feature.
Join the discussion
What do you think of GLS? How would you use it? Join the discussion on GitHub, discuss on Hacker News or send us a Tweet!