Stage Studio

How Stage Studio streams 1440p on less than one CPU core

The short answer

Stage Studio stays light because it hands the expensive work to the right silicon. Compositing happens on the GPU, and the H.264 encode runs on Apple silicon's dedicated media engine through VideoToolbox, not on your CPU cores. The result, measured live on an M4 Pro, is a 1440p30 stream that uses less than one of twelve CPU cores.

If you have ever watched a streaming app peg your CPU and turn your laptop into a space heater, that is almost always the software H.264 encoder doing the work. The interesting part of Stage Studio's architecture is how much of the pipeline never touches a general-purpose core at all. Let's walk through it.

The pipeline, end to end

A single frame travels through four stages before it leaves your machine.

The render loop runs at your target frame rate (30 fps, say) on a background queue. The on-screen preview is deliberately throttled, because nobody needs the preview repainted as often as the stream is encoded. The stream itself always stays at full resolution. That split keeps the UI responsive without stealing frames from your viewers.
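The split between stream rate and preview rate can be sketched as a simple frame-stride check. This is an illustrative Python sketch, not Stage Studio's actual code: the function names and the 10 fps preview rate are assumptions, since the article only says the preview is throttled.

```python
def preview_stride(stream_fps: int, preview_fps: int) -> int:
    """How many rendered frames pass between preview repaints (at least 1)."""
    if stream_fps <= 0 or preview_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, round(stream_fps / preview_fps))


def should_repaint_preview(frame_index: int, stream_fps: int, preview_fps: int) -> bool:
    """Every frame goes to the encoder; only every Nth frame repaints the preview."""
    return frame_index % preview_stride(stream_fps, preview_fps) == 0


# One second of a 30 fps stream with a hypothetical 10 fps preview.
repaints = sum(should_repaint_preview(i, 30, 10) for i in range(30))
print(f"encoded 30 frames, repainted the preview {repaints} times")
```

The encoder sees every frame regardless of what the preview does, which is why throttling the preview costs viewers nothing.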

Why the media engine is the whole trick

H.264 encoding is the heaviest job in the pipeline. Motion estimation, transform, quantization, and entropy coding, done in software on every frame of a live stream, add up to a genuine CPU hog. That is exactly the cost x264 and most Electron-based tools pay.

Apple silicon ships with a hardware media engine built for this. It is a fixed-function block whose entire purpose is video encode and decode. When you route H.264 through VideoToolbox with the h264_videotoolbox encoder, that work lands on the media engine instead of your cores.

So the two most expensive stages each go to dedicated hardware: compositing to the GPU, encoding to the media engine. Your CPU cores are left to do almost nothing, which is the point. They stay free for whatever you are actually demoing.

The measured numbers

Here is what 1440p30 looks like running live on an Apple M4 Pro (12-core CPU, 16-core GPU, 48 GB RAM).

| Metric | Measured |
| --- | --- |
| CPU (total) | ~7%, less than one of 12 cores |
| CPU spent on H.264 encode | roughly 0% (it's on the media engine) |
| RAM | about 0.6 GB |
| Resolution / fps | 1440p at 30 fps |

The detail worth sitting with is that second row. The encode, the part that would dominate a software pipeline, costs you almost no CPU, because it isn't happening on the CPU. The small amount of CPU work that remains is mostly moving composited frames over to the encoder. That's it.
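To make "less than one of 12 cores" concrete, here is the arithmetic as a short Python sketch. It assumes the ~7% figure is a share of the whole machine (all 12 cores together count as 100%), which is an interpretation rather than something the measurement states; the helper name is hypothetical.

```python
def core_equivalents(total_cpu_percent: float, core_count: int) -> float:
    """Convert a whole-machine CPU percentage into 'number of cores kept busy'."""
    return total_cpu_percent / 100.0 * core_count


# Assumption: 7% is measured against the full 12-core capacity.
used = core_equivalents(7.0, 12)

# For comparison, one fully busy core is this share of a 12-core machine.
one_core_share = 100.0 / 12

print(f"{used:.2f} core-equivalents in use")
print(f"one full core would be {one_core_share:.1f}% of the machine")
```

Under that reading the stream keeps a bit under one core busy. If the figure were instead on a per-core scale, where one core counts as 100%, the load would be smaller still.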

For context, the default profile is 1440p30 at around 12 Mbps, and Stage Studio can output 1080p, 1440p, or 4K at 30 or 60 fps.

Multistreaming: encode once, upload many

The naive way to stream to YouTube, LinkedIn, and Twitch at the same time is to run three encoders. That triples the most expensive part of the pipeline, and on a software stack, that is exactly where machines fall over.

Stage Studio does it differently. There is one hardware encode, and its output is fanned out to every destination using ffmpeg's tee muxer. Encode once, upload N times.
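For readers who have not used the tee muxer, the sketch below builds the kind of ffmpeg argument list that fans one already-encoded stream out to several destinations. It is a minimal illustration under stated assumptions, not Stage Studio's actual invocation: the `pipe:0` input, the example URLs, the stream keys, and the `build_tee_command` helper are all hypothetical. The `-f tee` output format, the `[f=flv]` per-output option, and `onfail=ignore` are standard ffmpeg tee muxer features.

```python
def build_tee_command(source: str, destinations: list[str]) -> list[str]:
    """Build an ffmpeg argv that copies one encoded input to many outputs."""
    if not destinations:
        raise ValueError("need at least one destination")
    # Each tee output is "[options]url"; outputs are joined with "|".
    # onfail=ignore keeps the other outputs alive if one destination drops.
    # Note: a real URL containing "|" would need escaping for the tee muxer.
    outputs = "|".join(f"[f=flv:onfail=ignore]{url}" for url in destinations)
    return [
        "ffmpeg",
        "-i", source,      # assumption: the encoded stream arrives here
        "-map", "0",       # pass every input stream to the tee
        "-c", "copy",      # stream copy: no re-encode inside ffmpeg
        "-f", "tee", outputs,
    ]


# Hypothetical destinations; real ingest URLs and keys come from each platform.
cmd = build_tee_command(
    "pipe:0",
    ["rtmps://a.example/live/KEY1", "rtmps://b.example/live/KEY2"],
)
print(" ".join(cmd))
```

The important detail is `-c copy`: ffmpeg only repackages and sends the bits, so the number of destinations changes the length of the tee string, not the amount of compression work.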

The consequence is that adding a destination does not add an encode. It adds upload bandwidth. Your CPU and your media engine do the same amount of work whether you are streaming to one platform or four; the only thing that scales is how much data leaves your network. For a multi-platform stream, that is the difference between "fine" and "fans audible from the next room."
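The scaling is easy to put in numbers. The sketch below uses the default profile's roughly 12 Mbps and ignores container and protocol overhead, so treat the results as rough planning figures; both helper names are hypothetical.

```python
def upload_mbps(destinations: int, bitrate_mbps: float = 12.0) -> float:
    """Approximate upstream bandwidth: one copy of the stream per destination."""
    return destinations * bitrate_mbps


def encodes_needed(destinations: int, shared_encode: bool) -> int:
    """A shared encode stays at one; the naive approach runs one per destination."""
    return min(1, destinations) if shared_encode else destinations


for n in (1, 2, 4):
    print(
        f"{n} destination(s): {upload_mbps(n):.0f} Mbps up, "
        f"{encodes_needed(n, shared_encode=True)} encode shared, "
        f"{encodes_needed(n, shared_encode=False)} encode(s) if run naively"
    )
```

At four destinations the encoder still runs once, but the uplink needs to sustain about 48 Mbps, so the network connection becomes the thing to check before adding platforms.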

An honest note on the ffmpeg subprocess

The RTMPS push runs through a bundled ffmpeg subprocess, and that is a deliberate choice rather than an accident. ffmpeg's RTMP handling and encoder settings are battle-tested across years of real streams. Shelling out to it for the network push let us ship a reliable MVP without reimplementing a decade of edge-case handling around RTMP.

It is not the end state. A pure in-process stack (VideoToolbox feeding an in-process RTMP implementation, with no subprocess) is on the roadmap. It would tighten the pipeline and remove the process boundary. But the current design already gets the thing that matters most: the encode is on the media engine, so the subprocess is handling muxing and network I/O, not burning cores on compression.

We would rather be honest about the MVP seam than pretend the architecture sprang fully formed. The performance numbers above are real today, with the subprocess in place.

FAQ

Does the ffmpeg subprocess do the H.264 encoding?

No. The encode happens in-app via VideoToolbox on the media engine. ffmpeg handles the RTMPS push and, for multistreaming, the tee muxer that fans one encoded stream out to multiple destinations. The heavy compression work never goes to a software encoder.

Why is CPU usage so much lower than OBS with x264 or an Electron app?

Software encoders like x264 run H.264 compression on your CPU cores, which is genuinely expensive for live video. Electron apps carry a browser runtime on top of that. Stage Studio offloads compositing to the GPU and encoding to the dedicated media engine, so the cores stay mostly idle.

Does streaming to more platforms slow down my Mac?

Not in CPU or encode terms. One hardware encode is reused for every destination via the tee muxer, so adding platforms costs upload bandwidth, not extra compute. The encoder does the same work for four destinations as for one.

What do I need to run it?

macOS 14 Sonoma or later, on Apple silicon or Intel. The measured numbers above are from Apple silicon, where the media engine does the encoding. Stage Studio is free with a watermark; $49 Pro removes it.

Curious about the architecture, or want to see less-than-one-core streaming on your own machine? Download it from stagestudio.tv and watch Activity Monitor while you stream.
