2024-03-18
Supporting ALSA compressed offload in PipeWire

Editor's note: this work was completed in late 2022 but this post was unfortunately delayed.

Modern audio hardware often integrates Digital Signal Processors (DSPs) into SoCs and audio codecs. Processing compressed or encoded data on such DSPs saves power compared to carrying out the same processing on the CPU.

     +---------+      +---------+       +---------+
     |   CPU   | ---> |   DSP   | --->  |  Codec  |
     |         | <--- |         | <---  |         |
     +---------+      +---------+       +---------+

This post takes a look at how all this works.

Audio processing

A traditional audio pipeline might look like the one below. An application reads encoded audio and then uses a media framework like GStreamer or a library like ffmpeg to decode it to PCM. The decoded audio stream is then handed off to an audio server like PulseAudio or PipeWire, which eventually hands it off to ALSA.

                    +----------------+
                    |   Application  |
                    +----------------+
                            |           mp3
                    +----------------+
                    |    GStreamer   |
                    +----------------+
                            |          pcm
                    +----------------+
                    |    PipeWire    |
                    +----------------+
                            |          pcm
                    +----------------+
                    |      ALSA      |
                    +----------------+

With ALSA compressed offload, the same audio pipeline would look like this. The encoded audio stream is passed through to ALSA unchanged. ALSA then sends the encoded data to the DSP via its compressed offload API, and the DSP does the decoding and rendering.

                    +----------------+
                    |   Application  |
                    +----------------+
                            |           mp3
                    +----------------+
                    |    GStreamer   |
                    +----------------+
                            |          mp3
                    +----------------+
                    |    PipeWire    |
                    +----------------+
                            |          mp3
                    +----------------+
                    |      ALSA      |
                    +----------------+

Since the processing of the compressed data is handed to specialised hardware, namely the DSP, this results in a dramatic reduction in power consumption compared to CPU-based processing.

Challenges

  • The ALSA compressed offload API, which is separate from the ALSA PCM interface, provides the control and data streaming interface for audio DSPs. This API is provided by the tinycompress library.

  • With PCM, there is a direct relationship between bytes and time. For example, 1920 bytes of S16LE, 2-channel, 48 kHz audio correspond to 10 ms. This breaks down for compressed streams: for most compressed data, it is impossible to reliably estimate the duration of an audio buffer from its size.

  • While sampling rate, number of channels, and bits per sample are enough to completely specify PCM, compressed formats may need various additional parameters to be specified before the DSP can deal with them.

  • For some codecs, additional firmware has to be loaded by the DSP. This has to be handled outside the context of the audio server.
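
The bytes-to-time relationship for PCM mentioned in the list above is plain arithmetic. The sketch below is illustrative only; the helper name pcm_bytes_to_usec is hypothetical and is not part of ALSA or PipeWire.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: duration, in microseconds, of a PCM buffer.
 * bytes_per_sample is 2 for S16LE. Partial trailing frames are ignored. */
static uint64_t pcm_bytes_to_usec(uint64_t bytes, unsigned int bytes_per_sample,
                                  unsigned int channels, unsigned int rate_hz)
{
	assert(bytes_per_sample > 0 && channels > 0 && rate_hz > 0);

	/* One frame holds one sample for every channel. */
	uint64_t frame_size = (uint64_t)bytes_per_sample * channels;
	uint64_t frames = bytes / frame_size;

	return frames * 1000000ULL / rate_hz;
}
```

For the example in the text, pcm_bytes_to_usec(1920, 2, 2, 48000) gives 10000 microseconds, that is, 10 ms. No comparable function can be written for most compressed formats without parsing the stream itself, since the number of bytes per unit of time is not fixed.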

Requirements

  • Expose all possible compressed formats.

  • Allow a client to negotiate the format.

  • Stream encoded audio frames and not PCM.

PipeWire

PipeWire has become the default sound server on Linux, handling multimedia routing and audio pipeline processing. It offers capture and playback for both audio and video with minimal latency and support for PulseAudio, JACK, ALSA, and GStreamer-based applications.

SPA

PipeWire is built on top of SPA (Simple Plugin API), a header-only API for building plugins. SPA provides a set of low-level primitives.

SPA plugins are shared libraries (.so files) that can be loaded at runtime. Each library provides one or more factories, each of which may implement several interfaces.

The most interesting interface is the node.

  • A node consumes or produces buffers through ports.

  • In addition to ports and other well-defined interface methods, a node can have events and callbacks.

Ports are also first class objects within the node.

  • There is a set of port-related interface methods on the node.

  • Ports may be statically allocated at instance initialization.

  • Ports can also be dynamic, managed with the add_port and remove_port methods.

  • Ports have params, which can be queried using the port_enum_params method. These include the list of supported formats (EnumFormat), the currently configured format (Format), buffer configuration, latency information, I/O areas for data structures shared by the port, and other such information.

  • Some params, such as the selected format, can be set using the port_set_param method.

Implementing compressed sink SPA node

This section covers the key implementation details of a PipeWire SPA node that accepts an encoded audio stream and writes it out using the ALSA compressed offload API.

static const struct spa_node_methods impl_node = {
	SPA_VERSION_NODE_METHODS,
	.add_listener = impl_node_add_listener,
	.set_callbacks = impl_node_set_callbacks,
	.enum_params = impl_node_enum_params,
	.set_io = impl_node_set_io,
	.send_command = impl_node_send_command,
	.add_port = impl_node_add_port,
	.remove_port = impl_node_remove_port,
	.port_enum_params = impl_node_port_enum_params,
	.port_set_param = impl_node_port_set_param,
	.port_use_buffers = impl_node_port_use_buffers,
	.port_set_io = impl_node_port_set_io,
	.process = impl_node_process,
};

Some key node methods defining the actual implementation are as follows.

port_enum_params

Params for ports are queried using this method. This is akin to finding out the capabilities of a port on the node.

For the compressed sink SPA node, the following are present.

  • EnumFormat

    This builds up the list of encoded formats handled by the node and returns it as the result.

  • Format

    Returns the currently set format on the port.

  • Buffers

    Provides the buffer size and the minimum and maximum number of buffers to be used when streaming data to this node.

  • IO

    The node exchanges information via IO areas. There are various types of IO areas, such as buffers, clock, and position. The compressed sink SPA node only advertises buffer areas at the moment.

The results are returned in an SPA POD.
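
The enumeration idea behind EnumFormat can be pictured as a caller asking for result number index until the node reports that nothing is left. The sketch below models only that idea with plain C types. The offered_codec enum, the table, and both function names are hypothetical; this is not the SPA API, which builds each result as a POD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical codec identifiers, for illustration only. */
enum offered_codec { OFFERED_MP3, OFFERED_FLAC, OFFERED_AAC };

/* Formats this imaginary sink advertises. */
static const enum offered_codec offered_formats[] = {
	OFFERED_MP3,
	OFFERED_FLAC,
};

/* Node side: fill *out with the format at `index` and return 1,
 * or return 0 once the list is exhausted. */
static int offered_enum_format(size_t index, enum offered_codec *out)
{
	size_t n = sizeof(offered_formats) / sizeof(offered_formats[0]);

	assert(out != NULL);
	if (index >= n)
		return 0;
	*out = offered_formats[index];
	return 1;
}

/* Client side: walk the enumeration and check whether `wanted` is offered. */
static int offered_contains(enum offered_codec wanted)
{
	enum offered_codec c;

	for (size_t i = 0; offered_enum_format(i, &c); i++)
		if (c == wanted)
			return 1;
	return 0;
}
```

A client that wants to play FLAC would find it in this list, while a request for AAC would come back empty and the client would have to fall back to decoding on the CPU.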

port_use_buffers

Tells the port to use the given buffers via the IO area.

port_set_param

The various params on the port are set via this method.

A Format param request sets the actual encoded format that a PipeWire client, such as pw-cat or an application, is going to stream to this SPA node for sending to the DSP.
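
Conceptually, handling a Format request means validating the requested format and remembering it on the port. The following is a minimal sketch under simplifying assumptions: the sketch_format and sketch_port structs, their fields, and the choice of -EINVAL are all hypothetical, and a real format description carries more codec-specific parameters than shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified description of an encoded format. */
enum sketch_codec { SKETCH_CODEC_NONE, SKETCH_CODEC_MP3, SKETCH_CODEC_FLAC };

struct sketch_format {
	enum sketch_codec codec;
	unsigned int rate;      /* sampling rate in Hz */
	unsigned int channels;
};

/* Hypothetical per-port state. */
struct sketch_port {
	int have_format;
	struct sketch_format current;
};

/* Store `fmt` on the port after basic validation.
 * In this sketch, passing NULL clears the configured format. */
static int sketch_port_set_format(struct sketch_port *port,
                                  const struct sketch_format *fmt)
{
	assert(port != NULL);

	if (fmt == NULL) {
		port->have_format = 0;
		return 0;
	}
	if (fmt->codec == SKETCH_CODEC_NONE || fmt->rate == 0 || fmt->channels == 0)
		return -EINVAL;

	port->current = *fmt;
	port->have_format = 1;
	return 0;
}
```

Once a format is stored, a later query for the Format param can simply report port->current, and the same values can be used to configure the DSP before streaming starts.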

process

Buffers containing the encoded media are handled here. The media stream is written to the IO buffer areas that were provided in port_use_buffers. The encoded media stream is then written to the DSP by calling compress_write.
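
One detail worth illustrating is that a write to the DSP may accept fewer bytes than were offered. The sketch below shows one way to handle that. It is not the node's actual code: dsp_write_fn is a stand-in for the role compress_write plays, and write_encoded and the fake_dsp test double are hypothetical names introduced here so the example can run without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the DSP write call. Returns the number of bytes accepted,
 * or a negative errno-style value on failure. */
typedef int (*dsp_write_fn)(void *ctx, const void *buf, unsigned int size);

/* Hypothetical helper: push one encoded buffer, retrying on partial writes.
 * Returns the number of bytes consumed, or a negative error. */
static int write_encoded(dsp_write_fn write_fn, void *ctx,
                         const uint8_t *data, unsigned int size)
{
	unsigned int done = 0;

	assert(write_fn != NULL);
	while (done < size) {
		int n = write_fn(ctx, data + done, size - done);

		if (n < 0)
			return n;   /* propagate the error */
		if (n == 0)
			break;      /* no room left; the caller retries later */
		done += (unsigned int)n;
	}
	return (int)done;
}

/* Test double: accepts at most `chunk` bytes per call, `capacity` in total. */
struct fake_dsp {
	unsigned int chunk;
	unsigned int capacity;
	unsigned int used;
};

static int fake_dsp_write(void *ctx, const void *buf, unsigned int size)
{
	struct fake_dsp *dsp = ctx;
	unsigned int room = dsp->capacity - dsp->used;
	unsigned int n = size;

	(void)buf; /* the fake only counts bytes */
	if (n > dsp->chunk)
		n = dsp->chunk;
	if (n > room)
		n = room;
	dsp->used += n;
	return (int)n;
}
```

Returning the consumed byte count lets the caller keep the unconsumed tail of a buffer and offer it again on the next cycle instead of dropping encoded data.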

add_port and remove_port

Since dynamic ports aren't supported, these methods return ENOTSUP.

pw-cat

pw-cat was modified to support negotiation of encoded formats and passing the encoded stream as is when linked to the compressed sink node.

Deploying on hardware

Based on discussions with upstream compress offload maintainers, we chose a Dragonboard 845c with the Qualcomm SDM845 SoC as our test platform.

For deploying Linux on embedded devices, the tool of choice is Yocto. Yocto is a build automation framework and cross-compile environment used to create custom Linux distributions and board support packages for embedded devices.

The primary dependencies are:

  • tinycompress

  • ffmpeg

  • PipeWire

  • WirePlumber

The tinycompress library is what provides the compressed offload API. It makes ioctl() calls to the underlying kernel driver/sound subsystem.

ffmpeg is a dependency for the example fcplay utility provided by tinycompress. It's also used in pw-cat to read basic metadata of the encoded media. This is then used to determine and negotiate the format with the compressed sink node.

PipeWire is where the compressed sink node resides, with WirePlumber acting as the session manager for PipeWire.

Going into how Yocto works is beyond the scope of a blog post. Basic Yocto Project concepts are covered in the Yocto Project documentation.

In Yocto speak, a custom meta layer was written.

Yocto makes it quite easy to build autoconf-based projects. A new tinycompress bitbake recipe was written to build the latest sources from upstream and to include the fcplay and cplay utilities for initial testing.

The existing PipeWire and WirePlumber recipes were modified to point to custom git sources with minor changes to default settings included as part of the build.

Updates since the original work

Since we completed the original patches, a number of changes have happened thanks to the community (primarily Carlos Giani). These include:

Future work

  • Make the compressed sink node provide clocking information. While the API provides a method to retrieve timestamp information, the relevant timestamp fields do not seem to be populated by the q6asm-dai driver.

  • Validate other encoded formats. So far only MP3 and FLAC have been validated.

  • Maybe the wider community can help test this on other hardware.

  • Add the capability to the GStreamer plugin to work with the compressed sink node. This would also help in validating pause and resume.