Audio Core¶
Introduction¶
The SPARK Audio Core is the piece of software used by all SPARK-based audio applications. Its role is to manage the flow of audio samples from recording to playback, including audio processing and buffering. The Audio Core will mostly be used in conjunction with the SPARK Wireless Core, but it has been built in such a way that it is not dependent on it. A few abstractions like audio pipelines, audio endpoints and audio processing stages make it flexible and decoupled from other libraries and hardware.
Most of the Audio Core features are demonstrated in the application examples found in the SPARK SDK. These examples are the best place to start learning how to use the Audio Core while reading through this documentation. Sections in this documentation are ordered from a top-down perspective: they start with application requirements, then cover the Audio API, and finally the Audio Core itself.
End-User Application¶
The end-user’s application must take care of a few things before the Audio Core can be operational. First, it must initialize and configure the endpoints it wishes to use. In the case of endpoints that rely on hardware, such as a hardware audio codec, the application would need to:
Initialize the MCU peripherals that interface with it, such as the I2S and I2C controllers
Configure it to have, for example, the proper sampling rate and bias voltage
Manage its interrupt service requests, or those of the MCU peripherals it is connected to
In most cases, the application will also use the SPARK Wireless Core as an endpoint for sending or receiving audio samples over the air. If so, the Wireless Core will need to be configured by the application using the Wireless Core API.
Once the endpoint dependencies are taken care of, the application must make available a chunk of RAM large enough for the Audio Core to initialize properly. The Audio Core will then dynamically allocate memory from this pool to create audio buffers, among other things. The chunk of RAM is usually just a statically allocated byte array.
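For illustration, providing this memory pool could look like the following sketch; the initialization function name (audio_core_init) is a placeholder, not the actual SDK API.

#include <stdint.h>

/* Placeholder prototype; the actual SDK exposes an equivalent initialization call. */
void audio_core_init(uint8_t *mem_pool, uint32_t mem_pool_size);

/* Size depends on the number of pipelines, buffers and processing stages used. */
#define AUDIO_MEMORY_POOL_SIZE 10000u

/* Statically allocated byte array handed to the Audio Core as its memory pool. */
static uint8_t audio_memory_pool[AUDIO_MEMORY_POOL_SIZE];

void app_audio_setup(void)
{
    /* The Audio Core dynamically allocates audio buffers and objects from this pool. */
    audio_core_init(audio_memory_pool, sizeof(audio_memory_pool));
}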
Finally, Audio Core objects can be created and configured. Those are the endpoints themselves, the processing stages, and the pipelines.
A few audio application examples are available in the SPARK SDK that the user can use as a starting point.
Audio Core Components¶
The Audio Core is composed of several interacting components that will be described in this section.
Audio Pipeline¶
The audio pipeline is the main abstraction. It ties audio endpoints and audio processing stages together.

Figure 28: Audio Pipeline¶
As shown in the figure above, an audio pipeline has an audio producing endpoint, an arbitrary number of processing stages and an audio consuming endpoint. In most cases, if an audio device both records and plays back audio, it will need two audio pipelines. In some specific use cases, an audio pipeline can have more than one consumer and more than one producer.
Audio Endpoint¶
An audio endpoint is something, either hardware or software, that deals with audio samples. It can either produce them or consume them. For example, an audio endpoint could be:
A hardware audio codec (e.g., MAX98091)
A network stack (e.g., SPARK Wireless Core)
A USB stack with the USB-Audio class
A dummy audio codec which generates a pre-recorded sine wave
An I2S interface used to create an audio pipe
To be considered a valid audio endpoint, one must adhere to a specific endpoint interface. The Audio Core includes two endpoints that users can use in their audio pipelines. Users can also create their own endpoints to suit their application needs.
Here is the list of what makes an endpoint:
An instance specific to the endpoint. This instance enables the creation of multiple endpoints of the same type, which is useful when the same endpoint type is used for producing samples in one pipeline and consuming samples in another. The instance can also contain configuration parameters.
A name to describe the endpoint.
An endpoint interface which is the standard set of functions the endpoint must implement.
An endpoint is either used as a producer or as a consumer, but never both. If an audio device uses a single hardware codec to playback and record audio, for example, two endpoints need to be instantiated.
The endpoint interface is a set of three functions that each endpoint must implement to be compatible with the Audio Core. These functions are:
Action: Used by the Audio Core when it needs the endpoint to produce audio samples (e.g., recording of samples for an audio codec) if the endpoint is a producer, or when it needs the endpoint to consume audio samples (e.g., playback of samples for an audio codec) if the endpoint is a consumer.
Start: Used by the Audio Core to start the production or consumption of audio samples of the endpoint. For some endpoints (e.g., SPARK Wireless Core), this can do nothing.
Stop: Used by the Audio Core to stop the production or consumption of audio samples of the endpoint. For some endpoints (e.g., SPARK Wireless Core), this can do nothing.
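Put together, an endpoint can be pictured as the following sketch; the type and field names are illustrative assumptions, not the SDK's actual definitions.

#include <stdint.h>

/* Illustrative endpoint interface: the Action/Start/Stop functions described above. */
typedef struct audio_endpoint_interface {
    /* Produce or consume up to 'size' bytes of samples; returns the number of bytes handled. */
    uint16_t (*action)(void *instance, uint8_t *samples, uint16_t size);
    /* Start sample production/consumption (may do nothing, e.g., for the SPARK Wireless Core). */
    void (*start)(void *instance);
    /* Stop sample production/consumption (may do nothing, e.g., for the SPARK Wireless Core). */
    void (*stop)(void *instance);
} audio_endpoint_interface_t;

/* An endpoint bundles a name, an endpoint-specific instance and the interface implementation. */
typedef struct audio_endpoint {
    const char *name;                 /* Descriptive name */
    void *instance;                   /* Endpoint-specific instance and configuration */
    audio_endpoint_interface_t iface; /* Action/Start/Stop implementation */
} audio_endpoint_t;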
Audio Processing Stage¶
An audio processing stage represents operations that are applied to audio samples after they are produced but before they are consumed. Any number of processing stages can be chained together. The output of one processing stage is the input of the next one, which means that the order in which processing stages are added to a pipeline matters. Each processing stage has requirements for its inputs: some expect raw audio samples, some expect audio that has gone through an encoder, and others expect non-audio information such as user data. For example, when receiving compressed audio, the decompression processing stage needs to be added first, then the digital volume control. If user data is used, it would be added last for a Wireless Core consumer and first for a Wireless Core producer.
To be considered a valid audio processing stage, one must adhere to a specific audio processing interface. The Audio Core includes three processing stages that users can add to their audio pipelines. Users can also create their own processing stages to suit their application needs.
Here is the list of what makes a processing stage:
An instance specific to the processing stage. This instance enables the creation of multiple processing stages of the same type.
A name that describes the processing stage.
A processing stage interface which is the standard set of functions the processing stage has to implement.
The processing stage interface is a set of five functions that each processing stage must implement to be compatible with the Audio Core. These functions are:
Init: Used by the Audio Core to initialize the processing stage. This can be left empty if no specific initialization steps are needed.
De-init: Used by the Audio Core to de-initialize the processing stage. This can be left empty if no specific de-initialization steps are needed.
Ctrl: Used by the application to interact with the processing stage. This can be used to get information from (e.g., get statistics) or modify its behavior (e.g., raise the volume).
Process: Used by the Audio Core to modify audio samples.
Gate: Used by the Audio Core to know if the processing stage should be run or skipped. If the gate function returns true, the processing stage will execute. If the gate function returns false, the processing stage will be skipped. If the gate function is set to NULL, the processing stage will always execute. The gate function is mostly used when a processing stage needs to be activated/deactivated on the fly. A common use case is Fallback mode.
When the Audio Core processes a pipeline, it cycles through every registered processing stage and calls its Process function (only if its Gate function returns true or is set to NULL), specifying the size of the chunk and passing the samples through data_in. Once the processing is done, the Process function returns the processed samples through data_out and the number of bytes processed through its return value. If Process has not done anything to the samples, it returns 0, letting the Audio Core know that data_out contains nothing valid; in that case, data_in is passed to the next stage instead of data_out.
The audio header is passed to the Process function since it may contain necessary information that the processing stage needs.
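These functions can be pictured as the following interface sketch; the type names and exact signatures are illustrative assumptions, not the SDK's actual definitions.

#include <stdbool.h>
#include <stdint.h>

/* Stands for the audio header described later in this document. */
typedef struct audio_header audio_header_t;

/* Illustrative processing stage interface: Init, De-init, Ctrl, Process and Gate. */
typedef struct audio_processing_interface {
    void (*init)(void *instance);   /* Optional initialization */
    void (*deinit)(void *instance); /* Optional de-initialization */
    /* Application-facing control (e.g., get statistics, raise the volume). */
    uint32_t (*ctrl)(void *instance, uint8_t cmd, uint32_t arg);
    /* Returns the number of bytes written to data_out, or 0 if nothing was done. */
    uint16_t (*process)(void *instance, audio_header_t *header,
                        uint8_t *data_in, uint16_t size, uint8_t *data_out);
    /* Returns true to run the stage, false to skip it; may be NULL to always run. */
    bool (*gate)(void *instance, audio_header_t *header, uint8_t *data_in, uint16_t size);
} audio_processing_interface_t;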
Clock Drift Compensation¶
Clock Drift Compensation (CDC) is a mechanism used to handle situations where the audio clocks of the recording and playback devices are not exactly matched. A relevant example is streaming audio over a SPARK RF link: one board records the samples and the other plays them back. The audio clock is asynchronous between the two boards. This means that even if both audio master clocks are theoretically at the same frequency and a sampling rate of 48 kHz is configured, the effective sampling rate on each board can be slightly different due to crystal tolerances. This could lead to one device having a sampling rate of 47.980 kHz while the other runs at 48.030 kHz, for example. In such situations, the clock drift needs to be compensated or audio glitches will occur.
The CDC is done in two stages: drift detection and drift correction. The detection mechanism is basic and tailored around the use of a SPARK wireless link. For correction, the Audio Core provides two different approaches; one is software-based and the other is hardware-based.
Drift Detection Mechanism
The default detection mechanism provided by the Audio Core monitors the audio buffer (consumer queue) on the receiving end and averages its load over time (sliding window). If the average falls below a certain threshold (depleting) or rises above a certain threshold (filling), the audio clock is considered “drifting” and a corrective action is triggered. The audio header contains a bit called “TX queue level”. If that bit is set, the receiver/playback device concludes that the audio buffer load variation is caused by bad RF link conditions (and not clock drift), and no corrective action is triggered in that case.
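A minimal sketch of this detection logic, with a hypothetical window size and thresholds:

#include <stdbool.h>
#include <stdint.h>

#define CDC_WINDOW_SIZE    8u  /* Hypothetical sliding window length */
#define CDC_LOW_THRESHOLD  2u  /* Average load below this: buffer depleting */
#define CDC_HIGH_THRESHOLD 6u  /* Average load above this: buffer filling */

typedef enum { CDC_NO_DRIFT, CDC_DEPLETING, CDC_FILLING } cdc_drift_t;

/* Average the consumer queue load over a sliding window and flag a drift, unless
 * the "TX queue level" header bit attributes the load variation to a bad RF link. */
cdc_drift_t cdc_detect(uint8_t queue_load, bool tx_queue_level_high)
{
    static uint8_t window[CDC_WINDOW_SIZE];
    static uint8_t idx;
    uint32_t sum = 0;

    window[idx] = queue_load;
    idx = (idx + 1u) % CDC_WINDOW_SIZE;

    for (uint8_t i = 0; i < CDC_WINDOW_SIZE; i++) {
        sum += window[i];
    }

    if (tx_queue_level_high) {
        return CDC_NO_DRIFT; /* Bad link conditions: no corrective action */
    }
    if ((sum / CDC_WINDOW_SIZE) < CDC_LOW_THRESHOLD) {
        return CDC_DEPLETING;
    }
    if ((sum / CDC_WINDOW_SIZE) > CDC_HIGH_THRESHOLD) {
        return CDC_FILLING;
    }
    return CDC_NO_DRIFT;
}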
Drift Correction Mechanisms
The Audio Core provides two different approaches to drift correction: software resampling and hardware clock steering. The former can be used on any hardware whereas the latter requires specific hardware features.
Software Resampling
The SPARK resampling library uses a fixed block size (e.g., 1440 samples) to define the resampling period. At the end of the resampling period, a new sample is either added or dropped. This sample manipulation has minimal impact on audio quality. A sample is dropped when the audio clock on the playback device is slower than the one on the recording device, and a sample is added when the audio clock on the playback device is faster than the one on the recording device. CDC must be enabled only on the playback device and not on the recording device.
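To illustrate the scale, assuming a 48 kHz stream and the 1440-sample resampling period mentioned above, one sample is added or dropped at most every 30 ms, which corresponds to a correction capacity of roughly 1/1440 ≈ 694 ppm, comfortably above typical crystal tolerances of a few tens of ppm.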
Clock Steering
This mechanism leverages the hardware’s ability to precisely adjust the source clock frequency (fractional PLL). Functions for incrementing or decrementing the source clock frequency must be provided by the application through the CDC instance. When a drift is detected, a momentary adjustment of the hardware source clock frequency is applied to correct the buffer load. This process can be repeated several times. Once stabilized, a permanent offset is added to the clock frequency to reduce future drift.
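The application-supplied hooks could look like the following sketch; the structure and function names are illustrative, not the actual SDK definitions.

/* Illustrative CDC instance for clock steering; names are placeholders. */
typedef struct cdc_steering_instance {
    void (*increment_clock_freq)(void); /* Nudge the audio source clock up (fractional PLL) */
    void (*decrement_clock_freq)(void); /* Nudge the audio source clock down (fractional PLL) */
} cdc_steering_instance_t;

/* Example hooks for a hypothetical fractional PLL driver. */
static void pll_step_up(void)   { /* slightly increase the fractional divider output */ }
static void pll_step_down(void) { /* slightly decrease the fractional divider output */ }

static cdc_steering_instance_t cdc_instance = {
    .increment_clock_freq = pll_step_up,
    .decrement_clock_freq = pll_step_down,
};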
Warning
The CDC detection mechanism provided by the Audio Core does not support use cases with a single producer providing the same data to multiple consumers on the same device (e.g., codec as source (producer) and 4 different wireless connections as consumers).
Audio Compression¶
This processing stage can compress and decompress audio samples using Adaptive Differential Pulse-Code Modulation (ADPCM). ADPCM was initially developed for speech coding, but often offers acceptable performance for music coding. It can be used to save a significant amount of bandwidth as it reduces 16-bit samples into 4-bit codes.
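For example, a mono 48 kHz, 16-bit stream requires 768 kbit/s uncompressed; encoded as 4-bit ADPCM codes, the same stream needs roughly 192 kbit/s, a 4:1 reduction.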
To use ADPCM, a compression instance must be created and added to an audio recording pipeline. On the playback audio device, a decompression instance must be created and added to the audio playback pipeline. This processing stage works for mono and stereo streams, but the number of channels must be specified when configuring the instance.
Digital Volume Control¶
This processing stage digitally modifies the amplitude of the audio samples. When the control command is called to increase or decrease the volume, the amplitude is gradually varied until the target level is reached. The maximum amplitude corresponds to the source level, which means that this processing stage cannot amplify beyond it. This also means that no processing occurs when the volume level is set to 100%.
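A minimal sketch of such a gradual gain ramp on 16-bit samples, with an illustrative fixed-point representation and step size:

#include <stdint.h>

#define VOLUME_RAMP_STEP 64u   /* Hypothetical ramp step, in Q15, per processed block */

/* Current and target gains in Q15 fixed point: 32768 == 100% (no processing). */
static uint16_t current_gain_q15 = 32768u;
static uint16_t target_gain_q15  = 32768u;

/* Scale 16-bit samples by the current gain, ramping toward the target level. */
void volume_process(int16_t *samples, uint16_t sample_count)
{
    if (current_gain_q15 < target_gain_q15) {
        current_gain_q15 += VOLUME_RAMP_STEP;
        if (current_gain_q15 > target_gain_q15) current_gain_q15 = target_gain_q15;
    } else if (current_gain_q15 > target_gain_q15) {
        current_gain_q15 -= VOLUME_RAMP_STEP;
        if (current_gain_q15 < target_gain_q15) current_gain_q15 = target_gain_q15;
    }

    if (current_gain_q15 == 32768u) {
        return; /* 100% volume: samples are left untouched */
    }
    for (uint16_t i = 0; i < sample_count; i++) {
        samples[i] = (int16_t)(((int32_t)samples[i] * current_gain_q15) >> 15);
    }
}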
User Data¶
This processing stage enables a user to add 1 byte of trailing data to an audio payload. This is an efficient way to have a small data channel without creating a dedicated Wireless Core connection.
To use it, a TX instance of this processing stage must be created and added to the pipeline that has the Wireless Core as a consuming endpoint. Then, whenever the control command to send a byte is called, the audio header bit called “User data is valid” is set and the byte is appended to the audio payload. On the receiving audio device, an RX instance of this processing stage must be created and added to the pipeline that has the Wireless Core as a producing endpoint. When an audio packet with the “User data is valid” bit set is received, the processing stage calls the callback function provided by the user to retrieve the byte. The callback function is called when the packet is received, not when the corresponding audio payload is played back; hence, the RX audio buffer latency does not add to the latency of this data transfer.
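For illustration, the application-facing flow could resemble the following sketch; the function names are placeholders, not the actual SDK API.

#include <stdint.h>

/* Placeholder declarations; the real SDK exposes equivalent control commands. */
void user_data_tx_send(uint8_t byte);                 /* TX side: queue 1 byte with the next audio payload */
void user_data_rx_set_callback(void (*cb)(uint8_t));  /* RX side: register the retrieval callback */

/* Called as soon as a packet with "User data is valid" set is received,
 * before the corresponding audio payload is played back. */
static void on_user_byte_received(uint8_t byte)
{
    /* Handle the received byte (e.g., a volume or status command). */
    (void)byte;
}

void app_user_data_setup(void)
{
    user_data_rx_set_callback(on_user_byte_received);
    user_data_tx_send(0x42); /* Append one byte to the next outgoing audio payload */
}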
Sampling Rate Converter¶
This processing stage can modify audio samples in such way that they will look as if they were initially sampled at a different rate. This is useful when an audio device must handle audio streams with different sampling rates. A common use case is a gaming headset, where the main audio stream could be sampled at 48 kHz and the microphone stream at 16 kHz.
Note
The current implementation only supports sampling rate conversion on mono audio streams. It also uses the CMSIS DSP Software Library for its interpolation and decimation functions, making it incompatible with devices that are not based on Arm Cortex-M processors.
Audio Header¶
The Audio Core encapsulates the audio samples with a header before they are consumed by encapsulation-enabled endpoints such as the Wireless Core. Its fields are:
| Field | Length | Description |
| --- | --- | --- |
| TX queue level high | 1 bit | Used by the CDC mechanism. This bit is set when the TX audio buffer load reaches a certain level, indicating that the wireless link is bad. |
| User data is valid | 1 bit | Indicates to the receiving audio device that the last byte of the audio packet is a valid data byte that can be retrieved and passed to the application. |
| Fallback | 1 bit | Indicates to the receiving audio device that the audio payload has been modified on the fly (e.g., compressed when it was originally uncompressed). |
| Reserved | 1 bit | Reserved for future use. |
| CRC4 | 4 bits | The 4-bit CRC used to validate the integrity of the audio header. |
| Payload size | 8 bits | The size, in bytes, of the audio samples carried in this audio packet. This size does not include the audio header or the user data. |
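Summing the field lengths gives a 2-byte header. It can be pictured as the following sketch, keeping in mind that the exact field order and bit packing are assumptions and are defined by the SDK.

#include <stdint.h>

/* Illustrative view of the 2-byte audio header; field order and packing are assumptions. */
typedef struct audio_header {
    uint8_t tx_queue_level_high : 1; /* Set when the TX audio buffer load indicates a bad link */
    uint8_t user_data_is_valid  : 1; /* Last payload byte is a valid user data byte */
    uint8_t fallback            : 1; /* Payload was modified on the fly (e.g., compressed) */
    uint8_t reserved            : 1; /* Reserved for future use */
    uint8_t crc4                : 4; /* 4-bit CRC protecting the header */
    uint8_t payload_size;            /* Audio payload size in bytes (header and user data excluded) */
} audio_header_t;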
Audio Module¶
An audio module is a component that adds functionality to the Audio Core. Some audio modules such as Audio Mixer and Audio Fallback are described below.
Audio Mixer¶
The Audio Mixer module merges two or more audio streams into a single stream using an audio mixing algorithm. To enable audio mixing, at least three Audio Pipelines using Audio Mixer Endpoints are required.
Audio Mixer Pipeline
The Audio Mixer Pipeline is different from a regular Audio Pipeline as it is split in two stages. The first stage is the Input Mixer Pipeline, and the second stage is the Output Mixer Pipeline.
The number of Input Mixer Pipelines matches the number of audio input streams that require mixing and only one Output Mixer Pipeline is present.
An example of an Audio Mixer Pipeline with two audio stream inputs is shown below:

Figure 29: Audio Mixer Pipeline¶
The Input Mixer Pipeline Producer Endpoint generates an audio stream, either from the SPARK Wireless Core or an audio codec like any regular audio pipeline. Audio packets are then sent into the processing stages before being stored in the Mixer Consumer Endpoint’s queue.
The Output Mixer Pipeline Producer Endpoint fetches the packets from the two endpoints that precede it. The packets are mixed and sent to the next processing stage.
Mixer Consumer Endpoint
This endpoint is part of the Mixer Input Stage and acts as a buffer; it does not have any “consume” action. Its queue is linked to the Mixer Producer Endpoint in the Output Mixer Pipeline. The length of the queue determines the audio latency.
Mixer Producer Endpoint
This endpoint is instantiated once per input stream (Input Mixer Pipeline). Its queue is linked to the respective consumer queue from the previous stage. This type of endpoint does not have any “produce” action.
Mixer Process
When a Mixer Producer Endpoint is used, a Mixer Process is performed on the audio samples before they are sent to downstream processing stages. The Mixer Process calls the Audio Mixer Module to mix the audio samples.
Audio Mixer Module
The Audio Mixer Module is used by the application to configure audio mixing. It can be configured with the following parameters:
Number of inputs: This parameter is the number of inputs to be mixed. The currently supported numbers of inputs are 2 and 3.
Payload size: This parameter configures the audio payload size that must match the Output Mixer Pipeline’s consuming endpoint.
Bit depth: This parameter corresponds to the bit depth of the samples in the payload.
The configuration parameters are applied to the Audio Mixer Module upon initialization. The instance also includes the following parameters:
Input Samples Queue: This queue stores the incoming samples. There are as many Input Sample Queues as there are audio inputs.
Output Packet Buffer: This buffer stores the mixed samples.
Audio Mixer Module Algorithm
The Audio Mixer Module Algorithm is used by the Audio Mixer Module to mix the audio packets together. The process involves summing all the channel input values and then dividing the result by the number of input channels. To prevent clipping, this operation is done with 32-bit registers. After the division, the output result is converted back to 16 bits.
The following diagram illustrates the mixing algorithm:

Figure 30: Audio mixing algorithm¶
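A minimal sketch of this sum-and-divide approach for 16-bit samples:

#include <stdint.h>

/* Mix 'input_count' input channels of 16-bit samples: sum in a 32-bit accumulator
 * to prevent clipping, divide by the number of inputs, then convert back to 16 bits. */
void audio_mix(const int16_t *const inputs[], uint8_t input_count,
               int16_t *output, uint16_t sample_count)
{
    for (uint16_t i = 0; i < sample_count; i++) {
        int32_t acc = 0;
        for (uint8_t ch = 0; ch < input_count; ch++) {
            acc += inputs[ch][i];
        }
        output[i] = (int16_t)(acc / input_count);
    }
}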
Audio Fallback¶
The Audio Fallback Module allows applications to dynamically adjust output power settings in response to a degrading link. Without fallback, if the link degrades beyond a certain point, audio packets will be lost. With fallback enabled, dropping packets can be avoided by dynamically enabling a processing stage that reduces the packet size. With smaller packets, higher transmit power settings can be used, so the effective range of the system is increased at the cost of a lower quality audio stream. Smaller packets also reduce airtime, which can allow an application to increase the number of CCA retries, resulting in more packets getting through.
The table below shows an example of a configuration for an arbitrary stereo audio application:
| Parameter | Level 1, Uncompressed | Level 2, Compressed |
| --- | --- | --- |
| Payload size | 64 | 22 |
| Pulse Count | 1 | 2 |
| Pulse Width | 7 | 4 |
| Pulse Gain | 0 | 0 |
| CCA Try Count | 2 | 3 |
The RF power transmitted by a SPARK UWB radio is determined by three settings: Pulse Count, Pulse Width and Pulse Gain (which is really an attenuation control). The table shows how the power settings and CCA Try Count are influenced by payload size. In normal mode, the settings for uncompressed data are used, whereas in fallback mode, the settings for compressed data are used.
A payload size of 64 bytes requires that the power settings be set to (1, 7, 0) and the airtime allows for 2 CCA retries.
A reduced payload size of 22 bytes allows us to increase the power settings to (2, 4, 0) and allows for 3 CCA retries due to the shorter airtime. The link will benefit from higher range with the increased power settings.
The Wireless Core will automatically switch between the 2 setting configurations based on the payload size received from the Audio Core.
Fallback ON Trigger
Fallback ON is triggered by a rise in the transmitter’s TX queue load.
A normal TX queue load is either 0 or 1. When the queue level increases to 2, it indicates that the queue is accumulating packets. This happens when the link is weakening and retransmissions are occurring, slowing down the normal flow. The sampled TX queue load is stored in a circular array and averaged. The array size is set to 3 by default. The load values are multiplied by a factor of 10, so that a finer average can be determined over the averaging period.
The detection of link quality at the transmitter is determined as follows:
TX queue load values are multiplied by 10.
A rolling average of the last 3 sampled TX queue loads is calculated every time the array is updated.
If the rolling average becomes greater than a predetermined threshold, then Fallback ON is triggered.
At startup, the processor must wait for the sampling array to fill before performing the link quality algorithm.
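A sketch of this evaluation, using the default array size of 3 and an illustrative trigger threshold of 13:

#include <stdbool.h>
#include <stdint.h>

#define FALLBACK_ARRAY_SIZE   3u   /* Default averaging window */
#define FALLBACK_ON_THRESHOLD 13u  /* Example trigger threshold (x10 scale) */

/* Sample the TX queue load (scaled by 10), keep a circular array of the last 3 values
 * and trigger Fallback ON when the rolling average exceeds the threshold. */
bool fallback_on_evaluate(uint8_t tx_queue_load)
{
    static uint16_t samples[FALLBACK_ARRAY_SIZE];
    static uint8_t idx;
    static uint8_t filled;

    samples[idx] = (uint16_t)tx_queue_load * 10u;
    idx = (idx + 1u) % FALLBACK_ARRAY_SIZE;

    if (filled < FALLBACK_ARRAY_SIZE) {
        filled++;
    }
    if (filled < FALLBACK_ARRAY_SIZE) {
        return false; /* Wait for the sampling array to fill before evaluating link quality */
    }

    uint32_t sum = 0;
    for (uint8_t i = 0; i < FALLBACK_ARRAY_SIZE; i++) {
        sum += samples[i];
    }
    /* Compare the sum against threshold * array size to keep fractional precision. */
    return sum > (FALLBACK_ON_THRESHOLD * FALLBACK_ARRAY_SIZE);
}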
The following table shows an example of link quality evaluation for a Trigger Threshold set to 13.
| TX Queue Loads | Running Average | Threshold | Fallback State |
| --- | --- | --- | --- |
| (10, 0, 20) | 10 | 13 | OFF |
| (10, 20, 10) | 13.3 | 13 | ON |
A threshold value of 13 works well for unidirectional links where the timeslots are contiguous. In bidirectional links where there is a time gap between transmissions, a value of 23 could be used. The final value can be fine-tuned on a per-application basis.
The system should be pessimistic at startup to maximize the chances of successful communication when it is brought up in a poor link environment. As such, the fallback state should initially be ON and remain so until the Fallback OFF state is triggered.
Fallback OFF Trigger
Fallback OFF trigger is based on 2 metrics: the receiver’s Link Margin and the transmitter’s CCA retries.
Link margin values and CCA retry counts are obtained from the Wireless Core using the swc_connection_get_fallback_info() function, which returns the following:
typedef struct swc_fallback_info {
    int32_t link_margin;     /*!< Link margin value */
    uint32_t cca_fail_count; /*!< CCA fail count value */
} swc_fallback_info_t;
The receiver device must return its link margin value to the transmitter device using the auto-reply mechanism. Although this value is stored as a 16-bit variable, it can be sent using only 1 unsigned byte if the programmer makes sure to saturate the link margin at 255. The link margin threshold used to turn OFF fallback is much lower than 255.
The link margin values are averaged and must remain above a predefined threshold for a continuous period of 3 seconds to be considered good.
The CCA fail count is also averaged and must be below 5% of the maximum number of retries for a continuous period of 3 seconds to be considered good.
When both the link margin status and CCA count status are good, indicating good wireless conditions, the Fallback OFF trigger is activated.
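A sketch of this decision, with illustrative thresholds and assuming a periodic evaluation tick of 10 ms:

#include <stdbool.h>
#include <stdint.h>

#define LINK_MARGIN_GOOD_THRESHOLD 50u   /* Illustrative link margin threshold */
#define CCA_FAIL_GOOD_RATIO_PCT    5u    /* CCA fails must stay below 5% of the maximum retries */
#define GOOD_PERIOD_TICKS          300u  /* 3 seconds with a 10 ms evaluation tick */

/* Called periodically with the averaged link margin (reported by the receiver through
 * the auto-reply) and the averaged CCA fail percentage measured on the transmitter.
 * Returns true when both metrics have stayed good for 3 continuous seconds. */
bool fallback_off_evaluate(uint32_t avg_link_margin, uint32_t avg_cca_fail_pct)
{
    static uint32_t good_ticks;

    bool link_margin_good = (avg_link_margin >= LINK_MARGIN_GOOD_THRESHOLD);
    bool cca_good = (avg_cca_fail_pct < CCA_FAIL_GOOD_RATIO_PCT);

    if (link_margin_good && cca_good) {
        if (good_ticks < GOOD_PERIOD_TICKS) {
            good_ticks++;
        }
    } else {
        good_ticks = 0; /* The good period must be continuous */
    }
    return good_ticks >= GOOD_PERIOD_TICKS;
}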
Audio Processing Gate Functions
Processing stage gate functions are used in both the Coordinator and Node devices.
For example, in an application using audio compression as a fallback mechanism, the Coordinator would have two processing stages, each having their own gate function. The first processing stage performs audio compression when fallback mode is ON. It also sets a flag in the audio header to indicate when the packet is compressed. The second processing stage performs audio compression only on the last sample when fallback mode is OFF. This implementation is used to keep the compression algorithm state up to date. This ensures a seamless switch between compressed and uncompressed modes.
In the Node, the gate function for the decompression processing stage ensures that only packets that have been compressed will be decompressed.
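For illustration, such gate functions could look like the following sketch; the simplified header type, the fallback_is_on flag and the function names are placeholders, not the actual SDK definitions.

#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the audio header; only the Fallback bit is relevant here. */
typedef struct { uint8_t fallback; } audio_header_t;

/* Fallback state maintained by the application (e.g., from the trigger logic above). */
extern volatile bool fallback_is_on;

/* Coordinator, stage 1: compress the full payload only while fallback is ON. */
bool gate_compress_fallback_on(void *instance, audio_header_t *header,
                               uint8_t *data_in, uint16_t size)
{
    (void)instance; (void)header; (void)data_in; (void)size;
    return fallback_is_on;
}

/* Coordinator, stage 2: compress only the last sample while fallback is OFF,
 * keeping the compression algorithm state up to date for a seamless switch. */
bool gate_compress_state_keepalive(void *instance, audio_header_t *header,
                                   uint8_t *data_in, uint16_t size)
{
    (void)instance; (void)header; (void)data_in; (void)size;
    return !fallback_is_on;
}

/* Node: decompress only packets that were actually compressed by the Coordinator. */
bool gate_decompress_if_compressed(void *instance, audio_header_t *header,
                                   uint8_t *data_in, uint16_t size)
{
    (void)instance; (void)data_in; (void)size;
    return header->fallback == 1u;
}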