Audio Core

Introduction

The SPARK Audio Core is the software component used by all SPARK-based audio applications. Its role is to manage the flow of audio samples from their recording to their playback, including audio processing and buffering. The Audio Core will mostly be used in conjunction with the SPARK Wireless Core, but it has been built in such a way that it does not depend on it. A few abstractions, such as audio pipelines, audio endpoints and audio processing stages, make it flexible and decoupled from other libraries and hardware.

Most of the Audio Core features are demonstrated by the application examples found in the SPARK SDK. They are the best place to start learning how to use the Audio Core while reading through this documentation. The sections of this documentation are ordered top-down: they start with the application requirements, then cover the Audio API and finally the Audio Core itself.

End-User Application

The end-user’s application must take care of a few things before the Audio Core can be operational. First, it must initialize and configure the endpoints it wishes to use. In the case of endpoints that rely on hardware, such as a hardware audio codec, the application would need to:

  • Initialize the MCU peripherals that interface with it, such as the I2S and I2C controllers

  • Configure it with, for example, the proper sampling rate and bias voltage

  • Handle its interrupt requests, or those of the MCU peripherals

In most cases, the application will also use the SPARK Wireless Core as an endpoint for sending or receiving audio samples over the air. If so, the Wireless Core will need to be configured by the application using the Wireless Core API.

Once the endpoint dependencies are taken care of, the application must provide a chunk of RAM large enough for the Audio Core to initialize properly. The Audio Core then dynamically allocates memory from this pool to create the audio buffers, amongst other things. The chunk of RAM is usually just a statically allocated byte array.
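As a minimal sketch of this idea, the block below carves allocations out of a statically allocated byte array with a simple bump allocator. It is hypothetical: `mem_pool_t`, `mem_pool_init()` and `mem_pool_alloc()` are illustrative names rather than Audio Core API, and the 4096-byte pool size is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool size; the real requirement depends on the pipelines,
 * queue lengths and processing stages the application creates. */
#define AUDIO_MEM_POOL_SIZE 4096u

/* The chunk of RAM handed to the audio framework: a statically allocated byte array. */
static uint8_t audio_mem_pool[AUDIO_MEM_POOL_SIZE];

typedef struct {
    uint8_t *base; /* start of the pool */
    size_t size;   /* total pool size in bytes */
    size_t used;   /* bytes already handed out */
} mem_pool_t;

void mem_pool_init(mem_pool_t *pool, uint8_t *buffer, size_t size)
{
    pool->base = buffer;
    pool->size = size;
    pool->used = 0;
}

/* Returns a block of `size` bytes, or NULL if the pool is exhausted.
 * The offset is rounded up to a multiple of 4, so blocks are 4-byte aligned
 * as long as the pool itself is. Blocks are never freed individually. */
void *mem_pool_alloc(mem_pool_t *pool, size_t size)
{
    size_t start = (pool->used + 3u) & ~(size_t)3u;
    if (start > pool->size || size > pool->size - start) {
        return NULL;
    }
    pool->used = start + size;
    return pool->base + start;
}
```

A bump allocator never frees individual blocks, which fits objects that are created once at startup and live for the lifetime of the application.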

Finally, Audio Core objects can be created and configured. Those are the endpoints themselves, the processing stages, and the pipelines.

A few audio application examples are available in the SPARK SDK that the user can use as a starting point.

Audio Core Components

The Audio Core is composed of several interacting components that will be described in this section.

Audio Pipeline

The audio pipeline is the main abstraction. It connects audio endpoints and audio processing stages together.


Figure 26: Audio Pipeline

As you can see in the figure above, an audio pipeline has an audio producing endpoint, an arbitrary number of processing stages and an audio consuming endpoint. In most cases, an audio device that both records and plays back audio needs two audio pipelines. In some specific use cases, an audio pipeline can have more than one consumer and more than one producer.

Audio Endpoint

An audio endpoint is something, either hardware or software, that deals with audio samples. It can either produce them or consume them. For example, an audio endpoint could be:

  • A hardware audio codec (e.g., MAX98091)

  • A network stack (e.g., SPARK Wireless Core)

  • A USB stack with the USB-Audio class

  • A dummy audio codec which generates a pre-recorded sine wave

  • An I2S interface used to create an audio pipe

To be considered a valid audio endpoint, a component must adhere to a specific endpoint interface. The Audio Core includes two endpoints that users can use in their audio pipelines. Users can also create their own endpoints to suit their application needs.

Here is the list of what makes an endpoint:

  • An instance specific to the endpoint. This instance enables the creation of multiple endpoints of the same type, useful when the same endpoint is used for producing samples in a pipeline, and consuming samples in another. The instance can also contain configuration parameters.

  • A name to describe the endpoint.

  • An endpoint interface which is the standard set of functions the endpoint must implement.

An endpoint is used either as a producer or as a consumer, but never both. For example, if an audio device uses a single hardware codec to play back and record audio, two endpoints need to be instantiated.

The endpoint interface is a set of three functions that each endpoint must implement to be compatible with the Audio Core. These functions are:

  • Action: Used by the Audio Core when it needs the endpoint to produce audio samples (e.g., recording of samples for an audio codec) if the endpoint is a producer, or when it needs the endpoint to consume audio samples (e.g., playback of samples for an audio codec) if the endpoint is a consumer.

  • Start: Used by the Audio Core to start the production or consumption of audio samples of the endpoint. For some endpoints (e.g., SPARK Wireless Core), this can do nothing.

  • Stop: Used by the Audio Core to stop the production or consumption of audio samples of the endpoint. For some endpoints (e.g., SPARK Wireless Core), this can do nothing.
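The endpoint definition above could be expressed in C roughly as follows. This is a hypothetical sketch: `endpoint_iface_t`, `endpoint_t`, the `action` signature and the dummy ramp-generating producer are assumptions made for illustration, not the Audio Core's actual types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical endpoint interface: the three functions every endpoint implements. */
typedef struct {
    /* Produce samples into `data` (producer) or consume samples from it (consumer).
     * Returns the number of bytes produced or consumed. */
    uint16_t (*action)(void *instance, uint8_t *data, uint16_t size);
    void (*start)(void *instance); /* may do nothing, e.g. for a wireless endpoint */
    void (*stop)(void *instance);  /* may do nothing as well */
} endpoint_iface_t;

typedef struct {
    const char *name;       /* name describing the endpoint */
    void *instance;         /* endpoint-specific state and configuration */
    endpoint_iface_t iface; /* standard function table */
} endpoint_t;

/* Example instance: a dummy producer generating a byte ramp. */
typedef struct {
    uint8_t next_value;
    bool running;
} dummy_codec_t;

uint16_t dummy_produce(void *instance, uint8_t *data, uint16_t size)
{
    dummy_codec_t *codec = instance;
    if (!codec->running) {
        return 0; /* nothing produced while stopped */
    }
    for (uint16_t i = 0; i < size; i++) {
        data[i] = codec->next_value++;
    }
    return size;
}

void dummy_start(void *instance) { ((dummy_codec_t *)instance)->running = true; }
void dummy_stop(void *instance)  { ((dummy_codec_t *)instance)->running = false; }
```

Because all per-endpoint state lives in the instance, the same interface functions can serve two instances of one codec, one used as a producer and the other as a consumer.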

Audio Processing Stage

An audio processing stage represents operations applied to audio samples after they are produced but before they are consumed. Any number of processing stages can be chained together. The output of one processing stage is the input of the next one, which means that the order in which processing stages are added to a pipeline matters. Each processing stage has requirements for its input: some expect raw audio samples, some expect audio that has gone through an encoder, and others expect non-audio information. For example, when receiving compressed audio, the decompression processing stage needs to be added first, then the digital volume control.

To be considered a valid audio processing stage, a component must adhere to a specific audio processing interface. The Audio Core includes three processing stages that users can add to their audio pipelines. Users can also create their own processing stages to suit their application needs.

Here is the list of what makes a processing stage:

  • An instance specific to the processing stage. This instance enables the creation of multiple processing stages of the same type.

  • A name to describe the processing stage.

  • A processing stage interface which is the standard set of functions the processing stage has to implement.

The processing stage interface is a set of four functions that each processing stage must implement to be compatible with the Audio Core. These functions are:

  • Init: Used by the Audio Core to initialize the processing stage. This can be left empty if no specific initialization steps are needed.

  • Ctrl: Used by the application to interact with the processing stage, either to get information from it (e.g., statistics) or to modify its behavior (e.g., raise the volume).

  • Process: Used by the Audio Core to modify audio samples.

  • Gate: Used by the Audio Core to know if the processing stage should be run or skipped. If the gate function returns true, the processing stage will execute. If the gate function returns false, the processing stage will be skipped. If the gate function is set to NULL, the processing stage will always execute. The gate function is mostly used when a processing stage needs to be activated/deactivated on the fly. A common use case is Fallback mode.

When the Audio Core processes a pipeline, it cycles through every registered processing stage and calls its Process function (only if its Gate function returns true or is set to NULL), passing the samples through data_in along with the size of the chunk. Once the processing is done, the Process function writes the processed samples to data_out and returns the number of bytes processed. If Process has not modified the samples, it returns 0, which tells the Audio Core that data_out holds nothing valid. In this case, data_in is passed to the next stage instead of data_out.

The audio header is passed to the Process function since it may contain necessary information that the processing stage needs.
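The stage-chaining rules described above (gate check, and data_out replacing data_in only when Process returns a non-zero size) can be sketched as follows. All types and names (`proc_iface_t`, `run_stages()`, the simplified `audio_header_t`, the example stages) are hypothetical and only mirror the behavior described in this section; the Ctrl signature in particular is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical audio header passed to every Process call. */
typedef struct {
    uint8_t payload_size;
    bool fallback;
} audio_header_t;

/* Hypothetical processing stage interface mirroring the four functions above. */
typedef struct {
    void (*init)(void *instance);
    uint32_t (*ctrl)(void *instance, uint8_t cmd, uint32_t arg);
    /* Returns the number of bytes written to data_out, or 0 if nothing was done. */
    uint16_t (*process)(void *instance, audio_header_t *header,
                        uint8_t *data_in, uint16_t size, uint8_t *data_out);
    bool (*gate)(void *instance); /* NULL means "always run" */
} proc_iface_t;

typedef struct {
    const char *name;
    void *instance;
    proc_iface_t iface;
} proc_stage_t;

#define MAX_CHUNK 256u /* assumes no stage outputs more than this many bytes */

/* Runs the stages in order; *out points to the final chunk, whose size is returned. */
uint16_t run_stages(proc_stage_t *stages, size_t count, audio_header_t *header,
                    uint8_t *data, uint16_t size, uint8_t **out)
{
    static uint8_t scratch[2][MAX_CHUNK]; /* ping-pong output buffers */
    uint8_t *in = data;
    unsigned next = 0;

    for (size_t i = 0; i < count; i++) {
        proc_stage_t *s = &stages[i];
        if (s->iface.gate != NULL && !s->iface.gate(s->instance)) {
            continue; /* gate closed: stage skipped */
        }
        uint16_t produced = s->iface.process(s->instance, header, in, size, scratch[next]);
        if (produced != 0) {
            in = scratch[next]; /* data_out feeds the next stage */
            size = produced;
            next ^= 1u;
        }
        /* produced == 0: data_in is passed unchanged to the next stage */
    }
    *out = in;
    return size;
}

/* Example stage: doubles each byte (a stand-in for a real gain stage). */
uint16_t double_process(void *instance, audio_header_t *header,
                        uint8_t *data_in, uint16_t size, uint8_t *data_out)
{
    (void)instance;
    (void)header;
    for (uint16_t i = 0; i < size; i++) {
        data_out[i] = (uint8_t)(data_in[i] * 2u);
    }
    return size;
}

/* Example stage that leaves the samples untouched and reports it by returning 0. */
uint16_t noop_process(void *instance, audio_header_t *header,
                      uint8_t *data_in, uint16_t size, uint8_t *data_out)
{
    (void)instance; (void)header; (void)data_in; (void)size; (void)data_out;
    return 0;
}

bool gate_closed(void *instance)
{
    (void)instance;
    return false;
}
```

Alternating between two scratch buffers guarantees that a stage never writes into the buffer it is reading from.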

Clock Drift Compensation

Clock Drift Compensation (CDC) is a mechanism used to handle situations where the audio clocks of the recording and playback devices are not exactly matched. A relevant example is streaming audio through a SPARK RF link: one board records the samples and the other plays them back, and the audio clocks of the two boards are asynchronous. This means that even if both audio master clocks theoretically have the same frequency and a sampling rate of 48 kHz is configured, the effective sampling rates of the two boards can differ slightly due to crystal tolerances. For example, one device could have an effective sampling rate of 47.980 kHz while the other has 48.030 kHz. In such situations, the clock drift must be compensated, otherwise audio glitches will occur.
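To put the example figures in perspective, the small helpers below compute the relative drift and the rate at which the receive buffer drains or fills. They assume, for illustration only, that the recording device is the one running at 47.980 kHz; the function names are hypothetical.

```c
/* Drift of the playback clock relative to the recording clock, in ppm of the
 * nominal rate. Positive when the playback device runs faster. */
double drift_ppm(double rec_hz, double play_hz, double nominal_hz)
{
    return (play_hz - rec_hz) / nominal_hz * 1e6;
}

/* Samples accumulating in (positive) or draining from (negative) the
 * receive buffer every second. */
double buffer_growth_per_s(double rec_hz, double play_hz)
{
    return rec_hz - play_hz;
}
```

With these figures, the playback device consumes 50 more samples per second than it receives (about 1042 ppm), so a hypothetical 480-sample (10 ms) receive buffer would run empty in under 10 seconds.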

The CDC is done in two stages: drift detection and drift correction. The detection mechanism is basic and tailored around the use of a SPARK wireless link. For correction, the Audio Core provides two different approaches; one is software-based and the other is hardware-based.

Drift Detection Mechanism

The default detection mechanism provided by the Audio Core monitors the audio buffer (consumer queue) on the receiving end and averages its load over time (sliding window). If the average falls below a certain threshold (depleting) or rises above a certain threshold (filling), the audio clock is considered to be “drifting” and a corrective action is triggered. The audio header contains a bit called “TX queue level high”. If that bit is set, the receiving (playback) device concludes that the audio buffer load variation is caused by bad RF link conditions rather than by clock drift, and no corrective action is triggered.
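The sliding-window detection described above might look like the following sketch. The window length, thresholds and names (`drift_detector_t`, `drift_detect()`) are hypothetical; only the overall logic (average the consumer queue load, compare it with low and high thresholds, ignore variations while the “TX queue level high” bit is set) comes from this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define CDC_WINDOW 8u /* hypothetical sliding-window length */

typedef enum { DRIFT_NONE, DRIFT_DEPLETING, DRIFT_FILLING } drift_t;

typedef struct {
    uint16_t samples[CDC_WINDOW]; /* last consumer-queue loads */
    uint32_t sum;                 /* running sum of the window */
    uint32_t index;               /* slot holding the oldest sample */
    uint32_t count;               /* samples stored, saturates at CDC_WINDOW */
    uint16_t low_threshold;       /* average below this: queue depleting */
    uint16_t high_threshold;      /* average above this: queue filling */
} drift_detector_t;

/* Feeds one consumer-queue load sample; tx_queue_level_high is the header bit. */
drift_t drift_detect(drift_detector_t *d, uint16_t queue_load, bool tx_queue_level_high)
{
    /* Slide the window: replace the oldest sample. */
    d->sum -= d->samples[d->index];
    d->samples[d->index] = queue_load;
    d->sum += queue_load;
    d->index = (d->index + 1u) % CDC_WINDOW;

    if (d->count < CDC_WINDOW) {
        d->count++;
    }
    if (d->count < CDC_WINDOW) {
        return DRIFT_NONE; /* window not full yet */
    }
    if (tx_queue_level_high) {
        return DRIFT_NONE; /* variation blamed on the RF link, not on clock drift */
    }
    uint32_t avg = d->sum / CDC_WINDOW;
    if (avg < d->low_threshold) {
        return DRIFT_DEPLETING;
    }
    if (avg > d->high_threshold) {
        return DRIFT_FILLING;
    }
    return DRIFT_NONE;
}
```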

Drift Correction Mechanisms

The Audio Core provides two different approaches to drift correction: software resampling and hardware clock steering. The former can be used on any hardware whereas the latter requires specific hardware features.

Software Resampling

The SPARK resampling library uses a fixed block size (e.g., 1440 samples) to define the resampling period. At the end of the resampling period, a new sample is either added or dropped. This sample manipulation has minimal impact on audio quality. A sample is dropped when the audio clock on the playback device is slower than the one on the recording device, and a sample is added when the audio clock on the playback device is faster than the one on the recording device. CDC must be enabled only on the playback device and not on the recording device.
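A naive illustration of block-based sample insertion and removal is shown below for a mono stream. It is not the SPARK resampling library: the names are hypothetical, and the inserted sample is simply duplicated, whereas a production resampler would more likely interpolate to keep the artifact inaudible.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { RESAMPLE_NONE, RESAMPLE_ADD, RESAMPLE_DROP } resample_action_t;

typedef struct {
    uint32_t period;          /* resampling period in samples, e.g. 1440 */
    uint32_t count;           /* samples seen in the current period */
    resample_action_t action; /* chosen from the drift detection result */
} resampler_t;

/* Mono stream. `out` must have room for n + n / period + 1 samples.
 * Returns the number of output samples. */
size_t resample_block(resampler_t *r, const int16_t *in, size_t n, int16_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (++r->count == r->period) {
            r->count = 0; /* end of the resampling period */
            if (r->action == RESAMPLE_DROP) {
                continue; /* playback clock slower: discard one sample */
            }
            if (r->action == RESAMPLE_ADD) {
                out[o++] = in[i]; /* playback clock faster: emit one extra sample */
            }
        }
        out[o++] = in[i];
    }
    return o;
}
```

The counter persists across calls, so the period is respected even when it spans several audio packets.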

Clock Steering

This mechanism leverages the hardware’s ability to precisely adjust the source clock frequency (e.g., with a fractional PLL). Functions for incrementing or decrementing the source clock frequency must be provided by the application through the CDC instance. When a drift is detected, a momentary adjustment of the hardware source clock frequency is applied to correct the buffer load. This process can be repeated several times. Once the buffer load has stabilized, a permanent offset is added to the clock frequency to reduce future drift.
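The direction of each correction can be sketched as follows, assuming the application registers two callbacks in a CDC-like instance. `clock_steering_t`, `clock_steering_update()` and the `pll_up()`/`pll_down()` stand-ins are hypothetical, and the sketch leaves out the momentary-versus-permanent offset bookkeeping described above.

```c
#include <stdint.h>

/* Hypothetical steering state: callbacks supplied by the application. */
typedef struct {
    void (*increase_clock)(void); /* e.g. nudge a fractional PLL up one step */
    void (*decrease_clock)(void); /* e.g. nudge it down one step */
    uint16_t target_load;         /* desired consumer-queue load */
    uint16_t tolerance;           /* dead band around the target */
    int32_t net_steps;            /* net adjustment applied so far */
} clock_steering_t;

/* Stand-ins for real PLL register writes. */
static int32_t pll_fraction = 0;
void pll_up(void)   { pll_fraction++; }
void pll_down(void) { pll_fraction--; }

/* Called with the averaged consumer-queue load on the playback device. */
void clock_steering_update(clock_steering_t *cs, uint16_t avg_load)
{
    if (avg_load + cs->tolerance < cs->target_load) {
        /* Buffer depleting: playback consumes too fast, slow its clock down. */
        cs->decrease_clock();
        cs->net_steps--;
    } else if (avg_load > cs->target_load + cs->tolerance) {
        /* Buffer filling: playback consumes too slowly, speed its clock up. */
        cs->increase_clock();
        cs->net_steps++;
    }
}
```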

Warning

The CDC detection mechanism provided by the Audio Core does not support use cases where a single producer provides the same data to multiple consumers on the same device (e.g., a codec as the producer and four different wireless connections as consumers).

Audio Compression

This processing stage can compress and decompress audio samples using Adaptive Differential Pulse-Code Modulation (ADPCM). ADPCM was initially developed for speech coding, but often offers acceptable performance for music coding. It saves a significant amount of bandwidth, as it reduces 16-bit samples to 4-bit codes.

To use ADPCM, a compression instance must be created and added to the audio recording pipeline. On the playback audio device, a decompression instance must be created and added to the audio playback pipeline. This processing stage works for both mono and stereo streams, but the channel configuration must be specified when configuring the instance.
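The internals of the Audio Core's ADPCM codec are not described here. As an illustration only, the sketch below implements IMA ADPCM, a widely used ADPCM variant that may differ from the one the Audio Core uses. The key property is that the encoder runs the same reconstruction step as the decoder, so both keep identical predictor state.

```c
#include <stdint.h>

/* Standard IMA ADPCM step-size and index-adjustment tables. */
static const int16_t step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173,
    190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484,
    7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818, 18500,
    20350, 22385, 24623, 27086, 29794, 32767
};
static const int8_t index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8
};

typedef struct {
    int32_t predictor; /* last reconstructed sample */
    int32_t index;     /* index into step_table */
} adpcm_state_t;

static int32_t clamp32(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Reconstruction step shared by encoder and decoder. */
static int16_t adpcm_update(adpcm_state_t *s, uint8_t code)
{
    int32_t step = step_table[s->index];
    int32_t diffq = step >> 3;
    if (code & 4) diffq += step;
    if (code & 2) diffq += step >> 1;
    if (code & 1) diffq += step >> 2;
    s->predictor = clamp32((code & 8) ? s->predictor - diffq : s->predictor + diffq,
                           -32768, 32767);
    s->index = clamp32(s->index + index_table[code], 0, 88);
    return (int16_t)s->predictor;
}

/* Encodes one 16-bit sample into a 4-bit code (sign bit + 3 magnitude bits). */
uint8_t adpcm_encode_sample(adpcm_state_t *s, int16_t sample)
{
    int32_t diff = sample - s->predictor;
    int32_t step = step_table[s->index];
    uint8_t code = 0;
    if (diff < 0) { code = 8; diff = -diff; }
    if (diff >= step) { code |= 4; diff -= step; }
    step >>= 1;
    if (diff >= step) { code |= 2; diff -= step; }
    step >>= 1;
    if (diff >= step) { code |= 1; }
    adpcm_update(s, code); /* track what the decoder will reconstruct */
    return code;
}

int16_t adpcm_decode_sample(adpcm_state_t *s, uint8_t code)
{
    return adpcm_update(s, code & 0x0Fu);
}
```

Two 4-bit codes fit in one byte, which is where the 4:1 ratio over 16-bit samples comes from. For stereo, each channel would need its own state.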

Note

The compression processing stage is compatible with 16, 20 and 24-bit sample sizes. Under the hood, the compression algorithm (ADPCM) requires 16-bit samples at its input. When used with 20 or 24-bit samples, the samples are converted to 16 bits before compression and extended back to their original size after decompression.
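A minimal sketch of the bit-depth conversion around the 16-bit ADPCM core, assuming simple truncation of the least significant bits (the actual rounding behavior is not specified here). The function names are hypothetical.

```c
#include <stdint.h>

/* Samples are held sign-extended in an int32_t. Right-shifting a negative
 * value is implementation-defined in C; common Arm and x86 compilers shift
 * arithmetically, which is what this sketch relies on. */
int16_t sample24_to_16(int32_t s24) { return (int16_t)(s24 >> 8); }
int16_t sample20_to_16(int32_t s20) { return (int16_t)(s20 >> 4); }

/* Multiplying instead of left-shifting keeps negative values well-defined. */
int32_t sample16_to_24(int16_t s16) { return (int32_t)s16 * 256; }
int32_t sample16_to_20(int16_t s16) { return (int32_t)s16 * 16; }
```

The discarded bits are not recovered, so the round trip is lossy: 0x123456 becomes 0x1234 and comes back as 0x123400.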

Digital Volume Control

This processing stage will digitally modify the amplitude of the audio samples. When the control command is called to increase or decrease the volume, the amplitude is gradually varied until the target level is reached. The maximum amplitude corresponds to the source level, which means that this processing stage cannot increase it beyond that. This also means that no processing occurs when the volume level is set to 100%.
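A sketch of a gradual volume ramp, assuming a Q15 gain where 32768 represents 100 % (the source level) and a fixed per-sample ramp step. `volume_t`, `volume_set()` and `volume_process()` are hypothetical names; the control command of the actual processing stage is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define GAIN_UNITY 32768 /* Q15 gain of 1.0, i.e. 100 % volume */

typedef struct {
    int32_t gain;        /* current gain, 0..GAIN_UNITY */
    int32_t target_gain; /* gain requested through the control command */
    int32_t ramp_step;   /* maximum gain change per sample */
} volume_t;

void volume_set(volume_t *v, uint8_t percent)
{
    if (percent > 100) {
        percent = 100; /* cannot go beyond the source level */
    }
    v->target_gain = (int32_t)percent * GAIN_UNITY / 100;
}

void volume_process(volume_t *v, int16_t *samples, size_t n)
{
    if (v->gain == GAIN_UNITY && v->target_gain == GAIN_UNITY) {
        return; /* 100 %: no processing */
    }
    for (size_t i = 0; i < n; i++) {
        /* Move the gain gradually toward the target to avoid audible steps. */
        if (v->gain < v->target_gain) {
            v->gain = (v->gain + v->ramp_step > v->target_gain)
                          ? v->target_gain : v->gain + v->ramp_step;
        } else if (v->gain > v->target_gain) {
            v->gain = (v->gain - v->ramp_step < v->target_gain)
                          ? v->target_gain : v->gain - v->ramp_step;
        }
        /* 16-bit sample x Q15 gain fits in 32 bits. */
        samples[i] = (int16_t)(((int32_t)samples[i] * v->gain) / GAIN_UNITY);
    }
}
```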

Sampling Rate Converter

This processing stage modifies audio samples so that they appear to have been sampled at a different rate. This is useful when an audio device must handle audio streams with different sampling rates. A common use case is a gaming headset, where the main audio stream could be sampled at 48 kHz and the microphone stream at 16 kHz.

Note

The current implementation only supports sampling rate conversion on mono audio streams. It also uses the CMSIS DSP Software Library for its interpolation and decimation functions, which makes it incompatible with devices that are not based on Arm Cortex-M processors.
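For illustration, the 48 kHz to 16 kHz conversion of the gaming headset example amounts to decimating or interpolating by a factor of 3. The portable sketch below does not use CMSIS DSP: it replaces the FIR filters with a three-sample average and linear interpolation, which filter far less effectively, and its function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mono 48 kHz -> 16 kHz. Each output sample is the mean of three input
 * samples, a crude low-pass filter standing in for a proper FIR decimator. */
size_t decimate_by_3(const int16_t *in, size_t n, int16_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 2 < n; i += 3) {
        int32_t sum = (int32_t)in[i] + in[i + 1] + in[i + 2];
        out[o++] = (int16_t)(sum / 3);
    }
    return o;
}

/* Mono 16 kHz -> 48 kHz by linear interpolation between consecutive
 * samples; the last input sample is held. `out` needs room for 3 * n. */
size_t interpolate_by_3(const int16_t *in, size_t n, int16_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t a = in[i];
        int32_t b = (i + 1 < n) ? in[i + 1] : in[i];
        out[o++] = (int16_t)a;
        out[o++] = (int16_t)(a + (b - a) / 3);
        out[o++] = (int16_t)(a + 2 * (b - a) / 3);
    }
    return o;
}
```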

Mute Packet

This processing stage will delete the audio payload content if it contains only samples with a numerical value of zero. It will replace it with a byte containing the original payload size to allow the receiving pipeline to recreate the payload full of zeros.

To use this processing stage, add the processing stage at the end of the transmitting pipeline and at the beginning of the receiving pipeline. Set the is_tx parameter accordingly.
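A minimal sketch of the transmit and receive sides, with hypothetical function names. It assumes the receiver recognizes a muted packet by its 1-byte payload size, i.e., that regular audio payloads are never a single byte long; how the actual processing stage distinguishes the two cases is not described here.

```c
#include <stdint.h>
#include <string.h>

/* TX side: if every byte of the payload is zero, replace the payload with a
 * single byte holding its original size. Returns the new payload size. */
uint8_t mute_packet_tx(uint8_t *payload, uint8_t size)
{
    if (size == 0) {
        return 0;
    }
    for (uint8_t i = 0; i < size; i++) {
        if (payload[i] != 0) {
            return size; /* audible content: leave untouched */
        }
    }
    payload[0] = size;
    return 1;
}

/* RX side: expand a 1-byte payload back into payload[0] zero bytes.
 * `payload` must be able to hold up to 255 bytes. Returns the restored size. */
uint8_t mute_packet_rx(uint8_t *payload, uint8_t size)
{
    if (size != 1) {
        return size;
    }
    uint8_t original = payload[0];
    memset(payload, 0, original);
    return original;
}
```

Since the payload size field of the audio header is 8 bits wide, the original size always fits in the single remaining byte.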

This processing stage can be useful when using a digital audio producer (e.g., USB Audio) with a SPARK Wireless Core audio consumer that uses IOOK modulation. When the audio is muted, audio packets contain only zeros, which do not generate any pulses with IOOK modulation. This makes the audio packets undetectable by another SPARK wireless device running a clear channel assessment (CCA) check.

Mute on Underflow

This processing stage will set all samples to a numerical value of zero in the payload if underflows are detected on the consumer of the pipeline.

To use this processing stage, add the processing stage at the end of the receiving pipeline. The reload_value needs to be set in the instance to define how many consecutive audio packets will be muted after an underflow is detected.
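The reload counter can be sketched as follows. `mute_on_underflow_t`, its function and the `underflow` flag are hypothetical: the sketch assumes the pipeline reports whether the consumer underflowed since the previous packet.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t reload_value; /* consecutive packets to mute after an underflow */
    uint8_t mute_count;   /* packets left to mute */
} mute_on_underflow_t;

void mute_on_underflow_process(mute_on_underflow_t *m, bool underflow,
                               uint8_t *payload, uint8_t size)
{
    if (underflow) {
        m->mute_count = m->reload_value; /* (re)start the mute window */
    }
    if (m->mute_count > 0) {
        memset(payload, 0, size); /* every sample set to zero */
        m->mute_count--;
    }
}
```

A new underflow during the mute window reloads the counter, so muting continues for as long as the link keeps underflowing.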

This processing stage can be useful to reduce audible crackling when the wireless link starts to drop out.

Audio Packing

This processing stage will convert the audio sample bit depth in order to adapt the producer payload format either for the consumer or another processing stage.

To use this processing stage, add the processing stage to the audio pipeline on the recording and playback audio device. This processing stage works with different bit depth conversions and the packing mode must be specified when configuring the instance.

This is useful when an audio codec handles audio samples stored in 32-bit words, but the application uses a lower bit depth. In that case, a packing processing stage converts the samples to the desired bit depth.
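As an example of one packing mode, the hypothetical helpers below pack 24-bit samples held in 32-bit words into a contiguous 3-byte stream and back, saving 25 % of the payload. Little-endian byte order is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs sign-extended 24-bit samples stored in 32-bit words into a
 * little-endian 3-byte stream. Returns the number of bytes written. */
size_t pack_32_to_24(const int32_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = (uint32_t)in[i];
        out[o++] = (uint8_t)(s & 0xFFu);
        out[o++] = (uint8_t)((s >> 8) & 0xFFu);
        out[o++] = (uint8_t)((s >> 16) & 0xFFu);
    }
    return o;
}

/* Rebuilds sign-extended 32-bit words from the 3-byte stream.
 * Returns the number of samples written. */
size_t unpack_24_to_32(const uint8_t *in, size_t n_bytes, int32_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 2 < n_bytes; i += 3) {
        uint32_t s = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8) |
                     ((uint32_t)in[i + 2] << 16);
        int32_t v = (int32_t)s; /* s < 2^24, always representable */
        if (v & 0x800000) {
            v -= 0x1000000; /* sign-extend bit 23 */
        }
        out[o++] = v;
    }
    return o;
}
```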

Audio Header

The Audio Core encapsulates the audio samples with a header before they are consumed by encapsulation-enabled endpoints such as the Wireless Core. Its fields are:

Field                 Length   Description
TX queue level high   1 bit    Used by the CDC mechanism. This bit is set when the TX audio buffer load reaches a certain level, indicating that the wireless link is bad.
Fallback              1 bit    Indicates to the receiving audio device that the audio payload has been modified on the fly (e.g., compressed when it was originally uncompressed).
Reserved              2 bits   Reserved for future use.
CRC4                  4 bits   4-bit CRC used to validate the integrity of the audio header.
Payload size          8 bits   Size, in bytes, of the audio samples carried in this audio packet. This size does not include the audio header.
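The header fields add up to 16 bits. Their bit positions are not specified here, so the sketch below assumes a hypothetical layout (flags in the upper bits of the first byte, CRC4 in its lower nibble, payload size in the second byte) and takes the CRC4 value as an input rather than computing it, since its polynomial is not documented in this section.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool tx_queue_level_high;
    bool fallback;
    uint8_t crc4;         /* 4-bit CRC, computed elsewhere */
    uint8_t payload_size; /* bytes of audio samples, header excluded */
} audio_header_fields_t;

/* Hypothetical layout, MSB first:
 * byte 0 = [TXQ | FB | RSV | RSV | CRC3 CRC2 CRC1 CRC0], byte 1 = payload size. */
void audio_header_pack(const audio_header_fields_t *h, uint8_t out[2])
{
    out[0] = (uint8_t)((h->tx_queue_level_high ? 0x80u : 0u) |
                       (h->fallback ? 0x40u : 0u) |
                       (h->crc4 & 0x0Fu)); /* reserved bits left at 0 */
    out[1] = h->payload_size;
}

void audio_header_unpack(const uint8_t in[2], audio_header_fields_t *h)
{
    h->tx_queue_level_high = (in[0] & 0x80u) != 0;
    h->fallback = (in[0] & 0x40u) != 0;
    h->crc4 = in[0] & 0x0Fu;
    h->payload_size = in[1];
}
```

Whatever the actual layout, the 8-bit payload size field caps the audio payload of a single packet at 255 bytes.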

Audio Module

An audio module is a component that adds functionality to the Audio Core. Two audio modules, Audio Mixer and Audio Fallback, are described below.

Audio Mixer

The Audio Mixer module merges two or more audio streams into a single stream using an audio mixing algorithm. To enable audio mixing, at least three Audio Pipelines using Audio Mixer Endpoints are required.

Audio Mixer Pipeline

The Audio Mixer Pipeline differs from a regular Audio Pipeline in that it is split into two stages. The first stage is the Input Mixer Pipeline and the second stage is the Output Mixer Pipeline.

The number of Input Mixer Pipelines matches the number of audio input streams that require mixing and only one Output Mixer Pipeline is present.

An example of an Audio Mixer Pipeline with two audio stream inputs is shown below:


Figure 27: Audio Mixer Pipeline

The Input Mixer Pipeline Producer Endpoint generates an audio stream, either from the SPARK Wireless Core or an audio codec like any regular audio pipeline. Audio packets are then sent into the processing stages before being stored in the Mixer Consumer Endpoint’s queue.

The Output Mixer Pipeline Producer Endpoint fetches the packets from the two endpoints that precede it. The packets are mixed and sent to the next processing stage.

Mixer Consumer Endpoint

This endpoint is part of the Input Mixer Pipeline and acts as a buffer; it does not have any “consume” action. Its queue is linked to the Mixer Producer Endpoint in the Output Mixer Pipeline. The length of the queue determines the audio latency.

Mixer Producer Endpoint

This endpoint is instantiated once per input stream (Input Mixer Pipeline). Its queue is linked to the respective consumer queue from the previous stage. This type of endpoint does not have any “produce” action.

Mixer Process

When a Mixer Producer Endpoint is used, a Mixer Process is performed on the audio samples before they are passed to downstream processing stages. The Mixer Process calls the Audio Mixer Module to mix the audio samples.

Audio Mixer Module

The Audio Mixer Module is used by the application to configure audio mixing. It can be configured with the following parameters:

  • Number of inputs: This parameter is the number of inputs to be mixed. Currently, 2 and 3 inputs are supported.

  • Payload size: This parameter configures the audio payload size that must match the Output Mixer Pipeline’s consuming endpoint.

  • Bit depth: This parameter corresponds to the bit depth of the samples in the payload.

The configuration parameters are applied to the Audio Mixer Module upon initialization. The instance also includes the following parameters:

  • Input Samples Queue: This queue stores the incoming samples. There are as many Input Samples Queues as there are audio inputs.

  • Output Packet Buffer: This buffer stores the mixed samples.

Audio Mixer Module Algorithm

The Audio Mixer Module Algorithm is used by the Audio Mixer Module to mix the audio packets together. The process sums all the input channel values and then divides the result by the number of input channels. To prevent the intermediate sum from overflowing, this operation is done with 32-bit registers. After the division, the result is converted back to 16 bits.

The following diagram illustrates the mixing algorithm:


Figure 28: Audio mixing algorithm
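The sum-then-divide algorithm can be sketched in C as follows for 16-bit mono streams. `mix_average()` is a hypothetical name; with only 2 or 3 inputs, the 32-bit accumulator cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Mixes `num_inputs` streams of `n` samples each; inputs[k] points to stream k. */
void mix_average(const int16_t *const *inputs, size_t num_inputs, size_t n, int16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        int32_t sum = 0; /* 32-bit accumulator holds several full-scale 16-bit samples */
        for (size_t k = 0; k < num_inputs; k++) {
            sum += inputs[k][i];
        }
        /* The average always fits back into 16 bits. */
        out[i] = (int16_t)(sum / (int32_t)num_inputs);
    }
}
```

Dividing by the number of inputs trades loudness for headroom: two full-scale inputs stay at full scale, but a single active input among two is halved.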

Audio Fallback

The Audio Fallback Module allows applications to dynamically reduce the audio packet size so that higher output power settings can be used in response to a degrading link. Without fallback, if the link degrades beyond a certain point, audio packets are lost. With fallback enabled, we can avoid dropping packets by dynamically enabling a processing stage that reduces the packet size. With smaller packets, higher transmit power settings can be used, which increases the effective range of the system at the cost of lower audio quality. Smaller packets also reduce airtime, which could allow an application to increase the number of CCA retries, resulting in more packets getting through.

The table below shows an example of a configuration for an arbitrary stereo audio application:

Table 10: Example Alternate Link Settings Used by Fallback Mode

Parameter               Level 1, Uncompressed   Level 2, Compressed
Payload size (bytes)    64                      22
Pulse Count             1                       2
Pulse Width             7                       4
Pulse Gain              0                       0
CCA Try Count           2                       3

The RF power transmitted by a SPARK UWB radio is determined by three settings: Pulse Count, Pulse Width and Pulse Gain (which is really an attenuation control). The table shows how the power settings and CCA Try Count are influenced by payload size. In normal mode, the settings for uncompressed data are used, whereas in fallback mode, the settings for compressed data are used.

A payload size of 64 bytes requires that the power settings be set to (1, 7, 0) and the airtime allows for 2 CCA retries.

A reduced payload size of 22 bytes allows us to increase the power settings to (2, 4, 0) and allows for 3 CCA retries due to the shorter airtime. The link benefits from a longer range with the increased power settings.

The Wireless Core automatically switches between the two configurations based on the payload size received from the Audio Core.

Fallback ON Trigger

Fallback ON is triggered by a rise in the transmitter’s TX queue load.

A normal TX queue load is either 0 or 1. When the queue level increases to 2, it indicates that the queue is accumulating packets. This happens when the link is weakening and retransmissions are occurring, slowing down the normal flow. The sampled TX queue load is stored in a circular array and averaged. The array size is set to 3 by default. The load values are multiplied by a factor of 10, so that a finer average can be determined over the averaging period.

The detection of link quality at the transmitter is determined as follows:

  • TX queue load values are multiplied by 10.

  • A rolling average of the last 3 sampled TX queue loads is calculated every time the array is updated.

  • If the rolling average becomes greater than a predetermined threshold, then Fallback ON is triggered.

At startup, the processor must wait for the sampling array to fill before performing the link quality algorithm.

The following table shows an example of link quality evaluation for a Trigger Threshold set to 13.

Table 11: Example TX Queue Load Threshold for Fallback ON Trigger

TX Queue Loads (×10)   Running Average   Threshold   Fallback State
(10, 0, 20)            10                13          OFF
(10, 20, 10)           13.3              13          ON

A threshold value of 13 works well for unidirectional links where the timeslots are contiguous. In bidirectional links where there is a time gap between transmissions, a value of 23 could be used. The final value can be fine-tuned on a per-application basis.
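The Fallback ON check can be sketched as follows, with hypothetical names. Comparing the sum against threshold × N rather than dividing keeps the arithmetic in integers without losing the fractional part: with integer division, the 13.3 average from Table 11 would truncate to 13 and fail to trigger.

```c
#include <stdbool.h>
#include <stdint.h>

#define FB_WINDOW 3u /* default averaging array size */

typedef struct {
    uint8_t loads[FB_WINDOW]; /* TX queue loads, already multiplied by 10 */
    uint8_t index;            /* next slot to overwrite */
    uint8_t filled;           /* samples stored, saturates at FB_WINDOW */
    uint8_t threshold;        /* e.g. 13 (unidirectional) or 23 (bidirectional) */
} fallback_on_detector_t;

/* Feeds one sampled TX queue load; returns true when Fallback ON is triggered. */
bool fallback_on_check(fallback_on_detector_t *d, uint8_t tx_queue_load)
{
    d->loads[d->index] = (uint8_t)(tx_queue_load * 10u);
    d->index = (uint8_t)((d->index + 1u) % FB_WINDOW);
    if (d->filled < FB_WINDOW) {
        d->filled++;
    }
    if (d->filled < FB_WINDOW) {
        return false; /* wait for the array to fill at startup */
    }
    uint32_t sum = 0;
    for (uint8_t i = 0; i < FB_WINDOW; i++) {
        sum += d->loads[i];
    }
    /* average > threshold  <=>  sum > threshold * N */
    return sum > (uint32_t)d->threshold * FB_WINDOW;
}
```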

The system should be pessimistic at startup to maximize the chances of successful communication when it is brought up in a poor link environment. As such, the fallback state should initially be ON until Fallback OFF is triggered.

Fallback OFF Trigger

Fallback OFF trigger is based on two metrics: the receiver’s link margin and the transmitter’s CCA retries.

Link margin values and CCA retry counts are obtained from the Wireless Core using the swc_connection_get_fallback_info() function, which returns the following:

```c
#include <stdint.h>

typedef struct swc_fallback_info {
    int32_t link_margin;        /*!< Link margin value */
    uint32_t cca_fail_count;    /*!< CCA fail count value */
    uint32_t cca_tx_fail_count; /*!< Number of times all CCA attempts failed */
    uint32_t tx_pkt_dropped;    /*!< Total number of tx dropped packets */
} swc_fallback_info_t;
```

The receiver device must return its link margin value to the transmitter device using the auto-reply mechanism. Although this value is stored in a variable wider than 8 bits, it can be sent using only 1 unsigned byte, provided it is saturated at 255. The link margin threshold used to turn OFF fallback is much lower than 255.
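A saturation step before filling the auto-reply payload could look like this. The function name is hypothetical, and clamping negative link margins to 0 is an assumption.

```c
#include <stdint.h>

/* Saturates the link margin (int32_t in swc_fallback_info_t) into one unsigned byte. */
uint8_t link_margin_to_byte(int32_t link_margin)
{
    if (link_margin < 0) {
        return 0; /* assumption: negative margins are reported as 0 */
    }
    if (link_margin > UINT8_MAX) {
        return UINT8_MAX; /* saturate at 255 */
    }
    return (uint8_t)link_margin;
}
```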

The link margin values are averaged, and the average must remain above a predefined threshold for a continuous period for the link margin to be considered good. The period required can be configured.

The CCA fail count is also averaged, and the average must remain below a configurable threshold, expressed as a percentage of the maximum number of retries, for a continuous period to be considered good. The period required can be configured.

When both the link margin status and CCA have passed the period criteria, indicating good wireless conditions, the Fallback OFF trigger is activated.
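The two continuous-period criteria can be combined as in the sketch below. All field names and example values are hypothetical, and the averaging itself is assumed to have been done beforehand; the CCA criterion is rearranged as avg × 100 < percent × max_retries to avoid fractions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    /* configuration */
    uint32_t link_margin_threshold; /* averaged link margin must stay above this */
    uint32_t cca_fail_percent_max;  /* averaged CCA fails must stay below this % */
    uint32_t cca_max_retries;       /* maximum CCA retries per packet */
    uint32_t link_margin_period;    /* consecutive good evaluations required */
    uint32_t cca_period;            /* consecutive good evaluations required */
    /* state */
    uint32_t link_margin_good_count;
    uint32_t cca_good_count;
} fallback_off_detector_t;

/* Called once per evaluation with already-averaged metrics.
 * Returns true when Fallback OFF should be triggered. */
bool fallback_off_check(fallback_off_detector_t *d, uint32_t avg_link_margin,
                        uint32_t avg_cca_fails)
{
    if (avg_link_margin > d->link_margin_threshold) {
        if (d->link_margin_good_count < d->link_margin_period) {
            d->link_margin_good_count++;
        }
    } else {
        d->link_margin_good_count = 0; /* the period must be continuous */
    }

    if (avg_cca_fails * 100u < d->cca_fail_percent_max * d->cca_max_retries) {
        if (d->cca_good_count < d->cca_period) {
            d->cca_good_count++;
        }
    } else {
        d->cca_good_count = 0;
    }

    return d->link_margin_good_count >= d->link_margin_period &&
           d->cca_good_count >= d->cca_period;
}
```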

Audio Processing Gate Functions

Processing stage gate functions are used in both the Coordinator and Node devices. The Coordinator uses the sac_fallback_gate_is_fallback_on and sac_fallback_gate_is_fallback_off gate functions, which return the fallback module state and also update the flag in the audio header to indicate when the packet is in fallback mode. The Node uses the sac_fallback_gate_fallback_on_detect and sac_fallback_gate_fallback_off_detect functions, which return the state of the fallback flag in the audio header of the received packet and also update the fallback module state. The on and off variants of the gates are to be used with mutually exclusive processing stages based on the fallback state.
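The interplay between the Coordinator and Node gates can be sketched with simplified stand-ins. The actual signatures of the sac_fallback_gate_* functions are not shown in this document, so `gate_ctx_t` and the `gate_*` functions below are hypothetical; they only model the behavior described above.

```c
#include <stdbool.h>

typedef struct { bool fallback; } audio_header_t;       /* simplified header flag */
typedef struct { bool fallback_on; } fallback_module_t; /* fallback module state */

typedef struct {
    fallback_module_t *module;
    audio_header_t *header; /* header of the packet being processed */
} gate_ctx_t;

/* Coordinator side: report the module state and flag the outgoing header. */
bool gate_is_fallback_on(gate_ctx_t *c)
{
    c->header->fallback = c->module->fallback_on;
    return c->module->fallback_on;
}

bool gate_is_fallback_off(gate_ctx_t *c)
{
    return !gate_is_fallback_on(c);
}

/* Node side: read the flag of the received header and mirror it into the module. */
bool gate_fallback_on_detect(gate_ctx_t *c)
{
    c->module->fallback_on = c->header->fallback;
    return c->header->fallback;
}

bool gate_fallback_off_detect(gate_ctx_t *c)
{
    return !gate_fallback_on_detect(c);
}
```

Because each on/off pair always returns opposite values for the same packet, exactly one of two mutually exclusive processing stages runs.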

Example 1: An application using audio compression as a fallback mechanism.

In this example, the Coordinator would have two processing stages, each having their own gate function. The first processing stage performs audio compression when fallback mode is ON using the sac_fallback_gate_is_fallback_on gate. The second processing stage performs audio compression only on the last sample when fallback mode is OFF using the sac_fallback_gate_is_fallback_off gate. This implementation is used to keep the compression algorithm state up to date. This ensures a seamless switch between compressed and uncompressed modes.

In the Node, the gate function sac_fallback_gate_fallback_on_detect for the decompression processing stage ensures that only packets that have been compressed will be decompressed.

Example 2: A 48 kHz 24-bit application using audio packing to 16 bits as a fallback mechanism.

In this example, the Coordinator would have two processing stages, each having its own gate function. The first processing stage performs audio packing from 32-bit aligned audio to 16-bit aligned audio when fallback mode is ON using the sac_fallback_gate_is_fallback_on gate. The second processing stage performs audio packing from 32-bit aligned audio to 24-bit aligned audio when fallback mode is OFF using the sac_fallback_gate_is_fallback_off gate.

In the Node, there would also be two processing stages, each having its own gate function. The first processing stage performs audio unpacking from 16-bit aligned audio to 32-bit aligned audio when fallback mode is ON using the sac_fallback_gate_fallback_on_detect gate. The second processing stage performs audio unpacking from 24-bit aligned audio to 32-bit aligned audio when fallback mode is OFF using the sac_fallback_gate_fallback_off_detect gate.