Audio Speech Processing for Radio Communications Transmission

Understand how to process speech audio for transmission over a radio communications channel to improve its readability, especially under weak signal or interference conditions.




Years ago, external audio speech processors were commonplace with HF single sideband transmitters, especially within amateur radio.

However for commercial transmission, speech processing was equally important and often incorporated in other ways.

DATong radio communications speech processor dating from the 1970s / 1980s

Nowadays, these speech processors are rarely seen because all the required speech processing is included within the transceivers and other transmission equipment used.

The principles are the same, but because the electronics are now built into the transceiver, the processing is hidden from view. As a result, less is understood about what it does and why it is needed.

Why is speech processing needed?

Speech audio is full of short peaks, followed by long periods of low intensity audio. This means that it has a very low average power content, whilst requiring any system to be able to accommodate the peaks. It is said to have a high peak to average power ratio, PAPR.

Waveform with a high peak to average power ratio, PAPR

The waveform shows how a signal might have a high peak to average power ratio, even when the speaker is talking at a relatively constant volume.
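The peak to average power ratio can be estimated directly from a sampled waveform. The following is a minimal sketch: the function name `papr_db` and the "bursty" test signal are illustrative assumptions, not measured speech.

```python
import math

def papr_db(samples):
    """Peak to average power ratio of a sampled waveform, in dB."""
    peak_power = max(s * s for s in samples)
    mean_power = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(peak_power / mean_power)

n = 1000
# A steady sine wave: the peak power is twice the mean power (about 3 dB).
sine = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

# Hypothetical speech-like signal: the same tone, but low in level for most
# of the time with one short burst of high level in the middle.
bursty = [s * (0.2 + 0.8 * math.sin(math.pi * i / n) ** 8)
          for i, s in enumerate(sine)]

print("steady tone PAPR (dB):", round(papr_db(sine), 2))
print("bursty signal PAPR (dB):", round(papr_db(bursty), 2))
```

The bursty signal gives a noticeably higher figure than the steady tone, which is the situation speech processing sets out to improve.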

Another of the issues with speech is that the speaker might place more emphasis on some words than others, or speak at different distances from the microphone. These add further variations in the level of the audio being transmitted.

To provide the most readable audio for SSB, AM, and FM transmissions, the audio should fully modulate the signal for virtually all the time.

For example, with a single sideband signal, the actual power transmitted is dependent upon the level of the audio. If the signal is to stand out and be as readable as possible, the peak to average power ratio should be as small as possible.

For SSB transmitters and transmissions, the peak power is the main limitation - if the level of the audio can be kept constant, then the average power will be increased and the signal will be more readable.
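As a rough worked example, the average power available from a peak-limited transmitter can be related to the peak to average power ratio. The 100 W peak figure and the two PAPR values below are assumptions chosen purely for illustration.

```python
def average_power_w(peak_power_w, papr_db):
    """Average power of a signal whose peaks just reach peak_power_w."""
    return peak_power_w / (10 ** (papr_db / 10))

# Hypothetical 100 W peak-limited transmitter; PAPR values are illustrative only.
unprocessed = average_power_w(100.0, 12.0)
processed = average_power_w(100.0, 6.0)

print("average power at 12 dB PAPR (W):", round(unprocessed, 1))
print("average power at 6 dB PAPR (W):", round(processed, 1))
```

With the peak power fixed, every decibel removed from the peak to average power ratio appears as a decibel of extra average power.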

For FM used for communications applications, the situation is not quite as critical as it is for SSB, but nevertheless if the audio can be made as readable as possible, then it can make a distinct difference when the link is poor and interference is high.

Techniques used for improving the audio readability

There are several different techniques that can be used to improve the readability of a signal for radio communications applications.

Each of these techniques has its own attributes and can work well when applied individually, but the best performance is often gained by combining them all in the correct manner.

The techniques that might be used are:

  • Compression
  • Clipping (audio or RF)
  • Bandwidth limitation and frequency tailoring

Audio compression

As the word "compression" implies, this method of speech processing reduces the dynamic range of a signal. This is achieved by reducing the gain of an audio amplifier as the signal level increases.

There are two ways in which this can be achieved. The first adjusts the signal level instantaneously, varying the gain over each part of the waveform. This type of speech compression is known as "instantaneous compression".

The second and more common type of audio compression acts upon the overall level of the incoming waveform and adjusts the gain accordingly. This is very akin to the AGC system in a radio, and it is implemented by introducing a time constant into the control loop placed around the audio amplifier.

This type of compressor is the one that was more likely to be found in amateur radio systems where the name VOGAD (Voice Operated Gain Adjusting Device) may be used. These circuits are often used to maintain a constant level of audio to the next stage of circuitry.

One of the key issues is to make sure that the attack and decay times are correctly chosen. A fast "attack time" is required so that the circuit can react very quickly to the sudden increases in signal level or transients which are always present.

The attack time is the time that the compressor takes to respond to a sudden increase in level. If the attack time is too slow then the transients will pass through into the next stages where they may cause overloading and distortion. As a general rule an attack time of around 10 milliseconds is usually chosen.

The decay time is also very important. It is generally much longer so that the gain tends to follow the overall level of the speech better.

For speech processors used in amateur communications equipment a figure of around 300 milliseconds is often used because this enables the compressor to follow the general level variations in the speech level and keep the overall level approximately constant. If it is too short, then the speech tends to have a sort of "jerky" feel to it and it is not always pleasant to listen to.
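The effect of the attack and decay times can be sketched with a simple software compressor. This is only an illustration under stated assumptions: it uses a feed-forward envelope follower rather than the feedback loop of a real VOGAD circuit, and the names `compress`, `target` and `max_gain` are hypothetical.

```python
import math

def compress(samples, sample_rate, target=0.5, attack_s=0.010,
             decay_s=0.300, max_gain=20.0):
    """Slow-acting compressor: track the envelope, then scale it to target."""
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    decay = math.exp(-1.0 / (decay_s * sample_rate))
    envelope = 0.0
    out = []
    for s in samples:
        level = abs(s)
        # Rise quickly when the input exceeds the envelope, fall slowly otherwise.
        coeff = attack if level > envelope else decay
        envelope = coeff * envelope + (1.0 - coeff) * level
        # Limit the gain so that silence and noise are not boosted without bound.
        gain = max_gain if envelope * max_gain < target else target / envelope
        out.append(s * gain)
    return out

sample_rate = 8000

def tone(amplitude, seconds=2.0, freq=500.0):
    count = int(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(count)]

loud = compress(tone(1.0), sample_rate)
quiet = compress(tone(0.1), sample_rate)

# Compare the settled output peaks over the final half second.
loud_peak = max(abs(s) for s in loud[-4000:])
quiet_peak = max(abs(s) for s in quiet[-4000:])
print("output peak ratio, loud / quiet:", round(loud_peak / quiet_peak, 2))
```

Although the two inputs differ in level by 20 dB, the settled outputs should be at almost the same level, which is the behaviour wanted from a compressor feeding the next stage.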

Speech clipping

Compression is not the only technique that can be used. It is also possible to clip the waveform.

This technique is somewhat harsher than compression because the process involves clipping, or removing, the peaks from the audio waveform, and this reduces the peak to average power ratio considerably.

Often a figure for the degree of clipping may be indicated. The definition for the level of clipping is the ratio between the peak of the signal before clipping, to the peak of the signal after clipping and the result expressed in decibels.
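The definition of the clipping level can be illustrated with a short sketch. It assumes that the peaks are measured as amplitudes, so the ratio is converted using 20 log10; the function names are hypothetical.

```python
import math

def hard_clip(samples, threshold):
    """Limit every sample to the range -threshold to +threshold."""
    return [max(-threshold, min(threshold, s)) for s in samples]

def clipping_level_db(samples, clipped):
    """Ratio of the peak before clipping to the peak after clipping, in dB."""
    peak_before = max(abs(s) for s in samples)
    peak_after = max(abs(s) for s in clipped)
    return 20 * math.log10(peak_before / peak_after)

def papr_db(samples):
    """Peak to average power ratio in dB."""
    peak_power = max(s * s for s in samples)
    mean_power = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(peak_power / mean_power)

test_tone = [math.sin(2 * math.pi * i / 100) for i in range(1000)]
threshold = 10 ** (-10 / 20)  # about 0.316 of the original peak
clipped = hard_clip(test_tone, threshold)

print("clipping level (dB):", round(clipping_level_db(test_tone, clipped), 1))
print("PAPR before (dB):", round(papr_db(test_tone), 2))
print("PAPR after (dB):", round(papr_db(clipped), 2))
```

Setting the threshold at about 0.316 of the original peak corresponds to 10 dB of clipping, and the peak to average power ratio of the clipped tone is lower than that of the original.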

Clipped waveform to remove peaks and reduce the peak to average power ratio

The clipping of a signal may be accomplished in a number of ways. Two common ways are:

  • High gain amplifier:   It is possible to run a transistor, FET, op-amp, or other amplifier into limiting. This will act to remove any peaks and limit them to the maximum the amplifier can accommodate. This is typically governed by the rail voltage or voltages for the amplifier.

  • Back to back diodes:   Another alternative is to run the output of an amplifier via a load resistor into two parallel back to back diodes. As the diodes turn on, they will prevent the output rising above the turn on voltage for the diodes.

    Two diode clipper / limiter circuit

    As the diodes have a gradual turn-on characteristic rather than a hard threshold, this may be considered a form of compression: the limiting occurs progressively and, as a result, the output could in principle be expanded again.

Although it might be thought that clipping would severely distort the signal, the effect is not quite as bad as it may appear at first sight.

Typical audio spectrum

The human ear recognises sounds by their frequency content and not particularly by the amplitude changes. However, clipping introduces harmonic and intermodulation distortion.

Audio spectrum when clipped
Note: for clarity, only second and third harmonics are shown here, and no intermodulation products are included.

To remove as many of the distortion products as possible, a low pass filter is normally placed after the clipper. The normal audio bandwidth for communications purposes extends up to between 2.5 and 3 kHz, and the filter is designed to remove any products above this frequency.

Whilst the distortion products outside the required bandwidth can be removed by a low pass filter, those within the wanted audio bandwidth will remain.
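The way some harmonics fall inside the audio passband can be seen by clipping a single test tone and measuring its harmonics. This is a minimal sketch: the 700 Hz tone, the 0.3 threshold and the 3 kHz band edge are assumed values. Symmetrical clipping of a single tone produces only odd harmonics; real speech and asymmetrical clipping also give even harmonics and intermodulation products.

```python
import math

def tone_level(samples, sample_rate, freq):
    """Amplitude of the component at freq, using a single-bin DFT."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

sample_rate = 48000
n = 4800  # 0.1 s, exactly 70 cycles of the 700 Hz test tone
tone = [math.sin(2 * math.pi * 700 * i / sample_rate) for i in range(n)]
clipped = [max(-0.3, min(0.3, s)) for s in tone]

for harmonic in (1, 2, 3, 5):
    freq = 700 * harmonic
    where = "inside" if freq <= 3000 else "outside"
    level = tone_level(clipped, sample_rate, freq)
    print(freq, "Hz:", round(level, 3), "-", where, "the 3 kHz audio band")
```

In this example the third harmonic at 2100 Hz lies within the audio band, so the low pass filter cannot remove it, whereas the fifth harmonic at 3500 Hz can be filtered out.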

Audio spectrum after post clipping audio filtering
Note: distortion products within the audio bandwidth still remain.

These distortion products tend to detract from the quality of the signal, reducing its intelligibility. This means that the degree of clipping is limited to figures of around 10 to 15 dB, and this means that the maximum level of gain that can be achieved is around 4 to 5 dB. To increase the level of clipping that can be used, a method of removing the distortion products must be found.

RF Clipping

It is possible to remove the distortion products by using a process known as RF clipping. The harmonics and many other distortion products fall outside the narrow signal bandwidth, and as a result they are easily filtered out. Harmonics, for example, appear at twice, three times, and so on, the RF signal frequency.

To clip a signal at a radio frequency, a single sideband signal needs to be generated. This should have good carrier and reverse sideband suppression if the intermodulation products are to be minimised. This may involve passing the audio into a balanced mixer with a local RF oscillator signal to generate a double sideband signal, which is then filtered to remove the unwanted sideband. Another method would be to use phasing techniques.

Whatever method is used, the signal is then clipped and, as mentioned, the harmonics fall at multiples of the RF frequency and these are easily filtered.
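A little arithmetic shows why the harmonics are so easy to filter at RF. The sketch below assumes a 455 kHz carrier, an upper sideband signal, a 300 Hz to 3 kHz passband and a 700 Hz audio tone; all of these are illustrative values, and the function names are hypothetical.

```python
def harmonics(freq_hz, orders=(2, 3)):
    """Harmonic frequencies of a single component."""
    return [order * freq_hz for order in orders]

def in_band(freq_hz, low_hz, high_hz):
    return low_hz <= freq_hz <= high_hz

audio_hz = 700
carrier_hz = 455_000          # assumed carrier frequency for the SSB generator
rf_hz = carrier_hz + audio_hz  # upper sideband component

# Clipping at audio: the harmonics are compared with the 300 Hz to 3 kHz band.
audio_in_band = [f for f in harmonics(audio_hz) if in_band(f, 300, 3000)]

# Clipping at RF: the harmonics are compared with the sideband filter passband.
rf_in_band = [f for f in harmonics(rf_hz)
              if in_band(f, carrier_hz + 300, carrier_hz + 3000)]

print("audio harmonics inside the passband:", audio_in_band)
print("RF harmonics inside the passband:", rf_in_band)
```

The audio harmonics at 1400 Hz and 2100 Hz lie within the wanted band, whereas the RF harmonics near 911 kHz and 1367 kHz are far from the passband. Odd order intermodulation products between different speech components can still fall close to the wanted signal, so not every distortion product is removed.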

Once filtered the audio signal can be regenerated using a balanced mixer, or it may be passed on to further stages in the transmitter if it is an integral part of the transmitter.

RF clipping provides a much better solution to signal clipping than clipping at the audio frequencies, although it does involve the use of more complicated circuitry.

However, using RF clipping it is possible to achieve almost infinite levels of clipping whilst still retaining the intelligibility. With these levels of clipping an RF clipper can offer a gain in the region of 8 dB, about 3 or 4 dB more than an AF clipper.

Whilst much greater levels of clipping can be used when RF clipping is employed, at higher levels of clipping, the naturalness of the audio may be reduced. This results from the clipping process emphasising the stronger audio components, and reducing the lower level ones. This is one of the reasons why some people do not like to use high levels of clipping, especially under good conditions when the clipper is not needed.

Audio bandwidth

Clipping and compression form two major aspects of speech processing for a radio communications system. However, major gains can also be made by limiting the bandwidth and tailoring the frequency response.

One of the main reasons for limiting the audio bandwidth is to keep the transmission bandwidth to a minimum so that the signal occupies less spectrum. Not only does this enable more signals to be accommodated within a given band, but it also means that receiver bandwidths can be reduced to keep the interference and noise levels to a minimum.

However there are also major gains to be made by focusing the signal or transmitted power on the frequencies that add most to the intelligibility of the signal.

Generally a bandwidth of 300 Hz to 3.3 kHz is taken as the telecommunications standard. Even so, it is possible to reduce it still further, and many professional radio communications and amateur radio transceivers have their audio response reduced to 300 Hz to 2.7 kHz. The main problem encountered in reducing the bandwidth is that some of the sounds with a large high frequency content will not be so easy to distinguish from each other. This is particularly true for plosive sounds like "B" and "T" and fricative sounds like "S" and "F". However, the spelling alphabet, often called the phonetic alphabet, can be used to overcome this.

There are benefits in altering the frequency response to emphasise some frequencies and reduce the level of others. This is known as pre-emphasis. It is particularly useful because the process of clipping has the effect of emphasising the higher level components and reducing the lower level ones.

This can be useful because the components of speech below about 600 Hz are at a relatively high level but contribute less to the intelligibility of the audio. By reducing the components below 600 Hz and emphasising those between about 1.5 and 3 kHz, some improvements can be made. Usually a simple filter that reduces the level of frequencies below about 600 Hz is quite satisfactory.
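The response of such a simple filter can be estimated from the standard first-order high pass formula. The sketch below assumes a single RC section with its corner at 600 Hz; a practical processor may use a different filter shape.

```python
import math

def highpass_gain_db(freq_hz, corner_hz=600.0):
    """Gain of a first-order (single RC) high pass filter, in dB."""
    ratio = freq_hz / corner_hz
    return 20 * math.log10(ratio / math.sqrt(1 + ratio * ratio))

for freq in (300, 600, 1500, 2400):
    print(freq, "Hz:", round(highpass_gain_db(freq), 2), "dB")
```

The gain is about 3 dB down at the 600 Hz corner and about 7 dB down at 300 Hz, while frequencies of 1.5 kHz and above pass almost unattenuated.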

However it should be remembered that pre-emphasis will also reduce the naturalness somewhat, although it will considerably improve the intelligibility.

Overall signal processor

Each of the techniques that has been described can provide some useful gains in audio intelligibility for a radio communications system. However, by combining them, the optimum processor can be realised.

Pre-emphasis, compression and clipping when combined can give a really useful processor, especially if RF clipping is used.

A particular line-up for the system may include a filter to limit the audio bandwidth and apply some pre-emphasis at the input. A compressor or VOGAD circuit may then be used to remove the variations in level. It is usefully placed after the input filtering and pre-emphasis because the removed and reduced amplitude components will then not affect the operation of the compressor / VOGAD stage.

Finally an RF clipper can be used to provide the clipping. By having the compressor / VOGAD before the clipper a constant level is applied so that a known level of clipping is obtained. The clipper may even have switched levels so that the level of clipping can be tailored to the prevailing conditions.
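The ordering of the stages can be sketched in software. This is an illustration only, under several assumptions: a one-pole high pass filter stands in for the input filtering and pre-emphasis, a feed-forward envelope follower stands in for the compressor / VOGAD, and a simple audio clipper is used in place of an RF clipper. All function names are hypothetical.

```python
import math

def highpass(samples, sample_rate, corner_hz=600.0):
    """One-pole high pass filter, standing in for filtering and pre-emphasis."""
    rc = 1.0 / (2 * math.pi * corner_hz)
    alpha = rc / (rc + 1.0 / sample_rate)
    out, prev_in, prev_out = [], 0.0, 0.0
    for s in samples:
        prev_out = alpha * (prev_out + s - prev_in)
        prev_in = s
        out.append(prev_out)
    return out

def leveller(samples, sample_rate, attack_s=0.010, decay_s=0.300, max_gain=100.0):
    """Compressor stand-in: scale the signal so its tracked envelope is 1.0."""
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    decay = math.exp(-1.0 / (decay_s * sample_rate))
    envelope, out = 0.0, []
    for s in samples:
        coeff = attack if abs(s) > envelope else decay
        envelope = coeff * envelope + (1.0 - coeff) * abs(s)
        gain = max_gain if envelope * max_gain < 1.0 else 1.0 / envelope
        out.append(s * gain)
    return out

def clip(samples, threshold):
    return [max(-threshold, min(threshold, s)) for s in samples]

def process(samples, sample_rate, clip_db=10.0):
    """Filter, then level, then clip, in the order described in the text."""
    threshold = 10 ** (-clip_db / 20)  # relative to the levelled envelope of 1.0
    filtered = highpass(samples, sample_rate)
    levelled = leveller(filtered, sample_rate)
    return clip(levelled, threshold)

sample_rate = 8000
loud_in = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(sample_rate)]
quiet_in = [0.1 * s for s in loud_in]

loud_out = process(loud_in, sample_rate)
quiet_out = process(quiet_in, sample_rate)

difference = max(abs(a - b) for a, b in zip(loud_out[-2000:], quiet_out[-2000:]))
print("largest difference between settled outputs:", round(difference, 6))
```

Because the levelling stage comes before the clipper, the loud and quiet inputs should reach the clipper at the same level and so receive the same amount of clipping once the envelope has settled.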

Setting the system up

It is important to ensure that the speech processing system is set up correctly with the transmitter or transceiver with which it is used.

In general the first stage is to adjust the peak output from the processor so that the transmitter provides its required peak output. Some processors include an audio oscillator to enable a steady state signal to be generated so that the transmitter output can be adjusted. Once this has been correctly set the audio gain can be set to provide the right level of processing.

However most modern radio communications and amateur radio HF transceivers will incorporate speech processors which often follow the main principles described above.

Care should be taken to adjust them so that the correct level of processing is used. Levels of processing should be adjusted to give the optimum performance - typically this is done by obtaining a number of over the air reports.



Audio speech processing is an essential part of any HF radio transmitter - and it is also very important for other forms of radio communications systems including VHF and UHF communications whether AM or FM. With interference causing signals to be harder to copy, it is necessary to make the most use of all the available power. Whilst most transceivers and transmitters incorporate speech processors, it is still very useful to have a good idea of how they operate so that the best use can be made of the available signal.
