Audio has become a standard feature of network cameras. Cameras with audio support usually have a built-in microphone (pickup) or provide an audio input interface, so users can connect an external microphone of a different type or of higher quality.
A network camera can also have a built-in speaker or provide an audio output interface to which the user can connect external speakers.
Audio working mode
Depending on the application, one-way or two-way audio transmission may be required. Two-way transmission can take place in both directions simultaneously or in one direction at a time.
There are three basic modes of audio communication:
- Simplex mode. Audio can be sent in one direction only. In most cases the audio is sent from the camera, but it can also be sent from the user.
- Half-duplex mode. Audio can be sent and received by both the camera and the operator, but only in one direction at a time. This type of communication is similar to a walkie-talkie: to speak, the operator must press and hold the call button, and releasing the button lets the operator receive audio from the camera. With half-duplex there is no risk of echo problems.
- Full-duplex mode. Users can send and receive audio at the same time (listen and speak simultaneously). This mode is similar to a telephone conversation. Full-duplex requires the client PC to be able to handle full-duplex audio.
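The three modes can be summarized from the operator's point of view. The following is only an illustrative sketch: the names `AudioMode` and `operator_can` and the `camera_is_sender` flag are hypothetical and do not come from any camera SDK.

```python
from enum import Enum


class AudioMode(Enum):
    SIMPLEX = "simplex"
    HALF_DUPLEX = "half-duplex"
    FULL_DUPLEX = "full-duplex"


def operator_can(mode, talk_button_pressed, camera_is_sender=True):
    """Return (can_speak, can_listen) for the operator.

    camera_is_sender is only used in simplex mode, where it states
    which end is the single sender.
    """
    if mode is AudioMode.SIMPLEX:
        # One fixed direction: either the camera sends or the user sends.
        return (not camera_is_sender, camera_is_sender)
    if mode is AudioMode.HALF_DUPLEX:
        # Both directions exist, but the call button selects one at a time.
        return (talk_button_pressed, not talk_button_pressed)
    # Full duplex: speaking and listening happen at the same time.
    return (True, True)


print(operator_can(AudioMode.HALF_DUPLEX, talk_button_pressed=True))
```

In half-duplex mode the two returned values are never both true, which is why that mode carries no risk of echo.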
Sampling rate, sample size
Sound is a wave characterized by frequency and amplitude. On a plot of the waveform, the horizontal axis is time, which is where frequency shows up, and the vertical axis is level, which is where amplitude shows up. The wave is continuous, so the curve can be thought of as made up of infinitely many points. To transmit or store sound digitally over a network, the sound must first be encoded, which begins with sampling points on the curve. Sampling means measuring the value of the waveform at a particular instant.
The more points sampled per second, the more frequency information is captured. To reconstruct the waveform, each cycle of vibration needs at least two sampling points. The highest frequency the human ear can perceive is about 20 kHz, so meeting the requirements of human hearing takes at least 40,000 samples per second, written as 40 kHz; this figure is the sampling rate. The common CD uses a sampling rate of 44.1 kHz, and the default sampling rate for audio encoding on many security cameras is also 44.1 kHz.
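The two-samples-per-cycle rule can be checked with a few lines of arithmetic. This is a minimal sketch for a single pure tone; the function names are made up for illustration, and `apparent_frequency` uses the standard frequency-folding (aliasing) relation, which the text above does not discuss.

```python
def min_sampling_rate(highest_frequency_hz):
    """At least two samples per cycle of the highest frequency."""
    return 2 * highest_frequency_hz


def apparent_frequency(signal_hz, sampling_rate_hz):
    """Frequency a pure tone appears to have after sampling.

    Tones at or below half the sampling rate are preserved; higher
    tones fold back to a lower frequency.
    """
    half_rate = sampling_rate_hz / 2
    folded = signal_hz % sampling_rate_hz
    return folded if folded <= half_rate else sampling_rate_hz - folded


print(min_sampling_rate(20_000))           # 40000
print(apparent_frequency(20_000, 44_100))  # 20000
print(apparent_frequency(20_000, 30_000))  # 10000
```

At 44.1 kHz a 20 kHz tone is preserved, whereas at 30 kHz, below the 40 kHz minimum, it would be recorded as a 10 kHz tone.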
Knowing when each sample was taken is not enough. The amplitude at each sample point must also be measured and quantized to express the signal strength. The number of quantization levels is an integer power of 2; the common CD uses a 16-bit sample size, that is, 2 to the 16th power (65,536) levels. A simple example: suppose a wave is sampled 8 times and the values at the sample points are 1 through 8. With a 2-bit sample size only 4 distinct levels are available, so only 4 of the 8 values can be told apart and the detail in the other 4 is lost. With a 3-bit sample size, all 8 values are recorded exactly. The higher the sampling rate and the larger the sample size, the closer the recorded waveform is to the original signal.
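The 8-sample example can be reproduced with a small uniform quantizer. This is a sketch under stated assumptions: the function `quantize` is hypothetical, the input values are integers, and the levels are spread evenly over the range 1 to 8.

```python
def quantize(values, bits, lowest, highest):
    """Map integer values in [lowest, highest] onto 2**bits level codes."""
    levels = 2 ** bits
    span = highest - lowest + 1
    return [(v - lowest) * levels // span for v in values]


samples = [1, 2, 3, 4, 5, 6, 7, 8]

codes_2bit = quantize(samples, 2, 1, 8)
codes_3bit = quantize(samples, 3, 1, 8)

print(codes_2bit)  # [0, 0, 1, 1, 2, 2, 3, 3]
print(codes_3bit)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With 2 bits, neighbouring values share a code and can no longer be distinguished; with 3 bits, each of the 8 values keeps its own code.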
Audio stream calculation
Audio bitrate (bps) = sampling rate (Hz) × sample size (bits) × number of channels.
For a WAV file with a sampling rate of 44.1 kHz, a sample size of 16 bits, and two-channel PCM encoding, the bitrate is 44.1K × 16 × 2 = 1411.2 kbps. The "128K" in "128K MP3" is the same kind of parameter (128 kbps); the corresponding figure for this WAV is 1411.2 kbps. This parameter is also called data bandwidth and is the same concept as bandwidth in ADSL. Dividing the bitrate by 8 gives the data rate of the WAV, 176.4 KB/s. In other words, storing one second of two-channel PCM audio at 44.1 kHz and 16 bits requires 176.4 KB of space, and one minute requires 10,584 KB, about 10.58 MB.
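The calculation above can be written out as a short script. This is a minimal sketch; the function name `pcm_bitrate_bps` is chosen for illustration, and the units are decimal (1 KB = 1000 bytes).

```python
def pcm_bitrate_bps(sampling_rate_hz, sample_size_bits, channels):
    """Uncompressed PCM bitrate in bits per second."""
    return sampling_rate_hz * sample_size_bits * channels


bitrate = pcm_bitrate_bps(44_100, 16, 2)
bytes_per_second = bitrate // 8
bytes_per_minute = bytes_per_second * 60
ratio_to_128k_mp3 = bitrate / 128_000

print(bitrate)            # 1411200 bps = 1411.2 kbps
print(bytes_per_second)   # 176400 bytes = 176.4 KB
print(bytes_per_minute)   # 10584000 bytes, about 10.58 MB
print(ratio_to_128k_mp3)  # about 11 times the bitrate of a 128 kbps MP3
```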
This is a very large amount of data. There are only two ways to reduce it: lowering the sampling parameters (sampling rate and sample size) or compressing the data. Lowering the sampling parameters degrades quality, so compression coding is the practical choice.
Encoding algorithm
There are many audio compression coding methods, which fall roughly into three categories: waveform coding, parameter coding, and hybrid coding. They are not covered in detail here; for more information, see the reference materials after reading the article.
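| Technology | Encoding algorithm | Standard | Bitrate (kbit/s) | Quality | Application |
| --- | --- | --- | --- | --- | --- |
| Waveform coding | PCM | G.711 | 64 | 4.8 | PSTN, ISDN |
| Waveform coding | ADPCM | G.726 (G.721, G.723) | 40/32/24/16 | 4.2 | – |
| Waveform coding | SB-ADPCM | G.722 | 64/56/48 | 4.5 | – |
| Parameter coding | LPC | – | 2.4 | 2.5 | Secure voice |
| Hybrid coding | CELPC | – | 4.8 | 3.2 | Civil aviation |
| Hybrid coding | VSELPC | GIA | 8 | 3.8 | Mobile communication, voice mail |
| Hybrid coding | RPE-LTP | GSM | 13.2 | 3.8 | – |
| Hybrid coding | LD-CELP | G.728 | 16 | 4.1 | ISDN |
| Hybrid coding | MPE | MPE | 12.8 | 5.0 | CD |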
Comparison of common audio coding algorithms
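To give a feel for what these bitrates mean for recording, the sketch below converts a bitrate from the table into storage per hour. The function name is hypothetical, the units are decimal (1 MB = 1,000,000 bytes), and container or network overhead is ignored.

```python
def megabytes_per_hour(bitrate_kbps):
    """Storage needed for one hour of audio at the given bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * 3600 / 1_000_000


# Bitrates in kbit/s taken from the comparison table.
table_bitrates = {
    "PCM (G.711)": 64,
    "LD-CELP (G.728)": 16,
    "RPE-LTP (GSM)": 13.2,
    "LPC": 2.4,
}

storage = {name: megabytes_per_hour(kbps) for name, kbps in table_bitrates.items()}

for name, megabytes in storage.items():
    print(f"{name}: {megabytes:.2f} MB per hour")
```

For comparison, the uncompressed 44.1 kHz, 16-bit, two-channel PCM stream at 1411.2 kbps works out to about 635 MB per hour with the same function.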