Audio over Ethernet (AoE): The New Vehicle for Audio Transport

By tfwm

When digital audio consoles arrived in pro sound systems, the workflow stayed largely unchanged for most audio installations. Digital audio recorders or computer-based DAWs (digital audio workstations) replaced analog recorders, and signal processing and switching became all digital, but signal transport remained mainly analog. Countless cables still carry the audio signal from the stage to the FOH (Front Of House), monitor and broadcast consoles. Most of us are familiar with Voice-over-IP (VoIP), the technology used by many phone services and corporations, which is replacing traditional PBX (Private Branch eXchange) systems. The same concept of using an Ethernet-based network to transport audio is now applied, with considerable benefit, in professional sound, and many console manufacturers already offer “networked audio systems”. This article does not support or promote any particular technology. It simply shows some of the practical aspects of real-world Audio-over-Ethernet (AoE).

The Age Of Digital Consoles
Sanctuary audio work can be very demanding. The audio console is used not only for FOH functions but also for monitors, recording and even broadcasting. Today, the advantages of digital consoles over analog ones are obvious and well known. Digital audio systems are increasingly the standard in “mega-Churches” as well as in smaller sanctuaries. What is less well known is a very efficient technology for transporting several digital audio signals: Ethernet.

Audio Transport
At first sight, Ethernet does not seem to be the easiest solution for audio signal transport. This technology was not designed to transport real-time data, but it is gaining an increasingly large market share in the professional audio world, mainly for the following reasons:

• Transport: Computer cable can transport the audio signal to a distance of up to 80 km using optical fiber (single-mode) and up to 100 meters using copper.
• Distribution: The receiving devices just “listen” to the feed from the source devices. Distribution amplifiers are no longer necessary.
• Routing: There is no need for a cross-point audio router or patch field. Audio routing is done by standard computer switches.
• Convergence: The audio network uses the existing computer infrastructure, such as digital audio workstations, servers and computers.
• Cost: Mainly because of the widespread availability of computer technologies, an Ethernet transport system is cheaper to install, support and operate.
• Flexibility: Compared to a traditional multi-wire installation, a modular Ethernet audio network can be configured, modified and upgraded much more easily.
• Scalability: In a multilevel topology, Ethernet is very scalable. The bandwidth could restrict the size of a complex network, but Gigabit Ethernet has pushed this limit significantly.

Unfortunately, some new digital installations still use passive multi-pair snakes (see Figure 2). In this type of setup, several low-level (mic) analog signals are transported over long distances, with known consequences (mainly for the noise level).

If the Church still uses an analog console or a digital console with no Ethernet snake capability, firms like Mackie, Roland and others offer “add-on” digital snakes (see Figure 3).

– An I/O box, controlled by a PC, is connected – at line level – to the console.

– Another box (with mic preamps and analog-to-digital converters) is linked to the FOH box by a single Cat5e computer cable.

As presented in Figure 4, several manufacturers offer a complete system, i.e. a digital console, an active stage box (with mic preamps and analog-to-digital converters) and an Ethernet connection between the two units.

As presented in Figure 5, some digital console firms, such as Yamaha, require an adaptor card (on the console side) and a network interface on the stage (for the mic preamps and line I/O units).

Network Layers
In-depth networking expertise is not essential, but basic Ethernet knowledge is mandatory to build and support a networked audio Church site.
The network data structure is presented as 5 “Layers”. They are (from the bottom up): Physical or Interface, Data Link, Network (such as IP), Transport (such as TCP or UDP) and Application (such as HTTP). The third, fourth and fifth layers are often proprietary in AoE (Audio over Ethernet). At this point, we will only look at the first two layers.

The first Layer: The Physical Layer
Hardware communication for PCs and all Ethernet devices is provided by a NIC (Network Interface Controller, or Network Interface Card). One of the standards for audio is Ethernet over copper, using 100Base-T (100 Mbps) and 1000Base-T (1000 Mbps) links. The cables are made of eight conductors, arranged as four UTP (Unshielded Twisted Pairs), and are terminated with RJ-45 8-pin plugs. Category 5e (enhanced Cat-5) cables are acceptable, but Cat-6 cables are often recommended. The other Ethernet standard uses optical fiber cables. Note that cables and connectors are blamed 70 to 80% of the time when problems occur in a network environment. In addition to conductivity testing, proper troubleshooting instruments and methods should be available for both 100/1000Base-T and fiber wiring. Mediatwist® cable from Belden is often suggested for media over Ethernet applications.

The second Layer: The Data Link
Each NIC has a unique 48-bit ID assigned by the manufacturer: the MAC (Media Access Control) address. The Layer 2 header contains this address, which is used for local (intrasite) communications. The MAC address is written as six pairs of hexadecimal digits, for example 00-11-24-EB-28-D6.
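As a minimal sketch of the format just described, the Python snippet below checks that a string is a well-formed MAC address and splits it into the manufacturer prefix and the device part. The function names and the sample addresses are hypothetical and chosen only for illustration.

```python
import re

HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")

def parse_mac(text):
    """Return the 6 raw bytes of a MAC address, or raise ValueError.

    Accepts six pairs of hexadecimal digits separated by hyphens or colons.
    A pair such as "5G" is rejected because G is not a hexadecimal digit.
    """
    parts = re.split(r"[-:]", text.strip())
    if len(parts) != 6 or not all(HEX_PAIR.fullmatch(p) for p in parts):
        raise ValueError(f"not a valid MAC address: {text!r}")
    return bytes(int(p, 16) for p in parts)

def describe_mac(text):
    """Split a MAC address into its two halves and decode two flag bits."""
    raw = parse_mac(text)
    return {
        # First 3 bytes: prefix assigned to the manufacturer.
        "oui": raw[:3].hex("-").upper(),
        # Last 3 bytes: assigned by the manufacturer to the device.
        "device": raw[3:].hex("-").upper(),
        # Lowest bit of the first byte marks a multicast (group) address.
        "multicast": bool(raw[0] & 0x01),
        # Next bit marks a locally administered (not factory-set) address.
        "locally_administered": bool(raw[0] & 0x02),
    }

if __name__ == "__main__":
    print(describe_mac("00-1A-2B-3C-4D-5E"))
```

A check like this is handy when MAC addresses are typed by hand into a device inventory or a switch configuration sheet.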

The Ethernet port is usually configurable to auto-negotiate, 10 Mb half-duplex, 10 Mb full-duplex, 100 Mb half-duplex or 100 Mb full-duplex. In many devices, the Ethernet port is set to auto-negotiate. If auto-negotiate is enabled on only one side of the link, that side will always default to half-duplex regardless of the configuration of the other side.

A link with a duplex mismatch will operate, but it will generate large numbers of errors. This means higher jitter and packet loss, with audible degradation of the audio signal.

The IEEE 802.3 group of standards defines the physical layer (1) and the MAC part of the data link layer (2) (wired Ethernet).

A computer switch works on Layer 2, using the MAC addresses of devices for identification. A network that uses MAC addresses only can work in just one location. Switches capable of “routing” data, using IP addresses, are often called Layer 3 switches.

Figure 7 presents a real-life setup where two consoles (and a computer switch) are needed. Roland (as an example) uses a third Ethernet port to send a multitrack feed to the Sonar™ digital audio workstation (on a PC).

Audio Codecs for compressed transport over IP
The device or interface used to transport an analog or digital (AES/EBU) audio signal over a regular IP network is an IP audio codec. There are about 15 firms offering several models for different needs (general and specific usage). Codecs packetize the audio signal (analog or digital) for transport over the IP network infrastructure.

Two other layers of the network structure need to be understood when using IP audio codecs.

The third Layer: IP (Network or Routing)
The task of IP is to get packets of data from the source to the destination. The dotted decimal notation of the 32-bit IP address consists of four octets of eight bits separated by three dots. An example of an IP address is 192.15.121.10. The left-most bits indicate the network address. The others indicate the host (PC or IP device) address. The number of bits assigned to each is determined by the class. For a medium-sized organization, a class B will be needed, but a class C will suffice for a small organization.
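The split between network bits and host bits can be illustrated with Python's standard `ipaddress` module. This is only a sketch: the helper name `split_address` is hypothetical, and the prefix lengths used (24 bits for a class C style network, 16 bits for a class B style network) are passed in explicitly rather than derived from the address.

```python
import ipaddress

def split_address(address, prefix_length):
    """Split an IPv4 address into its network and host parts.

    prefix_length is the number of left-most bits that identify the network
    (16 for a class B style network, 24 for a class C style network).
    The usable host count assumes a prefix of /30 or shorter.
    """
    iface = ipaddress.ip_interface(f"{address}/{prefix_length}")
    network = iface.network
    # Keep only the right-most (host) bits of the address.
    host_part = int(iface.ip) & int(network.hostmask)
    return {
        "network": str(network.network_address),
        "host_part": host_part,
        # Two addresses are reserved: the network itself and the broadcast.
        "usable_hosts": network.num_addresses - 2,
    }

if __name__ == "__main__":
    print(split_address("192.15.121.10", 24))  # 24 network bits, 8 host bits
    print(split_address("192.15.121.10", 16))  # 16 network bits, 16 host bits
```

With 24 network bits there is room for 254 hosts, which is why such a network suits a small site, while 16 network bits leave room for 65,534 hosts.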

Between the Data Link and the network layers is the ARP (Address Resolution Protocol), a service for finding the correspondence between the MAC address and the IP address.

For audio network design, both the IP and MAC addresses of each host (device) may be required.

The fourth Layer: UDP/RTP (Transport)
UDP (User Datagram Protocol) is mainly used for applications such as streaming audio and video, where on-time delivery is more important than reliability. UDP uses a “send and forget” strategy with no acknowledgment process, unlike TCP, which acknowledges and retransmits data.

There are four fields in a UDP header: the source port, the destination port, the length and the checksum.
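The UDP header is small enough to build and read by hand. The sketch below packs and unpacks its four 16-bit fields (source port, destination port, length, checksum) with Python's `struct` module. The function names, the port number 5004 and the 160-byte payload are assumptions made for the example, and the checksum is simply left at zero, which in IPv4 means "no checksum computed".

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order: 8 bytes.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Build the 8-byte UDP header that would precede the given payload."""
    # The length field counts the header plus the payload, in bytes.
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum)

def parse_udp_header(datagram):
    """Read the four header fields from the start of a UDP datagram."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
    }

if __name__ == "__main__":
    payload = bytes(160)  # hypothetical block of audio data
    datagram = build_udp_header(5004, 5004, payload) + payload
    print(parse_udp_header(datagram))
```

Note that nothing in the header carries an acknowledgment or a sequence number, which is what makes UDP a "send and forget" protocol.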

RTP (Real-time Transport Protocol), a layer built on top of UDP, has timestamp and sequence number fields in its header for synchronization and jitter handling. RTP was originally designed as a multicast protocol for delivering real-time media over an IP network.
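To show how the sequence number and timestamp are carried, here is a minimal sketch of the 12-byte fixed RTP header as laid out in RFC 3550, together with a simple count of missing packets. The function names are hypothetical, payload type 96 is just an arbitrary dynamic value, and the loss counter assumes packets arrive in order, which a real receiver cannot rely on.

```python
import struct

# Fixed RTP header: 2 flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC (stream identifier). 12 bytes in total.
RTP_FIXED = struct.Struct("!BBHII")

def pack_rtp(sequence, timestamp, ssrc, payload_type, marker=False):
    """Build a fixed RTP header (version 2, no padding, extension or CSRC)."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_FIXED.pack(byte0, byte1, sequence & 0xFFFF,
                          timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def parse_rtp(packet):
    """Read the fixed header fields from the start of an RTP packet."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_FIXED.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

def count_lost(sequence_numbers):
    """Count missing packets from sequence numbers received in order.

    The modulo handles the wrap from 65535 back to 0.
    """
    lost = 0
    for prev, cur in zip(sequence_numbers, sequence_numbers[1:]):
        lost += (cur - prev - 1) % 65536
    return lost

if __name__ == "__main__":
    header = pack_rtp(sequence=7, timestamp=480, ssrc=0x1234, payload_type=96)
    print(parse_rtp(header))
    print(count_lost([65534, 65535, 1, 2]))  # packet 0 is missing
```

The receiver uses the sequence number to detect loss and reordering, and the timestamp to place each block of audio at the right moment on playback.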

A fifth Layer would be the AoIP – Audio over IP – application itself.
In Audio over Ethernet the audio signal is linear (uncompressed), but for the moment, given IP bandwidth availability constraints and costs, the signal generally has to be compressed for site-to-site transport. Linear audio gives a bit rate of 1.5 Mb/s to 4.5 Mb/s. A wide choice of coding algorithms is available in addition to linear (uncompressed) audio: G.722, CELP, MPEG 4 AAC, HE-AAC v2, AAC-LD, AAC-ELD, AEQ LD Extend, ADPCM, apt-X, HE apt-X, BRIC, MPEG 1/2 Layer 2/3, Music, Voice and probably many more…
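The bit rate of linear audio follows directly from the sample rate, the word length and the number of channels. The sketch below works through three common stereo formats. These formats are assumptions chosen for illustration (the article does not say which formats its figures refer to), and the results cover the audio payload only, without packet header overhead.

```python
def linear_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed (linear PCM) audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Hypothetical stereo formats used for illustration.
FORMATS = [
    ("44.1 kHz / 16-bit stereo", 44_100, 16, 2),
    ("48 kHz / 24-bit stereo", 48_000, 24, 2),
    ("96 kHz / 24-bit stereo", 96_000, 24, 2),
]

if __name__ == "__main__":
    for name, rate, bits, channels in FORMATS:
        mbps = linear_bit_rate(rate, bits, channels) / 1_000_000
        print(f"{name}: {mbps:.3f} Mb/s")
```

These three formats come to roughly 1.4, 2.3 and 4.6 Mb/s, the same order of magnitude as the range quoted above, which shows why a codec is usually needed on links with limited bandwidth.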

Figure 7 presents a setup using low-cost IP audio codecs (Encoder/Decoders) to enable monitoring of the Worship service (conducted in the Sanctuary) in several other rooms of the Church. This is a more efficient and flexible solution than the traditional 70-volt distribution system.

Especially in the AoE and AoIP fields, we do not recommend “plug and pray” engineering. For a digital audio Church project including Ethernet transport, it is really not advisable to adopt a “deploy and forget” attitude.

Information gathering, discussions with the Church board and users, good planning, proper architecture, and subsequent training are paramount for ensuring a successful state-of-the-art high-tech project. Operators need to be knowledgeable about Ethernet technology, in addition to analog and digital audio. Digital console manufacturers offer extensive practical documentation, good support and training. Audio-over-Ethernet is no longer a futuristic idea; it is successfully being developed and used in professional sound, including Houses of Worship.