Digital Audio: MIDI, MADI, and Everything In-Between

By tfwm

Signal formats, audio, video, data, digital, analog, computers and networks. Fiber optic cables, twisted pair, category 5, category 6. 220, 221, whatever it takes!

Who can keep everything straight and sort through the confusion? It can be especially difficult when trying to figure out what signal format is appropriate, or which signals might be compatible.

The analog audio segment in the January/February 2008 issue discussed mic level and line level for analog. Now we want to take those connections and convert them to digital audio and eventually back to analog.

The devices that typically do this conversion are usually called break-out-boxes. Break-out-boxes come in all shapes and sizes with various features. Some boxes support mic inputs with controllable preamps, while others may only have line level inputs. Most are some combination of microphone and line level inputs, line level outputs, digital inputs and outputs, control like MIDI, and headphone monitoring.

On the input side of a break-out-box, analog voltage is converted to a digital signal by sampling the analog signal's amplitude and converting each sample to a number that a computer can use to represent data. The number of sample points (the sample rate) and the resolution of each sample (the bit depth) determine how closely the samples represent the analog signal.
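As a rough illustration of the conversion step described above, the following Python sketch samples a sine wave and rounds each sample to a signed integer at a chosen resolution. The function names, the 1 kHz test tone, and the simple rounding scheme are assumptions made for illustration; real converters are considerably more involved.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Return amplitude samples (range -1.0 to 1.0) of a sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

def quantize(samples, bit_depth):
    """Round each sample to the nearest signed integer code for this bit depth."""
    max_code = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16-bit
    return [round(s * max_code) for s in samples]

def max_error(samples, codes, bit_depth):
    """Largest difference between the original samples and their integer codes."""
    max_code = 2 ** (bit_depth - 1) - 1
    return max(abs(s - c / max_code) for s, c in zip(samples, codes))

# One cycle of a 1 kHz tone sampled at 48 kHz (hypothetical test signal).
analog = sample_sine(1000, 48000, 48)
for bits in (8, 16, 24):
    codes = quantize(analog, bits)
    print(bits, "bit: largest rounding error =", max_error(analog, codes, bits))
```

Running it shows the rounding error shrinking as the resolution of each sample goes up, which is the sense in which more bits track the analog signal more closely.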

The sample rate is determined by doubling the highest frequency you want to capture. Human hearing is generally considered to span 20 Hz to 20,000 Hz (20 kHz). CD audio has a frequency response up to 22.05 kHz, and since you need to sample at two times the highest desired frequency, the sample rate is 44.1 kHz; that is, a sample of the audio is taken 44,100 times every second. Amplitude resolution, in digital terms, is referred to as bit depth, and it sets the available dynamic range. Common bit depths are 16-bit (for CDs), 18-bit, 20-bit, and 24-bit, with sample rates of 32 kHz, 44.1 kHz, 48 kHz, and 96 kHz in professional audio applications. All digital audio systems use bit depth and sample rate as the basis for audio quality and proper system performance.
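The arithmetic behind these numbers can be sketched in a few lines of Python. The function names are made up for this example, and the dynamic range figure is the theoretical value for an ideal linear PCM quantizer, not a measurement of any real device.

```python
import math

def highest_frequency_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of ideal linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bit_depth)

def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Raw audio data rate in bits per second, with no framing overhead."""
    return sample_rate_hz * bit_depth * channels

print(highest_frequency_hz(44100))      # 22050.0
print(round(dynamic_range_db(16), 1))   # 96.3
print(pcm_bit_rate(44100, 16, 2))       # 1411200
```

For stereo CD audio this works out to roughly 1.4 megabits per second of raw audio data, before any transport adds its own framing.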

After the analog audio signal is converted into a digital signal, it needs to be transported to a computer, a Digital Signal Processor (DSP), or another break-out-box in a different location for output. This is done through some form of digital audio transport.

Break-out-boxes have one or more digital input-output (I/O) connections. Popular digital I/O for recording is typically a standard computer-based connection like USB or FireWire (IEEE 1394). In professional sound reinforcement, the digital I/O can be computer-based Ethernet or an audio industry standard such as the Audio Engineering Society (AES) standards AES3 or Multichannel Audio Digital Interface (MADI). The digital I/O can also be a proprietary break-out-box and digital link, such as those from Aviom or Roland.

The topic of digital audio transport is an in-depth subject and impossible to cover in a single article. There are at least fifteen current audio networking technologies and new ones popping up all the time. However, let’s break this topic down into technology groups and discuss the most prominent technologies you are likely to encounter.

Physical Characteristics
The first step in evaluating the differences among the various technologies is to look at their physical characteristics and the rules for their use, often called protocols. The first thing to consider is the geometric arrangement and spatial relationship of the devices connected together, which describes a network topology. The common topologies include point-to-point, bus, star, ring, tree, and mesh.

The next step is to know the rules of connectivity for a given technology, for example what type of cable and connector, how many conductors, and what conductors go to which pins on the connectors. The rules also define the purpose of each conductor. Is the conductor used to transmit data, receive data, or provide electrical power for devices on the network? The protocol also provides the rules for cable length and how much data can pass through in a second.

The next category is the size of the network: Personal Area Network (PAN), Local Area Network (LAN), or Wide Area Network (WAN). PANs are local to a room and include Bluetooth, USB, and common media uses of FireWire. LANs connect offices together in a single facility, and Ethernet is the most widely used technology for LANs. WANs can be considered anything beyond the local area, and for our discussion that would include the Internet.

Point-to-point technologies
These technologies are used in audio applications and are defined by protocols, but are not always considered networking technologies. They include AES3, S/PDIF, and MADI. Each has a specific connector type and physical characteristic, but a key difference is how the bits, which are sampled at the break-out-box, get organized and transmitted. As the audio is sampled, depending on the specific technology, every so many bits get grouped, or framed. When the audio bits get framed, additional bits are added to provide information about the audio, such as the bit depth of the audio data, where it is coming from (source), and where it is headed (destination) once it leaves the break-out-box.
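To make the idea of framing concrete, here is a small Python sketch that groups samples and prepends a few bytes of information about them. The frame layout is entirely hypothetical and is not the AES3, S/PDIF, or MADI frame format; it only shows the general pattern of audio bits plus descriptive bits.

```python
import struct

# Hypothetical 4-byte header: bit depth, source ID, destination ID, sample count.
HEADER = struct.Struct(">BBBB")

def build_frame(samples, bit_depth, source_id, dest_id):
    """Group samples into one frame and prepend the descriptive header."""
    header = HEADER.pack(bit_depth, source_id, dest_id, len(samples))
    # Each sample is stored as a 32-bit signed integer for simplicity.
    payload = struct.pack(f">{len(samples)}i", *samples)
    return header + payload

def parse_frame(frame):
    """Recover the header fields and the samples from a frame."""
    bit_depth, source_id, dest_id, count = HEADER.unpack_from(frame, 0)
    samples = list(struct.unpack_from(f">{count}i", frame, HEADER.size))
    return {"bit_depth": bit_depth, "source": source_id,
            "destination": dest_id, "samples": samples}

frame = build_frame([1200, -873, 15, 0], bit_depth=24, source_id=1, dest_id=7)
print(len(frame), "bytes on the wire")
print(parse_frame(frame))
```

The receiving device reads the header first, which tells it how to interpret the audio bits that follow.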

AES is an acronym for Audio Engineering Society, an organization of professional audio engineers that provides sets of industry standards, among many other things. The AES standard for digital audio transport offers multiple physical connectivity options for the audio professional, including coax, optical, and shielded twisted pair. One of the most popular on professional audio gear is AES3, which uses a shielded twisted pair cable terminated by an XLR-3 connector, similar to a balanced mic connection. It transmits stereo data in one direction, from source to destination, just like analog audio. In addition to the physical characteristics, the standard defines the data frame that puts context around the audio bit stream.

Multichannel Audio Digital Interface (MADI) is also an AES standard. It is similar to AES3 except that it adds channel information to the frame so that it can be used in multichannel applications like multi-track recording, or as an interface between digital consoles and their break-out-boxes. MADI can distribute 32, 56, or 64 audio channels at sample rates from 32 kHz to 96 kHz, with a resolution (bit depth) of up to 24 bits.

S/PDIF (Sony/Philips Digital Interface Format) uses either RCA/Coax cable connectivity or optical connection. You will find this as a popular connection on consumer media systems like CD and DVD players, and Surround Sound Receivers.

Serial Bus Technologies
USB, or Universal Serial Bus, started as a common computer peripheral connection. It replaced the RS-232 communication port and the old parallel printer port, giving you a common interface to connect devices to your computer or to other digital devices. Many break-out-boxes today use USB to connect to computers for recording and editing media. USB can also carry power on a pair of conductors, which many mic preamp companies draw on to supply phantom power for condenser microphones.

USB used to be limited in the amount of data it could carry, so FireWire was the choice for connecting media devices to computers. FireWire was developed by Apple, and the IEEE (Institute of Electrical and Electronics Engineers) made it a standard, IEEE 1394. Media manufacturers like Sony and Yamaha use this technology under the names i.LINK and mLAN respectively. There are two versions of IEEE 1394, an a version and a b version. The most common today is IEEE 1394a, which one can find on many computers, video cameras, audio and video break-out-boxes, and DVD devices. The newer, higher-bandwidth IEEE 1394b uses a different connector than the previous version to get the full benefit. The key difference between a and b is greater data throughput.

CAT 5 and Ethernet
When most people think of networking, they think of data cables connecting devices together using CAT 5 cable and connectors commonly referred to as RJ 45 connectors. Beware: not all CAT 5 linked systems are Ethernet! They may use popular connectors and cable like CAT 5e and RJ 45s because these are plentiful in supply and cost effective (the good-n-plenty effect); however, this doesn’t make them compatible with Ethernet networking. Why?

The conductors may be used for different purposes that do not conform to the rules set up by the Ethernet standard. Numerous companies use CAT 5 cabling with their own rules, or proprietary protocol. The reason lies in the challenges of transporting digital audio, the biggest of which is latency versus bandwidth. All digital systems have latency, or delay, and as a rule of thumb, reducing latency increases the bandwidth (the amount of data throughput) required. This trade-off also affects the number of channels a transport can provide.
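One way to see the latency versus bandwidth trade-off is to model a transport that sends audio in packets, each carrying a fixed amount of overhead. The Python sketch below is a simplified model under assumed numbers: the 50-byte overhead per packet, the 8 channels, and the packet sizes are illustrative and do not describe any particular product.

```python
def packet_stream(sample_rate_hz, channels, bytes_per_sample,
                  samples_per_packet, overhead_bytes):
    """Return (buffering latency in ms, bits per second) for a packetized stream."""
    # The sender must wait for a full packet of samples before transmitting.
    latency_ms = 1000 * samples_per_packet / sample_rate_hz
    packets_per_second = sample_rate_hz / samples_per_packet
    packet_bytes = overhead_bytes + samples_per_packet * channels * bytes_per_sample
    return latency_ms, packets_per_second * packet_bytes * 8

# 8 channels of 24-bit audio at 48 kHz, with an assumed 50 bytes of overhead per packet.
for samples_per_packet in (256, 64, 16):
    latency_ms, bps = packet_stream(48000, 8, 3, samples_per_packet, 50)
    print(f"{samples_per_packet:4d} samples/packet: "
          f"{latency_ms:.2f} ms buffering, {bps / 1e6:.2f} Mbit/s")
```

Smaller packets mean less waiting, but the fixed overhead is paid more often, so the same audio needs more bandwidth. On a link of fixed capacity, that leaves room for fewer channels.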

Latency is the amount of time it takes for audio to enter the digital audio system and come out the other side. When you ask about latency, it is important to know whether the number quoted includes the analog-to-digital and digital-to-analog conversion. If not, you will need to add between 1 and 2 milliseconds to the time quoted. The amount of acceptable latency is a controversial subject; the correct amount depends on the application. Recording and in-ear monitoring require far less latency than distribution of background music or paging. It is best to seek out an experienced digital audio system designer or consultant to determine what will work best for your application.
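A simple latency budget can be tallied as in the Python sketch below. The function name, the default 2 millisecond conversion allowance (the upper end of the range above), and the example figures are assumptions for illustration; use the numbers quoted for your own equipment.

```python
def total_latency_ms(transport_ms, includes_conversion,
                     conversion_ms=2.0, processing_ms=0.0):
    """Add up end-to-end latency from a quoted transport figure."""
    total = transport_ms + processing_ms
    if not includes_conversion:
        # The quoted number left out A/D and D/A conversion, so add it here.
        total += conversion_ms
    return total

# Hypothetical example: a 5.33 ms transport quoted without conversion,
# plus 1.5 ms of digital signal processing.
print(round(total_latency_ms(5.33, includes_conversion=False, processing_ms=1.5), 2))
```

Comparing the total against what your application can tolerate is more useful than comparing quoted transport figures alone.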

Audio Networks and Digital Snakes
Although audio networks can technically include many of the digital snakes, an audio network can be defined as one that can operate with standard data networking equipment and can share the network with common computer devices like desktops and printers. Anything less is a digital audio snake.

Ethernet is the most popular LAN networking technology today. There is one important point to add here: TCP/IP and Ethernet are not the same technology. You can transport TCP/IP data over Ethernet, but you can also transport TCP/IP over FireWire. When talking about network requirements, it is important not to confuse Ethernet and TCP/IP.

The two most popular Ethernet audio technologies are CobraNet and EtherSound. Both of these technologies are compliant with the 802.3 Ethernet standard.

CobraNet was the first professional audio networking system that used standard Ethernet. It should also be noted that it is compatible with AES3 and MADI. It was a great feat because audio requires a synchronous signal and Ethernet is not a synchronous technology. The CobraNet protocol provided a timing mechanism so that audio could be transported on standard Ethernet technology. The advantage is flexible routing of digital audio over a low-cost transport, since Ethernet is already installed in virtually any modern building today. However, CobraNet had a fixed latency of just over five milliseconds. Some consultants felt that this latency was too large once you added digital signal processing, which also adds latency.

EtherSound was developed to address the latency issue by providing much lower latency, more bit depth, and a higher sampling frequency than CobraNet could provide at the time of its release. To accomplish this, EtherSound has a clocking system that could be considered obtrusive to regular data, and more bit depth and a higher sample rate also require more network bandwidth. The better digital audio specs create a challenge for other data on the network. To solve this issue, EtherSound requires that a Virtual Local Area Network (VLAN) be deployed on the network. A VLAN provides a method to isolate the EtherSound data from traditional computer data.

Which one of these systems should you choose?
You need to consider whether all of the audio devices you want in your new system support the network technology natively. Native support could reduce the cost of break-out-boxes, which can also add system latency if your audio makes several hops on and off the network. How the technology allows you to route audio should be another consideration.

Digital audio snake systems don’t use standard networking devices, even though they use CAT 5 cables to interconnect their proprietary break-out-boxes. However, many of these technologies provide lower latency, with some as low as 125 microseconds, and higher bit depth, sample rate, and/or channel count.

It is also possible to combine technologies. For instance, you can use CobraNet for FOH and distribution for recording services with a Yamaha digital console while using Aviom for in-ear monitoring in the same system. This provides you with a flexible distribution system using CobraNet and lower latency for optimum in-ear monitoring. That said, one could argue that 1.33 milliseconds, the figure for CobraNet's lowest latency mode, is low enough for in-ear monitoring, depending on other processing latency.

Conclusion
Which technology you use should be determined by the application, latency constraints, number of channels required, and whether or not you plan to use your facility data network. You should also consider which other manufacturers support direct connection to the transport of your choice.