IP Voice Bandwidth and Latency

Author’s Note: This application note assumes the reader has basic networking knowledge and an understanding of the OSI Network Model. Please familiarize yourself with the tables in Section 5 before reading further, as they are referred to often.

1.0 Bandwidth and Latency

One good reason to use IP telephony is that it can share your existing data network infrastructure. This can be a great benefit to a company if it is managed correctly, but bandwidth in an IP environment requires a careful balance. If at any point the traffic on the network exceeds the available bandwidth, users will experience delay, also known as latency. Traditional applications on an IP data network can cope with this latency: a person waiting for a web page to download will accept a certain amount of wait time. Voice traffic is different. Voice is a real-time application, and it is sensitive to latency. If the end-to-end voice latency becomes too long (250 ms, for example), the conversation begins to sound like parties talking on walkie-talkies, or worse. It is also important to remember that packets can be lost. IP is a best-effort networking protocol: the network will try its best to deliver your information, but there is no guarantee.

Delay is the time required for a signal to traverse the network. In a telephony context, end-to-end delay is the time required for a signal generated at the talker’s mouth to reach the listener’s ear. End-to-end delay is therefore the sum of all the delays at the different network devices and across the network links through which the voice traffic passes. Many factors contribute to end-to-end delay; these are covered next. IP network delay is determined primarily by the buffering, queuing, and switching or routing delay of IP routers. Specifically, IP network delay is composed of the following:

Packet Capture Delay

Packet capture delay is the time required to receive the entire packet before the router processes and forwards it. This delay is determined by the packet length and the transmission speed of the link. Using short packets over high-speed trunks can easily shorten the delay, but it may decrease network efficiency, because the fixed headers make up a larger share of each short packet.
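
Because packet capture delay is simply the packet length divided by the link speed, it is easy to estimate. The sketch below is a minimal illustration that assumes a plain serialization model (only the byte count you pass in is counted); the function name and the two example links, a 56 Kbps line and a 1.544 Mbps T1, are chosen for illustration only.

```python
def capture_delay_ms(packet_bytes: int, link_kbps: float) -> float:
    """Time to receive a whole packet from a link, in milliseconds.

    1 Kbps is 1 bit per millisecond, so bits / Kbps yields milliseconds.
    """
    return packet_bytes * 8 / link_kbps


# Hypothetical links: a 56 Kbps line and a 1.544 Mbps T1.
for link_kbps in (56, 1544):
    # 360 bytes: one G.711 / 40 ms voice packet (Section 2.0).
    # 1500 bytes: a full-size data packet.
    for packet_bytes in (360, 1500):
        delay = capture_delay_ms(packet_bytes, link_kbps)
        print(f"{packet_bytes:4d} bytes at {link_kbps:4d} Kbps: {delay:7.2f} ms")
```

On the 56 Kbps line, a single 1500-byte data packet takes roughly 214 ms to capture, which on its own exceeds the 150 ms rule of thumb in Section 2.0; on the T1 the same packet takes under 8 ms.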

Switching/Routing Delay

Switching/routing delay is the time the router takes to switch the packet: to analyze the packet header, check the routing table, and route the packet to the output port. This delay depends on the architecture of the routing engine and the size of the routing table. Newer IP switches can significantly speed up the routing process by making routing decisions and forwarding traffic in hardware rather than in software.

Queuing Time

Due to the statistical multiplexing nature of IP networks and the asynchronous nature of packet arrivals, some queuing, and therefore some delay, is unavoidable at the input and output ports of a packet switch. This delay is a function of the traffic load on the switch, the length of the packets, and the statistical distribution of traffic over the ports. Designing very large router and link capacities can reduce this delay but cannot eliminate it completely.
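
Since IP network delay is the sum of these components at every router along the path, plus the time spent crossing the links, a delay budget can be tallied hop by hop. This is a minimal sketch: the `Hop` class, the per-hop numbers, and the single lumped link-delay figure are hypothetical placeholders to be replaced with measured or estimated values, and it covers only the network portion of end-to-end delay.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hop:
    """Delay contributed by one router, in milliseconds (hypothetical values)."""
    capture_ms: float    # packet capture delay
    switching_ms: float  # switching/routing delay
    queuing_ms: float    # queuing time

    def total_ms(self) -> float:
        return self.capture_ms + self.switching_ms + self.queuing_ms


def network_delay_ms(hops: list[Hop], link_ms: float) -> float:
    """Sum the delay at every router plus the time spent crossing the links."""
    return sum(hop.total_ms() for hop in hops) + link_ms


BUDGET_MS = 150  # rule-of-thumb limit from Section 2.0

# Hypothetical three-router path; the middle hop receives a 360-byte
# packet over a slow 56 Kbps link, so its capture delay is large.
path = [
    Hop(capture_ms=2.0, switching_ms=0.5, queuing_ms=5.0),
    Hop(capture_ms=51.4, switching_ms=0.5, queuing_ms=20.0),
    Hop(capture_ms=2.0, switching_ms=0.5, queuing_ms=5.0),
]
total = network_delay_ms(path, link_ms=10.0)
print(f"network delay: {total:.1f} ms, within budget: {total <= BUDGET_MS}")
```

Summing per hop makes it easy to see which router dominates the budget; in this made-up path, the slow link's capture and queuing delay account for most of the total.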

2.0 IP Voice Bandwidth Requirements

A good rule of thumb for a VoIP network is that users will not tolerate more than 150 ms of one-way delay. You can get a rough measure of delay in the network with the ping command. From a command prompt on a Microsoft® Windows machine, ping another computer in the network. The response includes a field called “time”, which records the round-trip time: how long the packet took to reach the target and for the reply to come back. It will most likely read “time<10ms”, meaning the round trip took less than 10 milliseconds. Now take this one step further: ping the same machine with a larger packet (for example, “ping -l 1500 <host>”), which sends 1500 bytes of data in each request. The time may or may not change, depending on your network. 1500 bytes is the standard MTU (Maximum Transmission Unit) for Ethernet, the largest IP packet a standard Ethernet frame can carry (with the ICMP and IP headers added, a 1500-byte ping is slightly larger than the MTU and is fragmented). This is usually far more than a voice packet needs. For example, as listed in Table 5.1, a packet using the G.711 CODEC with a 40 ms filler time has an IP header plus data payload of 360 bytes. Be aware, though, that 360 bytes represents a single IP packet carrying 40 ms of voice plus its headers. As more simultaneous calls are added, the bandwidth requirements increase.
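
The 360-byte figure follows from the Table 5.1 definitions: the voice sample size is VSS = (VSI * CBW) / 8, and the IP, UDP, and RTP headers add 40 bytes. The sketch below assumes G.711 at 64 Kbps and counts only the IP-level packet (no layer 2 header, as in Table 5.1); it checks that packet against the 1500-byte Ethernet MTU and shows how bandwidth scales as simultaneous calls are added. The function names are illustrative.

```python
ETHERNET_MTU_BYTES = 1500     # standard Ethernet MTU
IP_UDP_RTP_HEADER_BYTES = 40  # 20 (IP) + 8 (UDP) + 12 (RTP)


def voice_sample_bytes(fill_ms: float, codec_kbps: float) -> float:
    """VSS = (VSI * CBW) / 8, from Table 5.1."""
    return fill_ms * codec_kbps / 8


def ip_packet_bytes(fill_ms: float, codec_kbps: float) -> float:
    """One voice packet at the IP layer: headers plus voice sample."""
    return IP_UDP_RTP_HEADER_BYTES + voice_sample_bytes(fill_ms, codec_kbps)


def ip_bandwidth_kbps(fill_ms: float, codec_kbps: float) -> float:
    """Bits per packet divided by milliseconds per packet gives Kbps."""
    return ip_packet_bytes(fill_ms, codec_kbps) * 8 / fill_ms


packet = ip_packet_bytes(fill_ms=40, codec_kbps=64)  # G.711, 40 ms fill time
print(f"G.711/40 ms packet: {packet:.0f} bytes, fits in MTU: {packet <= ETHERNET_MTU_BYTES}")

per_call = ip_bandwidth_kbps(fill_ms=40, codec_kbps=64)
for calls in (1, 5, 10):
    print(f"{calls:2d} simultaneous calls: {calls * per_call:.0f} Kbps")
```

A single call uses only a fraction of the MTU, but each call adds the same 72 Kbps, so ten G.711 calls at this fill time need about 720 Kbps before any layer 2 overhead is counted.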

3.0 Circuit Switched vs. Packet Switched

Traditional PBXs use circuit-switched networks. This means that every call gets its own guaranteed 64 Kbps circuit. The downside of this type of network is that the number of circuits and the amount of bandwidth required grow in direct proportion to the number of simultaneous calls. Furthermore, when no voice traffic is present (for example, a connection is made between two locations but no voice information is transmitted), the bandwidth between the locations remains locked to the voice call. Some would consider this wasted bandwidth. On the other side is the packet-switched network. Packet switched means that every packet that comes in is sent down the same pipeline, be it T1, ISDN, dial-up, etc. In its most basic form, this is a first-come, first-served network: whichever packet arrives first, be it data or voice, is the first packet transmitted. This is a more efficient use of the bandwidth. A file download can take all the bandwidth if needed and significantly increase its speed when no other traffic is competing with it. The downside of a packet-switched network is that when the network becomes busy, delays can be experienced, and this latency is very bad for voice calls.

4.0 Latency Effects on Voice, and How to Handle It

A network user will usually notice latency in the form of delay. For example, if one tries to download a file via FTP on a busy network, the transfer will take longer than if the network were not congested, because the file download has to wait its turn to go down the data pipeline. With file downloads and web pages (i.e., non-real-time applications), the user will usually tolerate a great deal of delay. After all, they can walk away from their computer while the file downloads if they want to. Voice, on the other hand, is not as forgiving as file downloading or web surfing. Voice demands its bandwidth now. If the bandwidth is not available, voice quality degrades greatly.

As stated previously, users will rarely tolerate more than 150 ms of one-way delay. Delay does not affect the sound of the voice directly, but instead affects the character of a conversation. Below 100 ms, most users will not notice the delay. Between 100 ms and 300 ms, users will notice a slight hesitation in their partner’s response. This hesitation can affect how each listener perceives the mood of the conversation: conversations can seem “cold,” interruptions become more frequent, and the conversation falls out of rhythm. Beyond about 150 ms, calls lose the conversational quality that users are used to on circuit-switched networks.

The following techniques can help give users the same quality they expect from a circuit-switched network, but in a shared IP environment.

4.1 Increase the Available Bandwidth

This is sometimes the most basic and easiest solution. A user running the G.711 CODEC with a 10 ms fill time over Ethernet needs about 123.2 Kbps of bandwidth for just one call (see the worked example under Table 5.2). If that user has only a 56 Kbps line, they will not be able to have a decent IP voice call. Simply increasing the available bandwidth to slightly more than the 123.2 Kbps requirement will dramatically improve voice quality. This solution might not be viable if no more bandwidth is available.

4.2 Use a Different CODEC

The CODEC determines which compression algorithm, if any, is applied to the voice. Let’s take the example above again. The user wants only one voice line over a 56 Kbps data connection and wants to keep the current fill time of 10 ms. So let’s change to G.729 CS-ACELP. Now one line requires only 40 Kbps (see Table 5.1), which fits well within the 56 Kbps of available bandwidth. The bandwidth savings come from compressing the voice so that it needs less than 64 Kbps. This solution might not work for various reasons. For example, fax machines can only use the G.711 CODEC, or the routers in the user’s network might not support the other CODEC.
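
The comparison can be reproduced with the Table 5.1 bandwidth formula, Bandwidth = (1/Filler Time * 320) + CODEC Bandwidth, which counts the 40-byte IP/UDP/RTP header but no layer 2 header. This is a minimal sketch, assuming the nominal CODEC rates of 64 Kbps for G.711 and 8 Kbps for G.729; the dictionary and function names are illustrative.

```python
LINK_KBPS = 56.0
FILL_MS = 10.0

# Nominal CODEC bit rates in Kbps.
CODECS = {"G.711": 64.0, "G.729 CS-ACELP": 8.0}


def ip_bandwidth_kbps(codec_kbps: float, fill_ms: float) -> float:
    """Table 5.1: 320 bits of IP/UDP/RTP header per packet, plus the CODEC rate."""
    return 320.0 / fill_ms + codec_kbps


results = {}
for name, codec_kbps in CODECS.items():
    needed = ip_bandwidth_kbps(codec_kbps, FILL_MS)
    results[name] = needed
    print(f"{name}: {needed:.0f} Kbps per call, "
          f"fits on {LINK_KBPS:.0f} Kbps link: {needed <= LINK_KBPS}")
```

With a 10 ms fill time, G.711 needs 96 Kbps at the IP layer and does not fit, while G.729 needs 40 Kbps and does.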

4.3 Increase the User’s Packet Fill Time

To continue the example above, the user has moved to the G.729 CODEC but now wants to add a second voice line. The current 56 Kbps line handles one call just fine, because that call needs only 40 Kbps, but a second line would bring the total to 80 Kbps. The user can increase the fill time to 20 ms, so that each call takes only 24 Kbps: 2 x 24 = 48 Kbps, well within the available bandwidth. The user could increase the fill time further to 40 ms and add a third line if desired: 3 x 16 = 48 Kbps. The bandwidth savings come from the fact that with a longer fill time, fewer packets are needed to send the voice, so less header information needs to be attached and transmitted. The fill time is the amount of time that the voice is sampled (recorded) before it is loaded into a packet for transmission. 10 ms is often considered a good fill time because if a packet is lost (remember that IP is a best-effort network), the loss usually isn’t noticeable to the user. The longer the fill time, the more apparent lost packets will be to users.
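
The arithmetic above can be checked with a short loop over fill times, again using the Table 5.1 formula (IP/UDP/RTP headers only, no layer 2 header). The sketch also prints how much of each call’s bandwidth is spent on headers, which is where the savings come from. The 56 Kbps link and G.729 rate come from the running example; the names are illustrative.

```python
LINK_KBPS = 56
G729_KBPS = 8
HEADER_BITS = 320  # 40-byte IP/UDP/RTP header per packet


def ip_bandwidth_kbps(codec_kbps: float, fill_ms: float) -> float:
    return HEADER_BITS / fill_ms + codec_kbps


max_calls_by_fill = {}
for fill_ms in (10, 20, 40):
    packets_per_second = 1000 / fill_ms
    header_kbps = HEADER_BITS / fill_ms          # header overhead per call
    per_call = ip_bandwidth_kbps(G729_KBPS, fill_ms)
    max_calls = int(LINK_KBPS // per_call)       # whole calls that fit
    max_calls_by_fill[fill_ms] = max_calls
    print(f"{fill_ms} ms fill: {packets_per_second:.0f} packets/s, "
          f"{header_kbps:.0f} Kbps of headers, {per_call:.0f} Kbps per call, "
          f"{max_calls} call(s) on {LINK_KBPS} Kbps")
```

Doubling the fill time halves the packet rate and therefore halves the header overhead, while the 8 Kbps of voice stays the same.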

4.4 Change Layer 2 Protocols

Ethernet is the most common layer 2 protocol for carrying IP packets. Unfortunately, Ethernet has a fairly large overhead of 34 bytes, so every IP voice packet sent over Ethernet has 34 bytes of Ethernet header attached to it. As the number of packets adds up, this header data can become significant. Frame Relay has a 7-byte header and the Point-to-Point Protocol (PPP) has a 6-byte header. With this shorter layer 2 header, significant bandwidth savings can be achieved. The downside is that many networks may not have these services available, whereas Ethernet is very widely used. Frame Relay and PPP also have their own specific difficulties. Frame Relay usually has a Committed Information Rate (CIR), which means nothing above that rate is guaranteed. PPP is a connection-oriented technology that requires a connection between the two endpoints to be established before any data can be sent. NEC strongly advises users to research other layer 2 protocols thoroughly before implementing them in their voice network.
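
The effect of the layer 2 header can be estimated by adding it to the IP datagram and dividing by the fill time, as in the calculations for Tables 5.2 to 5.4. This sketch assumes the header sizes given above (Ethernet 34, Frame Relay 7, PPP 6 bytes) and a G.729 call with a 20 ms fill time; it ignores any other link-level overhead, and the names are illustrative.

```python
IP_UDP_RTP_HEADER_BYTES = 40
LAYER2_HEADER_BYTES = {"Ethernet": 34, "Frame Relay": 7, "PPP": 6}


def link_bandwidth_kbps(l2_header_bytes: int, codec_kbps: float, fill_ms: float) -> float:
    """(layer 2 header + IP datagram) bytes per packet, converted to Kbps."""
    voice_sample = fill_ms * codec_kbps / 8              # VSS from Table 5.1
    ip_datagram = IP_UDP_RTP_HEADER_BYTES + voice_sample
    return (l2_header_bytes + ip_datagram) * 8 / fill_ms


per_protocol = {
    name: link_bandwidth_kbps(header, codec_kbps=8, fill_ms=20)
    for name, header in LAYER2_HEADER_BYTES.items()
}
for name, kbps in per_protocol.items():
    print(f"G.729 / 20 ms over {name}: {kbps:.1f} Kbps")
```

For this call, moving from the 34-byte Ethernet header to the 6-byte PPP header cuts the per-call requirement from 37.6 Kbps to 26.4 Kbps, roughly 30%.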

4.5 Implement Quality of Service (QOS)

Now assume a variation of the above example. The user needs only one voice line over the 56 Kbps connection, using G.729 CS-ACELP with a 10 ms fill time, which requires 40 Kbps of the available bandwidth. Let us also assume that this line carries data traffic at certain times of the day. This data traffic is very light, only about 10 Kbps during most of the day, but it spikes to 50 Kbps at certain points. This data is not time sensitive like the voice traffic, so if necessary it can be made to wait. The user can therefore implement Quality of Service (QOS) on the IP network. In its most basic form, QOS marks certain IP packets as more important than others. Here, the user would configure the 56 Kbps line so that IP packets carrying voice receive a higher priority than packets without voice. This allows the network devices to move voice packets to the front of the line, ahead of other data, so that call quality is not compromised.

Two types of QOS can be applied to voice packets: IP Precedence and Differentiated Services (Diff-Serv). (Note: With the release of the IPW-2U, VLAN will be supported.) IP Precedence uses 3 bits of the Type of Service (ToS) field in the IP header to classify data, so voice can be given a classification different from other data. With this extra information, network devices can give data classified as voice a higher priority and move it to the head of the line when transmitting. Differentiated Services (Diff-Serv) is similar to IP Precedence in that it categorizes traffic flows (sets of IP packets, in this case). Based on information in the IP header, categories of micro-flows can be established and given different service levels through weighted priority queuing. For a more complete understanding of QOS in NEC’s IP telephony environment, please see application note AN2900-01-001, Designing the IP Telephony Network.
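
To make the two marking schemes concrete: IP Precedence reads the three high-order bits of the ToS byte in the IP header, and Diff-Serv reads the six high-order bits of the same byte as a code point (DSCP). The sketch below is a simplified, hypothetical output queue that uses strict priority (voice always first) rather than the weighted queuing a real Diff-Serv router may apply. The class name, the precedence-5 threshold, and the sample ToS values (0xB8, which is DSCP 46, Expedited Forwarding, and 0xA0, which is precedence 5) are assumptions for illustration.

```python
from collections import deque

VOICE_MIN_PRECEDENCE = 5  # assumed policy: precedence 5 or higher is voice


def ip_precedence(tos_byte: int) -> int:
    """IP Precedence: the 3 high-order bits of the ToS byte."""
    return (tos_byte >> 5) & 0b111


def dscp(tos_byte: int) -> int:
    """Diff-Serv code point: the 6 high-order bits of the same byte."""
    return (tos_byte >> 2) & 0b111111


class PriorityOutputQueue:
    """Strict-priority scheduler: queued voice packets always leave before data."""

    def __init__(self) -> None:
        self.voice = deque()
        self.data = deque()

    def enqueue(self, packet: str, tos_byte: int) -> None:
        if ip_precedence(tos_byte) >= VOICE_MIN_PRECEDENCE:
            self.voice.append(packet)
        else:
            self.data.append(packet)

    def dequeue(self):
        """Return the next packet to transmit, or None if both queues are empty."""
        if self.voice:
            return self.voice.popleft()
        if self.data:
            return self.data.popleft()
        return None


queue = PriorityOutputQueue()
queue.enqueue("data-1", 0x00)
queue.enqueue("data-2", 0x00)
queue.enqueue("voice-1", 0xB8)  # DSCP 46 (Expedited Forwarding), precedence 5
queue.enqueue("data-3", 0x00)
queue.enqueue("voice-2", 0xA0)  # precedence 5, DSCP 40

sent = []
while (packet := queue.dequeue()) is not None:
    sent.append(packet)
print(sent)
```

In this sketch the voice packets jump ahead of the data that arrived before them. A packet marked with DSCP 46 is also classified as voice by the precedence-only check, because the top three bits of 46 (binary 101110) are 101, precedence 5.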

5.0 Tables

Table 5.1: IP Voice Bandwidth Per Call/Line


Notes: This table includes the IP (Internet Protocol), UDP (User Datagram Protocol), and RTP (Real-time Transport Protocol) headers in its Header Length (20 + 8 + 12 = 40 bytes). G.723.1 ACELP and G.723.1 MP-MLQ support only 30-millisecond sampling.

Calculations:
Data payload per packet in bytes: VSS = (VSI * CBW) / 8
VSS: Voice Sample Size in bytes
VSI: Voice Sample Interval in milliseconds
CBW: CODEC Bandwidth in Kbps
Bandwidth (Kbps) = (1 / Filler Time in milliseconds * 320) + CODEC Bandwidth
where 320 is the 40-byte IP/UDP/RTP Header Length expressed in bits.
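
Because 320 is the 40-byte header in bits, the bandwidth formula is equivalent to (40 + VSS) * 8 / Filler Time. The sketch below computes Table 5.1-style rows from the definitions above and checks that the two forms agree. The CODEC list and fill times are illustrative, limited to G.711 (64 Kbps) and G.729 (8 Kbps), whose voice samples divide evenly into bytes at these intervals.

```python
import math

HEADER_BYTES = 40  # IP (20) + UDP (8) + RTP (12)
CODECS_KBPS = {"G.711": 64, "G.729": 8}


def vss_bytes(vsi_ms: float, cbw_kbps: float) -> float:
    """VSS = (VSI * CBW) / 8."""
    return vsi_ms * cbw_kbps / 8


def bandwidth_kbps(filler_ms: float, cbw_kbps: float) -> float:
    """Bandwidth = (1 / Filler Time * 320) + CODEC Bandwidth."""
    return (1 / filler_ms * 320) + cbw_kbps


rows = []
for codec, cbw in CODECS_KBPS.items():
    for filler_ms in (10, 20, 40):
        vss = vss_bytes(filler_ms, cbw)
        direct = (HEADER_BYTES + vss) * 8 / filler_ms  # bits per packet / ms per packet
        formula = bandwidth_kbps(filler_ms, cbw)
        assert math.isclose(direct, formula)            # both forms must agree
        rows.append((codec, filler_ms, vss, HEADER_BYTES + vss, formula))

print(f"{'CODEC':6} {'Fill ms':>7} {'VSS':>5} {'IP pkt':>6} {'Kbps':>6}")
for codec, filler_ms, vss, packet, kbps in rows:
    print(f"{codec:6} {filler_ms:7d} {vss:5.0f} {packet:6.0f} {kbps:6.1f}")
```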

Table 5.2: Ethernet (802.3) Voice Bandwidth Per Call/Line


Notes: This table includes the IP (Internet Protocol), UDP (User Datagram Protocol), and RTP (Real-time Transport Protocol) headers in its Header Length. G.723.1 ACELP and G.723.1 MP-MLQ support only 30-millisecond sampling.

Calculations:
Ethernet frame size = Ethernet Header (34 bytes) + IP Header (20 bytes) + UDP Header (8 bytes) + RTP Header (12 bytes) + VSS
IP Header + UDP Header + RTP Header = Header Length in Table 5.1 (40 bytes)
IP Datagram = Header Length in Table 5.1 (40 bytes) + VSS
VSS: Voice Sample Size in bytes, from Table 5.1
Bandwidth = (Ethernet Header Length in bytes + IP Datagram in bytes) / Filler Time in milliseconds; multiply by 1000, then by 8, then divide by 1000 to get Kbps

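For example, for G.711 with a 10 ms filler time (VSS = 80 bytes, so the IP Datagram is 40 + 80 = 120 bytes): ((((34 + 120) / 10) * 1000) * 8) / 1000 = 123.2 Kbps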
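
The Table 5.2 procedure, including its multiply-by-1000, multiply-by-8, divide-by-1000 unit conversion, can be followed step by step in code. This is a minimal sketch, assuming the 34-byte Ethernet header stated above and an IP Datagram of 40 header bytes plus the voice sample; it reproduces the 123.2 Kbps worked example and then applies the same steps to other G.711 and G.729 fill times. The function names are illustrative, and the extra rows are what the stated formula gives, not values copied from the table.

```python
ETHERNET_HEADER_BYTES = 34
IP_UDP_RTP_HEADER_BYTES = 40


def ip_datagram_bytes(filler_ms: float, cbw_kbps: float) -> float:
    """IP + UDP + RTP headers plus the voice sample (VSS) from Table 5.1."""
    return IP_UDP_RTP_HEADER_BYTES + filler_ms * cbw_kbps / 8


def ethernet_bandwidth_kbps(filler_ms: float, cbw_kbps: float) -> float:
    frame_bytes = ETHERNET_HEADER_BYTES + ip_datagram_bytes(filler_ms, cbw_kbps)
    bytes_per_ms = frame_bytes / filler_ms
    bytes_per_s = bytes_per_ms * 1000  # multiply by 1000
    bits_per_s = bytes_per_s * 8       # then again by 8
    return bits_per_s / 1000           # then divide by 1000 to get Kbps


# Worked example: G.711 (64 Kbps) with a 10 ms filler time, IP Datagram = 120 bytes.
print(f"G.711 / 10 ms over Ethernet: {ethernet_bandwidth_kbps(10, 64):.1f} Kbps")

for codec, cbw in (("G.711", 64), ("G.729", 8)):
    for filler_ms in (10, 20, 40):
        print(f"{codec} / {filler_ms} ms: {ethernet_bandwidth_kbps(filler_ms, cbw):.1f} Kbps")
```

Because the multiply-by-1000 and divide-by-1000 steps cancel, the procedure reduces to bytes per millisecond times 8; Tables 5.3 and 5.4 follow the same steps with a different layer 2 header length.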

Table 5.3: Frame Relay Voice Bandwidth Per Call/Line

Notes: This table includes the IP (Internet Protocol), UDP (User Datagram Protocol), and RTP (Real-time Transport Protocol) headers in its Header Length.

Calculations:
Data payload per packet in bytes: VSS = (VSI * CBW) / 8
VSS: Voice Sample Size in bytes
VSI: Voice Sample Interval in milliseconds
CBW: CODEC Bandwidth in Kbps
Bandwidth = (Frame Relay Header Length in bytes (7 bytes) + IP Datagram in bytes) / Filler Time in milliseconds; multiply by 1000, then by 8, then divide by 1000 to get Kbps

Table 5.4: PPP (Point to Point Protocol) Voice Bandwidth Per Call/Line

Notes: This table includes the IP (Internet Protocol), UDP (User Datagram Protocol), and RTP (Real-time Transport Protocol) headers in its Header Length.

Calculations:
Data payload per packet in bytes: VSS = (VSI * CBW) / 8
VSS: Voice Sample Size in bytes
VSI: Voice Sample Interval in milliseconds
CBW: CODEC Bandwidth in Kbps
Bandwidth = (PPP Header Length in bytes (6 bytes) + IP Datagram in bytes) / Filler Time in milliseconds; multiply by 1000, then by 8, then divide by 1000 to get Kbps
