An IP packet consists of a payload and some amount of overhead. The payload is the actual sampled voice; the overhead consists of the headers and trailers that steer the packet to its proper destination. The overhead due to IP, UDP, and RTP is 40 bytes, while the Ethernet overhead is between 18 and 22 bytes (18 is assumed in this discussion). This represents a total overhead of 58 bytes (464 bits), regardless of the nature of the payload. Because Layer 2 (Ethernet) overhead is included in that total, the calculations in this example are for bandwidth on a LAN. Layer 2 headers are replaced at every router boundary, and because WAN protocol Layer 2 headers (for example, PPP) are generally smaller than Ethernet headers, WAN bandwidth is typically less than LAN bandwidth.
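The overhead arithmetic above can be checked with a minimal sketch. The function and constant names are illustrative only; the byte counts are the ones stated in the text, and the 22-byte case is the upper end of the Ethernet range mentioned above.

```python
# Header sizes in bytes, as stated in the text.
IP_UDP_RTP_BYTES = 40
ETHERNET_BYTES = 18  # the text gives a range of 18 to 22; 18 is assumed here


def overhead_bits(l3_l4_bytes=IP_UDP_RTP_BYTES, l2_bytes=ETHERNET_BYTES):
    """Return the per-packet overhead in bits (8 bits per byte)."""
    return (l3_l4_bytes + l2_bytes) * 8


print(overhead_bits())             # 58 bytes -> 464 bits
print(overhead_bits(l2_bytes=22))  # 62 bytes -> 496 bits
```

Passing a smaller `l2_bytes` value is one way to approximate a WAN link, although the actual Layer 2 header size depends on the protocol in use.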
The size of the payload depends on parameters of the codec being used. The two most common codecs used with Communication Manager products are (uncompressed) G.711 and (compressed) G.729. The transmission rates associated with those codecs are 64 kbps for G.711 (8,000 samples per second, the Nyquist rate for a 4-kHz voice band, at 8 bits per sample) and 8 kbps for G.729.
The packet size is sometimes expressed in units of time (specifically, in milliseconds of voice per packet). The following formula yields the payload size per packet, expressed in bits:
number of bits of payload per packet = transmission rate (kbps) x milliseconds per packet
Table 1, which was populated using this formula, gives the payload size per packet (in bits) as a function of packet size (milliseconds per packet) and codec:
Table 1: Payload size per packet
Packet Size | G.711 | G.729 |
10 ms | 640 bits | 80 bits |
20 ms | 1280 bits | 160 bits |
30 ms | 1920 bits | 240 bits |
60 ms | 3840 bits | 480 bits |
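The payload values in Table 1 can be reproduced with a short sketch of the formula. The names `CODEC_RATE_KBPS` and `payload_bits` are illustrative, not part of any product API; the rates and packet sizes are those given in the text.

```python
# Codec transmission rates in kbps, as given in the text.
CODEC_RATE_KBPS = {"G.711": 64, "G.729": 8}


def payload_bits(rate_kbps, packet_ms):
    """Payload bits per packet: kbps x ms = bits."""
    return rate_kbps * packet_ms


# Print one row per packet size, in the same column order as Table 1.
for ms in (10, 20, 30, 60):
    cells = [f"{payload_bits(rate, ms)} bits" for rate in CODEC_RATE_KBPS.values()]
    print(f"{ms} ms | " + " | ".join(cells))
```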
Note that the number of bits of payload per packet depends on the packet size, but it is independent of the sizes of the individual frames contained in that packet. For example, a packet size of 60 ms could refer to six 10-ms frames per packet, three 20-ms frames per packet, or two 30-ms frames per packet. Presently, the most commonly used packet size is 20 ms. Both G.711 and G.729 codecs typically use two 10-ms frames per packet.
As stated earlier, there is typically an overhead of 464 bits per packet in a LAN scenario. The bandwidth (expressed in kbps) of a unidirectional media stream, assuming no Silence Suppression is used, is therefore higher than the codec rates of 64 kbps and 8 kbps (for G.711 and G.729, respectively) once this overhead is accounted for. The results are provided in Table 2:
Table 2: Typical LAN bandwidth requirements for media streams
Packet Size | G.711 | G.729 |
10 ms | 110.4 kbps | 54.4 kbps |
20 ms | 87.2 kbps | 31.2 kbps |
30 ms | 79.5 kbps | 23.5 kbps |
60 ms | 71.7 kbps | 15.7 kbps |
The kbps values in Table 2 were calculated by multiplying the transmission rate by the ratio of the total bits per packet (payload plus overhead) to the payload bits per packet. For example, for the G.711 codec, 20-ms packets, and 58 bytes (464 bits) of overhead per packet, the bandwidth per call is
(64 kbps)[(1280 + 464) / 1280] = 87.2 kbps
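The same calculation can be sketched in code for any codec rate and packet size. The function name and default argument are illustrative; the 464-bit default is the LAN overhead assumed throughout this discussion.

```python
OVERHEAD_BITS = 464  # 58 bytes of IP/UDP/RTP plus Ethernet overhead, per the text


def stream_bandwidth_kbps(rate_kbps, packet_ms, overhead_bits=OVERHEAD_BITS):
    """Bandwidth of one unidirectional media stream, in kbps.

    Scales the codec rate by (payload + overhead) / payload.
    """
    payload = rate_kbps * packet_ms  # payload bits per packet
    return rate_kbps * (payload + overhead_bits) / payload


# Print one row per packet size for G.711 (64 kbps) and G.729 (8 kbps).
for ms in (10, 20, 30, 60):
    g711 = stream_bandwidth_kbps(64, ms)
    g729 = stream_bandwidth_kbps(8, ms)
    print(f"{ms} ms | {g711:.1f} kbps | {g729:.1f} kbps")
```

Algebraically, the expression simplifies to the codec rate plus the overhead bits divided by the packet size in milliseconds (for example, 64 + 464/20 = 87.2 kbps), which shows why the overhead penalty shrinks as the packet size grows.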
Note that the entries in Table 2 correspond to unidirectional media streams. A full-duplex connection with a capacity (in kbps) at least as large as the value in a given table cell is sufficient to carry a two-way voice stream using the corresponding codec, packet size, and packet overhead, because a full-duplex connection with a particular capacity rating can carry that capacity in both directions simultaneously. Alternatively, two half-duplex connections of the same capacity rating could be used, one for each direction.