Network Working Group                                     H. Schulzrinne
Request for Comments: 3551                           Columbia University
Obsoletes: 1890                                                S. Casner
Category: Standards Track                                  Packet Design
                                                               July 2003


              RTP Profile for Audio and Video Conferences
                          with Minimal Control
16
17 Status of this Memo
18
19 This document specifies an Internet standards track protocol for the
20 Internet community, and requests discussion and suggestions for
21 improvements. Please refer to the current edition of the "Internet
22 Official Protocol Standards" (STD 1) for the standardization state
23 and status of this protocol. Distribution of this memo is unlimited.
24
25 Copyright Notice
26
27 Copyright (C) The Internet Society (2003). All Rights Reserved.
28
29 Abstract
30
31 This document describes a profile called "RTP/AVP" for the use of the
32 real-time transport protocol (RTP), version 2, and the associated
33 control protocol, RTCP, within audio and video multiparticipant
34 conferences with minimal control. It provides interpretations of
35 generic fields within the RTP specification suitable for audio and
36 video conferences. In particular, this document defines a set of
37 default mappings from payload type numbers to encodings.
38
39 This document also describes how audio and video data may be carried
40 within RTP. It defines a set of standard encodings and their names
41 when used within RTP. The descriptions provide pointers to reference
42 implementations and the detailed standards. This document is meant
43 as an aid for implementors of audio, video and other real-time
44 multimedia applications.
45
46 This memorandum obsoletes RFC 1890. It is mostly backwards-
47 compatible except for functions removed because two interoperable
48 implementations were not found. The additions to RFC 1890 codify
49 existing practice in the use of payload formats under this profile
50 and include new payload formats defined since RFC 1890 was published.
51
52
53
54
55
56
57
58 Schulzrinne & Casner Standards Track [Page 1]
60 RFC 3551 RTP A/V Profile July 2003
61
62
Table of Contents

   1.  Introduction
       1.1  Terminology
   2.  RTP and RTCP Packet Forms and Protocol Behavior
   3.  Registering Additional Encodings
   4.  Audio
       4.1  Encoding-Independent Rules
       4.2  Operating Recommendations
       4.3  Guidelines for Sample-Based Audio Encodings
       4.4  Guidelines for Frame-Based Audio Encodings
       4.5  Audio Encodings
            4.5.1   DVI4
            4.5.2   G722
            4.5.3   G723
            4.5.4   G726-40, G726-32, G726-24, and G726-16
            4.5.5   G728
            4.5.6   G729
            4.5.7   G729D and G729E
            4.5.8   GSM
            4.5.9   GSM-EFR
            4.5.10  L8
            4.5.11  L16
            4.5.12  LPC
            4.5.13  MPA
            4.5.14  PCMA and PCMU
            4.5.15  QCELP
            4.5.16  RED
            4.5.17  VDVI
   5.  Video
       5.1  CelB
       5.2  JPEG
       5.3  H261
       5.4  H263
       5.5  H263-1998
       5.6  MPV
       5.7  MP2T
       5.8  nv
   6.  Payload Type Definitions
   7.  RTP over TCP and Similar Byte Stream Protocols
   8.  Port Assignment
   9.  Changes from RFC 1890
   10. Security Considerations
   11. IANA Considerations
   12. References
       12.1  Normative References
       12.2  Informative References
   13. Current Locations of Related Resources
   14. Acknowledgments
   15. Intellectual Property Rights Statement
   16. Authors' Addresses
   17. Full Copyright Statement
123
124 1. Introduction
125
This profile defines aspects of RTP left unspecified in the RTP
Version 2 protocol definition (RFC 3550) [1]. This profile is
intended for use within audio and video conferences with minimal
session control. In particular, no support for the negotiation of
parameters or membership control is provided. The profile is
expected to be useful in sessions where no negotiation or membership
control is used (e.g., using the static payload types and the
membership indications provided by RTCP), but this profile may also
be useful in conjunction with a higher-level control protocol.
135
136 Use of this profile may be implicit in the use of the appropriate
137 applications; there may be no explicit indication by port number,
138 protocol identifier or the like. Applications such as session
139 directories may use the name for this profile specified in Section
140 11.
141
142 Other profiles may make different choices for the items specified
143 here.
144
145 This document also defines a set of encodings and payload formats for
146 audio and video. These payload format descriptions are included here
147 only as a matter of convenience since they are too small to warrant
148 separate documents. Use of these payload formats is NOT REQUIRED to
149 use this profile. Only the binding of some of the payload formats to
150 static payload type numbers in Tables 4 and 5 is normative.
151
152 1.1 Terminology
153
154 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
155 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
156 document are to be interpreted as described in RFC 2119 [2] and
157 indicate requirement levels for implementations compliant with this
158 RTP profile.
159
160 This document defines the term media type as dividing encodings of
161 audio and video content into three classes: audio, video and
162 audio/video (interleaved).
163
175 2. RTP and RTCP Packet Forms and Protocol Behavior
176
177 The section "RTP Profiles and Payload Format Specifications" of RFC
178 3550 enumerates a number of items that can be specified or modified
179 in a profile. This section addresses these items. Generally, this
180 profile follows the default and/or recommended aspects of the RTP
181 specification.
182
183 RTP data header: The standard format of the fixed RTP data
184 header is used (one marker bit).
185
186 Payload types: Static payload types are defined in Section 6.
187
188 RTP data header additions: No additional fixed fields are
189 appended to the RTP data header.
190
191 RTP data header extensions: No RTP header extensions are
192 defined, but applications operating under this profile MAY use
193 such extensions. Thus, applications SHOULD NOT assume that the
194 RTP header X bit is always zero and SHOULD be prepared to ignore
195 the header extension. If a header extension is defined in the
196 future, that definition MUST specify the contents of the first 16
197 bits in such a way that multiple different extensions can be
198 identified.
199
200 RTCP packet types: No additional RTCP packet types are defined
201 by this profile specification.
202
203 RTCP report interval: The suggested constants are to be used for
204 the RTCP report interval calculation. Sessions operating under
205 this profile MAY specify a separate parameter for the RTCP traffic
206 bandwidth rather than using the default fraction of the session
207 bandwidth. The RTCP traffic bandwidth MAY be divided into two
208 separate session parameters for those participants which are
209 active data senders and those which are not. Following the
210 recommendation in the RTP specification [1] that 1/4 of the RTCP
211 bandwidth be dedicated to data senders, the RECOMMENDED default
212 values for these two parameters would be 1.25% and 3.75%,
213 respectively. For a particular session, the RTCP bandwidth for
214 non-data-senders MAY be set to zero when operating on
215 unidirectional links or for sessions that don't require feedback
216 on the quality of reception. The RTCP bandwidth for data senders
217 SHOULD be kept non-zero so that sender reports can still be sent
218 for inter-media synchronization and to identify the source by
219 CNAME. The means by which the one or two session parameters for
220 RTCP bandwidth are specified is beyond the scope of this memo.
221
231 SR/RR extension: No extension section is defined for the RTCP SR
232 or RR packet.
233
234 SDES use: Applications MAY use any of the SDES items described
235 in the RTP specification. While CNAME information MUST be sent
236 every reporting interval, other items SHOULD only be sent every
237 third reporting interval, with NAME sent seven out of eight times
238 within that slot and the remaining SDES items cyclically taking up
239 the eighth slot, as defined in Section 6.2.2 of the RTP
240 specification. In other words, NAME is sent in RTCP packets 1, 4,
241 7, 10, 13, 16, 19, while, say, EMAIL is used in RTCP packet 22.
242
243 Security: The RTP default security services are also the default
244 under this profile.
245
246 String-to-key mapping: No mapping is specified by this profile.
247
248 Congestion: RTP and this profile may be used in the context of
249 enhanced network service, for example, through Integrated Services
250 (RFC 1633) [4] or Differentiated Services (RFC 2475) [5], or they
251 may be used with best effort service.
252
253 If enhanced service is being used, RTP receivers SHOULD monitor
254 packet loss to ensure that the service that was requested is
255 actually being delivered. If it is not, then they SHOULD assume
256 that they are receiving best-effort service and behave
257 accordingly.
258
259 If best-effort service is being used, RTP receivers SHOULD monitor
260 packet loss to ensure that the packet loss rate is within
261 acceptable parameters. Packet loss is considered acceptable if a
262 TCP flow across the same network path and experiencing the same
263 network conditions would achieve an average throughput, measured
264 on a reasonable timescale, that is not less than the RTP flow is
265 achieving. This condition can be satisfied by implementing
266 congestion control mechanisms to adapt the transmission rate (or
267 the number of layers subscribed for a layered multicast session),
268 or by arranging for a receiver to leave the session if the loss
269 rate is unacceptably high.
270
271 The comparison to TCP cannot be specified exactly, but is intended
272 as an "order-of-magnitude" comparison in timescale and throughput.
273 The timescale on which TCP throughput is measured is the round-
274 trip time of the connection. In essence, this requirement states
275 that it is not acceptable to deploy an application (using RTP or
276 any other transport protocol) on the best-effort Internet which
277 consumes bandwidth arbitrarily and does not compete fairly with
278 TCP within an order of magnitude.
279
287 Underlying protocol: The profile specifies the use of RTP over
288 unicast and multicast UDP as well as TCP. (This does not preclude
289 the use of these definitions when RTP is carried by other lower-
290 layer protocols.)
291
292 Transport mapping: The standard mapping of RTP and RTCP to
293 transport-level addresses is used.
294
295 Encapsulation: This profile leaves to applications the
296 specification of RTP encapsulation in protocols other than UDP.
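
As an illustration of the "RTCP report interval" item above, the following C sketch derives the RECOMMENDED default values of the two RTCP bandwidth parameters from a session bandwidth. The struct and function names are hypothetical and not part of this profile; the 5% total is simply the sum of the 1.25% and 3.75% defaults, and integer arithmetic is an assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical container for the two optional session parameters. */
struct rtcp_bw {
    uint32_t senders_bps;    /* share for active data senders */
    uint32_t receivers_bps;  /* share for participants that are not senders */
};

/* Default split: 1.25% of the session bandwidth for data senders and
 * 3.75% for non-senders, i.e. 5% in total with one quarter for senders. */
struct rtcp_bw rtcp_default_bw(uint32_t session_bps)
{
    struct rtcp_bw bw;
    uint64_t total = (uint64_t)session_bps * 5 / 100;  /* 5% of the session */

    bw.senders_bps = (uint32_t)(total / 4);             /* 1.25% */
    bw.receivers_bps = (uint32_t)(total - total / 4);   /* 3.75% */
    return bw;
}
```

For a 64,000 bit/s session this yields 800 bit/s for senders and 2,400 bit/s for non-senders. A session on a unidirectional link could instead set the non-sender value to zero, as the item allows.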
297
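The SDES schedule described under "SDES use" above can be sketched as follows. This is one possible reading, not a normative algorithm: RTCP packets are counted from 1, CNAME is assumed to be sent in every packet, and the order in which the remaining items share the eighth slot (EMAIL, PHONE, LOC, TOOL, NOTE) is an assumption of this sketch. The function name is hypothetical; the item type values are those of the RTP specification.

```c
/* SDES item types as numbered in the RTP specification.
 * SDES_NONE means "send CNAME only in this packet". */
enum sdes_item {
    SDES_NONE = 0,
    SDES_NAME = 2,
    SDES_EMAIL = 3,
    SDES_PHONE = 4,
    SDES_LOC = 5,
    SDES_TOOL = 6,
    SDES_NOTE = 7
};

/* Returns the item to send in addition to CNAME in RTCP packet
 * number packet_no (counted from 1). */
enum sdes_item sdes_extra_item(unsigned packet_no)
{
    /* Assumed rotation order for the items that share the eighth slot. */
    static const enum sdes_item others[] = {
        SDES_EMAIL, SDES_PHONE, SDES_LOC, SDES_TOOL, SDES_NOTE
    };
    unsigned slot;

    /* Only every third reporting interval carries an extra item. */
    if (packet_no == 0 || (packet_no - 1) % 3 != 0)
        return SDES_NONE;

    slot = (packet_no - 1) / 3;      /* 0, 1, 2, ... */
    if (slot % 8 != 7)
        return SDES_NAME;            /* seven out of eight slots */

    return others[(slot / 8) % 5];   /* eighth slot rotates */
}
```

With this numbering, packets 1, 4, 7, 10, 13, 16 and 19 carry NAME and packet 22 carries EMAIL, matching the example in the text.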
298 3. Registering Additional Encodings
299
This profile lists a set of encodings, each of which comprises a
particular media data compression or representation plus a payload
format for encapsulation within RTP. Some of those payload formats
are specified here, while others are specified in separate RFCs. It
is expected that additional encodings beyond the set listed here will
be created in the future and specified in additional payload format
RFCs.
307
308 This profile also assigns to each encoding a short name which MAY be
309 used by higher-level control protocols, such as the Session
310 Description Protocol (SDP), RFC 2327 [6], to identify encodings
311 selected for a particular RTP session.
312
In some contexts it may be useful to refer to these encodings in the
form of a MIME content-type. To facilitate this, RFC 3555 [7]
provides registrations for all of the encoding names listed here as
MIME subtype names under the "audio" and "video" MIME types through
the MIME registration procedure as specified in RFC 2048 [8].

Any additional encodings specified for use under this profile (or
others) may also be assigned names registered as MIME subtypes with
the Internet Assigned Numbers Authority (IANA). This registry
provides a means to ensure that the names assigned to the additional
encodings are kept unique. RFC 3555 specifies the information that
is required for the registration of RTP encodings.
325
In addition to assigning names to encodings, this profile also
assigns static RTP payload type numbers to some of them. However,
the payload type number space is relatively small and cannot
accommodate assignments for all existing and future encodings.
During the early stages of RTP development, it was necessary to use
statically assigned payload types because no other mechanism had been
specified to bind encodings to payload types. It was anticipated
that non-RTP means beyond the scope of this memo (such as directory
services or invitation protocols) would be specified to establish a
dynamic mapping between a payload type and an encoding. Now,
mechanisms for defining dynamic payload type bindings have been
specified in the Session Description Protocol (SDP) and in other
protocols such as ITU-T Recommendation H.323/H.245. These mechanisms
associate the registered name of the encoding/payload format, along
with any additional required parameters, such as the RTP timestamp
clock rate and number of channels, with a payload type number. This
association is effective only for the duration of the RTP session in
which the dynamic payload type binding is made. Because the
association applies only to the RTP session for which it is made, the
numbers can be re-used for different encodings in different sessions,
and the number space limitation is avoided.
355
356 This profile reserves payload type numbers in the range 96-127
357 exclusively for dynamic assignment. Applications SHOULD first use
358 values in this range for dynamic payload types. Those applications
359 which need to define more than 32 dynamic payload types MAY bind
360 codes below 96, in which case it is RECOMMENDED that unassigned
361 payload type numbers be used first. However, the statically assigned
362 payload types are default bindings and MAY be dynamically bound to
363 new encodings if needed. Redefining payload types below 96 may cause
364 incorrect operation if an attempt is made to join a session without
365 obtaining session description information that defines the dynamic
366 payload types.
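
A minimal sketch of the dynamic range described above, assuming a caller-maintained table of payload types already bound in the session. The names are hypothetical. The sketch only draws from 96-127 and does not implement the fallback to codes below 96 that the text permits.

```c
#include <stdbool.h>

#define RTP_PT_DYNAMIC_FIRST 96
#define RTP_PT_DYNAMIC_LAST  127

/* True if pt lies in the range reserved for dynamic assignment. */
bool rtp_pt_in_dynamic_range(int pt)
{
    return pt >= RTP_PT_DYNAMIC_FIRST && pt <= RTP_PT_DYNAMIC_LAST;
}

/* Binds the lowest unused payload type in 96..127 and returns it,
 * or returns -1 when all 32 dynamic codes are taken.  in_use[] has
 * one entry per 7-bit payload type value. */
int rtp_pt_alloc_dynamic(bool in_use[128])
{
    int pt;

    for (pt = RTP_PT_DYNAMIC_FIRST; pt <= RTP_PT_DYNAMIC_LAST; pt++) {
        if (!in_use[pt]) {
            in_use[pt] = true;
            return pt;
        }
    }
    return -1;
}
```

An application that exhausts the 32 codes would then have to pick unassigned numbers below 96 itself; the binding still has to be conveyed by some mechanism such as SDP.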
367
368 Dynamic payload types SHOULD NOT be used without a well-defined
369 mechanism to indicate the mapping. Systems that expect to
370 interoperate with others operating under this profile SHOULD NOT make
371 their own assignments of proprietary encodings to particular, fixed
372 payload types.
373
374 This specification establishes the policy that no additional static
375 payload types will be assigned beyond the ones defined in this
376 document. Establishing this policy avoids the problem of trying to
377 create a set of criteria for accepting static assignments and
378 encourages the implementation and deployment of the dynamic payload
379 type mechanisms.
380
381 The final set of static payload type assignments is provided in
382 Tables 4 and 5.
383
399 4. Audio
400
401 4.1 Encoding-Independent Rules
402
403 Since the ability to suppress silence is one of the primary
404 motivations for using packets to transmit voice, the RTP header
405 carries both a sequence number and a timestamp to allow a receiver to
406 distinguish between lost packets and periods of time when no data was
407 transmitted. Discontiguous transmission (silence suppression) MAY be
408 used with any audio payload format. Receivers MUST assume that
409 senders may suppress silence unless this is restricted by signaling
410 specified elsewhere. (Even if the transmitter does not suppress
411 silence, the receiver should be prepared to handle periods when no
412 data is present since packets may be lost.)
413
414 Some payload formats (see Sections 4.5.3 and 4.5.6) define a "silence
415 insertion descriptor" or "comfort noise" frame to specify parameters
416 for artificial noise that may be generated during a period of silence
417 to approximate the background noise at the source. For other payload
418 formats, a generic Comfort Noise (CN) payload format is specified in
419 RFC 3389 [9]. When the CN payload format is used with another
420 payload format, different values in the RTP payload type field
421 distinguish comfort-noise packets from those of the selected payload
422 format.
423
424 For applications which send either no packets or occasional comfort-
425 noise packets during silence, the first packet of a talkspurt, that
426 is, the first packet after a silence period during which packets have
427 not been transmitted contiguously, SHOULD be distinguished by setting
428 the marker bit in the RTP data header to one. The marker bit in all
429 other packets is zero. The beginning of a talkspurt MAY be used to
430 adjust the playout delay to reflect changing network delays.
431 Applications without silence suppression MUST set the marker bit to
432 zero.
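
The marker bit rule above might be expressed as follows. The state structure and function name are hypothetical, and the sketch assumes the caller sets after_silence whenever it resumes sending after a period in which packets were not transmitted contiguously.

```c
#include <stdbool.h>

/* Hypothetical per-sender state. */
struct talkspurt_state {
    bool silence_suppression;  /* sender suppresses silence */
    bool after_silence;        /* next packet follows a silence period */
};

/* Returns the marker bit value for the next audio packet and clears
 * the after_silence flag once the first packet of a talkspurt has
 * been marked. */
bool rtp_audio_marker(struct talkspurt_state *s)
{
    bool marker;

    /* Applications without silence suppression always send zero. */
    if (!s->silence_suppression)
        return false;

    marker = s->after_silence;
    s->after_silence = false;
    return marker;
}
```

Only the first packet after a silence period gets the marker; every later packet of the same talkspurt returns false until the caller raises after_silence again.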
433
434 The RTP clock rate used for generating the RTP timestamp is
435 independent of the number of channels and the encoding; it usually
436 equals the number of sampling periods per second. For N-channel
437 encodings, each sampling period (say, 1/8,000 of a second) generates
438 N samples. (This terminology is standard, but somewhat confusing, as
439 the total number of samples generated per second is then the sampling
440 rate times the channel count.)
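
To make the distinction between sampling periods and samples concrete, here is a small sketch. It assumes the RTP clock rate equals the sampling rate, which the text says is the usual case; the function names are hypothetical.

```c
#include <stdint.h>

/* RTP timestamp ticks covered by one packet.  The result depends on
 * the clock rate and the packet duration only, not on the number of
 * channels or the encoding. */
uint32_t rtp_ts_increment(uint32_t clock_rate_hz, uint32_t packet_ms)
{
    return (uint32_t)((uint64_t)clock_rate_hz * packet_ms / 1000);
}

/* Total samples carried in that packet: each sampling period
 * generates one sample per channel. */
uint32_t samples_in_packet(uint32_t clock_rate_hz, uint32_t packet_ms,
                           uint32_t channels)
{
    return rtp_ts_increment(clock_rate_hz, packet_ms) * channels;
}
```

A 20 ms packet at 8,000 Hz advances the timestamp by 160 whether it is mono or stereo, while a stereo packet carries 320 samples.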
441
442 If multiple audio channels are used, channels are numbered left-to-
443 right, starting at one. In RTP audio packets, information from
444 lower-numbered channels precedes that from higher-numbered channels.
445
455 For more than two channels, the convention followed by the AIFF-C
456 audio interchange format SHOULD be followed [3], using the following
457 notation, unless some other convention is specified for a particular
458 encoding or payload format:
459
   l    left
   r    right
   c    center
   S    surround
   F    front
   R    rear

   channels  description                  channel
                             1     2     3     4     5     6
   _________________________________________________________
   2         stereo          l     r
   3                         l     r     c
   4                         l     c     r     S
   5                         Fl    Fr    Fc    Sl    Sr
   6                         l     lc    c     r     rc    S
475
Note: RFC 1890 defined two conventions for the ordering of four
audio channels. Since the ordering is indicated implicitly by
the number of channels, this was ambiguous. In this revision,
the order described as "quadrophonic" has been eliminated to
remove the ambiguity. This choice was based on the observation
that the quadrophonic consumer audio format did not become popular
whereas surround-sound subsequently has.

Samples for all channels belonging to a single sampling instant MUST
be within the same packet. The interleaving of samples from
different channels depends on the encoding. General guidelines are
given in Sections 4.3 and 4.4.
488
489 The sampling frequency SHOULD be drawn from the set: 8,000, 11,025,
490 16,000, 22,050, 24,000, 32,000, 44,100 and 48,000 Hz. (Older Apple
491 Macintosh computers had a native sample rate of 22,254.54 Hz, which
492 can be converted to 22,050 with acceptable quality by dropping 4
493 samples in a 20 ms frame.) However, most audio encodings are defined
494 for a more restricted set of sampling frequencies. Receivers SHOULD
495 be prepared to accept multi-channel audio, but MAY choose to only
496 play a single channel.
497
498 4.2 Operating Recommendations
499
500 The following recommendations are default operating parameters.
501 Applications SHOULD be prepared to handle other values. The ranges
502 given are meant to give guidance to application writers, allowing a
511 set of applications conforming to these guidelines to interoperate
512 without additional negotiation. These guidelines are not intended to
513 restrict operating parameters for applications that can negotiate a
514 set of interoperable parameters, e.g., through a conference control
515 protocol.
516
517 For packetized audio, the default packetization interval SHOULD have
518 a duration of 20 ms or one frame, whichever is longer, unless
519 otherwise noted in Table 1 (column "ms/packet"). The packetization
520 interval determines the minimum end-to-end delay; longer packets
521 introduce less header overhead but higher delay and make packet loss
522 more noticeable. For non-interactive applications such as lectures
523 or for links with severe bandwidth constraints, a higher
524 packetization delay MAY be used. A receiver SHOULD accept packets
525 representing between 0 and 200 ms of audio data. (For framed audio
526 encodings, a receiver SHOULD accept packets with a number of frames
527 equal to 200 ms divided by the frame duration, rounded up.) This
528 restriction allows reasonable buffer sizing for the receiver.
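
The two numeric rules in this paragraph can be written down directly. The sketch below works in microseconds so that frame durations such as 2.5 ms stay integral; the names are hypothetical, and per-encoding overrides from Table 1 are not modeled.

```c
#include <stdint.h>

#define DEFAULT_PACKET_US  20000u   /* 20 ms default interval */
#define MAX_ACCEPTED_US   200000u   /* receivers SHOULD accept up to 200 ms */

/* Default packetization interval for a framed encoding:
 * 20 ms or one frame, whichever is longer. */
uint32_t default_packet_us(uint32_t frame_us)
{
    return frame_us > DEFAULT_PACKET_US ? frame_us : DEFAULT_PACKET_US;
}

/* Number of frames per packet a receiver SHOULD accept: 200 ms
 * divided by the frame duration, rounded up.  frame_us must be
 * greater than zero. */
uint32_t max_accepted_frames(uint32_t frame_us)
{
    return (MAX_ACCEPTED_US + frame_us - 1) / frame_us;
}
```

For a 30 ms frame the default interval is 30 ms and the receiver should cope with up to 7 frames per packet; for a 10 ms frame the figures are 20 ms and 20 frames.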
529
530 4.3 Guidelines for Sample-Based Audio Encodings
531
532 In sample-based encodings, each audio sample is represented by a
533 fixed number of bits. Within the compressed audio data, codes for
534 individual samples may span octet boundaries. An RTP audio packet
535 may contain any number of audio samples, subject to the constraint
536 that the number of bits per sample times the number of samples per
537 packet yields an integral octet count. Fractional encodings produce
538 less than one octet per sample.
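
The integral-octet constraint amounts to a divisibility check. A minimal sketch with hypothetical function names follows; `samples` is the total number of samples in the packet.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the given number of samples fills a whole number of octets. */
bool samples_fill_whole_octets(unsigned bits_per_sample, size_t samples)
{
    return (bits_per_sample * samples) % 8 == 0;
}

/* Payload size in octets, or 0 if the sample count would leave a
 * partially filled last octet (0 is also returned for 0 samples). */
size_t payload_octets(unsigned bits_per_sample, size_t samples)
{
    if (!samples_fill_whole_octets(bits_per_sample, samples))
        return 0;
    return bits_per_sample * samples / 8;
}
```

For a 3-bit encoding, 160 samples occupy exactly 60 octets, whereas 161 samples would not fit and are rejected.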
539
540 The duration of an audio packet is determined by the number of
541 samples in the packet.
542
543 For sample-based encodings producing one or more octets per sample,
544 samples from different channels sampled at the same sampling instant
545 SHOULD be packed in consecutive octets. For example, for a two-
546 channel encoding, the octet sequence is (left channel, first sample),
547 (right channel, first sample), (left channel, second sample), (right
548 channel, second sample), .... For multi-octet encodings, octets
549 SHOULD be transmitted in network byte order (i.e., most significant
550 octet first).
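
As an example of this packing, the sketch below interleaves two channels of 16-bit linear samples, left before right for each sampling instant, most significant octet first. The function name is hypothetical and the caller is assumed to provide an output buffer of at least 4 * n octets.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs n stereo sample pairs into out[] and returns the number of
 * octets written.  Order: (left, first sample), (right, first
 * sample), (left, second sample), ... in network byte order. */
size_t pack_l16_stereo(const int16_t *left, const int16_t *right,
                       size_t n, uint8_t *out)
{
    size_t i, o = 0;

    for (i = 0; i < n; i++) {
        uint16_t l = (uint16_t)left[i];
        uint16_t r = (uint16_t)right[i];

        out[o++] = (uint8_t)(l >> 8);    /* left, most significant octet */
        out[o++] = (uint8_t)(l & 0xff);
        out[o++] = (uint8_t)(r >> 8);    /* right, most significant octet */
        out[o++] = (uint8_t)(r & 0xff);
    }
    return o;
}
```

Shifting and masking keeps the output independent of the byte order of the host.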
551
552 The packing of sample-based encodings producing less than one octet
553 per sample is encoding-specific.
554
555 The RTP timestamp reflects the instant at which the first sample in
556 the packet was sampled, that is, the oldest information in the
557 packet.
558
567 4.4 Guidelines for Frame-Based Audio Encodings
568
569 Frame-based encodings encode a fixed-length block of audio into
570 another block of compressed data, typically also of fixed length.
571 For frame-based encodings, the sender MAY choose to combine several
572 such frames into a single RTP packet. The receiver can tell the
573 number of frames contained in an RTP packet, if all the frames have
574 the same length, by dividing the RTP payload length by the audio
575 frame size which is defined as part of the encoding. This does not
576 work when carrying frames of different sizes unless the frame sizes
577 are relatively prime. If not, the frames MUST indicate their size.
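
For the fixed-size case, the frame count is a plain division. In the sketch below the function name is hypothetical and frame_size stands for the frame size defined by the encoding.

```c
#include <stddef.h>

/* Number of equal-sized frames in an RTP payload, or -1 if the
 * payload length is not a whole multiple of the frame size. */
long frames_in_payload(size_t payload_len, size_t frame_size)
{
    if (frame_size == 0 || payload_len % frame_size != 0)
        return -1;
    return (long)(payload_len / frame_size);
}
```

A payload of 72 octets with 24-octet frames holds 3 frames, while a 70-octet payload cannot be split that way and is reported as an error.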
578
579 For frame-based codecs, the channel order is defined for the whole
580 block. That is, for two-channel audio, right and left samples SHOULD
581 be coded independently, with the encoded frame for the left channel
582 preceding that for the right channel.
583
All frame-oriented audio codecs SHOULD be able to encode and decode
several consecutive frames within a single packet. Since the frame
size for the frame-oriented codecs is given, there is no need to use
a separate designation for the same encoding with a different
number of frames per packet.
589
590 RTP packets SHALL contain a whole number of frames, with frames
591 inserted according to age within a packet, so that the oldest frame
592 (to be played first) occurs immediately after the RTP packet header.
593 The RTP timestamp reflects the instant at which the first sample in
594 the first frame was sampled, that is, the oldest information in the
595 packet.
596
623 4.5 Audio Encodings
624
   name of                              sampling               default
   encoding  sample/frame  bits/sample      rate   ms/frame  ms/packet
   ___________________________________________________________________
   DVI4      sample        4                var.                    20
   G722      sample        8              16,000                    20
   G723      frame         N/A             8,000         30         30
   G726-40   sample        5               8,000                    20
   G726-32   sample        4               8,000                    20
   G726-24   sample        3               8,000                    20
   G726-16   sample        2               8,000                    20
   G728      frame         N/A             8,000        2.5         20
   G729      frame         N/A             8,000         10         20
   G729D     frame         N/A             8,000         10         20
   G729E     frame         N/A             8,000         10         20
   GSM       frame         N/A             8,000         20         20
   GSM-EFR   frame         N/A             8,000         20         20
   L8        sample        8                var.                    20
   L16       sample        16               var.                    20
   LPC       frame         N/A             8,000         20         20
   MPA       frame         N/A              var.       var.
   PCMA      sample        8                var.                    20
   PCMU      sample        8                var.                    20
   QCELP     frame         N/A             8,000         20         20
   VDVI      sample        var.             var.                    20

   Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
            variable)
652
653 The characteristics of the audio encodings described in this document
654 are shown in Table 1; they are listed in order of their payload type
655 in Table 4. While most audio codecs are only specified for a fixed
656 sampling rate, some sample-based algorithms (indicated by an entry of
657 "var." in the sampling rate column of Table 1) may be used with
658 different sampling rates, resulting in different coded bit rates.
659 When used with a sampling rate other than that for which a static
660 payload type is defined, non-RTP means beyond the scope of this memo
661 MUST be used to define a dynamic payload type and MUST indicate the
662 selected RTP timestamp clock rate, which is usually the same as the
663 sampling rate for audio.
664
679 4.5.1 DVI4
680
681 DVI4 uses an adaptive delta pulse code modulation (ADPCM) encoding
682 scheme that was specified by the Interactive Multimedia Association
683 (IMA) as the "IMA ADPCM wave type". However, the encoding defined
684 here as DVI4 differs in three respects from the IMA specification:
685
o The RTP DVI4 header contains the predicted value rather than the
  first sample value contained in the IMA ADPCM block header.
688
689 o IMA ADPCM blocks contain an odd number of samples, since the first
690 sample of a block is contained just in the header (uncompressed),
691 followed by an even number of compressed samples. DVI4 has an
692 even number of compressed samples only, using the `predict' word
693 from the header to decode the first sample.
694
695 o For DVI4, the 4-bit samples are packed with the first sample in
696 the four most significant bits and the second sample in the four
697 least significant bits. In the IMA ADPCM codec, the samples are
698 packed in the opposite order.
699
700 Each packet contains a single DVI block. This profile only defines
701 the 4-bit-per-sample version, while IMA also specified a 3-bit-per-
702 sample encoding.
703
704 The "header" word for each channel has the following structure:
705
   int16  predict;  /* predicted value of first sample
                       from the previous block (L16 format) */
   u_int8 index;    /* current index into stepsize table */
   u_int8 reserved; /* set to zero by sender, ignored by receiver */
710
711 Each octet following the header contains two 4-bit samples, thus the
712 number of samples per packet MUST be even because there is no means
713 to indicate a partially filled last octet.
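
A receiver for a single-channel DVI4 payload might read the header and split the sample octets as sketched below. This is not a decoder: the ADPCM step-size logic is omitted. The function names are hypothetical, and reading `predict` in network byte order is an assumption based on the "L16 format" remark in the header description.

```c
#include <stddef.h>
#include <stdint.h>

struct dvi4_header {
    int16_t predict;   /* predicted value of first sample */
    uint8_t index;     /* current index into stepsize table */
    uint8_t reserved;  /* ignored by receiver */
};

/* Parses the 4-octet header at p.  Returns 0 on success, or -1 if
 * fewer than 4 octets are available. */
int dvi4_parse_header(const uint8_t *p, size_t len, struct dvi4_header *h)
{
    int32_t v;

    if (len < 4)
        return -1;
    v = ((int32_t)p[0] << 8) | p[1];   /* most significant octet first */
    if (v >= 0x8000)
        v -= 0x10000;                  /* sign-extend the 16-bit value */
    h->predict = (int16_t)v;
    h->index = p[2];
    h->reserved = p[3];
    return 0;
}

/* Splits each octet into two 4-bit codes: first sample in the four
 * most significant bits, second sample in the four least significant
 * bits.  codes[] must hold 2 * octets entries.  Returns the number
 * of codes written, which is always even. */
size_t dvi4_unpack_codes(const uint8_t *data, size_t octets, uint8_t *codes)
{
    size_t i;

    for (i = 0; i < octets; i++) {
        codes[2 * i] = (uint8_t)(data[i] >> 4);
        codes[2 * i + 1] = (uint8_t)(data[i] & 0x0f);
    }
    return 2 * octets;
}
```

Because every octet yields exactly two codes, the sample count per packet is necessarily even, which is the constraint stated above.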
714
715 Packing of samples for multiple channels is for further study.
716
717 The IMA ADPCM algorithm was described in the document IMA Recommended
718 Practices for Enhancing Digital Audio Compatibility in Multimedia
719 Systems (version 3.0). However, the Interactive Multimedia
720 Association ceased operations in 1997. Resources for an archived
721 copy of that document and a software implementation of the RTP DVI4
722 encoding are listed in Section 13.
723
735 4.5.2 G722
736
737 G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding
738 within 64 kbit/s". The G.722 encoder produces a stream of octets,
739 each of which SHALL be octet-aligned in an RTP packet. The first bit
740 transmitted in the G.722 octet, which is the most significant bit of
741 the higher sub-band sample, SHALL correspond to the most significant
742 bit of the octet in the RTP packet.
743
744 Even though the actual sampling rate for G.722 audio is 16,000 Hz,
745 the RTP clock rate for the G722 payload format is 8,000 Hz because
746 that value was erroneously assigned in RFC 1890 and must remain
747 unchanged for backward compatibility. The octet rate or sample-pair
748 rate is 8,000 Hz.
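
The consequence for timestamp handling can be sketched as follows.
The function names are hypothetical; the only facts used are the
8,000 Hz RTP clock, the 16,000 Hz sampling rate, and the one-octet-
per-sample-pair relationship stated above.

```c
#include <stddef.h>
#include <stdint.h>

#define G722_RTP_CLOCK_HZ   8000u  /* RTP clock rate assigned to G722 */
#define G722_SAMPLE_RATE_HZ 16000u /* actual audio sampling rate */

/* RTP timestamp increment for a packet: one tick per payload octet. */
uint32_t g722_timestamp_increment(size_t payload_octets)
{
    return (uint32_t)payload_octets;
}

/* Number of 16 kHz audio samples carried: two per payload octet. */
uint32_t g722_audio_samples(size_t payload_octets)
{
    return (uint32_t)payload_octets *
           (G722_SAMPLE_RATE_HZ / G722_RTP_CLOCK_HZ);
}
```

For a 160-octet payload (20 ms at 64 kbit/s), the timestamp advances
by 160 while the decoder produces 320 samples.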
749
750 4.5.3 G723
751
752 G723 is specified in ITU-T Recommendation G.723.1, "Dual-rate speech
753 coder for multimedia communications transmitting at 5.3 and 6.3
754 kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T
755 as a mandatory codec for ITU-T H.324 GSTN videophone terminal
756 applications. The algorithm has a floating point specification in
757 Annex B to G.723.1, a silence compression algorithm in Annex A to
758 G.723.1 and a scalable channel coding scheme for wireless
759 applications in G.723.1 Annex C.
760
761 This Recommendation specifies a coded representation that can be used
762 for compressing the speech signal component of multi-media services
763 at a very low bit rate. Audio is encoded in 30 ms frames, with an
764 additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be
765 one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s
766 frame), or 4 octets. These 4-octet frames are called SID frames
767 (Silence Insertion Descriptor) and are used to specify comfort noise
768 parameters. There is no restriction on how 4, 20, and 24 octet
769 frames are intermixed. The least significant two bits of the first
770 octet in the frame determine the frame size and codec type:
771
772    bits  content                      octets/frame
773    00    high-rate speech (6.3 kb/s)            24
774    01    low-rate speech  (5.3 kb/s)            20
775    10    SID frame                               4
776    11    reserved
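
A receiver can use this table to step through a payload that holds
several frames back to back. The sketch below is one possible way to
do so; the function names are hypothetical, and rejecting the
reserved code point is an assumption about error handling rather
than a requirement of this profile.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame size in octets selected by the two least significant bits
 * of the first octet; 0 for the reserved code point. */
size_t g723_frame_size(uint8_t first_octet)
{
    switch (first_octet & 0x03) {
    case 0:  return 24; /* high-rate speech, 6.3 kb/s */
    case 1:  return 20; /* low-rate speech, 5.3 kb/s */
    case 2:  return 4;  /* SID frame */
    default: return 0;  /* reserved */
    }
}

/* Count the frames in a payload. Returns -1 if a reserved header
 * value is found or the last frame is truncated. */
long g723_count_frames(const uint8_t *payload, size_t len)
{
    long frames = 0;
    size_t pos = 0;

    while (pos < len) {
        size_t size = g723_frame_size(payload[pos]);
        if (size == 0 || size > len - pos)
            return -1;
        pos += size;
        frames++;
    }
    return frames;
}
```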
777
791 It is possible to switch between the two rates at any 30 ms frame
792 boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of
793 the encoder and decoder. Receivers MUST accept both data rates and
794 MUST accept SID frames unless restriction of these capabilities has
795 been signaled. The MIME registration for G723 in RFC 3555 [7]
796 specifies parameters that MAY be used with MIME or SDP to restrict to
797 a single data rate or to restrict the use of SID frames. This coder
798 was optimized to represent speech with near-toll quality at the above
799 rates using a limited amount of complexity.
800
801 The packing of the encoded bit stream into octets and the
802 transmission order of the octets is specified in Rec. G.723.1 and is
803 the same as that produced by the G.723.1 C code reference
804 implementation. For the 6.3 kb/s data rate, this packing is
805 illustrated as follows, where the header (HDR) bits are always "0 0"
806 as shown in Fig. 1 to indicate operation at 6.3 kb/s, and the Z bit
807 is always set to zero. The diagrams show the bit packing in "network
808 byte order", also known as big-endian order. The bits of each 32-bit
809 word are numbered 0 to 31, with the most significant bit on the left
810 and numbered 0. The octets (bytes) of each word are transmitted most
811 significant octet first. The bits of each data field are numbered in
812 the order of the bit stream representation of the encoding (least
813 significant bit first). The vertical bars indicate the boundaries
814 between field fragments.
815
847  0                   1                   2                   3
848  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
849 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
850 |    LPC    |HDR|      LPC      |      LPC      |    ACL0   |LPC|
851 |           |   |               |               |           |   |
852 |0 0 0 0 0 0|0 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
853 |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
854 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
855 |   ACL2  |ACL|A| GAIN0 |ACL|ACL|     GAIN0     |     GAIN1     |
856 |         | 1 |C|       | 3 | 2 |               |               |
857 |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
858 |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
859 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
860 | GAIN2 | GAIN1 |     GAIN2     |     GAIN3     | GRID  | GAIN3 |
861 |       |       |               |               |       |       |
862 |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
863 |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|
864 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
865 |   MSBPOS    |Z|POS|  MSBPOS   |     POS0      |POS|   POS0    |
866 |             | | 0 |           |               | 1 |           |
867 |0 0 0 0 0 0 0|0|0 0|1 1 1 0 0 0|0 0 0 0 0 0 0 0|0 0|1 1 1 1 1 1|
868 |6 5 4 3 2 1 0| |1 0|2 1 0 9 8 7|9 8 7 6 5 4 3 2|1 0|5 4 3 2 1 0|
869 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
870 |     POS1      | POS2  | POS1  |     POS2      | POS3  | POS2  |
871 |               |       |       |               |       |       |
872 |0 0 0 0 0 0 0 0|0 0 0 0|1 1 1 1|1 1 0 0 0 0 0 0|0 0 0 0|1 1 1 1|
873 |9 8 7 6 5 4 3 2|3 2 1 0|3 2 1 0|1 0 9 8 7 6 5 4|3 2 1 0|5 4 3 2|
874 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
875 |     POS3      |   PSIG0   |POS|PSIG2|  PSIG1  |  PSIG3  |PSIG2|
876 |               |           | 3 |     |         |         |     |
877 |1 1 0 0 0 0 0 0|0 0 0 0 0 0|1 1|0 0 0|0 0 0 0 0|0 0 0 0 0|0 0 0|
878 |1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|2 1 0|4 3 2 1 0|4 3 2 1 0|5 4 3|
879 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
880
881 Figure 1: G.723 (6.3 kb/s) bit packing
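
To make the "least significant bit first" numbering concrete, the
following sketch recovers the HDR bits and the 24-bit LPC index from
the first four octets of a frame, following one reading of the first
word of Fig. 1. The function names are hypothetical, and the
remaining fields are not handled.

```c
#include <stdint.h>

/* HDR occupies the two least significant bits of the first octet. */
unsigned g723_hdr(const uint8_t *frame)
{
    return frame[0] & 0x03u;
}

/*
 * Reassemble the 24-bit LPC index:
 *   octet 0, upper six bits -> LPC bits 0..5
 *   octet 1                 -> LPC bits 6..13
 *   octet 2                 -> LPC bits 14..21
 *   octet 3, lower two bits -> LPC bits 22..23
 */
uint32_t g723_lpc_index(const uint8_t *frame)
{
    return ((uint32_t)frame[0] >> 2)
         | ((uint32_t)frame[1] << 6)
         | ((uint32_t)frame[2] << 14)
         | ((uint32_t)(frame[3] & 0x03u) << 22);
}
```

Each later octet contributes higher-numbered bits of the field, which
is why the field fragments in the figure appear in ascending order
from one octet to the next.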
882
883 For the 5.3 kb/s data rate, the header (HDR) bits are always "0 1",
884 as shown in Fig. 2, to indicate operation at 5.3 kb/s.
885
903  0                   1                   2                   3
904  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
905 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
906 |    LPC    |HDR|      LPC      |      LPC      |    ACL0   |LPC|
907 |           |   |               |               |           |   |
908 |0 0 0 0 0 0|0 1|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
909 |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
910 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
911 |   ACL2  |ACL|A| GAIN0 |ACL|ACL|     GAIN0     |     GAIN1     |
912 |         | 1 |C|       | 3 | 2 |               |               |
913 |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
914 |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
915 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
916 | GAIN2 | GAIN1 |     GAIN2     |     GAIN3     | GRID  | GAIN3 |
917 |       |       |               |               |       |       |
918 |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
919 |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|
920 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
921 |     POS0      | POS1  | POS0  |     POS1      |     POS2      |
922 |               |       |       |               |               |
923 |0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
924 |7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
925 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
926 | POS3  | POS2  |     POS3      | PSIG1 | PSIG0 | PSIG3 | PSIG2 |
927 |       |       |               |       |       |       |       |
928 |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|
929 |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|3 2 1 0|3 2 1 0|3 2 1 0|3 2 1 0|
930 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
931
932 Figure 2: G.723 (5.3 kb/s) bit packing
933
934 The packing of G.723.1 SID (silence) frames, which are indicated by
935 the header (HDR) bits having the pattern "1 0", is depicted in Fig.
936 3.
937
938  0                   1                   2                   3
939  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
940 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
941 |    LPC    |HDR|      LPC      |      LPC      |    GAIN   |LPC|
942 |           |   |               |               |           |   |
943 |0 0 0 0 0 0|1 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
944 |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
945 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
946
947 Figure 3: G.723 SID mode bit packing
948
959 4.5.4 G726-40, G726-32, G726-24, and G726-16
960
961 ITU-T Recommendation G.726 describes, among others, the algorithm
962 recommended for conversion of a single 64 kbit/s A-law or mu-law PCM
963 channel encoded at 8,000 samples/sec to and from a 40, 32, 24, or 16
964 kbit/s channel. The conversion is applied to the PCM stream using an
965 Adaptive Differential Pulse Code Modulation (ADPCM) transcoding
966 technique. The ADPCM representation consists of a series of
967 codewords with a one-to-one correspondence to the samples in the PCM
968 stream. The G726 data rates of 40, 32, 24, and 16 kbit/s have
969 codewords of 5, 4, 3, and 2 bits, respectively.
970
971 The 16 and 24 kbit/s encodings do not provide toll quality speech.
972 They are designed for use in overloaded Digital Circuit
973 Multiplication Equipment (DCME). ITU-T G.726 recommends that the 16
974 and 24 kbit/s encodings should be alternated with higher data rate
975 encodings to provide an average sample size of between 3.5 and 3.7
976 bits per sample.
977
978 The encodings of G.726 are here denoted as G726-40, G726-32, G726-24,
979 and G726-16. Prior to 1990, G721 described the 32 kbit/s ADPCM
980 encoding, and G723 described the 40, 32, and 16 kbit/s encodings.
981 Thus, G726-32 designates the same algorithm as G721 in RFC 1890.
982
983 A stream of G726 codewords contains no information on the encoding
984 being used; therefore, transitions between G726 encoding types are
985 not permitted within a sequence of packed codewords. Applications
986 MUST determine the encoding type of packed codewords from the RTP
987 payload identifier.
988
989 No payload-specific header information SHALL be included as part of
990 the audio data. A stream of G726 codewords MUST be packed into
991 octets as follows: the first codeword is placed into the first octet
992 such that the least significant bit of the codeword aligns with the
993 least significant bit in the octet, the second codeword is then
994 packed so that its least significant bit coincides with the least
995 significant unoccupied bit in the octet. When a complete codeword
996 cannot be placed into an octet, the bits overlapping the octet
997 boundary are placed into the least significant bits of the next
998 octet. Packing MUST end with a completely packed final octet. The
999 number of codewords packed will therefore be a multiple of 8, 2, 8,
1000 and 4 for G726-40, G726-32, G726-24, and G726-16, respectively. An
1001 example of the packing scheme for G726-32 codewords is shown below,
1002 where bit 7 is the least significant bit of the first octet, and bit
1003 A3 is the least significant bit of the first codeword:
1004
1015  0                   1
1016  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
1017 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
1018 |B B B B|A A A A|D D D D|C C C C| ...
1019 |0 1 2 3|0 1 2 3|0 1 2 3|0 1 2 3|
1020 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
1021
1022 An example of the packing scheme for G726-24 codewords follows, where
1023 again bit 7 is the least significant bit of the first octet, and bit
1024 A2 is the least significant bit of the first codeword:
1025
1026  0                   1                   2
1027  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
1028 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
1029 |C C|B B B|A A A|F|E E E|D D D|C|H H H|G G G|F F| ...
1030 |1 2|0 1 2|0 1 2|2|0 1 2|0 1 2|0|0 1 2|0 1 2|0 1|
1031 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
1032
1033 Note that the "little-endian" direction in which samples are packed
1034 into octets in the G726-16, -24, -32 and -40 payload formats
1035 specified here is consistent with ITU-T Recommendation X.420, but is
1036 the opposite of what is specified in ITU-T Recommendation I.366.2
1037 Annex E for ATM AAL2 transport. A second set of RTP payload formats
1038 matching the packetization of I.366.2 Annex E and identified by MIME
1039 subtypes AAL2-G726-16, -24, -32 and -40 will be specified in a
1040 separate document.
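
The packing rule of this section can be sketched in C as shown below.
The function name g726_pack and its error convention are
hypothetical. The codewords are assumed to arrive one per array
element, right-aligned, and `bits' is 5, 4, 3, or 2 for G726-40, -32,
-24, and -16, respectively.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pack `count' codewords of `bits' bits each into octets, filling
 * each octet from its least significant bit upward. Returns the
 * number of octets written, or -1 if the arguments are invalid, the
 * output buffer is too small, or the final octet would be only
 * partially filled.
 */
long g726_pack(const uint8_t *codewords, size_t count, unsigned bits,
               uint8_t *out, size_t out_size)
{
    uint32_t acc = 0;    /* bits waiting to be written, LSB first */
    unsigned filled = 0; /* number of valid bits in acc */
    size_t n = 0;

    if (bits < 2 || bits > 5)
        return -1;

    for (size_t i = 0; i < count; i++) {
        uint32_t cw = codewords[i] & ((1u << bits) - 1u);
        acc |= cw << filled; /* next free (least significant) bit */
        filled += bits;
        if (filled >= 8) {
            if (n == out_size)
                return -1;
            out[n++] = (uint8_t)(acc & 0xFFu);
            acc >>= 8; /* overlap goes to the LSBs of the next octet */
            filled -= 8;
        }
    }
    return filled == 0 ? (long)n : -1;
}
```

Packing the G726-24 codewords 1, 2, 3, 4, 5, 6, 7, 0 (A through H in
the figure above) yields the octets 0xD1, 0x58, 0x1F.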
1041
1042 4.5.5 G728
1043
1044 G728 is specified in ITU-T Recommendation G.728, "Coding of speech at
1045 16 kbit/s using low-delay code excited linear prediction".
1046
1047 A G.728 encoder translates 5 consecutive audio samples into a 10-bit
1048 codebook index, resulting in a bit rate of 16 kb/s for audio sampled
1049 at 8,000 samples per second. The group of five consecutive samples
1050 is called a vector. Four consecutive vectors, labeled V1 to V4
1051 (where V1 is to be played first by the receiver), build one G.728
1052 frame. The four vectors of 40 bits are packed into 5 octets, labeled
1053 B1 through B5. B1 SHALL be placed first in the RTP packet.
1054
1055 Referring to the figure below, the principle for bit order is
1056 "maintenance of bit significance". Bits from an older vector are
1057 more significant than bits from newer vectors. The MSB of the frame
1058 goes to the MSB of B1 and the LSB of the frame goes to LSB of B5.
1059
1071           1         2         3        3
1072 0         0         0         0        9
1073 ++++++++++++++++++++++++++++++++++++++++
1074 <---V1---><---V2---><---V3---><---V4---> vectors
1075 <--B1--><--B2--><--B3--><--B4--><--B5--> octets
1076 <------------- frame 1 ---------------->
1077
1078 In particular, B1 contains the eight most significant bits of V1,
1079 with the MSB of V1 being the MSB of B1. B2 contains the two least
1080 significant bits of V1, the more significant of the two in its MSB,
1081 and the six most significant bits of V2. B1 SHALL be placed first in
1082 the RTP packet and B5 last.
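
The "maintenance of bit significance" rule can be sketched as
follows. The function name is hypothetical; v[0] through v[3] stand
for the 10-bit codebook indices V1 through V4, and b[0] through b[4]
for the octets B1 through B5.

```c
#include <stdint.h>

/* Pack four 10-bit G.728 vectors into one 5-octet frame. */
void g728_pack_frame(const uint16_t v[4], uint8_t b[5])
{
    uint16_t v1 = v[0] & 0x3FFu;
    uint16_t v2 = v[1] & 0x3FFu;
    uint16_t v3 = v[2] & 0x3FFu;
    uint16_t v4 = v[3] & 0x3FFu;

    b[0] = (uint8_t)(v1 >> 2);                    /* V1 bits 9..2 */
    b[1] = (uint8_t)(((v1 & 0x03u) << 6) | (v2 >> 4));
    b[2] = (uint8_t)(((v2 & 0x0Fu) << 4) | (v3 >> 6));
    b[3] = (uint8_t)(((v3 & 0x3Fu) << 2) | (v4 >> 8));
    b[4] = (uint8_t)(v4 & 0xFFu);                 /* V4 bits 7..0 */
}
```

A packet carrying several frames would simply repeat this step and
append each group of five octets in playout order.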
1083
1084 4.5.6 G729
1085
1086 G729 is specified in ITU-T Recommendation G.729, "Coding of speech at
1087 8 kbit/s using conjugate structure-algebraic code excited linear
1088 prediction (CS-ACELP)". A reduced-complexity version of the G.729
1089 algorithm is specified in Annex A to Rec. G.729. The speech coding
1090 algorithms in the main body of G.729 and in G.729 Annex A are fully
1091 interoperable with each other, so there is no need to further
1092 distinguish between them. An implementation that signals or accepts
1093 use of G729 payload format may implement either G.729 or G.729A
1094 unless restricted by additional signaling specified elsewhere related
1095 specifically to the encoding rather than the payload format. The
1096 G.729 and G.729 Annex A codecs were optimized to represent speech
1097 with high quality, where G.729 Annex A trades some speech quality for
1098 an approximate 50% complexity reduction [10]. See the next Section
1099 (4.5.7) for other data rates added in later G.729 Annexes. For all
1100 data rates, the sampling frequency (and RTP timestamp clock rate) is
1101 8,000 Hz.
1102
1103 A voice activity detector (VAD) and comfort noise generator (CNG)
1104 algorithm in Annex B of G.729 is RECOMMENDED for digital simultaneous
1105 voice and data applications and can be used in conjunction with G.729
1106 or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets,
1107 while the G.729 Annex B comfort noise frame occupies 2 octets.
1108 Receivers MUST accept comfort noise frames if restriction of their
1109 use has not been signaled. The MIME registration for G729 in RFC
1110 3555 [7] specifies a parameter that MAY be used with MIME or SDP to
1111 restrict the use of comfort noise frames.
1112
1113 A G729 RTP packet may consist of zero or more G.729 or G.729 Annex A
1114 frames, followed by zero or one G.729 Annex B frames. The presence
1115 of a comfort noise frame can be deduced from the length of the RTP
1116 payload. The default packetization interval is 20 ms (two frames),
1117 but in some situations it may be desirable to send 10 ms packets. An
1127 example would be a transition from speech to comfort noise in the
1128 first 10 ms of the packet. For some applications, a longer
1129 packetization interval may be required to reduce the packet rate.
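
Deducing the payload composition from its length can be sketched as
follows. The function name and return convention are hypothetical;
the sizes are the 10-octet speech frame and the 2-octet Annex B
comfort noise frame given above.

```c
#include <stddef.h>

#define G729_FRAME_OCTETS 10 /* G.729 or G.729 Annex A frame */
#define G729_CN_OCTETS    2  /* G.729 Annex B comfort noise frame */

/*
 * Derive the number of speech frames and the presence of a trailing
 * comfort noise frame from the payload length. Returns 0 on success
 * or -1 if the length matches neither layout.
 */
int g729_payload_layout(size_t len, size_t *speech_frames, int *has_cn)
{
    size_t rest = len % G729_FRAME_OCTETS;

    if (rest != 0 && rest != G729_CN_OCTETS)
        return -1;
    *speech_frames = len / G729_FRAME_OCTETS;
    *has_cn = (rest == G729_CN_OCTETS);
    return 0;
}
```

A 12-octet payload, for example, is read as one speech frame followed
by one comfort noise frame.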
1130
1131  0                   1                   2                   3
1132  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1133 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1134 |L|      L1     |    L2   |    L3   |       P1      |P|    C1   |
1135 |0|             |         |         |               |0|         |
1136 | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4|
1137 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1138 |       C1      |  S1   | GA1 |  GB1  |    P2   |       C2      |
1139 |          1 1 1|       |     |       |         |               |
1140 |5 6 7 8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7|
1141 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1142 |    C2   |  S2   | GA2 |  GB2  |
1143 |    1 1 1|       |     |       |
1144 |8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|
1145 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1146
1147 Figure 4: G.729 and G.729A bit packing
1148
1149 The transmitted parameters of a G.729/G.729A 10-ms frame, consisting
1150 of 80 bits, are defined in Recommendation G.729, Table 8/G.729. The
1151 mapping of these parameters is given in Fig. 4. The diagrams show
1152 the bit packing in "network byte order", also known as big-endian
1153 order. The bits of each 32-bit word are numbered 0 to 31, with the
1154 most significant bit on the left and numbered 0. The octets (bytes)
1155 of each word are transmitted most significant octet first. The bits
1156 of each data field are numbered in the order produced by the G.729 C
1157 code reference implementation.
1158
1159 The packing of the G.729 Annex B comfort noise frame is shown in Fig.
1160 5.
1161
1162  0                   1
1163  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
1164 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1165 |L|  LSF1   |  LSF2 |   GAIN  |R|
1166 |S|         |       |         |E|
1167 |F|         |       |         |S|
1168 |0|0 1 2 3 4|0 1 2 3|0 1 2 3 4|V|    RESV = Reserved (zero)
1169 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1170
1171 Figure 5: G.729 Annex B bit packing
1172
1183 4.5.7 G729D and G729E
1184
1185 Annexes D and E to ITU-T Recommendation G.729 provide additional data
1186 rates. Because the data rate is not signaled in the bitstream, the
1187 different data rates are given distinct RTP encoding names which are
1188 mapped to distinct payload type numbers. G729D indicates a 6.4
1189 kbit/s coding mode (G.729 Annex D, for momentary reduction in channel
1190 capacity), while G729E indicates an 11.8 kbit/s mode (G.729 Annex E,
1191 for improved performance with a wide range of narrow-band input
1192 signals, e.g., music and background noise). Annex E has two
1193 operating modes, backward adaptive and forward adaptive, which are
1194 signaled by the first two bits in each frame (the most significant
1195 two bits of the first octet).
1196
1197 The voice activity detector (VAD) and comfort noise generator (CNG)
1198 algorithm specified in Annex B of G.729 may be used with Annex D and
1199 Annex E frames in addition to G.729 and G.729 Annex A frames. The
1200 algorithm details for the operation of Annexes D and E with the Annex
1201 B CNG are specified in G.729 Annexes F and G. Note that Annexes F
1202 and G do not introduce any new encodings. Receivers MUST accept
1203 comfort noise frames if restriction of their use has not been
1204 signaled. The MIME registrations for G729D and G729E in RFC 3555 [7]
1205 specify a parameter that MAY be used with MIME or SDP to restrict the
1206 use of comfort noise frames.
1207
1208 For G729D, an RTP packet may consist of zero or more G.729 Annex D
1209 frames, followed by zero or one G.729 Annex B frame. Similarly, for
1210 G729E, an RTP packet may consist of zero or more G.729 Annex E
1211 frames, followed by zero or one G.729 Annex B frame. The presence of
1212 a comfort noise frame can be deduced from the length of the RTP
1213 payload.
1214
1215 A single RTP packet must contain frames of only one data rate,
1216 optionally followed by one comfort noise frame. The data rate may be
1217 changed from packet to packet by changing the payload type number.
1218 G.729 Annexes D, E and H describe what the encoding and decoding
1219 algorithms must do to accommodate a change in data rate.
1220
1221 For G729D, the bits of a G.729 Annex D frame are formatted as shown
1222 below in Fig. 6 (cf. Table D.1/G.729). The frame length is 64 bits.
1223
1239  0                   1                   2                   3
1240  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1241 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1242 |L|      L1     |    L2   |    L3   |       P1      |     C1    |
1243 |0|             |         |         |               |           |
1244 | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4 5|
1245 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1246 | C1  |S1 | GA1 | GB1 |  P2   |        C2       |S2 | GA2 | GB2 |
1247 |     |   |     |     |       |                 |   |     |     |
1248 |6 7 8|0 1|0 1 2|0 1 2|0 1 2 3|0 1 2 3 4 5 6 7 8|0 1|0 1 2|0 1 2|
1249 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1250
1251 Figure 6: G.729 Annex D bit packing
1252
1253 The net bit rate for the G.729 Annex E algorithm is 11.8 kbit/s and a
1254 total of 118 bits are used. Two bits are appended as "don't care"
1255 bits to complete an integer number of octets for the frame. For
1256 G729E, the bits of a data frame are formatted as shown in the next
1257 two diagrams (cf. Table E.1/G.729). The fields for the G729E forward
1258 adaptive mode are packed as shown in Fig. 7.
1259
1260  0                   1                   2                   3
1261  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1262 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1263 |0 0|L|      L1     |    L2   |    L3   |       P1      |P| C0_1|
1264 |   |0|             |         |         |               |0|     |
1265 |   | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2|
1266 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1267 |       |    C1_1     |    C2_1     |    C3_1     |    C4_1     |
1268 |       |             |             |             |             |
1269 |3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|
1270 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1271 | GA1 |  GB1  |    P2   |    C0_2     |    C1_2     |   C2_2    |
1272 |     |       |         |             |             |           |
1273 |0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5|
1274 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1275 | |    C3_2     |    C4_2     | GA2 |  GB2  |DC |
1276 | |             |             |     |       |   |
1277 |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
1278 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1279
1280 Figure 7: G.729 Annex E (forward adaptive mode) bit packing
1281
1282 The fields for the G729E backward adaptive mode are packed as shown
1283 in Fig. 8.
1284
1295  0                   1                   2                   3
1296  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1297 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1298 |1 1|       P1      |P|          C0_1           |     C1_1      |
1299 |   |               |0|                    1 1 1|               |
1300 |   |0 1 2 3 4 5 6 7|0|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7|
1301 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1302 |   |    C2_1     |    C3_1     |    C4_1     | GA1 |  GB1  |P2 |
1303 |   |             |             |             |     |       |   |
1304 |8 9|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
1305 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1306 |     |          C0_2           |       C1_2        |   C2_2    |
1307 |     |                    1 1 1|                   |           |
1308 |2 3 4|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7 8 9|0 1 2 3 4 5|
1309 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1310 | |    C3_2     |    C4_2     | GA2 |  GB2  |DC |
1311 | |             |             |     |       |   |
1312 |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
1313 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1314
1315 Figure 8: G.729 Annex E (backward adaptive mode) bit packing
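
A receiver can select between the two layouts by inspecting the two
most significant bits of the first octet of each frame, as in the
sketch below. The type and function names are hypothetical. Figs. 7
and 8 show the patterns "0 0" and "1 1" only, so reporting the other
two patterns as unknown is an assumption.

```c
#include <stdint.h>

typedef enum {
    G729E_FORWARD,  /* "0 0": forward adaptive mode (Fig. 7) */
    G729E_BACKWARD, /* "1 1": backward adaptive mode (Fig. 8) */
    G729E_UNKNOWN   /* pattern not shown in either figure */
} g729e_mode;

/* Classify a G.729 Annex E frame by the first two bits sent. */
g729e_mode g729e_frame_mode(uint8_t first_octet)
{
    switch (first_octet >> 6) {
    case 0:  return G729E_FORWARD;
    case 3:  return G729E_BACKWARD;
    default: return G729E_UNKNOWN;
    }
}
```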
1316
1317 4.5.8 GSM
1318
1319 GSM (Groupe Special Mobile) denotes the European GSM 06.10 standard
1320 for full-rate speech transcoding, ETS 300 961, which is based on
1321 RPE/LTP (regular pulse excitation/long term prediction) coding at a
1322 rate of 13 kb/s [11,12,13]. The text of the standard can be obtained
1323 from:
1324
1325 ETSI (European Telecommunications Standards Institute)
1326 ETSI Secretariat: B.P.152
1327 F-06561 Valbonne Cedex
1328 France
1329 Phone: +33 92 94 42 00
1330 Fax: +33 93 65 47 16
1331
1332 Blocks of 160 audio samples are compressed into 33 octets, for an
1333 effective data rate of 13,200 b/s.
1334
1335 4.5.8.1 General Packaging Issues
1336
1337 The GSM standard (ETS 300 961) specifies the bit stream produced by
1338 the codec, but does not specify how these bits should be packed for
1339 transmission. The packetization specified here has subsequently been
1340 adopted in ETSI Technical Specification TS 101 318. Some software
1341 implementations of the GSM codec use a different packing than that
1342 specified here.
1343
1351 field  field name  bits    field  field name  bits
1352 __________________________________________________
1353   1    LARc[0]      6       39    xmc[22]      3
1354   2    LARc[1]      6       40    xmc[23]      3
1355   3    LARc[2]      5       41    xmc[24]      3
1356   4    LARc[3]      5       42    xmc[25]      3
1357   5    LARc[4]      4       43    Nc[2]        7
1358   6    LARc[5]      4       44    bc[2]        2
1359   7    LARc[6]      3       45    Mc[2]        2
1360   8    LARc[7]      3       46    xmaxc[2]     6
1361   9    Nc[0]        7       47    xmc[26]      3
1362  10    bc[0]        2       48    xmc[27]      3
1363  11    Mc[0]        2       49    xmc[28]      3
1364  12    xmaxc[0]     6       50    xmc[29]      3
1365  13    xmc[0]       3       51    xmc[30]      3
1366  14    xmc[1]       3       52    xmc[31]      3
1367  15    xmc[2]       3       53    xmc[32]      3
1368  16    xmc[3]       3       54    xmc[33]      3
1369  17    xmc[4]       3       55    xmc[34]      3
1370  18    xmc[5]       3       56    xmc[35]      3
1371  19    xmc[6]       3       57    xmc[36]      3
1372  20    xmc[7]       3       58    xmc[37]      3
1373  21    xmc[8]       3       59    xmc[38]      3
1374  22    xmc[9]       3       60    Nc[3]        7
1375  23    xmc[10]      3       61    bc[3]        2
1376  24    xmc[11]      3       62    Mc[3]        2
1377  25    xmc[12]      3       63    xmaxc[3]     6
1378  26    Nc[1]        7       64    xmc[39]      3
1379  27    bc[1]        2       65    xmc[40]      3
1380  28    Mc[1]        2       66    xmc[41]      3
1381  29    xmaxc[1]     6       67    xmc[42]      3
1382  30    xmc[13]      3       68    xmc[43]      3
1383  31    xmc[14]      3       69    xmc[44]      3
1384  32    xmc[15]      3       70    xmc[45]      3
1385  33    xmc[16]      3       71    xmc[46]      3
1386  34    xmc[17]      3       72    xmc[47]      3
1387  35    xmc[18]      3       73    xmc[48]      3
1388  36    xmc[19]      3       74    xmc[49]      3
1389  37    xmc[20]      3       75    xmc[50]      3
1390  38    xmc[21]      3       76    xmc[51]      3
1391
1392 Table 2: Ordering of GSM variables
1393
1407 Octet  Bit 0   Bit 1   Bit 2   Bit 3   Bit 4   Bit 5   Bit 6   Bit 7
1408 ______________________________________________________________________
1409     0  1       1       0       1       LARc0.0 LARc0.1 LARc0.2 LARc0.3
1410     1  LARc0.4 LARc0.5 LARc1.0 LARc1.1 LARc1.2 LARc1.3 LARc1.4 LARc1.5
1411     2  LARc2.0 LARc2.1 LARc2.2 LARc2.3 LARc2.4 LARc3.0 LARc3.1 LARc3.2
1412     3  LARc3.3 LARc3.4 LARc4.0 LARc4.1 LARc4.2 LARc4.3 LARc5.0 LARc5.1
1413     4  LARc5.2 LARc5.3 LARc6.0 LARc6.1 LARc6.2 LARc7.0 LARc7.1 LARc7.2
1414     5  Nc0.0   Nc0.1   Nc0.2   Nc0.3   Nc0.4   Nc0.5   Nc0.6   bc0.0
1415     6  bc0.1   Mc0.0   Mc0.1   xmaxc00 xmaxc01 xmaxc02 xmaxc03 xmaxc04
1416     7  xmaxc05 xmc0.0  xmc0.1  xmc0.2  xmc1.0  xmc1.1  xmc1.2  xmc2.0
1417     8  xmc2.1  xmc2.2  xmc3.0  xmc3.1  xmc3.2  xmc4.0  xmc4.1  xmc4.2
1418     9  xmc5.0  xmc5.1  xmc5.2  xmc6.0  xmc6.1  xmc6.2  xmc7.0  xmc7.1
1419    10  xmc7.2  xmc8.0  xmc8.1  xmc8.2  xmc9.0  xmc9.1  xmc9.2  xmc10.0
1420    11  xmc10.1 xmc10.2 xmc11.0 xmc11.1 xmc11.2 xmc12.0 xmc12.1 xmc12.2
1421    12  Nc1.0   Nc1.1   Nc1.2   Nc1.3   Nc1.4   Nc1.5   Nc1.6   bc1.0
1422    13  bc1.1   Mc1.0   Mc1.1   xmaxc10 xmaxc11 xmaxc12 xmaxc13 xmaxc14
1423    14  xmaxc15 xmc13.0 xmc13.1 xmc13.2 xmc14.0 xmc14.1 xmc14.2 xmc15.0
1424    15  xmc15.1 xmc15.2 xmc16.0 xmc16.1 xmc16.2 xmc17.0 xmc17.1 xmc17.2
1425    16  xmc18.0 xmc18.1 xmc18.2 xmc19.0 xmc19.1 xmc19.2 xmc20.0 xmc20.1
1426    17  xmc20.2 xmc21.0 xmc21.1 xmc21.2 xmc22.0 xmc22.1 xmc22.2 xmc23.0
1427    18  xmc23.1 xmc23.2 xmc24.0 xmc24.1 xmc24.2 xmc25.0 xmc25.1 xmc25.2
1428    19  Nc2.0   Nc2.1   Nc2.2   Nc2.3   Nc2.4   Nc2.5   Nc2.6   bc2.0
1429    20  bc2.1   Mc2.0   Mc2.1   xmaxc20 xmaxc21 xmaxc22 xmaxc23 xmaxc24
1430    21  xmaxc25 xmc26.0 xmc26.1 xmc26.2 xmc27.0 xmc27.1 xmc27.2 xmc28.0
1431    22  xmc28.1 xmc28.2 xmc29.0 xmc29.1 xmc29.2 xmc30.0 xmc30.1 xmc30.2
1432    23  xmc31.0 xmc31.1 xmc31.2 xmc32.0 xmc32.1 xmc32.2 xmc33.0 xmc33.1
1433    24  xmc33.2 xmc34.0 xmc34.1 xmc34.2 xmc35.0 xmc35.1 xmc35.2 xmc36.0
1434    25  xmc36.1 xmc36.2 xmc37.0 xmc37.1 xmc37.2 xmc38.0 xmc38.1 xmc38.2
1435    26  Nc3.0   Nc3.1   Nc3.2   Nc3.3   Nc3.4   Nc3.5   Nc3.6   bc3.0
1436    27  bc3.1   Mc3.0   Mc3.1   xmaxc30 xmaxc31 xmaxc32 xmaxc33 xmaxc34
1437    28  xmaxc35 xmc39.0 xmc39.1 xmc39.2 xmc40.0 xmc40.1 xmc40.2 xmc41.0
1438    29  xmc41.1 xmc41.2 xmc42.0 xmc42.1 xmc42.2 xmc43.0 xmc43.1 xmc43.2
1439    30  xmc44.0 xmc44.1 xmc44.2 xmc45.0 xmc45.1 xmc45.2 xmc46.0 xmc46.1
1440    31  xmc46.2 xmc47.0 xmc47.1 xmc47.2 xmc48.0 xmc48.1 xmc48.2 xmc49.0
1441    32  xmc49.1 xmc49.2 xmc50.0 xmc50.1 xmc50.2 xmc51.0 xmc51.1 xmc51.2
1442
1443 Table 3: GSM payload format
1444
1445 In the GSM packing used by RTP, the bits SHALL be packed beginning
1446 from the most significant bit. Every 160-sample GSM frame is coded
1447 into one 33-octet (264-bit) buffer. Every such buffer begins with a
1448 4-bit signature (0xD), followed by the MSB-first encoding of the
1449 fields of the frame. The first octet thus contains 1101 in the 4 most
1450 significant bits (0-3) and the 4 most significant bits of field F1
1451 (0-3) in the 4 least significant bits (4-7). The second octet
1452 contains the 2 least significant bits of F1 in bits 0-1, and F2 in
1453 bits 2-7, and so on. The order of the fields is given in Table 2.
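
The packing of the 76 fields of Table 2 behind the 4-bit signature
can be sketched in C as follows. The function and macro names are
hypothetical, and the caller is assumed to supply the field values in
Table 2 order, one per array element. The field widths are taken from
Table 2: 36 bits of LARc, then four sub-frames of 56 bits each.

```c
#include <stdint.h>
#include <string.h>

#define GSM_FIELDS       76 /* fields per frame (Table 2) */
#define GSM_FRAME_OCTETS 33 /* 4 + 260 bits = 264 bits */

/* Append `width' bits of `value', most significant bit first. */
static void put_bits(uint8_t *buf, unsigned *pos,
                     unsigned value, unsigned width)
{
    for (unsigned i = width; i-- > 0; ) {
        if ((value >> i) & 1u)
            buf[*pos >> 3] |= (uint8_t)(0x80u >> (*pos & 7u));
        (*pos)++;
    }
}

/* Fill in the bit width of each field in Table 2 order. */
static void gsm_field_widths(uint8_t w[GSM_FIELDS])
{
    static const uint8_t lar[8] = {6, 6, 5, 5, 4, 4, 3, 3};
    unsigned n = 0;

    for (unsigned i = 0; i < 8; i++)
        w[n++] = lar[i];
    for (unsigned sub = 0; sub < 4; sub++) {
        w[n++] = 7; /* Nc */
        w[n++] = 2; /* bc */
        w[n++] = 2; /* Mc */
        w[n++] = 6; /* xmaxc */
        for (unsigned k = 0; k < 13; k++)
            w[n++] = 3; /* xmc */
    }
}

/* Pack one GSM 06.10 frame into its 33-octet RTP representation. */
void gsm_pack(const uint16_t fields[GSM_FIELDS],
              uint8_t out[GSM_FRAME_OCTETS])
{
    uint8_t w[GSM_FIELDS];
    unsigned pos = 0;

    gsm_field_widths(w);
    memset(out, 0, GSM_FRAME_OCTETS);
    put_bits(out, &pos, 0xD, 4); /* signature 1101 */
    for (unsigned i = 0; i < GSM_FIELDS; i++)
        put_bits(out, &pos, fields[i], w[i]);
}
```

With LARc[0] set to 0x21 and all other fields zero, the first two
octets come out as 0xD8 and 0x40, matching the first two rows of
Table 3.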
1454
1463 4.5.8.2 GSM Variable Names and Numbers
1464
1465 In the RTP encoding we have the bit pattern described in Table 3,
1466 where F.i signifies the ith bit of the field F, bit 0 is the most
1467 significant bit, and the bits of every octet are numbered from 0 to 7
1468 from most to least significant.
1469
1470 4.5.9 GSM-EFR
1471
1472 GSM-EFR denotes GSM 06.60 enhanced full rate speech transcoding,
1473 specified in ETS 300 726 which is available from ETSI at the address
1474 given in Section 4.5.8. This codec has a frame length of 244 bits.
1475 For transmission in RTP, each codec frame is packed into a 31-octet
1476 (248-bit) buffer beginning with a 4-bit signature 0xC in a manner
1477 similar to that specified here for the original GSM 06.10 codec. The
1478 packing is specified in ETSI Technical Specification TS 101 318.
1479
1480 4.5.10 L8
1481
1482 L8 denotes linear audio data samples, using 8 bits of precision with
1483 an offset of 128, that is, the most negative signal is encoded as
1484 zero.
1485
1486 4.5.11 L16
1487
1488 L16 denotes uncompressed audio data samples, using 16-bit signed
1489 representation with 65,535 equally divided steps between minimum and
1490 maximum signal level, ranging from -32,768 to 32,767. The value is
1491 represented in two's complement notation and transmitted in network
1492 byte order (most significant byte first).
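
The two linear sample representations of Sections 4.5.10 and 4.5.11
can be read as sketched below. The function names are hypothetical;
the sketch covers only the conversion of received octets to signed
integers on the host.

```c
#include <stdint.h>

/* L8: offset-128 encoding, so octet 0 is the most negative value. */
int l8_decode(uint8_t octet)
{
    return (int)octet - 128;
}

/* L16: two's complement, most significant octet first. */
int16_t l16_decode(const uint8_t octets[2])
{
    int32_t v = ((int32_t)octets[0] << 8) | octets[1];

    if (v >= 0x8000)
        v -= 0x10000; /* sign-extend the 16-bit value */
    return (int16_t)v;
}
```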
1493
1494 The MIME registration for L16 in RFC 3555 [7] specifies parameters
1495 that MAY be used with MIME or SDP to indicate that analog pre-
1496 emphasis was applied to the signal before quantization or to indicate
1497 that a multiple-channel audio stream follows a different channel
1498 ordering convention than is specified in Section 4.1.
1499
1500 4.5.12 LPC
1501
1502 LPC designates an experimental linear predictive encoding contributed
1503 by Ron Frederick, which is based on an implementation written by Ron
1504 Zuckerman posted to the Usenet group comp.dsp on June 26, 1992. The
1505 codec generates 14 octets for every frame. The frame size is set to
1506 20 ms, resulting in a bit rate of 5,600 b/s.
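
The arithmetic behind this figure is the frame size in bits divided
by the frame duration. A minimal sketch, with a hypothetical function
name:

```c
/* Bit rate in b/s of a codec that emits `octets_per_frame' octets
 * every `frame_ms' milliseconds. */
unsigned long frame_bit_rate(unsigned octets_per_frame,
                             unsigned frame_ms)
{
    return (unsigned long)octets_per_frame * 8ul * 1000ul / frame_ms;
}
```

For LPC, 14 octets every 20 ms gives 14 * 8 / 0.020 = 5,600 b/s. The
same calculation reproduces the 13,200 b/s quoted for GSM in Section
4.5.8, since a 160-sample frame at 8,000 Hz also lasts 20 ms and
occupies 33 octets.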
1507
1519 4.5.13 MPA
1520
1521 MPA denotes MPEG-1 or MPEG-2 audio encapsulated as elementary
1522 streams. The encoding is defined in ISO standards ISO/IEC 11172-3
1523 and 13818-3. The encapsulation is specified in RFC 2250 [14].
1524
1525 The encoding may be at any of three levels of complexity, called
1526 Layer I, II and III. The selected layer as well as the sampling rate
1527 and channel count are indicated in the payload. The RTP timestamp
1528 clock rate is always 90,000, independent of the sampling rate.
1529 MPEG-1 audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC
1530 11172-3, section 1.1; "Scope"). MPEG-2 supports sampling rates of
1531 16, 22.05 and 24 kHz. The number of samples per frame is fixed, but
1532 the frame size will vary with the sampling rate and bit rate.
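
Because the timestamp clock is fixed at 90,000 Hz while the number of samples per frame is fixed, the timestamp advance per frame depends only on the layer and the sampling rate. The sketch below illustrates this for MPEG-1. The samples-per-frame values (384 for Layer I, 1152 for Layers II and III) are taken from ISO/IEC 11172-3 and are an assumption of this example, not something stated in this memo; the function name is hypothetical.

```python
from fractions import Fraction

MPA_CLOCK_HZ = 90_000
# Assumption: MPEG-1 samples per frame, per ISO/IEC 11172-3.
MPEG1_SAMPLES_PER_FRAME = {1: 384, 2: 1152, 3: 1152}


def mpa_frame_ticks(layer, sampling_rate_hz):
    """Exact 90 kHz timestamp advance for one MPEG-1 audio frame."""
    samples = MPEG1_SAMPLES_PER_FRAME[layer]
    return Fraction(samples * MPA_CLOCK_HZ, sampling_rate_hz)
```

At 32 and 48 kHz the result is an integer; at 44.1 kHz it is not, so a sender would presumably accumulate the exact value and round when stamping each packet.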
1533
1534 The MIME registration for MPA in RFC 3555 [7] specifies parameters
1535 that MAY be used with MIME or SDP to restrict the selection of layer,
1536 channel count, sampling rate, and bit rate.
1537
1538 4.5.14 PCMA and PCMU
1539
1540 PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio
1541 data is encoded as eight bits per sample, after logarithmic scaling.
1542 PCMU denotes mu-law scaling, PCMA A-law scaling. A detailed
1543 description is given by Jayant and Noll [15]. Each G.711 octet SHALL
1544 be octet-aligned in an RTP packet. The sign bit of each G.711 octet
1545 SHALL correspond to the most significant bit of the octet in the RTP
1546 packet (i.e., assuming the G.711 samples are handled as octets on the
1547 host machine, the sign bit SHALL be the most significant bit of the
1548 octet as defined by the host machine format). The 56 kb/s and 48
1549 kb/s modes of G.711 are not applicable to RTP, since PCMA and PCMU
1550 MUST always be transmitted as 8-bit samples.
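
For orientation only, the following sketch shows the widely circulated 16-bit formulation of mu-law compression (bias 0x84, segment search, bit inversion). It is not taken from this memo, and ITU-T Recommendation G.711 remains the authority for bit-exact behavior; the function name is hypothetical. It illustrates that each sample becomes one octet whose most significant bit carries the sign.

```python
ULAW_BIAS = 0x84    # bias used by the common 16-bit formulation
ULAW_CLIP = 32635   # largest magnitude before the bias is added


def linear_to_ulaw(sample):
    """Compress one signed 16-bit sample to one mu-law octet."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), ULAW_CLIP) + ULAW_BIAS
    # Find the segment: position of the highest set bit among bits 14..7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 mu-law transmits the code word with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```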
1551
1552 See Section 4.1 regarding silence suppression.
1553
1554 4.5.15 QCELP
1555
1556 The Electronic Industries Association (EIA) & Telecommunications
1557 Industry Association (TIA) standard IS-733, "TR45: High Rate Speech
1558 Service Option for Wideband Spread Spectrum Communications Systems",
1559 defines the QCELP audio compression algorithm for use in wireless
1560 CDMA applications. The QCELP CODEC compresses each 20 milliseconds
1561 of 8,000 Hz, 16-bit sampled input speech into one of four different
1562 size output frames: Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4
1563 (54 bits) or Rate 1/8 (20 bits). For typical speech patterns, this
1564 results in an average output of 6.8 kb/s for normal mode and 4.7 kb/s
1565 for reduced rate mode. The packetization of the QCELP audio codec is
1566 described in [16].
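
The four frame sizes translate into instantaneous bit rates once divided by the 20 ms frame duration. A small sketch of that arithmetic follows; the table and function names are hypothetical, and the packetization itself is left to [16].

```python
QCELP_FRAME_MS = 20
QCELP_FRAME_BITS = {"1": 266, "1/2": 124, "1/4": 54, "1/8": 20}


def qcelp_rate_bps(rate):
    """Instantaneous bit rate in b/s while frames of this rate are sent."""
    return QCELP_FRAME_BITS[rate] * 1000 // QCELP_FRAME_MS
```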
1567
1575 4.5.16 RED
1576
1577 The redundant audio payload format "RED" is specified by RFC 2198
1578 [17]. It defines a means by which multiple redundant copies of an
1579 audio packet may be transmitted in a single RTP stream. Each packet
1580 in such a stream contains, in addition to the audio data for that
1581 packetization interval, a (more heavily compressed) copy of the data
1582 from a previous packetization interval. This allows an approximation
1583 of the data from lost packets to be recovered upon decoding of a
1584 subsequent packet, giving much improved sound quality when compared
1585 with silence substitution for lost packets.
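
As a rough illustration of how a receiver might locate the redundant and primary blocks, the sketch below parses the block headers as laid out in RFC 2198: each redundant block has a 4-octet header (1-bit F flag set, 7-bit block payload type, 14-bit timestamp offset, 10-bit block length), and the primary block has a 1-octet header with F cleared. That layout comes from RFC 2198, not from this memo, and should be checked against it; the function name and the tuple shape are hypothetical.

```python
def parse_red_headers(payload):
    """Return (blocks, data_offset) for a RED payload.

    blocks is a list of (payload_type, timestamp_offset, length) tuples;
    the final (primary) entry has offset 0 and length None because its
    length is whatever remains in the packet.
    """
    blocks = []
    pos = 0
    while True:
        if pos >= len(payload):
            raise ValueError("truncated RED header")
        first = payload[pos]
        payload_type = first & 0x7F
        if not first & 0x80:              # F = 0: last header, 1 octet
            blocks.append((payload_type, 0, None))
            return blocks, pos + 1
        if pos + 4 > len(payload):
            raise ValueError("truncated RED header")
        word = int.from_bytes(payload[pos:pos + 4], "big")
        blocks.append((payload_type, (word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
```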
1586
1587 4.5.17 VDVI
1588
1589 VDVI is a variable-rate version of DVI4, yielding speech bit rates of
1590 between 10 and 25 kb/s. It is specified for single-channel operation
1591 only. Samples are packed into octets starting at the most-
1592 significant bit. The last octet is padded with 1 bits if the last
1593 sample does not fill the last octet. This padding is distinct from
1594 the valid codewords. The receiver needs to detect the padding
1595 because there is no explicit count of samples in the packet.
1596
1597 It uses the following encoding:
1598
      DVI4 codeword    VDVI bit pattern
      _______________________________
                  0    00
                  1    010
                  2    1100
                  3    11100
                  4    111100
                  5    1111100
                  6    11111100
                  7    11111110
                  8    10
                  9    011
                 10    1101
                 11    11101
                 12    111101
                 13    1111101
                 14    11111101
                 15    11111111
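
The encoding table and the padding rule can be sketched as follows. The bit patterns form a prefix code, and no pattern shorter than eight bits consists only of 1 bits, so a trailing run of up to seven 1 bits can be recognized as padding. This is a minimal illustration covering only the sample bitstream; the function names are hypothetical and any per-packet header is outside its scope.

```python
VDVI_CODE = {
    0: "00", 1: "010", 2: "1100", 3: "11100", 4: "111100",
    5: "1111100", 6: "11111100", 7: "11111110",
    8: "10", 9: "011", 10: "1101", 11: "11101", 12: "111101",
    13: "1111101", 14: "11111101", 15: "11111111",
}
VDVI_DECODE = {bits: codeword for codeword, bits in VDVI_CODE.items()}


def vdvi_encode(codewords):
    """Pack DVI4 codewords MSB-first and pad the last octet with 1 bits."""
    bits = "".join(VDVI_CODE[c] for c in codewords)
    bits += "1" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def vdvi_decode(payload):
    """Recover DVI4 codewords, treating a trailing run of 1s as padding."""
    codewords, current = [], ""
    for bit in "".join(f"{octet:08b}" for octet in payload):
        current += bit
        if current in VDVI_DECODE:
            codewords.append(VDVI_DECODE[current])
            current = ""
    if current and set(current) != {"1"}:
        raise ValueError("trailing bits are neither a codeword nor padding")
    return codewords
```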
1617
1631 5. Video
1632
1633 The following sections describe the video encodings that are defined
1634 in this memo and give their abbreviated names used for
1635 identification. These video encodings and their payload types are
1636 listed in Table 5.
1637
1638 All of these video encodings use an RTP timestamp frequency of 90,000
1639 Hz, the same as the MPEG presentation time stamp frequency. This
1640 frequency yields exact integer timestamp increments for the typical
24 (HDTV), 25 (PAL), 29.97 (NTSC) and 30 Hz (HDTV) frame rates
1642 and 50, 59.94 and 60 Hz field rates. While 90 kHz is the RECOMMENDED
1643 rate for future video encodings used within this profile, other rates
1644 MAY be used. However, it is not sufficient to use the video frame
1645 rate (typically between 15 and 30 Hz) because that does not provide
1646 adequate resolution for typical synchronization requirements when
1647 calculating the RTP timestamp corresponding to the NTP timestamp in
1648 an RTCP SR packet. The timestamp resolution MUST also be sufficient
1649 for the jitter estimate contained in the receiver reports.
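
The integer-increment property for the frame rates named above can be checked with exact rational arithmetic. The sketch assumes that the NTSC rate written as 29.97 Hz is exactly 30000/1001 Hz, which is the conventional definition but is not stated in this memo; the names are hypothetical.

```python
from fractions import Fraction

VIDEO_CLOCK_HZ = 90_000
FRAME_RATES_HZ = {
    "24 (HDTV)": Fraction(24),
    "25 (PAL)": Fraction(25),
    "29.97 (NTSC)": Fraction(30000, 1001),   # assumption: exact NTSC rate
    "30 (HDTV)": Fraction(30),
}


def ticks_per_frame(frame_rate_hz):
    """Exact RTP timestamp increment between consecutive frames."""
    return VIDEO_CLOCK_HZ / Fraction(frame_rate_hz)
```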
1650
1651 For most of these video encodings, the RTP timestamp encodes the
1652 sampling instant of the video image contained in the RTP data packet.
1653 If a video image occupies more than one packet, the timestamp is the
1654 same on all of those packets. Packets from different video images
1655 are distinguished by their different timestamps.
1656
1657 Most of these video encodings also specify that the marker bit of the
1658 RTP header SHOULD be set to one in the last packet of a video frame
1659 and otherwise set to zero. Thus, it is not necessary to wait for a
1660 following packet with a different timestamp to detect that a new
1661 frame should be displayed.
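
A receiver can combine the two rules above: packets with the same timestamp belong to one image, and a set marker bit closes the frame without waiting for the next packet. The sketch below is a simplified illustration that assumes packets arrive in order without loss and that a frame is the plain concatenation of its payloads, which real payload formats do not guarantee. The Packet type and function name are hypothetical.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "timestamp marker payload")


def assemble_frames(packets):
    """Yield (timestamp, data) for each completed video frame."""
    current_ts, parts = None, []
    for pkt in packets:
        if parts and pkt.timestamp != current_ts:
            # Timestamp changed without a marker: close the old frame.
            yield current_ts, b"".join(parts)
            parts = []
        current_ts = pkt.timestamp
        parts.append(pkt.payload)
        if pkt.marker:
            # Marker bit set: this was the last packet of the frame.
            yield current_ts, b"".join(parts)
            parts = []
    # A trailing frame without a marker is held back as incomplete.
```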
1662
1663 5.1 CelB
1664
1665 The CELL-B encoding is a proprietary encoding proposed by Sun
1666 Microsystems. The byte stream format is described in RFC 2029 [18].
1667
1668 5.2 JPEG
1669
1670 The encoding is specified in ISO Standards 10918-1 and 10918-2. The
1671 RTP payload format is as specified in RFC 2435 [19].
1672
1673 5.3 H261
1674
1675 The encoding is specified in ITU-T Recommendation H.261, "Video codec
1676 for audiovisual services at p x 64 kbit/s". The packetization and
1677 RTP-specific properties are described in RFC 2032 [20].
1678
1687 5.4 H263
1688
1689 The encoding is specified in the 1996 version of ITU-T Recommendation
1690 H.263, "Video coding for low bit rate communication". The
1691 packetization and RTP-specific properties are described in RFC 2190
1692 [21]. The H263-1998 payload format is RECOMMENDED over this one for
1693 use by new implementations.
1694
1695 5.5 H263-1998
1696
1697 The encoding is specified in the 1998 version of ITU-T Recommendation
1698 H.263, "Video coding for low bit rate communication". The
1699 packetization and RTP-specific properties are described in RFC 2429
1700 [22]. Because the 1998 version of H.263 is a superset of the 1996
1701 syntax, this payload format can also be used with the 1996 version of
1702 H.263, and is RECOMMENDED for this use by new implementations. This
1703 payload format does not replace RFC 2190, which continues to be used
1704 by existing implementations, and may be required for backward
1705 compatibility in new implementations. Implementations using the new
1706 features of the 1998 version of H.263 MUST use the payload format
1707 described in RFC 2429.
1708
1709 5.6 MPV
1710
1711 MPV designates the use of MPEG-1 and MPEG-2 video encoding elementary
1712 streams as specified in ISO Standards ISO/IEC 11172 and 13818-2,
1713 respectively. The RTP payload format is as specified in RFC 2250
1714 [14], Section 3.
1715
1716 The MIME registration for MPV in RFC 3555 [7] specifies a parameter
1717 that MAY be used with MIME or SDP to restrict the selection of the
1718 type of MPEG video.
1719
1720 5.7 MP2T
1721
1722 MP2T designates the use of MPEG-2 transport streams, for either audio
1723 or video. The RTP payload format is described in RFC 2250 [14],
1724 Section 2.
1725
1743 5.8 nv
1744
1745 The encoding is implemented in the program `nv', version 4, developed
1746 at Xerox PARC by Ron Frederick. Further information is available
1747 from the author:
1748
1749 Ron Frederick
1750 Blue Coat Systems Inc.
1751 650 Almanor Avenue
1752 Sunnyvale, CA 94085
1753 United States
1754 EMail: ronf@bluecoat.com
1755
1756 6. Payload Type Definitions
1757
1758 Tables 4 and 5 define this profile's static payload type values for
1759 the PT field of the RTP data header. In addition, payload type
1760 values in the range 96-127 MAY be defined dynamically through a
1761 conference control protocol, which is beyond the scope of this
1762 document. For example, a session directory could specify that for a
1763 given session, payload type 96 indicates PCMU encoding, 8,000 Hz
1764 sampling rate, 2 channels. Entries in Tables 4 and 5 with payload
1765 type "dyn" have no static payload type assigned and are only used
1766 with a dynamic payload type. Payload type 2 was assigned to G721 in
1767 RFC 1890 and to its equivalent successor G726-32 in draft versions of
1768 this specification, but its use is now deprecated and that static
1769 payload type is marked reserved due to conflicting use for the
1770 payload formats G726-32 and AAL2-G726-32 (see Section 4.5.4).
1771 Payload type 13 indicates the Comfort Noise (CN) payload format
1772 specified in RFC 3389 [9]. Payload type 19 is marked "reserved"
1773 because some draft versions of this specification assigned that
1774 number to an earlier version of the comfort noise payload format.
1775 The payload type range 72-76 is marked "reserved" so that RTCP and
1776 RTP packets can be reliably distinguished (see Section "Summary of
1777 Protocol Constants" of the RTP protocol specification).
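
The ranges described in this paragraph, together with the entries of Tables 4 and 5, can be sketched as a simple classifier for the 7-bit PT field. The function name and the returned labels are hypothetical and serve only as an illustration.

```python
RESERVED_TYPES = {1, 2, 19}
UNASSIGNED_BELOW_35 = {20, 21, 22, 23, 24, 27, 29, 30}


def classify_payload_type(pt):
    """Classify a PT value as static, dynamic, reserved or unassigned."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if 96 <= pt <= 127:
        return "dynamic"
    if 72 <= pt <= 76 or pt in RESERVED_TYPES:
        return "reserved"
    if pt in UNASSIGNED_BELOW_35 or 35 <= pt <= 71 or 77 <= pt <= 95:
        return "unassigned"
    return "static"
```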
1778
1779 The payload types currently defined in this profile are assigned to
1780 exactly one of three categories or media types: audio only, video
1781 only and those combining audio and video. The media types are marked
1782 in Tables 4 and 5 as "A", "V" and "AV", respectively. Payload types
1783 of different media types SHALL NOT be interleaved or multiplexed
1784 within a single RTP session, but multiple RTP sessions MAY be used in
1785 parallel to send multiple media types. An RTP source MAY change
1786 payload types within the same media type during a session. See the
1787 section "Multiplexing RTP Sessions" of RFC 3550 for additional
1788 explanation.
1789
PT   encoding    media type  clock rate   channels
     name                    (Hz)
___________________________________________________
0    PCMU        A            8,000       1
1    reserved    A
2    reserved    A
3    GSM         A            8,000       1
4    G723        A            8,000       1
5    DVI4        A            8,000       1
6    DVI4        A           16,000       1
7    LPC         A            8,000       1
8    PCMA        A            8,000       1
9    G722        A            8,000       1
10   L16         A           44,100       2
11   L16         A           44,100       1
12   QCELP       A            8,000       1
13   CN          A            8,000       1
14   MPA         A           90,000       (see text)
15   G728        A            8,000       1
16   DVI4        A           11,025       1
17   DVI4        A           22,050       1
18   G729        A            8,000       1
19   reserved    A
20   unassigned  A
21   unassigned  A
22   unassigned  A
23   unassigned  A
dyn  G726-40     A            8,000       1
dyn  G726-32     A            8,000       1
dyn  G726-24     A            8,000       1
dyn  G726-16     A            8,000       1
dyn  G729D       A            8,000       1
dyn  G729E       A            8,000       1
dyn  GSM-EFR     A            8,000       1
dyn  L8          A            var.        var.
dyn  RED         A            (see text)
dyn  VDVI        A            var.        1

Table 4: Payload types (PT) for audio encodings
1838
PT      encoding    media type  clock rate
        name                    (Hz)
_____________________________________________
24      unassigned  V
25      CelB        V           90,000
26      JPEG        V           90,000
27      unassigned  V
28      nv          V           90,000
29      unassigned  V
30      unassigned  V
31      H261        V           90,000
32      MPV         V           90,000
33      MP2T        AV          90,000
34      H263        V           90,000
35-71   unassigned  ?
72-76   reserved    N/A         N/A
77-95   unassigned  ?
96-127  dynamic     ?
dyn     H263-1998   V           90,000

Table 5: Payload types (PT) for video and combined encodings
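
The rule that payload types of different media types are not mixed within one RTP session can be sketched as a check over the static assignments in Tables 4 and 5. The names below are hypothetical; media types for dynamic payload types are assumed to be supplied by whatever mechanism bound them.

```python
# Static assignments from Tables 4 and 5: PT -> media type.
STATIC_MEDIA_TYPE = {
    **dict.fromkeys([0, *range(3, 19)], "A"),
    **dict.fromkeys([25, 26, 28, 31, 32, 34], "V"),
    33: "AV",
}


def check_single_media_type(payload_types, dynamic_types=None):
    """Return the session's media type, or raise if the set is mixed."""
    dynamic_types = dynamic_types or {}
    kinds = set()
    for pt in payload_types:
        kind = STATIC_MEDIA_TYPE.get(pt) or dynamic_types.get(pt)
        if kind is None:
            raise ValueError(f"payload type {pt} has no known media type")
        kinds.add(kind)
    if len(kinds) > 1:
        raise ValueError("payload types of different media types are mixed")
    return kinds.pop() if kinds else None
```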
1877
1878 Session participants agree through mechanisms beyond the scope of
1879 this specification on the set of payload types allowed in a given
1880 session. This set MAY, for example, be defined by the capabilities
1881 of the applications used, negotiated by a conference control protocol
1882 or established by agreement between the human participants.
1883
1884 Audio applications operating under this profile SHOULD, at a minimum,
1885 be able to send and/or receive payload types 0 (PCMU) and 5 (DVI4).
1886 This allows interoperability without format negotiation and ensures
1887 successful negotiation with a conference control protocol.
1888
1889 7. RTP over TCP and Similar Byte Stream Protocols
1890
1891 Under special circumstances, it may be necessary to carry RTP in
1892 protocols offering a byte stream abstraction, such as TCP, possibly
1893 multiplexed with other data. The application MUST define its own
1894 method of delineating RTP and RTCP packets (RTSP [23] provides an
1895 example of such an encapsulation specification).
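
One possible delineation method, shown purely as an assumption for illustration, is to prefix every RTP or RTCP packet with its length as a 16-bit integer in network byte order. This memo does not prescribe this or any other framing, and the function names are hypothetical.

```python
import struct


def frame_packet(packet):
    """Prefix a packet with its 16-bit big-endian length."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length prefix")
    return struct.pack(">H", len(packet)) + packet


def deframe(buffer):
    """Split received bytes into complete packets plus leftover bytes."""
    packets, pos = [], 0
    while pos + 2 <= len(buffer):
        (length,) = struct.unpack_from(">H", buffer, pos)
        if pos + 2 + length > len(buffer):
            break                      # wait for the rest of this packet
        packets.append(bytes(buffer[pos + 2:pos + 2 + length]))
        pos += 2 + length
    return packets, bytes(buffer[pos:])
```

The leftover bytes returned by deframe would be prepended to the next read from the stream.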
1896
1897 8. Port Assignment
1898
1899 As specified in the RTP protocol definition, RTP data SHOULD be
1900 carried on an even UDP port number and the corresponding RTCP packets
1901 SHOULD be carried on the next higher (odd) port number.
1902
1911 Applications operating under this profile MAY use any such UDP port
1912 pair. For example, the port pair MAY be allocated randomly by a
1913 session management program. A single fixed port number pair cannot
1914 be required because multiple applications using this profile are
1915 likely to run on the same host, and there are some operating systems
1916 that do not allow multiple processes to use the same UDP port with
1917 different multicast addresses.
1918
1919 However, port numbers 5004 and 5005 have been registered for use with
1920 this profile for those applications that choose to use them as the
1921 default pair. Applications that operate under multiple profiles MAY
1922 use this port pair as an indication to select this profile if they
1923 are not subject to the constraint of the previous paragraph.
1924 Applications need not have a default and MAY require that the port
1925 pair be explicitly specified. The particular port numbers were
1926 chosen to lie in the range above 5000 to accommodate port number
1927 allocation practice within some versions of the Unix operating
1928 system, where port numbers below 1024 can only be used by privileged
1929 processes and port numbers between 1024 and 5000 are automatically
1930 assigned by the operating system.
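
The even/odd pairing and the optional default pair can be sketched as below. The helper names and the choice of search range are hypothetical; a real session management program would also have to check that both ports are free.

```python
import random

DEFAULT_RTP_PORT = 5004    # registered default pair is 5004/5005


def rtcp_port_for(rtp_port):
    """Return the RTCP port paired with an even RTP port."""
    if rtp_port % 2:
        raise ValueError("RTP data should use an even port number")
    return rtp_port + 1


def random_port_pair(low=5002, high=65534, rng=random):
    """Pick a random even RTP port in [low, high) and its RTCP port."""
    low += low % 2                     # round up to an even starting point
    rtp_port = rng.randrange(low, high, 2)
    return rtp_port, rtcp_port_for(rtp_port)
```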
1931
1932 9. Changes from RFC 1890
1933
1934 This RFC revises RFC 1890. It is mostly backwards-compatible with
1935 RFC 1890 except for functions removed because two interoperable
1936 implementations were not found. The additions to RFC 1890 codify
1937 existing practice in the use of payload formats under this profile.
1938 Since this profile may be used without using any of the payload
1939 formats listed here, the addition of new payload formats in this
1940 revision does not affect backwards compatibility. The changes are
1941 listed below, categorized into functional and non-functional changes.
1942
1943 Functional changes:
1944
o Section 11, "IANA Considerations", was added to specify the
registration of the name for this profile. That section also
references a new Section 3, "Registering Additional Encodings",
which establishes a policy that no additional registration of
static payload types for this profile will be made beyond those
added in this revision and included in Tables 4 and 5. Instead,
additional encoding names may be registered as MIME subtypes for
binding to dynamic payload types. Non-normative references were
added to RFC 3555 [7] where MIME subtypes for all the listed
payload formats are registered, some with optional parameters for
use of the payload formats.
1956
1967 o Static payload types 4, 16, 17 and 34 were added to incorporate
1968 IANA registrations made since the publication of RFC 1890, along
1969 with the corresponding payload format descriptions for G723 and
1970 H263.
1971
1972 o Following working group discussion, static payload types 12 and 18
1973 were added along with the corresponding payload format
1974 descriptions for QCELP and G729. Static payload type 13 was
1975 assigned to the Comfort Noise (CN) payload format defined in RFC
1976 3389. Payload type 19 was marked reserved because it had been
1977 temporarily allocated to an earlier version of Comfort Noise
1978 present in some draft revisions of this document.
1979
1980 o The payload format for G721 was renamed to G726-32 following the
1981 ITU-T renumbering, and the payload format description for G726 was
1982 expanded to include the -16, -24 and -40 data rates. Because of
1983 confusion regarding draft revisions of this document, some
1984 implementations of these G726 payload formats packed samples into
1985 octets starting with the most significant bit rather than the
1986 least significant bit as specified here. To partially resolve
1987 this incompatibility, new payload formats named AAL2-G726-16, -24,
1988 -32 and -40 will be specified in a separate document (see note in
1989 Section 4.5.4), and use of static payload type 2 is deprecated as
1990 explained in Section 6.
1991
1992 o Payload formats G729D and G729E were added following the ITU-T
1993 addition of Annexes D and E to Recommendation G.729. Listings
1994 were added for payload formats GSM-EFR, RED, and H263-1998
1995 published in other documents subsequent to RFC 1890. These
1996 additional payload formats are referenced only by dynamic payload
1997 type numbers.
1998
o The descriptions of the payload formats for G722, G728, GSM and
VDVI were expanded.
2001
2002 o The payload format for 1016 audio was removed and its static
2003 payload type assignment 1 was marked "reserved" because two
2004 interoperable implementations were not found.
2005
2006 o Requirements for congestion control were added in Section 2.
2007
2008 o This profile follows the suggestion in the revised RTP spec that
2009 RTCP bandwidth may be specified separately from the session
2010 bandwidth and separately for active senders and passive receivers.
2011
2012 o The mapping of a user pass-phrase string into an encryption key
2013 was deleted from Section 2 because two interoperable
2014 implementations were not found.
2015
2023 o The "quadrophonic" sample ordering convention for four-channel
2024 audio was removed to eliminate an ambiguity as noted in Section
2025 4.1.
2026
2027 Non-functional changes:
2028
o In Section 4.1, it is now explicitly stated that silence
suppression is allowed for all audio payload formats. (This has
always been the case and derives from a fundamental aspect of
RTP's design and the motivations for packet audio, but was not
explicitly stated before.) The use of comfort noise is also
explained.
2035
o In Section 4.1, the requirement level for setting of the marker
bit on the first packet after silence for audio was changed from
"is" to "SHOULD be", and it was clarified that the marker bit is
set only when packets are intentionally not sent.
2040
2041 o Similarly, text was added to specify that the marker bit SHOULD be
2042 set to one on the last packet of a video frame, and that video
2043 frames are distinguished by their timestamps.
2044
2045 o RFC references are added for payload formats published after RFC
2046 1890.
2047
2048 o The security considerations and full copyright sections were
2049 added.
2050
2051 o According to Peter Hoddie of Apple, only pre-1994 Macintosh used
2052 the 22254.54 rate and none the 11127.27 rate, so the latter was
2053 dropped from the discussion of suggested sampling frequencies.
2054
2055 o Table 1 was corrected to move some values from the "ms/packet"
2056 column to the "default ms/packet" column where they belonged.
2057
2058 o Since the Interactive Multimedia Association ceased operations, an
2059 alternate resource was provided for a referenced IMA document.
2060
2061 o A note has been added for G722 to clarify a discrepancy between
2062 the actual sampling rate and the RTP timestamp clock rate.
2063
2064 o Small clarifications of the text have been made in several places,
2065 some in response to questions from readers. In particular:
2066
2067 - A definition for "media type" is given in Section 1.1 to allow
2068 the explanation of multiplexing RTP sessions in Section 6 to be
2069 more clear regarding the multiplexing of multiple media.
2070
2079 - The explanation of how to determine the number of audio frames
2080 in a packet from the length was expanded.
2081
2082 - More description of the allocation of bandwidth to SDES items
2083 is given.
2084
2085 - A note was added that the convention for the order of channels
2086 specified in Section 4.1 may be overridden by a particular
2087 encoding or payload format specification.
2088
2089 - The terms MUST, SHOULD, MAY, etc. are used as defined in RFC
2090 2119.
2091
2092 o A second author for this document was added.
2093
2094 10. Security Considerations
2095
2096 Implementations using the profile defined in this specification are
2097 subject to the security considerations discussed in the RTP
2098 specification [1]. This profile does not specify any different
2099 security services. The primary function of this profile is to list a
2100 set of data compression encodings for audio and video media.
2101
2102 Confidentiality of the media streams is achieved by encryption.
2103 Because the data compression used with the payload formats described
2104 in this profile is applied end-to-end, encryption may be performed
2105 after compression so there is no conflict between the two operations.
2106
2107 A potential denial-of-service threat exists for data encodings using
2108 compression techniques that have non-uniform receiver-end
2109 computational load. The attacker can inject pathological datagrams
2110 into the stream which are complex to decode and cause the receiver to
2111 be overloaded.
2112
2113 As with any IP-based protocol, in some circumstances a receiver may
2114 be overloaded simply by the receipt of too many packets, either
2115 desired or undesired. Network-layer authentication MAY be used to
2116 discard packets from undesired sources, but the processing cost of
2117 the authentication itself may be too high. In a multicast
2118 environment, source pruning is implemented in IGMPv3 (RFC 3376) [24]
2119 and in multicast routing protocols to allow a receiver to select
2120 which sources are allowed to reach it.
2121
2135 11. IANA Considerations
2136
2137 The RTP specification establishes a registry of profile names for use
2138 by higher-level control protocols, such as the Session Description
2139 Protocol (SDP), RFC 2327 [6], to refer to transport methods. This
2140 profile registers the name "RTP/AVP".
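
As an illustration of how the registered name appears in a session description, the sketch below builds an SDP media line using "RTP/AVP" as the transport, plus rtpmap attributes for dynamic payload types. The m= and a=rtpmap syntax comes from the SDP specification rather than from this memo, and the function name and argument shapes are hypothetical.

```python
def sdp_media_lines(media, port, payload_types, rtpmaps=None):
    """Build an SDP m= line for RTP/AVP and optional a=rtpmap lines.

    rtpmaps maps a payload type to (encoding name, clock rate, channels).
    """
    formats = " ".join(str(pt) for pt in payload_types)
    lines = [f"m={media} {port} RTP/AVP {formats}"]
    for pt, (name, clock_hz, channels) in sorted((rtpmaps or {}).items()):
        suffix = f"/{channels}" if channels and channels != 1 else ""
        lines.append(f"a=rtpmap:{pt} {name}/{clock_hz}{suffix}")
    return lines
```

For the example given in Section 6 (payload type 96 bound to PCMU at 8,000 Hz with 2 channels), the attribute line would read "a=rtpmap:96 PCMU/8000/2".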
2141
2142 Section 3 establishes the policy that no additional registration of
2143 static RTP payload types for this profile will be made beyond those
2144 added in this document revision and included in Tables 4 and 5. IANA
2145 may reference that section in declining to accept any additional
2146 registration requests. In Tables 4 and 5, note that types 1 and 2
2147 have been marked reserved and the set of "dyn" payload types included
2148 has been updated. These changes are explained in Sections 6 and 9.
2149
2150 12. References
2151
2152 12.1 Normative References
2153
2154 [1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
2155 "RTP: A Transport Protocol for Real-Time Applications", RFC
2156 3550, July 2003.
2157
2158 [2] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement
2159 Levels", BCP 14, RFC 2119, March 1997.
2160
2161 [3] Apple Computer, "Audio Interchange File Format AIFF-C", August
2162 1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z).
2163
2164 12.2 Informative References
2165
2166 [4] Braden, R., Clark, D. and S. Shenker, "Integrated Services in
2167 the Internet Architecture: an Overview", RFC 1633, June 1994.
2168
2169 [5] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W.
2170 Weiss, "An Architecture for Differentiated Service", RFC 2475,
2171 December 1998.
2172
2173 [6] Handley, M. and V. Jacobson, "SDP: Session Description
2174 Protocol", RFC 2327, April 1998.
2175
2176 [7] Casner, S. and P. Hoschka, "MIME Type Registration of RTP
2177 Payload Types", RFC 3555, July 2003.
2178
2179 [8] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet
2180 Mail Extensions (MIME) Part Four: Registration Procedures", BCP
2181 13, RFC 2048, November 1996.
2182
2191 [9] Zopf, R., "Real-time Transport Protocol (RTP) Payload for
2192 Comfort Noise (CN)", RFC 3389, September 2002.
2193
[10] Deleam, D. and J.-P. Petit, "Real-time implementations of the
recent ITU-T low bit rate speech coders on the TI TMS320C54X
DSP: results, methodology, and applications", in Proc. of
International Conference on Signal Processing, Technology, and
Applications (ICSPAT), (Boston, Massachusetts), pp. 1656--1660,
October 1996.

[11] Mouly, M. and M.-B. Pautet, The GSM system for mobile
communications, Lassay-les-Chateaux, France: Europe Media
Duplication, 1993.
2204
2205 [12] Degener, J., "Digital Speech Compression", Dr. Dobb's Journal,
2206 December 1994.
2207
[13] Redl, S., Weber, M. and M. Oliphant, An Introduction to GSM,
Boston: Artech House, 1995.
2210
2211 [14] Hoffman, D., Fernando, G., Goyal, V. and M. Civanlar, "RTP
2212 Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998.
2213
[15] Jayant, N. and P. Noll, Digital Coding of Waveforms--Principles
and Applications to Speech and Video, Englewood Cliffs, New
Jersey: Prentice-Hall, 1984.
2217
2218 [16] McKay, K., "RTP Payload Format for PureVoice(tm) Audio", RFC
2219 2658, August 1999.
2220
2221 [17] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
2222 Bolot, J.-C., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload
2223 for Redundant Audio Data", RFC 2198, September 1997.
2224
2225 [18] Speer, M. and D. Hoffman, "RTP Payload Format of Sun's CellB
2226 Video Encoding", RFC 2029, October 1996.
2227
2228 [19] Berc, L., Fenner, W., Frederick, R., McCanne, S. and P. Stewart,
2229 "RTP Payload Format for JPEG-Compressed Video", RFC 2435,
2230 October 1998.
2231
2232 [20] Turletti, T. and C. Huitema, "RTP Payload Format for H.261 Video
2233 Streams", RFC 2032, October 1996.
2234
2235 [21] Zhu, C., "RTP Payload Format for H.263 Video Streams", RFC 2190,
2236 September 1997.
2237
2247 [22] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
2248 Newell, D., Ott, J., Sullivan, G., Wenger, S. and C. Zhu, "RTP
2249 Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
2250 (H.263+)", RFC 2429, October 1998.
2251
2252 [23] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming
2253 Protocol (RTSP)", RFC 2326, April 1998.
2254
2255 [24] Cain, B., Deering, S., Kouvelas, I., Fenner, B. and A.
2256 Thyagarajan, "Internet Group Management Protocol, Version 3",
2257 RFC 3376, October 2002.
2258
2259 13. Current Locations of Related Resources
2260
2261 Note: Several sections below refer to the ITU-T Software Tool
2262 Library (STL). It is available from the ITU Sales Service, Place des
2263 Nations, CH-1211 Geneve 20, Switzerland (also check
2264 http://www.itu.int). The ITU-T STL is covered by a license defined
2265 in ITU-T Recommendation G.191, "Software tools for speech and audio
2266 coding standardization".
2267
2268 DVI4
2269
2270 An archived copy of the document IMA Recommended Practices for
2271 Enhancing Digital Audio Compatibility in Multimedia Systems (version
2272 3.0), which describes the IMA ADPCM algorithm, is available at:
2273
2274 http://www.cs.columbia.edu/~hgs/audio/dvi/
2275
2276 An implementation is available from Jack Jansen at
2277
2278 ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar
2279
2280 G722
2281
2282 An implementation of the G.722 algorithm is available as part of the
2283 ITU-T STL, described above.
2284
2285 G723
2286
The reference C code implementation defining the G.723.1 algorithm
and its Annexes A, B, and C is available as an integral part of
Recommendation G.723.1 from the ITU Sales Service, address listed
2290 above. Both the algorithm and C code are covered by a specific
2291 license. The ITU-T Secretariat should be contacted to obtain such
2292 licensing information.
2293
2303 G726
2304
2305 G726 is specified in the ITU-T Recommendation G.726, "40, 32, 24, and
2306 16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM)". An
2307 implementation of the G.726 algorithm is available as part of the
2308 ITU-T STL, described above.
2309
2310 G729
2311
The reference C code implementation defining the G.729 algorithm and
its Annexes A through I is available as an integral part of
Recommendation G.729 from the ITU Sales Service, listed above. Annex
2315 I contains the integrated C source code for all G.729 operating
2316 modes. The G.729 algorithm and associated C code are covered by a
2317 specific license. The contact information for obtaining the license
2318 is available from the ITU-T Secretariat.
2319
2320 GSM
2321
2322 A reference implementation was written by Carsten Bormann and Jutta
2323 Degener (then at TU Berlin, Germany). It is available at
2324
2325 http://www.dmn.tzi.org/software/gsm/
2326
2327 Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
2328 code implementation of the RPE-LTP algorithm available as part of the
2329 ITU-T STL. The STL implementation is an adaptation of the TU Berlin
2330 version.
2331
2332 LPC
2333
2334 An implementation is available at
2335
2336 ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z
2337
2338 PCMU, PCMA
2339
2340 An implementation of these algorithms is available as part of the
2341 ITU-T STL, described above.
2342
2343 14. Acknowledgments
2344
2345 The comments and careful review of Simao Campos, Richard Cox and AVT
2346 Working Group participants are gratefully acknowledged. The GSM
2347 description was adopted from the IMTC Voice over IP Forum Service
2348 Interoperability Implementation Agreement (January 1997). Fred Burg
2349 and Terry Lyons helped with the G.729 description.
2350
2359 15. Intellectual Property Rights Statement
2360
2361 The IETF takes no position regarding the validity or scope of any
2362 intellectual property or other rights that might be claimed to
2363 pertain to the implementation or use of the technology described in
2364 this document or the extent to which any license under such rights
2365 might or might not be available; neither does it represent that it
2366 has made any effort to identify any such rights. Information on the
2367 IETF's procedures with respect to rights in standards-track and
2368 standards-related documentation can be found in BCP-11. Copies of
2369 claims of rights made available for publication and any assurances of
2370 licenses to be made available, or the result of an attempt made to
2371 obtain a general license or permission for the use of such
2372 proprietary rights by implementors or users of this specification can
2373 be obtained from the IETF Secretariat.
2374
2375 The IETF invites any interested party to bring to its attention any
2376 copyrights, patents or patent applications, or other proprietary
2377 rights which may cover technology that may be required to practice
2378 this standard. Please address the information to the IETF Executive
2379 Director.
2380
2381 16. Authors' Addresses
2382
2383 Henning Schulzrinne
2384 Department of Computer Science
2385 Columbia University
2386 1214 Amsterdam Avenue
2387 New York, NY 10027
2388 United States
2389
2390 EMail: schulzrinne@cs.columbia.edu
2391
2392
2393 Stephen L. Casner
2394 Packet Design
2395 3400 Hillview Avenue, Building 3
2396 Palo Alto, CA 94304
2397 United States
2398
2399 EMail: casner@acm.org
2400
2415 17. Full Copyright Statement
2416
2417 Copyright (C) The Internet Society (2003). All Rights Reserved.
2418
2419 This document and translations of it may be copied and furnished to
2420 others, and derivative works that comment on or otherwise explain it
2421 or assist in its implementation may be prepared, copied, published
2422 and distributed, in whole or in part, without restriction of any
2423 kind, provided that the above copyright notice and this paragraph are
2424 included on all such copies and derivative works. However, this
2425 document itself may not be modified in any way, such as by removing
2426 the copyright notice or references to the Internet Society or other
2427 Internet organizations, except as needed for the purpose of
2428 developing Internet standards in which case the procedures for
2429 copyrights defined in the Internet Standards process must be
2430 followed, or as required to translate it into languages other than
2431 English.
2432
2433 The limited permissions granted above are perpetual and will not be
2434 revoked by the Internet Society or its successors or assigns.
2435
2436 This document and the information contained herein is provided on an
2437 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
2438 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
2439 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
2440 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
2441 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
2442
2443 Acknowledgement
2444
2445 Funding for the RFC Editor function is currently provided by the
2446 Internet Society.