RTP and playout delay compensation
Henning Schulzrinne
Dept. of Computer Science, Columbia University
Fall 2003

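RTP packet header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+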
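
To make the header layout concrete, here is a minimal Python sketch that unpacks the 12-byte fixed header and the optional CSRC list. The function name `parse_rtp_header` and the dictionary keys are illustrative choices, not part of the slides. Header extensions and padding are detected but not processed.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed RTP header and CSRC list (hypothetical helper)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # CSRC count: number of 32-bit CSRC entries that follow
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet truncated inside the CSRC list")
    csrc = list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": cc,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
        "csrc": csrc,
    }

# Example packet: V=2, marker set, payload type 0, no CSRC entries.
example = struct.pack("!BBHII", 0x80, 0x80, 4711, 160, 0xDEADBEEF)
header = parse_rtp_header(example)
```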

RTP: timestamp
• Timestamp measured in sample units
• reflects nominal sampling time of first sample in packet
  – e.g., 20 ms block size of 8,000 Hz audio: 160 timestamp units per packet
• always 90 kHz for video
  – e.g., 3000 timestamp units per packet for 30 fps
  – 3600 for 25 fps
  – 3750 for 24 fps
• even if real system clock is slower or faster
• note: 32-bit integer may wrap around
  – if start at 0, after about 6 days for audio, ½ day for video
  – but starting value is supposed to be random
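
The per-packet increments and wrap-around times quoted above follow from simple arithmetic. The sketch below reproduces them; the helper names are made up for illustration, and the wrap-around figures assume the timestamp starts at 0.

```python
from fractions import Fraction

def timestamp_units_per_packet(clock_rate_hz, packet_interval_s):
    """Timestamp increment between consecutive packets (hypothetical helper)."""
    return clock_rate_hz * packet_interval_s

def wrap_time_seconds(clock_rate_hz, bits=32):
    """Seconds until a counter of the given width wraps, starting from 0."""
    return 2 ** bits / clock_rate_hz

# 20 ms of 8,000 Hz audio per packet.
audio_step = timestamp_units_per_packet(8000, Fraction(20, 1000))

# 90 kHz video clock, one frame per packet.
video_steps = {
    fps: timestamp_units_per_packet(90000, Fraction(1, fps))
    for fps in (30, 25, 24)
}

SECONDS_PER_DAY = 86400
audio_wrap_days = wrap_time_seconds(8000) / SECONDS_PER_DAY   # roughly 6.2
video_wrap_days = wrap_time_seconds(90000) / SECONDS_PER_DAY  # roughly 0.55
```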

RTP sequence number
• Counts packets actually sent
• Wraps around much quicker
  – e.g., for 20 ms packets, in about 22 minutes
• Also uses random starting value
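
Because the 16-bit sequence number wraps so quickly, comparisons have to be done modulo 2^16. The following is a small sketch of one common way to do that; the function names are illustrative, and treating any forward jump of up to 32767 as "newer" is an assumption of this example rather than something stated on the slide.

```python
SEQ_MOD = 1 << 16  # sequence numbers are 16 bits wide

def seq_delta(new, old):
    """Signed distance from old to new, in the range [-32768, 32767]."""
    half = SEQ_MOD >> 1
    return ((new - old + half) % SEQ_MOD) - half

def packets_missing(new, old):
    """Number of packets skipped between two received sequence numbers."""
    d = seq_delta(new, old)
    return d - 1 if d > 0 else 0

# Time until the counter wraps for 20 ms packets, in minutes (about 22).
wrap_minutes = SEQ_MOD * 0.020 / 60

# A gap that straddles the wrap: 65534 was followed by 2, so 65535, 0, 1 are missing.
gap = packets_missing(2, 65534)
```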

RTP timestamp vs. sequence number
• Related, but different purposes
  – timestamp for timing reconstruction:
    • playout delay compensation (later)
    • synchronization with other sources (later)
  – sequence number for loss measurements and gap detection
• t = s*b + c, where
  – t = timestamp
  – s = sequence number
  – b = sample units per packet
• offset c is constant within a talkspurt, but changes after each talkspurt or after a transmission gap
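
One way to use the relation between timestamp and sequence number is to watch the offset c: lost packets leave it unchanged, while a silence gap shifts it. The sketch below assumes a fixed number of sample units per packet and sequence numbers that have already been extended past 16 bits; the function names are hypothetical.

```python
TS_MOD = 1 << 32  # timestamps are 32 bits wide

def offset(timestamp, seq, samples_per_packet):
    """Offset c in t = s*b + c, reduced modulo 2**32 like the timestamp."""
    return (timestamp - seq * samples_per_packet) % TS_MOD

def talkspurt_starts(packets, samples_per_packet):
    """Yield the sequence number of each packet whose offset c changed.

    packets is an iterable of (extended_sequence_number, timestamp) pairs.
    """
    previous = None
    for seq, ts in packets:
        c = offset(ts, seq, samples_per_packet)
        if previous is not None and c != previous:
            yield seq
        previous = c

# Packet 11 is lost (no change in c); a pause precedes packet 13 (c changes).
packets = [(10, 1600), (12, 1920), (13, 4080), (14, 4240)]
starts = list(talkspurt_starts(packets, 160))
```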

Playout delay
• Converts variable network delay (“jitter”) into fixed delay
  – thus, end-to-end delay is max(jitter) + propagation delay
  – or, if willing to tolerate some late packets:
    • delay < 95% of jitter + propagation delay
• Propagation delay is invisible
  – and hard to measure without synchronized clocks
  – about 5 ms/1000 km one way
• Total delay should be less than 150 ms one-way
• End-to-end delay must remain constant within a talkspurt
  – otherwise gaps
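
The trade-off between covering the maximum delay and tolerating a few late packets can be illustrated with a handful of delay samples. This is only a sketch: the function names, the sample values, and the idea of picking the fixed delay directly from a sorted list of observed delays are assumptions made for the example.

```python
import math

def playout_delay(delays_ms, tolerated_late_fraction=0.0):
    """Smallest fixed delay such that at most the given fraction of the
    observed packets would arrive too late (hypothetical helper)."""
    ordered = sorted(delays_ms)
    allowed_late = math.floor(tolerated_late_fraction * len(ordered))
    return ordered[len(ordered) - 1 - allowed_late]

def propagation_delay_ms(distance_km):
    """Rule of thumb from the slide: about 5 ms per 1000 km, one way."""
    return 5.0 * distance_km / 1000.0

# Twenty observed one-way delays in ms, with one outlier at 120 ms.
delays = [40, 42, 41, 45, 43, 44, 46, 47, 48, 50,
          49, 51, 52, 53, 54, 55, 56, 57, 58, 120]

no_late_packets = playout_delay(delays)         # covers the maximum
five_percent_late = playout_delay(delays, 0.05)  # drops the single outlier
```

With these samples, accepting 5% late packets cuts the fixed delay from 120 ms to 58 ms, which leaves far more room under a 150 ms one-way budget.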

Playout delay
[Figure: packets plotted against time, showing per-packet jitter; a packet that arrives late counts as lost.]

Playout buffer
• Logically infinite buffer
• Implemented as “circular buffer”, with wrap around
• Takes care of jitter and reordering based on RTP timestamp t
• Playout point p = t*b + c
  – p = buffer position, measured in samples (typically, 16 bits if decoding is done before playout)
  – b = buffer positions per sample (usually, = 1)
  – c = offset
[Figure labels: silence; decoder (G.729, L16)]
• Usually, best to think of each talkspurt as an independently schedulable unit
  – p = p0 + (t - t0) * b
  – t0 = timestamp for first packet in talkspurt
  – p0 = position for first packet in talkspurt
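
A circular buffer addressed with p = p0 + (t - t0) * b handles reordering for free, because each packet lands at the position its timestamp dictates regardless of arrival order. The class below is a minimal sketch of that idea; the class and method names are invented for illustration, 0 stands in for silence, and it assumes decoded samples are written into the buffer.

```python
class PlayoutBuffer:
    """Circular playout buffer indexed by RTP timestamp (illustrative sketch)."""

    def __init__(self, size, positions_per_sample=1):
        self.size = size
        self.b = positions_per_sample
        self.samples = [0] * size  # 0 represents silence
        self.t0 = None
        self.p0 = None

    def start_talkspurt(self, t0, p0):
        """Anchor the talkspurt: timestamp t0 maps to buffer position p0."""
        self.t0 = t0
        self.p0 = p0

    def position(self, t):
        """p = p0 + (t - t0) * b, wrapped to the buffer size."""
        # Signed 32-bit difference, so timestamp wrap-around and packets
        # slightly older than t0 are both handled.
        dt = ((t - self.t0 + (1 << 31)) % (1 << 32)) - (1 << 31)
        return (self.p0 + dt * self.b) % self.size

    def insert(self, t, decoded):
        """Copy decoded samples into the buffer and return the start position."""
        p = self.position(t)
        for i, sample in enumerate(decoded):
            self.samples[(p + i) % self.size] = sample
        return p

buf = PlayoutBuffer(size=1000)
buf.start_talkspurt(t0=5000, p0=900)
second = buf.insert(5160, [1] * 160)  # arrives first, lands after the wrap
first = buf.insert(5000, [2] * 160)   # arrives late, still lands in front
```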

Playout buffer, cont’d.
• Thus, hard part is computing insertion point for first packet in talkspurt
• Trying to predict future
  – late loss vs. excessive delay
• Conceptually, two approaches:
  – look at current playout point when first packet arrives
    • then, leave some margin of error
    • may be too conservative
  – compute based on last talkspurt and change c
    • avoids overestimation due to slow first packet
    • deals less well with jumps in delay after long pauses
[Figure: buffer timeline (axis t) showing insert points for t=100 and t=140 and the play point]
• Simple method: assume roughly normal distribution and take n times the variance of the delay (= jitter)
  – this becomes the extra delay
• Other mechanisms:
  – spike detection
  – optimal value for last talkspurt
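
The "simple method" can be sketched as a running estimate of the delay and its spread, with n times the spread added as extra delay at the start of each talkspurt. The sketch below is one possible reading, not the slides' exact algorithm: it uses an exponentially weighted mean absolute deviation as the spread estimate instead of a true variance, and the class name, the smoothing factor `gain`, and the default `n` are all assumptions.

```python
class PlayoutEstimator:
    """Running estimate of delay and jitter (illustrative sketch)."""

    def __init__(self, n=4.0, gain=0.1):
        self.n = n          # multiplier applied to the spread estimate
        self.gain = gain    # weight given to each new sample (assumed value)
        self.mean = None    # smoothed delay
        self.spread = 0.0   # smoothed absolute deviation from the mean

    def update(self, delay):
        """Fold one observed packet delay into the estimates."""
        if self.mean is None:
            self.mean = float(delay)
            return
        deviation = delay - self.mean
        self.spread += self.gain * (abs(deviation) - self.spread)
        self.mean += self.gain * deviation

    def extra_delay(self):
        """n times the spread: the safety margin added on top of the mean."""
        return self.n * self.spread

    def playout_delay(self):
        """Delay to apply to the first packet of the next talkspurt."""
        return self.mean + self.extra_delay()

steady = PlayoutEstimator()
for d in [50, 50, 50, 50]:
    steady.update(d)

jittery = PlayoutEstimator()
for d in [50, 60, 40, 55, 45]:
    jittery.update(d)
```

Applying the result only at talkspurt boundaries keeps the end-to-end delay constant within a talkspurt; a steady stream needs no margin, while a jittery one gets a margin that grows with the observed spread.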