Audio Definition Model for Flexible File Formats Dave Marston BBC R&D
Involvement ● EBU Groups: ● FAR-BWF (BWF file, audio expertise) ● MIM-MM (EBUCore, metadata expertise)
What is the Audio Definition Model? ● Formalised way of describing audio for file formats. ● Initial file format will be Broadcast WAV (BWAV). ● Specified by EBUCore XML schema. ● Model can be used more generally. ● Aim to make it the primary description model for as many formats as possible.
Future Multichannel Audio ● Channel based: e.g. stereo, 5.1, 22.2 ● Scene based: e.g. Ambisonics ● Object based: audio objects with stationary or moving spatial properties. ● Combinations of all three
Cooking with Audio! ● Audio Definition Model is like a shopping list of ingredients. Each ingredient has a formal description. ● BWAV file is like a shopping bag containing the actual ingredients. ● BWAV 'chna' chunk is like the bar-codes on each item. ● The ADM is NOT the recipe though!
Terminology
● Track: A single set of samples or data in the storage medium.
● Stream: A combination of tracks (or one track) required to represent a channel, an object, or a group.
● Channel: A single sequence of audio samples.
● Block: A division of a channel in time.
● Pack: A set of audio channels that belong together.
● Object: A pack with time-limited properties.
● Type: The type of audio channel, whether direct speakers, Ambisonic component, audio object, etc.
● Content: Objects with the actual audio.
● Programme: A set of content that is derived from the same material.
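The terms above nest naturally, so they can be sketched as a few small data classes. This is only an illustration: the class and field names are assumptions made for the sketch, not names from the EBUCore schema, and the IDs are the ones used in the simple 3.0 example later in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """A division of a channel in time."""
    block_id: str
    start: Optional[str] = None      # None stands for "N/A" (no time limit)
    duration: Optional[str] = None


@dataclass
class Channel:
    """A single sequence of audio samples, split into one or more blocks."""
    channel_id: str
    name: str
    type_definition: str             # e.g. "DirectSpeakers" or "Objects"
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Pack:
    """A set of audio channels that belong together."""
    pack_id: str
    name: str
    channels: List[Channel] = field(default_factory=list)


# Hypothetical instance mirroring the 3.0 channel-based example.
front_left = Channel("00010001", "FrontLeft", "DirectSpeakers", [Block("00000001")])
front_right = Channel("00010002", "FrontRight", "DirectSpeakers", [Block("00000001")])
centre = Channel("00010003", "Centre", "DirectSpeakers", [Block("00000001")])
pack_3_0 = Pack("00010005", "3.0", [front_left, front_right, centre])

print([channel.name for channel in pack_3_0.channels])
```

An object would add start and duration on top of a pack reference, in line with the "pack with time-limited properties" definition.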
Audio Definition Model Diagram
[Diagram: the model is drawn as a content part and a format part, with the 'chna' chunk alongside.]
● Content: audioProgramme, audioContent, audioObject, audioTrackUID
● Format: audioPackFormat, audioChannelFormat, audioBlockFormat, audioStreamFormat, audioTrackFormat
● 'chna' chunk: Track No, audioTrackUID, audioTrackFormatIDRef, audioPackFormatIDRef
Simple Channel Based Example
[Diagram: three PCM tracks, each with its own stream and channel, grouped into one 3.0 pack and one 3.0 object.]
Track           Track ID     Stream          Channel     Channel ID  Block ID  Block start
PCM_FrontLeft   00010001_01  PCM_FrontLeft   FrontLeft   00010001    00000001  N/A
PCM_FrontRight  00010002_01  PCM_FrontRight  FrontRight  00010002    00000001  N/A
PCM_Centre      00010003_01  PCM_Centre      Centre      00010003    00000001  N/A
● Pack 3.0: 00010005
● Object 3.0: 00011005, track UIDs 00000001, 00000002, 00000003
'chna' chunk:
Track No  UID       Track ID     Pack ID
1         00000001  00010001_01  00010005
2         00000002  00010002_01  00010005
3         00000003  00010003_01  00010005
Coded Audio Example
[Diagram: two data tracks carry one DolbyE_3.0 stream, which represents three channels in a 3.0 pack.]
● Track data1: 00040001_01
● Track data2: 00040001_02
● Stream DolbyE_3.0: 00040001
Channel     Channel ID  Block ID  Block start
FrontLeft   00010001    00000001  N/A
FrontRight  00010002    00000001  N/A
Centre      00010003    00000001  N/A
● Pack 3.0: 00010005
● Object 3.0: 00011006, track UIDs 00000001, 00000002
'chna' chunk:
Track No  UID       Track ID     Pack ID
1         00000001  00040001_01  00010005
2         00000002  00040001_02  00010005
Object Based Example
[Diagram: one PCM track, stream and channel; the channel is divided into three blocks in time.]
● Track PCM_Object1: 00031001_01
● Stream PCM_Object1
● Channel Object1: 00031001
Block ID  Start  Duration
00000001  00:00  00:05
00000002  00:05  00:08
00000003  00:13  00:07
● Pack Objects: 00031001
● Object: start 00:30, dur 00:20, ID 00031001, track UID 00000001
'chna' chunk:
Track No  UID       Track ID     Pack ID
1         00000001  00031001_01  00031001
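In the object-based example each block starts where the previous one ends (00:00 + 00:05 = 00:05, 00:05 + 00:08 = 00:13). A minimal sketch of that check follows. It assumes the slide's times are minutes:seconds, which the slide does not state, and the helper names are made up for illustration.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert an 'mm:ss' string to seconds (the mm:ss reading is an assumption)."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# (block ID, start, duration) as listed for channel Object1.
BLOCKS = [
    ("00000001", "00:00", "00:05"),
    ("00000002", "00:05", "00:08"),
    ("00000003", "00:13", "00:07"),
]


def blocks_are_contiguous(blocks) -> bool:
    """True if every block starts exactly where the previous block ended."""
    expected_start = to_seconds(blocks[0][1])
    for _block_id, start, duration in blocks:
        if to_seconds(start) != expected_start:
            return False
        expected_start += to_seconds(duration)
    return True


def total_duration(blocks) -> int:
    """Sum of all block durations in seconds."""
    return sum(to_seconds(duration) for _id, _start, duration in blocks)


print(blocks_are_contiguous(BLOCKS), total_duration(BLOCKS))
```

Under the mm:ss assumption the three blocks add up to 20 seconds, the same length as the object's 00:20 duration.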
XML Representation
● Use new version of the EBUCore schema
    <audioChannelFormat audioChannelFormatID="AC_00031001" audioChannelFormatName="Object 1" typeDefinition="Objects">
      <audioBlockFormat audioBlockFormatID="AB_00031001_00000001" rtime="00:00" duration="00:05">
        <position type="azimuth">-20.0</position>
        <position type="elevation">5.0</position>
        <position type="distance">1.0</position>
      </audioBlockFormat>
      <audioBlockFormat audioBlockFormatID="AB_00031001_00000002" rtime="00:05" duration="00:08">
        …
      </audioBlockFormat>
      <audioBlockFormat audioBlockFormatID="AB_00031001_00000003" rtime="00:13" duration="00:07">
        …
      </audioBlockFormat>
    </audioChannelFormat>
    <audioStreamFormat audioStreamFormatID="AS_00031001" audioStreamFormatName="Object 1" typeDefinition="PCM">
      <audioChannelFormatIDRef>AC_00031001</audioChannelFormatIDRef>
      <audioTrackFormatIDRef>AT_00031001_01</audioTrackFormatIDRef>
    </audioStreamFormat>
    <audioTrackFormat audioTrackFormatID="AT_00031001_01" audioTrackFormatName="Object 1" typeDefinition="PCM"/>
Standard Configuration File
● Many configurations will use common channel types (e.g. stereo, 5.1, 22.2, Ambisonics).
● Therefore use an external standard reference XML file.
    <audioChannelFormat audioChannelFormatID="AC_00010001" audioChannelFormatName="FrontLeft" typeDefinition="DirectSpeakers">
      <audioBlockFormat audioBlockFormatID="AB_00010001_00000001">
        <speakerLabel>M-30</speakerLabel>
        <position type="azimuth">-25.0</position>
        <position type="elevation">5.0</position>
        <position type="distance">1.0</position>
      </audioBlockFormat>
    </audioChannelFormat>
Custom Configuration
● For non-standard channel definitions, particularly audio objects, a custom configuration file must be generated.
● This is what is carried in the 'axml' chunk.
    <audioChannelFormat audioChannelFormatID="AC_00031001" audioChannelFormatName="Object 1" typeDefinition="Objects">
      <audioBlockFormat audioBlockFormatID="AB_00031001_00000001" rtime="00:00" duration="00:05">
        <position type="azimuth">-20.0</position>
        <position type="elevation">5.0</position>
        <position type="distance">1.0</position>
      </audioBlockFormat>
      <audioBlockFormat audioBlockFormatID="AB_00031001_00000002" rtime="00:05" duration="00:08">
        <position type="azimuth">-22.0</position>
        <position type="elevation">6.0</position>
        <position type="distance">1.1</position>
      </audioBlockFormat>
      <audioBlockFormat audioBlockFormatID="AB_00031001_00000003" rtime="00:13" duration="00:07">
        <position type="azimuth">-24.0</position>
        <position type="elevation">7.0</position>
        <position type="distance">1.2</position>
      </audioBlockFormat>
    </audioChannelFormat>
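As a rough illustration of reading such a custom definition, the sketch below parses the channel format from this slide with Python's standard `xml.etree.ElementTree`. The element and attribute names are taken from the slide as reconstructed (including `type` on `position`); the released schema may differ, and the function name `read_blocks` is made up for the example.

```python
import xml.etree.ElementTree as ET

# The custom channel definition from the slide, with plain ASCII quotes.
CUSTOM_XML = """
<audioChannelFormat audioChannelFormatID="AC_00031001"
                    audioChannelFormatName="Object 1" typeDefinition="Objects">
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000001" rtime="00:00" duration="00:05">
    <position type="azimuth">-20.0</position>
    <position type="elevation">5.0</position>
    <position type="distance">1.0</position>
  </audioBlockFormat>
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000002" rtime="00:05" duration="00:08">
    <position type="azimuth">-22.0</position>
    <position type="elevation">6.0</position>
    <position type="distance">1.1</position>
  </audioBlockFormat>
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000003" rtime="00:13" duration="00:07">
    <position type="azimuth">-24.0</position>
    <position type="elevation">7.0</position>
    <position type="distance">1.2</position>
  </audioBlockFormat>
</audioChannelFormat>
"""


def read_blocks(xml_text: str):
    """Return one dict per audioBlockFormat with its timing and position."""
    root = ET.fromstring(xml_text.strip())
    blocks = []
    for block in root.findall("audioBlockFormat"):
        position = {p.get("type"): float(p.text) for p in block.findall("position")}
        blocks.append({
            "id": block.get("audioBlockFormatID"),
            "rtime": block.get("rtime"),
            "duration": block.get("duration"),
            "position": position,
        })
    return blocks


for entry in read_blocks(CUSTOM_XML):
    print(entry["id"], entry["rtime"], entry["position"])
```

A real file would wrap these elements in the EBUCore document structure and namespaces, which this sketch leaves out.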
What are BWAV and RF64 Files? ● WAV is a RIFF file for audio ● BWAV = Broadcast WAV ● BWF = Broadcast Wave Format ● RF64 = WAV file for files larger than 4 GB ● BWAV files have a 'bext' chunk ● MBWF is an RF64 file with a 'bext' chunk
Chunks ● Resource Interchange File Format (RIFF) ● Data stored in chunks: header, length & data. ● WAV chunks: 'RIFF' (tells you it's a WAVE file), 'fmt ' (contains sample rate, number of channels, etc.), 'data' (contains audio samples). ● BWAV chunks: 'bext', 'axml', 'link', 'levl', 'mext', 'qlty', 'dbmd'
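The header/length/data layout can be illustrated with a short chunk walker. This is a minimal sketch for plain RIFF/WAVE only: RF64 files use a different top-level ID and 64-bit sizes, which it does not handle. The function names and the tiny in-memory sample are made up for the example.

```python
import io
import struct


def iter_chunks(stream):
    """Yield (chunk_id, payload) for each chunk in a plain RIFF/WAVE stream."""
    riff_id, _riff_size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff_id != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = stream.read(8)          # 4-byte ID + 4-byte little-endian length
        if len(header) < 8:
            break
        chunk_id, size = struct.unpack("<4sI", header)
        payload = stream.read(size)
        if size % 2:
            stream.read(1)               # chunks are padded to an even length
        yield chunk_id.decode("ascii"), payload


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one chunk: ID, length, data, plus a pad byte if the length is odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Tiny in-memory example: 3-channel, 48 kHz, 16-bit PCM with a dummy 'axml' chunk.
fmt_payload = struct.pack("<HHIIHH", 1, 3, 48000, 48000 * 3 * 2, 3 * 2, 16)
body = (b"WAVE"
        + make_chunk(b"fmt ", fmt_payload)
        + make_chunk(b"axml", b"<adm/>\n")
        + make_chunk(b"data", bytes(12)))
sample = b"RIFF" + struct.pack("<I", len(body)) + body

chunks = list(iter_chunks(io.BytesIO(sample)))
print([chunk_id for chunk_id, _payload in chunks])
```

The same loop would surface 'bext', 'chna' and the other BWAV chunks, since every chunk shares the same ID and length header.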
Where does the XML go?
[Diagram: the file holds the fmt, bext, chna, data and axml chunks. The chna chunk refers to the standard XML definitions and to the custom XML definitions; the custom XML definitions are stored in the axml chunk.]
● If no custom XML definitions are used, then no axml chunk is required.
● Standard XML definitions do not need to be included in the file.
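The rule about when an axml chunk is needed can be sketched as a simple lookup. The set of standard IDs below is a hypothetical stand-in for the external reference file, filled only with IDs that appear in the 3.0 example; the function name is also made up.

```python
# Hypothetical stand-in for the IDs defined in the standard reference XML file.
STANDARD_IDS = {
    "00010001_01", "00010002_01", "00010003_01",   # track formats from the 3.0 example
    "00010005",                                    # 3.0 pack format
}


def needs_axml(referenced_ids, standard_ids=STANDARD_IDS) -> bool:
    """True if any ID referenced from the chna chunk is not a standard definition."""
    return any(ref not in standard_ids for ref in referenced_ids)


# Channel-based 3.0 file: everything is standard, so no axml chunk is required.
print(needs_axml(["00010001_01", "00010005"]))
# Object-based file: the object definitions are custom, so axml must carry them.
print(needs_axml(["00031001_01", "00031001"]))
```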
'chna' chunk
Simple 3.0 Channel Example
          TrackNo  audioTrackUID  audioTrackFormatID  audioPackFormatID
Track 1   1        00000001       00010001_01         00010005
Track 2   2        00000002       00010002_01         00010005
Track 3   3        00000003       00010003_01         00010005
● First 4 digits of the audioTrackFormatID specify the type of stream. 0001 = PCM
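A small sketch of how a reader might hold these rows and apply the "first 4 digits" rule. The class and field names are assumptions for illustration, and only the one type code stated on the slide (0001 = PCM) is included; any other code is reported as unknown rather than guessed.

```python
from dataclasses import dataclass

# Only the code given on the slide; other codes are deliberately left out.
STREAM_TYPES = {"0001": "PCM"}


@dataclass
class ChnaRow:
    track_no: int
    track_uid: str
    track_format_id: str     # eight digits plus a two-digit track suffix
    pack_format_id: str

    def stream_type(self) -> str:
        """Decode the stream type from the first four digits of the track format ID."""
        return STREAM_TYPES.get(self.track_format_id[:4], "unknown")


ROWS = [
    ChnaRow(1, "00000001", "00010001_01", "00010005"),
    ChnaRow(2, "00000002", "00010002_01", "00010005"),
    ChnaRow(3, "00000003", "00010003_01", "00010005"),
]

for row in ROWS:
    print(row.track_no, row.track_uid, row.stream_type(), row.pack_format_id)
```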
Current Status ● EBU Tech 3364 "Audio Definition Model" now published. ● EBUCore v1.5 (EBU Tech 3293) schema containing ADM soon to be released. ● ITU contributions being made.
Future Work ● A list of standard configurations will be drawn together (database, reference XML file). ● Audio Object parameters need continual refinement. ● Libraries/APIs for parsing and generating ADM metadata to be developed. ● Look at streaming methods.