Binary Compression Rates for ASCII Formats Martin Isenburg Jack Snoeyink University of North Carolina at Chapel Hill
Take this home: “Geometry compression does not require binary.” “We can store compressed geometry as ASCII and achieve benchmark compression rates.” “This enables full conformance between ASCII and binary X3D.”
Overview • Motivation • Benchmark Compressor • Storing Bits as ASCII • Results • More Motivation • Demo • Summary
Motivation
Why ASCII Scenes? • authors like text-based 3D – wide acceptance – read, understand, modify, and save scene with any text editor – Web3D Developer survey: “very important” • no binary standard • rather readable ASCII than binary
Why Compression? • authors want compressed 3D – smaller files – faster download – Web3D Developer survey: “very important” • data-heavy nodes – audio / images / video – geometry
Approaches to Compression • binary formats: – proprietary compression schemes – automatically “protects” the content – Shockwave3D, Viewpoint, Virtue3D, Cult3D … • gzipped text-based formats: – human readable ASCII content – WRL, X3D, OBJ, PLY, SMF, OFF …
Compression Standards • software to read, edit, and write content is available and widely used • binary compression standards: – audio data: MP3 – image data: JPEG, GIF – movie data: MPEG – geometry data: ?
Why no Standard yet? (1) structure of 3D data is complex • audio: sequence of numbers • image: block of numbers • video: sequence of blocks of numbers • geometry: vertex positions + triangles (or polygons?) + texcoords (1, 2, 4, 8 layers?) + normals (smoothing groups?) + colors (per-face? per-vertex?) + bones (1, 2, 3 attachments?)
Why no Standard yet? (2) 3D data is much less used • audio: everybody • image: everybody • video: everybody – my mom, dad, sister, aunt, uncle, grandma, friends, neighbors, … • geometry: anybody? – gamers, researchers • 3D data is not a consumable product on its own • profitability issues
Geometry compression & X3D • long wanted feature ('96) – compression requires binary VRML – Compressed Binary Format (CBF) workgroup – binary VRML + 5 new compressed nodes • but … CBF proposal refused ('98) – problem-child: the binary specification • second call for BF proposals ('00) – unanswered
Compression without Binary • Web3D 2002 paper: Coding Polygon Meshes as Compressable ASCII [Isenburg & Snoeyink, 02] – eliminate binary requirement – only requires specification changes where it matters – does not interfere with how authors publish content – does not affect (overall) readability
Design Objective • work with Shout3D API • pure Java viewer (plugin-less) • fast, light-weight decoding – low computational complexity: less than 400 lines of code – small size for decoder class: 5,381 bytes • be worthwhile: 1 : 6
ASCII coder vs. binary coder • How good is the compression? – compare to benchmark coder – oops … • New Design Objective: >>> Maximum Compression <<< – produce ASCII output – compress with “gzip -9” – be as good as benchmark
Benchmark Compressor
Benchmark Compressor • Triangle Mesh Compression [Touma & Gotsman, 98] • Connectivity Coder – codes connectivity as sequence of vertex degrees • Geometry Coder – codes geometry as sequence of corrective vectors • only handles fully triangulated meshes • no support for texture coordinates
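The symbols the connectivity coder works with are vertex degrees. The sketch below only shows how those degrees can be counted from a triangle list; the function name and the tetrahedron example are hypothetical, and the order in which a real coder emits degrees (its region-growing traversal) is not modelled here.

```python
from collections import defaultdict

def vertex_degrees(triangles):
    """Return {vertex: degree}, where degree is the number of distinct neighbours."""
    neighbours = defaultdict(set)
    for a, b, c in triangles:
        neighbours[a].update((b, c))
        neighbours[b].update((a, c))
        neighbours[c].update((a, b))
    return {v: len(n) for v, n in sorted(neighbours.items())}

# Hypothetical example: a tetrahedron, where every vertex touches the other three.
tetrahedron = [(0, 1, 2), (0, 2, 3), (0, 3, 1), (1, 3, 2)]
degrees = vertex_degrees(tetrahedron)
print(degrees)
```

In typical meshes most degrees cluster around a few values, which is what makes such a sequence cheap to entropy-code.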
Polygon Mesh Compressor • handling non-triangular meshes: – Compressing Polygon Mesh Connectivity with Degree Duality Prediction [Isenburg, 02] – Compressing Polygon Mesh Geometry with Parallelogram Prediction [Isenburg & Alliez, 02] • handling textured meshes: – Compressing the Property Mapping of Polygon Meshes [Isenburg & Snoeyink, 01] – Compressing Texture Coordinates with Selective Linear Predictions [Isenburg & Snoeyink, 03]
It’s available! • a working Web implementation • Java applet • based on node set of Shout3D • WRL, OBJ, PLY, OFF, SMF, … • compress any model online
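Parallelogram prediction guesses a new vertex by completing the parallelogram spanned by an already decoded triangle, and only the corrective vector (actual minus predicted) is stored. The following is a minimal sketch of that rule; the function names and the integer positions are hypothetical, and the polygon-specific refinements of the cited papers are not shown.

```python
def parallelogram_predict(a, b, c):
    """Predict the vertex across edge (a, b) of a known triangle (a, b, c) as a + b - c."""
    return tuple(ai + bi - ci for ai, bi, ci in zip(a, b, c))

def corrective_vector(actual, predicted):
    """The residual that would be entropy-coded instead of the raw position."""
    return tuple(x - p for x, p in zip(actual, predicted))

# Hypothetical quantized positions of an already decoded triangle.
a, b, c = (10, 0, 0), (0, 10, 0), (0, 0, 0)
actual = (11, 9, 1)

predicted = parallelogram_predict(a, b, c)
residual = corrective_vector(actual, predicted)
print(predicted, residual)
```

When the surface is smooth the residuals stay small, so they concentrate on few values and compress well.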
Binary vs. ASCII [results from Web3D’02 paper]
scene     binary   ASCII   difference
lion      42.6     66.2     55 %
wolf      18.4     29.4     60 %
raptor    21.5     34.9     62 %
…
horse     20.3     40.9    101 %
cat       22.5     39.9     77 %
dog       21.5     34.6     61 %
average                     74 %
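The difference column above is consistent with the relative overhead of the ASCII size over the binary size. A small check under that assumption, using only the six rows shown on the slide (the 74 % average covers models that are elided here, so it is not recomputed):

```python
# (binary, ASCII) sizes for the rows listed on the slide.
rows = {
    "lion": (42.6, 66.2),
    "wolf": (18.4, 29.4),
    "raptor": (21.5, 34.9),
    "horse": (20.3, 40.9),
    "cat": (22.5, 39.9),
    "dog": (21.5, 34.6),
}

def overhead_percent(binary, ascii_size):
    """Relative size increase of the ASCII version, rounded to a whole percent."""
    return round(100 * (ascii_size / binary - 1))

diffs = {name: overhead_percent(*sizes) for name, sizes in rows.items()}
print(diffs)
```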
Mesh Compression Process grow region + store symbols
Why is binary so much better? Because it … • is binary? No! • stores symbols in binary? Not quite! • uses better predictive encoding scheme? Sort of. • compresses symbols with entropy coding? Yes, but … • allows storing a bit-stream? Yes!
Entropy Coders • for a symbol sequence of t types: Entropy = $\sum_{i=1}^{t} p_i \log_2(1/p_i)$ bits, where $p_i$ = (# of type i) / (# total) • example distributions: 0.2 bits, 1.3 bits, 2.0 bits
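To make the entropy formula concrete, here is a minimal sketch that evaluates it for a symbol sequence. The two sequences are hypothetical and are not the distributions pictured on the slide.

```python
from collections import Counter
from math import log2

def entropy_bits(symbols):
    """Entropy in bits per symbol: sum over types of p_i * log2(1 / p_i)."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((n / total) * log2(total / n) for n in counts.values())

# Hypothetical degree sequences: one heavily skewed, one uniform over four types.
skewed = [6] * 97 + [5, 7, 8]
uniform = [3, 4, 5, 6] * 25

print(entropy_bits(skewed), entropy_bits(uniform))
```

The skewed sequence needs far less than one bit per symbol on average, which is why an entropy coder has to emit a bit-stream rather than whole bytes or characters per symbol.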
Contents of Binary File • an array of bits [0011010011110…10001010] • six floats: xmin = -1.2323, ymin = -2.1324, zmin = -1.7123, xmax = 2.6731, ymax = 0.8372, zmax = 3.4971 • an integer: precision = 12
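The bounding box and the precision are what a decoder needs to map integer coordinates back to floats. The sketch below assumes uniform quantization per axis, which the slide does not spell out; the function names are hypothetical.

```python
def quantize(value, lo, hi, bits):
    """Map a float in [lo, hi] to an integer in [0, 2**bits - 1] (assumed uniform grid)."""
    steps = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * steps)

def dequantize(q, lo, hi, bits):
    """Inverse mapping used on the decoder side."""
    steps = (1 << bits) - 1
    return lo + q / steps * (hi - lo)

# x-range and precision taken from the slide; -0.0715 is an x value from the example scene.
xmin, xmax, precision = -1.2323, 2.6731, 12
q = quantize(-0.0715, xmin, xmax, precision)
x = dequantize(q, xmin, xmax, precision)
print(q, x)
```

Under this assumption the round-trip error per coordinate is at most half a grid step, that is (xmax - xmin) / 4095 / 2 for 12 bits.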
triceratops.wrl
Shape {
  appearance Appearance {
    material Material { diffuseColor 1 .5 0 }
  }
  geometry IndexedFaceSet {
    creaseAngle 0.9
    coord Coordinate {
      point [ -0.0715 1.7609 ... -0.4479 -1.5153 1.5304 ]
    }
    coordIndex [ 7 6 209 204 -1 4 … 4577 4223 4222 -1 ]
  }
}
triceratops_compressed.wrl
Shape {
  appearance Appearance {
    material Material { diffuseColor 1 .5 0 }
  }
  geometry CompressedIndexedFaceSet {
    creaseAngle 0.9
    code [ 0 0 1 1 1 0 1 101110…10001110110 ]
    bits 12
    box [ -1.2323 -2.1324 -1.7123 2.6731 0.8372 3.4971 ]
  }
}
triceratops_compressed.wrl
Shape {
  appearance Appearance {
    material Material { diffuseColor 1 .5 0 }
  }
  geometry CompressedIndexedFaceSet {
    creaseAngle 0.9
    code “AQg+N9gOd+Rhh+lXdfzKEyz0CCZUJHLs3+
          fWpsTQFxortsk8XW8j7jEZwL … vXS7GVbHHUuX”
    bits 12
    box [ -1.2323 -2.1324 -1.7123 2.6731 0.8372 3.4971 ]
  }
}
Storing Bits as ASCII
Design Goals 1. resulting ASCII should be “safe” • obvious choices: – uuencode – base64 – Hexbin • represent binary files with a set of 64 ASCII characters • other possibilities: 2 … 96 chars
Design Goals 1. resulting ASCII should be “safe” 2. conversion should be simple • efficient implementation by using 2^k ASCII characters – value of k bits – look-up table • [00110100111101…] → LT[6] LT[19] … • LT = { ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’, ‘G’, ‘H’, …, ‘+’, ‘=’ }
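The look-up conversion can be sketched as follows: cut the bit-stream into k-bit groups and use each group's value as an index into the table. The table ordering below is an assumption, since the slide elides the middle of LT, and the function names are hypothetical. Note that the slide's indices LT[6] and LT[19] correspond to 5-bit groups (00110, 10011) of the example bit-stream.

```python
import string

# Assumption: a 64-entry table; the slide only shows its first and last entries,
# so this particular ordering (A-Z, a-z, 0-9, '+', '=') is illustrative.
LT = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+="

def bits_to_indices(bits, k):
    """Split a bit string into k-bit groups, zero-padding the last group."""
    padded = bits + "0" * (-len(bits) % k)
    return [int(padded[i:i + k], 2) for i in range(0, len(padded), k)]

def bits_to_ascii(bits, k, table=LT):
    """Map each k-bit group to one character; the table needs at least 2**k entries."""
    assert len(table) >= 2 ** k
    return "".join(table[i] for i in bits_to_indices(bits, k))

indices = bits_to_indices("0011010011", 5)
text = bits_to_ascii("0011010011", 5)
print(indices, text)
```

Because the last group is zero-padded, a decoder built this way would also need to know the true bit count or be able to ignore trailing padding bits.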
Design Goals 1. resulting ASCII should be “safe” 2. conversion should be simple 3. ASCII representation should be compact • expansion factor: 8 / log2(n), where n is the number of ASCII chars – 32 chars = 60.0 % – 64 chars = 33.3 % – 96 chars = 21.5 %
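The expansion percentages follow directly from the factor 8 / log2(n): each 8-bit character carries only log2(n) bits of payload. A quick check (the function name is hypothetical):

```python
from math import log2

def expansion_percent(n_chars):
    """Size increase when bits are stored with an alphabet of n_chars characters."""
    return (8 / log2(n_chars) - 1) * 100

for n in (32, 64, 96):
    print(n, round(expansion_percent(n), 1))
```

For 64 characters the factor is 8 / 6, so the text is about a third larger than the raw bits before any gzipping.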
Design Goals 1. resulting ASCII should be “safe” 2. conversion should be simple 3. ASCII representation should be compact 4. size of ASCII representation of bits after “gzipping” should be close to original
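Goal 4 can be tried out with a small experiment. The sketch below uses seeded random bytes as a stand-in for an entropy-coded bit-stream and Python's standard base64 alphabet as a stand-in for the 64-character table; both are assumptions, not the encoder from the talk.

```python
import base64
import gzip
import random

# Stand-in for a well-compressed bit-stream: 30,000 pseudo-random bytes.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(30_000))

ascii_text = base64.b64encode(payload)                 # 6 payload bits per character
gzipped = gzip.compress(ascii_text, compresslevel=9)   # comparable to "gzip -9"

overhead = len(gzipped) / len(payload) - 1
print(len(payload), len(ascii_text), len(gzipped), f"{overhead:.1%}")
```

The ASCII text is a third larger than the payload, but gzip's Huffman stage assigns roughly six bits to each of the 64 equally likely characters, so most of that expansion disappears again after gzipping.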
Results
fish.wrl
Shape {
  appearance Appearance {
    material Material {
      modulateTextureWithDiffuse true
      diffuseColor 1 1 1
    }
    texture ImageTexture { url fish.jpg }
  }
  geometry IndexedFaceSet {
    creaseAngle 0.9
    coord Coordinate {
      point [ -0.0715 4.7609 6.3930 ... -0.4479 -4.5153 4.5304 ]
    }
    coordIndex [ 7 6 209 204 -1 4 217 … 4577 -1 4577 4223 4222 -1 ]
    texCoord TextureCoordinate {
      point [ 0.3735 0.9441 0.3289 … 0.2666 0.4990 0.1082 ]
    }
    texCoordIndex [ 0 1 2 3 -1 4 5 6 7 … 4311 -1 4311 4293 4683 -1 ]
  }
}
fish_compressed.wrl
Shape {
  appearance Appearance {
    material Material {
      modulateTextureWithDiffuse true
      diffuseColor 1 1 1
    }
    texture ImageTexture { url fish.jpg }
  }
  geometry CompressedIndexedFaceSet {
    creaseAngle 0.9
    code “AQg+N9gOd+Rhh+lXdfzK20Eyz0CCZUJHMK8EKLs3QrP+
          fWpsTQFxo4dPVVZ0pGK720hVQw3a9xXrtsk8XW4d8j7jEDfP7pC
          hVQw3a9xXrtPTsk8XW8j7jEDfP7pCgEofTPoTnfUYfTWzzAbc9eh
          Fji6UL0ZwL … vXS7GVbHHUuX”
    bits 12
    box [ -1.9725 -7.6861 -10 1.9725 7.6861 10 ]
    bits_tex 10
    box_tex [ 0.0146 0.0749 0.9896 0.9834 ]
  }
}
resulting file sizes [without gzipping the text file]
scene     IFS      comp IFS   ratio
lion      1,360    57         1 : 24
wolf        569    25         1 : 23
raptor      586    29         1 : 20
…
horse       749    27         1 : 28
cat         791    30         1 : 26
dog         586    29         1 : 20
average                       1 : 25
resulting file sizes [with gzipping the text file]
scene     IFS      comp IFS   ratio
lion      442      43         1 : 10
wolf      183      19
raptor    200      22         1 : 9
…
horse     266      21         1 : 13
cat       267      23         1 : 11
dog       186      22         1 : 8
average                       1 : 11
Binary vs. ASCII
scene     binary   ASCII   difference
lion      42.6     43.2    1.3 %
wolf      18.4     18.7    1.8 %
raptor    21.5     21.9    1.7 %
…
horse     20.3     20.6    1.7 %
cat       22.5     22.8    1.6 %
dog       21.5     21.9    1.7 %
average                    1.7 %
More Motivation
[Figure: the same scene stored as gzipped ASCII (Scene.x3d.gz) and as binary (Scene.bx3d), converted into each other by asc2bin and bin2asc. Both versions contain IFS “living room”, IFS “frame”, IMG “mona lisa”, and IFS “statue”; compressing the statue with 12 bits replaces IFS “statue” by CIFS “statue” in both.]
Complete Conformance • compressed nodes exist in ASCII and binary • going back and forth between the two is a very simple mapping • same decompression algorithm used – no matter if compressed node stored in ASCII or binary
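The back-and-forth mapping can be sketched as a pair of inverse functions. The names asc2bin and bin2asc come from the scene diagram; using Python's standard base64 module for the mapping is an assumption made for illustration, not necessarily the table the authors use.

```python
import base64

def bin2asc(code_bytes: bytes) -> str:
    """Binary 'code' field -> ASCII string for the text format (base64 as a stand-in)."""
    return base64.b64encode(code_bytes).decode("ascii")

def asc2bin(code_text: str) -> bytes:
    """ASCII 'code' field -> the identical bytes for the binary format."""
    return base64.b64decode(code_text)

# Hypothetical compressed bit-stream, packed into bytes.
bitstream = bytes([0b00110100, 0b11110100, 0b01010001])
as_text = bin2asc(bitstream)
round_trip = asc2bin(as_text)
print(as_text, round_trip == bitstream)
```

Since both directions are lossless, a decoder can always recover the same bit-stream and run the same decompression algorithm on it, whichever representation the node was stored in.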
Demo > local web <
Summary
Summary (1) • completely independent things: – ASCII format vs. binary format: two alternative representations – use of compressed geometry nodes: author’s decision at publishing time • store compressed nodes in ASCII • as compact as if stored in binary (after gzipping)
Summary (2) • provide “bit-field” in ASCII formats • same decompression algorithm for ASCII and binary version • full conformance between ASCII and binary version of X3D, including compressed nodes • design/decide compression now without waiting for a binary format!
Thank you.
Example XML node CompressedIndexedFaceSet : IndexedFaceSet { MFbyte [out] code NULL } IC SFint32 [out] bit_tex bit_col bit_nor [1..24] [1..16] IC IC SFvec6f SFvec4f SFvec6f [out] box_tex box_col [1..24] [0 0 1 1] [0 0 0 1 1 1] IC IC IC SFbool [] valid [false] IC