Challenging 5 Common Assumptions about Videoconferencing
Milton Chen
Computer Systems Lab, Stanford University
Presented at the Internet2 Advanced Applications Track, 10/28/2002
Copyright 2002 Milton Chen

The Stanford Video Auditorium: 15’ x 5’ video wall and desktop interface
Video Auditorium publicity/users
– Intel president Paul Otellini’s Intel Developer Forum keynote
– Invited demo to NASA headquarters for Paul G. Pastorek
– CANARIE, Canada
– CUDI, Mexico
– Comdex, Brazil
– IBM Almaden Lab
– Manhattan College
– Hopkins Marine Station
– Stanford Medical School
– Stanford Learning Lab
– Stanford Center for Design Research
– Berkeley Bioengineering Lab
– Universidade Federal do Rio Grande do Sul, Brazil

Outline
Common assumptions
– Technology
  1. High-fidelity AV requires dedicated hardware
  2. Difficult to install and use
– Human factors
  3. Life size displays are ideal
  4. Floor control requires interactive frame rate
  5. Eye contact is difficult
Beyond MCU and H.323
– Peer-to-peer
– Stanford’s Port Bootstrap Protocol
– Personal directory
An evaluation of distance learning at Stanford
Why videoconferencing is not ubiquitous

1. High-fidelity low-latency AV requires dedicated hardware
Your PC outperforms all dedicated systems: a $700 Pentium 4 computer outperforms $7000 systems
Comparison of videoconferencing solutions
System                     Max number of links  Max video resolution  BW required at 352 x 288, 15 fps
NetMeeting                 1                    352 x 288             200 Kbps
WIDE DVTS                  1                    720 x 480             3000 Kbps
Vbrick                     1                    720 x 480             2000 Kbps
Polycom, Sony, …           4                    352 x 288             200 Kbps
Access Grid, VRVS          many                 720 x 480             400 Kbps
Stanford Video Auditorium  16 to more than 100  720 x 480             100 Kbps
* CU-SeeMe, iVisit, and Yahoo Messenger have unacceptable latency

demo
A scalable AV streaming architecture
Sending: audio capture → audio compress → audio send; video capture → video compress → video send
Receiving: audio receive → audio decompress → audio render; video receive → video decompress → video render
* TrueSpeech 8.5
* MPEG-4
* Encrypted streaming, AES (Rijndael)
* Simultaneous AV recording
* Perceptual streaming adapts to network conditions

Beyond MCU and H.323
MCU vs. peer-to-peer
– Scalability
– Ease of deployment
H.323 vs. Stanford’s Port-Bootstrap Protocol
– Firewall
– Ease of deployment
Personal directory

2. Videoconferencing systems are difficult to install and use
One click operation
To use the Video Auditorium
– “Nothing” to install
– One click on the HTML speed dial

<OBJECT CLASSID="CLSID:E80F7B8F-7906-4A89-B59E-B19871F474A9"
        CODEBASE="runtime/VA_Start.ocx#Version=-1,-1,-1">
  <PARAM NAME="addr" VALUE="stanford -client_only">
</OBJECT>

Makes conferencing as simple as surfing the web

3. Life size displays are ideal
Each video should be between 6° and 14° wide
Subjectively, people reported 6° as the minimum and 14° as ideal. Life size is 12°.
* 12 people sat 10’ from the display

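To make the angular sizes concrete, the sketch below converts them to physical widths at the 10’ viewing distance used in the study. It assumes the angles describe the horizontal width of a flat image viewed head-on; the function name `width_for_angle` is illustrative and not part of the Video Auditorium.

```python
import math

def width_for_angle(distance, angle_deg):
    """Width of a flat image that subtends angle_deg when viewed
    head-on from the given distance (same unit as the result)."""
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_IN = 10 * 12  # 10 feet, as on the slide

# Angles from the slide: minimum, life size, and ideal.
for label, angle in [("minimum", 6), ("life size", 12), ("ideal", 14)]:
    width = width_for_angle(VIEWING_DISTANCE_IN, angle)
    print(f"{label:>9}: {angle:2d} deg -> {width:.1f} in wide")
```

Under these assumptions the 6° minimum is roughly a one-foot-wide image at 10’, and the 14° ideal is roughly two and a half feet.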
Balance between size and head movements
[Figure: video widths labeled 14°, 7°, 9°, 4°]
* 12 people viewed 9 and 36 students on a large display and on an immersive display. The immersive display requires head movements to see all the students.

4. Effective floor control requires interactive frame rate
Minimum required frame rate
Interactive: 10 fps
Tolerable: 5 fps – [Tang and Isaacs ’93]
Lip synchronization: 5 fps – [Watson and Sasse ’96]
Content understanding: 5 fps – [Ghinea and Thomas ’98]
Sign language recognition: 1 fps – [Johnson and Caird ’96]

Gesture Detection Algorithm
Visualization of algorithm: input image → frame difference → after erosion

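The slide names only the stages (input image, frame difference, erosion). A minimal sketch of such a detector follows, on grayscale frames stored as lists of rows. The threshold, the 3x3 erosion neighbourhood, the `min_pixels` decision rule, and all function names are assumptions made for illustration, not the author's implementation.

```python
def frame_difference(prev, curr, threshold=30):
    """Binary mask: 1 where a pixel changed by more than threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def erode(mask):
    """3x3 binary erosion: a pixel survives only if it and all 8
    neighbours are set. Border pixels are cleared."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(mask[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out

def count_motion_pixels(prev, curr, threshold=30):
    """Number of changed pixels that survive erosion."""
    return sum(map(sum, erode(frame_difference(prev, curr, threshold))))

def gesture_detected(prev, curr, threshold=30, min_pixels=1):
    """Assumed decision rule: any surviving blob counts as a gesture."""
    return count_motion_pixels(prev, curr, threshold) >= min_pixels

def blank(h, w):
    return [[0] * w for _ in range(h)]

# A 4x4 moving block survives erosion; a single noisy pixel does not.
still = blank(8, 8)
moved = blank(8, 8)
for y in range(2, 6):
    for x in range(2, 6):
        moved[y][x] = 200
noisy = blank(8, 8)
noisy[4][4] = 255

print(gesture_detected(still, moved))  # block of motion
print(gesture_detected(still, noisy))  # isolated noise
```

Erosion is what separates a coherent moving region from isolated pixel noise. A sender could use the result to decide when to leave a low frame rate and switch to full motion, though the slides do not describe that switching logic.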
Requires 10% of full motion bandwidth: full-motion (10 fps) vs. gesture-sensitive (0.2 fps)
* MPEG-4 encoded at 320 x 240

Gesture-sensitive frame rate allows dynamic discussion: 15 fps vs. ~0.2 fps
* 8 groups of 4 people during a discussion

5. Eye contact is difficult
Eye contact fires up our brain [Kampe et al. ’01]
Eye contact is difficult: looking into the camera vs. attempting eye contact
Solutions to eye contact
– Half-silvered mirror [Rosenthal ’47]
– ClearBoard [Ishii, et al. ’92]
– MAJIC [Okada, et al. ’94]
– GazeMaster [Gemmell, et al. ’00]

A simple solution: Hydra [Sellen, Buxton, and Arnott ’92]
Eye contact sensitivity is high
[Figure: eye contact (%) from 0 to 100 vs. angle (deg) from -8.5 to 8.5; stdev = 2.8°; looker and observer 2 m apart]
Spatial perception task
As good as Snellen acuity [Gibson and Pick ’63]
* 6 observers judged 1 looker

Sensitivity is symmetric
Cline ’67; Kruger and Huckstedt ’69; Anstis, et al. ’69; Stokes ’69; Ellgring ’70
PicturePhone: camera above display
Hydra: camera below display

Methodology
– Large display with camera at the center
– Record lookers gazing at different targets
– Observers watch videos of looker and judge eye contact
* Two rooms can be linked in a videoconferencing session

Sensitivity is asymmetric
* 16 observers judged recorded videos of 1 looker

An anatomical explanation
[Illustrations: looking at you, looking sideways, looking up, looking down, eye closing]
Illustrations from The Artist’s Guide to Facial Expression [Faigin ’90]

Sensitivity is less in conversation
[Figure label: recorded (down)]
* 16 observers judged videos of 1 looker

Sensitivity is less in video
[Figure label: face-to-face (down)]
* 16 observers judged 1 looker in conversation

We are biased to perceive contact
[Figure: eye contact (%) from 0 to 100 vs. angle; curves for Snellen acuity and conferencing acuity; labels: sideway, up; down & video & conversation]

Maximum camera to eyes distance
device      minimum viewing distance  camera to rendered eyes distance
Palm held   1’                        1.5”
Desktop     2’                        3”
Wall size   8’                        12”
* Assuming a sensitivity of 7°

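The table values are consistent with simple trigonometry: at viewing distance d, a gaze offset of 7° corresponds to a distance of about d · tan(7°) on the display. The slide does not state the formula, so the sketch below is a plausible reconstruction, and `max_camera_offset` is an illustrative name.

```python
import math

def max_camera_offset(viewing_distance_in, sensitivity_deg=7.0):
    """Largest camera-to-rendered-eyes distance (inches) that stays
    within the given angular sensitivity at this viewing distance."""
    return viewing_distance_in * math.tan(math.radians(sensitivity_deg))

# Minimum viewing distances from the slide, converted to inches.
for device, feet in [("Palm held", 1), ("Desktop", 2), ("Wall size", 8)]:
    offset = max_camera_offset(feet * 12)
    print(f"{device:<10} {feet}' -> {offset:.2f} in")
```

The computed offsets (about 1.47”, 2.95”, and 11.79”) round to the 1.5”, 3”, and 12” shown in the table.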
Eye contact in the Video Auditorium
Why videoconferencing is essential to distance learning: an evaluation of distance learning at Stanford
Distance learning at Stanford
[Figures: a 1969 classroom; a 2002 operator console; a 2002 lecture viewer]
– Remote students can call in during class
– Instructor cannot see the remote students

Students like distance learning
* 120 students, 15 TAs, and 41 faculty

Learning is less effective
* 120 students, 15 TAs, and 41 faculty

F2F interaction is important: F2F is important for lecturing and crucial for discussions
No interaction with remote students
Classroom observation of 4 CS classes
– Instructor on average asked 9 questions per session
– Local students on average asked or made 3 questions/comments per session
– Remote students spoke once in 6 months

Value of video beyond audio
Cues only transmitted by the visual channel
– Negative feedback, …
Emotional bond
– Establishing and maintaining relationships
Can you imagine it?
– A new face, …

A proposal
The world’s largest video wall: link all Internet2 members for Spring ’03
Developed technology: One Mouse AV stream migration
Bandwidth: 2 x 300 x (100 Kbps + 10 Kbps), roughly 60 Mbps
Cost: 10 P4 laptops + 10 portable projectors, $30K

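The bandwidth and cost estimates can be checked with a few lines of arithmetic. The slide does not say what each factor represents, so the parameter names below are neutral placeholders. Note that the exact product is 66 Mbps; the 60 Mbps on the slide appears to be a rounded figure, matching the product when the 10 Kbps term is left out.

```python
def total_mbps(multiplier, count, per_stream_kbps):
    """Aggregate rate in Mbps for multiplier * count streams."""
    return multiplier * count * per_stream_kbps / 1000

# Factors exactly as written on the slide: 2 x 300 x (100 + 10) Kbps.
exact = total_mbps(2, 300, 100 + 10)
without_small_term = total_mbps(2, 300, 100)
print(f"exact: {exact} Mbps, without the 10 Kbps term: {without_small_term} Mbps")

# Cost: $30K spread over 10 laptops and 10 projectors.
average_device_cost = 30_000 / (10 + 10)
print(f"average cost per device: ${average_device_cost:.0f}")
```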
A prediction
Why all videoconferencing products have failed
A plane that does not fly is not a plane (first flight, Wrights 1903)
A videophone that limits communication is not a videophone
• poor audio fidelity
• poor video fidelity
• excessive latency
• no eye contact
• poor lip synchronization

Threshold of quality for the 2nd revolution
1st Revolution: Possible
– first mobile phone, 1924
– first videoconferencing system, 1927
2nd Revolution: Practical
– first handheld phone, 1973

Conclusion
Common assumptions and findings
1. High-fidelity AV requires dedicated hardware: higher on a PC
2. Difficult to install/use: one click
3. Life size displays are ideal: 6° to 14°
4. Floor control requires at least 10 fps: 0.2 fps avg
5. Eye contact is difficult: 7° down
Videoconferencing is essential to distance learning
An MCU-less and H.323-less future

You already have a one-click high-fidelity multiparty videoconferencing system
We are at the dawn of a videoconferencing revolution that will fuel the demand for a 1000X increase in available bandwidth

Acknowledgement
– NASA
– Intel
– Sony
– Interval Research
– Wallenberg Global Learning Network
– Department of Defense
Future work
– Gold release for Feb 2003
– SDK
– The Wall