Sound
Prof Jim Warren
Some material from Heim, Chapter 13

Learning outcomes
• Describe the basics of human hearing
• Explain the difference between visual and auditory interaction
• Describe the classes and subclasses of sound output and the attributes of each
• Describe the classes and subclasses of sound input and recognition and the attributes of each

Hearing
• Provides information about the environment: distances, directions, objects, etc.
• Physical apparatus:
  – outer ear: protects the inner ear and amplifies sound
  – middle ear: transmits sound waves as vibrations to the inner ear
  – inner ear: chemical transmitters are released and cause impulses in the auditory nerve
• Sound has three main attributes:
  – pitch: sound frequency
  – loudness: amplitude
  – timbre: type or quality
• Humans can hear frequencies from about 20 Hz to 15 kHz (often up to about 20 kHz in young listeners)
• We are less accurate at distinguishing high frequencies than low ones
• High-frequency hearing declines as you get older
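To make the pitch/frequency and loudness/amplitude pairing concrete, here is a minimal Python sketch (standard library only) that samples a pure tone, checks a frequency against the hearing range stated above, and expresses amplitude as a relative level in decibels. The 44.1 kHz sample rate and the function names are illustrative assumptions, and decibel level is only a rough stand-in for perceived loudness, which also depends on frequency.

```python
import math

# Audible range as stated on the slide; many young listeners hear up to about 20 kHz.
AUDIBLE_LOW_HZ = 20.0
AUDIBLE_HIGH_HZ = 15_000.0


def tone(freq_hz, amplitude, duration_s, sample_rate=44_100):
    """Sample a pure sine tone: freq_hz sets the pitch, amplitude sets the loudness."""
    n = int(round(duration_s * sample_rate))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def is_audible(freq_hz, low=AUDIBLE_LOW_HZ, high=AUDIBLE_HIGH_HZ):
    """True if the frequency falls inside the assumed hearing range."""
    return low <= freq_hz <= high


def level_db(amplitude, reference=1.0):
    """Level relative to a reference amplitude, in decibels (20 * log10 of the ratio)."""
    return 20 * math.log10(amplitude / reference)


a4 = tone(440.0, 0.5, 0.01)  # 10 ms of the A above middle C at half amplitude
print(len(a4), is_audible(440.0), is_audible(25_000.0))
print(round(level_db(2.0), 2))  # doubling the amplitude adds about 6 dB
```

Raising `freq_hz` raises the pitch; raising `amplitude` makes the same pitch louder.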

Sound is vibration

Timbre is harmonic structure
• A sine wave has all its energy on the ‘first harmonic’ or ‘fundamental’ frequency (sounds like ‘O’)
• Other wave shapes come from distributing energy across other multiples (harmonics) of the fundamental
• Triangle wave example: http://www.sfu.ca/sonic-studio/handbook/Triangle_Wave.html
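The idea that a wave shape is a distribution of energy over harmonics can be sketched with additive synthesis. This minimal Python example builds a triangle wave from the standard Fourier series, (8/π²) Σ (−1)^k sin((2k+1)x)/(2k+1)², which uses only odd multiples of the fundamental with amplitudes falling off as 1/n². The function names are hypothetical; with a single harmonic the result is just the pure sine described above.

```python
import math


def triangle_partial_sum(phase, n_harmonics):
    """Approximate a triangle wave at `phase` (radians) from its first n_harmonics odd harmonics."""
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1  # odd multiples of the fundamental only
        total += (-1) ** k * math.sin(n * phase) / n ** 2  # amplitude falls off as 1/n^2
    return 8 / math.pi ** 2 * total  # scale so the peak value is 1


def triangle_spectrum(n_max):
    """Relative amplitudes of harmonics 1..n_max of a triangle wave (even harmonics are absent)."""
    return [0.0 if n % 2 == 0 else 1 / n ** 2 for n in range(1, n_max + 1)]


peak = math.pi / 2
print(triangle_partial_sum(peak, 1))   # fundamental alone: a pure sine, peak about 0.81
print(triangle_partial_sum(peak, 50))  # with 50 odd harmonics the peak approaches 1
print(triangle_spectrum(5))
```

Same pitch, different harmonic recipe: that difference is what we hear as timbre.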

Hearing (cont)
• Auditory system filters sounds
  – Can attend to sounds over background noise
  – The cocktail party phenomenon: http://www.spring.org.uk/2009/03/the-cocktail-party-effect.php
  – Hearing aids disrupt this filtering
• Hearing is involuntary
  – A sudden sound ‘grabs’ attention before we think
  – And some sounds are harder to ignore (e.g. a baby crying)
• ‘Listening’ is (largely) voluntary
  – We choose whether to process the meaning, especially if the sound is language (although something like hearing your name is pretty well involuntary)

What if…
• Your hearing is below average
• You are deaf
  – Phone call / text message?
• You are in a noisy environment
  – Night clubbing

Sound versus Visual
Sound exists in time and over space; vision exists in space and over time (Gaver, 1989).
• Sound is only there while it is playing or being made
• Vision is there until it is replaced

Sound Interaction
• Computer output/generation (input to human)
  – Non-speech
    – Music
    – Auditory icons and earcons
  – Speech
• Computer input/recognition
  – Speech
  – Non-speech
    – Environmental
    – Music

Computer Output: Music
• Can be pre-recorded or generated
  – Movies
  – Games
  – Immersive experiences
• Activates your brain in a different way from language
• Acts almost entirely independently of hand-eye processing

Generating instruments
• MIDI (Musical Instrument Digital Interface)
  – Allows very compact storage of music as the tones, durations and choice of synthesized instruments
  – Generally has a very ‘computer-generated’ timbre/feel, e.g. http://www.midiworld.com/files/1128/
• Potential for much more sophisticated synthesized sounds for realistic or ‘virtual’ instruments
  – Although the physics of real instruments can be quite complex
    – Violin cross-section: http://newt.phys.unsw.edu.au/jw/violintro.html
  – Virtual: http://www.kurzweilai.net/instrument-of-the-future
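Why MIDI storage is so compact can be seen from its channel messages: a note is described by a few bytes naming the pitch and velocity, not by audio samples. The Python sketch below builds standard Note On (0x90), Note Off (0x80) and Program Change (0xC0) messages and converts note numbers to equal-tempered frequencies (note 69 = A4 = 440 Hz). The helper names are hypothetical, and the sketch leaves out the variable-length delta times that a Standard MIDI File places before each message to record timing.

```python
def _check(value, limit, name):
    """MIDI data bytes are 7-bit (0-127); channels are 0-15."""
    if not 0 <= value < limit:
        raise ValueError(f"{name} must be in 0..{limit - 1}, got {value}")


def note_to_freq(note, a4_hz=440.0):
    """Equal-tempered frequency for a MIDI note number (69 = A4, 60 = middle C)."""
    return a4_hz * 2 ** ((note - 69) / 12)


def note_on(channel, note, velocity):
    """3-byte Note On message: status 0x90 | channel, then note and velocity."""
    _check(channel, 16, "channel")
    _check(note, 128, "note")
    _check(velocity, 128, "velocity")
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """3-byte Note Off message: status 0x80 | channel."""
    _check(channel, 16, "channel")
    _check(note, 128, "note")
    _check(velocity, 128, "velocity")
    return bytes([0x80 | channel, note, velocity])


def program_change(channel, program):
    """2-byte message selecting the synthesized instrument (0 = Acoustic Grand Piano in General MIDI)."""
    _check(channel, 16, "channel")
    _check(program, 128, "program")
    return bytes([0xC0 | channel, program])


# C, D, E on a piano: instrument choice plus three notes.
melody = program_change(0, 0)
for note in (60, 62, 64):
    melody += note_on(0, note, 100) + note_off(0, note)
print(len(melody), [round(note_to_freq(n), 2) for n in (60, 62, 64)])
```

The whole phrase fits in 20 bytes; the synthesizer, not the file, decides what the piano actually sounds like, which is why MIDI playback often has that ‘computer-generated’ feel.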

Generating music
• Exciting area for artists
• Everything from pseudo-real to completely abstract
• There are jazz music generators whose output only skilled listeners can distinguish from that of actual musicians
• Serato: DJ software (www.serato.com)
  – Auckland company doing fantastic things
  – Several UoA grads there

Auditory Icons and Earcons
• The difference between these two is subtle
• Auditory icons: emphasis on ‘natural’ sounds and metaphor with the real world; caricatures of naturally occurring sounds
  – e.g. the sound of filling a bottle with water to match moving a large file
• Earcons: ‘artificial’ (generated) sounds
  – e.g. a more abstract metaphorical relationship to the action, or purely a convention (like corporate colour schemes)
  – Some Windows earcons: hardware fail, hardware insert, hardware remove
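Because earcons are generated rather than recorded, a family of them can be designed systematically. The Python sketch below renders short tone motifs to 16-bit PCM bytes using only the standard library; the motifs themselves are a hypothetical design (not the actual Windows sounds) in which ‘insert’ rises, ‘remove’ falls and ‘fail’ drops low by a semitone, so the meaning is carried by convention rather than by resemblance to a real-world sound. The sample rate, amplitude and fade length are also assumptions.

```python
import math
import struct

SAMPLE_RATE = 22_050  # samples per second (assumed)

# Hypothetical earcon family: shared timbre and rhythm, with the
# direction of the pitch movement carrying the meaning.
EARCONS = {
    "insert": [(523.25, 80), (659.25, 80), (783.99, 120)],  # rising C5-E5-G5, durations in ms
    "remove": [(783.99, 80), (659.25, 80), (523.25, 120)],  # falling G5-E5-C5
    "fail": [(220.0, 150), (207.65, 250)],                  # low, downward semitone
}


def render(motif, amplitude=0.3):
    """Render a motif of (frequency_hz, duration_ms) tones as 16-bit little-endian PCM bytes."""
    samples = []
    for freq, dur_ms in motif:
        n = SAMPLE_RATE * dur_ms // 1000
        for i in range(n):
            env = min(1.0, i / 100, (n - i) / 100)  # 100-sample fade in/out to avoid clicks
            samples.append(amplitude * env * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
    return struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in samples))


pcm = render(EARCONS["insert"])
print(len(pcm))  # 2 bytes per sample
```

The bytes can be written to a mono WAV file with the standard-library `wave` module (`setnchannels(1)`, `setsampwidth(2)`, `setframerate(SAMPLE_RATE)`, then `writeframes(pcm)`).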

Auditory Icons and Earcons
• Redundant encoding
  – Aids memory by adding additional associations
• Can alert without interrupting (well, at least it leaves the visual field clear)
• An alternative communications channel
• Positive/negative feedback
• Auditory alarms might be crucial to the safe operation of computer-operated machinery or mission-critical environments
  – But too many alarms may be:
    – Annoying
    – Ignored

Using Sound in Interaction Design
• Learnability of the mapping between the icon and the object represented
  – “Oink” and “bow wow” have high articulatory directness (low distance between ‘appearance’ and function [or denotation])
  – A swishing sound accompanying a paintbrush tool also has high articulatory directness
  – A system beep, on the other hand, carries no information about what it denotes
    – But we may quickly learn to associate it with an error
    – And its square-wave structure is a bit unpleasant, so it suits signalling an error better than giving feedback on success
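One way to see why a square-wave beep sounds harsher than a softer tone is to compare how much of its energy sits above the fundamental. This Python sketch uses the textbook Fourier-series amplitudes (square wave: odd harmonics at 1/n; triangle wave: odd harmonics at 1/n²; sine: fundamental only) and reports the share of energy, taken as the sum of squared amplitudes, in the upper harmonics. The function names and the cut-off at harmonic 999 are assumptions; the code shows only the spectral difference, not whether listeners find it unpleasant.

```python
def upper_harmonic_share(amplitude_of, n_max=999):
    """Fraction of energy (sum of squared amplitudes) above the fundamental, over harmonics 1..n_max."""
    energies = [amplitude_of(n) ** 2 for n in range(1, n_max + 1)]
    return sum(energies[1:]) / sum(energies)


def square(n):
    """Square wave: odd harmonics with amplitude 1/n."""
    return 1 / n if n % 2 else 0.0


def triangle(n):
    """Triangle wave: odd harmonics with amplitude 1/n^2."""
    return 1 / n ** 2 if n % 2 else 0.0


def sine(n):
    """Pure tone: all energy in the fundamental."""
    return 1.0 if n == 1 else 0.0


for name, fn in (("sine", sine), ("triangle", triangle), ("square", square)):
    print(f"{name:8s} {upper_harmonic_share(fn):.3f}")
```

Roughly a fifth of a square wave's energy lies in its upper harmonics, versus under 2% for a triangle wave and none for a sine.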

Can you remember earcons?
• How many?
• How often do you hear them?
• Can you intuitively tell what these mean? (audio examples: On, Misrecognized, Off, Disambiguate, Sleep)

Speech Output
• Eyes-free operation
• Alternative output channel
• Good for checking your essays
• But navigation is hard
  – Backtracking
  – Finding the location of a particular thing

Speech Output
• Recorded
  – Menu choices for telephone systems
  – Books or other multimedia experiences
• Generated (‘text-to-speech’, TTS)
  – Synthesizer built into Office
    – See http://office.microsoft.com/en-nz/powerpoint-help/using-the-speaktext-to-speech-feature-HA102066711.aspx
  – Google Translate has a nice one too (better, I think)
  – Synthesized speech still sounds a little artificial
  – The best synthesizers have a physical model of the tongue and breath to give natural flow between phonemes

Sound Input
• Speech
• Music
• Environmental

Speech Recognition
• Two distinct applications:
  – Transaction: giving commands, issuing queries, making menu selections
  – Transcription: ‘typing’ a document by voice
• Transaction
  – Telephone menu systems
    – Choose from a limited number of options; works OK
• Automatic speech recognition (ASR)
  – Built into operating systems
  – Siri (iPhone) and Android are somewhat usable
    – This is a triumph of artificial intelligence
  – A difficult, ongoing research problem
    – Not just about recognizing phonemes (language sounds) but also about finding the ‘right’ interpretation (helped, e.g., by statistical word-triple frequencies, but better if the AI is ‘deeper’)
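“Statistical word triple frequencies” is a trigram language model: the recognizer prefers word sequences whose triples were common in training text. The Python sketch below trains such a model on a tiny hypothetical corpus, uses add-one (Laplace) smoothing so unseen triples get a small non-zero probability, and picks between two acoustically similar hypotheses. Real recognizers combine this score with an acoustic score and far larger corpora; the class and function names here are illustrative.

```python
import math
from collections import Counter


def trigrams(words):
    """Word triples for a sentence, padded with start (<s>) and end (</s>) markers."""
    padded = ["<s>", "<s>"] + words + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class TrigramModel:
    def __init__(self, sentences):
        self.counts = Counter()    # c(u, v, w)
        self.contexts = Counter()  # c(u, v) summed over every following word w
        vocab = set()
        for sentence in sentences:
            words = sentence.split()
            vocab.update(words)
            for tri in trigrams(words):
                self.counts[tri] += 1
                self.contexts[tri[:2]] += 1
        self.vocab_size = len(vocab) + 2  # plus </s> and one slot for unknown words

    def log_prob(self, sentence):
        """Sum of log P(w | u, v) with add-one smoothing."""
        return sum(
            math.log((self.counts[tri] + 1) / (self.contexts[tri[:2]] + self.vocab_size))
            for tri in trigrams(sentence.split())
        )

    def best(self, candidates):
        """Pick the candidate transcription the language model finds most probable."""
        return max(candidates, key=self.log_prob)


# Tiny hypothetical training text; a real system would use millions of sentences.
model = TrigramModel([
    "it is hard to recognize speech with a computer",
    "we recognize speech using statistics",
    "they went to wreck the old car",
])
# Two acoustically similar hypotheses a recognizer might produce for the same audio.
candidates = ["it is hard to recognize speech", "it is hard to wreck a nice beach"]
print(model.best(candidates))
```

The model cannot tell the two phrases apart acoustically; it prefers the first only because its word triples were seen in the training text, which is why ‘deeper’ understanding would help further.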

Practical speech recognition
• May mix transactions with transcription

Searching Speech and Audio
• Sound files do not afford easy opportunities for indexing and searching
• Speech recognition can be used to transcribe speech files and create transcripts that can be searched like any other text file
  – So long as recognition accuracy is OK, which it isn't at the moment
• Tune identification apps
  – Hum a bit of the tune and it tells you what it is! (e.g. Midomi); Shazam identifies a recording playing nearby
  – See http://www.guidingtech.com/8572/identify-song-byhumming-tune-using-3-web-apps/
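Once speech has been transcribed, it can be indexed like text while keeping a link back into the audio. This Python sketch builds an inverted index from hypothetical recognizer output, mapping each word to the (file, start time) segments where it was recognized, so a query can jump straight to the right moment in a recording. The segment data, file names and function names are assumptions, and the sketch does no stemming (‘sound’ will not match ‘sounds’).

```python
import re
from collections import defaultdict

# Hypothetical recognizer output: (file, start time in seconds, recognized text) segments.
SEGMENTS = [
    ("lecture13.wav", 0.0, "Today we look at sound in interaction design"),
    ("lecture13.wav", 12.5, "Earcons are artificial sounds that signal an event"),
    ("lecture13.wav", 40.0, "Auditory icons caricature natural sounds"),
    ("lecture14.wav", 3.0, "Speech recognition supports transaction and transcription"),
]


def tokenize(text):
    """Lower-case words; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(segments):
    """Inverted index: word -> sorted list of (file, start_time) where it was recognized."""
    index = defaultdict(set)
    for file, start, text in segments:
        for word in tokenize(text):
            index[word].add((file, start))
    return {word: sorted(hits) for word, hits in index.items()}


def search(index, query):
    """Return the (file, start_time) segments containing every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


index = build_index(SEGMENTS)
print(search(index, "earcons"))
print(search(index, "natural sounds"))
```

Because the index is built only from recognizer output, any misrecognized word is simply unfindable, which is why recognition accuracy matters so much here.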

‘Environmental’ input example
• Gunfire detection and location
  – Shotspotter.com, now deployed in US cities
  – Decides that a sound is a gunshot
    – Detection algorithm identifies the signal for human confirmation
  – Triangulates the position based on multiple sensors in a neighbourhood
  – Police notified
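Locating a sound from several sensors is usually done by multilateration from time differences of arrival (TDOA): the firing time is unknown, but the differences between when each sensor hears the shot constrain where it came from (in 2D at least three sensors are needed). The Python sketch below does this by brute-force grid search with least-squares matching. It is not Shotspotter's algorithm; the sensor layout, the constant speed of sound, the grid size and the function name are all assumptions chosen for clarity.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed constant)


def locate(sensors, arrival_times, size=400.0, step=2.0):
    """Grid-search a size x size area for the point whose predicted arrival-time
    differences (relative to sensor 0) best match the observed ones (least squares)."""
    observed = [t - arrival_times[0] for t in arrival_times]
    best_point, best_err = None, math.inf
    steps = int(size / step) + 1
    for i in range(steps):
        for j in range(steps):
            p = (i * step, j * step)
            d0 = math.dist(p, sensors[0])
            err = sum(((math.dist(p, s) - d0) / SPEED_OF_SOUND - obs) ** 2
                      for s, obs in zip(sensors, observed))
            if err < best_err:
                best_point, best_err = p, err
    return best_point


# Four hypothetical rooftop sensors at the corners of a 400 m x 400 m block.
sensors = [(0.0, 0.0), (400.0, 0.0), (0.0, 400.0), (400.0, 400.0)]
shot, fired_at = (120.0, 80.0), 5.0  # true position and firing time (unknown to the solver)
times = [fired_at + math.dist(shot, s) / SPEED_OF_SOUND for s in sensors]
print(locate(sensors, times))
```

Only time differences are used, so the unknown firing time cancels out; a real system would also have to cope with echoes, noise and sensor timing errors.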

Summary
• Describe the basics of human hearing
  – Detects pitch, loudness and timbre at 20 Hz to 15 kHz
  – Largely involuntary
• Explain the difference between visual and auditory interaction
  – Sound is transitory but supports hands- (and eyes-) free interaction
• Describe the classes and subclasses of sound output and the attributes of each
  – Non-speech
    – Music
    – Earcons: provide feedback
  – Speech: recorded or synthesized
• Describe the classes and subclasses of sound input and recognition and the attributes of each
  – Speech: becoming of good practical quality
    – Transaction
    – Transcription
  – Others: music, environmental; a few applications emerging