Voice Localization using Nearby Wall Reflections
Yu-Lin Wei, Sheng Shen, Daguan Chen, Zhijian Yang, Romit Roy Choudhury
Many variants of the problem … many AoA algorithms: Delay-Sum, MUSIC, GCC-PHAT, ESPRIT, JADE
But still an open problem for multi-echo environments
This paper 1. A new AoA algorithm 2. Application: voice localization
What makes the problem challenging?
[Figure: design space with two axes, # of sources (1 to N) and # of echoes modeled (1 to K). Example sources are labeled French, Latin, and German. One region is marked "Holy Grail (Very challenging)".]
Source signals are unknown!
Goal: separate the AoAs of all N x K signals.
Existing solutions have made significant progress …
[Figure: design space of # of sources (1 to N) versus # of echoes modeled (1 to K), with MUSIC (source uncorrelated) and GCC-PHAT (line-of-sight AoA) placed relative to the "Holy Grail (Very challenging)" region.]
MUSIC: assumes signals are uncorrelated
GCC-PHAT: only estimates the line-of-sight AoA
This paper: Voice Localization (VoLoc)
[Figure: design space of # of sources (1 to N) versus # of echoes modeled (1 to K), with VoLoc placed alongside GCC-PHAT (line-of-sight AoA) and MUSIC (source uncorrelated), relative to the "Holy Grail (Very challenging)" region.]
Estimating multiple, fully correlated echoes
Opportunities on AoA
Conventional AoA algorithm
With an (infinite number of) echoes … the ΔT's get mixed
Impossible to decouple each ΔT from the mixture
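To make the role of ΔT concrete, here is a minimal sketch of the conventional single-path idea, assuming a far-field source and two microphones a known distance apart. The function names, the brute-force cross-correlation, and the 343 m/s speed of sound are illustrative assumptions, not taken from the talk.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay_samples(x, y, max_lag):
    """Return the lag (in samples) by which y trails x.

    Uses a brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    n = len(x)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def aoa_from_delay(delta_t, mic_spacing):
    """Far-field AoA in degrees from the inter-microphone delay delta_t (seconds).

    Uses cos(theta) = c * delta_t / d for a two-microphone pair.
    """
    cos_theta = SPEED_OF_SOUND * delta_t / mic_spacing
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp numerical overshoot
    return math.degrees(math.acos(cos_theta))
```

With a single path the correlation has one clear peak, so one ΔT gives one angle. With many echoes, each path contributes its own ΔT to the same correlation, which is the mixing problem this slide describes.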
Key opportunity – Human speech has many pauses
[Figure: waveform of “Alexa, what time is it?” plotted against time (seconds)]
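One simple way to find such pauses is a short-time energy threshold. The sketch below is an illustration only: the function name, the frame length, and the threshold are hypothetical choices, and the talk does not say how pauses are detected.

```python
def find_pauses(samples, frame_len, energy_threshold):
    """Return (start, end) sample-index pairs for runs of low-energy frames.

    A frame counts as silent when its mean squared amplitude is below
    energy_threshold. Trailing samples that do not fill a frame are ignored.
    """
    pauses = []
    run_start = None
    n_frames = len(samples) // frame_len
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy < energy_threshold:
            if run_start is None:
                run_start = k * frame_len  # a silent run begins here
        elif run_start is not None:
            pauses.append((run_start, k * frame_len))  # the run just ended
            run_start = None
    if run_start is not None:
        pauses.append((run_start, n_frames * frame_len))
    return pauses
```

The end index of each pause marks where speech resumes, which is where the first samples of the next word arrive.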
Pause opportunity
[Figure: voice samples A B C D E F G … shown on a time axis as they arrive over Path #1 (direct path), Path #2 (1st echo), and Path #3 (2nd echo). Each later path starts a few samples after the previous one, and a second sequence a b c … is shown overlapping the earlier ones.]
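The staggered arrivals in the figure can be reproduced with a toy multipath model. This is a sketch under simplifying assumptions (integer sample delays, one gain per path, hypothetical function names); it only illustrates that the first samples after a pause contain the earliest path alone.

```python
def received_signal(voice, paths):
    """Sum delayed, scaled copies of voice.

    paths is a list of (delay_samples, gain) pairs, one per propagation path.
    """
    length = len(voice) + max(delay for delay, _ in paths)
    out = [0.0] * length
    for delay, gain in paths:
        for i, v in enumerate(voice):
            out[i + delay] += gain * v
    return out


def clean_prefix_length(paths):
    """Number of samples after the onset that contain only the earliest path."""
    delays = sorted(delay for delay, _ in paths)
    if len(delays) < 2:
        return None  # a single path is never contaminated by an echo
    return delays[1] - delays[0]
```

For example, with paths at delays 0, 2, and 4 samples, the first two received samples equal the direct-path samples exactly, and mixing begins at the third.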
Iterative Align and Cancel (IAC) algorithm
[Figure: IAC walk-through. The raw signal is the direct path (A B C D …) plus a delayed echo (a b c d …). Aligning on Path 1 and cancelling removes the direct path and leaves a residue of shifted echo differences (-a, -b, -c, a-d, b-e, c-f, d-g, …). Aligning on Path 2 and cancelling removes the echo and leaves direct-path differences (A, B, C, D-A, E-B, F-C, G-D, …). The raw signal and the residues are then combined into a final residue.]
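The first align-and-cancel step can be sketched for two microphones as follows. This is a simplified illustration, not the authors' implementation: it assumes integer sample delays, assumes the second microphone trails the first, and uses hypothetical function names.

```python
def delayed_sum(voice, paths, length):
    """Sum delayed, scaled copies of voice; paths is a list of (delay, gain)."""
    out = [0.0] * length
    for delay, gain in paths:
        for i, v in enumerate(voice):
            if i + delay < length:
                out[i + delay] += gain * v
    return out


def shift(signal, lag):
    """Delay signal by lag >= 0 samples, zero-padding the front, same length."""
    return [0.0] * lag + signal[:len(signal) - lag]


def residue(mic1, mic2, lag):
    """Align mic1 to mic2 with the hypothesized lag, then subtract (cancel)."""
    return [b - a for a, b in zip(shift(mic1, lag), mic2)]


def best_lag(mic1, mic2, max_lag, clean_window):
    """Pick the lag whose residue has the least energy in the echo-free window."""
    def energy(lag):
        return sum(r * r for r in residue(mic1, mic2, lag)[:clean_window])
    return min(range(max_lag + 1), key=energy)
```

When the lag matches the direct path, the residue is zero inside the echo-free window and consists only of shifted echo differences afterwards, which mirrors the residue pattern in the figure. Later paths would be handled by repeating the search on what remains.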
The objective function is not convex, but manageable …
[Figure: final residue, with one axis labeled "2nd path's scale"]
But what happens with 3rd, 4th, and K incoming paths?
[Figure: the raw signal feeds three branches (Align and Cancel 1st Path, Align and Cancel 2nd Path, Align and Cancel 3rd Path). Their residues pass through a linear combination to produce the final residue.]
This paper 1. A new AoA algorithm 2. Application: voice localization
Alexa, turn on the light
Alexa, add “urgent” to groceries
Do you mean “detergent”?
Can Amazon Alexa localize the user from her voice command?
• Requires 2 AoAs
• Requires the wall configuration
• Reverse triangulation
VoLoc Part II: How to find the wall distance/orientation
VoLoc estimates wall geometry using past voice commands
By assuming one stable wall, VoLoc models the echoes from that wall and solves a minimization problem.
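Once the wall distance and orientation are known, reverse triangulation from two AoAs reduces to intersecting two lines. The sketch below is a 2-D illustration under stated assumptions: the microphone array sits at the origin, there is a single flat wall, the echo is a single specular reflection, and the function and parameter names are hypothetical. The echo ray is mirrored across the wall so that it starts at the array's mirror image, and the source lies where it meets the direct-path ray.

```python
import math


def locate_source(theta_direct, theta_echo, wall_normal_angle, wall_distance):
    """Return the (x, y) source position from two AoAs and a known wall.

    Angles are in radians. The wall is the line n . p = wall_distance, where
    n is the unit normal pointing from the array (at the origin) to the wall.
    """
    nx, ny = math.cos(wall_normal_angle), math.sin(wall_normal_angle)
    u1x, u1y = math.cos(theta_direct), math.sin(theta_direct)
    u2x, u2y = math.cos(theta_echo), math.sin(theta_echo)

    # Mirror the echo direction across the wall.
    dot = nx * u2x + ny * u2y
    r2x, r2y = u2x - 2.0 * dot * nx, u2y - 2.0 * dot * ny

    # Mirror image of the array across the wall.
    ox, oy = 2.0 * wall_distance * nx, 2.0 * wall_distance * ny

    # Solve t1 * u1 - t2 * r2 = o for t1 using Cramer's rule.
    det = r2x * u1y - u1x * r2y
    if abs(det) < 1e-12:
        raise ValueError("direct and mirrored echo rays are parallel")
    t1 = (r2x * oy - ox * r2y) / det
    return (t1 * u1x, t1 * u1y)
```

As a sanity check, a wall along x = 2 and a source at (1, 1) give a direct AoA of atan2(1, 1) and an echo AoA of atan2(1, 3), since the echo appears to come from the mirror image at (3, 1). Feeding those angles back in recovers the source position.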
Implementation and Evaluation
Seeed Studio 6-Mic Circular Mic Array + Raspberry Pi (to obtain raw acoustic samples)
[Figure: hardware photo labeled "6 Microphones" and "Raspberry Pi"]
Comparison with existing algorithms
VoLoc can improve the AoA estimates of at least 2 echoes
Overall location accuracy
Median location error: 0.44 meters
VoLoc Summary
• Iterative align and cancel (IAC) algorithm
• Indoor user localization from voice signals
• Single microphone array (Alexa) as the receiver
• Reverse triangulation with few AoAs
• Median error < 50 cm
Much more in the paper
Shen, Daguan, Zhijian, Yu-Lin, Romit