Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer
Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer, NDSS 2020
Motion sensor based speech recognition (state of the art). Setup: a loudspeaker and a smartphone with a built-in motion sensor, coupled through a table or other solid medium. Limitations: 1. Low recognition accuracy (26%) 2. Low robustness (noise)
Motion sensor based speech recognition (state of the art) • Sampling rate of accelerometer and gyroscope <= 200 Hz • Fundamental frequency of adult speech: 85-255 Hz • Capture ceiling (by the Nyquist sampling theorem): 200 / 2 = 100 Hz, so the capture band overlaps the speech band only between 85 and 100 Hz: TOO NARROW! [Figure: frequency axis (Hz) showing the speech band (85-255 Hz), the capture ceiling (100 Hz) and the sampling ceiling (200 Hz)]
Intuition and Discovery 1. Utilize the motion sensor on the same smartphone as the speaker: use the smartphone itself as the transmission medium between speaker and motion sensor.
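Intuition and Discovery 2. The sampling rate limitation no longer exists. Sampling ceiling: 500 Hz. Capture ceiling (by the Nyquist sampling theorem): 500 / 2 = 250 Hz, so the capture band covers almost all of the 85-255 Hz speech band. [Figure: frequency axis (Hz) showing the speech band (85-255 Hz), the capture ceiling (250 Hz) and the sampling ceiling (500 Hz)]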
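The Nyquist arithmetic on these slides can be checked with a short sketch. The function names are illustrative and not from the paper; the 85-255 Hz band and the 200 Hz and 500 Hz sampling rates are the figures quoted on the slides.

```python
def nyquist_ceiling(sampling_rate_hz):
    """Highest frequency that can be captured without aliasing."""
    return sampling_rate_hz / 2.0


def speech_band_coverage(sampling_rate_hz, band_low_hz=85.0, band_high_hz=255.0):
    """Fraction of the speech band that lies below the capture ceiling."""
    ceiling = nyquist_ceiling(sampling_rate_hz)
    covered = max(0.0, min(ceiling, band_high_hz) - band_low_hz)
    return covered / (band_high_hz - band_low_hz)


for rate in (200, 500):
    print(rate, nyquist_ceiling(rate), round(speech_band_coverage(rate), 3))
# prints:
# 200 100.0 0.088
# 500 250.0 0.971
```

At 200 Hz less than a tenth of the band is captured; at 500 Hz about 97% is.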
Contribution and Workflow 1. Propose AccelEve, an accelerometer-based side-channel attack against smartphone speakers. 2. Report, for the first time, the important observation that accelerometers on recent smartphones cover almost the entire fundamental frequency band of adult speech. 3. Design a deep learning-based system to recognize and reconstruct speech signals from accelerometer measurements alone.
Accelerometer and Gyroscope. Accelerometer: captures accelerations. Gyroscope: captures angular rates.
Accelerometer and Gyroscope Response: the accelerometer responds BETTER!
Core idea Exploit the accelerometer on a smartphone as a zero-permission “microphone” to eavesdrop on the speaker of the same device
Significance
Significance. P: summed squared magnitude of a signal. S and N: two acceleration signals recorded with and without the presence of speech. Significance threshold: 3 dB. [Figure: dominant axis]
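A minimal sketch of how such a significance check could be computed. The slide names P, S, N and the 3 dB threshold but does not show the formula, so the usual power ratio in decibels, 10 * log10(P(S) / P(N)), is assumed here. The function names and the per-axis dictionary layout are hypothetical.

```python
import math


def summed_square_magnitude(signal):
    """P: sum of squared sample values."""
    return sum(x * x for x in signal)


def snr_db(with_speech, without_speech):
    """Assumed formula: 10 * log10(P(S) / P(N)).

    Both recordings should have the same length, since P is a sum, not a mean.
    """
    return 10.0 * math.log10(
        summed_square_magnitude(with_speech) / summed_square_magnitude(without_speech)
    )


def dominant_axis(with_speech_xyz, without_speech_xyz, threshold_db=3.0):
    """Return (axis, snr) for the axis with the highest SNR, or None if it is below the threshold."""
    scores = {
        axis: snr_db(with_speech_xyz[axis], without_speech_xyz[axis])
        for axis in with_speech_xyz
    }
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] >= threshold_db else None
```

With synthetic recordings in which only one axis responds to speech, `dominant_axis` returns that axis; if no axis reaches 3 dB it returns `None`.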
Robustness 1. Hardware distortion 2. Acoustic noise 3. Human activities 4. Self-noise and vibration
Hardware distortion: can only affect the DC component. Eliminate the DC component to address the problem.
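One simple way to eliminate a DC component is to subtract the mean of the recording. The slide does not say how the offset is removed, so this is an illustrative assumption and the function name is hypothetical.

```python
def remove_dc(samples):
    """Subtract the mean so the signal is centred on zero."""
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]


# Example: a reading dominated by a constant offset of about 9.8
centred = remove_dc([9.9, 9.7, 9.8])
```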
Acoustic noise: 1. Victim smartphone noise 2. Resonant noise 3. Remote caller noise. ALL TOO SMALL to have an impact!
Human activities / Self-noise and surface vibration: addressed with a high-pass filter
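A minimal sketch of a high-pass filter, assuming a first-order (RC-style) design. The slides do not specify the filter order or the cutoff, so the 20 Hz cutoff in the usage note is purely illustrative.

```python
import math


def high_pass(samples, sampling_rate_hz, cutoff_hz):
    """First-order high-pass filter: attenuates content well below cutoff_hz."""
    if not samples:
        return []
    dt = 1.0 / sampling_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    out = [0.0]  # starting at zero suppresses a constant offset immediately
    for i in range(1, len(samples)):
        # Each output follows the change in the input, decayed by alpha.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out
```

For example, `high_pass(signal, 500, 20)` keeps a 100 Hz tone largely intact while reducing a 2 Hz component, such as slow body movement, to roughly a tenth of its amplitude.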
System pipeline: Raw acceleration signals -> Linear interpolation -> High-pass filtering -> Segmentation -> Signal-to-spectrogram conversion -> Spectrogram-images. Linear interpolation: address the non-uniform sampling rate. High-pass filtering: eliminate significant distortion. Segmentation: separate each word or letter. Signal-to-spectrogram conversion: convert each segment into a spectrogram. Spectrogram-images: convert the spectrogram into an RGB picture.
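The linear interpolation step can be sketched as resampling irregularly timestamped readings onto a uniform time grid. This is an illustration under stated assumptions rather than the authors' code: timestamps are in seconds and strictly increasing, and the function name and parameters are hypothetical.

```python
def resample_uniform(timestamps, values, target_rate_hz):
    """Linearly interpolate irregular samples onto a uniform time grid."""
    if len(timestamps) != len(values) or len(timestamps) < 2:
        raise ValueError("need at least two (timestamp, value) pairs")
    duration = timestamps[-1] - timestamps[0]
    # The small epsilon guards against floating-point rounding of the count.
    count = int(duration * target_rate_hz + 1e-9) + 1
    out = []
    j = 0  # index of the left neighbour of the current grid point
    for k in range(count):
        t = timestamps[0] + k / target_rate_hz
        while j < len(timestamps) - 2 and timestamps[j + 1] < t:
            j += 1
        t0, t1 = timestamps[j], timestamps[j + 1]
        weight = (t - t0) / (t1 - t0)
        out.append(values[j] + weight * (values[j + 1] - values[j]))
    return out
```

For instance, readings taken at 0.0 s, 0.5 s and 2.0 s can be resampled at 2 Hz to give values at 0.0, 0.5, 1.0, 1.5 and 2.0 s.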
Spectrogram
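A spectrogram is the magnitude of a short-time Fourier transform: the signal is cut into overlapping frames, each frame is windowed, and its spectrum is computed. The sketch below uses a direct DFT and a Hann window for clarity. The frame length and hop are illustrative assumptions, not values from the slides.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=16):
    """Return a list of frames; each frame lists magnitudes for bins 0..frame_len/2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of one bin; an FFT would be used in practice.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames
```

Bin k corresponds to frequency k * sampling_rate / frame_len, so with a 500 Hz sampling rate and 64-sample frames a 125 Hz tone peaks in bin 16.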
Recognition: DenseNet. Loss: cross entropy. Optimizer: momentum with scheduler.
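The training ingredients named on this slide can be illustrated without a deep learning framework. The sketch shows a cross-entropy loss on raw class scores, an SGD-with-momentum update and a learning-rate schedule. The slide does not give the scheduler type or any hyperparameters, so the step decay and all default values here are assumptions, and the function names are hypothetical.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy loss of softmax(logits) against one true class."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]


def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Assumed scheduler: multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def momentum_step(params, grads, velocity, lr, momentum=0.9):
    """One in-place SGD-with-momentum update of params and velocity."""
    for i in range(len(params)):
        velocity[i] = momentum * velocity[i] + grads[i]
        params[i] -= lr * velocity[i]
```

In a training loop, `step_lr` would be evaluated once per epoch and its result passed as `lr` to `momentum_step` for each batch.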
Reconstruction: encoder, residual blocks, decoder. Maps an acceleration spectrogram-image to a speech spectrogram. Loss: L1. Optimizer: momentum with scheduler.
Reconstruction: Speech spectrogram -> Griffin-Lim algorithm -> Speech signal. Griffin-Lim algorithm: an iterative algorithm for signal estimation from spectrograms.
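The Griffin-Lim iteration can be sketched in a few lines: keep the known magnitudes, guess the phases, invert to a signal, re-analyse that signal, and adopt its phases for the next round. This is a generic textbook version, not the authors' implementation. The frame size, hop, iteration count and random initial phase are illustrative assumptions, and a direct DFT is used instead of an FFT to keep the code self-contained.

```python
import cmath
import math
import random


def hann(n_fft):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def stft(signal, n_fft=16, hop=4):
    """Complex short-time spectrum (all n_fft bins) of a real signal."""
    win = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * win[n] for n in range(n_fft)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)
        ])
    return frames


def istft(frames, n_fft=16, hop=4):
    """Least-squares inverse: windowed overlap-add divided by the summed squared window."""
    win = hann(n_fft)
    length = (len(frames) - 1) * hop + n_fft
    out, norm = [0.0] * length, [0.0] * length
    for f, spec in enumerate(frames):
        for n in range(n_fft):
            # Inverse DFT of one sample; only the real part is kept.
            val = sum(
                spec[k] * cmath.exp(2j * math.pi * k * n / n_fft) for k in range(n_fft)
            ).real / n_fft
            out[f * hop + n] += val * win[n]
            norm[f * hop + n] += win[n] ** 2
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, norm)]


def griffin_lim(magnitude, n_iter=30, n_fft=16, hop=4, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = random.Random(seed)
    phase = [[rng.uniform(-math.pi, math.pi) for _ in row] for row in magnitude]
    signal = []
    for _ in range(n_iter):
        # Combine the target magnitudes with the current phase estimate.
        spec = [
            [m * cmath.exp(1j * p) for m, p in zip(mag_row, phase_row)]
            for mag_row, phase_row in zip(magnitude, phase)
        ]
        signal = istft(spec, n_fft, hop)
        # Re-analyse the estimate and keep only its phase.
        phase = [[cmath.phase(c) for c in row] for row in stft(signal, n_fft, hop)]
    return signal
```

Here `magnitude` is expected in the layout produced by taking `abs` of every entry of `stft(...)`. More iterations bring the magnitude of the estimate's spectrum closer to the target, although the recovered waveform is only determined up to phase.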
Experiment results: different recognition tasks; different user states
Experiment results: noisy environments; device scalability
Experiment results: reconstruction; hot-word
Defense 1. Use a lower sampling rate 2. Stricter permission control
Summary and comments. Pros: 1. An attractive real-world story 2. Comprehensive work: Android, signals and systems, image processing, deep learning, etc. 3. Sufficient experiments and results. Cons: 1. Weak defense 2. App implementation
Thank you!