Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer

Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer (NDSS 2020)

Motion sensor based speech recognition (State-of-the-art)
[Figure: loudspeaker, table or solid medium, smartphone with built-in motion sensor]
1. Low recognition accuracy (26%)
2. Low robustness (noise)

Motion sensor based speech recognition (State-of-the-art)
• Sampling rate of accelerometer and gyroscope: <= 200 Hz
• Frequency of adult speech: 85-255 Hz
• Capture ceiling (Nyquist sampling theorem): 200 / 2 = 100 Hz
[Figure: frequency axis (Hz) showing the speech band from 85 to 255 Hz, the capture ceiling at 100 Hz, and the sampling ceiling at 200 Hz]
The capture band is TOO NARROW!

Intuition and Discovery
1. Utilize the motion sensor on the same smartphone as the speaker
[Figure: motion sensor and speaker in one smartphone]
Use the smartphone itself as the transmission medium

Intuition and Discovery
2. The sampling rate limitation no longer exists
• Capture ceiling (Nyquist sampling theorem): 500 / 2 = 250 Hz
[Figure: frequency axis (Hz) showing the speech band from 85 to 255 Hz, the capture ceiling at 250 Hz, and the sampling ceiling at 500 Hz]
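The Nyquist arithmetic on the two slides above can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and the band edges (85-255 Hz) and sampling rates (200 Hz, 500 Hz) are the figures quoted on the slides.

```python
SPEECH_BAND_HZ = (85.0, 255.0)  # adult speech band quoted on the slides


def nyquist_ceiling(sampling_rate_hz):
    """Highest frequency that can be captured without aliasing."""
    return sampling_rate_hz / 2.0


def speech_band_coverage(sampling_rate_hz, band=SPEECH_BAND_HZ):
    """Fraction of the speech band that lies below the capture ceiling."""
    low, high = band
    ceiling = nyquist_ceiling(sampling_rate_hz)
    covered = max(0.0, min(ceiling, high) - low)
    return covered / (high - low)


for rate in (200.0, 500.0):
    print(f"{rate:.0f} Hz sampling: ceiling {nyquist_ceiling(rate):.0f} Hz, "
          f"coverage {speech_band_coverage(rate):.1%}")
```

At 200 Hz only the 85-100 Hz slice of the band is captured (15 of 170 Hz); at 500 Hz the captured slice is 85-250 Hz (165 of 170 Hz).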

Contribution and Workflow
1. Propose AccelEve, an accelerometer-based side-channel attack against smartphone speakers.
2. First to report an important observation: accelerometers on recent smartphones cover almost the entire fundamental frequency band of adult speech.
3. Design a deep learning-based system that recognizes and reconstructs speech signals from accelerometer measurements alone.

Accelerometer and Gyroscope
• Accelerometer: captures accelerations
• Gyroscope: captures angular rates

Accelerometer and Gyroscope Response
The accelerometer responds BETTER!

Core idea
Exploit the accelerometer on a smartphone as a zero-permission “microphone” to eavesdrop on the speaker of the same device

Significance

Significance
• P: summed squared magnitude
• S and N: two acceleration signals recorded with and without speech present
• Significance threshold: 3 dB
[Figure: dominant axis]
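A minimal sketch of how the significance test could be computed per axis. The slide text does not give the exact formula, so the conventional power ratio in decibels, 10 * log10(P(S) / P(N)), is assumed here; the function names and the toy data are hypothetical.

```python
import math

THRESHOLD_DB = 3.0  # significance threshold from the slide


def power(signal):
    """P: summed squared magnitude of the samples."""
    return sum(x * x for x in signal)


def snr_db(with_speech, without_speech):
    """Assumed form of the ratio: 10 * log10(P(S) / P(N))."""
    return 10.0 * math.log10(power(with_speech) / power(without_speech))


def axis_significance(s_axes, n_axes):
    """Return per-axis SNR, the dominant axis, and the axes above threshold."""
    snrs = {axis: snr_db(s_axes[axis], n_axes[axis]) for axis in s_axes}
    dominant = max(snrs, key=snrs.get)
    significant = [axis for axis, value in snrs.items() if value >= THRESHOLD_DB]
    return snrs, dominant, significant


# Toy recordings (made-up numbers): N without speech, S with speech.
noise = [0.01, -0.01, 0.01, -0.01]
n_axes = {"x": noise, "y": noise, "z": noise}
s_axes = {
    "x": [0.012, -0.012, 0.012, -0.012],
    "y": [0.02, -0.02, 0.02, -0.02],
    "z": [0.05, -0.05, 0.05, -0.05],
}
snrs, dominant, significant = axis_significance(s_axes, n_axes)
print(snrs, dominant, significant)
```

In this toy data the z axis has the largest ratio and is therefore the dominant axis, while the x axis stays below the 3 dB threshold.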

Robustness
1. Hardware distortion
2. Acoustic noise
3. Human activities
4. Self-noise and vibration

Hardware distortion
• Can only affect the DC component
• Eliminate the DC component to address the problem

Acoustic noise
1. Victim smartphone noise
2. Resonant noise
3. Remote caller noise
ALL TOO SMALL to have an impact!

Human activities / Self-noise and surface vibration
• Solution: high-pass filter
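As an illustration of the high-pass step, the sketch below uses a first-order RC-style filter written in plain Python. The slides do not specify the filter design or cutoff, so both are assumptions: the first-order filter is a simple stand-in, and the 80 Hz cutoff used in the example is a hypothetical value chosen just below the 85 Hz edge of the speech band. A filter of this kind also removes the DC component.

```python
import math


def high_pass(samples, sampling_rate_hz, cutoff_hz):
    """First-order high-pass filter (discretized RC circuit)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sampling_rate_hz
    alpha = rc / (rc + dt)
    out = [0.0] * len(samples)
    for i in range(1, len(samples)):
        # Only changes between consecutive samples pass through,
        # so constant offsets and slow drifts are suppressed.
        out[i] = alpha * (out[i - 1] + samples[i] - samples[i - 1])
    return out


def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))


RATE_HZ = 500.0    # accelerometer rate from the slides
CUTOFF_HZ = 80.0   # hypothetical cutoff, not taken from the slides

# Slow 2 Hz motion (e.g. walking) versus a 200 Hz vibration.
slow = [math.sin(2 * math.pi * 2 * n / RATE_HZ) for n in range(500)]
fast = [math.sin(2 * math.pi * 200 * n / RATE_HZ) for n in range(500)]
print("slow motion kept:", rms(high_pass(slow, RATE_HZ, CUTOFF_HZ)) / rms(slow))
print("vibration kept:  ", rms(high_pass(fast, RATE_HZ, CUTOFF_HZ)) / rms(fast))
```

A steeper filter (for example a higher-order Butterworth design) would separate the two bands more sharply; the first-order version is used here only to keep the example dependency-free.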

System pipeline
Raw acceleration signals → Linear interpolation → High-pass filtering → Segmentation → Signal-to-spectrogram conversion → Spectrogram-Images
• Linear interpolation: compensate for the uneven sampling rate
• High-pass filtering: eliminate significant distortion
• Segmentation: separate each word or letter
• Signal-to-spectrogram conversion: convert into a spectrogram
• Spectrogram-Images: convert into an RGB picture
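The linear interpolation step can be sketched as resampling irregularly timed readings onto a uniform time grid. The function name, the timestamp format (seconds), and the target rate in the example are assumptions for illustration only.

```python
def resample_uniform(timestamps, values, rate_hz):
    """Linearly interpolate irregularly timed samples onto a uniform grid.

    Assumes at least two samples and strictly increasing timestamps (seconds).
    """
    t0, t_end = timestamps[0], timestamps[-1]
    count = int((t_end - t0) * rate_hz + 1e-9) + 1
    out = []
    j = 0  # index of the left neighbour of the current grid point
    for k in range(count):
        t = t0 + k / rate_hz
        while j + 2 < len(timestamps) and timestamps[j + 1] < t:
            j += 1
        t_left, t_right = timestamps[j], timestamps[j + 1]
        weight = (t - t_left) / (t_right - t_left)
        out.append(values[j] + weight * (values[j + 1] - values[j]))
    return out


# Readings that arrive at uneven intervals (made-up values).
times = [0.0, 0.001, 0.004, 0.006]
readings = [0.0, 1.0, 4.0, 6.0]
print(resample_uniform(times, readings, rate_hz=500.0))
```

After this step every sample is spaced exactly 1 / rate_hz apart, which the later filtering and spectrogram steps rely on.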

Raw acceleration signals → Linear interpolation → High-pass filtering → Segmentation → Signal-to-spectrogram conversion → Spectrogram-Images

Spectrogram
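A rough sketch of the signal-to-spectrogram and spectrogram-image steps, using a naive short-time Fourier transform in plain Python. The frame length, hop size, Hann window, and the mapping of the three acceleration axes to the R, G, and B channels are all assumptions made for illustration; the slides only state that the signal is converted into a spectrogram and then into an RGB picture.

```python
import cmath
import math


def stft_magnitude(samples, frame_len=64, hop=32):
    """Magnitude spectrogram: result[t][k] for time frame t and frequency bin k."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def to_uint8(spec):
    """Scale a spectrogram to integer intensities in 0..255."""
    peak = max(max(row) for row in spec) or 1.0
    return [[round(255 * v / peak) for v in row] for row in spec]


def rgb_image(x, y, z):
    """Stack per-axis spectrograms as (R, G, B) pixels (assumed mapping)."""
    channels = [to_uint8(stft_magnitude(axis)) for axis in (x, y, z)]
    return [list(zip(*rows)) for rows in zip(*channels)]


# A 125 Hz tone sampled at 500 Hz, reused for all three axes as toy input.
tone = [math.sin(2 * math.pi * 125 * n / 500) for n in range(256)]
image = rgb_image(tone, tone, tone)
print(len(image), "time frames x", len(image[0]), "frequency bins")
```

With a 64-sample frame at 500 Hz each bin spans about 7.8 Hz, so the 125 Hz tone lands in bin 16.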

Recognition
• Model: DenseNet
• Loss: cross entropy
• Optimizer: momentum with scheduler
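To make the training ingredients concrete, here is a small dependency-free sketch of a cross-entropy loss, a momentum update, and a learning-rate schedule. The slides do not state the scheduler type or any hyperparameters, so the step-decay schedule and all numeric values below are assumptions; in practice a deep learning framework would supply these pieces.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy loss for one sample, computed from raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Assumed scheduler: multiply the rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def momentum_step(params, grads, velocity, lr, momentum=0.9):
    """In-place SGD-with-momentum update of params and velocity."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] + g
        params[i] -= lr * velocity[i]


params, velocity = [1.0], [0.0]
for epoch in range(2):
    momentum_step(params, [2.0], velocity, lr=step_lr(0.1, epoch))
print(params, velocity, cross_entropy([0.0, 0.0], target=0))
```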

Reconstruction
• Model: encoder, residual blocks, decoder
• Task: acceleration spectrogram-image to speech spectrogram
• Loss: L1
• Optimizer: momentum with scheduler

Reconstruction
Speech spectrogram → Griffin-Lim algorithm → Speech signal
• Griffin-Lim algorithm: an iterative algorithm for signal estimation from spectrograms
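The idea behind Griffin-Lim can be sketched in plain Python: start from random phases, then alternate between synthesizing a signal from the target magnitudes and re-analyzing it to obtain a more consistent phase. This is a toy, unoptimized version under stated assumptions (naive DFT, Hann window, hypothetical frame and hop sizes, fixed random seed); it is not the presenters' implementation, and a real system would use an FFT-based library routine.

```python
import cmath
import math
import random


def hann(n_fft):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def stft(x, n_fft, hop):
    """Complex STFT: one full DFT per windowed frame."""
    w = hann(n_fft)
    frames = []
    for s in range(0, len(x) - n_fft + 1, hop):
        seg = [x[s + n] * w[n] for n in range(n_fft)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                           for n in range(n_fft)) for k in range(n_fft)])
    return frames


def istft(frames, n_fft, hop, length):
    """Least-squares inverse STFT via windowed overlap-add."""
    w = hann(n_fft)
    out, norm = [0.0] * length, [0.0] * length
    for t, spec in enumerate(frames):
        for n in range(n_fft):
            val = sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                      for k in range(n_fft)).real / n_fft
            out[t * hop + n] += val * w[n]
            norm[t * hop + n] += w[n] ** 2
    return [o / d if d > 1e-12 else 0.0 for o, d in zip(out, norm)]


def griffin_lim(magnitude, n_fft, hop, length, iterations=30, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = random.Random(seed)
    phase = [[rng.uniform(-math.pi, math.pi) for _ in row] for row in magnitude]
    signal = [0.0] * length
    for _ in range(iterations):
        spec = [[m * cmath.exp(1j * p) for m, p in zip(m_row, p_row)]
                for m_row, p_row in zip(magnitude, phase)]
        signal = istft(spec, n_fft, hop, length)
        # Keep the target magnitudes, adopt the phase of the re-analyzed signal.
        phase = [[cmath.phase(c) for c in row] for row in stft(signal, n_fft, hop)]
    return signal


# Toy target: keep only the magnitudes of a short sine wave, then reconstruct.
target = [math.sin(2 * math.pi * n / 8) for n in range(64)]
magnitude = [[abs(c) for c in row] for row in stft(target, 16, 4)]
estimate = griffin_lim(magnitude, 16, 4, len(target))
print(len(estimate), "samples reconstructed")
```

Because only magnitudes are kept, the recovered waveform can differ from the original in phase; what the iterations improve is the agreement between the estimate's spectrogram and the target spectrogram.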

Experiment results
• Different recognition tasks
• Different user states

Experiment results
• Noisy environments
• Device scalability

Experiment results
• Reconstruction
• Hot-word

Defense
1. Use a lower sampling rate
2. Stricter permission control

Summary and comments
Pros:
1. An attractive real-world story
2. Comprehensive work: Android, signals and systems, image processing, deep learning, etc.
3. Sufficient experiments and results
Cons:
1. Weak defense
2. App implementation

Thank you!