Free(ing) Speech Common Voice and Deep Speech

Why? ● Common Voice ● Deep Speech

Why?

How many languages have production-quality, open speech recognition models?

Just one, English.

Why?

Despite the existence of various open STT engines, ZERO LANGUAGES have the 10k hours of open data needed for a production-quality model.

LibriSpeech: The largest open English corpus is LibriSpeech, which is about 1k hours.

Formosa Grand Challenge Corpus: The largest open Mandarin corpus is the Formosa Grand Challenge corpus, which is about 400 hours.

For Celtic languages... the availability of data sets of any size, free or at cost, drops off significantly.
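
To make the size of the gap concrete, here is a minimal sketch of the arithmetic using only the figures quoted on the slides. The hour counts are the approximate values given there ("about 1k" and "about 400"), and the names `TARGET_HOURS`, `corpora_hours`, and `coverage` are illustrative, not part of any Common Voice or Deep Speech tooling.

```python
# Approximate figures taken from the slides; treat them as round numbers.
TARGET_HOURS = 10_000  # open data said to be needed for a production-quality model

corpora_hours = {
    "LibriSpeech (English)": 1_000,
    "Formosa Grand Challenge (Mandarin)": 400,
}


def coverage(hours, target=TARGET_HOURS):
    """Return the fraction of the target that a corpus of `hours` covers."""
    return hours / target


for name, hours in corpora_hours.items():
    shortfall = TARGET_HOURS - hours
    print(f"{name}: {hours} h, {coverage(hours):.0%} of target, {shortfall} h short")
```

Even the largest open corpora named here cover only a small fraction of the stated target, which is the gap Common Voice is meant to close.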

Common Voice

Collect

Validate

Distribute

And there’s more...

And there’s still more...

Deep Speech

Simple

A design principle we decided upon from the beginning is that the engine should work for ALL LANGUAGES, the only requirement being that one has training data.

Perfection is achieved, not when there is nothing more to add, but when there is NOTHING LEFT TO TAKE AWAY. - Antoine de Saint-Exupéry

More controversially, we decided that training a new language should not require LINGUISTIC KNOWLEDGE

Deep Speech Architecture: Softmax Layer ● Feedforward Layer ● Recurrent Layer ● Feedforward Layers ● Input Features
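
The layer stack on this slide can be sketched as a toy forward pass. This is not the Deep Speech implementation: it assumes the layers are read from input features up to the softmax, uses a plain tanh recurrence in place of whatever recurrent cell the real engine uses, picks two input feedforward layers arbitrarily, and runs on random weights. Every name here (`make_params`, `forward`, the layer sizes) is hypothetical and only illustrates the order of the stages.

```python
import math
import random


def dense(x, w, b):
    """Affine layer: one output per row of the weight matrix w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init(rng, n_out, n_in):
    """Random weights and zero biases for an n_in -> n_out layer."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def make_params(n_features, n_hidden, n_symbols, seed=0):
    rng = random.Random(seed)
    rec_w, rec_b = init(rng, n_hidden, n_hidden)
    rec_u, _ = init(rng, n_hidden, n_hidden)
    return {
        # "Feedforward Layers": the count of two is an assumption
        "ff_in": [init(rng, n_hidden, n_features), init(rng, n_hidden, n_hidden)],
        "rec_w": rec_w, "rec_u": rec_u, "rec_b": rec_b,  # "Recurrent Layer"
        "ff_out": init(rng, n_hidden, n_hidden),         # "Feedforward Layer"
        "out": init(rng, n_symbols, n_hidden),           # feeds the "Softmax Layer"
    }


def forward(frames, params):
    """Map a sequence of feature frames to per-frame symbol probabilities."""
    n_hidden = len(params["rec_b"])
    h = [0.0] * n_hidden  # recurrent state carried across frames
    outputs = []
    for x in frames:
        for w, b in params["ff_in"]:
            x = relu(dense(x, w, b))
        wx = dense(x, params["rec_w"], params["rec_b"])
        uh = dense(h, params["rec_u"], [0.0] * n_hidden)
        h = [math.tanh(a + c) for a, c in zip(wx, uh)]
        y = relu(dense(h, *params["ff_out"]))
        outputs.append(softmax(dense(y, *params["out"])))
    return outputs


params = make_params(n_features=4, n_hidden=5, n_symbols=3)
probs = forward([[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.5, 0.2]], params)
print(len(probs), "frames,", len(probs[0]), "symbol probabilities each")
```

Nothing in this stack refers to phonemes or any other language-specific unit: only the size of the output symbol set changes between languages, which fits the stated goal of not requiring linguistic knowledge.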

Open

Deep Speech source code is released under Mozilla Public License 2.0

Deep Speech models are released under Mozilla Public License 2.0

Ubiquitous

Currently we support nine different Programming Languages

At the same time we also support nine different Platforms

Why? ● Common Voice ● Deep Speech

Why? Question: How many languages have production-quality, open speech recognition models? Answer: Just one, English.

Common Voice ● Collect ● Validate ● Distribute

Deep Speech ● Simple ● Open ● Ubiquitous

Free(ing) Speech
