GritTec Biometric Technologies: Voice Transcription and Phoneme Recognition

Voice Transcription

Overview

The Automatic Voice Transcription System produces a phonetic transcription of speech from an unknown speaker and identifies the language being spoken.

The system consists of two modules:

A speaker-independent phoneme recognition system (40 English phonemes);
A language identification system.

The phoneme recognition module is based on triphone Hidden Markov Models (HMMs) with continuous-density phoneme models.
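A continuous-density HMM scores a sequence of acoustic feature vectors by summing over all state paths with the forward algorithm, modelling each state's emissions as a Gaussian density rather than a discrete codebook. The sketch below is a minimal illustration, not GritTec's implementation: it assumes one diagonal-covariance Gaussian per state (production recognizers typically use Gaussian mixtures), hand-picked parameters, and hypothetical function names. A triphone recognizer would hold one such model per context-dependent phoneme and pick the best-scoring sequence of models.

```python
import math


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, log_init, log_trans, means, variances):
    """log P(obs | HMM) via the forward algorithm, one Gaussian per state."""
    n = len(log_init)
    # alpha[j] = log P(observations so far, current state = j)
    alpha = [log_init[j] + log_gauss_diag(obs[0], means[j], variances[j])
             for j in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
            + log_gauss_diag(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


if __name__ == "__main__":
    # Hypothetical 3-state left-to-right phoneme model over 2-D features.
    log_init = [0.0, -math.inf, -math.inf]
    log_trans = [[safe_log(p) for p in row]
                 for row in [[0.6, 0.4, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]]
    means = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    variances = [[1.0, 1.0]] * 3
    near = [[0.1, -0.1], [0.9, 1.1], [2.0, 1.9]]
    far = [[5.0, 5.0]] * 3
    print(forward_log_likelihood(near, log_init, log_trans, means, variances))
    print(forward_log_likelihood(far, log_init, log_trans, means, variances))
```

Working in the log domain avoids the numerical underflow that multiplying many small densities would cause on utterances of realistic length.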

The language identification algorithm is based on a double bigram model of language. The double bigram model tracks the probabilities of transitions between phonemes in the speech signal and compares them with each language matrix stored in the system database. A language matrix contains the transition probabilities between the phonemes of a given language.
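The page does not explain what makes the bigram model "double", so the following is only a sketch of a plain phoneme-bigram classifier under stated assumptions: each language matrix is estimated from phoneme transition counts with add-alpha smoothing, and a recognized phoneme string is assigned to the language whose matrix gives its transitions the highest log-probability. The smoothing, the scoring rule, the toy three-phoneme inventory and all function names are hypothetical.

```python
import math


def bigram_matrix(phonemes, inventory, alpha=1.0):
    """Estimate P(next | current) from a phoneme sequence with add-alpha smoothing."""
    counts = {a: {b: 0 for b in inventory} for a in inventory}
    for a, b in zip(phonemes, phonemes[1:]):
        counts[a][b] += 1
    matrix = {}
    for a in inventory:
        total = sum(counts[a].values()) + alpha * len(inventory)
        matrix[a] = {b: (counts[a][b] + alpha) / total for b in inventory}
    return matrix


def transition_log_prob(phonemes, matrix):
    """Log-probability of the observed phoneme transitions under one language matrix."""
    return sum(math.log(matrix[a][b]) for a, b in zip(phonemes, phonemes[1:]))


def identify_language(phonemes, language_matrices):
    """Return the language whose matrix best explains the observed transitions."""
    return max(language_matrices,
               key=lambda lang: transition_log_prob(phonemes, language_matrices[lang]))


if __name__ == "__main__":
    inventory = ["a", "b", "c"]  # toy phoneme inventory (hypothetical)
    matrices = {
        "lang_x": bigram_matrix(list("abcabcabc"), inventory),
        "lang_y": bigram_matrix(list("acbacbacb"), inventory),
    }
    print(identify_language(list("abcab"), matrices))  # lang_x
```

Smoothing keeps unseen transitions from receiving zero probability, which would otherwise make a single unexpected phoneme pair rule a language out entirely. Longer input gives more transitions to compare, which is consistent with the page's note that reliability depends on recording length.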

Typical applications include:

Automatic transcription of unknown voices in recordings of telephone conversations;
Security systems where the language of an unknown speaker must be identified and the speech phonetically transcribed;
High-security applications, for instance where access to digital information is restricted to a specified group of people.

The language identification system is currently trained for English. Training for German, French, Chinese, Japanese and Russian is planned.

Features

Operates at low SNR;
Fast adaptation to changing channel distortion and background noise;
Speaker-independent;
Phoneme recognition accuracy approaching 75% when trained on the TIMIT database (40 English phonemes);
Language identification reliability approaching 95% for speech recordings of at least 10 seconds;
Real-time processing;
Easy integration with target applications.
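Phoneme accuracy figures such as the TIMIT result above are commonly computed by aligning the recognized phoneme string against the reference transcription and counting substitutions (S), deletions (D) and insertions (I), giving Accuracy = (N - S - D - I) / N, where N is the number of reference phonemes. The page does not say how its figure was scored, so this sketch assumes that standard definition; the function names are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)      # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag = (c + miss, s + miss, d, ins)
            c, s, d, ins = dp[i - 1][j]
            up = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            left = (c + 1, s, d, ins + 1)
            dp[i][j] = min(diag, up, left)   # lowest cost wins
    _, s, d, ins = dp[n][m]
    return s, d, ins


def phoneme_accuracy(ref, hyp):
    """Accuracy = (N - S - D - I) / N; can be negative if there are many insertions."""
    s, d, ins = align_counts(ref, hyp)
    return (len(ref) - s - d - ins) / len(ref)


if __name__ == "__main__":
    reference = ["sh", "iy", "hh", "ae", "d"]
    recognized = ["sh", "ix", "hh", "ae", "d", "d"]
    print(phoneme_accuracy(reference, recognized))  # 0.6
```

Because S + D + I always equals the minimum edit cost, the accuracy value does not depend on which of several equally cheap alignments is chosen.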

Signal requirements

Signal format: 16-bit linear PCM;
Sampling rate: 8 kHz;
SNR: at least 10 dB;
Frequency range: 300-3400 Hz or wider.
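Before passing audio to the libraries, it may help to check it against these requirements. The sketch below assumes the audio arrives as WAV files and uses Python's standard `wave` module (which reads only uncompressed PCM and raises `wave.Error` for other encodings) to check sample width and sampling rate. It also estimates SNR as 10·log10(P_signal / P_noise), assuming a noise-only segment such as a pause is available for the noise power. The function names are hypothetical and not part of GritTec's API.

```python
import io
import math
import struct
import wave


def check_wav_format(path_or_file):
    """Return a list of mismatches with the 16-bit / 8 kHz signal requirements."""
    problems = []
    with wave.open(path_or_file, "rb") as w:
        if w.getsampwidth() != 2:
            problems.append(f"sample width is {8 * w.getsampwidth()} bits, expected 16")
        if w.getframerate() != 8000:
            problems.append(f"sampling rate is {w.getframerate()} Hz, expected 8000")
    return problems


def snr_db(signal, noise):
    """SNR in dB from speech samples and noise-only samples (mean-power ratio)."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    # Build a short in-memory mono WAV file at 8 kHz, 16 bits per sample.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(struct.pack("<4h", 0, 1000, -1000, 0))
    buf.seek(0)
    print(check_wav_format(buf))               # []
    print(snr_db([10.0, -10.0], [1.0, -1.0]))  # 20.0
```

An 8 kHz sampling rate limits the usable bandwidth to 4 kHz (the Nyquist frequency), which covers the 300-3400 Hz telephone band listed above.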

Availability

DLL libraries for MS Windows;
PC demo for MS Windows is available on request.

For more information, please contact us via Online Request Form.