Automatic text-independent speaker identification technology identifies the unknown voice in a speech signal by pairwise
comparison with the 'speaker cards' stored in the system database. The comparison counts 'true' and 'false' spots
(spots of correspondence) and then determines the probabilities of Acceptance and Rejection. Besides information about
the speaker (first name, last name, birthday, gender, and so on), each speaker card contains example audio files with
the speaker's voice.
Each example audio file is described by an acoustic voice model, an error model (FAR, FRR, EER) and a noise model that
describes the surrounding noise and channel distortion present in the audio file (see Fig.1). For a full description of
a speaker card, 1 - 3 audio files with the speaker's voice are sufficient, provided they are recorded on different
telephone lines and each lasts at least 60 sec.
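The speaker card described above can be pictured as a small data structure. The sketch below is only an illustration:
all type and field names (SpeakerCard, AudioExample, ErrorModel, hasUsableExamples) are assumptions and are not taken
from the SDK, and the model contents are reduced to placeholder vectors.

```cpp
#include <string>
#include <vector>

// Hypothetical layout: every type and field name here is an assumption,
// not part of the SDK described in the text.
struct ErrorModel {
    double falseAcceptRate = 0.0;  // FAR
    double falseRejectRate = 0.0;  // FRR
    double equalErrorRate = 0.0;   // EER
};

struct AudioExample {
    std::string filePath;             // where the recording is stored
    std::string telephoneLine;        // label of the line it was recorded on
    double durationSec = 0.0;         // length of the recording
    std::vector<double> voiceModel;   // placeholder for the acoustic voice model
    ErrorModel errorModel;            // FAR, FRR, EER for this example
    std::vector<double> noiseModel;   // placeholder for noise and channel description
};

struct SpeakerCard {
    std::string firstName, lastName, birthday, gender;
    std::vector<AudioExample> examples;
};

// Checks the guideline from the text: at least one example,
// and every example at least 60 seconds long.
bool hasUsableExamples(const SpeakerCard& card) {
    if (card.examples.empty()) return false;
    for (const AudioExample& example : card.examples) {
        if (example.durationSec < 60.0) return false;
    }
    return true;
}
```

The check does not verify that the examples come from different telephone lines; comparing the telephoneLine labels
would be one way to add that.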
Tone and music detectors were added to the algorithmic part of the speaker identification technology. The tone detector
detects DTMF, CPTD, UMTD and other similar signals. The music detector detects the musical accompaniment that is played
while telephone speakers wait for a connection.
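The text does not say which algorithm the tone detector uses. As an illustration of how DTMF tones can be detected in
general, the sketch below uses the well-known Goertzel algorithm to measure the power at the eight standard DTMF
frequencies. The function names (goertzelPower, detectDtmf) and the 0.25 power-share threshold are assumptions made for
this example only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Goertzel algorithm: power of one target frequency in a block of samples.
double goertzelPower(const std::vector<double>& samples, double targetHz, double sampleRateHz) {
    const double pi = 3.14159265358979323846;
    const double coeff = 2.0 * std::cos(2.0 * pi * targetHz / sampleRateHz);
    double s1 = 0.0, s2 = 0.0;
    for (double x : samples) {
        const double s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Returns the DTMF key for the strongest row/column tone pair,
// or '\0' when either tone carries too little of the block's energy.
char detectDtmf(const std::vector<double>& samples, double sampleRateHz) {
    static const double rowHz[4] = {697.0, 770.0, 852.0, 941.0};
    static const double colHz[4] = {1209.0, 1336.0, 1477.0, 1633.0};
    static const char keys[4][5] = {"123A", "456B", "789C", "*0#D"};

    double energy = 0.0;
    for (double x : samples) energy += x * x;
    if (energy <= 0.0) return '\0';

    int bestRow = 0, bestCol = 0;
    double rowPower = -1.0, colPower = -1.0;
    for (int i = 0; i < 4; ++i) {
        const double r = goertzelPower(samples, rowHz[i], sampleRateHz);
        const double c = goertzelPower(samples, colHz[i], sampleRateHz);
        if (r > rowPower) { rowPower = r; bestRow = i; }
        if (c > colPower) { colPower = c; bestCol = i; }
    }

    // For a clean block with two equal tones, each tone's Goertzel power is
    // close to half of energy * N / 2, so 0.25 leaves some margin (assumed value).
    const double reference = energy * static_cast<double>(samples.size()) / 2.0;
    const double minShare = 0.25;
    if (rowPower < minShare * reference || colPower < minShare * reference) return '\0';
    return keys[bestRow][bestCol];
}
```

A block of about 50 ms of telephone audio sampled at 8000 Hz (400 samples) is enough for the eight frequencies to be
told apart in this sketch. A production detector would add further checks, for example on tone duration and on the
level difference between the two tones.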
The technology for building statistical voice models and re-estimating them (with S-states) was updated in the speaker
card module. Comparative analysis has shown that the updated voice models greatly increase the number of "true" and
"false" spots found and increase the probability of determining Acceptance and Rejection.
The updated speaker identification technology was tested on real telephone recordings and on LDC96S61, a specialized
speech database of English telephone recordings provided by the LDC (Linguistic Data Consortium).
In the software code, the architecture of the identification program modules was renovated and optimized for use in
multi-threading mode. During the renovation, the architecture was structured according to the functionality of each
module. The new module architecture allows end developers to build client-server applications and an identification
server. In the identification server, each unknown speaker is identified in a separate thread, independently of the
others.
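The idea of identifying each unknown speaker in its own thread can be sketched with the standard C++ library. This is
not the SDK interface: the Features type, the similarity function (a negative squared distance used as a stand-in for
the real comparison) and the names identifyOne and identifyAll are all assumptions made for the example.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for a voice model: a plain feature vector.
using Features = std::vector<double>;

// Hypothetical comparison: higher is more similar (negative squared distance).
double similarity(const Features& a, const Features& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    double distance = 0.0;
    for (std::size_t i = 0; i < n; ++i) distance += (a[i] - b[i]) * (a[i] - b[i]);
    return -distance;
}

// Compares one unknown voice with every card; returns the index of the
// best-matching card, or -1 when there are no cards.
int identifyOne(const Features& unknown, const std::vector<Features>& cards) {
    int best = -1;
    double bestScore = 0.0;
    for (std::size_t i = 0; i < cards.size(); ++i) {
        const double score = similarity(unknown, cards[i]);
        if (best < 0 || score > bestScore) {
            best = static_cast<int>(i);
            bestScore = score;
        }
    }
    return best;
}

// Starts one asynchronous task per unknown voice and collects the results
// in request order. The cards are only read, so no locking is needed.
std::vector<int> identifyAll(const std::vector<Features>& unknowns,
                             const std::vector<Features>& cards) {
    std::vector<std::future<int>> jobs;
    jobs.reserve(unknowns.size());
    for (const Features& unknown : unknowns) {
        jobs.push_back(std::async(std::launch::async,
                                  [&unknown, &cards] { return identifyOne(unknown, cards); }));
    }
    std::vector<int> results;
    results.reserve(jobs.size());
    for (std::future<int>& job : jobs) results.push_back(job.get());
    return results;
}
```

One task per request keeps the sketch short. A server handling many simultaneous requests would more likely use a
fixed pool of worker threads.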
At present, the automatic speaker identification technology is available for the Intel platform as an SDK library with example MS VC++ projects.
Fig.1. Structure of speaker card
Glossary:
FRR - False Rejection Rate;
FAR - False Acceptance Rate;
EER - Equal Error Rate: the error rate at the operating point where FRR = FAR;
DTMF - Dual-Tone Multi-Frequency;
UMTD - Universal Multi Tone Detection;
CPTD - Call Progress Tone Detection.
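The error rates in the glossary can be illustrated with a short calculation. The sketch below is a generic estimate
and not the procedure used in the SDK: it assumes two lists of comparison scores (same-speaker and different-speaker
pairs), accepts a comparison when its score reaches a threshold, and takes the EER at the threshold where FAR and FRR
are closest. The names ErrorRates, ratesAt and estimateEer are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct ErrorRates {
    double falseAcceptRate;  // FAR: share of different-speaker scores that are accepted
    double falseRejectRate;  // FRR: share of same-speaker scores that are rejected
};

// A comparison is accepted when its score is at or above the threshold.
ErrorRates ratesAt(const std::vector<double>& sameSpeaker,
                   const std::vector<double>& differentSpeaker,
                   double threshold) {
    std::size_t falseAccepts = 0, falseRejects = 0;
    for (double s : differentSpeaker) if (s >= threshold) ++falseAccepts;
    for (double s : sameSpeaker) if (s < threshold) ++falseRejects;
    ErrorRates rates;
    rates.falseAcceptRate = differentSpeaker.empty()
        ? 0.0 : static_cast<double>(falseAccepts) / differentSpeaker.size();
    rates.falseRejectRate = sameSpeaker.empty()
        ? 0.0 : static_cast<double>(falseRejects) / sameSpeaker.size();
    return rates;
}

// Tries every observed score as a threshold and returns the mean of FAR and
// FRR at the threshold where the two rates are closest to each other.
double estimateEer(const std::vector<double>& sameSpeaker,
                   const std::vector<double>& differentSpeaker) {
    std::vector<double> thresholds(sameSpeaker);
    thresholds.insert(thresholds.end(), differentSpeaker.begin(), differentSpeaker.end());
    double bestGap = 2.0, eer = 1.0;
    for (double t : thresholds) {
        const ErrorRates r = ratesAt(sameSpeaker, differentSpeaker, t);
        const double gap = std::fabs(r.falseAcceptRate - r.falseRejectRate);
        if (gap < bestGap) {
            bestGap = gap;
            eer = (r.falseAcceptRate + r.falseRejectRate) / 2.0;
        }
    }
    return eer;
}
```

With few scores the two rates may never be exactly equal, which is why the sketch averages them at the closest point.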
About GritTec
GritTec Laboratory specializes in the research and development of algorithms and technologies in the field of
speech and audio processing. GritTec's research is focused on speech enhancement, speech concealment, voice
biometrics, speech recognition, speech synthesis and other speech and audio technologies.
Url: http://www.grittec.com