Automatically build a speaker dataset 

How can I build up a dataset for my speaker recognition? 

Radio Stations, Broadcasters, Media Archives

The Problem

Media content holds a great deal of visual information that is not systematically mapped or stored in a comprehensible way. The same is true of the audio level: information about who is speaking is lost and never appears in the metadata. How can DeepVA help me extract speakers from my material so that I can better structure my archive and make it searchable?

The Solution

The DeepVA Deep Collector can be used to automatically create speaker datasets. It does this by reading lower thirds, the on-screen captions that contain the speaker's name. If, for example, some footage shows Barack Obama speaking and his name appears underneath, the name is automatically linked to the corresponding audio and transferred into a separate speaker dataset.
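The Deep Collector's internals are not described on this page. As a rough illustration only, the linking step can be sketched in Python, assuming lower-third detections (a time interval plus the OCR'd name) and speech segments from the audio track are already available. The `LowerThird`, `SpeechSegment` and `build_speaker_dataset` names and the `min_overlap` threshold are hypothetical and not part of any DeepVA API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LowerThird:
    # Hypothetical record: a caption detected on screen and the name read from it.
    start: float  # seconds
    end: float
    name: str


@dataclass(frozen=True)
class SpeechSegment:
    # Hypothetical record: an interval of the audio track that contains speech.
    start: float  # seconds
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def build_speaker_dataset(lower_thirds, segments, min_overlap=1.0):
    """Map each name to the speech segments that overlap its lower third.

    Assumption: the person named in a lower third is the one speaking while
    it is on screen. Each segment is assigned to the lower third it overlaps
    most; segments overlapping by less than min_overlap seconds are ignored.
    """
    dataset = defaultdict(list)
    for seg in segments:
        best_name, best_overlap = None, 0.0
        for lt in lower_thirds:
            ov = overlap(seg.start, seg.end, lt.start, lt.end)
            if ov > best_overlap:
                best_name, best_overlap = lt.name, ov
        if best_name is not None and best_overlap >= min_overlap:
            dataset[best_name].append(seg)
    return dict(dataset)


if __name__ == "__main__":
    lower_thirds = [LowerThird(10.0, 16.0, "Barack Obama")]
    segments = [
        SpeechSegment(8.0, 20.0),   # overlaps the caption by 6 s: kept
        SpeechSegment(21.0, 30.0),  # no caption on screen: skipped
    ]
    print(build_speaker_dataset(lower_thirds, segments))
```

Assigning each segment to the caption it overlaps most keeps one segment from being filed under two names when captions change mid-sentence; a real system would also need to handle captions that name someone other than the current speaker.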

faster data acquisition

faster labelling

What results can be obtained?

The AI recognizes when a lower third is displayed.

It stores training data from the audio track in the database, together with the name and any additional information.

Automated creation of a unique speaker database

Continuous extension of the training data through automated additions

