use case: Build speaker dataset automatically
How can I build up a dataset for my speaker recognition?
the challenge
Creating speaker datasets
Content holds a great deal of visual information that can’t be systematically captured or stored traceably, and information about speakers in the audio track is lost because it never appears in the metadata. How can DeepVA support me in extracting speakers from my material in order to better structure my archive and make it searchable?
the solution
Deep Collector
The DeepVA Deep Collector can be used to create speaker datasets automatically. It does this by reading the lower thirds that display the speaker’s name. If, for example, footage shows Barack Obama speaking while his name appears in a lower third, the name and the audio are automatically linked and transferred into a separate speaker dataset. The system can be used on-premise or in the cloud. Data protection requirements usually play a major role in this decision and should be considered. If the system is to be part of a workflow, integration is required: via our RESTful API, it can be easily integrated into any existing system or workflow.
What results can be obtained?
The AI recognizes the display of a lower third
Training data from the audio track is stored together with the speaker's name and any additional information in a database
Automated creation of a unique speaker database
Constant expansion of the training data by automatic data addition
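The steps listed above can be illustrated with a minimal Python sketch. It is not the Deep Collector implementation or its API: the names LowerThird, SpeakerDataset, collect_speakers and min_duration are hypothetical, and the sketch assumes that the lower-third detections (name plus on-screen time span) are already available and that the named person is speaking while the lower third is shown.

```python
from dataclasses import dataclass, field


@dataclass
class LowerThird:
    """A detected lower third: the name it shows and when it is on screen (seconds)."""
    name: str
    start: float
    end: float


@dataclass
class SpeakerDataset:
    """Maps each speaker name to the audio segments collected for that speaker."""
    segments: dict[str, list[tuple[str, float, float]]] = field(default_factory=dict)

    def add(self, name: str, source: str, start: float, end: float) -> None:
        # Store a reference to the audio span (source file, start, end).
        self.segments.setdefault(name, []).append((source, start, end))


def collect_speakers(source, lower_thirds, dataset, min_duration=1.0):
    """Add one audio segment per lower third that is on screen long enough.

    Assumption: the person named in the lower third is the one speaking
    while it is displayed. min_duration is a hypothetical filter.
    """
    for lt in lower_thirds:
        name = " ".join(lt.name.split())  # collapse stray whitespace in the read-out text
        if not name or lt.end - lt.start < min_duration:
            continue  # skip unreadable names and very brief displays
        dataset.add(name, source, lt.start, lt.end)
    return dataset


# Usage with made-up detections from a single clip.
dataset = SpeakerDataset()
detections = [
    LowerThird("Barack  Obama", 12.0, 18.5),
    LowerThird("Barack Obama", 40.0, 44.0),
    LowerThird("", 50.0, 55.0),          # no readable name: skipped
    LowerThird("Jane Doe", 60.0, 60.4),  # shown too briefly: skipped
]
collect_speakers("news_clip.mp4", detections, dataset)
```

Each new clip processed this way appends further segments under the same name, which corresponds to the constant expansion of the training data described above.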
Faster data acquisition
Cost reduction
Faster labelling
Automatically build a speaker dataset
Function overview
Face Index
Face Index assigns a number to each person appearing in video and image material, so that these persons can be identified by name in the metadata afterwards.
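The idea of numbering persons first and naming them later can be sketched as follows. This is an illustration only, not how DeepVA implements Face Index: the class name, the use of face embeddings, the cosine-similarity comparison and the threshold value are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FaceIndex:
    """Hypothetical sketch: give every distinct face a number, attach names later."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed similarity cut-off for "same person"
        self.references = []        # references[i] is the first embedding seen for number i
        self.names = {}             # number -> name, filled in afterwards

    def assign(self, embedding):
        """Return the number of a matching known face, or register a new one."""
        for index, ref in enumerate(self.references):
            if cosine_similarity(embedding, ref) >= self.threshold:
                return index
        self.references.append(list(embedding))
        return len(self.references) - 1

    def label(self, index, name):
        """Attach a name to an already assigned number."""
        if not 0 <= index < len(self.references):
            raise IndexError(f"unknown face number: {index}")
        self.names[index] = name


# Usage with toy two-dimensional "embeddings".
face_index = FaceIndex()
first = face_index.assign([1.0, 0.0])
second = face_index.assign([0.0, 1.0])
again = face_index.assign([0.99, 0.05])  # close to the first face
face_index.label(first, "Barack Obama")
```

Because the number is assigned before any name is known, material can be indexed immediately and the names added to the metadata whenever they become available.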