Vidispine and DeepVA Cooperation

Partnership with Arvato Bertelsmann and VidiNet

AI is becoming increasingly prevalent in our everyday lives, yet less than 6% of companies in the media and entertainment industry are using AI in the form of computer vision, voice recognition, or natural language processing.

Often, the current AI offering does not deliver the added value that was hoped for. It is therefore important to recognize content in visual media such as images, videos, or livestreams that is not covered by a pre-trained AI but reflects the specific needs of the company. This requires constant adaptation of AI models to the conditions in media companies.

To train AI models without outside assistance, the necessary tools must be made available to the user intuitively and transparently within the system or application, so that companies and their users can adapt pre-trained AI models precisely and easily to their own requirements.

Pre-trained AI models

Pre-trained models, as we know them from large image recognition service providers, recognize only what the model was taught through its training data. Classification of general image content can be implemented quickly, but media- and company-specific use cases cannot be fully covered. Companies are looking for solutions that enable customization or individualization of AI models while ensuring high data security.

This means that people, objects, or landmarks in images and videos that are not part of a pre-trained model go unrecognized. The recognition performance of pre-trained models on media data is therefore very limited.

Individual AI models

Building custom AI models used to be time-consuming and very complex. Complex algorithms are implemented and fed with large training databases. This data has to be collected, managed in a structured way, kept up to date, and described precisely. Before AI models can be created from this data and used productively, their performance must be extensively validated with independent test data. For media companies, this challenge has so far seemed difficult to overcome. DeepVA’s mission is to make the potential of AI accessible to any company without requiring prior knowledge.

AI can be easily integrated into existing systems and used intuitively to optimize everyday workflows when working with media data. This includes managing data, building your own training data, and discovery analysis of your own content.

Partnership between Vidispine and DeepVA

DeepVA first got into conversation with Arvato Systems at FKTG’s AI panel at Hamburg Open 2020. A follow-up meeting turned out to be a stroke of luck in that both parties were fascinated by the idea of automating media workflows to the maximum through AI. Together, the Vidispine team and DeepVA posed the central question of how the right AI tools could be brought directly into the user’s familiar environment. Users should also be able to monitor and control the recognition of content in their media data, the creation of their own AI models, and the quality assurance of their training data.

DeepVA Integration in Vidispine

Advantages of integrating DeepVA into Vidispine’s MAM system:

  • Management and use of training data in the familiar MAM environment
  • Intuitive user interface with an integrated training application
  • Training classes can easily be organized into datasets and trained with one click via API call
  • The trained AI model is available for the analysis of images and videos within seconds
  • Analysis with custom models at the push of a button in the UI
  • Timecode-accurate navigation in the video material, giving the best overview of analyzed objects or faces
  • Face indexing: faces that are unknown to the AI model are automatically assigned an index (fingerprint); once described manually, they are subsequently described automatically throughout the entire system
  • Integrating AI training into the familiar MAM system is a crucial step toward improving the customer experience
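The one-click training flow above can be sketched in a few lines of Python. Everything here is illustrative, not the actual Vidispine or DeepVA API: the `Dataset` structure and the `train_model` call are hypothetical stand-ins for organizing training classes in the MAM and triggering training via a single API call.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """Hypothetical container: a named dataset grouping training classes."""
    name: str
    # class label -> list of image references (e.g. frames picked in the MAM)
    classes: dict[str, list[str]] = field(default_factory=dict)

    def add_class(self, label: str, images: list[str]) -> None:
        self.classes.setdefault(label, []).extend(images)

def train_model(dataset: Dataset) -> dict:
    """Stand-in for the single API call that starts training.

    A real integration would POST the dataset to the AI service here;
    this sketch just returns a summary of what would be submitted.
    """
    return {
        "model": f"{dataset.name}-v1",
        "classes": len(dataset.classes),
        "status": "trained",
    }

ds = Dataset("presenters")
ds.add_class("anchor_a", ["frame_001.jpg", "frame_014.jpg"])
ds.add_class("anchor_b", ["frame_007.jpg"])
result = train_model(ds)
print(result)  # {'model': 'presenters-v1', 'classes': 2, 'status': 'trained'}
```

The point of the design is that the user only curates classes inside the familiar MAM UI; the single `train_model` step hides all model-building complexity behind one call.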

Where does the training data come from?

Training data can be extracted using Face Dataset Creation. Here, training data is automatically extracted from videos and livestreams by linking names inserted in the image to the corresponding faces and storing them in a dataset. This approach offers a high degree of customizability and enables precise, high-quality analysis of visual media data. So-called Face Fingerprinting, or Face Indexing, makes it possible to store unrecognized faces with a specific index so that they can be marked later across your entire database. A face is named manually once, and the name is then applied automatically to all matching faces.
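The face-indexing idea described above can be illustrated with a minimal sketch: each detected face is reduced to an embedding (the "fingerprint"); unseen faces receive a new index, and a single manual naming then applies to every observation sharing that fingerprint. The embeddings, threshold, and class names below are toy assumptions for illustration, not DeepVA's implementation.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class FaceIndex:
    """Toy fingerprint index: unknown faces get an id, names propagate."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries: list[tuple[str, list[float], str | None]] = []

    def observe(self, embedding: list[float]) -> str:
        """Return the fingerprint id for a face, creating one if unseen."""
        for fid, emb, _ in self.entries:
            if cosine(embedding, emb) >= self.threshold:
                return fid
        fid = f"face_{len(self.entries):04d}"
        self.entries.append((fid, embedding, None))
        return fid

    def name(self, fid: str, person: str) -> None:
        """Name a fingerprint once; the name covers all matching observations."""
        self.entries = [
            (f, e, person if f == fid else n) for f, e, n in self.entries
        ]

index = FaceIndex()
fid1 = index.observe([0.9, 0.1, 0.0])
fid2 = index.observe([0.91, 0.09, 0.01])  # near-identical embedding: same person
index.name(fid1, "Jane Doe")
print(fid1 == fid2)  # True: both observations share one fingerprint
```

In a production system the embeddings would come from a face-recognition network and the index would live in the MAM's database, but the propagation logic (name once, apply everywhere) is the same.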

In summary, the joint solution of Arvato Vidispine and DeepVA puts a variety of AI tools at users’ disposal within the MAM system, applicable immediately without any prior technical knowledge. Users can build their own individualized AI models with multiple training options. Applying these models, and the associated analysis of image and video files, yields more detailed, higher-quality keywording and thus improved searchability of media data.
