AI is becoming more and more prevalent in our everyday lives, but fewer than 6% of companies in the media and entertainment industry are using AI in the form of computer vision, voice recognition, or natural language processing.
Often, current AI offerings do not deliver the added value that was hoped for. What matters is recognizing content in visual media such as images, videos, or livestreams that is not covered by a pre-trained AI model but reflects the specific needs of the company. Meeting the conditions in media companies therefore requires constant adaptation of AI models.
To make training AI models straightforward, the necessary “tools” must be available to the user intuitively and transparently within the system or application, so that companies and their users can adapt pre-trained AI models precisely and easily to their requirements.
Pre-trained AI models
Pre-trained models, as offered by large image recognition service providers, can only recognize what was contained in their training data. Classification of general image content can be implemented quickly, but media- and company-specific use cases cannot be fully covered. Companies are looking for solutions that enable customization or individualization of AI models while ensuring high data security.
In practice, this means recognizing people, objects, or landmarks in images and videos that are not part of a pre-trained model. For such content, the recognition performance of pre-trained models is very limited.
Individual AI models
Until now, building custom AI models has been time-consuming and very complex. Complex algorithms have to be implemented and fed with large training datasets. This data has to be collected, managed in a structured way, kept up to date, and described precisely. Before AI models can be created from this data and used productively, their performance must be extensively validated with independent test data. For media companies, this challenge has so far seemed difficult to overcome. DeepVA’s mission is to make the potential of AI accessible to any company without requiring prior knowledge.
AI can be easily integrated into existing systems and used intuitively to optimize everyday workflows when working with media data. This includes managing data, building your own training data, and discovery analysis of your own content.
Partnership between Vidispine and DeepVA
DeepVA first got into conversation with Arvato Systems at FKTG’s AI panel at Hamburg Open 2020. A follow-up meeting turned out to be a stroke of luck in that both parties were fascinated by the idea of automating media workflows to the maximum through AI. Together, the Vidispine team and DeepVA posed the central question of how the right AI tools could be brought directly into the user’s familiar environment. Users should also be able to monitor and control the recognition of content in their media data, the creation of their own AI models, and the quality assurance of their training data.
Advantages of integrating DeepVA into Vidispine’s MAM system:
- Management and use of training data in the familiar MAM environment
- Intuitive user interface with integrated training application
- Training classes can be easily organized into datasets and trained with one click via API call
- The trained AI model is available for the analysis of images and videos after a few seconds
- Analysis with custom models at the push of a button in the UI
- Timecode-accurate navigation in the video material and thus best overview of analyzed objects or faces
- Face Indexing: faces that are not known to the AI model are detected automatically and assigned a code (fingerprint); once described manually, they are subsequently labeled automatically throughout the entire system
- The integration of AI training into the familiar MAM system is a crucial step toward improving the customer experience
Where does the training data come from?
Training data can be extracted using Face Dataset Creation. Here, training data is automatically extracted from videos and livestreams by linking names displayed in the picture to the corresponding faces and storing them in a dataset. Custom training data thus allows a high degree of customization and a precise, high-quality analysis of visual media data. Face Fingerprinting, also called Face Indexing, makes it possible to store unrecognized faces under a specific index so that they can be labeled later across your entire database. A face is named manually once, and the name is then applied automatically to all faces recognized as the same.
In summary, the joint solution from Arvato Vidispine and DeepVA puts a variety of AI tools at the disposal of users in the MAM system, tools that can be applied immediately without any prior technical knowledge. Users can build their own individualized AI models using multiple training options. Applying these models to the analysis of image and video files yields more detailed, higher-quality keywording and thus improves the searchability of media data.
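To make the Face Indexing idea more concrete, here is a minimal Python sketch of the general principle: unknown faces receive an index, a name is assigned once, and that name then applies to every matching face. It does not show DeepVA's actual implementation or API. The `FaceIndex` class, the similarity threshold, the ID format, and the short example vectors are all assumptions made for illustration; in a real system the vectors would come from a face recognition model.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class FaceIndex:
    """Hypothetical fingerprint index; not part of any real product API."""
    threshold: float = 0.9                             # assumed similarity cut-off
    fingerprints: dict = field(default_factory=dict)   # index id -> reference vector
    names: dict = field(default_factory=dict)          # index id -> manually given name

    def lookup(self, embedding):
        """Return the id of the best match, or register the face under a new id."""
        best_id, best_score = None, self.threshold
        for face_id, reference in self.fingerprints.items():
            score = cosine_similarity(embedding, reference)
            if score >= best_score:
                best_id, best_score = face_id, score
        if best_id is None:
            best_id = f"face-{len(self.fingerprints) + 1:04d}"
            self.fingerprints[best_id] = list(embedding)
        return best_id

    def name(self, face_id, label):
        """Attach a manually entered name to an existing index id."""
        self.names[face_id] = label

    def label_for(self, embedding):
        """Return the name if one was assigned, otherwise the bare index id.

        Note: an unseen face is registered as a side effect of the lookup.
        """
        face_id = self.lookup(embedding)
        return self.names.get(face_id, face_id)


# Usage with made-up three-dimensional vectors
index = FaceIndex()
first = index.lookup([0.9, 0.1, 0.0])       # unknown face, gets a new index id
again = index.lookup([0.88, 0.12, 0.01])    # very similar face, same index id
index.name(first, "Jane Doe")               # named manually once
label = index.label_for([0.91, 0.09, 0.0])  # later appearance is labeled automatically
other = index.label_for([0.0, 0.1, 0.9])    # a different face keeps its bare index id
```

The sketch keeps the index and the names separate on purpose: faces can be collected before anyone knows who they are, and a single manual naming step later labels every earlier and future match.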