Open Source on the rise – how we are facing this change with DeepVA!

Generative AI and the end of training data?

Artificial intelligence has entered many areas of work, and almost everyone is aware of this by now. Even those who have not yet had direct experience with it encounter it constantly in the news and the media.

The much-discussed generative AI models in particular, such as ChatGPT, Stable Diffusion, or Midjourney, make it possible to mass-produce creative output that previously only highly skilled professionals could produce. This ranges from stories, reports, and other text to multimodal content such as images, videos, and audio. More and more AI systems are changing the way content is produced, distributed, experienced, and marketed.

But where does the training data come from to make these models intelligent?

What started as quiet voices at the turn of the year is now taking the form of tangible demands: the use of texts, images, and pieces of music as training material should be remunerated in the future. Every AI system must be trained with data. Often, however, the origin of this data has not been clarified, or its use was not legally covered at all. In this respect, the AI is a so-called black box.

Now, various associations of creative professionals, including DJU, Verdi, and photographers’ associations, are demanding more transparency and remuneration for the use of their works as training data in a joint statement.

The EU AI Act will also call for more transparency with regard to training data and potentially for remuneration of some kind. Whether this will take the form of a lump sum modeled on GEMA or a classic pay-per-use scheme is not yet specified in any of the concepts. However, the demand is clear: creative work that serves as a basis for AI models must also be remunerated in the future.

Data security and quality of training data sets

The use of existing content for training AI systems is generally covered by the regulations on text and data mining in European copyright law and remains permissible for now, provided that the rights holder has not reserved those rights (§ 44b and § 60d UrhG). But how will these demands and lobbying efforts affect the amendment?

Our partners at Adobe are overcoming this legal hurdle by sourcing and licensing training data from their own stock photo marketplace.

The AI market is becoming broader and increasingly diverse

In addition to the rapid development of generative AI, more sophisticated, high-performance open-source AI models are entering the market.

Growth-oriented companies are increasingly relying on open-source technologies that are freely available and transparent. Open-source development is experiencing a real renaissance.

In a talk at the TU München, OpenAI CEO Sam Altman also confirmed this: many simple and risk-free AI application scenarios will be handled by open-source models in the future. Since an internal Google document became public, if not before, it has been clear that providers of proprietary technologies such as Google, Meta, and Microsoft have recognized this threat. Although their models are still qualitatively better, they are no longer alone in the market. Open source is developing rapidly and has many advantages.

For example, researchers and hobbyists have developed large language models that can compete with OpenAI’s and Google’s offerings. Some of these models scale better and can run on less powerful systems such as laptops or smartphones.

This enables the democratization of AI for a wide range of users.

For example, language models can be personalized on consumer hardware within a few hours. New knowledge can be integrated and tested in real time, so developers can now implement and test AI in their software in an agile way, without big tech. Sharing and collaboration: this is what the future will look like.

Nevertheless, open source also poses challenges. Open-source models have to be adapted for specific, individual use cases and are sometimes more difficult to implement than proprietary software. Maintenance and service are not available through the lively open-source community, yet they are essential, especially in media production.

The goal is actually simple: reliable, maintained, and managed AI systems that work according to the company’s own specifications, in in-house environments and with existing software. It should not matter whether the AI system is operated in the company’s own data center, in a private cloud, or in a public cloud.

The legal framework for data protection is also constantly changing, which is why many users still rely on on-premise or hybrid systems, especially when dealing with personal data. Knowing that the data stays with them, if necessary completely without access to the Internet, is an added benefit in security and data protection, and not only for public authorities.

Future-proof AI platform

How can companies be helped to realize their full potential through AI?

With DeepVA, we’ve created the easiest, most secure, and most trusted approach to AI. We have not just developed AI models; we offer an entire AI operating system that provides and connects AI capabilities across your organization’s workflows. In the future, the platform can also be understood as a managed environment for plug & play deployment of open-source solutions. On it, you can train your own AI models and build up training data, and you can also benefit from managed open-source models without being responsible for implementation, maintenance, and support.

Unlike purely cloud-based offerings, DeepVA is also available for on-premise use. This provides a high level of protection for your data, because it is hosted on your own internal servers and can remain within the company.

The training data does not come from just anywhere but is created along your media workflows. Whether from archive material or ingest, our tools help you build and manage a pool of training materials, automatically and supervised by your staff. Through the REST API, the resulting data sets can then be used at various points along the value chain: in ad placement, in news production, or in assessing diversity in your content.

We want productive access to AI to be as simple, secure, and transparent as possible for you.
