Open Source on the rise — how we are facing this change with DeepVA!

Generative AI and the end of training data?

Artificial intelligence has entered many areas of work, and almost everyone is aware of this by now. Even those who have not yet had direct experience with it encounter it constantly in the news and the media.

Generative AI models in particular, such as ChatGPT, Stable Diffusion, or Midjourney, make it possible to mass-produce creative output that previously could only be created by highly skilled professionals. This ranges from stories, reports, and other text to multimodal content such as images, videos, and audio. More and more AI systems are changing the way content is produced, distributed, experienced, and marketed.

But where does the training data that makes these models intelligent come from?

What started as quiet voices at the turn of the year is now manifesting itself in tangible demands: the use of texts, images, and pieces of music as training material should be remunerated in the future. Every AI system must be trained with data, yet often the origin of that data has not been clarified and its use was not even legally covered: the AI is a so-called black box.

Various associations of creative professionals, among them the DJU, Verdi, and photographers' associations, are now demanding in a joint statement more transparency and remuneration for the use of their works as training data.

The EU AI Act will also call for more transparency with regard to training data and potentially some form of remuneration. Whether this will be a flat-rate levy modelled on GEMA or classic pay-per-use is not yet part of any of the concepts. However, the demand is clear: creative work that serves as the basis for AI models must also be remunerated in the future.

Data security and quality of training data sets

The use of existing content to train AI systems is in principle covered by the text and data mining provisions in European copyright law and is, for now, still permissible, provided the rights holder has not reserved their rights (§ 44b and § 60d UrhG). But how will these demands and lobbying efforts affect the next amendment?

Our partners at Adobe are overcoming this legal hurdle by sourcing and licensing training data from their own stock photo marketplace.

The AI market is becoming broader and increasingly diverse

In addition to the rapid development of generative AI, more sophisticated, high-performance open-source AI models are entering the market.

Growth-oriented companies are increasingly relying on open-source technologies, which are freely available and transparent. Open-source development is experiencing a real renaissance.

OpenAI CEO Sam Altman confirmed this in a talk at TU München: many simple and low-risk AI application scenarios will be handled by open-source models in the future. At the latest since an internal Google memo became public, it has been clear that proprietary providers such as Google, Meta, and Microsoft have recognized this threat. Although their models are still qualitatively better, they are no longer alone in the market, and open source is developing rapidly and has many advantages.

Researchers and hobbyists, for example, have developed capable language models that can compete with OpenAI's and Google's offerings. Some of these scale down well enough to run on less powerful systems such as laptops or smartphones.

This enables the democratization of AI for a wide range of users.

For example, language models can be personalized on consumer hardware within a few hours, and new knowledge can be integrated and tested in near real time. Developers can now implement and test AI in their software in an agile way, without relying on big tech. Sharing and collaboration: this is what the future will look like.
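To make this concrete, here is a minimal sketch of what such personalization can look like in practice: parameter-efficient fine-tuning (LoRA) of a small open-source language model with the Hugging Face transformers and peft libraries. The model name, the text file with company knowledge, and all training settings are placeholder assumptions for illustration, not a recommendation for a specific setup.

```python
# Minimal sketch: LoRA fine-tuning of a small open-source language model on
# consumer hardware. Model name, data file, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "EleutherAI/pythia-410m"  # small model that fits on a laptop GPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Only a small set of adapter weights is trained; the base model stays frozen.
lora_config = LoraConfig(r=8, lora_alpha=16, task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, lora_config)

# Plain-text corpus with the new, company-specific knowledge (placeholder file).
dataset = load_dataset("text", data_files={"train": "company_knowledge.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # adapter weights are only a few megabytes
```

Because only the small adapter matrices are updated, a run like this can finish within hours on a single consumer GPU, which is exactly what makes this kind of in-house personalization practical.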

Nevertheless, open source also poses challenges. Open-source models have to be adapted for specific, individual use cases and are sometimes more difficult to implement than proprietary software. Maintenance and service are not provided by the lively open-source community, yet they are essential, especially in media production.

The goal is actually simple: reliable, maintained, and managed AI systems that work to the company's own specifications in in-house environments and with existing software. It should not matter whether the AI system is operated in the company's own data center, in a private cloud, or in a public cloud.

The legal framework around data protection is also constantly changing, which is why many users still rely on on-premise or hybrid systems, especially when dealing with personal data. Knowing that the data stays with them, if necessary entirely without Internet access, is an extra layer of security and data protection, and not only for public authorities.

Future-proof AI platform

How can we help companies realize their full potential with AI?

With DeepVA, we've created the easiest, most secure, and most trusted approach to AI. We have not just developed simple AI models; we offer an entire AI operating system that provides and connects AI capabilities across your organization's workflows. In the future, the platform can also be understood as a managed environment for plug-and-play deployment of open-source solutions, on which you can train your own AI models and build up training data while also taking advantage of managed open-source models, without carrying the responsibility for implementation, maintenance, and support.

Unlike purely cloud-based models, DeepVA is also offered for on-premise use, providing a high level of protection for your data: it is hosted on your own internal servers and can remain within the company.

The training data does not come from just anywhere; it is created along your media workflows. Whether from archive material or ingest, our tools help you build and manage a pool of training material, automatically and supervised by your staff. Through the REST API, the data sets gained in this way can then be used at various points along the value chain: in ad placement, in news production, or in assessing the diversity of your content.
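As an illustration of what such an integration could look like, here is a hedged sketch of submitting a media asset for analysis over a REST API using Python's requests library. The base URL, endpoint path, module name, and payload fields are hypothetical placeholders chosen for this example; they do not describe DeepVA's actual API specification.

```python
# Illustrative sketch only: submitting a media asset to an AI platform's REST API.
# The base URL, endpoint, module name, and payload fields are hypothetical
# placeholders and do not reflect DeepVA's actual API.
import requests

API_BASE = "https://deepva.example.internal/api/v1"   # placeholder on-premise URL
API_KEY = "YOUR_API_KEY"                              # placeholder credential

def submit_analysis_job(media_url: str, module: str) -> dict:
    """Send one asset for analysis and return the job description."""
    response = requests.post(
        f"{API_BASE}/jobs",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"module": module, "source": media_url},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    job = submit_analysis_job(
        media_url="https://media.example.com/news/segment-042.mp4",
        module="face_recognition",   # hypothetical module identifier
    )
    print("Submitted job:", job.get("id"))
```

The same pattern, one authenticated call per workflow step, is what allows downstream systems such as ad placement or news production tools to reuse the data sets built up along the media pipeline.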

We want productive access to AI to be as simple, secure, and transparent as possible for you.
