Changelog June 2025: Text Recognition Upgrade, Content Moderation & Deep Live Hub Improvements

We’re happy to announce another update to the DeepVA composite AI platform, with improvements in both the Deep Media Analyzer and the Deep Live Hub. This release focuses on smarter automation, increased reliability, and more transparent user feedback during live operations.

Deep Media Analyzer: Visual & Text Intelligence Improvements

Content Moderation – Now With Shot-Based Automation

The Content Moderation Module, previously introduced in our March release, has been further improved and is worth highlighting again. It now performs shot-based segmentation of videos to provide a more granular analysis of visual content. Each segment is analyzed for potentially sensitive content, such as violence, nudity, or substance use, and tagged according to ESRB Content Descriptors.

This enhancement enables platforms and reviewers to automate compliance and age rating workflows more effectively by focusing moderation efforts on clearly defined segments. If you require other content descriptors in the future, please do not hesitate to contact our team.
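
As an illustration of how such shot-level results can drive a review workflow, here is a minimal sketch in Python. The field names and result shape are hypothetical, for illustration only, and not the actual DeepVA response format.

```python
# Hypothetical sketch: shape of a shot-based moderation result.
# Field names are illustrative, not the actual DeepVA API schema.
moderation_result = {
    "shots": [
        {
            "start_timecode": "00:01:12:05",
            "end_timecode": "00:01:18:20",
            "descriptors": [{"label": "Violence", "confidence": 0.91}],
        },
        {
            "start_timecode": "00:04:02:10",
            "end_timecode": "00:04:09:00",
            "descriptors": [{"label": "Use of Tobacco", "confidence": 0.78}],
        },
    ],
}

# Route only confidently flagged shots to human review,
# instead of asking reviewers to watch the full video.
flagged = [
    shot for shot in moderation_result["shots"]
    if any(d["confidence"] >= 0.8 for d in shot["descriptors"])
]
print(flagged)
```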

Text Recognition – Upgraded OCR Model

We’ve integrated a new Optical Character Recognition (OCR) model into our Text Recognition Module. While the core functionality remains the same (extracting visible on-screen text with accurate timecodes), the upgraded model significantly improves recognition accuracy and speed.

Users can now choose from Chinese, Latin, and English character sets during setup, making it easier to tailor the module to region-specific content. Currently, the module extracts four image samples per second; a future release will let users define the sampling rate themselves.
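
As a rough sketch of how the character-set option could be passed when creating a job (the endpoint URL, parameter names, and authentication header below are assumptions for illustration, not the documented DeepVA API):

```python
import requests

# Hypothetical sketch of configuring a Text Recognition job.
# Endpoint, parameter names, and auth header are assumptions.
API_URL = "https://api.example.com/v1/jobs"
API_KEY = "your-api-key"

job = {
    "module": "text_recognition",
    "source": {"url": "https://example.com/media/clip.mp4"},
    "parameters": {
        # One of the character sets mentioned above.
        "character_set": "latin",  # or "chinese", "english"
        # Sampling rate is currently fixed at four frames per second;
        # a configurable rate is planned for a future release.
    },
}

response = requests.post(
    API_URL,
    json=job,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```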

Why is separate OCR needed when the Visual Understanding module is available?

Visual language models, such as Visual Understanding, can interpret context and may make semantic corrections depending on the parameters. In contrast, the OCR model focuses exclusively on what is visually present: it does not attempt to correct or infer missing text, but it ensures accurate and verifiable extraction of the actual screen content. This is a major advantage in scenarios where accuracy and reproducibility are critical.

Additionally, OCR-based processing is significantly faster and more resource-efficient than GPU-intensive visual understanding models, making OCR the ideal choice for high-volume or real-time applications. One example is extracting screen text to suggest metadata tags or to retrieve information about the people appearing in the content, as sketched below.
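
A minimal sketch of that tag-suggestion idea, assuming a simple timecoded result shape (illustrative only, not the actual module output):

```python
from collections import Counter

# Hypothetical timecoded OCR output (illustrative shape only).
ocr_results = [
    {"timecode": "00:00:04:00", "text": "BREAKING NEWS"},
    {"timecode": "00:00:04:00", "text": "Jane Doe, Correspondent"},
    {"timecode": "00:12:30:00", "text": "BREAKING NEWS"},
]

# Suggest recurring on-screen text as candidate metadata tags.
counts = Counter(r["text"].strip().title() for r in ocr_results)
suggested_tags = [text for text, n in counts.most_common() if n >= 2]
print(suggested_tags)  # ['Breaking News']
```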

Bug Fixes
  • Resolved an issue with dictionary request timeouts: speaker recognition jobs would occasionally fail when the system was experiencing high usage.

  • Fixed .wav file support for speaker training: Speaker Identification Training now supports the audio/x-wav MIME type, resolving failed training jobs caused by certain types of WAV files.

Deep Live Hub: More Control and Options in Real Time

Updated File Export via API After a Finished Stream

To enable smarter processing and workflows, the system now allows you to receive the AMT file after a finished livestream. AMT is our own standard for livestream metadata: it offers timecoded transcriptions and translations today, with additional metadata layers planned for the future. Read more about our AMT standard here.
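
A minimal sketch of fetching the AMT file once a stream has finished might look like the following; the endpoint path, base URL, and authentication header are assumptions for illustration, not the documented Deep Live Hub API.

```python
import requests

# Hypothetical sketch: retrieve the AMT file after a livestream ends.
# Endpoint path and auth header are assumptions, not the real API.
API_BASE = "https://api.example.com/v1"
API_KEY = "your-api-key"
stream_id = "STREAM_ID"

resp = requests.get(
    f"{API_BASE}/streams/{stream_id}/amt",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()

# Persist the timecoded transcription/translation metadata
# for downstream workflows.
with open(f"{stream_id}.amt", "wb") as f:
    f.write(resp.content)
```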

Network Status Feedback

The system now provides visual feedback in the event of network connectivity issues, enabling users to troubleshoot independently. If the network experiences a delay, an orange pop-up notification appears in the header.

Stream Status Feedback in Live Editor

To provide better control and transparency during live sessions, the Live Editor now shows stream status indicators in the top bar.

Combined with the Network Status feedback, this helps users identify whether a stream has unintentionally stopped, for example due to bandwidth interruptions, and react quickly to resolve the issue.

Restart Session Button for Subtitles

If a live stream session is interrupted and restarted on the same endpoint, the Live Editor now offers a “Restart Session” button.
This resets the subtitle editor to match the current live stream, keeping subtitle workflows in sync and preventing misaligned captions or timecodes.

ASR Update – Faster and More Efficient

We’ve released an update to our Automatic Speech Recognition (ASR) engine, making it faster and more resource-efficient, particularly when handling multiple languages. This update lays the foundation for broader language coverage and smoother real-time performance in future updates.

More Precise Timecodes for the Broadcast and Editor Video Timestamps

We’ve separated the timers: the top left now shows the timestamp of the broadcast video, while a second timer shows the timestamp of the video currently being edited. This lays the groundwork for upcoming customization of the editing workflow.

Updates to Models Across All Languages

All language models have been updated for faster recognition and significantly improved formatting of numbers and related characters such as % and currency symbols.

All DeepVA changelog updates are available here: https://docs.deepva.com/changelog/
