We’re happy to announce another update to the DeepVA composite AI platform, with improvements in both the Deep Media Analyzer and the Deep Live Hub. This release focuses on smarter automation, increased reliability, and more transparent user feedback during live operations.
Deep Media Analyzer: Visual & Text Intelligence Improvements
Content Moderation – Now With Shot-Based Automation
The Content Moderation Module, previously introduced in our March release, has been further improved and is worth highlighting again. It now performs shot-based segmentation of videos to provide a more granular analysis of visual content. Each segment is analyzed for potentially sensitive content — such as violence, nudity, or substance use — and tagged according to ESRB Content Descriptors.
This enhancement enables platforms and reviewers to automate compliance and age rating workflows more effectively by focusing moderation efforts on clearly defined segments. If you require other content descriptors in the future, please do not hesitate to contact our team.
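To illustrate how shot-based results could be consumed downstream, here is a minimal sketch. The segment fields and descriptor strings below are illustrative assumptions, not the actual DeepVA API response schema:

```python
# Hypothetical shape of shot-based moderation output; the field names
# and descriptor labels are assumptions for illustration only.
segments = [
    {"start": 0.0, "end": 4.2, "descriptors": []},
    {"start": 4.2, "end": 9.8, "descriptors": ["Violence"]},
    {"start": 9.8, "end": 15.0, "descriptors": ["Use of Alcohol"]},
]

def flagged_segments(segments):
    """Return only the shots tagged with at least one content descriptor."""
    return [s for s in segments if s["descriptors"]]

for s in flagged_segments(segments):
    print(f'{s["start"]:.1f}-{s["end"]:.1f}s: {", ".join(s["descriptors"])}')
```

A reviewer can then jump straight to the flagged shots instead of screening the full video.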
Text Recognition – Upgraded OCR Model
We’ve integrated a new Optical Character Recognition (OCR) model into our Text Recognition Module. While the core functionality remains the same — extracting visible on-screen text with accurate timecodes — the upgraded model significantly improves recognition accuracy and speed.
Users can now select between Chinese, Latin, and English character sets during setup, making it easier to customize the module for region-specific content. Currently, the module extracts four image samples per second. In a future release, the option to define the sampling rate will be made available to the user.
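A fixed rate of four samples per second puts OCR results on a predictable timecode grid. The helper below is not part of the DeepVA API; it only illustrates, under that stated rate, which timestamps get sampled:

```python
def sample_timestamps(duration_s, samples_per_second=4):
    """Timestamps (in seconds) at which frames are sampled for OCR.

    Illustrative helper only: the current fixed rate is 4 samples/second,
    and a configurable rate is planned for a future release.
    """
    step = 1.0 / samples_per_second
    count = int(duration_s * samples_per_second)
    return [round(i * step, 3) for i in range(count)]

# A 2-second clip yields 8 sample points
print(sample_timestamps(2))
```

This also shows the practical resolution limit: on-screen text visible for less than 250 ms may fall between samples.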
Why is separate OCR needed when the Visual Understanding module is available?
Visual language models, such as Visual Understanding, can interpret context and may make semantic corrections depending on their parameters. In contrast, the OCR model focuses exclusively on what is visually present: it does not attempt to correct missing or suspected text, but it guarantees accurate, verifiable extraction of the actual on-screen content. This is a major advantage in scenarios where accuracy and reproducibility are critical.
Additionally, OCR-based processing is significantly faster and more resource-efficient than GPU-intensive visual language models, making OCR the ideal choice for high-volume or real-time applications. One example is extracting on-screen text to suggest metadata tags or to identify the people appearing in the content.
Bug Fixes
- Resolved an issue with dictionary request timeouts: speaker recognition jobs would occasionally fail when the system was under high load.
- Fixed .wav support for speaker training: Speaker Identification Training now accepts the audio/x‑wav MIME type, resolving failed training jobs with certain WAV files.
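WAV files arrive with several MIME type spellings in the wild, and rejecting the legacy `audio/x-wav` variant is exactly the kind of mismatch this fix addresses. The check below is an illustrative sketch, not DeepVA's actual validation code; the accepted set is an assumption:

```python
# Common MIME type spellings for WAV audio; "audio/x-wav" is the legacy
# variant that previously caused training uploads to fail.
ACCEPTED_WAV_MIME_TYPES = {"audio/wav", "audio/x-wav", "audio/wave"}

def is_supported_wav(mime_type: str) -> bool:
    """True if the MIME type names a WAV file, including legacy variants."""
    return mime_type.lower() in ACCEPTED_WAV_MIME_TYPES

print(is_supported_wav("audio/x-wav"))
```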
Deep Live Hub: More Control and Options in Real Time
Updated File Export via API after a finished Stream
Network Status Feedback
The system now provides visual feedback on network connectivity issues, so users can troubleshoot them independently. If the network experiences a delay, an orange pop-up notification appears in the header.
Stream Status Feedback in Live Editor
To provide better control and transparency during live sessions, the Live Editor now shows stream status indicators in the top bar.
Combined with the Network Status feedback, this will help users identify if a stream has unintentionally stopped, for example due to bandwidth interruptions, and react quickly to resolve the issue.
Restart Session Button for Subtitles
If a live stream session is interrupted and restarted on the same endpoint, the Live Editor now offers a “Restart Session” button.
This resets the subtitle editor to match the current live stream, keeping subtitle workflows in sync and preventing misaligned captions or timecodes.
ASR-Update – Faster and More Efficient
- More precise timecodes for the broadcast and the editor video timestamps
- Updated models across all languages
All DeepVA changelog updates are available here: https://docs.deepva.com/changelog/