This February release introduces major improvements across DeepVA, with a strong focus on live transcription quality, latency reduction, and more predictable AI output. From ultra-low-latency roll-up captions in the Deep Live Hub to structured Visual Understanding in the Deep Media Analyzer, this update lays the groundwork for more workflow automation and lower latency.
Earlier this month, we introduced single sign-on, unifying the DeepVA platform and the Deep Live Hub under a single login and bringing the two platforms closer together. With your DeepVA account, you can now access the Deep Live Hub and vice versa. This lets you run quick tests, evaluate our live system, or quickly build a POC, since the platform is already included in your subscription. Try it out!
Deep Live Hub Updates
Now is a great time to try out our Deep Live Hub: we have introduced numerous updates that improve working with custom dictionaries, decrease latency, add rolling subtitles, and enhance the model's overall quality.
Dictionaries & “Sounds Like” – Smarter ASR Assistance
Dictionaries act as an additional knowledge source for the ASR engine whenever the model's transcription confidence is low. With this release, dictionary handling has been significantly expanded through the introduction of “Sounds Like” entries.
“Sounds Like” entries allow you to define alternative phonetic representations of a dictionary term, helping the ASR engine recognize how a word might be spoken while enforcing how it should appear in the transcript. This is especially useful for:
- Names from foreign languages
- Brand or product names, abbreviations, and acronyms
- Technical or domain-specific terminology
Dictionary management remains available via the Dictionaries & Glossaries section and supports manual editing as well as bulk CSV imports with “Sounds Like” definitions. In the future, we also plan to offer a way to compile dictionaries with an LLM. See the full documentation here.
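To give a feel for what a bulk import could look like, here is a minimal Python sketch; the column names and the variant separator are assumptions for illustration, not the documented import schema:

```python
import csv

# Hypothetical CSV layout for a bulk dictionary import: one dictionary
# term per row, plus one or more phonetic "Sounds Like" variants.
# Column names are illustrative -- check the DeepVA documentation for
# the exact schema expected by the importer.
entries = [
    # (term as it should appear in the transcript, spoken variants the ASR might hear)
    ("DeepVA", ["deep v a", "deep vah"]),
    ("Qwen 3", ["kwen three", "chwen three"]),
    ("HLS",    ["h l s"]),
]

with open("dictionary.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["term", "sounds_like"])
    for term, variants in entries:
        # Multiple variants joined with ";" -- an assumed separator,
        # not necessarily the one the importer expects.
        writer.writerow([term, ";".join(variants)])
```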
Roll-Up Captions – Ultra-Low Latency Live Subtitles
This release introduces roll-up captions as a feature in the Deep Live Hub, dramatically reducing latency for live subtitles while maintaining readability. Roll-up captions can now be configured directly in the ASR settings with 1, 2, 3, or 4‑second roll-up intervals for the HLS output.
Unlike traditional live subtitles, which only appear after a full sentence has been completed, roll-up captions stream spoken content word by word, almost simultaneously with the speaker. As new words arrive, the current subtitle line grows dynamically, while completed sentences smoothly move upward.
Roll-up captions are now available as an HLS output, making them suitable for broadcast and professional live workflows. One example use case is integration with Steam Engineering’s SDI Teletext inserter, enabling near real-time subtitle insertion for linear broadcast environments. Roll-up captions are not yet supported for other output formats or in the live editor.
By combining faster ASR processing with rolling caption output, Deep Live Hub can now display spoken content almost in sync with live speech, significantly improving accessibility and viewer experience.
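To illustrate the roll-up behavior described above, here is a toy Python model of a two-line roll-up window; this is purely a sketch of the display mechanics, not our actual rendering pipeline:

```python
# Toy model of a two-line roll-up caption window (illustrative only):
# incoming words extend the bottom line; when a sentence ends, the
# completed line rolls upward and a fresh bottom line begins.
class RollUpWindow:
    def __init__(self, lines: int = 2):
        self.lines = [""] * lines

    def add_word(self, word: str) -> None:
        # Append each word to the current (bottom) line as it arrives.
        sep = " " if self.lines[-1] else ""
        self.lines[-1] += sep + word
        if word.endswith((".", "!", "?")):
            # Sentence finished: scroll lines up and open a new one.
            self.lines = self.lines[1:] + [""]

    def render(self) -> str:
        return "\n".join(self.lines)

window = RollUpWindow()
for w in "Roll-up captions stream word by word. New sentences scroll up.".split():
    window.add_word(w)
    # In a live player, each update would be redrawn immediately.
print(window.render())
```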
Deep Live Hub — Further Improvements
- Updated ASR Engine: Improved quality and speed (now ultra-low latency)
- Improved stability during live transcription
- Minor bug fixes
DeepVA Platform Updates
Our DeepVA platform has also been updated: all Visual Understanding updates that were previously only available via the API are now included in the UI. Many new models have also been added for our API customers to evaluate.
Visual Understanding – Structured Output & Expanded Model Choice
The Visual Understanding module now supports structured output in the UI as well, making AI-driven image and video analysis more predictable and easier to integrate into workflows. Already available via the API for two months, structured output can now be enabled directly in the UI when configuring the module.
Instead of receiving free-form responses, you can now define a JSON schema that the Vision Language Model (VLM) must follow, combining prompt flexibility with reliable, machine-readable results. Additionally, new VLMs have been added, including a larger Qwen 3 8B model.
See the full documentation here.
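As a rough illustration of the idea (the exact request format is described in the documentation; the field names below are made up for the example), defining a schema and consuming a structured response could look like this:

```python
import json

# Illustrative JSON schema a Vision Language Model could be asked to
# follow for image or shot analysis. The fields are example choices,
# not a schema mandated by DeepVA.
schema = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
        "objects":     {"type": "array", "items": {"type": "string"}},
        "is_indoor":   {"type": "boolean"},
    },
    "required": ["description", "objects"],
}

# With structured output enabled, the VLM response conforms to the
# schema and can be parsed directly, instead of arriving as free text.
response_text = (
    '{"description": "A news anchor at a desk",'
    ' "objects": ["desk", "microphone"], "is_indoor": true}'
)
result = json.loads(response_text)
assert isinstance(result["objects"], list)  # machine-readable, as promised
```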
In the coming months, we’ll be preparing new AI-powered workflows for transcription, export creation, and metadata generation to further optimize your media processing.
Visual Understanding – Shot Segmentation Upgrade
We’ve completely revamped shot segmentation to give you far more control over how videos are split and analyzed. Previously, shot detection offered just two settings: an on/off toggle and a sensitivity threshold. Useful, but limited. Now it has grown into a fully flexible, multi-method shot segmentation system for the Visual Understanding module.
Note: This is not yet available as a standalone function; it can only be used in combination with Visual Understanding.
In Visual Understanding, you can now choose how shots are detected and how sensitive each method is, or skip detection entirely and use fixed-length segments instead. These are the new options:
A. Shot Detection
When Enable shot detection is turned on, the video is automatically split into shots based on visual changes. Five detection methods are available:
- Content – detects semantic visual changes (default)
- Adaptive – dynamically adjusts sensitivity based on the video
- Threshold – brightness-based fade/cut detection
- Histogram – color distribution changes
- Hash – perceptual image differences
Each detected shot is then processed independently with your prompt and JSON schema, and each shot produces its own result segment.
B. Fixed-Length Segments
If you set Fixed shot length (in seconds) to a value greater than 0:
- The video is split into equal time segments (e.g. every 5 or 10 seconds)
- Each segment is analyzed independently
- Fixed-length segmentation cannot be combined with shot detection and overrides it if both are activated
This method is ideal for:
- Long videos with few shots, such as CCTV footage
- Uniform analysis based on timed chunks
- Timeline-based chunking
This gives you more accurate shot boundaries, better results, and fine-grained control for your use case. In short: you decide how your video is split and how detailed the analysis should be.
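As a rough sketch of how the two modes relate (the parameter names below are illustrative assumptions, not the exact API fields; see the documentation for the real schema), a configuration could look like this:

```python
# Illustrative configuration payloads for the two segmentation modes.
# All keys and values here are assumptions made for the sketch.

# A. Shot detection: pick one of the five methods and a sensitivity.
shot_detection_config = {
    "enable_shot_detection": True,
    "method": "content",      # "content" | "adaptive" | "threshold" | "histogram" | "hash"
    "sensitivity": 0.4,       # method-specific threshold
    "fixed_shot_length": 0,   # 0 = fixed-length splitting disabled
}

# B. Fixed-length segments: a value > 0 overrides shot detection.
fixed_length_config = {
    "enable_shot_detection": False,
    "fixed_shot_length": 10,  # split every 10 seconds, e.g. for CCTV footage
}
```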
Improved Speech Recognition
The Speech Recognition model has also been updated to improve transcript quality, reduce latency, and shorten processing time.
DeepVA — Further Improvements
- Fixed-length shot segmentation now supported
- Longer prompts and larger JSON inputs now supported
- On-Prem: fixed RAM exhaustion for long videos
- Detailed Text Recognition results can now also be visualized as a vertical list
Over the coming months, we will be preparing more post-processing results and breaking down silos between metadata types on our platform. We will also add extra inputs, outputs, and dictionary management for the Deep Live Hub.