NAB Show

Session

Is Dialog Intelligibility the New Loudness?

Tuesday, April 21 | 5:10 – 5:30 p.m.

Broadcast Engineering and IT Conference

Around 2004, the primary audio-related complaints to the FCC concerned sudden loudness changes, which drove viewers to switch channels and cost broadcasters revenue. In response, the ATSC, ITU, and EBU developed standards to measure and regulate loudness. These practices were adopted into QC workflows and ultimately became legal requirements in many regions.

Today, the leading audio complaint for broadcasters and streaming platforms is poor dialog intelligibility. Studies show that unclear dialog reduces viewer engagement and program “stickiness,” directly impacting revenue models.

Unlike loudness, there is no single industry-standard method for measuring dialog intelligibility. However, a promising approach developed by Fraunhofer IDMT, the “Listening Effort Meter” (LE Meter), is gaining adoption. The LE Meter uses a deep neural network to detect phonemes in an audio signal, and the confidence of detection serves as a proxy for listening effort: higher confidence indicates lower effort and better intelligibility. This method is largely language-agnostic and has been validated against human listening tests.
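To make the idea concrete, the sketch below shows one way phoneme-detection confidence could be turned into a per-frame effort score. It is only an illustration of the principle described above, not the Fraunhofer IDMT algorithm: the function name, the assumption that per-frame phoneme posteriors are available from some acoustic model, and the simple "1 minus confidence" mapping are all hypothetical.

```python
import numpy as np

def listening_effort_proxy(phoneme_posteriors: np.ndarray) -> np.ndarray:
    """Illustrative effort score from per-frame phoneme posteriors.

    phoneme_posteriors: array of shape (n_frames, n_phonemes) holding the
    posterior probabilities emitted by an acoustic model for each frame
    (e.g. one frame every 30 ms).

    Returns a per-frame score in [0, 1]: 0 means the model was fully
    confident about the phoneme (low effort, good intelligibility),
    1 means it was maximally uncertain (high effort).
    """
    # Confidence = probability of the most likely phoneme in each frame.
    confidence = phoneme_posteriors.max(axis=1)
    # Invert: high confidence -> low listening effort.
    return 1.0 - confidence
```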

Several manufacturers, including Steinberg, NUGEN Audio, and RTW, have implemented or are integrating the LE Meter into commercial products.

A key question is whether dialog-intelligibility metrics should become part of formal QC requirements or remain a recommended creative-stage tool. Either way, meaningful presentation of LE Meter data remains a challenge. The algorithm generates readings every 30 ms, creating significant short-term variability. As with loudness measurement, time-windowing, smoothing, or ballistic behavior may be needed to produce usable information.
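As a rough illustration of the smoothing problem, the sketch below applies a sliding-window mean to the 30 ms readings, loosely analogous to the short-term window used in loudness metering. The window length, the choice of a plain moving average, and the function name are assumptions for illustration only; real meters may use different windows or ballistics.

```python
import numpy as np

FRAME_PERIOD_S = 0.030  # one LE reading every 30 ms, per the description above

def smooth_le(frame_scores: np.ndarray, window_s: float = 3.0) -> np.ndarray:
    """Sliding-window mean over per-frame LE scores.

    window_s is an assumed smoothing window (3 s here, echoing the
    short-term window familiar from loudness measurement).
    """
    window = max(1, int(round(window_s / FRAME_PERIOD_S)))
    kernel = np.ones(window) / window
    # 'valid' mode avoids edge effects; output starts one full window in.
    return np.convolve(frame_scores, kernel, mode="valid")
```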

For QC purposes, there is interest in creating a single program-level value or pass/fail indicator to answer: “Is dialog intelligibility acceptable?” Determining how to derive such a score from raw or processed LE data is not straightforward. Options include mean or median values or quartile-based measures, taken on raw or smoothed data. However, each method has limitations, and certain content scenarios may not be fully represented by simple statistical summaries. Identifying the most reliable and practical approach remains an open issue for the industry.
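The options mentioned above (mean, median, quartile-based measures on raw or smoothed data) are easy to compute; the hard part is deciding which, if any, reliably answers the pass/fail question. The sketch below simply tabulates those candidate summaries; the pass/fail threshold is entirely hypothetical and would need calibration against listening tests.

```python
import numpy as np

def program_summary(scores: np.ndarray) -> dict:
    """Candidate single-value summaries of per-frame (or smoothed) LE scores."""
    return {
        "mean": float(np.mean(scores)),
        "median": float(np.median(scores)),
        # Upper quartile focuses on the worst-scoring 25% of frames.
        "upper_quartile": float(np.percentile(scores, 75)),
    }

def passes(scores: np.ndarray, threshold: float = 0.5) -> bool:
    # Illustrative pass/fail gate on the upper quartile; the threshold
    # value is a placeholder, not an industry figure.
    return np.percentile(scores, 75) <= threshold
```

Each summary can mask problem content in different ways: a program with mostly clear dialog but a few unintelligible scenes may still produce an acceptable mean or median, which is one reason simple statistics alone may not settle the question.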