AI is rapidly transforming the full media pipeline—from how services are tested, to how content is adapted, distributed, and monetized. This session brings together three perspectives on next-generation broadcast and streaming workflows: agentic AI that autonomously performs broadcast-grade QA across Smart TVs and set-top boxes; AI-powered multisensory adaptation that expands accessibility beyond captions and basic audio description; and real-time AI enrichment for live sports, enabling automated ad signaling, metadata extraction, personalization, localization, and enhanced video quality. Together, these presentations show how AI can improve reliability, unlock new audiences, and drive new value across modern video ecosystems.
Saturday, April 18 | 3 – 3:20 p.m. | N256
Yoann Hinard
Smart TVs and OTT devices have become the primary gateway to broadcast and streaming content, yet they present an increasingly complex QA environment. Native applications run across heterogeneous operating systems (Tizen, webOS, Fire OS, Roku OS, Android TV/Google TV, RDK) and render highly dynamic interfaces that include personalized rails, rotating promotions, contextual backdrops, and variable ad-driven placements. Traditional scripted and model-based testing methods remain valuable and widely used, yet they become difficult to maintain as device platforms, app versions, and personalized interfaces evolve on a weekly basis.

This paper presents a new agentic, multi-agent AI framework designed to autonomously execute robust QA workflows on Smart TVs and other video devices in fully black-box conditions. Instead of depending on fixed navigation routes or pre-mapped interfaces, the system uses AI that visually interprets the screen, selects actions based on the testing goal, and adjusts its behavior dynamically, enabling navigation that mirrors human interaction. The paper details the underlying mechanisms that enable:

· Zero-shot UI adaptation using VLMs and specialized perception modules to recognize semantic elements such as service logos, promoted content, play controls, install prompts, and ad transitions.
· Multi-agent orchestration, where Designer, Planner, Runner, and Analyst agents work together to transform a high-level test goal, such as "Open the ABC News live stream and verify playback," into an executable plan that remains resilient to layout changes, cross-language interfaces, and dynamic content insertion (a minimal sketch of this loop follows the abstract).
· Perceptual verification combining semantic checks with video and audio perception modules to validate playback integrity, ad stitching, live latency behavior, and QoE impairments without reference streams.
· Deterministic reproducibility using techniques that bound agent exploration and ensure that autonomous executions remain traceable, repeatable, and suitable for broadcast engineering environments.
· Scalability across OS ecosystems, with empirical results demonstrating reliability across Smart TV platforms, operator STBs, and regional UI variations.

Field evaluations show that agentic workflows reduce test maintenance by approximately 80 percent, maintain robustness despite UI reflows and dynamic content, and significantly expand the range of QA tasks that can be automated, including app discovery, deep navigation, FAST channel validation, dynamic ad insertion checks, and cross-language playback verification. For broadcast and streaming engineering teams, this work outlines a practical and standardized path for integrating agentic AI into QA and monitoring workflows, bridging the gap between traditional scripted testing and fully autonomous, goal-oriented device validation in multi-screen environments.
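The sketch below illustrates the perceive-plan-act loop the multi-agent orchestration bullet describes, with a hard step budget standing in for the bounded-exploration idea. Everything here (FakeDevice, the keyword-matching planner) is a hypothetical stand-in, not the authors' VLM-driven implementation.

```python
# Minimal, runnable sketch of a Planner/Runner loop over a black-box device.
# In the real system, screenshot_summary() would be a VLM call and plan()
# an LLM prompt constrained to a fixed remote-control action vocabulary.
from dataclasses import dataclass

@dataclass
class Step:
    screen: str   # semantic screen summary (would come from a VLM)
    key: str      # remote-control key the Runner sends
    note: str     # Planner rationale, kept so runs stay traceable

class FakeDevice:
    """Toy stand-in for a black-box Smart TV with three UI states."""
    def __init__(self):
        self.state = "home"
    def screenshot_summary(self) -> str:
        return {"home": "home screen, ABC News tile focused",
                "app":  "ABC News app, Live button focused",
                "live": "live stream playing, no buffering icon"}[self.state]
    def send_key(self, key: str) -> None:
        self.state = {"home": "app", "app": "live"}.get(self.state, self.state)

def plan(goal: str, screen: str) -> str | None:
    """Toy Planner/Analyst: returns the next key, or None once the goal
    state is visible on screen."""
    if "playing" in screen:
        return None          # goal reached: playback verified
    return "OK"              # press OK to drill toward playback

def run_goal(device: FakeDevice, goal: str, max_steps: int = 10) -> list[Step]:
    """Runner loop: perceive -> plan -> act, with a hard step budget so
    autonomous exploration stays bounded and repeatable."""
    trace: list[Step] = []
    for _ in range(max_steps):
        screen = device.screenshot_summary()
        key = plan(goal, screen)
        if key is None:
            break
        device.send_key(key)
        trace.append(Step(screen, key, f"advance toward: {goal}"))
    return trace

if __name__ == "__main__":
    goal = "Open the ABC News live stream and verify playback"
    for step in run_goal(FakeDevice(), goal):
        print(step.key, "<-", step.screen)
```

The per-step trace is the point of the design: each action is recorded with the screen state that motivated it, which is what makes an autonomous run auditable in a broadcast engineering environment.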
Saturday, April 18 | 3:20 – 3:40 p.m. | N256
Punyabrota Dasgupta, Maheshwaran G
PROBLEM STATEMENT
Current media accessibility solutions (subtitles, closed captioning, and basic audio descriptions) provide fragmented, incomplete experiences that fail to capture the essence of storytelling. While these solutions address deaf and blind audiences, they ignore substantial populations with intellectual disabilities, ADHD, autism spectrum disorders, sensory processing disorders, and photosensitive epilepsy who face consumption barriers despite intact vision and hearing. UN disability statistics indicate these underserved audiences represent 15-20% of target demographics, including millions of young adults on OTT platforms: a significant untapped market largely ignored by traditional accessibility approaches.

NOVEL APPROACH
We introduce a comprehensive AI-driven framework that decomposes, analyzes, and reconstructs multimedia content through multisensory interpretation. Our system employs:
(1) Advanced feature extraction using neural source separation (Demucs/Spleeter), computer vision, and video-scene parsing to create hierarchical content metadata;
(2) Multi-LLM consensus frameworks that evaluate plot comprehension, emotional impact, and character development across specialized language models to determine feature relevance (a minimal sketch follows the abstract);
(3) Fourteen specialized transformation services, including real-time braille generation, dialog simplification using RAG fine-tuning, animation generation for sensory processing disorders, music pitch reduction for auditory sensitivity, media summarization (condensing 120-minute films to 20-minute versions), and contextual engagement tools.

Real-world validation: for Shakespeare's Macbeth, our system identified that "Birnam wood trees are moving" creates comprehension barriers for viewers with intellectual disabilities, while Lady Macbeth's hand-washing scene poses triggers for GAD/OCD patients; these insights are inaccessible to conventional solutions.

IMPACT
This framework revolutionizes inclusive media by preserving artistic integrity while enabling emotional fidelity for disabled audiences; generating scalable, economically viable adaptations across diverse disability profiles; expanding addressable markets and revenue streams; supporting corporate sustainability mandates; and implementing universal design principles that enhance content for all viewers. It moves the industry beyond a compliance checkbox to a paradigm shift, transforming accessibility into a competitive differentiator while advancing genuine inclusion.
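The sketch below illustrates the multi-LLM consensus idea in point (2): several specialized models score a scene on comprehension-relevant axes, and an aggregated score decides whether to route the scene to transformation services. The scorer functions and the mean-plus-threshold aggregation rule are hypothetical stand-ins; the abstract does not specify how the authors combine model votes.

```python
# Toy consensus over three "specialized model" stubs, using the Macbeth
# example from the abstract. Real scorers would be distinct fine-tuned LLMs.
from statistics import mean

SCENE = ("Macbeth, Act 5: soldiers camouflaged with branches advance, "
         "so 'Birnam wood' appears to move toward Dunsinane.")

def plot_model(scene: str) -> float:
    """Stub: score (0-1) for how likely the scene is to create a
    plot-comprehension barrier for viewers with intellectual disabilities."""
    return 0.85 if "appears to move" in scene else 0.2

def emotion_model(scene: str) -> float:
    """Stub: likelihood the scene is an emotional or sensory trigger."""
    return 0.3

def character_model(scene: str) -> float:
    """Stub: how much the scene depends on prior character knowledge."""
    return 0.7

def consensus(scene: str, threshold: float = 0.6) -> dict:
    scores = {"plot": plot_model(scene),
              "emotion": emotion_model(scene),
              "character": character_model(scene)}
    # Simple mean consensus with an any-model override; a production system
    # might weight models or require majority agreement instead.
    flagged = mean(scores.values()) >= threshold or max(scores.values()) >= 0.8
    return {"scores": scores, "needs_adaptation": flagged}

print(consensus(SCENE))
```

A flagged scene would then be dispatched to services such as dialog simplification or an explanatory animation overlay, rather than being adapted wholesale.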
Saturday, April 18 | 3:40 – 4 p.m. | N256
Moore Macauley, Damien Degorre
Artificial intelligence is rapidly reshaping how broadcast and video streaming services are distributed and monetized. As broadcasters and streaming providers seek new ways to enhance the viewer experience, drive efficiency, and unlock revenue potential, AI is a core enabler. This paper explores recent breakthroughs in AI designed to elevate live sports streaming and monetization.

It describes how AI models can automatically detect natural ad breaks within live feeds and generate SCTE-35 markers in real time, creating monetization opportunities even for content lacking traditional ad cues. By learning patterns of motion, crowd reaction, and scene composition, AI engines can identify high- and low-action segments in live sports streams, dynamically triggering in-stream advertising and unlocking new monetization models (a simplified sketch of this marker-generation idea follows the abstract).

The paper also examines AI approaches for extracting contextual metadata from video and audio content. Combined with automation technologies such as speech-to-text subtitle generation, multilingual translation, voice cloning, and over-dubbing for localized commentary, these methods can help sports rights holders and streaming platforms broaden accessibility, reach new audiences, and elevate engagement. Integrating contextual data enables smarter real-time highlight creation and more personalized on-demand replay generation.

Another major area of focus is elevating video quality with AI, including super-resolution upscaling from HD to UHD, which allows video service providers to deliver UHD experiences while minimizing costs.

Collectively, these AI advancements represent a shift from rule-based video processing to intelligent, self-optimizing systems capable of analyzing, enriching, and monetizing live content at scale. The paper closes with an outlook on how AI will continue to transform live sports streaming and monetization, paving the way for a new generation of efficient, revenue-optimized media workflows.
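The sketch below shows the detect-then-signal pattern the first paragraph describes: a (stubbed) action model scores short windows of a live feed, and a sustained lull yields a splice-insert record. The field names mirror SCTE-35's splice_insert command, whose durations run on the 90 kHz clock, but the scoring function and window format are illustrative assumptions; a real deployment would serialize the record with an SCTE-35 library and inject it into the transport stream or HLS/DASH manifest.

```python
# Toy ad-break detector emitting simplified SCTE-35-style splice records.
from itertools import count

_event_id = count(1)

def action_score(window) -> float:
    """Stub for the learned model: in practice, motion, crowd-audio and
    scene-composition features would feed a classifier returning a score
    from 0 (idle) to 1 (high action)."""
    return window["motion"]  # placeholder single feature

def splice_insert(pts: int, duration_s: float) -> dict:
    """Simplified splice_insert: a cue-OUT with a fixed break duration."""
    return {
        "splice_event_id": next(_event_id),
        "out_of_network_indicator": True,            # leave the network feed
        "splice_time_pts": pts,                      # 90 kHz presentation time
        "break_duration": int(duration_s * 90_000),  # 90 kHz clock ticks
    }

def detect_breaks(windows, low=0.2, min_windows=3):
    """Yield a marker whenever `min_windows` consecutive windows score
    below `low`, i.e. a natural lull suitable for an ad break."""
    run = []
    for w in windows:
        run = run + [w] if action_score(w) < low else []
        if len(run) == min_windows:
            yield splice_insert(pts=run[0]["pts"], duration_s=30.0)
            run = []

# Toy feed: motion drops after the third window, triggering one marker.
feed = [{"pts": 90_000 * i, "motion": m}
        for i, m in enumerate([0.9, 0.8, 0.7, 0.1, 0.05, 0.1, 0.6])]
for cue in detect_breaks(feed):
    print(cue)
```

Requiring several consecutive low-action windows before signaling is one plausible guard against cutting away during a brief pause in play; a downstream ad server would pair each cue-OUT with a matching cue-IN when the break ends.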
Speakers
Skip Pizzi, Consultant / ATSC Brazil Implementation Team Chair, Skip Pizzi Media Consultant LLC (Moderator)
Damien Degorre, Director, SaaS Product Definition, Harmonic (Speaker)
Maheshwaran G, Principal Solution Architect, Media and Entertainment, Amazon Web Services India (Speaker)
Moore Macauley, Chief Technology Officer, Video Business Unit, Harmonic (Speaker)
Punyabrota Dasgupta, Principal Solutions Architect, Amazon Web Services India (Speaker)