We propose a novel method for automatically detecting and tracking news topics in multimodal broadcast TV news data. At its core is a Multimodal Topic And-Or Graph (MT-AOG) that jointly represents the latent structure of both text and visuals. MT-AOG embeds a context-sensitive grammar that describes the hierarchical composition of news topics in terms of semantic elements (the people involved, the related places and what happened) and models the contextual relationships between elements in the hierarchy. We detect news topics through a cluster sampling process that groups stories about closely related events together. Swendsen-Wang Cuts (SWC), an effective cluster sampling algorithm, is adopted to search the solution space for the clustering that maximizes a Bayesian posterior probability. To handle continuously updated news streams, topics are tracked over time, and we generate topic trajectories that show how topics emerge, evolve and disappear. Experimental results show that our method explicitly describes the textual and visual data in news videos and produces meaningful topic trajectories. Our method also outperforms state-of-the-art methods on the task of document clustering on both the Reuters-21578 dataset and our new dataset, the UCLA Broadcast News Dataset.
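The SWC cluster sampling step can be illustrated with a minimal sketch under toy assumptions. Here stories are reduced to hypothetical 1-D feature values, a simple Gaussian-fit score with a per-cluster penalty stands in for the MT-AOG posterior, and candidate labels are drawn uniformly from a fixed pool of `num_labels` slots, some of which may be empty. The function names (`swc_step`, `toy_log_post`, `swc_cluster`) and all parameter values are illustrative and do not reproduce the paper's implementation. Each step turns on edges between same-labeled items with probability q_e, relabels one connected component as a block and accepts the move with the SWC Metropolis-Hastings ratio. That ratio multiplies the posterior ratio by the product of (1 - q_e) over cut edges to the target cluster, divided by the same product over cut edges to the rest of the old cluster.

```python
import math
import random
from collections import defaultdict


def swc_step(labels, edges, q, log_post, num_labels, rng):
    """One Swendsen-Wang Cuts move; returns a new or unchanged label list."""
    n = len(labels)
    # 1. Turn on each edge whose endpoints share a label, with probability q_e.
    adj = defaultdict(list)
    for (i, j), qe in zip(edges, q):
        if labels[i] == labels[j] and rng.random() < qe:
            adj[i].append(j)
            adj[j].append(i)
    # 2. Grow the connected component V0 (over "on" edges) around a random seed.
    seed = rng.randrange(n)
    comp, stack = {seed}, [seed]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in comp:
                comp.add(w)
                stack.append(w)
    # 3. Propose moving V0 as a block to a label drawn uniformly (assumed pool).
    old, new = labels[seed], rng.randrange(num_labels)
    if new == old:
        return labels
    # 4. Proposal ratio from cut edges: V0 -> target cluster (reverse move)
    #    over V0 -> remainder of its old cluster (forward move).
    log_q_ratio = 0.0
    for (i, j), qe in zip(edges, q):
        if (i in comp) == (j in comp):
            continue  # not a cut edge of V0
        other = j if i in comp else i
        if labels[other] == old:
            log_q_ratio -= math.log(1.0 - qe)
        elif labels[other] == new:
            log_q_ratio += math.log(1.0 - qe)
    proposal = [new if v in comp else labels[v] for v in range(n)]
    # 5. Metropolis-Hastings acceptance test.
    log_alpha = log_q_ratio + log_post(proposal) - log_post(labels)
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return proposal
    return labels


def toy_log_post(labels, xs, sigma=0.5, penalty=1.0):
    """Hypothetical stand-in posterior: Gaussian cluster fit minus a per-cluster cost."""
    clusters = defaultdict(list)
    for x, lab in zip(xs, labels):
        clusters[lab].append(x)
    total = 0.0
    for members in clusters.values():
        mu = sum(members) / len(members)
        total -= sum((x - mu) ** 2 for x in members) / (2 * sigma ** 2)
    return total - penalty * len(clusters)


def swc_cluster(xs, num_labels=4, iters=2000, radius=1.0, seed=0):
    """Run SWC and keep the best-scoring labeling as an approximate MAP estimate."""
    rng = random.Random(seed)
    edges, q = [], []
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            d = abs(xs[i] - xs[j])
            if d < radius:  # sparse graph: only nearby items are linked
                edges.append((i, j))
                q.append(min(0.99, math.exp(-(d / 0.5) ** 2)))  # edge "on" probability

    def log_post(labs):
        return toy_log_post(labs, xs)

    labels = [0] * len(xs)  # start with everything in one cluster
    best, best_score = labels, log_post(labels)
    for _ in range(iters):
        labels = swc_step(labels, edges, q, log_post, num_labels, rng)
        score = log_post(labels)
        if score > best_score:
            best, best_score = labels, score
    return best
```

For example, `swc_cluster([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])` should place the first three and the last three values in two distinct clusters. Keeping the best-scoring sample is one simple way to approximate the posterior maximum; simulated annealing over the same moves is a common alternative.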
This project is supported by the NSF CDI project CNS 1028381.