Simultaneously, deep learning has matured to the point that off-the-shelf cloud AI APIs can watch television and examine images, cataloging the objects and activities they depict with extraordinary precision. Moreover, these AI tools do not require any human intervention, making them not just infinitely scalable but, most importantly, non-consumptive.

Books are routinely analyzed today using machine learning algorithms that can examine their topical and narrative structures without any human having access to the underlying text. Visual materials like videos and imagery have long resisted similar analysis due to the limitations of traditional machine learning beyond text. As deep learning algorithms have matured, this same non-consumptive workflow can now be extended to videos and imagery, making it possible to have machines watch millions or even billions of hours of television and summarize the key visual and spoken narratives without a human ever seeing any of the underlying source material.

To explore this vision in more detail, I worked with the Internet Archive’s Television News Archive to analyze one week of television news coverage, spanning CNN, MSNBC and Fox News and the morning and evening broadcasts of the San Francisco affiliates KGO (ABC), KPIX (CBS), KNTV (NBC) and KQED (PBS) from April 15 to April 22, 2019, totaling 812 hours of television news. This week was selected because it included two major stories: one national (the release of the Mueller report on April 18th) and one international (the Notre Dame fire on April 15th).

Each video was analyzed using Google’s Video AI API with all of its features enabled, including identifying the topics and activities depicted second-by-second, scene changes, OCR text recognition and object tracking. Each video was then split into 1-frame-per-second preview thumbnails and analyzed with Google’s Vision AI API to examine how treating videos as sequences of still images affects analytic results, while creating even further distance between the analysis and the original source content. While the Video AI API supports automatic transcript generation, each video was instead transcribed using Google’s Cloud Speech-to-Text API, since it supports 120 languages, offering an easier path for expanding beyond English-language television news in the future. Finally, both the station-provided closed captioning and the automatically generated transcripts were processed with Google’s Natural Language API to inventory the major people, places, organizations and other primary topics mentioned.

In total, nearly 2TB of data were analyzed, producing 615GB of machine annotations. Together, these four APIs represent the four major modalities of current deep learning approaches to content understanding: video, imagery, speech and text.

Why analyze television news using deep learning? Perhaps most importantly, because it would allow us to expand our efforts to combat misinformation, disinformation and foreign influence beyond the textual realm to the visual world through which we increasingly “see” the world around us. Television news cameras are often the first on the scene of major events, offering a trustworthy and verified accounting of what is happening in real time. Quantifying the dual visual and spoken narratives of television news allows it to be linked to online news and social media reporting of those events. Thus, a tweet announcing that the Notre Dame cathedral is on fire can be linked to live footage from a major television network confirming the events.
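As a concrete illustration of the Video AI step, the sketch below uses the google-cloud-videointelligence Python client with the features named in the text enabled. This is an assumption-laden sketch, not the author's pipeline: the bucket URI and helper names are hypothetical, and running it requires Google Cloud credentials.

```python
def top_labels(labels, k=5):
    """Pure helper: keep the k highest-confidence (description, confidence) pairs."""
    return sorted(labels, key=lambda pair: pair[1], reverse=True)[:k]


def annotate_broadcast(gcs_uri):
    """Run the Video AI API over one broadcast (hypothetical wrapper)."""
    # Imported lazily so top_labels() is usable without the client library.
    from google.cloud import videointelligence

    client = videointelligence.VideoIntelligenceServiceClient()
    features = [
        videointelligence.Feature.LABEL_DETECTION,        # topics/activities
        videointelligence.Feature.SHOT_CHANGE_DETECTION,  # scene changes
        videointelligence.Feature.TEXT_DETECTION,         # on-screen OCR
        videointelligence.Feature.OBJECT_TRACKING,        # object tracking
    ]
    operation = client.annotate_video(
        request={"input_uri": gcs_uri, "features": features}
    )
    result = operation.result(timeout=3600)  # hour-long videos take a while
    labels = [
        (label.entity.description, label.segments[0].confidence)
        for label in result.annotation_results[0].segment_label_annotations
    ]
    return top_labels(labels)


# Example invocation (hypothetical bucket; requires GCP credentials):
# annotate_broadcast("gs://tv-news-demo/cnn-2019-04-15.mp4")
```

Submitting one long-running `annotate_video` request per broadcast, rather than one per feature, keeps each video to a single pass through the API.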
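The thumbnail step, splitting each video into 1-frame-per-second stills and labeling them with the Vision AI API, might look like the following sketch. The ffmpeg invocation and file layout are assumptions; `frame_second()` simply maps ffmpeg's 1-based frame numbers back to second offsets.

```python
import subprocess


def frame_second(frame_index):
    """Map ffmpeg's 1-based frame numbers (at fps=1) to a second offset."""
    return frame_index - 1


def extract_thumbnails(video_path, out_dir):
    """Split a video into one preview thumbnail per second using ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", "fps=1", f"{out_dir}/frame-%06d.jpg"],
        check=True,
    )


def label_thumbnail(path):
    """Send one still frame through the Vision AI label detector."""
    from google.cloud import vision  # lazy import: optional dependency

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    return [(label.description, label.score) for label in response.label_annotations]
```

Because each thumbnail is analyzed independently, any temporal context the Video AI API would use is deliberately discarded, which is exactly the "sequences of still images" comparison the text describes.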
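For the spoken and textual modalities, a sketch of Speech-to-Text transcription followed by Natural Language entity extraction could look like this. The URI, default language code, and function names are illustrative assumptions, not the author's code.

```python
def entity_inventory(entities):
    """Pure helper: tally (name, type) mentions from an entity stream."""
    counts = {}
    for name, etype in entities:
        counts[(name, etype)] = counts.get((name, etype), 0) + 1
    return counts


def transcribe(gcs_uri, language_code="en-US"):
    """Transcribe a broadcast's audio track with Cloud Speech-to-Text."""
    from google.cloud import speech  # lazy import: optional dependency

    client = speech.SpeechClient()
    operation = client.long_running_recognize(
        config=speech.RecognitionConfig(language_code=language_code),
        audio=speech.RecognitionAudio(uri=gcs_uri),
    )
    results = operation.result(timeout=3600).results
    return " ".join(r.alternatives[0].transcript for r in results)


def inventory_entities(text):
    """Extract people, places and organizations with the Natural Language API."""
    from google.cloud import language_v1  # lazy import: optional dependency

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_entities(document=document)
    return entity_inventory((e.name, e.type_.name) for e in response.entities)
```

The same `inventory_entities()` call works for both the station-provided captions and the machine-generated transcripts, which is what makes the two text streams directly comparable.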