
Digital Video and Image Analytics

Analyzing videos and images using digital tools

Fair Use Term on Digital Media File Usage

Under U.S. copyright law, the following uses fall under the fair use doctrine:

Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include —

1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

2. the nature of the copyrighted work;

3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

4. the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors. (17 U.S.C. § 107)

Media File Acquisition

Downloading from YouTube:

There are many ways to download a media file from YouTube. One of the easiest is to type a double "s" (i.e., "ss") immediately before the word "youtube" in the URL bar and press Enter; you can then select the type of media file (e.g., MP4 720p, MP4 360p, WebM 360p, FLV 240p) you want to download. Here is the tutorial for reference.
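The URL trick above is just a hostname rewrite, handled by a third-party downloader site; the service may change or disappear at any time. A minimal sketch of the string transformation it relies on:

```python
def to_ss_url(url: str) -> str:
    """Insert 'ss' before 'youtube' in a YouTube URL, producing the
    address of the third-party download service described above.
    (Hypothetical helper; the 'ss' service is not affiliated with YouTube.)"""
    return url.replace("youtube", "ssyoutube", 1)

print(to_ss_url("https://www.youtube.com/watch?v=abc123"))
# https://www.ssyoutube.com/watch?v=abc123
```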

Using the Data Crawling Toolkit TubeKit:

Built on the YouTube APIs, TubeKit lets you scrape large quantities of YouTube data very efficiently with custom searches. The data it can extract includes YouTube video links from any webpage, video metadata (author, keywords, genre, number of views, ratings, comments, etc.), text comments on YouTube videos, and YouTube users' profile data. The downside is that, because it depends on the YouTube APIs, any change to those APIs can break parts of its pipeline, and fixes may take some time. Here is the toolkit download website.
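To give a flavor of the kind of API call a toolkit like TubeKit wraps, here is a small sketch using the current YouTube Data API v3 (a newer API than the one TubeKit was originally built on): one hypothetical helper builds the metadata request URL, another extracts a few fields from a response in the shape the v3 `videos` endpoint returns. A real request would also need a valid API key and a network call.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"

def build_request(video_id: str, api_key: str) -> str:
    """Build a YouTube Data API v3 request URL for one video's metadata."""
    params = {"part": "snippet,statistics", "id": video_id, "key": api_key}
    return f"{API_URL}?{urlencode(params)}"

def extract_metadata(response: dict) -> dict:
    """Pull author, title, and view count out of a videos.list response."""
    item = response["items"][0]
    return {
        "title": item["snippet"]["title"],
        "author": item["snippet"]["channelTitle"],
        "views": int(item["statistics"]["viewCount"]),
    }

# Hardcoded sample response in the shape the v3 API returns (no network call):
sample = {"items": [{"snippet": {"title": "Demo", "channelTitle": "SomeChannel"},
                     "statistics": {"viewCount": "1234"}}]}
print(extract_metadata(sample))
# {'title': 'Demo', 'author': 'SomeChannel', 'views': 1234}
```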

"Hard Data" Collection and Extraction Tools

Cinemetrics Film Editing Data Counting Tool:

The tool can be downloaded from the Cinemetrics website, along with instructions on how it works. It functions like a stopwatch: it records a timestamp each time you click the pause button as the movie plays, and it lets you add an annotation at each pause. However, the software must run simultaneously while the user watches the movie, which means the data collection is manual rather than automatic. Even so, as of 2014 the website had collected editing data for approximately 14,000 films, contributed collaboratively by independent users.
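The arithmetic behind this kind of click-based data is simple: the gaps between consecutive click timestamps are the shot lengths, and their mean is the average shot length (ASL), a standard Cinemetrics statistic. A minimal sketch (hypothetical helper names, not Cinemetrics' own code):

```python
def shot_lengths(click_times):
    """Given the timestamps (in seconds) of each 'cut' click recorded
    while the film plays, return the length of each shot."""
    times = [0.0] + sorted(click_times)
    return [b - a for a, b in zip(times, times[1:])]

def average_shot_length(click_times):
    """Average shot length (ASL): mean of all shot lengths."""
    lengths = shot_lengths(click_times)
    return sum(lengths) / len(lengths)

clicks = [4.0, 9.5, 12.0, 20.0]    # hypothetical click times for a 20 s clip
print(shot_lengths(clicks))         # [4.0, 5.5, 2.5, 8.0]
print(average_shot_length(clicks))  # 5.0
```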


Cinemetrics by Frederic Brodbeck and Github OpenCV:

This section includes both data collection and visualization tools. Interestingly, a talented graphic designer, Frederic Brodbeck, created a website also named "cinemetrics," but under a different domain and with different functions. The amazing part of his work is the film data visualization he built directly from media files, including both still and dynamic color wheel visualizations. However, he did not document his code completely, so it is hard for other users to reconstruct it and apply it directly to their own media files. Luckily, another GitHub contributor, @suite22, modified the code to make it easier to use for both data collection and visualization. The code can be run on a Linux system via a virtual machine, through which you can acquire basic shot length and motion index data. More help is available at the Temple University Paley Library Digital Scholarship Center.
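The two quantities named above, shot length and motion index, are commonly derived from inter-frame pixel differences. The sketch below is not Brodbeck's or @suite22's code; it is a toy version of the technique on small synthetic NumPy "frames" (in a real pipeline the frames would come from a video file, e.g. via OpenCV's VideoCapture), with an assumed difference threshold:

```python
import numpy as np

def frame_diff(a, b):
    """Mean absolute pixel difference between two grayscale frames."""
    return float(np.mean(np.abs(a.astype(float) - b.astype(float))))

def detect_cuts(frames, threshold=30.0):
    """Frame indices where the consecutive-frame difference exceeds the
    threshold: a crude hard-cut (shot boundary) detector."""
    return [i + 1 for i in range(len(frames) - 1)
            if frame_diff(frames[i], frames[i + 1]) > threshold]

def motion_index(frames):
    """Average inter-frame difference: a rough measure of on-screen motion."""
    diffs = [frame_diff(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    return sum(diffs) / len(diffs)

# Synthetic 8x8 grayscale frames: two nearly static shots with one hard cut.
rng = np.random.default_rng(0)
shot_a = [np.full((8, 8), 40) + rng.integers(0, 3, (8, 8)) for _ in range(5)]
shot_b = [np.full((8, 8), 200) + rng.integers(0, 3, (8, 8)) for _ in range(5)]
frames = shot_a + shot_b
print(detect_cuts(frames))  # [5] : the cut between the two shots
```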


Video Shot and Scene Automatic Detection Tool:

This tool automatically segments a video temporally into shots and scenes. The tool can be downloaded here with usage instructions. However, it is limited to videos of up to 10 minutes each. For the unrestricted version, you have to contact: .
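Scene segmentation typically sits one level above shot detection: shots whose visual content is similar are grouped into the same scene. This is not the downloadable tool's algorithm, just a minimal illustration of the idea, using hypothetical per-shot color histograms and a greedy merge with an assumed distance threshold:

```python
import numpy as np

def hist_distance(h1, h2):
    """L1 distance between two normalized color histograms."""
    return float(np.abs(h1 - h2).sum())

def group_shots_into_scenes(shot_histograms, threshold=0.5):
    """Greedy temporal segmentation: start a new scene whenever a shot's
    histogram differs too much from the preceding shot's."""
    scenes, current = [], [0]
    for i in range(1, len(shot_histograms)):
        if hist_distance(shot_histograms[i - 1], shot_histograms[i]) > threshold:
            scenes.append(current)
            current = [i]
        else:
            current.append(i)
    scenes.append(current)
    return scenes

# Hypothetical 4-bin color histograms for six shots: two visual groups.
dark = np.array([0.7, 0.2, 0.1, 0.0])
light = np.array([0.0, 0.1, 0.2, 0.7])
hists = [dark, dark, dark, light, light, light]
print(group_shots_into_scenes(hists))  # [[0, 1, 2], [3, 4, 5]]
```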

"Soft Data" Collection and Extraction Tools:

In addition to video shot boundary detection, the tools and systems in this section focus on recognizing image or speech content in a video and transcribing it into text for further analysis or database retrieval. The technology is usually developed by large tech corporations and used by large university-based digital libraries. Below are some examples:


Company-developed Video Event Detection Products:

  • IBM CueVideo summarizes a video and extracts key frames. It acquires spoken documents from the video via speech recognition.

  • The IBM Research TRECVID-2004 Video Retrieval System is a revision of the IBM CueVideo system. It is a content-based automatic retrieval project focusing on four tasks: shot boundary detection, high-level feature detection, story segmentation, and search. Here is a YouTube video demonstrating how the system works.
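The key-frame extraction step these summarization systems perform can be approximated very simply once shot boundaries are known: take one representative frame (e.g., the middle one) from each shot. This is a toy stand-in, not IBM's method:

```python
def keyframes(shot_boundaries, n_frames):
    """Pick one representative key frame (the middle frame) from each shot,
    given the cut positions and the total frame count. A crude stand-in for
    the key-frame extraction step in video summarization systems."""
    edges = [0] + list(shot_boundaries) + [n_frames]
    return [(start + end - 1) // 2 for start, end in zip(edges, edges[1:])]

# A 20-frame clip with cuts at frames 5 and 12 -> three shots.
print(keyframes([5, 12], 20))  # [2, 8, 15]
```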

University-based Video Event Detection Projects:  

  • The Informedia Digital Video Library at Carnegie Mellon University integrates speech recognition and natural language processing to automatically transcribe the speech in a video's soundtrack into text, align that text with the linear video segments, and index it to create "video paragraphs" and video skims for efficient retrieval.

  • The Digital Video Multimedia Lab at Columbia University has been engaged in multimedia content analysis and data extraction from images and videos, with the goal of building large-scale search engines and machine learning and recognition systems for automatic indexing and retrieval of the data.

  • The Fischlar Digital Video System at Dublin City University is a digital library of broadcast TV programs with several hundred hours of video content, evaluated through the TREC Video Retrieval track. It can detect and remove advertisements from video shots and analyze the remaining content via spoken-dialogue indexing, speech/music discrimination, face detection, anchorperson detection, shot clustering, shot-length cues, and so on.
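The "video paragraphs" idea mentioned above, segmenting a recognized transcript into retrievable units, can be illustrated with a toy heuristic: start a new paragraph wherever the silence between consecutive words exceeds some gap. This is a sketch of the concept, not Informedia's actual algorithm, and the transcript data is invented:

```python
def video_paragraphs(words, gap_threshold=2.0):
    """Split a timestamped transcript into 'video paragraphs' wherever the
    silence between consecutive words exceeds gap_threshold seconds.
    Each word is a (text, start_time, end_time) triple."""
    paragraphs, current = [], [words[0]]
    for prev, cur in zip(words, words[1:]):
        if cur[1] - prev[2] > gap_threshold:  # long pause -> new paragraph
            paragraphs.append(current)
            current = []
        current.append(cur)
    paragraphs.append(current)
    return [" ".join(w for w, _, _ in p) for p in paragraphs]

# Hypothetical speech-recognition output with a 3.2 s pause mid-transcript.
transcript = [("the", 0.0, 0.2), ("news", 0.3, 0.7), ("tonight", 0.8, 1.3),
              ("in", 4.5, 4.6), ("sports", 4.7, 5.2)]
print(video_paragraphs(transcript))  # ['the news tonight', 'in sports']
```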