This guide is an initial exploration of visual data capture and analysis. It is meant to help students navigate the basic tools and methodologies used in digital video and image content analysis and research.
Statistical analysis of film data dates back at least to the 1970s, when Barry Salt first applied quantitative statistical methods to cinematic data. His work is a good place to start.
Salt, B. (1974). Statistical style analysis of motion pictures. Film Quarterly, 13-22.
Salt, B. (1992). Film style and technology: History and analysis (2nd edition). Starword. (The latest edition, the 3rd, was published in 2009.)
Salt's book Moving into Pictures was later reviewed by Buckland (2008).
Buckland, W. (2008). What does the statistical style analysis of film involve? A review of Moving Into Pictures: More on History, Style, and Analysis. Literary and Linguistic Computing 23, 219-230.
Selected recent work:
Baxter, M. (2014). Notes on cinemetric data analysis. (Posted by author)
Baxter, M. (2013). Evolution in Hollywood editing patterns. Cinemetrics Website. (Posted by author)
Cutting, J.E., DeLong, J.E. and Nothelfer, C.E. (2010): Attention and the evolution of Hollywood film. Psychological Science 21, 440-447.
Cutting, J.E., Brunick, K.L., DeLong, J.E., Iricinschi, C., & Candan, A. (2011). Quicker, faster, darker: Changes in Hollywood film over 75 years. i-Perception, 2(6), 569.
Smith, T.J., Levin, D. and Cutting, J.E. (2012): A window on reality: Perceiving edited moving images. Current Directions in Psychological Science 21, 107-113.
Cinemetrics originated from an eponymous website (http://www.cinemetrics.lv) launched in 2005 by Yuri Tsivian, a professor in the Department of Cinema and Media Studies at the University of Chicago. The website provides a data collection tool that facilitates the annotation of films, making it easier to collect editing data or other content analysis data (e.g. action or dialogue scenes) manually. It also acts as a repository: a crowd-contributed database of quantified data on thousands of films from across film history. In addition, it presents a number of published articles that draw extensively on the database. The website is a notable resource for scholars and researchers interested in taking a quantitative approach to film studies.
Cinemetrics is thus understood as the "statistical analysis of quantitative data, descriptive of the structure and content of films that might be viewed as aspects of style" (Baxter, 2014, p. 2).
In text analysis of literature, an author's language and narrative style can be evaluated by analyzing the frequency of keyword usage, the density distribution of keywords, variation in sentiment, and so on. Digital video and film offer elements that can be analyzed in similar ways. An editor's or director's editing style can be analyzed through data such as total shot count, shot length distribution, frequently appearing images, and color.
The quantitative analysis of digital video and image materials resembles text analysis in many ways, but it can be more complex because such materials incorporate many dimensions of data.
Generally speaking, I would divide the metadata of digital video and image materials into two types: hard data and soft data. "Hard data" is usually production-related information generated when the digital videos or images were created, for example the production time, location, material size, run time, frame rate, resolution, number of shots, average shot length (ASL), and theme color. This type of data is typically regarded as comparatively objective and can be used directly in descriptive statistical analysis.
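As a minimal sketch of how hard data feeds into descriptive statistics, the following Python example derives shot lengths, run time, and ASL from a list of cut positions. The cut positions, the frame rate, and the function name shot_lengths_seconds are hypothetical values chosen for illustration; they do not come from any real film or from the Cinemetrics database.

```python
from statistics import mean, median, pstdev


def shot_lengths_seconds(cut_frames, fps):
    """Convert sorted cut positions (frame indices) into shot lengths in seconds."""
    return [(end - start) / fps for start, end in zip(cut_frames, cut_frames[1:])]


# Hypothetical cut positions, in frames, for a short clip (assumed 24 fps).
cut_frames = [0, 48, 120, 168, 360, 408]
fps = 24.0

lengths = shot_lengths_seconds(cut_frames, fps)

summary = {
    "shots": len(lengths),          # number of shots
    "run_time_s": sum(lengths),     # total run time in seconds
    "asl_s": mean(lengths),         # average shot length (ASL)
    "median_s": median(lengths),    # median shot length
    "stdev_s": pstdev(lengths),     # population standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up clip a single long shot pulls the ASL well above the median, which is one reason the median shot length is often reported alongside the ASL.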
In contrast, "soft data" is usually associated with the content of the digital videos. For example, you could categorize the content of a digital video into dialogue, action, and silence, or recognize and transcribe the speech and image content into text descriptions. Because soft data is qualitative in nature, it is subject to a certain level of interpretation and is therefore comparatively subjective. This type of data is not natively available, so a secondary step of data extraction and analysis is usually required. However, once the categories of soft data are defined, its precision depends largely on the speech and image recognition technology used, so it also has an objective aspect.
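Once content categories have been defined, soft data can be summarized in much the same way as hard data. The sketch below assumes a hand-annotated list of segments, each with a start time, an end time, and a category label, and reports the share of run time that each category occupies. The segment values and the function name category_shares are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical annotations: (start_seconds, end_seconds, category).
segments = [
    (0.0, 12.5, "dialogue"),
    (12.5, 20.0, "action"),
    (20.0, 24.0, "silence"),
    (24.0, 40.0, "dialogue"),
]


def category_shares(segments):
    """Return the fraction of total annotated time spent in each category."""
    totals = defaultdict(float)
    for start, end, category in segments:
        totals[category] += end - start
    run_time = sum(totals.values())
    if run_time == 0:
        return {}  # nothing annotated, so there are no shares to report
    return {category: duration / run_time for category, duration in totals.items()}


shares = category_shares(segments)
for category, share in sorted(shares.items()):
    print(f"{category}: {share:.1%}")
```

The interpretive step, deciding which label a segment receives, happens before this code runs; the code only counts what the annotator or recognition tool has already decided.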
Which type of data to select or combine depends on the research question. Whichever type is used, both serve the purpose of better understanding the video and image materials. The following tabs introduce useful tools and methods for digital video data extraction and visualization to assist with content analysis.
Baxter (2014), Notes on Cinemetric Data Analysis, defines basic terms frequently used in the quantitative analysis of films and digital videos as follows:
Frame rate: The number of frames or images that are projected or displayed per second, usually abbreviated as FPS (frames per second).
Shot: The footage between one cut point and the next in a digital post-production edit decision list (EDL).