At the Language and Media Processing Lab, much of our research focuses on analyzing video for semantic content, including tracking people, detecting text, and so forth. ViPER, the Video Processing Evaluation Resource, is our system for evaluating this work: a toolkit of scripts and Java programs for marking up ground truth in visual data, and for measuring how closely sets of result data approximate that truth.
The Performance Evaluation Problem
In order to evaluate a video analysis algorithm, or a set of algorithms, it is necessary to define a methodology. Since many books and papers already describe methods for evaluating specific types of algorithms, we decided to develop a general framework for evaluation. The idea common to most of the evaluations we do is a comparison between computer-generated output and some ideal version of 'Truth'.
In some subfields of vision, such as document processing, it is possible to generate test data automatically. For video processing, however, it is more common for a human to define the ground truth for each video clip. To ensure that researchers can repeat and verify evaluations, it is important to make the ground truth metadata available to other researchers in a documented format. It is also very useful to have methods for qualitatively verifying the ground truth. ViPER-GT provides tools for solving this metadata problem.
There are many ways to define how correct a result data set is with respect to a ground truth data set. For text detection, a metric based on the difference in size of bounding boxes may give different results than a more goal-oriented metric based on the number of characters or words correctly recognized. ViPER-PE provides tools for solving this evaluation problem.
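As an illustration of a size- and overlap-based spatial metric, the sketch below computes the intersection-over-union of two axis-aligned bounding boxes. This is a common example of such a metric, not necessarily the exact one ViPER-PE implements; the class and method names are our own.

```java
// Illustrative spatial metric: intersection-over-union (IoU) of two
// axis-aligned bounding boxes, each given as (x, y, width, height).
// This is a sketch of the kind of size/overlap comparison discussed
// above, not ViPER-PE's actual metric.
public class BoxOverlap {
    static double iou(int x1, int y1, int w1, int h1,
                      int x2, int y2, int w2, int h2) {
        // Width and height of the intersection rectangle (0 if disjoint).
        int ix = Math.max(0, Math.min(x1 + w1, x2 + w2) - Math.max(x1, x2));
        int iy = Math.max(0, Math.min(y1 + h1, y2 + h2) - Math.max(y1, y2));
        double inter = (double) ix * iy;
        double union = (double) w1 * h1 + (double) w2 * h2 - inter;
        return union == 0 ? 0 : inter / union;
    }

    public static void main(String[] args) {
        // Two 10x10 boxes offset by 5 pixels overlap in a 5x10 region.
        System.out.println(iou(0, 0, 10, 10, 5, 0, 10, 10)); // → 0.3333333333333333
    }
}
```

A character- or word-level metric for the same text box pair could report a very different score, which is exactly why the choice of metric matters.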
The ViPER Ground Truth Authoring Tool
ViPER-GT is a Java graphical user interface for authoring ground truth. It allows frame-by-frame markup of video metadata stored in the ViPER format, and it is also useful for visualization. For more information, see the appropriate manual.
The ViPER Performance Evaluation Tool
ViPER-PE is a command-line performance evaluation tool. It offers a variety of metrics for comparing video metadata files, letting the user choose among them to compare a result data set with ground truth data. It can compute precision and recall, perform frame-by-frame and object-based evaluations, and includes a filtering mechanism for evaluating relevant subsets of the data. The tool is further described in its manual.
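Precision and recall, mentioned above, are simple ratios over matched detections. The sketch below shows the standard definitions; the counts and class names are illustrative, and ViPER-PE's actual matching procedure between result and ground truth objects is more involved.

```java
// Standard precision/recall definitions over counts of true positives (tp),
// false positives (fp), and false negatives (fn). Illustrative only; how
// detections are matched to ground truth objects is a separate problem.
public class PrecisionRecall {
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0 : (double) tp / (tp + fp);
    }

    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0 : (double) tp / (tp + fn);
    }

    public static void main(String[] args) {
        // Hypothetical tally: 8 detections matched ground truth, 2 were
        // spurious, and 4 ground truth objects were missed.
        int tp = 8, fp = 2, fn = 4;
        System.out.println("precision = " + precision(tp, fp)); // → 0.8
        System.out.println("recall    = " + recall(tp, fn));    // → 0.6666666666666666
    }
}
```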
The ViPER API
The ViPER API is a set of Java interfaces and classes that provide programmatic access to data stored in the ViPER format. It offers a generic, object-oriented view of video metadata that is aimed at evaluation. Since ViPER data is stored in XML, the data can also be read without much difficulty in languages that cannot interface with Java.
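Because the data is XML, a standard parser is enough to get at it even without the ViPER API. The sketch below uses the stock `javax.xml` DOM parser; note that the element and attribute names in the sample document (`object`, `name`) are illustrative assumptions, not the exact ViPER schema.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Minimal sketch of reading ViPER-style XML with the standard DOM parser.
// The tag and attribute names below are placeholders, not the real schema.
public class ReadViperXml {
    // Collect the "name" attribute of every <object> element.
    static List<String> objectNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList objects = doc.getElementsByTagName("object");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < objects.getLength(); i++) {
            names.add(((Element) objects.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<viper><object name=\"Person\"/><object name=\"Text\"/></viper>";
        System.out.println(objectNames(xml)); // → [Person, Text]
    }
}
```

The same few lines translate directly to the DOM or SAX libraries of most other languages, which is what makes the XML storage format convenient for non-Java tools.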
ViPER-Viz
ViPER-Viz is currently a set of UNIX scripts that enable a user to compactly visualize ground truth, analysis results, performance evaluation results, or entire video clips, using several flexible representations.
The system is currently under development, so there are still bugs in the program. As such, there is no warranty, express or implied. Save early, save often. See our bug list for more details.
Note that the web site hosted at SourceForge is likely to be kept more up to date, and its URL is easier to remember, so go see the new Video Processing Evaluation Resource (ViPER) Toolkit home page.
We're located in AVW 3126. The group’s research covers most of the major areas of language research, including but not limited to speech recognition, handwriting and optical character recognition, multilingual text processing such as machine translation, and language data exploitation applications including document summarization, sense-making across structured data such as ontologies and thesauri, information retrieval, ranking and personalization, and computational social science.