A research team from the University of Central Florida’s Center for Research in Computer Vision recently won a competition to improve computer vision by creating technology that can automatically track behavior in long security videos.
The competition, the 2020 Activities in Extended Video Challenge, was sponsored by the U.S. Department of Commerce’s National Institute of Standards and Technology and was held virtually in June as part of the Conference on Computer Vision and Pattern Recognition.
Top computer vision teams from around the world, including teams from IBM, the Massachusetts Institute of Technology, Carnegie Mellon University, and Purdue University, competed in the challenge.
“Video surveillance is of great importance for security, and manually watching surveillance videos is not only difficult but inefficient,” says Yogesh Rawat, an assistant professor at the center and team leader. “Also, with so many closed-circuit television cameras all around, it is not possible to manually watch those videos. We need automatic analysis of these security videos to improve efficiency as well as accuracy.”
That need for “extra eyes” is why the UCF computer vision team developed a deep-learning system, named Gabriella, that can detect multiple activities happening in a security video efficiently, at a speed of 100 frames per second.
“This is a first step toward analyzing these security videos, and it will have a lot of applications in national security,” Rawat says.
The team also included Mubarak Shah, UCF trustee chair professor of computer science and director of the center, who says the win is a big plus for the group.
“Video activity recognition in unconstrained domains is a very important problem that has applications in self-driving cars, video surveillance and monitoring, human-computer interfaces and video search,” Shah says.
“Our submission was the fastest and most accurate, two criteria for the Deep Intermodal Video Analytics program,” he says.
Participation in the challenge supports the UCF team’s role in the Deep Intermodal Video Analytics program, which is funded by the Intelligence Advanced Research Projects Activity within the U.S. Office of the Director of National Intelligence through a subcontract from the University of Maryland.
The UCF team was runner-up to Carnegie Mellon University in 2018 and 2019 in the institute’s similar evaluation, the Text Retrieval Conference’s Video Retrieval Evaluation. This year, the Carnegie Mellon team was the runner-up.
The UCF team won by developing an end-to-end approach to computer analysis of video footage.
End-to-end means the computer takes raw RGB video directly as input and generates the required output without the intermediate processing that the other teams’ systems required.
Intermediate processing tasks such as object detection, optical-flow computation, and tracking make the whole process complicated and difficult to train and test, Rawat says.
“The end-to-end system avoids all of this and therefore is preferred,” he says.
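To make the contrast concrete, here is a minimal, hypothetical sketch (in PyTorch) of what such an end-to-end detector looks like: raw RGB frames go in and per-frame activity scores come out, with no object-detection, optical-flow, or tracking stages in between. The architecture, layer sizes, and class count are illustrative assumptions, not the team’s actual Gabriella model.

```python
import torch
import torch.nn as nn

class EndToEndActivityDetector(nn.Module):
    """Illustrative sketch of an end-to-end activity detector: raw RGB
    clips in, per-frame activity scores out, with no object detection,
    optical flow, or tracking in between. Not the actual Gabriella model."""

    def __init__(self, num_activities=37):
        super().__init__()
        # 3D convolutions learn spatio-temporal features directly from pixels.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the time axis, pool space
        )
        # One score per activity class per frame (multi-label: several
        # activities can co-occur in the same frame).
        self.classifier = nn.Conv1d(64, num_activities, kernel_size=1)

    def forward(self, clip):
        # clip: (batch, 3, frames, height, width) raw RGB video
        features = self.backbone(clip).squeeze(-1).squeeze(-1)  # (batch, 64, frames)
        return torch.sigmoid(self.classifier(features))         # (batch, classes, frames)

# Example: one 16-frame 112x112 RGB clip -> per-frame scores for 37 activities.
scores = EndToEndActivityDetector()(torch.randn(1, 3, 16, 112, 112))
print(scores.shape)  # torch.Size([1, 37, 16])
```

Because the whole pipeline is a single differentiable network, it can be trained with one loss on the raw video, which is the simplification Rawat describes.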
The UCF system can monitor 37 different activities, including “theft” and “person abandons package,” over more than 250 hours of video, and the scalable machine-learning system can be trained to recognize more if the data are available.
Monitoring for these kinds of activities over hours of video is difficult for computer vision because the activities vary in length, multiple activities can occur in the same frame, the same person can perform different activities, and the scale of the activities varies, with those closer to the camera appearing larger.
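As an illustration of why overlapping, variable-length activities are harder to report than a single label per video, here is a small hypothetical post-processing sketch that turns per-frame, per-class scores into activity instances, thresholding each class independently so activities that overlap in time are all reported. The function and threshold are assumptions for illustration, not part of the team’s published system.

```python
import numpy as np

def scores_to_instances(scores, threshold=0.5):
    """scores: (num_classes, num_frames) array of per-frame probabilities.
    Returns a list of (class_id, start_frame, end_frame) instances."""
    instances = []
    for cls, row in enumerate(scores):
        active = row >= threshold
        start = None
        for t, on in enumerate(active):
            if on and start is None:
                start = t                              # an instance begins
            elif not on and start is not None:
                instances.append((cls, start, t - 1))  # the instance ends
                start = None
        if start is not None:                          # runs to the last frame
            instances.append((cls, start, len(row) - 1))
    return instances

# Two overlapping activities of different lengths in a 6-frame window.
demo = np.zeros((2, 6))
demo[0, 1:5] = 0.9   # class 0 active in frames 1-4
demo[1, 3:6] = 0.8   # class 1 active in frames 3-5, overlapping class 0
print(scores_to_instances(demo))  # [(0, 1, 4), (1, 3, 5)]
```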
The UCF research team also includes UCF Department of Computer Science doctoral students Praveen Tirupattur, Aayush Rana, Kevin Duarte, Ugur Demir and Ishan Dave; and UCF Office of Research doctoral fellow Nayeem Rizve.
“We worked very hard to get to the top position,” Rawat says.