Real-Time Computer Vision

2007-2008

The Graphics and Visual Informatics Laboratory at UMD, run by Professor Amitabh Varshney, is interested in scientific visualization and virtual environments. In the summer of 2007, I was brought on board to help with the graphics end of a real-time computer vision project.

Current video surveillance systems offer limited visualization: what an operator can see is constrained by the position, field of view, resolution, orientation, and frame rate of the cameras in the array. This project set out to address that limitation by providing a real-time, fully navigable 3D computer simulation of the scene currently viewed by the physical surveillance system.

First, the computer vision side had to supply the position, orientation, clothing, body movements, and other attributes of the subjects in the scene. Vehicles, animals, and other objects were not neglected, but human subjects (carrying weapons, briefcases, lunch bags) were the main focus. Determining these details comes down to many matrix projections from each camera's point of view (position, orientation, field of view, resolution) into a shared, more manageable coordinate system. Once the video was processed, specialized recognition filters were applied in this space to determine body parts, clothing textures, colors, gait, and other details.
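To make the projection step more concrete, here is a minimal sketch of the standard pinhole camera model. It assumes an ideal, distortion-free camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and extrinsics (`R`, `t`), and it assumes the shared coordinate system has a flat ground plane at world z = 0. The class and function names are hypothetical illustrations, not the project's MATLAB/C code.

```python
import math


def mat_vec(m, v):
    """Multiply a 3x3 matrix (a list of rows) by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix; for a rotation this is its inverse."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def rotation_x(angle):
    """Rotation matrix about the world x-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


class PinholeCamera:
    """Ideal pinhole camera: x_cam = R * X_world + t, no lens distortion."""

    def __init__(self, fx, fy, cx, cy, R, t):
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy
        self.R, self.t = R, t

    @classmethod
    def from_center(cls, fx, fy, cx, cy, R, center):
        """Build a camera from its world-space center C, using t = -R * C."""
        t = [-x for x in mat_vec(R, center)]
        return cls(fx, fy, cx, cy, R, t)

    def project(self, X):
        """Project a world point to pixel coordinates (u, v)."""
        xc = [a + b for a, b in zip(mat_vec(self.R, X), self.t)]
        if xc[2] <= 0:
            raise ValueError("point is behind the camera")
        return (self.fx * xc[0] / xc[2] + self.cx,
                self.fy * xc[1] / xc[2] + self.cy)

    def backproject_to_ground(self, u, v):
        """Intersect the viewing ray through pixel (u, v) with the plane z = 0."""
        ray_cam = [(u - self.cx) / self.fx, (v - self.cy) / self.fy, 1.0]
        Rt = transpose(self.R)
        ray_world = mat_vec(Rt, ray_cam)
        center = [-x for x in mat_vec(Rt, self.t)]  # camera center C = -R^T * t
        if abs(ray_world[2]) < 1e-12:
            raise ValueError("ray is parallel to the ground plane")
        s = -center[2] / ray_world[2]
        if s <= 0:
            raise ValueError("ground intersection is behind the camera")
        return [center[i] + s * ray_world[i] for i in range(3)]


# A camera 10 m up, tilted 30 degrees away from looking straight down.
cam = PinholeCamera.from_center(800.0, 800.0, 960.0, 600.0,
                                rotation_x(math.radians(150)), [0.0, 0.0, 10.0])
u, v = cam.project([1.0, 6.0, 0.0])
print(cam.backproject_to_ground(u, v))  # approximately [1.0, 6.0, 0.0]
```

Back-projecting onto a ground plane is only one simple choice of common coordinate system; with several overlapping cameras, points could instead be triangulated from multiple views.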

With this information, we were able to display a real-time, navigable environment that maps the scene observed by the surveillance system onto a computer-rendered scene on the video wall. This video wall, a 57-megapixel display built from a 5 x 5 array of Dell 2407WFP widescreen monitors running at 1920x1200, lets the user view and navigate the 3D simulation while keeping watch on a subset of the video feeds, as shown in the picture below.

57-megapixel display running Flexiview
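As a quick check of the 57-megapixel figure, the arithmetic below assumes each 24-inch Dell panel runs at its native 1920x1200 and ignores the pixels "lost" behind the bezels.

```python
# Back-of-the-envelope check of the video wall's resolution,
# assuming a 5 x 5 grid of 1920x1200 panels and ignoring bezels.
COLS, ROWS = 5, 5
PANEL_W, PANEL_H = 1920, 1200

wall_w = COLS * PANEL_W          # 9600 pixels across
wall_h = ROWS * PANEL_H          # 6000 pixels down
total_pixels = wall_w * wall_h   # 57,600,000 pixels

print(f"{wall_w} x {wall_h} = {total_pixels / 1e6:.1f} megapixels")
# prints "9600 x 6000 = 57.6 megapixels"
```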

The provided controls allow synchronized movement forward and backward in time, skipping to specific points in the video feeds, fly-through navigation, ground-level walking navigation, several bird's-eye-view navigation levels, and loading of different base environments and objects. Up to five video feeds can be played on the five rightmost LCD panels, as shown in the image.
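Keeping the 3D scene and the video feeds in step as the user moves through time amounts to mapping one global timestamp to a position in each recording. The sketch below assumes each feed is described by a known start time on a shared clock, a duration, and a constant frame rate; the `Feed` structure, the feed names, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    """One surveillance recording placed on a shared timeline (hypothetical)."""
    name: str
    start_time: float  # seconds on the global clock when recording began
    duration: float    # length of the recording in seconds
    fps: float         # assumed constant frame rate


def local_position(feed, global_time):
    """Map a global timestamp to (seconds_into_video, frame_index),
    or None if the feed has no footage at that moment."""
    offset = global_time - feed.start_time
    if offset < 0 or offset >= feed.duration:
        return None
    return offset, int(offset * feed.fps)  # offset >= 0, so int() floors


def seek_all(feeds, global_time):
    """Compute every feed's local position for one global timestamp."""
    return {f.name: local_position(f, global_time) for f in feeds}


feeds = [Feed("north", 0.0, 600.0, 30.0), Feed("lobby", 120.0, 300.0, 25.0)]
print(seek_all(feeds, 150.0))  # {'north': (150.0, 4500), 'lobby': (30.0, 750)}
```

Driving both the renderer and the players from a single global clock, rather than letting each player keep its own time, is what keeps a scrub backward or forward consistent across all views.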

As for the technical details, the cooperating labs used the A.V. Williams Building as a test ground for this research. Multiple cameras were set up on and around the building's exterior, and handheld cameras were also used to record video while walking and driving. Most of the vision-side computation was done in MATLAB and C.

Displaying real-time graphics on a 57-megapixel video wall is challenging. Our wall is driven by fourteen "nodes," so a distributed, multithreaded display strategy was crucial, and for this reason we chose the scene graph library OpenSG. OpenSG not only gave us a ready-made, OpenGL-based framework for displaying graphics on a single screen, but also provided a multithreaded, cluster-capable environment for displaying our scene across the display wall. "Client" programs ran on each of the display nodes, while a single "Server" program sent display data to every node.
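On a tiled wall, one common way to split a single view is to give each display an off-axis slice of one overall view frustum, so the tiles line up into a single image. The sketch below shows that idea for a 5 x 5 grid with bezels ignored. It is a generic illustration in plain Python, not OpenSG's API; the `Frustum` type and function names are hypothetical, and the sketch does not model how the 25 panels were distributed across the fourteen nodes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Frustum:
    """Near-plane extents in the style of glFrustum(left, right, bottom, top, near, far)."""
    left: float
    right: float
    bottom: float
    top: float
    near: float
    far: float


def wall_frustum(vfov_deg, aspect, near, far):
    """Symmetric frustum covering the whole wall."""
    top = near * math.tan(math.radians(vfov_deg) / 2)
    right = top * aspect
    return Frustum(-right, right, -top, top, near, far)


def tile_frustum(full, col, row, cols, rows):
    """Off-axis sub-frustum for tile (col, row); row 0 is the bottom row.
    Bezels are ignored, so adjacent tiles share an edge exactly."""
    w = (full.right - full.left) / cols
    h = (full.top - full.bottom) / rows
    left = full.left + col * w
    bottom = full.bottom + row * h
    return Frustum(left, left + w, bottom, bottom + h, full.near, full.far)


# Whole wall: 9600 x 6000 pixels, 60-degree vertical field of view.
full = wall_frustum(60.0, 9600 / 6000, 0.1, 1000.0)
center_tile = tile_frustum(full, 2, 2, 5, 5)
print(center_tile)  # left/right and bottom/top are (nearly) symmetric for the middle tile
```

Each node would pass its tile's six values to glFrustum, or to the equivalent off-axis camera setting in a scene graph. Accounting for bezels would mean enlarging the overall frustum to include the hidden border pixels so that objects do not appear to jump across panel edges.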

The project was developed mainly in OpenSG on Red Hat machines with NVIDIA 6- and 7-series GPUs, while 3ds Max, Blender, Photoshop, and The GIMP were used for modeling and texturing. Python scripts provided a pipe between the OpenSG visualization and the (up to) five instances of mplayer playing the surveillance videos, and Ruby scripts parsed the data received from the vision side of the project into a more graphics-friendly format.
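One plausible way to build such a pipe, sketched here as an assumption rather than the project's actual script, uses mplayer's slave mode: when started with `-slave`, mplayer reads text commands such as `seek <value> 2` (absolute seek in seconds), `pause` (toggle), and `quit` from standard input. The `MPlayerControl` class and the `launch` helper are hypothetical.

```python
import subprocess


class MPlayerControl:
    """Send slave-mode commands to one mplayer instance (hypothetical helper).

    Assumes mplayer was started with -slave, so it reads one command per
    line from standard input.
    """

    def __init__(self, stream):
        self.stream = stream  # any writable text stream, e.g. a Popen stdin

    def send(self, command):
        self.stream.write(command + "\n")
        self.stream.flush()

    def seek_absolute(self, seconds):
        # Seek type 2 means "absolute position in seconds".
        self.send(f"seek {seconds:.2f} 2")

    def toggle_pause(self):
        self.send("pause")

    def quit(self):
        self.send("quit")


def launch(video_path):
    """Start mplayer in slave mode and return (process, controller)."""
    proc = subprocess.Popen(
        ["mplayer", "-slave", "-quiet", video_path],
        stdin=subprocess.PIPE,
        text=True,
    )
    return proc, MPlayerControl(proc.stdin)
```

A coordinating script could hold up to five such controllers and call `seek_absolute` on each one whenever the user scrubs the shared timeline in the 3D view. Writing to a generic stream keeps the controller easy to test without a running player.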

The University of Maryland maintains a webpage about this project.