Stabilization of video streams using rotational and translational motion estimation

E. Ardizzone, R. Gallea, A. V. Miceli Barone, M. Morana

Abstract—We present a real-time method for stabilizing video sequences taken from an oscillating camera. We estimate camera rotation (roll) and translation on the image plane with Fourier spectrum analysis and integral projection matching, and then filter out high-frequency motion components (presumably due to unintentional oscillation) while preserving intentional motion. In this paper we refer particularly to the stabilization of video streams taken from the camera of a Sony Aibo ERS-7, a dog-like robot, as an example application, although the method is not limited to this domain.

Index Terms—Video stabilization, motion compensation, video signal processing, integral projection matching, frequency domain analysis

I. INTRODUCTION

Video streams taken from a hand-held camera or a camera mounted on a moving vehicle are often corrupted by the oscillation of the unstable mounting. These oscillations are unaesthetic when observed by a human and make any kind of automated analysis, such as object recognition and localization, difficult. For both amateur video recording and automated video analysis applications (for instance, robotics), a method for stabilizing video streams is therefore required. Such a method should be fast enough to run in real time on performance-limited hardware, such as the video signal processor of an amateur camera or a robot control computer.

Image stabilization can be divided into three phases: motion estimation, motion compensation, and image composition. The literature offers two classes of motion estimation techniques: block matching [1], [2] and feature matching [3], [4], [5]. Block matching can estimate only global image translations, not other kinds of motion such as rotation. Feature matching tries to find corresponding features between subsequent frames; it can estimate various kinds of motion, but its quality is limited by the kind of feature used. Finding useful features in a generic unknown environment can be computationally intensive, so these methods are not well suited to applications where there is no prior knowledge of the environment [5].

The method we present performs stabilization by estimating global 2D rotational and translational image motion using Fourier spectrum analysis and integral projection matching, and then filtering out the oscillation using polynomial interpolation. We do not use correspondence matching like other stabilization methods, since that approach is computationally expensive and thus requires specialized hardware to run in real time.

II. METHOD DESCRIPTION

Fig. 1. High-level dataflow diagram: input image → roll stabilization → translational stabilization → stabilized image.

A. Overview

Our stabilization system is made of two cascaded subsystems: a roll correction system and a translation correction system.

B. Roll correction system

Roll is the rotation of a vehicle (or a camera) about its own longitudinal axis. In ships, for instance, roll is generated by the waves; in a Sony Aibo it is generated by the leg motion. The roll axis of a vehicle is usually approximately orthogonal to the horizon line, so we chose this line as the reference for the correction. First, we measure the inclination of lines known to be almost parallel to the horizon line. In the scenario of an Aibo playing RoboCup soccer these are distant field lines and the top edges of the gates; in an indoor or urban outdoor scenario they are the edges of buildings, windows, walls, furniture, etc.; in a sea scenario the reference is simply the true horizon. Second, we rotate each frame by the mean measured inclination of the detected lines.

The whole processing consists of the following operations, applied to each frame (a sketch follows Fig. 2 below):
1) Extraction of a vertical gradient estimate
2) Horizontal line detection mask filtering
3) Fourier transform
4) Extraction of the Fourier spectrum principal direction
5) Spectrum thresholding for angle detection
6) Frame rotation

Fig. 2. Processing steps: (a) input frame, (b) vertical gradient, (c) horizontal segments, (d) Fourier spectrum, (e) thresholded spectrum, (f) rotated frame. The vertical gradient (b) is estimated as the point-to-point difference between each pixel and its lower neighbour; the frame is then filtered (convolved) with a horizontal line detection matrix, which enhances almost-horizontal segments of sufficient length (c). The Fourier transform is performed on this image: apart from a vertical peak, its spectrum contains an almost vertical line passing through the centre, orthogonal to the horizon line of the original image (d). The spectrum is thresholded (e) and least-squares regression is used to find the line that best fits the remaining points. The angle between this line and the vertical image axis is the mean roll angle of the original frame. We work in the frequency domain because there the line is unique and passes through the centre, whereas the original image contains multiple lines in various positions. After the angle has been estimated, the whole original image is rotated by it to compensate for the roll (f).
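To make the pipeline concrete, the following is a minimal Python/NumPy sketch of steps 1-6 under our own assumptions: the 3×3 line-detection kernel, the threshold fraction, the DC suppression, and the names estimate_roll_angle and correct_roll are illustrative choices, not values prescribed by the paper.

```python
import numpy as np
from scipy import ndimage

def estimate_roll_angle(frame, thresh=0.2):
    """Estimate the mean roll angle (radians) of a grayscale frame."""
    img = frame.astype(float)
    # 1) Vertical gradient: difference between each pixel and its lower neighbour.
    grad = np.abs(np.diff(img, axis=0))
    # 2) Enhance almost-horizontal segments (illustrative 3x3 kernel).
    kernel = np.array([[-1.0, -1.0, -1.0],
                       [ 2.0,  2.0,  2.0],
                       [-1.0, -1.0, -1.0]])
    segments = np.clip(ndimage.convolve(grad, kernel), 0.0, None)
    # 3) Fourier transform, zero frequency shifted to the image centre.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(segments)))
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    spectrum[cy, cx] = 0.0                       # suppress the DC peak
    # 4-5) Threshold the spectrum; keep coordinates relative to the centre.
    ys, xs = np.nonzero(spectrum > thresh * spectrum.max())
    ys, xs = ys - cy, xs - cx
    # Almost-horizontal image lines map to an almost-vertical spectral line
    # through the centre: fit x = slope * y by least squares and take the
    # angle from the vertical axis.
    slope = (xs * ys).sum() / max((ys * ys).sum(), 1e-9)
    return float(np.arctan(slope))

def correct_roll(frame, angle):
    # 6) Rotate the frame back by the estimated roll angle.
    return ndimage.rotate(frame, np.degrees(angle), reshape=False, mode="nearest")
```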
Note that this method works well in scenarios with a predominance of straight edges, such as indoor, urban, or sea scenes, while it does not perform well in natural or otherwise complex scenes.

C. Translation correction system

To remove translational oscillations while preserving intentional camera translations (without prior knowledge of them) we perform three steps:
1) Global motion estimation
2) Intentional motion estimation
3) Oscillation removal

Global motion estimation is done using the integral projection matching technique [1]. For each N×N frame, a summation over the rows and one over the columns are performed to obtain two separate N-element vectors. For two consecutive frames, the vectors obtained from the summation over the rows are compared by scrolling one over the other within a specified window size and computing the displacement that gives the least sum of squared differences; that displacement is the estimated vertical translation. Similarly, operating on the vectors obtained from the summation over the columns, we get the horizontal translation.

As an example, consider the two translated frames of Fig. 3. Scrolling the first column-projection vector over the second (Fig. 11) gives a minimum sum of squared differences at a displacement of 2; scrolling the row-projection vectors (Fig. 12) gives a displacement of -1. The global translation between the two frames is therefore (2, -1), as can be verified by inspecting the images directly.

Fig. 3. Two translated frames.

Fig. 11. Integral projection of the columns.

Fig. 12. Integral projection of the rows.
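As a sketch of how this matching can be implemented, the following Python function returns the displacement pair; the name integral_projection_shift and the ±10 search window are our own choices, and we compare the mean rather than the raw sum of squared differences so that displacements with shorter overlaps are not favoured.

```python
import numpy as np

def integral_projection_shift(prev, curr, window=10):
    """Estimate the global (vertical, horizontal) translation between two
    grayscale frames by integral projection matching."""
    def best_shift(p, q):
        # Slide q over p within +/-window; pick the displacement whose
        # overlapping samples have the least (mean) squared difference.
        best_ssd, best_d = None, 0
        for d in range(-window, window + 1):
            lo, hi = max(0, d), min(len(p), len(p) + d)
            diff = p[lo:hi] - q[lo - d:hi - d]
            ssd = float(np.dot(diff, diff)) / (hi - lo)
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_d = ssd, d
        return best_d

    prev, curr = prev.astype(float), curr.astype(float)
    # sum(axis=1) sums each row (one value per row): a vertical profile.
    # sum(axis=0) sums each column (one value per column): a horizontal profile.
    dy = best_shift(prev.sum(axis=1), curr.sum(axis=1))
    dx = best_shift(prev.sum(axis=0), curr.sum(axis=0))
    return dy, dx
```

Matching the two one-dimensional profiles instead of the full frames is what makes the estimation cheap enough for real-time use.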
After estimating the global motion, we need to estimate the intentional motion. To do this we fit a polynomial to the global translation values of the last K frames. With a low-order polynomial the fitted curve is smooth, so the oscillation almost disappears while the intentional motion remains; but if the order is too low, the curve does not promptly follow abrupt intentional motion changes, so a trade-off must be made. We found experimentally that second- or third-order polynomials gave the best results. The number of frames K taken into account for the fitting is also a trade-off: too small a K yields an estimate that still suffers from oscillation, while too large a K yields an estimate that follows the intentional movement with too much delay and also degrades performance. In our tests on the Sony Aibo we found 60 frames, corresponding to 4 seconds of video stream, to be a good value; other applications may call for other values.

After estimating the global and the intentional translational motion, we calculate the oscillation simply as the difference between the two, and then translate the frame to compensate for it (a sketch of both steps follows).

Fig. 4. Global motion estimation versus corrected motion.
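A minimal sketch of this smoothing step, assuming the polynomial is fitted to the accumulated translation (the camera path) one coordinate at a time; numpy.polyfit, the guard for short histories, and all names here are our own choices rather than the paper's.

```python
import numpy as np

K = 60  # sliding-window length from the Aibo tests (about 4 s of video)

def intentional_motion(history, order=2):
    """Fit a low-order polynomial to the accumulated translation values of
    the last frames and return its value at the newest frame: the smooth,
    presumed intentional component of the motion."""
    t = np.arange(len(history))
    coeffs = np.polyfit(t, history, deg=order)   # least-squares fit
    return float(np.polyval(coeffs, t[-1]))

history_y, history_x = [], []    # accumulated global translation per frame

def oscillation(acc_y, acc_x):
    """Return the (y, x) oscillation of the newest frame: the difference
    between the measured and the intentional camera position."""
    history_y.append(acc_y); history_x.append(acc_x)
    del history_y[:-K], history_x[:-K]           # keep only the last K samples
    if len(history_y) <= 3:                      # too few samples to fit reliably
        return 0, 0
    return (int(round(acc_y - intentional_motion(history_y))),
            int(round(acc_x - intentional_motion(history_x))))
```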
In general the rotated and translated frame does not fit the original rectangular bounds: some parts at the corners would remain without image data. To prevent the unaesthetic effect of "black corners" and to maintain temporal coherence between frames, we put the new frame on top of the previous one, creating a mosaic effect [6], as sketched below.

Fig. 5. Complete system block diagram. Roll stabilization: intensity image → vertical derivative → horizontal segments detection → FFT → thresholding → least-squares line computation → rotation of the full image. Translational stabilization: global motion estimation → intentional motion estimation → oscillation motion deletion → frame mosaicing → stabilized image.
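The following sketch shows one way to realize this composition for a pure integer translation; the name compensate, the windowed copy, and starting from the previous composite are our assumptions (a real implementation would also apply the roll rotation and possibly subpixel interpolation).

```python
import numpy as np

def compensate(frame, mosaic, osc_y, osc_x):
    """Shift `frame` by the estimated oscillation and paste it over the
    previous composite so uncovered borders keep the old content instead
    of turning into black corners."""
    h, w = frame.shape[:2]
    out = mosaic.copy()                          # start from the previous composite
    # Destination window of the content that stays inside the frame after
    # an integer shift by (-osc_y, -osc_x).
    y0, x0 = max(0, -osc_y), max(0, -osc_x)
    y1, x1 = min(h, h - osc_y), min(w, w - osc_x)
    out[y0:y1, x0:x1] = frame[y0 + osc_y:y1 + osc_y, x0 + osc_x:x1 + osc_x]
    return out
```

Feeding each output back in as the next call's mosaic maintains the temporal coherence described above.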
III. RESULTS

This method has been tested on video sequences taken in different environments at different resolutions. The algorithm, implemented in Matlab, processes an average of 1.5 frames per second at a resolution of 416×320. About half of the processing time is spent on the image rotation, a computationally intensive geometric transformation; an efficient implementation of the rotation would significantly speed up the whole algorithm.

In our tests we used polynomials of second and third order for translational motion compensation. Second-order polynomials produce a smooth visible motion, removing significant oscillations but not promptly following intentional motion changes; third-order polynomials follow the intentional motion promptly but do not remove all oscillations.

The number of samples considered for smoothing affects the smoothness of the output sequence as well as the complexity of the algorithm. We found values around 60 to perform well: lower values speed up the method but produce visible oscillations, while higher values produce slowly adapting motion estimates.

IV. CONCLUSION

We presented a video stream stabilization method suitable for real-time implementation. The method assumes a simple 2D rotation and translation motion model. It also assumes that the images contain straight, mostly near-horizontal edges, so it is suited to operation in artificial environments such as a building interior or a city, or at sea, where the horizon is visible. It is not suited to operation in environments with many complex and variously oriented edges, such as a forest.

The main performance bottlenecks are the image rotation and translation used to compensate for the oscillation after the estimation, so the main optimizations should be directed towards these steps. Also, for some applications, such as robotics, it may be unnecessary to perform the corrections at all, since the estimated oscillation values may be sufficient for an image processing module to perform correct analysis.

REFERENCES

[1] K. Ratakonda, "Real-time digital video stabilization for multimedia applications" (Online). Available: URL
[2] Y. M. Yeh, H. C. Chiang, and S. J. Wang, "A digital camcorder image stabilizer based on gray coded bit-plane block matching," in Proc. 13th IPPR Conf. Computer Vision, Graphics and Image Processing, Taipei, Taiwan, 2000, pp. 244-251.
[3] Z. Duric and A. Rosenfeld, "Image sequence stabilization in real time," Real-Time Imaging, vol. 2, no. 5, pp. 271-284, 1996.
[4] T. Stepleton and E. Tira-Thompson, "AIBO camera stabilization," 16720, Fall 2003.
[5] Y.-M. Liang, H.-R. Tyan, S.-L. Chang, H.-Y. M. Liao, and S.-W. Chen, "Video stabilization for a camcorder mounted on a moving vehicle," IEEE Transactions on Vehicular Technology, vol. 53, no. 6, November 2004.
[6] M. Hansen, P. Anandan, K. Dana, G. van der Wal, and P. Burt, "Real-time scene stabilization and mosaic construction," David Sarnoff Research Center, CN 5300, Princeton, NJ 08543.