Fusing appearance and motion information for action recognition on depth sequences

Pei, Cong; Jiang, Feng; Li, Mao

doi:10.3233/JIFS-200954


Article type: Research Article

Authors: Pei, Cong | Jiang, Feng* | Li, Mao

Affiliations: School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan, China

Correspondence: [*] Corresponding author. Feng Jiang, School of Computer and Information Engineering, Central South University of Forestry and Technology, No. 498 Shaoshan South Road, Changsha, Hunan, China. E-mail: jf09mail@126.com.

Abstract: With the advent of cost-efficient depth cameras, many effective feature descriptors have been proposed for action recognition from depth sequences. However, most of them are based on a single feature and are thus unable to capture action information comprehensively. For example, some feature descriptors can represent the area where the motion occurs but cannot describe the order in which the action is performed. In this paper, a new feature representation scheme combining different feature descriptors is proposed to capture various aspects of action cues simultaneously. First, a depth sequence is divided into a series of sub-sequences using a motion-energy-based spatial-temporal pyramid. For each sub-sequence, completed local binary pattern (CLBP) descriptors based on depth motion maps (DMMs) are calculated through a patch-based strategy. In parallel, each sub-sequence is partitioned into spatial grids, and polynormal descriptors are obtained for each grid sequence. Then, the sparse representation vectors of the DMM-based CLBP descriptors and the polynormals are calculated separately. After pooling, the final representation vector of the sample is generated as the input to the classifier. Finally, two different fusion strategies are applied to combine the two features. Extensive experiments on two benchmark datasets show that the proposed method outperforms each of the single-feature recognition methods.
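The abstract does not spell out how the depth motion maps are computed. A common formulation accumulates the absolute differences between consecutive depth frames, so that pixels where motion occurs receive large values. The following is a minimal sketch of that idea for a single (front) view only, under that assumption. The function name `depth_motion_map` and the nested-list frame layout are illustrative choices, not taken from the paper, and the paper's pyramid splitting, CLBP, polynormal, sparse coding, and fusion steps are not covered.

```python
def depth_motion_map(frames):
    """Accumulate absolute frame-to-frame depth differences (front view only).

    frames: list of 2D lists of depth values, all with the same height and width.
    Returns a 2D list of the same size. This is an assumed, simplified
    formulation, not necessarily the exact one used in the paper.
    """
    if len(frames) < 2:
        raise ValueError("need at least two depth frames")

    height, width = len(frames[0]), len(frames[0][0])
    dmm = [[0] * width for _ in range(height)]

    # Walk over consecutive frame pairs and sum per-pixel absolute differences.
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                dmm[y][x] += abs(curr[y][x] - prev[y][x])
    return dmm


# Toy 2x2 "depth sequence" with three frames (hypothetical values).
toy_frames = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 2]],
    [[1, 3], [0, 0]],
]
toy_dmm = depth_motion_map(toy_frames)
```

Because the map sums differences over the whole sub-sequence, it records where motion happened but not in which order, which is the limitation the abstract attributes to single-feature descriptors.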

Keywords: Action recognition, feature fusion, depth motion maps, completed local binary pattern, polynormal


Journal: Journal of Intelligent & Fuzzy Systems, vol. 40, no. 3, pp. 4287-4299, 2021

Published: 02 March 2021

