| English Abstract |
Salient object detection (SOD) originates from cognitive studies of human visual attention, i.e., the remarkable ability of the human visual system to quickly orient attention to the most informative parts of a visual scene and ignore the rest. SOD is thus instrumental to a wide range of real-world applications, e.g., autonomous driving, robotic interaction, video segmentation, video captioning, and video compression. Beyond its academic value and practical significance, SOD is difficult because of the challenges carried by video data (e.g., occlusions, blur, large object deformations, diverse motion patterns) and the inherent complexity of human visual attention behavior (i.e., selective attention allocation, attention shift) in dynamic scenes. Owing to the limitations of acquisition devices, the early salient object detection datasets do not represent real scenes well. Moreover, the evaluation metrics in this field ignore the properties of the human visual system and are all based on pixel-level error. These problems have seriously restricted the development of salient object detection technology. Grounded in cognitive theory, this dissertation focuses on image and video salient object detection, with research directions covering dataset collection, model design, and the design of evaluation metrics. The major contributions of the dissertation are: 1. My analysis identifies several serious data biases in existing SOD datasets. I built a new SOD dataset, called SOC, which contains diverse contexts from realistic environments. A set of attributes (e.g., Appearance Change) is then proposed to gain deeper insight into the SOD problem. I also present the largest-scale performance evaluation to date of CNN-based SOD models.
2. To further advance research on the saliency-shift issue, I carefully collected a high-quality Densely Annotated Video Salient Object Detection (DAVSOD) dataset. The proposed SSAV model outperforms other top competitors on five large-scale datasets. To provide the community with a complete, largest-scale benchmark, I also systematically assess several representative video salient object detection algorithms. 3. To address the problem of evaluating non-binary maps, I propose a structural-similarity-based SOD measure, called S-measure. Rather than relying on pixel-wise error, the new measure is based on structural similarity. In particular, its consistency with human judgment improves from 23% to 77%. 4. I propose a novel and effective Enhanced-alignment measure (E-measure) for binary salient object detection maps. The motivation comes from cognitive vision studies, which have shown that human vision is highly sensitive to both global information and local details in scenes. The new measure improves on other popular measures by up to 19% in terms of specific meta-measures. Key Words: salient object detection (SOD), evaluation metric, dataset, video saliency, image saliency
|