Cognitive psychology has long pointed out that an important part of human knowledge and memory is visual knowledge, which is used for thinking in images. Therefore, vision-based artificial intelligence (AI) is an unavoidable topic for AI and one of great significance. Following the article "On visual knowledge," this paper discusses five basic problems related to it: (1) visual knowledge representation; (2) visual recognition; (3) simulation of visual thinking in images; (4) learning of visual knowledge; (5) multiple knowledge representation. The unique advantages of visual knowledge are its abilities of comprehensive image generation, spatiotemporal evolution, and image display. These are exactly what symbolic knowledge and deep neural networks lack. Combining AI with the technologies of computer-aided design, computer graphics, and computer vision will provide an important foundational driving force for new developments of AI in creation, prediction, and human-machine integration. Research on visual knowledge and multiple knowledge representation is the key to developing new visual intelligence, and a key theory and technology for promoting important breakthroughs in AI 2.0. This is a barren, cold, and wet yet fertile "Great Northern Wilderness," and also a promising "no man's land" worth exploring boldly through multidisciplinary collaboration.

Yun-he Pan et al.

To boost research into cognition-level visual understanding, i.e., making an accurate inference based on a thorough understanding of visual details, visual commonsense reasoning (VCR) has been proposed. Compared with traditional visual question answering, which requires models to select only the correct answers, VCR requires models to select not only the correct answers but also the correct rationales. Recent research into human cognition has indicated that brain function, or cognition, can be considered a global and dynamic integration of local neuron connectivity, which is helpful in solving specific cognition tasks. Inspired by this idea, we propose a network that achieves VCR by dynamically reorganizing visual neuron connectivity that is contextualized by the meaning of questions and answers, and by leveraging directional information to enhance the reasoning ability. Specifically, we first develop a GraphVLAD module to capture visual neuron connectivity and fully model visual content correlations. Then, a contextualization process is proposed to fuse sentence representations with visual neuron representations. Finally, based on the contextualized connectivity, we infer answers and rationales through a reasoning process that includes a ReasonVLAD module. Experimental results on the VCR dataset and a visualization analysis demonstrate the effectiveness of our method.
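The VLAD-style aggregation underlying modules such as GraphVLAD can be sketched, in much simplified form, as soft-assignment pooling of local visual features against a set of cluster centers. The function below is an illustrative assumption, not the paper's implementation; `alpha` and the normalization scheme are conventional choices from NetVLAD-like designs:

```python
import numpy as np

def vlad_aggregate(features, centers, alpha=10.0):
    """VLAD-style pooling with soft assignment: accumulate residuals of
    local features to cluster centers, weighted by assignment scores.

    features: (N, D) local visual descriptors
    centers:  (K, D) cluster centers
    returns:  (K * D,) L2-normalized aggregated descriptor
    """
    # Soft assignment: softmax over negative squared distances to centers.
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)            # (N, K)

    # Weighted residuals, summed over all local features.
    residuals = features[:, None, :] - centers[None, :, :]  # (N, K, D)
    vlad = (a[:, :, None] * residuals).sum(axis=0)          # (K, D)

    # Intra-normalize per cluster, then L2-normalize the flattened vector.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    out = vlad.ravel()
    return out / (np.linalg.norm(out) + 1e-12)
```

In the actual model this pooling operates on graph-structured visual features rather than a flat set, but the residual-plus-soft-assignment idea is the same.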

Yahong Han, Aming Wu, et al.
Object detection is one of the hottest research directions in computer vision; it has already made impressive progress in academia and has many valuable applications in industry. However, mainstream detection methods still have two shortcomings: (1) even a model well trained on large amounts of data generally cannot be used across different kinds of scenes; (2) once a model is deployed, it cannot autonomously evolve along with the accumulated unlabeled scene data. To address these problems, we propose a novel scene-adaptive evolution algorithm that can decrease the impact of scene changes through the concept of object groups. We first extract a large number of object proposals from unlabeled data using a pre-trained detection model. Second, we build a dictionary of object concepts by clustering the proposals, in which each cluster center represents an object prototype. Third, we examine the relations between different clusters and the object information of different groups, and propose a graph-based group-information propagation strategy to determine the category of an object concept, which can effectively distinguish positive and negative proposals. With these pseudo labels, we can easily fine-tune the pre-trained model. The effectiveness of the proposed method is verified through different experiments, and significant improvements are achieved.
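The dictionary-building and label-propagation steps described above can be sketched roughly as k-means prototypes plus label spreading on a prototype similarity graph. Everything below is an illustrative assumption (function names, `sigma`, the Gaussian affinity), not the authors' code:

```python
import numpy as np

def build_prototypes(proposal_feats, k, iters=20, seed=0):
    """Cluster object-proposal features with plain k-means; each cluster
    center serves as an object prototype in the concept dictionary."""
    rng = np.random.default_rng(seed)
    centers = proposal_feats[rng.choice(len(proposal_feats), k, replace=False)]
    for _ in range(iters):
        d = ((proposal_feats[:, None] - centers[None]) ** 2).sum(-1)  # (N, k)
        assign = d.argmin(1)
        for j in range(k):
            pts = proposal_feats[assign == j]
            if len(pts):
                centers[j] = pts.mean(0)  # empty clusters keep their center
    return centers, assign

def propagate_labels(centers, seed_labels, sigma=1.0, steps=50):
    """Spread category labels over a similarity graph of prototypes.

    seed_labels: (k, n_classes) one-hot rows for labeled prototypes,
    all-zero rows for unlabeled ones. Seeded rows are clamped each step.
    Returns the predicted class index per prototype.
    """
    d2 = ((centers[:, None] - centers[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))  # Gaussian affinity
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(1, keepdims=True)     # row-stochastic propagation matrix
    F = seed_labels.astype(float).copy()
    mask = F.sum(1) > 0
    for _ in range(steps):
        F = P @ F
        F[mask] = seed_labels[mask]     # clamp the seeded prototypes
    return F.argmax(1)
```

The resulting per-prototype categories then serve as pseudo labels for the proposals assigned to each cluster, which is what makes fine-tuning the pre-trained detector on unlabeled scene data possible.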

Shiliang Pu, Wei Zhao, et al.
