Authors
Shinichi Goto and Terumasa Aoki, Tohoku University, Japan
Abstract
This work proposes a novel system for Violent Scenes Detection based on combining visual and audio features with machine learning at the segment level. Multiple Kernel Learning is applied to exploit the multimodality of videos as fully as possible. In particular, Mid-level Violence Clustering is proposed so that mid-level concepts can be learned implicitly, without using manually tagged annotations. Finally, a violence score is calculated for each shot. The whole system is trained on a dataset from the MediaEval 2013 Affect Task and evaluated by its official metric. The obtained results outperform the best score reported for that task.
Keywords
Multimedia Analysis, Video Processing, Machine Learning