Object category detection using audio-visual cues

Luo, Jie; Caputo, Barbara; Zweig, Alon; Bach, Jörg Hendrik; Anemüller, Jörn

doi:10.1007/978-3-540-79547-6_52

Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper we propose a multimodal approach to object category detection, using audio and visual information. The auditory channel is modeled on biologically motivated spectral features via a discriminative classifier. The visual channel is modeled by a state-of-the-art part-based model. Multimodality is achieved using two fusion schemes, one high-level and the other low-level. Experiments on six different object categories, under increasingly difficult conditions, show the strengths and weaknesses of the two approaches, and clearly underline the open challenges for multimodal category detection. © 2008 Springer-Verlag Berlin Heidelberg.
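The abstract does not say how the two fusion schemes are implemented. As a rough illustration of the general distinction only, the sketch below assumes that low-level fusion means concatenating per-modality feature vectors before a single classifier, and that high-level fusion means combining per-modality classifier scores with a weighted sum. The function names, the `audio_weight` parameter, and the weighted-sum rule are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def low_level_fusion(audio_features: Sequence[float],
                     visual_features: Sequence[float]) -> List[float]:
    """Feature-level (low-level) fusion, assumed here to be concatenation.

    The joint vector would then be passed to one classifier (not shown).
    """
    return list(audio_features) + list(visual_features)


def high_level_fusion(audio_score: float,
                      visual_score: float,
                      audio_weight: float = 0.5) -> float:
    """Decision-level (high-level) fusion, assumed here to be a weighted sum.

    Each modality is scored by its own classifier first; only the scores
    are combined. `audio_weight` is a hypothetical mixing parameter.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_score + (1.0 - audio_weight) * visual_score


if __name__ == "__main__":
    # Toy values only; these are not data from the paper.
    joint = low_level_fusion([0.1, 0.2], [0.3])
    fused = high_level_fusion(audio_score=1.0, visual_score=-1.0)
    print(joint, fused)
```

In this reading, low-level fusion lets a single classifier exploit correlations between the modalities, while high-level fusion keeps the channels independent until the final decision, which can help when one channel is degraded.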

Object category detection using audio-visual cues / Luo, Jie; Caputo, Barbara; Zweig, Alon; Bach, Jörg Hendrik; Anemüller, Jörn. - PRINT. - 5008:(2008), pp. 539-548. (6th International Conference on Computer Vision Systems, ICVS 2008, Santorini, Greece, 12-15 May 2008) [10.1007/978-3-540-79547-6_52].



Year of publication: 2008
Conference name: 6th International Conference on Computer Vision Systems, ICVS 2008
Keywords: Audio-visual Fusion; Multimodal Recognition; Object Categorization; Computer Science (all); Biochemistry, Genetics and Molecular Biology (all); Theoretical Computer Science
Type: 04 Publication in conference proceedings::04b Conference paper in volume


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/951717
