Multimodal speaker recognition in a conversation scenario