In this work, we show how to co-train a classifier for active speaker detection using audio-visual data. First, audio Voice Activity Detection (VAD) is used to train a personalized video-based active speaker classifier in a weakly supervised fashion. The video classifier is in turn used to train a voice model for each person. The individual voice models are then used to detect active speakers. There is no manual supervision: audio weakly supervises video classification, and the co-training loop is completed by using the trained video classifier to supervise the training of a personalized audio voice classifier.
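The co-training loop described above can be illustrated with a toy sketch. This is not the method's actual implementation: the synthetic one-dimensional features (`energy`, `motion`, `voice`), the nearest-mean classifiers, and every function name are assumptions chosen only to make the flow of supervision concrete. An energy-threshold VAD weakly labels video frames, the resulting per-person video classifiers label audio frames, and the per-person voice models then detect the active speaker from audio alone.

```python
import random
from statistics import mean

PEOPLE = (0, 1)  # two hypothetical participants

def make_frames(n=400, seed=0):
    """Synthetic frames. 'speaker' is ground truth, used only for evaluation."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n):
        speaker = rng.choice([None, 0, 1])
        frames.append({
            "speaker": speaker,
            # Audio energy: high whenever anyone speaks.
            "energy": rng.gauss(0.0 if speaker is None else 1.0, 0.1),
            # Per-person mouth motion: high only for the person speaking.
            "motion": [rng.gauss(1.0 if speaker == p else 0.0, 0.1) for p in PEOPLE],
            # Voice feature: differs between the two speakers.
            "voice": rng.gauss({None: 0.0, 0: -2.0, 1: 2.0}[speaker], 0.3),
        })
    return frames

def vad(frame, threshold=0.5):
    """Toy audio VAD: says that someone speaks, not who."""
    return frame["energy"] > threshold

def train_video_classifiers(frames):
    """Step 1: VAD weakly supervises one video classifier per person.

    The positive labels are noisy, since VAD-positive frames include frames
    where the other person is talking. Each classifier is a threshold halfway
    between the mean motion in VAD-speech and VAD-silence frames.
    """
    speech = [f for f in frames if vad(f)]
    silence = [f for f in frames if not vad(f)]
    thresholds = {}
    for p in PEOPLE:
        pos = mean(f["motion"][p] for f in speech)
        neg = mean(f["motion"][p] for f in silence)
        thresholds[p] = (pos + neg) / 2
    return thresholds

def video_active(frame, thresholds, p):
    return frame["motion"][p] > thresholds[p]

def train_voice_models(frames, thresholds):
    """Step 2: the video classifiers supervise one voice model per person."""
    return {
        p: mean(f["voice"] for f in frames
                if vad(f) and video_active(f, thresholds, p))
        for p in PEOPLE
    }

def detect_speaker(frame, voice_models):
    """Step 3: audio-only detection using the personalized voice models."""
    if not vad(frame):
        return None
    return min(voice_models, key=lambda p: abs(frame["voice"] - voice_models[p]))

def run_pipeline(seed=0):
    frames = make_frames(seed=seed)
    thresholds = train_video_classifiers(frames)
    voice_models = train_voice_models(frames, thresholds)
    correct = sum(detect_speaker(f, voice_models) == f["speaker"] for f in frames)
    return correct / len(frames)

if __name__ == "__main__":
    print(f"frame-level accuracy on synthetic data: {run_pipeline():.3f}")
```

No ground-truth speaker label is used during training in this sketch; the `speaker` field is read only when scoring the final detections, which mirrors the absence of manual supervision in the described approach.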