Because of the limited scope of this project, I was not able to make much use of the testing data provided in the LFPW data set. With the keypoint and HoG code that I have written, however, it will be straightforward to improve the performance of the Poselet classifiers by bootstrapping them. In this context, bootstrapping means running each Poselet classifier on a subset of the testing images and collecting every location where the classifier fires, that is, reports a detection. Since the ground truth is labeled in the testing annotations CSV file, I can immediately identify false positive locations and add the HoG feature vectors from those locations to the training data as hard negatives. Since all of the training data is stored as it is extracted from the images, this process only incurs the cost of computing features on the test data.
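The bootstrapping step described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the box format, the helper names `iou` and `mine_hard_negatives`, and the use of an intersection-over-union threshold to decide whether a detection matches the ground truth are all assumptions introduced for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def mine_hard_negatives(
    detections: Sequence[Box],
    features: Sequence[Sequence[float]],
    truth_boxes: Sequence[Box],
    iou_threshold: float = 0.5,  # assumed matching criterion
) -> List[Sequence[float]]:
    """Return the feature vectors of detections that match no ground-truth box.

    `features[i]` is assumed to be the HoG vector computed at `detections[i]`.
    """
    hard_negatives = []
    for box, feature in zip(detections, features):
        if all(iou(box, truth) < iou_threshold for truth in truth_boxes):
            hard_negatives.append(feature)  # false positive: keep as a hard negative
    return hard_negatives


# Toy example: the first detection overlaps the ground truth, the second does not.
detections = [(0, 0, 10, 10), (100, 100, 10, 10)]
features = [[0.1], [0.9]]
truth_boxes = [(1, 1, 10, 10)]
print(mine_hard_negatives(detections, features, truth_boxes))  # [[0.9]]
```

The returned vectors would be appended to the stored negative training set before the classifier is retrained.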
The Poselet paradigm intentionally seeks to train an overcomplete set of Poselet classifiers and then select an optimal subset of these for practical use. In , the authors propose a simple greedy procedure that selects the Poselet that covers the most distinct keypoints not yet covered, removes that Poselet from consideration, and repeats. Though this idea is effective, it can be improved upon by using context and domain-specific knowledge about the type of object being detected. In my case, the annotated keypoints will give conditional probabilities of the locations of parts of faces with respect to the parts belonging to a Poselet. Using these conditional distributions, one should be able to select subsets of the Poselet classifiers that are optimal for specific tasks, like tracking parts of a face through a video or highlighting places where occlusion might be a major factor.
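For concreteness, the greedy procedure as summarized above can be written in a few lines. This is a sketch of the selection rule as stated in this section only; the function name `greedy_select`, the dictionary representation of coverage, and the alphabetical tie-breaking are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set


def greedy_select(
    coverage: Dict[str, Set[str]], limit: Optional[int] = None
) -> List[str]:
    """Greedily pick Poselets by the number of not-yet-covered keypoints they add.

    `coverage` maps a (hypothetical) Poselet name to the set of keypoints it covers.
    Ties are broken alphabetically so the result is deterministic.
    """
    uncovered: Set[str] = set().union(*coverage.values())
    remaining = dict(coverage)
    chosen: List[str] = []
    while uncovered and remaining and (limit is None or len(chosen) < limit):
        best = max(sorted(remaining), key=lambda name: len(remaining[name] & uncovered))
        gain = remaining[best] & uncovered
        if not gain:  # no remaining Poselet adds new keypoints
            break
        chosen.append(best)
        uncovered -= gain
        del remaining[best]  # remove the chosen Poselet from consideration
    return chosen


# Toy example with made-up Poselet and keypoint names.
coverage = {
    "left_eye": {"k1", "k2", "k3"},
    "nose": {"k3", "k4"},
    "mouth": {"k4", "k5", "k6", "k7"},
    "brow": {"k1", "k2"},
}
print(greedy_select(coverage))  # ['mouth', 'left_eye']
```

A task-specific variant could replace the set-size score with a weight derived from the conditional keypoint distributions, leaving the loop unchanged.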
The main cost of the Poselet pipeline is the human effort required to collect thousands of manually labeled keypoints. Any time a new object class comes under test, the types of keypoints change completely and all old annotations become virtually useless. Since it is easier to speed up the computations performed using annotations than it is to acquire these annotations, one might wonder whether we can do without the annotations altogether. To explore this, I want to use some ideas from image topology to construct unsupervised image keypoint features. Recent work has shown that the associated Reeb tree of a 3D Morse function satisfies certain useful invariance and classification properties. As a proxy for the Reeb tree, one could instead use the maxima of curvature in image patches. This would yield a nice "unsupervised" Poselet pipeline.
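One possible reading of "maxima of curvature" is sketched below. It treats a grayscale patch as a height surface, computes its Gaussian curvature with finite differences, and keeps pixels that are strict local maxima. The choice of Gaussian curvature, the 8-neighbour test, the threshold, and the function names are all assumptions for illustration; the text above does not fix any of them.

```python
import numpy as np


def curvature_map(image):
    """Gaussian curvature of the intensity surface z = I(x, y), via finite differences."""
    img = np.asarray(image, dtype=float)
    iy, ix = np.gradient(img)      # derivatives along rows (y) and columns (x)
    iyy, iyx = np.gradient(iy)
    ixy, ixx = np.gradient(ix)
    return (ixx * iyy - ixy * iyx) / (1.0 + ix ** 2 + iy ** 2) ** 2


def local_maxima(values, threshold=0.0):
    """Return (row, col) of interior pixels above `threshold` and above all 8 neighbours."""
    v = np.asarray(values, dtype=float)
    rows, cols = v.shape
    center = v[1:-1, 1:-1]
    is_max = center > threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = v[1 + dr : rows - 1 + dr, 1 + dc : cols - 1 + dc]
            is_max &= center > neighbour
    found_rows, found_cols = np.nonzero(is_max)
    return [(int(r) + 1, int(c) + 1) for r, c in zip(found_rows, found_cols)]


# Toy patch: a single smooth bump centred at row 10, column 10.
yy, xx = np.mgrid[0:21, 0:21]
bump = np.exp(-((xx - 10) ** 2 + (yy - 10) ** 2) / (2 * 3.0 ** 2))
keypoints = local_maxima(curvature_map(bump), threshold=1e-3)
print(keypoints)
```

On real images the patch would normally be smoothed first, since second derivatives amplify noise.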