Serial Feature Extraction
The Histogram of Oriented Gradient Feature
The Histogram of Oriented Gradients (HoG) is a common feature vector used in computer vision for detection and recognition. The strategy is to take a small patch of an image, divide it into coarse sub-regions, and compute the angle and magnitude of the gradient at every pixel within each sub-region. One defines a fixed number of angular bins (usually between 9 and 15) and divides either the whole circle (360 degrees) or the half circle (180 degrees) into these bins. For each sub-region of the image patch, every pixel adds its gradient magnitude to the histogram bin selected by the angle of the gradient at that pixel. See the figure below for an illustration.
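The binning step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation from the software page: the function name `gradient_bins` and its parameters `n_bins` and `signed` are hypothetical, and `np.gradient` is used as one possible gradient operator.

```python
import numpy as np

def gradient_bins(patch, n_bins=9, signed=False):
    """Return per-pixel gradient magnitude and angular bin index (hypothetical helper)."""
    patch = np.asarray(patch, dtype=float)
    # np.gradient returns the derivative along axis 0 (rows, y) first, then axis 1 (columns, x).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    # signed=True spreads the bins over the whole circle, signed=False over the half circle.
    span = 360.0 if signed else 180.0
    angle = np.degrees(np.arctan2(gy, gx)) % span
    # Map each angle to a bin; np.minimum guards against angle == span caused by rounding.
    bins = np.minimum((angle / span * n_bins).astype(int), n_bins - 1)
    return magnitude, bins

# A horizontal ramp: intensity grows left to right, so every gradient points along +x.
ramp = np.tile(np.arange(8.0), (8, 1))
mag, bins = gradient_bins(ramp, n_bins=9)
# Each pixel adds its gradient magnitude to the bin chosen by its angle.
hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=9)
```

For the ramp, all of the gradient magnitude lands in the first bin, because every gradient has an angle of 0 degrees.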
The above image shows an original image patch on the left. The center and right images show the computed gradients at each location, with brightness proportional to the gradient magnitude at that location. Each quantized gradient location is represented by a line pointing in the direction of the gradient at that pixel. HoG works by subdividing the image into regions and building a histogram of the gradient angles in each region. Each pixel's contribution to its angular bin is its gradient magnitude, and the histogram vectors from the different regions are concatenated to form the final feature vector.
Vanilla Serial Implementation
In the vanilla serial implementation, all of the relevant quantities are computed with vectorized NumPy operations on the entire image patch, and logical (boolean) indexing selects the pixels that belong to each region and bin. No edge detection is used. You can find my implementation on the software page.
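As a rough idea of what such a logical-indexing approach might look like, here is a minimal sketch. It is not the code from the software page: the function name `hog_logical_indexing`, the `grid` and `n_bins` parameters, the half-circle binning, and the absence of any normalization are all assumptions made for illustration.

```python
import numpy as np

def hog_logical_indexing(patch, grid=(2, 2), n_bins=9):
    """Concatenated per-region angular histograms built with boolean masks (sketch)."""
    patch = np.asarray(patch, dtype=float)
    # Gradients, magnitudes, and bin indices are computed once for the whole patch.
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_index = np.minimum((angle / 180.0 * n_bins).astype(int), n_bins - 1)

    # Label every pixel with the index of the sub-region it falls in.
    rows, cols = np.indices(patch.shape)
    region_row = rows * grid[0] // patch.shape[0]
    region_col = cols * grid[1] // patch.shape[1]
    region_index = region_row * grid[1] + region_col

    feature = np.zeros(grid[0] * grid[1] * n_bins)
    for region in range(grid[0] * grid[1]):
        for b in range(n_bins):
            # Boolean mask: pixels in this region whose gradient falls in bin b.
            mask = (region_index == region) & (bin_index == b)
            feature[region * n_bins + b] = magnitude[mask].sum()
    return feature

ramp = np.tile(np.arange(8.0), (8, 1))
feature = hog_logical_indexing(ramp)
```

The loops run only over regions and bins, so the per-pixel work stays inside NumPy. With a 2 x 2 grid and 9 bins the feature vector has 36 entries.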
Python pHOG Implementation
A pyramidal variant of the HoG descriptor (pHoG) was developed to make it invariant to object scale and less sensitive to spatial binning. Most of the Matlab-specific functions used in the original Matlab code are also found in NumPy. Canny edge detection is not found in NumPy, but it is available in the scikits.image package (since renamed scikit-image). By combining these Python packages, I was able to replicate the Matlab code in my own Python version of pHoG, which can be found on the software page.
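The pyramid idea can be sketched as follows: compute the angular histograms on successively finer grids (1 x 1, 2 x 2, 4 x 4, ...) and concatenate everything. This is a simplified illustration, not a port of the Matlab code. The name `phog_sketch`, the `levels` and `edge_mask` parameters, and the final sum normalization are assumptions. The optional `edge_mask` stands in for the output of an edge detector such as Canny, which is not computed here.

```python
import numpy as np

def phog_sketch(patch, levels=2, n_bins=9, edge_mask=None):
    """Concatenate angular histograms over a pyramid of grids (illustrative only)."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    if edge_mask is not None:
        # Keep only the gradients at pixels flagged as edges.
        magnitude = magnitude * np.asarray(edge_mask, dtype=bool)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_index = np.minimum((angle / 180.0 * n_bins).astype(int), n_bins - 1)

    parts = []
    for level in range(levels + 1):
        cells = 2 ** level  # level 0 is the whole patch, level 1 a 2 x 2 grid, and so on
        row_edges = np.linspace(0, patch.shape[0], cells + 1).astype(int)
        col_edges = np.linspace(0, patch.shape[1], cells + 1).astype(int)
        for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
            for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
                parts.append(np.bincount(
                    bin_index[r0:r1, c0:c1].ravel(),
                    weights=magnitude[r0:r1, c0:c1].ravel(),
                    minlength=n_bins))
    feature = np.concatenate(parts)
    total = feature.sum()
    # Normalizing by the sum is a choice made for this sketch.
    return feature / total if total > 0 else feature

rng = np.random.default_rng(0)
feature = phog_sketch(rng.random((32, 32)), levels=2)
```

With `levels=2` and 9 bins the descriptor has (1 + 4 + 16) x 9 = 189 entries. The coarse levels capture the overall distribution of edge orientations and the fine levels capture where in the patch they occur.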
Issues with Serial Implementations
Most HoG implementations are patch-based: the user supplies a region of interest to the code, and the histogram feature is computed for that region. One popular version, by Dalal and Triggs, is instead based on key points. The user supplies a list of key point locations in an image, and the program enlarges a window around each key point and computes the HoG descriptor for that window. The C implementation by Dalal and Triggs is highly optimized and thus much faster than my homemade Python functions. However, it was very difficult to make my problem conform to the key point style used by the Dalal and Triggs program. I therefore chose not to perform timing analysis with the Dalal and Triggs code, and instead compared GPU versions of the feature extractor only with my own Python versions.
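To make the difference between the two interfaces concrete, the sketch below turns a list of key points into patches that a patch-based extractor could consume. It does not reproduce the Dalal and Triggs program. The helper `window_around`, the `half_size` parameter, and the choice to clip windows at the image border are assumptions made for illustration.

```python
import numpy as np

def window_around(image, keypoint, half_size=8):
    """Return the square window centred on (row, col), clipped to the image bounds."""
    row, col = keypoint
    r0, r1 = max(row - half_size, 0), min(row + half_size, image.shape[0])
    c0, c1 = max(col - half_size, 0), min(col + half_size, image.shape[1])
    return image[r0:r1, c0:c1]

image = np.arange(32 * 32, dtype=float).reshape(32, 32)
keypoints = [(16, 16), (2, 30)]  # (row, col) locations supplied by the user
patches = [window_around(image, kp) for kp in keypoints]
```

A key point in the interior yields a full 16 x 16 window, while one near a corner yields a smaller, clipped window. A patch-based descriptor with a fixed grid still produces a vector of the same length for both.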