- Feature Extraction
- Conventional Feature Extraction Techniques
- Convolution over SIFT and HOG
- What is a feature (in an image)?
Features are various forms of information that can be gained from an image.
For example: Fig 1 has various features, such as shape, size, color, edges, and background.
These features play a crucial role in helping perform several tasks in computer vision like
image classification, object detection, scene detection etc.
- The features needed depend on the computer vision task at hand, and a given
task might not require all the features available in an image.
For example: We can still identify that Fig 2 is an image of a duck (even though it is harder to do so than for Fig 1) without necessarily knowing the color or background of the image, but only from looking at the edges, the presence of water and the shape of the bird inside the image.
- Hence, we need methods to extract just the essential information contained in images and ignore the rest depending on different tasks. This step is called feature extraction.
- The features we obtain from feature extraction are crucial in enhancing the
- Feature extraction methods convert images into feature
vectors of a fixed size.
- Some of the most important feature extraction techniques to have been used
with images are:
○ HOG (Histogram of Oriented Gradients)
○ SIFT (Scale Invariant Feature Transform)
○ Convolution Operations
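To illustrate what "a feature vector of a fixed size" means, here is a minimal sketch in plain Python. It uses a simple intensity histogram, which is not one of the techniques listed above; it is chosen only because it is short. The function name `intensity_histogram`, the bin count, and the two sample images are assumptions made for this example.

```python
def intensity_histogram(image, bins=8, max_value=255):
    """Return a normalized histogram of pixel intensities with `bins` entries."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            # Map the intensity range [0, max_value] onto bin indices [0, bins - 1].
            index = min(pixel * bins // (max_value + 1), bins - 1)
            counts[index] += 1
            total += 1
    return [count / total for count in counts]


# Two hypothetical grayscale images of different sizes (2x2 and 7x10).
small = [[0, 64], [128, 255]]
large = [[(x * y) % 256 for x in range(10)] for y in range(7)]

small_features = intensity_histogram(small)
large_features = intensity_histogram(large)

# Both images map to a vector of the same length, regardless of image size.
print(len(small_features), len(large_features))
```

Whatever the input resolution, the output always has `bins` entries, which is what lets downstream models consume images of varying sizes.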
Conventional Feature Extraction Techniques
Historically, the two most important manual feature extraction techniques have been:
- HOG (Histogram of Oriented Gradients)
- SIFT (Scale Invariant Feature Transform)
| HOG | SIFT |
|---|---|
| HOG mainly focuses on the structure and shape of an object. | In SIFT, image content is converted into local feature coordinates that are not affected by rotation, scaling, or other image manipulations. |
| HOG differs from plain edge detection in that it also captures the magnitude and direction of edges in the image. | SIFT assists in locating the local features of an image, often known as key points. |
| HOG calculates the magnitude and direction of edges in each region. The gradient measures how pixel values change: its orientation is the direction of that change and its magnitude is the strength. | The key points obtained from SIFT can be utilised for picture matching, object detection, scene detection, and other computer vision tasks. |
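The HOG idea of recording edge magnitude and direction per region can be sketched as follows. This is a simplified illustration in plain Python, not a full HOG implementation: it skips the vote interpolation and block normalization used in practice, and the function name `cell_histogram`, the 9-bin layout, and the 8x8 sample cell are assumptions made for this example.

```python
import math


def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    height, width = len(cell), len(cell[0])
    bin_width = 180.0 / bins
    hist = [0.0] * bins
    # Skip the border so that central differences stay inside the cell.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal change
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical change
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard-assign the vote to a single bin (a simplification).
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Hypothetical 8x8 cell with a vertical edge: dark left half, bright right half.
cell = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(8)]
hist = cell_histogram(cell)
print(hist)
```

For this cell all of the gradient energy falls into the first bin (orientations near 0 degrees), because pixel values change only along the horizontal direction.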
- Why was SIFT preferred over HOG?
+ SIFT features, as opposed to HOG features, have the advantage of being unaffected
by the image’s size or orientation.
+ SIFT has a better accuracy than HOG for detecting features in an image.
+ HOG is not scale and rotation invariant, whereas SIFT shows those properties.
- However, both SIFT and HOG showed certain disadvantages in the efforts to apply them as
general feature extraction techniques for all images:
+ Both SIFT and HOG are quite slow and computationally expensive.
+ Their inner workings are also mathematically complex.
+ In addition, HOG does not work well with lighting changes and blurring in the images.
Convolution over SIFT and HOG
- Convolution is a specialized linear operation on an image that offers an efficient way of extracting image features and reducing the dimensions of an image.
- A convolution layer consists of a set of filters (also called kernels), each of which performs a convolution operation on the image.
- We use multiple filters to perform convolution operations on an image, and try to extract various kinds of features (pertaining to each kind of filter) from a single image.
For example: Let’s assume that we need to use convolutions to extract features from this image of a brick wall.
● We may be interested in each of the following feature extraction tasks:
+ to extract all the vertical edges from the image
+ to extract all the horizontal edges from the image
+ to blur/sharpen the image
+ to highlight/focus on certain regions of the image
● As the examples below show, convolution filters can perform each of these feature extraction tasks.
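The filtering tasks listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions: `convolve2d` is a hypothetical helper written for this example, it computes the sliding-window product without flipping the kernel (cross-correlation, the convention most deep learning libraries use under the name "convolution"), and the Sobel and box-blur kernels and the tiny sample image are choices made here, not taken from the slides.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` with no padding and return the filtered output."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Each filter responds to a different kind of feature.
VERTICAL_EDGE = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # Sobel, vertical edges
HORIZONTAL_EDGE = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # Sobel, horizontal edges
BOX_BLUR = [[1 / 9] * 3 for _ in range(3)]              # local averaging

# Hypothetical 5x6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]

vertical = convolve2d(image, VERTICAL_EDGE)
horizontal = convolve2d(image, HORIZONTAL_EDGE)
blurred = convolve2d(image, BOX_BLUR)

print(vertical)    # strong response where the dark/bright boundary lies
print(horizontal)  # no response: the image has no horizontal edges
```

Note that the 5x6 input becomes a 3x4 output, which is the dimension reduction mentioned earlier: without padding, each 3x3 filter trims one pixel from every border.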
Why is Convolution better than SIFT and HOG?
● Convolutional feature detectors are highly trainable and adaptable, allowing them to achieve higher accuracy levels in comparison to SIFT and HOG for the task at hand.
● Convolutions learn the low-level features of an image more effectively than SIFT and HOG, and they do so without the overhead of the hand-coded feature engineering that SIFT and HOG usually require.
● Apart from learning low-level features, hierarchical combinations of convolutions are quite effective in learning important high-level features as well.
Example: For images of human faces, deeper convolutional layers can learn to detect more complex shapes such as the eyes, the ears, the nose or the mouth.
● In 2012, AlexNet, a Convolutional Neural Network (CNN) architecture based fundamentally on the principle of convolutional filters, handily won the famous ImageNet competition, outperforming the runner-up by over 10 percentage points. Although convolutions were already known in the literature from the work of Yann LeCun, this breakthrough is what drew the attention of the whole technology industry to the power of convolutions in image feature extraction.
● It soon became clear to machine learning practitioners that hierarchical combinations of convolution filters achieved superior and far more generalizable results in image feature extraction than SIFT and HOG. This is the fundamental driver behind the emergence of convolutions as part of Convolutional Neural Networks (CNNs), which have become a staple in state-of-the-art deep learning models for computer vision over the last decade.
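The claim that convolutional filters are trainable, rather than hand-designed like SIFT and HOG, can be illustrated with a toy sketch. It uses a 1D signal and plain gradient descent on a mean squared error, which is far simpler than how a real CNN is trained. The helper `correlate1d`, the target kernel, the learning rate, and the step count are all assumptions chosen for this example.

```python
import random


def correlate1d(signal, kernel):
    """Slide `kernel` over `signal` with no padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(200)]

# Hypothetical "ideal" edge filter that the model should discover on its own.
target_kernel = [-1.0, 0.0, 1.0]
target = correlate1d(signal, target_kernel)

weights = [0.0, 0.0, 0.0]  # the filter starts with no knowledge
learning_rate = 0.5
for _ in range(300):
    predicted = correlate1d(signal, weights)
    errors = [p - t for p, t in zip(predicted, target)]
    n = len(errors)
    # Gradient of the mean squared error with respect to each filter weight.
    grads = [2.0 / n * sum(errors[i] * signal[i + j] for i in range(n))
             for j in range(len(weights))]
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]

print([round(w, 3) for w in weights])
```

The learned weights end up matching the target kernel even though the filter was never specified by hand; it was recovered purely from input and output examples.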