Handwritten letter recognition without Deep Learning Feb 17, 2022

Gabor filters are a type of linear filter used in image processing and computer vision that are particularly effective at capturing texture and edge information in images.

Gabor filters are based on the Gabor function, which is a complex sinusoidal wave modulated by a Gaussian function. This unique structure allows Gabor filters to respond selectively to edges and textures at different orientations and scales:

$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i(2\pi\frac{x'}{\lambda} + \psi)\right)$$

In this equation, \( x' = x\cos\theta + y\sin\theta \) and \( y' = -x\sin\theta + y\cos\theta \) are the coordinates rotated by \( \theta \). The parameter \( \lambda \) represents the wavelength of the sinusoidal factor, \( \theta \) represents the orientation of the normal to the parallel stripes of the Gabor function, \( \psi \) is the phase offset, \( \sigma \) is the standard deviation of the Gaussian envelope, and \( \gamma \) is the spatial aspect ratio, which specifies the ellipticity of the support of the Gabor function.
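For reference, here is a minimal NumPy sketch of this equation. The `gabor_kernel` helper and the parameter values below are purely illustrative and not taken from the project's code; OpenCV's `cv2.getGaborKernel` provides an equivalent ready-made kernel.

```python
import numpy as np

def gabor_kernel(size, lambd, theta, psi, sigma, gamma):
    """Sample the Gabor function on a size x size grid centred at the origin."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    x, y = np.meshgrid(ax, ax)
    # Rotate the coordinates by theta
    x_prime = x * np.cos(theta) + y * np.sin(theta)
    y_prime = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope times the complex sinusoidal carrier
    envelope = np.exp(-(x_prime**2 + gamma**2 * y_prime**2) / (2 * sigma**2))
    carrier = np.exp(1j * (2 * np.pi * x_prime / lambd + psi))
    return envelope * carrier

# Illustrative parameters: a 21 x 21 kernel oriented at 45 degrees
kernel = gabor_kernel(size=21, lambd=10.0, theta=np.pi / 4, psi=0.0, sigma=5.0, gamma=0.5)
real_part = kernel.real  # the real part is what is usually visualised and convolved
```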

Below you can play around with these parameters to understand the effects of this filter:

On the left, you can see the Gabor filter with the parameters you provided. In the center, there's an image of the digit 7, and on the right, you can see the resulting image after the convolution with the Gabor filter.
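The post does not list the filtering code itself, but this convolution step can be sketched with OpenCV roughly as follows. Here `digit` is only a stand-in for one 28 x 28 grayscale image, and the kernel parameters are again illustrative.

```python
import numpy as np
import cv2

# Stand-in for a 28 x 28 grayscale digit (e.g. one MNIST sample scaled to [0, 1])
digit = np.random.rand(28, 28).astype(np.float32)

# Real-valued Gabor kernel from OpenCV (parameter values are illustrative)
kernel = cv2.getGaborKernel(ksize=(9, 9), sigma=2.0, theta=np.pi / 4,
                            lambd=5.0, gamma=0.5, psi=0.0)

# Filter the digit; the response keeps the 28 x 28 shape of the input
response = cv2.filter2D(digit, ddepth=-1, kernel=kernel)
```

Stacking the responses of a few differently oriented kernels as channels is one way to obtain the 28 x 28 x 3 representation mentioned below, although the post does not spell out how its three channels are built.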

This information, represented as a 28 x 28 x 3 image (28 pixels x 28 pixels x 3 channels), can be compressed using Principal Component Analysis (PCA). PCA works by transforming the images into a new set of orthogonal variables called principal components, ordered so that the first few capture the maximum variance in the data. By retaining only the most significant principal components and discarding the rest, PCA effectively reduces the dimensionality of the images while preserving their essential structure and relationships.
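A sketch of this compression step with scikit-learn, under the assumption that each filtered image is flattened into one vector and that 40 components are kept (the 40-dimensional space mentioned below suggests this, but it is an assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the stack of Gabor-filtered digits: one flattened
# 28 * 28 * 3 = 2352-dimensional vector per image (random data for illustration)
filtered = np.random.rand(1000, 28 * 28 * 3)

# Keep only the 40 most significant principal components (assumed value)
pca = PCA(n_components=40)
compressed = pca.fit_transform(filtered)   # shape: (1000, 40)

# Fraction of the total variance retained by these 40 components
print(pca.explained_variance_ratio_.sum())
```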

After that, a Support Vector Machine (SVM) model is trained to find a hyperplane that separates the compressed images by digit class. In particular, this hyperplane lives in the 40-dimensional space produced by PCA. Below you can see the classification of the SVM in two dimensions:



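For reference, a minimal scikit-learn sketch of the SVM training step described above might look like this; the RBF kernel, the train/test split, and the placeholder data are assumptions, since the post does not specify them.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Stand-ins for the 40-dimensional PCA features and the digit labels
compressed = np.random.rand(1000, 40)
labels = np.random.randint(0, 10, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    compressed, labels, test_size=0.2, random_state=0)

# Kernel choice is an assumption; scikit-learn's SVC handles the
# multi-class case automatically via a one-vs-one scheme
clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```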
The model classifies digits with ~95% accuracy. To test it yourself, run the Jupyter Notebook Testing.ipynb in this GitHub repository.