使用 Haar 类似特征描述符进行人脸分类#

Haar 类似特征描述符已成功用于实现第一个实时人脸检测器 [1]。受此应用的启发，我们提出了一个示例，说明 Haar 类似特征的提取、选择和分类，以检测人脸与非人脸。

注意#

此示例依赖于 scikit-learn 进行特征选择和分类。

参考文献#

from time import time

import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

from skimage.data import lfw_subset
from skimage.transform import integral_image
from skimage.feature import haar_like_feature
from skimage.feature import haar_like_feature_coord
from skimage.feature import draw_haar_like_feature

从图像中提取 Haar 类似特征的过程相对简单。首先，定义感兴趣区域 (ROI)。其次，计算此 ROI 内的积分图像。最后，使用积分图像提取特征。

def extract_feature_image(img, feature_type, feature_coord=None):
    """Extract the haar feature for the current image"""
    ii = integral_image(img)
    return haar_like_feature(
        ii,
        0,
        0,
        ii.shape[0],
        ii.shape[1],
        feature_type=feature_type,
        feature_coord=feature_coord,
    )

我们使用 CBCL 数据集的子集，该子集由 100 张人脸图像和 100 张非人脸图像组成。每张图像都已调整为 19 x 19 像素的 ROI。我们从每组中选择 75 张图像来训练分类器并确定最显著的特征。每类其余的 25 张图像用于评估分类器的性能。

images = lfw_subset()
# To speed up the example, extract the two types of features only
feature_types = ['type-2-x', 'type-2-y']

# Compute the result
t_start = time()
X = [extract_feature_image(img, feature_types) for img in images]
X = np.stack(X)
time_full_feature_comp = time() - t_start

# Label images (100 faces and 100 non-faces)
y = np.array([1] * 100 + [0] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=150, random_state=0, stratify=y
)

# Extract all possible features
feature_coord, feature_type = haar_like_feature_coord(
    width=images.shape[2], height=images.shape[1], feature_type=feature_types
)

可以训练随机森林分类器，以便选择最显著的特征，特别是用于人脸分类。其思想是确定树集成最常使用哪些特征。通过在后续步骤中仅使用最显著的特征，我们可以大大加快计算速度，同时保持准确性。

# Train a random forest classifier and assess its performance
clf = RandomForestClassifier(
    n_estimators=1000, max_depth=None, max_features=100, n_jobs=-1, random_state=0
)
t_start = time()
clf.fit(X_train, y_train)
time_full_train = time() - t_start
auc_full_features = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# Sort features in order of importance and plot the six most significant
idx_sorted = np.argsort(clf.feature_importances_)[::-1]

fig, axes = plt.subplots(3, 2)
for idx, ax in enumerate(axes.ravel()):
    image = images[0]
    image = draw_haar_like_feature(
        image, 0, 0, images.shape[2], images.shape[1], [feature_coord[idx_sorted[idx]]]
    )
    ax.imshow(image)
    ax.set_xticks([])
    ax.set_yticks([])

_ = fig.suptitle('The most important features')

我们可以通过检查特征重要性的累积和来选择最重要的特征。在此示例中，我们保留表示累积值 70% 的特征（相当于仅使用特征总数的 3%）。

cdf_feature_importances = np.cumsum(clf.feature_importances_[idx_sorted])
cdf_feature_importances /= cdf_feature_importances[-1]  # divide by max value
sig_feature_count = np.count_nonzero(cdf_feature_importances < 0.7)
sig_feature_percent = round(sig_feature_count / len(cdf_feature_importances) * 100, 1)
print(
    f'{sig_feature_count} features, or {sig_feature_percent}%, '
    f'account for 70% of branch points in the random forest.'
)

# Select the determined number of most informative features
feature_coord_sel = feature_coord[idx_sorted[:sig_feature_count]]
feature_type_sel = feature_type[idx_sorted[:sig_feature_count]]
# Note: it is also possible to select the features directly from the matrix X,
# but we would like to emphasize the usage of `feature_coord` and `feature_type`
# to recompute a subset of desired features.

# Compute the result
t_start = time()
X = [extract_feature_image(img, feature_type_sel, feature_coord_sel) for img in images]
X = np.stack(X)
time_subs_feature_comp = time() - t_start

y = np.array([1] * 100 + [0] * 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=150, random_state=0, stratify=y
)

712 features, or 0.7%, account for 70% of branch points in the random forest.

提取特征后，我们可以训练和测试新的分类器。

t_start = time()
clf.fit(X_train, y_train)
time_subs_train = time() - t_start

auc_subs_features = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

summary = (
    f'Computing the full feature set took '
    f'{time_full_feature_comp:.3f}s, '
    f'plus {time_full_train:.3f}s training, '
    f'for an AUC of {auc_full_features:.2f}. '
    f'Computing the restricted feature set took '
    f'{time_subs_feature_comp:.3f}s, plus {time_subs_train:.3f}s '
    f'training, for an AUC of {auc_subs_features:.2f}.'
)

print(summary)
plt.show()

Computing the full feature set took 29.998s, plus 3.076s training, for an AUC of 1.00. Computing the restricted feature set took 0.170s, plus 2.463s training, for an AUC of 1.00.

脚本的总运行时间：（0 分钟 39.042 秒）

由 Sphinx-Gallery 生成的图库