Overview of TensorFlow Classifier Approach
What are we Building?
From the ground up, we are going to build a TensorFlow convolutional neural network classifier that can recognize 43 different German traffic signs with approximately 95% accuracy. The model can also make predictions on new images taken from the web. It is small enough to train on a CPU, but training on a GPU will make your life easier.
Dependencies
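- Python 3.5
- TensorFlow >= 0.12 (the code uses tf.global_variables_initializer, which was added in 0.12)
- OpenCV 3
- NumPy
- scikit-learn (for LabelBinarizer)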
Exploratory Analysis of the German Traffic Signs Dataset
Preprocessing Images with OpenCV
When you start with this dataset, you should have the training and testing data loaded into NumPy arrays with a shape of (None, 32, 32, 3), where None stands for the number of photos in the set. In this example, the training data is held in X_train and y_train, and the testing data in X_test and y_test.
Converting the images to grayscale shrinks the dataset by about two thirds, since each image goes from 3 color channels to just 1. This speeds up training without sacrificing much information, because the main features of a traffic sign are in its shapes. OpenCV provides helper functions that make these transformations with just a few lines of code. The function below also applies histogram equalization to even out the contrast of each image. Also note that we expand the dimensions of each image from (32, 32) to (32, 32, 1), which is required for 2D convolutions in TensorFlow.
import cv2
import numpy as np

def preprocess(data):
    """Convert to grayscale, histogram equalize, and expand dims"""
    imgs = np.empty((data.shape[0], 32, 32, 1), dtype=np.uint8)
    for i, img in enumerate(data):
        # Assumes OpenCV's BGR channel order; use cv2.COLOR_RGB2GRAY for RGB arrays
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img = cv2.equalizeHist(img)        # spread out the intensity histogram
        img = np.expand_dims(img, axis=2)  # (32, 32) -> (32, 32, 1)
        imgs[i] = img
    return imgs

X_train = preprocess(X_train)
X_test = preprocess(X_test)
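To make the size reduction concrete, here is a small NumPy-only sketch on random stand-in data. It approximates the grayscale conversion with the common luma weights (0.299, 0.587, 0.114) instead of calling OpenCV, so treat it as an illustration of the shapes and byte counts, not as a replacement for the preprocess function above. The names `to_gray` and `fake_rgb` are hypothetical.

```python
import numpy as np

def to_gray(images):
    """Approximate grayscale conversion for a batch of RGB uint8 images."""
    # Weighted sum over the channel axis using the common luma weights.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = images.astype(np.float32) @ weights  # shape (N, 32, 32)
    gray = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    # Add a trailing channel axis so the result is (N, 32, 32, 1).
    return np.expand_dims(gray, axis=-1)

rng = np.random.default_rng(0)
# Random stand-in for a batch of ten 32x32 color images.
fake_rgb = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
fake_gray = to_gray(fake_rgb)

# One channel instead of three means two thirds fewer bytes.
reduction = 1 - fake_gray.nbytes / fake_rgb.nbytes
print(fake_gray.shape)
print("size reduction: {:.1%}".format(reduction))
```

The trailing axis of size 1 is what lets the batch be fed to a 2D convolution that expects (batch, height, width, channels).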
def center_normalize(data, mean, std):
    """Center and scale images with the given mean and standard deviation"""
    data = data.astype('float32')
    data -= mean
    data /= std
    return data

# Compute the statistics on the training set only and reuse them for the test set
mean = np.mean(X_train)
std = np.std(X_train)
X_train = center_normalize(X_train, mean, std)
X_test = center_normalize(X_test, mean, std)
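As a quick sanity check of the normalization step, the sketch below runs the same arithmetic on random stand-in arrays. The helper name `standardize` and the arrays `train` and `test` are hypothetical. The point it illustrates is that the test set is scaled with the training statistics, so its mean and standard deviation end up close to, but not exactly, 0 and 1.

```python
import numpy as np

def standardize(data, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    return (data.astype(np.float32) - mean) / std

rng = np.random.default_rng(1)
# Hypothetical stand-ins for the grayscale train and test sets.
train = rng.integers(0, 256, size=(200, 32, 32, 1), dtype=np.uint8)
test = rng.integers(0, 256, size=(50, 32, 32, 1), dtype=np.uint8)

# Statistics come from the training data only.
mean, std = np.mean(train), np.std(train)
train_n = standardize(train, mean, std)
test_n = standardize(test, mean, std)

print("train mean/std:", float(train_n.mean()), float(train_n.std()))
print("test  mean/std:", float(test_n.mean()), float(test_n.std()))
```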
One-Hot-Encoding the Labels
In this step, we one-hot encode the labels so they work with the softmax activation in TensorFlow. There are many ways to do this; I am using scikit-learn's LabelBinarizer to get the job done.
from sklearn.preprocessing import LabelBinarizer
ohe = LabelBinarizer().fit(y_train)
y_train_ohe = ohe.transform(y_train)
y_test_ohe = ohe.transform(y_test)
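If you want to see what the encoded labels look like, here is a minimal NumPy sketch of the same idea. It assumes the labels are integer class ids from 0 to 42; the `labels` array is made up for illustration. This is an alternative way to build one-hot vectors, not what LabelBinarizer does internally.

```python
import numpy as np

n_classes = 43  # number of traffic sign classes in this lesson
labels = np.array([0, 3, 42, 3])  # hypothetical class ids

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(n_classes, dtype=np.float32)[labels]

print(one_hot.shape)           # (4, 43)
print(one_hot.argmax(axis=1))  # [ 0  3 42  3]
```

Taking the argmax of each row recovers the original class id, which is also how a predicted class is read off the network's output.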
Helper Methods for TensorFlow Convolutional Neural Networks
TensorFlow neural networks can become verbose when dealing with many layers. In this section, I provide a few helper methods to keep the code more concise and readable. For more great examples of helpers, check out this TensorFlow examples repo.
import tensorflow as tf

def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # Max pooling with a k x k window and stride k
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding='SAME')
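To get a feel for what the pooling helper does to a feature map, here is a NumPy sketch of 2x2 max pooling with stride 2 on a single (H, W) array. It assumes an even height and width, in which case 'SAME' padding adds nothing. The function name `maxpool2x2` is hypothetical.

```python
import numpy as np

def maxpool2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = feature_map.shape
    # Split each axis into (blocks, 2) and take the max inside every 2x2 block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.arange(16).reshape(4, 4)
pooled = maxpool2x2(fm)
print(pooled)
# [[ 5  7]
#  [13 15]]
```

Each pooling step halves the height and width while keeping the strongest activation in every 2x2 window.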
Building and Tuning the Classifier
Now it's time to build the actual model. This model is similar to the VGG-style convnets that have been very successful at image classification tasks.
Model Structure Layer-by-Layer
The list below shows each layer and the shape of the tensor it outputs. In total, we have 4 convolutional layers and 2 fully connected layers, followed by an output layer that produces 43 logits (one for each of the 43 different classes of traffic signs).
- Input: (32, 32, 1)
- Conv: (32, 32, 64)
- Pool: (16, 16, 64)
- Conv: (16, 16, 128)
- Pool: (8, 8, 128)
- Conv: (8, 8, 256)
- Conv: (8, 8, 256)
- Pool: (4, 4, 256)
- Flatten: (4096)
- FullyConnected: (400)
- Dropout 0.5
- FullyConnected: (200)
- Dropout 0.5
- Output: (43)
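The spatial sizes in this list follow from the 'SAME' padding rule, where the output size is ceil(input size / stride). The pure-Python sketch below traces a (32, 32, 1) input through the conv and pool layers under that rule. The helper names `conv_same` and `pool_same` are hypothetical, and the sketch only tracks shapes, not weights.

```python
import math

def conv_same(shape, filters, stride=1):
    """Output shape of a SAME-padded convolution: ceil(size / stride)."""
    h, w, _ = shape
    return (math.ceil(h / stride), math.ceil(w / stride), filters)

def pool_same(shape, k=2):
    """Output shape of SAME-padded k x k max pooling with stride k."""
    h, w, c = shape
    return (math.ceil(h / k), math.ceil(w / k), c)

shape = (32, 32, 1)
# (number of filters, whether a pooling layer follows this conv)
layers = [(64, True), (128, True), (256, False), (256, True)]
for filters, pool_after in layers:
    shape = conv_same(shape, filters)
    print("Conv:", shape)
    if pool_after:
        shape = pool_same(shape)
        print("Pool:", shape)

# The flattened size is the product of the final height, width, and depth.
flat = shape[0] * shape[1] * shape[2]
print("Flatten:", flat)
```

The last pooling layer leaves a 4 x 4 x 256 tensor, so the first fully connected layer sees 4 * 4 * 256 = 4096 inputs.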
Coding Up the Model
Below is the code for the actual model. Note that the weights and biases have not been defined yet; they are defined in the next step.
def conv_net(x, weights, biases, dropout):
    # dropout is the keep probability passed to tf.nn.dropout
    # Two conv + pool blocks
    conv1 = conv2d(x, weights['layer_1'], biases['layer_1'])
    conv1 = maxpool2d(conv1)
    conv2 = conv2d(conv1, weights['layer_2'], biases['layer_2'])
    conv2 = maxpool2d(conv2)
    # Two stacked convs followed by a single pool
    conv3 = conv2d(conv2, weights['layer_3'], biases['layer_3'])
    conv4 = conv2d(conv3, weights['layer_4'], biases['layer_4'])
    conv4 = maxpool2d(conv4)
    # Flatten to match the input size of the first dense layer
    fc1 = tf.reshape(conv4, [-1, weights['dense_1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['dense_1']), biases['dense_1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)
    fc2 = tf.add(tf.matmul(fc1, weights['dense_2']), biases['dense_2'])
    fc2 = tf.nn.relu(fc2)
    fc2 = tf.nn.dropout(fc2, dropout)
    # Output layer: raw logits, no activation
    out = tf.add(tf.matmul(fc2, weights['out']), biases['out'])
    return out
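The dropout calls in the model take a keep probability: each unit is kept with that probability and the kept units are scaled up by 1 / keep probability, so the expected activation stays the same. Here is a rough NumPy sketch of that behavior on made-up activations. The function name `dropout` and the `acts` array are hypothetical, and this is an illustration of the idea, not TensorFlow's implementation.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob and rescale."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones((1000, 400), dtype=np.float32)  # stand-in for dense layer output

train_out = dropout(acts, keep_prob=0.5, rng=rng)  # training: half the units dropped
eval_out = dropout(acts, keep_prob=1.0, rng=rng)   # evaluation: nothing dropped

print(np.unique(train_out))             # kept units are doubled, dropped ones are 0
print(float(train_out.mean()))          # stays close to the original mean of 1
print(np.array_equal(eval_out, acts))   # a keep probability of 1 is a no-op
```

This is why the keep probability is set to 1 when evaluating the model: predictions should use every unit.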
Defining Hyperparameters, Weights, Biases for a TensorFlow Session
Running the Model with Shuffled Batches
Now it's time to run a TensorFlow session to train the model and generate predictions. Here's what's going on step-by-step in the code below.
- Initializing global variables (just resetting everything).
- Starting a TensorFlow session. Training and prediction will always happen within the session scope.
- Iterating over the number of epochs. The data is reshuffled at each epoch with NumPy.
- Iterating over each batch and feeding the data to the optimizer. This is where the training happens. For each batch of images, the neural network will make predictions on the data, then adjust the weights based on the cost function.
- Checking the training progress by evaluating the model on the test data. You can get results in different formats; in this case we return the cross-entropy loss and accuracy.
- Saving the model when all epochs are finished, for use in later TensorFlow sessions.
# x, y, keep_prob, optimizer, cost, accuracy, and saver are defined in the previous section
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        total_batch = int(n_train/batch_size)
        # shuffle data index for each epoch
        rand_idx = np.random.permutation(n_train)
        for i in range(total_batch):
            offset = i*batch_size
            off_end = offset+batch_size
            batch_idx = rand_idx[offset:off_end]
            batch_x = X_train[batch_idx]
            batch_y = y_train_ohe[batch_idx]
            sess.run(optimizer,
                     feed_dict={x: batch_x, y: batch_y, keep_prob: dropout_prob})
        # evaluate on the test set after each epoch, with dropout turned off
        cost_ts, acc_ts = sess.run(
            [cost, accuracy],
            feed_dict={x: X_test, y: y_test_ohe, keep_prob: 1.})
        print("Cost: {:.5f} | Accuracy: {:.5f}".format(cost_ts, acc_ts))
    save_path = saver.save(sess, "models/model.ckpt")
    print("Training Complete! Model saved in file: %s" % save_path)
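The shuffling and batching logic from the loop above can be pulled out and tried on its own. The sketch below is a NumPy-only version with a hypothetical generator named `shuffled_batches`. It uses the same integer division for the batch count, which means any samples left over after the last full batch are skipped for that epoch.

```python
import numpy as np

def shuffled_batches(n_samples, batch_size, rng):
    """Yield index arrays for one epoch of shuffled, full-size batches."""
    rand_idx = rng.permutation(n_samples)
    total_batch = n_samples // batch_size  # the remainder is skipped
    for i in range(total_batch):
        offset = i * batch_size
        yield rand_idx[offset:offset + batch_size]

rng = np.random.default_rng(0)
batches = list(shuffled_batches(n_samples=10, batch_size=4, rng=rng))

print(len(batches), [len(b) for b in batches])  # 2 [4, 4]
```

Because the indices are reshuffled every epoch, the skipped samples are different each time, so over many epochs all of the training data still gets used.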
Validating the Model with Photos from the Web
Processed Traffic Signs from the Web
Analyzing the Top K Predictions
Visualization of Logits for each Prediction
by machinehorizon