Deep Learning from Scratch Series: Autoencoders with TensorFlow

This blog teaches you how to build and train an autoencoder with TensorFlow and apply it to a dimensionality reduction problem using the MNIST dataset.

1. Introduction

In this blog, you will learn how to build and train an autoencoder with TensorFlow and apply it to a dimensionality reduction problem. An autoencoder is a type of neural network that can learn a compressed representation of the input data, and then reconstruct the original data from the compressed representation. Autoencoders are useful for data compression, denoising, feature extraction, and anomaly detection.

You will use the MNIST dataset, which consists of 70,000 images of handwritten digits (60,000 for training and 10,000 for testing), each with 28 x 28 pixels. The goal is to reduce the dimensionality of each image from 784 (28 x 28) to 2, and then visualize the 2D representation of the digits in a scatter plot. This way, you can see how much of the structure and diversity of the data survives the compression.

To build and train the autoencoder, you will use TensorFlow, a popular open-source framework for deep learning. TensorFlow provides high-level APIs and low-level operations that allow you to create and manipulate tensors, which are the basic units of computation in deep learning. TensorFlow also offers various tools and libraries for data processing, model building, training, evaluation, and deployment.

By the end of this blog, you will be able to:

Understand the concept and components of an autoencoder
Build and train an autoencoder with TensorFlow
Apply the autoencoder to dimensionality reduction
Visualize the results of the autoencoder

Are you ready to dive into the world of autoencoders? Let’s get started!

2. What is an Autoencoder?

An autoencoder is a type of neural network that can learn a compressed representation of the input data, and then reconstruct the original data from the compressed representation. The idea is to reduce the dimensionality of the data, while preserving as much information as possible. This can help with data compression, denoising, feature extraction, and anomaly detection.

An autoencoder consists of two main components: an encoder and a decoder. The encoder takes the input data and transforms it into a lower-dimensional representation, called the latent vector or the code. The decoder takes the latent vector and transforms it back into the original data, or an approximation of it. The goal is to minimize the reconstruction error, which is the difference between the input and the output of the autoencoder.

There are different types of autoencoders, depending on the architecture and the objective of the model. Some examples are:

Linear autoencoder: The simplest type of autoencoder, where the encoder and the decoder are linear functions.
Denoising autoencoder: An autoencoder that can remove noise from the input data, by adding noise to the input and training the model to reconstruct the clean data.
Variational autoencoder: An autoencoder that can generate new data, by imposing a probabilistic distribution on the latent vector and sampling from it.
Convolutional autoencoder: An autoencoder that can handle image data, by using convolutional layers in the encoder and the decoder.
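
As a small illustration of the denoising variant in the list above, the sketch below builds (noisy input, clean target) pairs with NumPy. The function name add_gaussian_noise and the noise_std value are assumptions made for this example; they are not part of the model built in this blog.

```python
import numpy as np

def add_gaussian_noise(x, noise_std=0.2, seed=0):
    """Return a noisy copy of x, clipped back to the pixel range [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    return np.clip(noisy, 0.0, 1.0)

# Stand-in for a batch of 4 flattened images with pixel values in [0, 1]
clean = np.random.default_rng(1).random((4, 784))
noisy = add_gaussian_noise(clean)

# A denoising autoencoder would receive `noisy` as input and `clean` as target
print(noisy.shape, float(noisy.min()), float(noisy.max()))
```

The only change compared with a plain autoencoder is the training pair: the input is corrupted, while the reconstruction target stays clean.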

In this blog, you will focus on building a linear autoencoder with TensorFlow, and applying it to a dimensionality reduction problem. How does a linear autoencoder work? Let’s find out in the next section.

2.1. Encoder

The encoder is the first component of the autoencoder, and its role is to transform the input data into a lower-dimensional representation, called the latent vector or the code. The encoder can be seen as a function that maps the input data to the latent vector, such that the latent vector captures the most important features of the input data.

In this blog, you will use a linear encoder, which means that the encoder is a linear function of the input data. A linear function can be written as:

$$y = Wx + b$$

where $x$ is the input data, $y$ is the output data, $W$ is a weight matrix, and $b$ is a bias vector. The weight matrix and the bias vector are the parameters of the encoder, and they determine how the input data is transformed into the output data. The size of the weight matrix and the bias vector depends on the dimensionality of the input data and the latent vector.

For example, if the input data has 784 dimensions (as in the case of the MNIST images), and the latent vector has 2 dimensions, then the weight matrix has the shape of 2 x 784, and the bias vector has the shape of 2. The encoder function can be implemented in TensorFlow as follows:

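import tensorflow as tf

# Define the weight matrix and the bias vector once, outside the function,
# so that the same parameters are reused (and can be trained) on every call
W_enc = tf.Variable(tf.random.normal(shape=(2, 784)))
b_enc = tf.Variable(tf.random.normal(shape=(2,)))

# Define the encoder function
def encoder(x):
  # x holds one flattened image per row: float32 values, shape (batch, 784)
  # Row-wise form of y = Wx + b, which gives an output of shape (batch, 2)
  y = tf.matmul(x, W_enc, transpose_b=True) + b_enc
  # Return the latent vectors
  return y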

This is how you can create a linear encoder with TensorFlow. The encoder maps each MNIST image to a 2D vector. The next section covers the decoder, which maps that vector back to an image.
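
To make the shapes in this section concrete, here is a minimal NumPy sketch (NumPy rather than TensorFlow, so it runs on its own) that applies y = Wx + b to a single vector and to a batch stored one image per row. The random values are placeholders, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 784))  # weight matrix, shape 2 x 784
b = rng.normal(size=(2,))      # bias vector, shape 2

# A single image as a vector: y = Wx + b
x = rng.random(784)
y = W @ x + b

# A batch with one image per row: Y = X W^T + b
X = rng.random((5, 784))
X[0] = x
Y = X @ W.T + b

print(y.shape)  # (2,)
print(Y.shape)  # (5, 2)
```

The first row of Y equals y, so the batch form is the same linear map applied to every row at once.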

2.2. Decoder

The decoder is the second component of the autoencoder, and its role is to reconstruct the original data from the lower-dimensional representation, or the latent vector. The decoder can be seen as a function that maps the latent vector to the output data, such that the output data is as close as possible to the input data.

In this blog, you will use a linear decoder, which means that the decoder is a linear function of the latent vector. A linear function can be written as:

$$x' = W'y + b'$$

where $y$ is the latent vector, $x'$ is the output data, $W'$ is a weight matrix, and $b'$ is a bias vector. The weight matrix and the bias vector are the parameters of the decoder, and they determine how the latent vector is transformed into the output data. The size of the weight matrix and the bias vector depends on the dimensionality of the latent vector and the output data.

For example, if the latent vector has 2 dimensions, and the output data has 784 dimensions (as in the case of the MNIST images), then the weight matrix has the shape of 784 x 2, and the bias vector has the shape of 784. The decoder function can be implemented in TensorFlow as follows:

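import tensorflow as tf

# Define the weight matrix and the bias vector once, outside the function,
# so that the same parameters are reused (and can be trained) on every call
W_dec = tf.Variable(tf.random.normal(shape=(784, 2)))
b_dec = tf.Variable(tf.random.normal(shape=(784,)))

# Define the decoder function
def decoder(y):
  # y holds one latent vector per row: float32 values, shape (batch, 2)
  # Row-wise form of x' = W'y + b', which gives an output of shape (batch, 784)
  x_hat = tf.matmul(y, W_dec, transpose_b=True) + b_dec
  # Return the reconstructed data
  return x_hat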

This is how you can create a linear decoder with TensorFlow. The decoder reconstructs a 784-dimensional image from each 2D vector. The next section covers the loss function, which measures how close that reconstruction is to the input.
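
Because both the encoder and the decoder are linear, chaining them gives a single affine map whose matrix W'W has rank at most 2. The NumPy sketch below checks this with random placeholder parameters; the variable names are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(2, 784))
b_enc = rng.normal(size=2)
W_dec = rng.normal(size=(784, 2))
b_dec = rng.normal(size=784)

# Encode, then decode, a single input vector
x = rng.random(784)
y = W_enc @ x + b_enc
x_hat = W_dec @ y + b_dec

# The same result from one combined affine map: x' = (W'W)x + (W'b + b')
M = W_dec @ W_enc
c = W_dec @ b_enc + b_dec
x_hat_direct = M @ x + c

rank = np.linalg.matrix_rank(M)
print(x_hat.shape, rank)
```

The rank of 2 is the bottleneck in matrix form: every reconstruction lies on a 2-dimensional plane inside the 784-dimensional image space.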

2.3. Loss Function

The loss function is the criterion that measures how well the autoencoder reconstructs the original data from the lower-dimensional representation. The loss function is also the objective function that the autoencoder tries to minimize during the training process. The lower the loss, the better the autoencoder performs.

In this blog, you will use a mean squared error (MSE) loss function, which is one of the most common loss functions for regression problems. The MSE loss function can be written as:

$$L = \frac{1}{n} \sum_{i=1}^{n} (x_i - x'_i)^2$$

where $n$ is the number of values being compared (every pixel of every image in the batch), $x_i$ is the $i$-th input value, and $x'_i$ is the corresponding output value. The MSE loss function calculates the average of the squared differences between the input and the output of the autoencoder. The MSE loss function penalizes large errors more than small errors, and encourages the autoencoder to produce outputs that are close to the inputs.
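
As a quick worked example of the formula, the plain-Python sketch below computes the MSE for four made-up pixel values. The numbers are invented for illustration.

```python
# Four input values and their reconstructions (made-up numbers)
x = [0.0, 0.5, 1.0, 1.0]
x_hat = [0.1, 0.5, 0.7, 1.0]

# Squared difference for each pair of values
squared_errors = [(a - b) ** 2 for a, b in zip(x, x_hat)]

# Average of the squared differences
mse = sum(squared_errors) / len(squared_errors)

print(squared_errors)
print(mse)
```

The squared errors are roughly 0.01, 0, 0.09 and 0, so the result is about 0.025. The single error of 0.3 contributes nine times as much as the error of 0.1.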

The MSE loss function can be implemented in TensorFlow as follows:

import tensorflow as tf

# Define the loss function
def mse_loss(x, x_hat):
  # Compute the squared difference between input and reconstruction
  diff = tf.square(x - x_hat)
  # Compute the mean over all values
  mean = tf.reduce_mean(diff)
  # Return the mean
  return mean

This is how you can create an MSE loss function with TensorFlow. You will use it in Section 3 to train the autoencoder and to evaluate its performance on the MNIST images.

3. Building an Autoencoder with TensorFlow

In this section, you will learn how to build and train an autoencoder with TensorFlow. You will combine the linear encoder and decoder described in the previous sections with the MSE loss function that you implemented in TensorFlow. You will also use the MNIST dataset, which is a standard benchmark for image recognition and dimensionality reduction.

To build and train the autoencoder, you will need to follow these steps:

Import the necessary libraries and modules
Load and preprocess the MNIST data
Define the model architecture and parameters
Compile and train the model
Evaluate the model performance

By the end of this section, you will have a fully functional autoencoder that can compress and reconstruct the MNIST images. You will also be able to see how the autoencoder performs on different images and compare the input and the output.

Are you ready to build your own autoencoder with TensorFlow? Let’s begin with the first step: importing the libraries and modules.

3.1. Importing Libraries and Data

The first step to build and train the autoencoder with TensorFlow is to import the necessary libraries and modules. You will need the following libraries and modules for this tutorial:

TensorFlow: The main framework for deep learning that provides high-level APIs and low-level operations for creating and manipulating tensors.
Numpy: A library for scientific computing that provides tools for working with arrays and matrices.
Matplotlib: A library for data visualization that provides tools for plotting graphs and images.
Sklearn: A library for machine learning that provides tools for data preprocessing, model evaluation, and dimensionality reduction.

You can import these libraries and modules as follows:

# Import TensorFlow
import tensorflow as tf
# Import Numpy
import numpy as np
# Import Matplotlib
import matplotlib.pyplot as plt
# Import Sklearn
import sklearn
from sklearn.manifold import TSNE

The next step is to load and preprocess the MNIST data. The MNIST data is a standard benchmark for image recognition and dimensionality reduction. It consists of 70,000 images of handwritten digits (60,000 for training and 10,000 for testing), each with 28 x 28 pixels. The images are labeled with the corresponding digits from 0 to 9.

You can load the MNIST data from the tf.keras.datasets module, which provides a convenient way to access several standard datasets. The load_data function returns the data already split into training and testing sets. You then convert the pixel values to 32-bit floats (the default type of TensorFlow variables), normalize them to the range of 0 to 1, and reshape each image into a vector of 784 values. You can do this as follows:

# Load the MNIST data (already split into training and testing sets)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Convert to float32 and normalize the pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
# Reshape the images to vectors
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)

This is how you can import the libraries and data for building and training the autoencoder with TensorFlow. You are now ready to define the model architecture and parameters in the next section.
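
The preprocessing steps can be checked on a small synthetic array without downloading anything. In the sketch below, the random images array is a stand-in for raw MNIST data; only its shape and value range mimic the real images.

```python
import numpy as np

# Synthetic stand-in for raw images: 4 images of 28 x 28 uint8 values in 0..255
images = np.random.default_rng(0).integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Convert to float32 and scale to [0, 1]
scaled = images.astype("float32") / 255.0

# Flatten each 28 x 28 image into a vector of 784 values
flat = scaled.reshape(-1, 784)

# Reshaping back recovers the original image layout
restored = flat.reshape(-1, 28, 28)

print(flat.shape, flat.dtype, float(flat.min()), float(flat.max()))
```

Flattening only rearranges the values, which is why the plotting code later in this blog can reshape each vector back to 28 x 28.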

3.2. Defining the Model Architecture

The second step to build and train the autoencoder with TensorFlow is to define the model architecture and parameters. The model architecture refers to how the encoder and decoder functions are connected and organized. The parameters refer to the values of the weight matrices and bias vectors that are learned during the training process.

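In this blog, you will use a simple model architecture, where the encoder and the decoder are directly connected by the latent vector. The latent vector has 2 dimensions, which means that the autoencoder can compress the input data from 784 dimensions to 2 dimensions.

You can define the model architecture and parameters in TensorFlow using the tf.keras.Model class, which provides a convenient way to create and manipulate models. Inside the model, the encoder and the decoder are tf.keras.layers.Dense layers without an activation function. Each Dense layer computes the same kind of linear map as the functions from Sections 2.1 and 2.2, but it creates its weight matrix and bias vector once and registers them with the model, so the optimizer can update them during training. Giving each layer a name also lets you retrieve it by name later. You can do this as follows:

# Define the model class
class Autoencoder(tf.keras.Model):
  # Initialize the model
  def __init__(self):
    # Call the parent class constructor
    super().__init__()
    # Linear encoder: 784 -> 2 dimensions (no activation function)
    self.encoder = tf.keras.layers.Dense(2, name='encoder')
    # Linear decoder: 2 -> 784 dimensions (no activation function)
    self.decoder = tf.keras.layers.Dense(784, name='decoder')

  # Define the forward pass
  def call(self, x):
    # Encode the input data
    y = self.encoder(x)
    # Decode the latent vector
    x_hat = self.decoder(y)
    # Return the reconstructed data
    return x_hat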

This is how you can define the model architecture and parameters with TensorFlow. You are now ready to compile and train the model in the next section.
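
As a sanity check on the architecture, the number of trainable parameters follows directly from the shapes given in Sections 2.1 and 2.2. The short sketch below does the arithmetic; the variable names are chosen for this example.

```python
input_dim = 784   # pixels per flattened image
latent_dim = 2    # size of the latent vector

# Encoder: weight matrix (2 x 784) plus bias vector (2)
encoder_params = latent_dim * input_dim + latent_dim

# Decoder: weight matrix (784 x 2) plus bias vector (784)
decoder_params = input_dim * latent_dim + input_dim

total_params = encoder_params + decoder_params
print(encoder_params, decoder_params, total_params)  # 1570 2352 3922
```

A model this small trains quickly, but it also limits how faithfully the images can be reconstructed.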

3.3. Compiling and Training the Model

The third step to build and train the autoencoder with TensorFlow is to compile and train the model. Compiling the model means setting up the optimizer, the loss function, and the metrics that will be used during the training process. Training the model means feeding the input data to the model and updating the parameters to minimize the loss function.
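
To show what "updating the parameters to minimize the loss function" means in practice, here is a minimal NumPy sketch that trains a tiny linear autoencoder with plain gradient descent and hand-written gradients. It is an illustration under simplifying assumptions (synthetic 8-dimensional data, full-batch updates, a fixed learning rate), not the training procedure Keras uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 8, 2  # samples, input dimensions, latent dimensions

# Synthetic data lying close to a 2D subspace of the 8D input space
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.05 * rng.normal(size=(n, d))
X = X / X.std()

# Small random initial parameters
W_enc = 0.1 * rng.normal(size=(k, d))
b_enc = np.zeros(k)
W_dec = 0.1 * rng.normal(size=(d, k))
b_dec = np.zeros(d)

learning_rate = 0.05
losses = []
for step in range(300):
    # Forward pass: encode, decode, measure the MSE
    Z = X @ W_enc.T + b_enc
    X_hat = Z @ W_dec.T + b_dec
    losses.append(np.mean((X_hat - X) ** 2))

    # Backward pass: gradients of the MSE with respect to each parameter
    G = 2.0 * (X_hat - X) / X.size
    grad_W_dec = G.T @ Z
    grad_b_dec = G.sum(axis=0)
    dZ = G @ W_dec
    grad_W_enc = dZ.T @ X
    grad_b_enc = dZ.sum(axis=0)

    # Gradient descent update: move each parameter against its gradient
    W_enc -= learning_rate * grad_W_enc
    b_enc -= learning_rate * grad_b_enc
    W_dec -= learning_rate * grad_W_dec
    b_dec -= learning_rate * grad_b_dec

print(losses[0], losses[-1])
```

The loss at the last step is lower than at the first, which is the behavior the fit method automates, with mini-batches and a more sophisticated optimizer.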

In this blog, you will use the Adam optimizer, which is a popular and efficient gradient-based optimization algorithm for deep learning. You will also use the MSE loss function that you defined in the previous section, and the mean absolute error (MAE) metric, which measures the average of the absolute differences between the input and the output of the autoencoder. You can compile the model as follows:

# Create an instance of the model
model = Autoencoder()
# Compile the model
model.compile(optimizer='adam', loss=mse_loss, metrics=['mae'])

The next step is to train the model on the training data, and validate it on the testing data. You can use the fit method of the model, which takes the input and output data, the number of epochs, and the batch size as arguments. You can also use the validation_data argument to pass the testing data, and the verbose argument to control the level of output. You can train the model as follows:

# Train the model
model.fit(x_train, x_train, epochs=20, batch_size=256, validation_data=(x_test, x_test), verbose=1)

This will train the model for 20 epochs, using a batch size of 256. An epoch is a complete pass through the entire dataset. A batch is a subset of the dataset that is used for a single update of the parameters. The validation data is used to evaluate the model performance after each epoch. The verbose argument controls how much information is displayed during the training process. A value of 1 means that a progress bar and some statistics are shown.
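
The relationship between epochs, batches, and parameter updates can be worked out with a few lines of arithmetic. The sketch below assumes the 60,000 training images and the settings used above, and that the last, smaller batch of each epoch is kept.

```python
import math

num_samples = 60000  # MNIST training images
batch_size = 256
epochs = 20

# Each batch triggers one parameter update
steps_per_epoch = math.ceil(num_samples / batch_size)

# The final batch of an epoch holds whatever is left over
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

total_updates = steps_per_epoch * epochs
print(steps_per_epoch, last_batch_size, total_updates)  # 235 96 4700
```

A smaller batch size would give more updates per epoch at the cost of noisier gradient estimates.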

This is how you can compile and train the model with TensorFlow. You are now ready to evaluate the model performance in the next section.

3.4. Evaluating the Model Performance

The fourth step to build and train the autoencoder with TensorFlow is to evaluate the model performance. Evaluating the model performance means measuring how well the model can reconstruct the input data from the latent vector. You can use the loss function and the metric that you defined during the compilation process, as well as some visualizations of the input and output images.

To evaluate the model performance, you can use the evaluate method of the model, which takes the input and output data, and returns the loss and the metric values. You can also use the predict method of the model, which takes the input data and returns the output data. You can do this as follows:

# Evaluate the model on the testing data
loss, mae = model.evaluate(x_test, x_test)
# Print the loss and the metric values
print('Loss:', loss)
print('MAE:', mae)
# Predict the output of the model on the testing data
x_pred = model.predict(x_test)

This will evaluate the model on the testing data, and print the loss and the metric values. The loss value is the MSE loss, which measures the average of the squared differences between the input and the output of the autoencoder. The metric value is the MAE, which measures the average of the absolute differences between the input and the output of the autoencoder. The lower the loss and the metric values, the better the model performance.

This will also predict the output of the model on the testing data, and store it in the x_pred variable. The x_pred variable is a numpy array that contains the reconstructed images of the testing data. You can compare the input and the output images by plotting them with matplotlib. You can do this as follows:

# Plot some input and output images
plt.figure(figsize=(10, 4))
for i in range(10):
  # Plot the input image
  plt.subplot(2, 10, i + 1)
  plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
  plt.axis('off')
  # Plot the output image
  plt.subplot(2, 10, i + 11)
  plt.imshow(x_pred[i].reshape(28, 28), cmap='gray')
  plt.axis('off')
plt.show()

This will plot the first 10 input and output images of the testing data, in two rows. The first row shows the input images, and the second row shows the output images. You can see how well the autoencoder can reconstruct the input images from the latent vector. Because each image is compressed to only 2 numbers by a linear map, expect the reconstructions to be blurry approximations rather than sharp copies of the inputs.

This is how you can evaluate the model performance with TensorFlow. You are now ready to apply the autoencoder to dimensionality reduction in the next section.
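
The two evaluation measures used in this section, MSE and MAE, react differently to how the error is distributed. The plain-Python sketch below compares two made-up sets of absolute errors with the same total; the helper names mae and mse are chosen for this example.

```python
def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    """Mean of the squared errors."""
    return sum(e ** 2 for e in errors) / len(errors)

# Same total error, spread evenly versus concentrated in one value
spread_out = [0.1, 0.1, 0.1, 0.1]
concentrated = [0.0, 0.0, 0.0, 0.4]

print(mae(spread_out), mae(concentrated))
print(mse(spread_out), mse(concentrated))
```

Both cases have an MAE of about 0.1, but the MSE of the concentrated case (about 0.04) is four times that of the spread-out case (about 0.01). Reporting both values therefore tells you whether the reconstruction error comes from many small deviations or a few large ones.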

4. Applying the Autoencoder to Dimensionality Reduction

The final step to build and train the autoencoder with TensorFlow is to apply the autoencoder to dimensionality reduction. Dimensionality reduction is the process of reducing the number of features or variables in a dataset, while preserving as much information as possible. Dimensionality reduction can help with data visualization, data compression, data analysis, and data modeling.

In this blog, you will use the autoencoder to reduce the dimensionality of the MNIST images from 784 to 2, and then visualize the 2D representation of the digits in a scatter plot. This way, you can see how well the autoencoder preserves the structure and diversity of the data, and how the different digits are clustered or separated in the 2D space.

To apply the autoencoder to dimensionality reduction, you will follow three steps, each covered in its own subsection below:

Extract the encoder part of the trained model
Compute the 2D latent vectors of the testing data
Plot the latent vectors in a scatter plot, colored by the digit labels

4.1. Extracting the Encoder Part

The first step to apply the autoencoder to dimensionality reduction is to extract the encoder part of the model, which takes the input data and returns the latent vector. The encoder part computes the same kind of linear map as the encoder described in Section 2.1, but it is a layer object that belongs to the trained model and holds the learned parameters.

To extract the encoder part of the model, you can use the get_layer method of the model, which takes the name of a layer and returns the layer object. The name is the one passed as the name argument when the layer was created, so this lookup assumes that the encoder is a Keras layer created with name='encoder'. You can do this as follows:

# Extract the encoder part of the model
encoder = model.get_layer('encoder')

This will extract the encoder part of the model, and store it in the encoder variable. The encoder variable is a layer object that can be used as a function to compute the output of the encoder. You can check the type of the encoder variable, and the shape of its output, by calling it on a few testing images. You can do this as follows:

# Check the type of the encoder and the shape of its output
print(type(encoder).__name__)
print(encoder(x_test[:5]).shape)

The first line prints the class name of the layer, for example Dense for a fully connected layer. The second line prints (5, 2), which means that the encoder returns one 2-dimensional latent vector for each of the 5 input images.

This is how you can extract the encoder part of the model with TensorFlow. You are now ready to reduce the dimensionality of the data in the next section.

4.2. Reducing the Dimensionality of the Data

The second step to apply the autoencoder to dimensionality reduction is to reduce the dimensionality of the data. Reducing the dimensionality of the data means transforming the input data from a higher-dimensional space to a lower-dimensional space, while preserving as much information as possible. Reducing the dimensionality of the data can help with data visualization, data compression, data analysis, and data modeling.

In this blog, you will use the autoencoder to reduce the dimensionality of the MNIST images from 784 to 2, and then visualize the 2D representation of the digits in a scatter plot. The autoencoder can reduce the dimensionality of the data by using the encoder part of the model, which takes the input data and returns the latent vector. The latent vector is a 2D vector that captures the most important features of the input data.

To reduce the dimensionality of the data, you need to compute the latent vector of the testing data, which is the output of the encoder part of the model. You can do this by using the encoder variable that you extracted in the previous section, and passing the testing data as the input. You can do this as follows:

# Compute the latent vector of the testing data
y_test = encoder(x_test).numpy()

This will compute the latent vector of the testing data, and store it in the y_test variable. Calling the encoder returns a TensorFlow tensor, and the numpy method converts it to a numpy array that contains the 2D vectors of the testing data. Note that this reuses the name y_test, which held the digit labels after the data was loaded in Section 3.1. The labels are loaded again in the next section. You can check the shape of the y_test variable by using the shape attribute. You can do this as follows:

# Check the shape of the y_test variable
print(y_test.shape)

This will print the shape of the y_test variable. The shape of the y_test variable is (10000, 2), which means that it contains 10000 2D vectors, one for each image in the testing data.

This is how you can reduce the dimensionality of the data with TensorFlow. You are now ready to visualize the results of the autoencoder in the next section.
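
A useful reference point for a linear autoencoder is principal component analysis (PCA): for a linear autoencoder trained with MSE, the best possible 2D code is known to span the same subspace as the first two principal components. The NumPy sketch below computes that baseline with a singular value decomposition. The function name pca_project is an assumption for this example, and the random array is a stand-in for the flattened testing images.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Low-dimensional codes and their reconstructions
    Z = Xc @ components.T
    X_hat = Z @ components + mean
    return Z, X_hat

rng = np.random.default_rng(0)
X = rng.random((100, 784))  # synthetic stand-in for flattened images

Z, X_hat = pca_project(X)
pca_mse = np.mean((X - X_hat) ** 2)
print(Z.shape, pca_mse)
```

Running pca_project on the real testing images gives an MSE that a trained linear autoencoder can approach but not beat, which helps you judge whether training has converged.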

4.3. Visualizing the Results

The third and final step to apply the autoencoder to dimensionality reduction is to visualize the results of the autoencoder. Visualizing the results of the autoencoder means creating a scatter plot that shows the 2D representation of the digits, colored by the labels. This way, you can see how well the autoencoder preserves the structure and diversity of the data, and how the different digits are clustered or separated in the 2D space.

To visualize the results of the autoencoder, you need to use the matplotlib library, which is a popular library for data visualization in Python. You can import the matplotlib library as follows:

# Import the matplotlib library
import matplotlib.pyplot as plt

This will import the matplotlib library and give it the alias plt. You can use the plt object to create and customize the scatter plot. You also need the labels of the testing data, which are the digits that correspond to each image. Because the name y_test now holds the latent vectors, load the labels under a separate name, labels_test, as follows:

# Load the labels of the testing data under their own name
(_, _), (_, labels_test) = tf.keras.datasets.mnist.load_data()

This will load the labels of the testing data and store them in the labels_test variable. The labels_test variable is a numpy array that contains the digits from 0 to 9, one for each image in the testing data. You can check the shape of the labels_test variable by using the shape attribute. You can do this as follows:

# Check the shape of the labels_test variable
print(labels_test.shape)

This will print the shape of the labels_test variable. The shape of the labels_test variable is (10000,), which means that it contains 10000 digits, one for each image in the testing data.

The next step is to create a scatter plot that shows the 2D representation of the digits, colored by the labels. You can use the scatter method of the plt object, which takes the x and y coordinates of the points, the color of the points, and the color map of the plot. You can also use the colorbar, xlabel, ylabel, and title methods of the plt object to add a color bar, labels, and a title to the plot. You can do this as follows:

# Create a scatter plot of the latent vectors, colored by digit label
plt.figure(figsize=(10, 10))
plt.scatter(y_test[:, 0], y_test[:, 1], c=labels_test, cmap='rainbow')
plt.colorbar()
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.title('2D Representation of the Digits')
plt.show()

This will create a scatter plot that shows the 2D representation of the digits, colored by the labels. The first argument of the scatter method is the x coordinate of the points, which is the first column of the y_test variable. The second argument is the y coordinate of the points, which is the second column of the y_test variable. The c argument sets the color of the points, and here it is the labels_test variable, so each point is colored by its digit. The cmap argument is the color map of the plot, which is 'rainbow' in this case. The color map determines how the values of the c argument are mapped to colors. You can choose from different color maps, such as 'viridis', 'plasma', 'inferno', or the qualitative 'tab10', which has ten distinct colors. The colorbar method adds a color bar to the plot, which shows the correspondence between the values and the colors. The xlabel, ylabel, and title methods add labels and a title to the plot, which describe the axes and the plot. The show method displays the plot on the screen.

This is how you can visualize the results of the autoencoder with TensorFlow and matplotlib. The plot shows how the autoencoder reduces the dimensionality of the data, and how the different digits are clustered or separated in the 2D space. The exact layout varies from run to run, because the weights are initialized randomly. For example, you may see that the digits 0, 1, and 6 are fairly well separated, while the digits 4 and 9 overlap more.
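
To complement the visual impression of the scatter plot with numbers, you can compute the average 2D position of each digit and the distances between those averages. The NumPy sketch below is one way to do this. The function name class_centroids is an assumption for this example, and the synthetic latent and labels arrays stand in for the real latent vectors and digit labels.

```python
import numpy as np

def class_centroids(latent, labels, num_classes=10):
    """Mean 2D latent vector for each digit class."""
    return np.stack([latent[labels == c].mean(axis=0) for c in range(num_classes)])

# Synthetic stand-in: 20 points per digit, shifted according to the digit
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)
latent = rng.normal(size=(200, 2)) + labels[:, None]

centroids = class_centroids(latent, labels)

# Pairwise Euclidean distances between the class centroids
distances = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)

print(centroids.shape, distances.shape)
```

With the real data, a small distance between two centroids (for instance, those of 4 and 9) would suggest that the two digits overlap in the 2D space.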

5. Conclusion

In this blog, you have learned how to build and train an autoencoder with TensorFlow and apply it to a dimensionality reduction problem. You have covered the following topics:

What is an autoencoder and what are its components
How to build and train a linear autoencoder with TensorFlow
How to apply the autoencoder to reduce the dimensionality of the MNIST images from 784 to 2
How to visualize the 2D representation of the digits in a scatter plot

An autoencoder is a type of neural network that can learn a compressed representation of the input data, and then reconstruct the original data from the compressed representation. An autoencoder consists of two main components: an encoder and a decoder. The encoder takes the input data and transforms it into a lower-dimensional representation, called the latent vector or the code. The decoder takes the latent vector and transforms it back into the original data, or an approximation of it.

To build and train an autoencoder with TensorFlow, you have used the Keras API, which is a high-level API for building and training models. You have defined the encoder and the decoder as linear maps, and combined them into a single model by subclassing the tf.keras.Model class. You have compiled the model with the Adam optimizer and the mean squared error loss function, and trained the model on the MNIST dataset for 20 epochs.

To apply the autoencoder to dimensionality reduction, you have extracted the encoder part of the model using the get_layer method, and computed the latent vector of the testing data using the encoder variable. You have then used the matplotlib library to create a scatter plot that shows the 2D representation of the digits, colored by the labels. You have seen how the autoencoder reduces the dimensionality of the data, while preserving the structure and diversity of the data. You have also seen how the different digits are clustered or separated in the 2D space.

We hope you enjoyed this blog and learned something new. If you have any questions or feedback, please feel free to leave a comment below. Thank you for reading!