How to Create Custom Layers in TensorFlow?

In TensorFlow, layers are the fundamental building blocks for creating machine learning models. While TensorFlow offers a wide variety of ready-to-use built-in layers, there are situations where these standard layers are not enough to meet specific needs. For example, when implementing advanced architectures like residual networks, transformers, or creating layers with unique behaviors, custom layers become essential. By defining custom layers, we can incorporate specialized computations, parameter sharing, or custom initialization schemes that are not available in standard layers.

What is a layer?

A layer is a callable object that takes one or more tensors as input, performs a computation on them, and outputs tensors. To create our own layer, we inherit from TensorFlow's base class, tf.keras.layers.Layer. Let's first look at how this base layer behaves.

(A callable object is one whose instances can be called like a function; in Python terms, its class implements the __call__ method.)
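
For example, an instance of the built-in Dense layer can be called directly on a tensor:

import tensorflow as tf

# Calling the layer instance invokes __call__, which creates the weights
# on first use and then runs the layer's computation.
dense = tf.keras.layers.Dense(4)
y = dense(tf.ones((2, 3)))  # equivalent to dense.__call__(tf.ones((2, 3)))
print(y.shape)              # (2, 4)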

Let's walk through creating our own custom layer for MNIST handwritten digit classification, a task with 10 classes (the digits 0 through 9).

Imports and data loading

First, we will import TensorFlow as tf and then load the MNIST dataset that is built into TensorFlow. (For simplicity, we feed the raw pixel values to the model; scaling them to the [0, 1] range is common practice and usually speeds up convergence.)

import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

The train and test sets have 60,000 and 10,000 samples, respectively.
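
We can confirm this by printing the array shapes:

print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)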

Custom Dense Layer

Now we will create our own layer using the Keras base layer.

class CustomLayer(tf.keras.layers.Layer):

    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)  # forwards name, dtype, trainable, etc.
        self.units = units
        # Resolve a string such as 'relu' to the activation function;
        # get(None) returns the identity (linear) activation.
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # Runs once, on the first call, when the input shape is known.
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform",
            trainable=True,
            name="kernel",
        )
        self.bias = self.add_weight(
            shape=(self.units,),
            initializer="zeros",
            trainable=True,
            name="bias",
        )

    def call(self, inputs):
        # The forward computation: y = activation(x @ W + b).
        return self.activation(tf.matmul(inputs, self.kernel) + self.bias)

Three methods matter when implementing a custom layer: __init__, build, and call. In __init__, we store configuration such as the number of units (neurons) and the activation function. We could also create the weight variables there if we already knew the input shape, but the recommended approach is lazy initialization in the build method, which runs once on the first call, when the input shape becomes known. The call method implements the layer's forward computation.
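
We can observe this lazy initialization directly:

layer = CustomLayer(units=4, activation='relu')
print(layer.built)                       # False: build() has not run yet
y = layer(tf.ones((2, 3)))               # the first call triggers build()
print(layer.built)                       # True
print([w.shape for w in layer.weights])  # kernel (3, 4) and bias (4,)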

How to Define Weights in Custom Layers?

The recommended way to define variables in custom layers is the add_weight() method. Its main arguments are shape, initializer, dtype, and a trainable boolean, along with other optional parameters such as name.
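
As an illustration, here is a hypothetical ScaledLayer (not part of the tutorial's model) that defines one trainable and one non-trainable weight with add_weight():

class ScaledLayer(tf.keras.layers.Layer):

    def build(self, input_shape):
        # Trainable weight: updated by the optimizer during training.
        self.scale = self.add_weight(
            shape=(input_shape[-1],),
            initializer="ones",
            dtype="float32",
            trainable=True,
            name="scale",
        )
        # Non-trainable weight: stored with the layer but excluded from
        # gradient updates (useful for counters or running statistics).
        self.call_count = self.add_weight(
            shape=(),
            initializer="zeros",
            trainable=False,
            name="call_count",
        )

    def call(self, inputs):
        self.call_count.assign_add(1.0)  # manual update, not a gradient step
        return inputs * self.scale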

Creating our model

The next step is to create a model using our custom layer.

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    CustomLayer(units=128, activation='relu'),
    CustomLayer(units=10, activation='softmax'),
])

We first use the Keras Flatten layer to flatten each 28 × 28 input image into a 784-dimensional vector so that it can be fed to the dense layers. Our first custom layer has 128 neurons with ReLU activation, and the second has 10 neurons (one per class) with the softmax function.
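
Calling model.summary() confirms the parameter counts: 784 × 128 + 128 = 100,480 for the first custom layer and 128 × 10 + 10 = 1,290 for the second.

model.summary()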

Compiling and fitting the model

We will use the Adam optimizer, with sparse categorical cross-entropy as the loss and accuracy as the metric, and train the model for 10 epochs.

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

After 10 epochs, the model reaches an accuracy of approximately 96%. Note that this figure can vary with factors such as the random seed and batch size.
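
We can also evaluate the trained model on the test set explicitly:

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")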

Inputs to the Layer

Some of the arguments accepted by the base Layer constructor are listed below; a short usage sketch follows the list:

  • trainable: A boolean that indicates whether the weights are trainable.

  • name: The name of the layer.

  • dtype: Defines the data type of the weights, which is float32 by default.
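
Because our CustomLayer forwards **kwargs to the base constructor, these arguments can be passed straight through:

layer = CustomLayer(units=64, activation='relu',
                    name='my_dense', dtype='float32', trainable=True)
print(layer.name, layer.dtype, layer.trainable)  # my_dense float32 True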

Important Attributes

Some important attributes of layers that can be set and retrieved are:

  • name: The name of the layer.

  • dtype: The data type of the weights.

  • trainable_weights: A list of trainable variables.

  • non_trainable_weights: A list of non-trainable variables.

  • weights: A combination of both trainable and non-trainable variables.

  • trainable: A boolean indicating whether the layer is trainable or not.
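
These attributes can be inspected on any built layer; for example, using the CustomLayer defined above:

layer = CustomLayer(units=4)
layer(tf.ones((1, 3)))                   # the first call builds the weights
print(layer.name)                        # e.g. 'custom_layer_1'
print(len(layer.trainable_weights))      # 2: kernel and bias
print(len(layer.non_trainable_weights))  # 0
print(len(layer.weights))                # 2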

Note: The trainable attribute is especially important for transfer learning: when we want to train only the top layers, we freeze the pretrained base layers by setting their trainable attribute to False, while the top layers keep trainable set to True.
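
A minimal freezing sketch, using the model built above in place of a real pretrained network:

# Freeze every layer except the last one, then recompile so the
# change in trainable state takes effect during training.
for layer in model.layers[:-1]:
    layer.trainable = False
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
print([layer.trainable for layer in model.layers])  # [False, False, True]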

The code used in this blog is available on GitHub.
