Introducing AutoKeras

Creating neural networks for regression, classification, or image recognition is a trial-and-error process. The figure below shows the overall trial-and-error process for deep learning-based models.

This trial-and-error process includes selecting a suitable model for the problem at hand, fine-tuning all hyperparameters, and selecting the right data model via feature selection. Once a suitable model is selected, it is trained until further training no longer improves accuracy.

If testing does not yield favorable results for the business, the model is discarded, and this cycle repeats.

AutoKeras tremendously reduces these manual trial-and-error cycles by conducting the experiments itself and coming up with the right model selection and the correct Keras layer architecture for the model.

AutoKeras can experiment with structured data, image or text classification and regression jobs, time series forecasting, and multi-modal jobs. The API it provides sits at a higher level than the Keras layers-and-connections model.

Essentially, AutoKeras brings automation to the ML development process.

Example – Image Recognition with TensorFlow and Keras

In our hypothetical scenario, a developer is tasked with classifying the handwritten digit images of the well-known MNIST dataset, which is available through keras.datasets. MNIST is a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. Each image is 28 x 28 pixels with a single channel (monochrome).

The developer’s first attempt would be to model this problem using Keras dense layers. The sample code our developer has crafted is shown below.

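from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST; scaling pixels to [0, 1] is an assumed preprocessing step
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

mlp_model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),  # 28 x 28 image -> 784 values
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
# Loss and optimizer as described in the text that follows
mlp_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
mlp_model.summary()
mlp_model.fit(train_images, train_labels, epochs=5, batch_size=64, verbose=1)

test_loss, test_acc = mlp_model.evaluate(test_images, test_labels, verbose=0)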

In this model, each 28 x 28 x 1 input tensor is first flattened into a 1 x 784 tensor. A dense layer then computes the dot product of its input and its learned weights, adds a bias, and applies an activation function:

output = activation(dot(input, weights) + bias)

We selected the ReLU activation, a piecewise-linear function that outputs its input directly if it is positive and zero otherwise. It is applied to each of the layer's 128 neurons, which together hold 784 x 128 weights. Finally, we stacked an output layer with a softmax activation, which performs multinomial logistic regression on the previous layer's output and classifies it into ten classes, one for each digit from 0 to 9.
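To make the dense-layer formula concrete, here is a minimal pure-Python sketch of one forward pass. The helper names (relu, dense_forward) and the toy numbers are illustrative assumptions, not part of Keras, which performs the same computation on whole batches with optimized tensor operations.

```python
def relu(values):
    """Element-wise ReLU: keep positive values, replace the rest with 0."""
    return [max(0.0, v) for v in values]


def dense_forward(inputs, weights, bias, activation=relu):
    """Compute activation(dot(inputs, weights) + bias) for a single sample.

    inputs:  list of n_in floats
    weights: n_in rows, each a list of n_out floats
    bias:    list of n_out floats
    """
    n_out = len(bias)
    pre_activation = [
        sum(x * row[j] for x, row in zip(inputs, weights)) + bias[j]
        for j in range(n_out)
    ]
    return activation(pre_activation)


# Toy layer (hypothetical values): 3 inputs feeding 2 neurons.
toy_inputs = [1.0, 2.0, 3.0]
toy_weights = [[0.5, -1.0], [0.25, 0.0], [-1.0, 1.0]]
toy_bias = [0.5, -1.0]
toy_output = dense_forward(toy_inputs, toy_weights, toy_bias)
print(toy_output)  # first neuron is negative before ReLU, so it is clipped to 0.0

# The MNIST hidden layer has 784 inputs and 128 neurons:
# one weight per input-neuron pair plus one bias per neuron.
mnist_layer_params = 784 * 128 + 128
print(mnist_layer_params)
```

The first toy neuron sums to -1.5 and is zeroed by ReLU, while the second sums to 1.0 and passes through unchanged, which is exactly the "output the input if positive, otherwise zero" behaviour described above.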

The network goes through a training loop of 5 epochs, with a batch of 64 samples in each iteration. The loss for each iteration is calculated using SparseCategoricalCrossentropy, and the network weights are adjusted using the Adam optimizer. This fairly simple deep learning model still yields an accuracy above 96%.
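As a rough illustration of what the loss measures, the sketch below computes sparse categorical cross-entropy for a tiny made-up batch and counts the iterations in one epoch. The function name and sample values are assumptions for illustration; the Keras implementation additionally clips probabilities for numerical stability, which this sketch omits.

```python
import math


def sparse_categorical_crossentropy(true_labels, predicted_probs):
    """Mean of -log(probability assigned to the true class) over the batch."""
    losses = [
        -math.log(probs[label])
        for label, probs in zip(true_labels, predicted_probs)
    ]
    return sum(losses) / len(losses)


# Two samples, three classes; labels are integer class indices ("sparse").
labels = [0, 2]
probs = [
    [0.5, 0.25, 0.25],  # true class 0 gets probability 0.5
    [0.25, 0.25, 0.5],  # true class 2 gets probability 0.5
]
loss = sparse_categorical_crossentropy(labels, probs)
print(loss)

# With 60,000 training images and a batch size of 64, one epoch takes
# ceil(60000 / 64) weight updates; the last batch is smaller than 64.
batches_per_epoch = math.ceil(60000 / 64)
print(batches_per_epoch)
```

The loss shrinks toward zero as the probability assigned to the correct digit approaches 1, which is the quantity the optimizer pushes down at every iteration.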

The developer cannot know whether this is the best accuracy attainable, so he or she makes another attempt to improve it using a convolutional neural network.

The code would look like this.

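def build_cnn():
    model = keras.Sequential(
        [
            keras.layers.Conv2D(
                32, (3, 3), activation="relu", input_shape=train_images.shape[1:] + (1,)
            ),
            keras.layers.MaxPooling2D((2, 2)),
            keras.layers.Conv2D(64, (3, 3), activation="relu"),
            keras.layers.MaxPooling2D((2, 2)),
            keras.layers.Conv2D(64, (3, 3), activation="relu"),
            keras.layers.Flatten(),  # feature maps -> vector for the dense layers
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(10, activation="softmax"),
        ]
    )
    # Same loss and optimizer as the dense model (assumed)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# Add the channel axis that Conv2D expects: (N, 28, 28) -> (N, 28, 28, 1)
train_images_4d = train_images[..., None]
test_images_4d = test_images[..., None]

cnn_model = build_cnn()
cnn_model.fit(train_images_4d, train_labels, epochs=5, batch_size=64, verbose=1)

test_loss, test_acc = cnn_model.evaluate(test_images_4d, test_labels, verbose=0)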

This model is developed through trial and error. MaxPooling layers are added to improve accuracy, and a Flatten layer turns the final feature maps into a vector that the dense layers can consume.
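Part of that trial and error is keeping track of how each layer changes the image size. The sketch below traces the spatial size through the convolution and pooling layers of the listing above, assuming the Keras defaults of "valid" padding, stride 1 for Conv2D, and non-overlapping pooling windows. The helper names are illustrative.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


size = 28  # MNIST images are 28 x 28
trace = [size]
for step in ("conv", "pool", "conv", "pool", "conv"):
    size = conv_output_size(size) if step == "conv" else pool_output_size(size)
    trace.append(size)

# The last convolution has 64 filters, so Flatten yields size * size * 64 values.
flattened = size * size * 64
print(trace, flattened)
```

Each 3 x 3 convolution trims two pixels from each dimension and each 2 x 2 pooling halves it, so the 28 x 28 input shrinks to 3 x 3 before being flattened for the dense layers.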

There are several more experiments to perform, but let us assume the developer achieves an accuracy of over 98%. That is pretty good, but arriving at the final network architecture took time.

Meet AutoKeras

The AutoKeras ImageClassifier component makes life easy for the developer: it performs the experiments and arrives at the final convolutional network architecture.

Here is the code snippet for the image classifier:

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
import autokeras as ak

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Let AutoKeras try up to two candidate architectures
clf = ak.ImageClassifier(overwrite=True, max_trials=2)
clf.fit(x_train, y_train, epochs=10)

predicted_y = clf.predict(x_test)
print(predicted_y)
print(clf.evaluate(x_test, y_test))

This can yield an accuracy of over 98% in one go, saving a tremendous amount of experimentation time.

The final network architecture that AutoKeras deduced based on the training and test data sets would look like this:

>>> cnn_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928
flatten_1 (Flatten)          (None, 576)               0
dense_2 (Dense)              (None, 64)                36928
dense_3 (Dense)              (None, 10)                650
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
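The parameter counts in this summary can be checked by hand. The sketch below recomputes them from the layer shapes, using the standard rule that every filter or neuron has one weight per input value plus one bias. The helper names are illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(n_in, n_out):
    """Each neuron has n_in weights plus one bias."""
    return (n_in + 1) * n_out


counts = {
    "conv2d": conv2d_params(3, 3, 1, 32),     # 1 input channel, 32 filters
    "conv2d_1": conv2d_params(3, 3, 32, 64),  # 32 input channels, 64 filters
    "conv2d_2": conv2d_params(3, 3, 64, 64),  # 64 input channels, 64 filters
    "dense_2": dense_params(576, 64),         # 3 * 3 * 64 flattened inputs
    "dense_3": dense_params(64, 10),          # one output per digit
}
total = sum(counts.values())
print(counts, total)
```

Pooling and Flatten layers only reshape or downsample data, so they contribute no parameters, and the five trainable layers account for the full total reported in the summary.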


AutoKeras Features

AutoKeras provides a simplified API for image, structured data or text classification and regression jobs.

Like the image classifier, the text classification API has a straightforward structure:

autokeras.TextClassifier(num_classes=None, **kwargs)

The classifier can then be used as follows:

# x_train and x_test hold the input text, y_train and y_test the labels
clf = ak.TextClassifier(overwrite=True, max_trials=1)
clf.fit(x_train, y_train, epochs=2)
predicted_y = clf.predict(x_test)
print(clf.evaluate(x_test, y_test))

The Structured Data Classifier is an intriguing tool that can significantly reduce feature engineering time. By analyzing CSV files or pandas DataFrames, it automatically infers column types and generates categorical or one-hot encoded columns, giving the model suitable features without manual preprocessing.
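To show the kind of work this saves, here is a small pure-Python sketch of column type inference and one-hot encoding done by hand. It only illustrates the idea and is not how AutoKeras implements it; the helper names, the inference rule, and the sample columns are all assumptions.

```python
def infer_column_type(values):
    """Label a column 'numerical' if every value parses as a float, else 'categorical'."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return "categorical"
    return "numerical"


def one_hot(values):
    """Return the sorted categories and one 0/1 row per value."""
    categories = sorted(set(values))
    rows = [[1 if value == c else 0 for c in categories] for value in values]
    return categories, rows


# Hypothetical columns as they might be read from a CSV file (all strings).
columns = {
    "age": ["34", "51", "27"],
    "city": ["Denver", "Austin", "Denver"],
}
types = {name: infer_column_type(vals) for name, vals in columns.items()}
categories, encoded = one_hot(columns["city"])
print(types)
print(categories, encoded)
```

Doing this for every column of a real dataset, and then deciding which encoding helps the model, is the manual effort that an automated structured data classifier is meant to remove.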

The author

Ajmal Mahmood is Chief Architect for High Plains Computing (HPC). HPC provides cloud DevOps and MLOps services and helps roll out ML models to production using AWS cloud and Kubernetes.
