Kedro meets gradio

Atsushi Hara
7 min read · Jun 11, 2022

These days, many ML and MLOps tools are built in the open-source world. Kedro and gradio are two of them. These projects do not just make building ML models easier; they make the models reproducible and make it easier to prove a concept.
Today, I build a project using both of them. In this post, I want to show the process of building pipelines that train an ML model with Kedro and TensorFlow, and then using gradio to confirm that the trained model works on real data.

Btw what’s Kedro?

If you are unfamiliar with Kedro, check the introduction here. The page describes it as follows.

Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.

Kedro helps developers build a machine learning model by composing pipelines for preprocessing the data and for modeling. Writing the process and its steps as pipelines brings reproducibility and maintainability.

QuantumBlack, the organization behind the project, has written a post about the concept, so you should definitely check it out!

Build Tensorflow model with Kedro

I used the MNIST dataset for this project. I chose it because when I first tried the “sketchpad” feature in gradio, I was surprised by how easy it is to build an application around a sketchpad, and I thought I could check whether a model trained on the MNIST dataset would really recognize my handwritten digits.

Setting up the project

I named this project kedro_gradio because it combines the two. I created a directory and moved into it.

$ mkdir kedro_gradio
$ cd kedro_gradio
$ git init

I used pipenv and Python 3.8 for this project, so let's initialize the environment with them. After that, I install the main packages, kedro and gradio.

$ pipenv --python 3.8
$ pipenv install kedro kedro-viz gradio
# or if you want to install specific version, execute command below
# pipenv install kedro==0.18.1 kedro-viz==4.6.0 gradio==3.0.12

kedro-viz is used to visualize Kedro pipelines and confirm that they are implemented as the developer designed them.
After the installation finishes successfully, let's enter the Python virtual environment and confirm that Kedro works.

$ pipenv shell
$ kedro info
2022-06-11 12:07:32,809 - kedro.framework.cli.hooks.manager - INFO - Registered CLI hooks from 1 installed plugin(s): kedro-telemetry-0.2.1
 _            _
| | _____  __| |_ __ ___
| |/ / _ \/ _` | '__/ _ \
| <  __/ (_| | | | (_) |
|_|\_\___|\__,_|_|  \___/
v0.18.1
Kedro is a Python framework for
creating reproducible, maintainable
and modular data science code.
Installed plugins:
kedro_telemetry: 0.2.1 (entry points:cli_hooks)
kedro_viz: 4.6.0 (entry points:global,line_magic)

Kedro has a feature to scaffold a new, well-structured project. So, let's create a new project with it and name the project, repository and package. In this project, I named it “Kedro Gradio”.

$ kedro new
# You will be asked for the project name, repository name and package name; basically, after you input the project name, the repository and package names are generated from it automatically.
project_name: Kedro Gradio
repository_name: kedro-gradio
package_name: kedro_gradio

Before installing the required libraries for Kedro, let's add extra libraries to src/requirements.txt. Of course, if you are using other libraries such as PyTorch, add them instead.

# Add below to src/requirements.txt
tensorflow==2.9.1
kedro[tensorflow]==0.18.1
$ pip install -r src/requirements.txt

Building pipelines and Run!

Now, let’s start the fun part!
Not just in Kedro, but in most machine learning projects, there is a step for preparing the data and a step for training the model. In Kedro, each of these is defined as a pipeline, so let's create them.

$ kedro pipeline create data_processing
$ kedro pipeline create data_science
# After executing the above commands, the src directory will look like this:
src
├── kedro_gradio
│   ├── __init__.py
│   ├── __main__.py
│   ├── pipeline_registry.py
│   ├── pipelines
│   │   ├── __init__.py
│   │   ├── data_processing
│   │   │   ├── __init__.py
│   │   │   ├── nodes.py
│   │   │   ├── pipeline.py
│   │   │   └── README.md
│   │   └── data_science
│   │       ├── __init__.py
│   │       ├── nodes.py
│   │       ├── pipeline.py
│   │       └── README.md
│   └── settings.py
├── requirements.txt
└── setup.py

In Kedro, small steps such as normalizing values are defined in nodes.py.
In the data_processing pipeline, I implemented loading the MNIST dataset and a preprocessing step that reshapes and normalizes it.
In this first phase, I implement a simple dense model, so I simply reshape each (28, 28) image to (1, 784).
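The preprocessing node can be sketched roughly like this; the function and argument names are my own, not necessarily the ones used in the project:

```python
import numpy as np


def preprocess_images(images: np.ndarray) -> np.ndarray:
    """Flatten a batch of (28, 28) MNIST images to 784-element vectors
    and scale the uint8 pixel values into the [0, 1] range."""
    flattened = images.reshape(len(images), 784)
    return flattened.astype("float32") / 255.0
```

Because the function only depends on NumPy, it is trivial to reuse later from the gradio script.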

After defining the steps, they are organized into a pipeline. You will notice that by putting the steps together as a pipeline, the order of dependencies is always respected, and parallel execution happens wherever it is possible; this is a great part of using Kedro!

Normally, the normalization of validation and test data should be performed after the normalization of the training data, since the scaling depends on the training data. But in this case each pixel is just scaled to between 0 and 1, so the steps can run in parallel.

Some readers might say: if the data is simply normalized, why not split it after normalizing?
Well, in one respect that point is true. However, this normalization method is not always the right one, and after much experimentation one may find a better method. So I thought it would be better to keep the normalization step after the dataset split, in case we need to update the pipeline later.

Then I defined the functions in data_science/nodes.py to build, train and evaluate the model, and organized them into a pipeline, the same as for data_processing.
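A minimal version of those data_science nodes might look like this. The layer sizes and hyperparameters here are my own guesses, not the post's:

```python
import tensorflow as tf


def build_model() -> tf.keras.Model:
    """A simple dense classifier over flattened 784-pixel inputs."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_model(model, train_x, train_y, epochs: int = 5):
    model.fit(train_x, train_y, epochs=epochs, verbose=0)
    return model


def evaluate_model(model, test_x, test_y) -> dict:
    loss, acc = model.evaluate(test_x, test_y, verbose=0)
    return {"loss": loss, "accuracy": acc}
```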

Finally, I integrated those pipelines into one pipeline in pipeline_registry.py.

If you want to inspect the pipeline, use the kedro viz command. You will see the following image in your browser.

A whole pipeline image

To run the whole process, execute the following command.

$ kedro run
# if you want to run in parallel, execute the following command.
# kedro run --runner=ParallelRunner

In this project, I saved the trained model in h5 format. To do that, I just defined the file path in conf/base/catalog.yml, and after the training process Kedro handled saving the trained model in the specified format. The model can also be version-controlled, so if you tweak it, Kedro won't overwrite the previous model and you can compare versions easily. To do so, don't forget to set the versioned property as follows.

trained_model:
  type: tensorflow.TensorFlowModelDataSet
  filepath: data/06_models/dnn.h5
  versioned: true  # Set as true

Running the model with gradio

As I wrote above, I'm going to use the “sketchpad” feature to confirm whether the model clearly recognizes my handwritten digits.

Btw what’s gradio?

If you are unfamiliar with gradio, check the official page here. Gradio has a friendly site with a quick-start guide and a few cool examples.

How to run the app?

I created a gradio_app.py file and defined a few functions.

The great part of connecting gradio and Kedro is that you can import the normalization step into the gradio script, so the input data is normalized with the same function that was used in the preprocessing step.
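My gradio_app.py is roughly structured like the sketch below. The function names and the dummy probabilities are illustrative; in the real script the normalization is imported from the Kedro package (e.g. from kedro_gradio.pipelines.data_processing.nodes) and the prediction comes from the saved h5 model.

```python
import numpy as np


def preprocess_images(images: np.ndarray) -> np.ndarray:
    """Same normalization as the Kedro preprocessing node: flatten and
    scale to [0, 1]. In the real app this is imported, not redefined."""
    return images.reshape(len(images), 784).astype("float32") / 255.0


def predict_digit(sketch: np.ndarray) -> dict:
    """Map a 28x28 sketchpad image to a {digit: probability} dict."""
    x = preprocess_images(sketch[np.newaxis, ...])
    # Real app: probs = tf.keras.models.load_model("data/06_models/dnn.h5").predict(x)[0]
    probs = np.full(10, 0.1)  # placeholder so this sketch is self-contained
    return {str(i): float(p) for i, p in enumerate(probs)}


if __name__ == "__main__":
    import gradio as gr
    # In the gradio 3.x series, the "sketchpad" input shortcut delivers the
    # drawing as a small grayscale numpy array, which predict_digit consumes.
    gr.Interface(fn=predict_digit, inputs="sketchpad", outputs="label").launch()
```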

After preparing the gradio script, execute the following command and open the URL shown in the console; in my case it was as below.

$ python gradio_app.py
Running on local URL:  http://127.0.0.1:7860/

You will see a screen like this.

gradio application

Now all we have to do is play!

It seems my model understood it is a 2 :)
It seems my model didn't understand it is a 7 :(
