How to use aws-glue-libs in CI/CD

Atsushi Hara
3 min read, Jul 11, 2021

Overview

I wanted to use the aws-glue-libs container image for automated testing of AWS Glue jobs, but I ran into a high-uid-error. This is the story of how I worked around it in CI/CD.

Assumption

To make AWS Glue development easier, Amazon provides a container image called aws-glue-libs. https://hub.docker.com/r/amazon/aws-glue-libs
It is useful in the following situations.

  • Developing AWS Glue scripts
  • Notebook development with Zeppelin or Jupyter
  • Unit test execution and development

The problem that I faced

I was implementing an ETL task using a combination of AWS Glue and PySpark, and the tests for some of the transforms were written with pytest.
I used docker-compose to run the tests in my local environment, with aws-glue-libs as the base image.
The tests ran fine locally, but I ran into a high-uid-error when I tried to run the same automated tests on CircleCI.

Problem

The high-uid-error occurs when the UID/GID of a file or directory in a container is set to a value above the allowed range.
The CircleCI documentation says:

The error is caused by a userns remapping failure. CircleCI runs Docker containers with userns enabled to securely run customers’ containers. The host machine is configured with a valid UID/GID for remapping. This UID/GID must be in the range of 0–65535.

Cause

The UID/GID of every file and directory must be in the range of 0–65535.
But I got the high-uid-error, which means that some file or directory has a UID/GID outside that range.
So I started the amazon/aws-glue-libs:glue_libs_1.0.0_image_01 container and looked at the UID/GID values.

```bash
## Start amazon/aws-glue-libs:glue_libs_1.0.0_image_01 from local environment.
$ docker run -it amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /bin/bash

# Operations in amazon/aws-glue-libs:glue_libs_1.0.0_image_01
root@b3219a5e7e66:/# ls -la /home/
total 32
drwxr-xr-x 1 root root 4096 Jul 21 2020 .
drwxr-xr-x 1 root root 4096 Jul 5 05:13 ..
drwxr-xr-x 3 root root 4096 Jul 15 2020 aws
drwxr-xr-x 5 root root 4096 Jul 21 2020 aws-glue-libs
drwxr-xr-x 3 root root 4096 Jul 21 2020 jupyter
drwxr-xr-x 25 root root 4096 Jul 21 2020 livy
drwxr-xr-x 11 2049080342 staff 4096 Sep 17 2019 spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8
drwxr-xr-x 13 root root 4096 Jul 21 2020 zeppelin
```

I found that the UID of /home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8 is 2049080342, far above the allowed range.
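Checking one directory by eye works here, but the same check can be scripted for a whole tree. The following is a minimal sketch, not something from the original investigation: the function names `filter_high_ids` and `scan_high_ids` are made up for illustration, and it assumes GNU find (for `-printf` and `-xdev`) and awk are available inside the container.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of aws-glue-libs or CircleCI.

# Read "UID GID PATH" lines on stdin and keep only those where the
# UID or the GID is above the limit (default 65535).
filter_high_ids() {
  local max="${1:-65535}"
  awk -v max="$max" '$1 > max || $2 > max'
}

# Walk a directory tree with GNU find, print numeric UID, GID, and path
# for every entry, then filter down to the out-of-range ones.
scan_high_ids() {
  local root="${1:-/}" max="${2:-65535}"
  find "$root" -xdev -printf '%U %G %p\n' 2>/dev/null | filter_high_ids "$max"
}

# Example, run inside the container:
#   scan_high_ids /home
```

Run against /home in the container shown above, this should report the Spark directory, along with any entries beneath it that carry the same owner.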

What I have done to make it work

In this case, I solved it by using GitHub Actions instead of CircleCI.
Fortunately, I had already prepared a docker-compose file to run the tests, so I only had to call it from GitHub Actions.
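Because the tests already run through docker-compose, the CI step can be reduced to a single command. Here is a minimal sketch of a wrapper such a step might call. The function name `run_compose_tests`, the `COMPOSE` override variable, and the service name `glue-tests` are all assumptions for illustration; substitute whatever service your own docker-compose file defines.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for illustration; the service name is an assumption.

# Run the test service once and remove the container afterwards.
# COMPOSE can be overridden, e.g. COMPOSE="docker compose" for Compose v2.
run_compose_tests() {
  local service="${1:-glue-tests}"
  ${COMPOSE:-docker-compose} run --rm "$service"
}

# Example, as the command of a CI step:
#   run_compose_tests glue-tests
```

Keeping the command in one place means the local run and the CI run stay identical, which is what made switching CI services cheap in this case.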

Other things I tried that didn’t work

In the jobs section of the CircleCI config, I specified amazon/aws-glue-libs:glue_libs_1.0.0_image_01 as the image and then tried to change the owner of /home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8 to root:root in a command step, but that didn’t work.
The error occurs when CircleCI maps the container’s UIDs/GIDs in the first place, so the job never reached the command that would change the UID/GID.

Docker image is now available!

I’ve published a Docker image on Docker Hub, with its source in a GitHub repository, in which only the owner of the Spark directory from amazon/aws-glue-libs has been changed to root:root, so that it can be used with CircleCI.
If you’re interested, try it out!

Docker Hub
https://hub.docker.com/repository/docker/toohsk/aws-glue-libs

GitHub repository
https://github.com/toohsk/aws-glue-libs
