Using Volumes

When a container writes files, it writes them inside the container. This means that when the container dies (the host machine restarts, the container is moved from one node to another in a cluster, it simply fails, and so on), all of that data is lost. It also means that if you run several instances of the same container in a load-balancing scenario, each container will have its own data, which may result in an inconsistent user experience.

A simple rule of thumb is to keep containers stateless, for instance by storing their data in an external database (relational, like SQL Server, or document-based, like MongoDB) or in a distributed cache (like Redis). However, sometimes you need to store files in a place where they are persisted; this is done using volumes.

Using a volume, you map a directory inside the container to persistent storage. Persistent storage is managed through drivers, and the available drivers depend on the actual Docker host: they may be Azure File Storage on Azure or Amazon S3 on AWS. With Docker Desktop, you can map volumes to actual directories on the host system; this is done using the -v switch on the docker run command.

Suppose you run a MySQL database with no volume:

docker run -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:5.7

Any data stored in that database will be lost when the container is removed or recreated. In order to avoid data loss, you can use a volume mount:

docker run -v /your/dir:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:5.7

This ensures that any data written to the /var/lib/mysql directory inside the container is actually written to the /your/dir directory on the host system, so the data survives even if the container is removed and recreated.
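
To convince yourself, here is a minimal sketch (the container names db1 and db2 and the /your/dir directory are placeholders). Start a first container, let the application write some data, then remove the container:

docker run --name db1 -v /your/dir:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:5.7
docker rm -f db1

A new container mounting the same host directory sees the data written by the first one:

docker run --name db2 -v /your/dir:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:5.7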

An interesting question is coming up!

Where Do Images Come From?

Each container is created from an image. You provide the image name to the docker run command. Docker first looks for the image locally and uses it when present. When the image is not present locally, it is downloaded from a registry.

When an image is downloaded from a registry, it is stored on your local machine. You can list the local images using the following command:

docker image ls

When an image is published to a registry, its name must be:

<repository_name>/<name>:<tag>
  • tag is optional; when missing, it is considered to be latest by default

  • repository_name can be a registry DNS name or the name of an account (user or organization) on Docker Hub
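
For instance, both of the following are valid image references (registry.example.com is a made-up placeholder, while jenkins/jenkins is a real Docker Hub repository):

registry.example.com/tools/myapp:1.2
jenkins/jenkins:lts

In the first case, the image is pulled from the registry at registry.example.com; in the second, jenkins is the account name on Docker Hub and lts is the tag.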

We’ll soon see more about Docker Hub and private registries. All of the images we’ve used so far were downloaded from Docker Hub, since their names do not include a registry DNS name. When you have time, you should browse Docker Hub and get familiar with the images it provides.

For instance, the Jenkins image may be found on the Docker Hub.

Although the docker run command downloads images automatically when they are missing, you may want to trigger the download manually. To do this, you can use the docker pull command. A pull forces the image to be downloaded, whether it is already present locally or not.
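
For example, the following command downloads the Jenkins image without starting a container:

docker pull jenkins/jenkins:lts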

Here are some scenarios where using a docker pull command is relevant:

  • You expect that the machine which runs the containers will not have access to the registry (e.g., no internet connection) at the time the containers are run, so the images need to be downloaded in advance.

  • You want to ensure you have the latest version of an image tagged as “latest,” which the docker run command would not re-download if an image with that tag is already present locally (as shown below).
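
As a small illustration of the second scenario (mysql is just an example image), the following command contacts the registry and refreshes the local “latest” tag if a newer version has been published, even if an image with that tag is already present:

docker pull mysql:latest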