What is Docker?

And why would you want to use it?

Put simply, a Docker container is like a mini virtual machine.  (Strictly speaking it isn't one: a container shares the host's kernel instead of booting its own operating system, which is a big part of why it's so lightweight.)  It contains just enough to do what it needs to do, and no more.  It's also isolated from the machine it's running on, so it's much harder for a misbehaving application to take down the host than if that application were running directly on the machine.  You can also run a LOT of them at the same time without many resources.  I'm currently running 22 containers on a modern 8-core server with 64 GB of RAM; CPU usage usually doesn't exceed 3 or 4% per core, and memory usage doesn't get above 4 or 5 GB.  It also works well on older, lower-spec systems: I have an ancient server in a closet with an old 4-core Athlon chip and 8 GB of RAM, and it does just fine running 20 containers.

Let's look at an example of how docker can be useful.  Assume that you come across an application that you want to install on your Ubuntu system.  It's a document scanning program that requires Python 3. Your main Ubuntu system has Python 2 installed and can't be updated because other software already installed on the system requires that specific version to work.

Running two versions of Python side by side on the same machine can get messy - you can technically do it, but keeping each program pointed at the right version and the right libraries takes a fair bit of faffing around.  BUT (insert non-denominational angelic choir sounds) you managed to find the application you need packaged as a Docker image.  This means that the author of the software you want has created a docker image in addition to providing a "bare metal" installer that installs directly on your system.

Installing the bare metal version of the software requires that you first install any dependencies that are required for the application to work.  This includes the correct version of Python and the relevant libraries, some font files, software to convert files to PDF, software to generate thumbnails, software that can scan graphics for text and convert it to editable, searchable text, a database to store all of this information, maybe some search software to enable you to search for documents, and a lot more.  You'd have to do that yourself if you want to go the bare metal route.  However, in addition to the bare-metal version, the author of the software has packaged everything the program needs to run and has put it in a single container for you - a docker image.  As long as your system can run docker images, you can install this software quickly and easily.

There are only two things you need to (optimally) run any docker container:

  1. Docker Engine
  2. Docker-compose

Docker Engine

So what is Docker Engine?  Very simply, Docker Engine is a set of several pieces of software that allows your system to read and use docker images.  A docker image is what a developer provides: a read-only template, containing the application and everything it depends on, from which docker creates containers.  If you're using Ubuntu, you can install Docker Engine by following the instructions in the official Docker documentation.  Here are the important bits of the Docker Engine:

The docker daemon (dockerd) hangs out on your system waiting for requests to do stuff, like "give me a list of all currently running containers", "stop container x", "start container y", etc.

The docker client (docker) is how you, the user, will interact with your containers.  When you type in a command like docker ps, you're telling the client to create a message for the docker daemon, telling it to list all currently running docker containers.  The docker daemon hears this and executes your request.

Docker registries are where you'll find docker images.  The main docker registry is the Docker Hub.  There are others, though, like quay.io.
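
To make that client/daemon/registry split a little more concrete, here's a small sketch of the client commands you'll probably type most often.  The function name docker_tour is just something made up for this example; the docker subcommands inside it are real.  Nothing actually runs until you call the function yourself.

```shell
# A quick tour of everyday client commands. Each one is a request the
# docker client sends to the docker daemon.
docker_tour() {
  # Ask a registry for an image (the papermerge image used in this article).
  docker pull ghcr.io/linuxserver/papermerge

  # List running containers; add -a to include stopped ones too.
  docker ps
  docker ps -a

  # List the images stored on this machine.
  docker images
}
```

Once Docker Engine is installed, running docker_tour in your terminal should pull the image and then print the container and image lists.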

To create and run a docker container (without docker-compose), you would send the following as a single command (we'll assume you want to install the papermerge docker container):

docker run -d \
  --name=papermerge \
  -e PUID=1000 \
  -e PGID=1000 \
  -e TZ=America/New_York \
  -e REDIS_URL= `#optional` \
  -p 8000:8000 \
  -v </path/to/appdata/config>:/config \
  -v </path/to/appdata/data>:/data \
  --restart unless-stopped \
  ghcr.io/linuxserver/papermerge

Let's break this down, line-by-line.

Line 1: docker run -d \ This is the command to run a docker container - the "-d" means that it should run "detached", which is a good idea if you don't want to keep a terminal window open the entire time the container is running.  The "\" at the end of the line tells your terminal to go to the next line instead of executing the command on the current line.
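
If the backslash business seems abstract, here's a tiny demonstration you can run without Docker at all.  It uses printf (not docker) purely to show that the shell glues the continued lines into one command before running it.

```shell
# The trailing backslashes join these three lines into ONE command,
# so printf receives all six words as its arguments.
joined=$(printf '%s ' docker run -d \
  --name=papermerge \
  -p 8000:8000)

echo "$joined"
```

One thing to watch for: the backslash has to be the very last character on the line.  A stray space after it breaks the continuation.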

Line 2: --name=papermerge \ This is simply the name of the container - you want something easy to read, especially if you have a number of containers running at the same time.

Line 3: -e PUID=1000 \ This is the ID of the user that the application inside the container will run as, and therefore the "owner" of any files it writes to your volumes.  (PUID and PGID are a convention of the linuxserver images, not something built into docker itself.)  Usually it just needs to match your usual user ID, which is typically "1000" for the first user created on most flavors of Linux.  The "-e" simply means that what you're defining here is an "environment" variable.

Line 4: -e PGID=1000 \ This is the ID of the group that the application inside the container will run as.  In Linux, pretty much every file and process has both a user ID and a group ID.  This should be the ID of the group your user belongs to, which again is usually 1000.
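
Not sure whether your IDs really are 1000?  The standard id command will tell you.  This is a minimal sketch that looks both numbers up and prints them in the form the docker command expects.

```shell
# Look up the numeric user and group IDs of whoever runs this.
PUID=$(id -u)
PGID=$(id -g)

echo "Use: -e PUID=$PUID -e PGID=$PGID"
```

Run it as the user who should own the files in your volumes, not as root, or you'll get 0 for both.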

Line 5: -e TZ=America/New_York \ Another environment variable!  Software often needs to know what time it is, so "TZ" stands for "Time Zone", and New York is the time zone this particular computer resides in.

Line 6: -e REDIS_URL= `#optional` \ Redis is a kind of database (an in-memory one, often used as a cache) - it's optional in this case, but if you want to use one with this image, it's supported!  We'll leave it alone.
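
You might be wondering how a comment can sit in the middle of a command like that.  The backticks are the trick: the shell treats what's between them as a command to run, that "command" is nothing but a comment, so it produces nothing and quietly disappears.  Here's a small sketch that proves it; count_args is a made-up helper, not part of docker.

```shell
# Hypothetical helper: remembers how many arguments it was given.
count_args() { arg_count=$#; }

# The backtick-wrapped #optional expands to nothing, so the shell drops it.
# What is left is: -e, REDIS_URL=, -p, 8000:8000
count_args -e REDIS_URL= `#optional` -p 8000:8000

echo "$arg_count"   # prints 4
```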

Line 7: -p 8000:8000 \ "-p" here is short for "publish", as in publishing one of the container's ports on the host.  There are actually a few things going on here.  First of all, understand that ports are basically like "portholes" through which computers (and containers) pass data.

In this example, the first "8000" is the port for the host machine, and the second "8000" is the port for the container.  This format is useful because you could perhaps have something else that's already using port 8000 on your host machine, so you could define a different port here, like 8020, in which case you'd have -p 8020:8000 \.  This means that when your container is up and running, you would access it in your browser by tacking that port number to the end of the IP address of the machine the container is running on.  For example, if your computer's IP address is 192.168.1.221, you would access this container by going to 192.168.1.221:8020.  That request is then passed to port 8000 in your container.
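
The order is easy to mix up, so here's a throwaway snippet that does nothing except split the mapping apart to show which side is which.  The variable names are mine, and the IP address is the example one from the paragraph above.

```shell
# The format is HOST:CONTAINER, in that order.
mapping="8020:8000"
host_port=${mapping%%:*}       # everything before the colon
container_port=${mapping##*:}  # everything after the colon

host_ip="192.168.1.221"        # example address, substitute your own
echo "Browse to http://$host_ip:$host_port (forwarded to port $container_port in the container)"
```

Once the container is actually running, docker port papermerge should list the live mappings if you ever forget what you chose.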

Line 8: -v </path/to/appdata/config>:/config The "-v" here stands for "volume".  Containers by their very nature are "contained", but sometimes there is a need for part of the container to be visible outside the container.  In this instance, this container has some editable configuration files.  There are several advantages to creating a volume outside of a container.  

First, while it is possible to access the configuration file that is located within the container, doing so has two problems:

  1. You'd have to use the docker exec command to get a shell for the container, and
  2. Any changes made inside a docker container only last as long as that container exists - they are not persistent.  They survive a simple stop and start, but as soon as the container is removed and re-created from the image (which is what happens when you update to a newer image or change its settings), any changes you made are gone.

Second, creating a volume outside the container, and defining that volume when you run the container means that you can access and edit anything in that volume using your normal editor at any time.  It also means that the contents of that volume can be backed up.

Let's look at the format of this option, because it's pretty important.  It follows the same logic as the port option, in that the first part refers to the host computer, and the second part refers to the container.  So instead of </path/to/appdata/config>, you would enter something like /opt/papermerge/appdata/config, which is a directory on your host system.  If this directory doesn't yet exist, docker will create it.  The whole thing says essentially "make the host directory /opt/papermerge/appdata/config show up inside the container as /config".  It isn't a copy: both paths point at the same files, so anything the container writes to /config lands in /opt/papermerge/appdata/config, AND any change you make to the files in /opt/papermerge/appdata/config is immediately visible in /config in the container.

The creator of the image determines what goes in the config folder; you don't have to do anything except modify whatever is in there, if you want or need to.
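
Since the volume is just an ordinary directory on the host, backing it up takes nothing more exotic than tar.  Here's a minimal sketch, with a few loud assumptions: a temporary directory stands in for /opt/papermerge/appdata so it's safe to run anywhere, and settings.conf is an invented file name, not something the papermerge image necessarily creates.

```shell
# A temp directory stands in for the real appdata location here.
appdata=$(mktemp -d)
mkdir -p "$appdata/config" "$appdata/data"

# Pretend the container wrote a settings file into its /config mount.
# (settings.conf is a made-up name for illustration.)
echo "example=1" > "$appdata/config/settings.conf"

# Back up both directories into one compressed archive.
backup="$appdata/papermerge-backup.tar.gz"
tar -czf "$backup" -C "$appdata" config data

# List what ended up in the archive.
tar -tzf "$backup"
```

On a real system you'd point appdata at your actual directory and write the archive somewhere off the machine.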

Line 9: -v </path/to/appdata/data>:/data Same as the /config directory, but for whatever goes in a data directory for this particular image.

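Line 10: --restart unless-stopped This is the restart policy.  If you don't add this line, the docker container will not be restarted when your system is rebooted.  With "unless-stopped", docker brings the container back up after a reboot or a crash unless you stopped it yourself.  Another alternative here is "always", which brings it back up after a reboot even if you had stopped it.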

Line 11: ghcr.io/linuxserver/papermerge This is the actual location of the image in a registry.  In this instance, the registry is ghcr.io (the GitHub Container Registry), "linuxserver" is the group that publishes the image, and "papermerge" is the image itself.  This line tells docker to go to that address and pull the "papermerge" image.  Since no version tag is given, docker pulls the one tagged "latest".
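
An image address reads left to right as registry, then publisher, then image name.  This little sketch just pulls the string apart with plain shell so you can see the pieces; the variable names are mine, and docker does all of this for you internally.

```shell
image="ghcr.io/linuxserver/papermerge"

registry=${image%%/*}     # ghcr.io
rest=${image#*/}          # linuxserver/papermerge
namespace=${rest%%/*}     # linuxserver
name=${rest#*/}           # papermerge

# No tag was given, so docker falls back to "latest".
echo "registry=$registry namespace=$namespace name=$name tag=latest"
```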

Whew!  That's a lot, but it's all pretty simple and logical.  Now, imagine that you want to change the timezone (maybe you moved), or you want to change the restart variable to "always".  You would need to stop this particular container with the command docker stop papermerge, remove it with docker rm papermerge, and then re-enter the entire command above, with your changes, as a single command.  This could get tedious, especially if you have a lot of docker containers that might need to change at the same time.  This is where docker-compose comes in.
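
Here's roughly what that chore looks like when written out.  Note that the old container has to be removed, not just stopped, before its name can be reused.  This is a sketch under assumptions: recreate_papermerge is a made-up function name, America/Chicago is just an example of a new time zone, and the two /opt/papermerge paths are placeholders for wherever your volumes really live.

```shell
# Replace the running papermerge container with one using new settings.
recreate_papermerge() {
  docker stop papermerge   # stop the running container
  docker rm papermerge     # remove it so the name is free again

  # Re-run with the changed values (new time zone, restart policy "always").
  # The two host paths are hypothetical; use your own.
  docker run -d \
    --name=papermerge \
    -e PUID=1000 \
    -e PGID=1000 \
    -e TZ=America/Chicago \
    -p 8000:8000 \
    -v /opt/papermerge/appdata/config:/config \
    -v /opt/papermerge/appdata/data:/data \
    --restart always \
    ghcr.io/linuxserver/papermerge
}
```

Your data survives this because it lives in the host directories, not in the container that was just thrown away.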

Docker-compose

Docker compose allows you to use YAML configuration files instead of commands.  For example, instead of the above docker command, a docker-compose.yml file for the papermerge image looks like this:

---
version: "2.1"
services:
  papermerge:
    image: ghcr.io/linuxserver/papermerge
    container_name: papermerge
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/New_York
      - REDIS_URL= #optional
    volumes:
      - </path/to/appdata/config>:/config
      - </path/to/appdata/data>:/data
    ports:
      - 8000:8000
    restart: unless-stopped

The beauty of this is that the format is pretty much the same, but this is a text file that is easily editable.  When you want to bring the container up, you just run docker-compose up -d in the same directory as the docker-compose.yml file, and docker will create the container based on the parameters in that file.  If you change the compose file later, running docker-compose up -d again re-creates the container with the new settings.  If you want to stop and remove the container entirely, you'd just run docker-compose down.
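
For day-to-day use, the whole routine boils down to three commands.  Here's a minimal sketch that wraps them up; compose_refresh and the COMPOSE variable are my own inventions so that the same function works whether your system has the older docker-compose command or the newer docker compose plugin.

```shell
# Run this from the directory that holds docker-compose.yml.
compose_refresh() {
  # Set COMPOSE="docker compose" if your system uses the newer plugin.
  compose=${COMPOSE:-docker-compose}

  $compose pull     # fetch newer images, if there are any
  $compose up -d    # create containers, or re-create ones whose settings changed
  $compose ps       # confirm what is running
}
```

Calling compose_refresh after editing the YAML file should be all it takes to apply your changes.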

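The only tricky thing about YAML files is that the indentation has to be exactly right (spaces, not tabs), but that's not very difficult.  The other thing to keep in mind when using docker-compose.yml files is the version number at the top.  This is the version of the Compose file format, not the version of docker-compose itself, but the two are linked: each file format version needs a recent enough docker-compose and Docker Engine to understand it.  If the docker-compose.yml file you're attempting to run says version 3.7, but your installed docker-compose only understands file formats up to 3.2, you'll need to update docker-compose on your system before it'll work.  (Newer releases, where the command is docker compose with a space, treat the version line as obsolete and ignore it.)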
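
If you're not sure what you've got installed, this sketch checks for both flavors and reports whichever it finds.  The compose_info variable name is mine; the two version commands are the real ones.

```shell
# Report which Compose flavor (if any) is installed on this machine.
if command -v docker-compose >/dev/null 2>&1; then
  # Older standalone tool, invoked with a hyphen.
  compose_info=$(docker-compose version 2>&1)
elif command -v docker >/dev/null 2>&1 && docker compose version >/dev/null 2>&1; then
  # Newer plugin, invoked as a docker subcommand.
  compose_info=$(docker compose version 2>&1)
else
  compose_info="Compose not found"
fi

echo "$compose_info"
```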

And that's the scoop on docker!

patrick
