The big question we hear quite often is: “Can and should we run production Postgres workloads in Docker? Does it work?” The short answer: yes, it will work… if you really want it to… or if it’s all just fun and games, i.e. for throwaway stuff like testing.
Containers, commonly also just called Docker, have definitely been a thing for quite a few years now. (There are other popular container runtimes out there, and it’s not a proprietary technology per se, but let’s just say Docker to save on typing.) More and more people are “jumping on the container-ship” and want to try out Docker, or have already given this technology a go. However, containers were originally designed as a vehicle for code: the idea was to provide a worry-free “batteries included” deployment experience, where things “just work” anywhere and are basically immutable. That way, quality can easily be tested and guaranteed across the board.
Those are all perfectly desirable properties indeed for developers…but what if you’re in the business of data and database management? Databases, as we know, are not really immutable – they maintain a state, so that code can stay relatively “dumb” and doesn’t have to “worry” about state. Statelessness enables rapid feature development and deployment, and even push-button scaling – just add more containers!
Should I use Postgres with Docker?
If your sensors are halfway functional, you might have picked up on some concerned tones in that last statement, meaning there are some “buts” – as usual. So why not fully embrace this great modern technology and go all in? Especially since I already said it definitely works.
The reason is that there are some aspects you should at least take into account to avoid cold sweats and swearing later on. To summarise: you’ll benefit greatly for your production-grade use cases only if you’re ready to do the following:
a) live fully on a container framework like Kubernetes / OpenShift
b) depend on some additional 3rd party software projects not directly affiliated with the PostgreSQL Global Development Group
c) or maintain either your own Docker images, including some commonly needed extensions, or some scripts to perform common operational tasks like upgrading between major versions.
To reiterate – yes, containers are mostly a great technology, and this type of stuff is interesting and probably would look cool on your CV…but: the origins of container technologies do not stem from persistent use cases. Also, the PostgreSQL project does not really do much for you here besides giving you a quick and convenient way to launch a standard PostgreSQL instance on version X.
A tester’s dream
Not to sound too discouraging – there is definitely at least one perfectly valid use case out there for Docker / containers: it’s perfect for all kinds of testing, especially for integration and smoke testing!
Since containers are basically implemented as super light-weight “mini VMs”, you can start and discard them in seconds! That, however, assumes that the image has already been downloaded; if not, then the first launch will take a minute or two, depending on how good your internet connection is.
As a matter of fact, I personally usually have all the recent (9.0+) versions of Postgres constantly running on my workstation in the background, via Docker! I don’t of course use all those versions too frequently – however, since they don’t ask for too much attention, and don’t use up too many resources when “idling”, they don’t bother me. Also, they’re always there for me when I need to test out some Postgres statistics-fetching queries for our Postgres monitoring tool called pgwatch2. The only slightly annoying thing: if you happen to also run Postgres on the host machine and want to take a look at a process listing to figure out what it’s doing (e.g. `ps -efH | grep postgres`), the “in container” processes show up and somewhat “litter” the picture.
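If the mixing bothers you, one way to list only a specific container’s processes is via “docker top” (here assuming a container named pg13, as used further below):

# Show only the processes running inside one specific container
docker top pg13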
Slonik in a box – a quickstart
OK, so I want to benefit from those light-weight pre-built “all-inclusive” database images that everyone is talking about and launch one – how do I get started? Which images should I use?
As always, you can’t go wrong with the official stuff – and luckily, the PostgreSQL project provides all modern major versions (going back all the way to v8.4, released in 2009, by the way!) via the official Docker Hub. You also need to know some “Docker foo” of course. For a simple test run, you usually want something similar to what you can see in the code below.
NB! As a first step, you need to install the Docker runtime / engine (if it is not already installed) – which I’ll not be covering as it should be a simple process of following the official documentation line by line.
Also note: when launching images, we always need to explicitly expose or “remap” the default Postgres port to a free port of our preference. Ports are the “service interface” for Docker images, over which all communication normally happens, so that we actually don’t need to care about how the service is internally implemented!
# Note that the first run could take a few minutes due to the image being downloaded…
docker run -d --name pg13 -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13
# Connect to the container that’s been started and display the exact server version
psql -U postgres -h localhost -p 5432 -c "show server_version" postgres
server_version
────────────────────────────────
13.1 (Debian 13.1-1.pgdg100+1)
(1 row)
Note that you don’t have to actually use “trust” authentication, but can also set a password for the default “postgres” superuser via the POSTGRES_PASSWORD env variable.
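For illustration, here’s a variant of the launch command using password authentication, also remapping the port in case 5432 is already taken on the host (the container name, host port and password are just examples):

# Launch with a password-protected superuser, remapping container port 5432 to free host port 5433
docker run -d --name pg13pw -p 5433:5432 -e POSTGRES_PASSWORD=mysecretpassword postgres:13
# Connecting then needs the remapped port (and the password, e.g. via the PGPASSWORD env variable)
PGPASSWORD=mysecretpassword psql -U postgres -h localhost -p 5433 postgres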
Once you’ve had enough of Slonik’s services for the time being, just throw away the container and all the stored tables / files etc with the following code:
# Let’s stop the container / instance
docker stop pg13
# And let’s also throw away any data generated and stored by our instance
docker rm pg13
Couldn’t be any simpler!
NB! Note that I could also explicitly mark the launched container as “temporary” with the ‘--rm’ flag when launching the container, so that any data remnants would automatically be destroyed upon stopping.
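In code, that could look roughly like this (the name and host port are again just illustrative):

# A truly throwaway container – removed automatically (together with its data) once stopped
docker run --rm -d --name pg13tmp -p 5433:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13
# ...do some testing...
docker stop pg13tmp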
Peeking inside the container
Now that we have seen how basic container usage works, complete Docker beginners might get curious here – how does it actually function? What is actually running down there inside the container box?
First, we should probably clear up the two concepts that people often initially mix up:
- A Docker image: images are immutable “batteries (libraries) included” software packages that you can download from some public or private Docker registry or build yourself, that then can be “instantiated”, i.e. launched.
- A Docker container: once we have launched an image, we’re dealing with a “live clone” that should actually be called a container! And now, its files can be modified, although in theory this freedom should not be overused – or at least not in a direct manner without volumes (see below).
Let’s make sense of this visually:
# Let’s take a look at available Postgres images on my workstation
# that can be used to start a database service (container) in the snappiest way possible
docker images | grep ^postgres | sort -k2 -n
postgres 9.0 cd2eca8588fb 5 years ago 267MB
postgres 9.1 3a9dca7b3f69 4 years ago 261MB
postgres 9.2 18cdbca56093 3 years ago 261MB
postgres 9.4 ed5a45034282 12 months ago 251MB
postgres 9.5 693ab34b0689 2 months ago 197MB
postgres 9.6 ebb1698de735 6 months ago 200MB
postgres 10 3cfd168e7b61 3 months ago 200MB
postgres 11.5 5f1485c70c9a 16 months ago 293MB
postgres 11 e07f0c129d9a 3 months ago 282MB
postgres 12 386fd8c60839 2 months ago 314MB
postgres 13 407cece1abff 14 hours ago 314MB
# List all running containers
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
042edf790362 postgres:13 "docker-entrypoint.s…" 11 hours ago Up 11 hours 0.0.0.0:5432->5432/tcp pg13
Other common tasks when working with Docker might be:
* Checking the logs of a specific container, for example, to get more insights into query errors
# Get all log entries since initial launch of the instance
docker logs pg13
# “Tail” the logs limiting the initial output to last 10 minutes
docker logs --since "10m" --follow pg13
* Listing the IP address of the container
Note that by default, all Docker containers can speak to each other, since they all land on the default bridge network subnet of 172.17.0.0/16. If you don’t like that, you can also create custom networks to cordon off some containers – in which case they can also access each other using the container name (a small sketch of that follows the snippet below)!
# Simple ‘exec’ into container approach
docker exec -it pg13 hostname -I
172.17.0.2
# A more sophisticated way via the “docker inspect” command
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' pg13
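And coming back to the custom networks mentioned above – a minimal sketch of how that could look (the network and container names are made up):

# Create a custom network and start an instance attached to it
docker network create pg-test-net
docker run -d --name pg13net --network pg-test-net -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13
# Other containers on the same network can reach the instance simply via its name
docker run -it --rm --network pg-test-net postgres:13 psql -h pg13net -U postgres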
* Executing custom commands on the container
Note that this should be a rather rare occasion, and is usually only necessary for some troubleshooting purposes. You should try not to install new programs and change files directly, as this kind of defeats the concept of immutability. Luckily, in the case of the official Postgres images it can be done easily, since they run as “root” and the Debian package repositories are still connected – something a lot of images remove, in order to prevent all sorts of maintenance nightmares.
Here is an example of how to install a 3rd party extension. By default, we only get the “contrib extensions” that are part of the official Postgres project.
docker exec -it pg13 /bin/bash
# Now we’re inside the container!
# Refresh the available packages listing
apt update
# Let’s install the extension that provides some Oracle compatibility functions...
apt install postgresql-13-orafce
# Let’s exit the container (can also be done with CTRL+D)
exit
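After installing the package, the extension still needs to be activated on the SQL level, roughly like so:

# Back on the host – activate the freshly installed extension in the target database
psql -U postgres -h localhost -p 5432 -c "CREATE EXTENSION orafce" postgres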
* Changing the PostgreSQL configuration
Quite often when doing some application testing, you want to measure how much time the queries really take – i.e. measure things from the DB engine side via the indispensable “pg_stat_statements” extension. Thanks to ALTER SYSTEM, this can be done relatively easily, without going “into” the container! Starting from Postgres version 9.4, to be exact…
# Connect with our “Dockerized” Postgres instance
psql -h localhost -U postgres
postgres=# ALTER SYSTEM SET shared_preload_libraries TO pg_stat_statements;
ALTER SYSTEM
postgres=# ALTER SYSTEM SET track_io_timing TO on;
ALTER SYSTEM
# Exit psql via typing “exit” or pressing CTRL+D
# and restart the container
docker restart pg13
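After the restart, one could then activate the extension and verify that statistics are being gathered – a rough sketch:

# Activate the extension and peek at the most expensive queries so far
psql -h localhost -U postgres -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements" postgres
psql -h localhost -U postgres -c "SELECT query, calls, total_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5" postgres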
Don’t forget about the volumes
As stated in the Docker documentation: “Ideally, very little data is written to a container’s writable layer, and you use Docker volumes to write data.”
The thing about a container’s data layer is that it’s not really meant to be changed! Remember, containers should be kind of immutable. The way it works internally is via “copy-on-write”, on top of a bunch of different storage drivers used across historical Docker runtime versions, plus some differences which spring from different host OS versions. It can get quite complex, and most importantly, slow on the disk access level via the “virtualized” file access layer! It’s best to listen to what the documentation says, and set up volumes for your data to begin with.
Aha, but what are volumes, exactly? They’re directly connected and persistent OS folders where Docker tries to stay out of the way as much as possible. That way, you don’t actually lose out on file system performance and features. The latter is not really guaranteed, though – and can be platform-dependent. Things might look a bit hairy, especially on Windows (as usual), where one nice issue comes to mind. The most important keyword here might be “persistent” – meaning volumes don’t disappear, even when a container is deleted! So they can also be used to “migrate” from one version of the software to another.
How should you use volumes, in practice? There are two ways to use volumes: the implicit and the explicit. The “fine print”, by the way, is available here.
Also, note that we actually need to know beforehand what paths should be directly accessed, i.e. “volumized”! How can you find out such paths? Well, you could start from the Docker Hub “postgres” page, or locate the instruction files (the Dockerfile) that are used to build the Postgres images and search for the “VOLUME” keyword. The latter can be found for Postgres version 13 here.
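Alternatively, you can ask Docker itself which paths an image declares as volumes, for example:

# Show the VOLUME declarations baked into the image
docker image inspect -f '{{ .Config.Volumes }}' postgres:13
# should print something like: map[/var/lib/postgresql/data:{}]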
# Implicit volumes (bind mounts in Docker terms): Docker will automatically create the left-side host folder if it is not already there
docker run -d --name pg13 -p5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust \
-v /mydatamount/pg-persistent-data:/var/lib/postgresql/data \
postgres:13
# Explicit volumes: need to be pre-initialized via Docker
docker volume create pg13-data
docker run -d --name pg13 -p5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust \
-v pg13-data:/var/lib/postgresql/data \
postgres:13
# Let’s inspect where our persistent data actually “lives”
docker volume inspect pg13-data
# To drop the volume later if the container is not needed anymore use the following command
docker volume rm pg13-data
Some drops of tar – big benefits possible, with some drawbacks
To tie things up – if you like containers in general, and also need to run some PostgreSQL services – go ahead! Containers can be made to work pretty well, and for bigger organizations running hundreds of PostgreSQL services, they can actually make life a lot easier and more standardized once everything has been automated. Most of the time, the containers won’t bite you.
But at the same time, you had better be aware of the pitfalls:
- Docker images and the whole concept of containers are actually optimized for the lightning-fast and slim startup experience so that by default even the data is not properly separated into a separate persistence unit! Which for databases could end in a catastrophe, if appropriate measures (volumes) are not put into place.
- Using containers won’t give you any automatic and magical high-availability capabilities. This usually comes provided by the container framework – either via some simple “stateful sets”, or more advanced “operators”, or via cleverly bundled database and “bot” images which rely on a central highly-available consensus database.
- Life will be relatively easy only when you go “all in” on some container management framework like Kubernetes, and additionally select some “operator” software (Zalando and Crunchy Postgres operators are the most popular ones, I believe) to take care of the nitty-gritty.
- Batteries are not included: you pretty much only get persistence and a running Postgres major version! For example, a very common task – major version upgrades – is surprisingly out of scope for the default Postgres images! It is also out of scope for some Kubernetes operators – this means you need to be ready to get your hands dirty and create some custom intermediate images, or find some 3rd party ones like Spilo.
TL;DR
I don’t want to sound like a luddite again, but before going “all in” on containers you should acknowledge two things. One, there are major benefits to production-level database containers only if you’re using some container automation platform like Kubernetes. Two, the benefits will come only if you are willing to make yourself somewhat dependent on 3rd party software vendors. Those vendors are typically not out to simplify the life of smaller shops, but rather cater to bigger “K8s for the win” organizations, and they often encode that way of thinking into their frameworks – which might not align well with your way of doing things.
Also, not all aspects of the typical database lifecycle are well covered. My recommendation is: if it currently works for you “as is”, and you’re not 100% migrating to some container-orchestration framework for all other parts of your software stack, be aware that you’re only winning in the ease of the initial deployment and typically also in automatic high-availability (which is great of course!) – but not necessarily in all aspects of the whole lifecycle (fast major version upgrades, backups the way you like them, access control, etc).
On the other hand – if you feel comfortable with some container framework like Kubernetes and/or can foresee that you’ll be running oodles of database instances – give it a go, after you research possible problem points, of course.
On the positive side – since I am in communication with a pretty wide crowd of DBAs, I can say that many bigger organizations do not want to look back at the traditional way of running databases after learning to trust containers.
Anyway, it went a bit long – thanks for reading, and please do let me know in the comments section if you have some thoughts on the topic!