Configuration of Python Applications
During the development of Python applications, I've continually asked myself how and when variables should be passed and initialized for the application's configuration. I want to be able to easily override the configuration for tests, for example, to use a local database for testing. But what exactly is application configuration, and why is it needed? The website https://12factor.net/ describes application configuration as follows:
An app’s config is everything that is likely to vary between deploys (staging, production, developer environments, etc). This includes:
- Resource handles to the database, Memcached, and other backing services
- Credentials to external services such as Amazon S3 or Twitter
- Per-deploy values such as the canonical hostname for the deploy
Apps sometimes store config as constants in the code. This is a violation of twelve-factor, which requires strict separation of config from code. Config varies substantially across deploys, code does not.
So, mutable configuration variables should not be part of the application code but should be defined externally. This offers several advantages:
- The application can be reparameterized and, for example, deployed in a different infrastructure without making changes to the code.
- The code contains no secrets (private keys, database passwords, ...) that can be leaked through the code.
- All configuration parameters are transparently defined in one place so they can be easily found and documented.
So, how do the configuration variables get into my Python application from outside the code? I'd like to answer this question first and compare different formats for configuration files. Subsequently, I'll try to clarify when and at what point in the code the configuration should be initialized. Finally, I will discuss how configurations can be overridden for tests and what problems may arise in the process.
How Do Configuration Variables Get Into My Code?
There are various ways to load configuration variables from outside the code. The variables can be defined within a configuration file, stored in environment variables, or dynamically loaded from so-called "Secret Stores".
Configuration via a Configuration File
There are different formats for defining a configuration file. I would like to briefly introduce the most common formats.
1. INI Format
The Python standard library provides several modules for loading configuration files. The configparser module loads .INI files, whose structure corresponds to that of Microsoft Windows INI files.
Example INI Format:
[DEFAULT]
TimeoutInterval = 45
[db.postgres]
Host = localhost
Port = 5432
[app.webserver]
Port = 80
The configuration file is parsed into a dictionary-like object. The file groups variables into sections; values in the [DEFAULT] section are adopted by all other sections.
import configparser

config = configparser.ConfigParser()
config.read('example.ini')

# access section variables
config['db.postgres']['Host']

# list sections
config.sections()
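How the [DEFAULT] section and the typed getters behave can be sketched as follows. This is a minimal example that parses the INI content from a string instead of a file so that it is self-contained. Note that configparser returns every value as a string unless a typed getter such as getint() is used.

```python
import configparser

INI_TEXT = """
[DEFAULT]
TimeoutInterval = 45

[db.postgres]
Host = localhost
Port = 5432

[app.webserver]
Port = 80
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Values are plain strings by default
host = config["db.postgres"]["Host"]

# Typed getters convert the string on access
port = config["db.postgres"].getint("Port")

# TimeoutInterval is not defined in [db.postgres]; it falls back to [DEFAULT]
timeout = config["db.postgres"].getint("TimeoutInterval")

# sections() does not include the special DEFAULT section
sections = config.sections()
```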
2. JSON Format
In addition to the configparser library, the json and tomllib libraries are also available, which can load the widely used .json and .toml formats.
Example JSON Format:
{
  "db": {
    "postgres": {
      "Host": "localhost",
      "Port": 5432
    }
  },
  "app": {
    "webserver": {
      "Port": 80
    }
  }
}
import json

with open('example.json') as f:
    config = json.load(f)

# access variables
config["db"]["postgres"]["Host"]
3. TOML Format
The TOML format is significantly more compact, and easier to read and edit compared to the JSON format. Whereas nested structures in the JSON format can quickly become confusing, hierarchies in the TOML format can be easily represented through dot notation.
Example TOML (Tom's Obvious Minimal Language) Format:
[db.postgres]
Host = "localhost"
Port = 5432
[app.webserver]
Port = 80
import tomllib

# tomllib requires the file to be opened in binary mode
with open("example.toml", "rb") as f:
    config = tomllib.load(f)

# access variables
config['db']['postgres']['Host']
Note: tomllib was added to the standard library in Python 3.11.
4. YAML Format
In addition to TOML, YAML has established itself as a widely used format for configuration files. YAML's biggest advantages are its readability and its support for comments. However, its syntax relies on correct indentation and allows several notation styles for the same data, which makes it flexible but also more prone to errors. TOML, with its explicit syntax, is easier to write and parse, but offers less flexibility in representing complex data structures.
In Python, YAML files can be loaded with the third-party PyYAML library, for example.
Example YAML (YAML Ain't Markup Language) Format:
db:
  postgres:
    Host: "localhost"
    Port: 5432
app:
  webserver:
    Port: 80
import yaml

with open("example.yaml") as f:
    config = yaml.safe_load(f)

# access variables
config['db']['postgres']['Port']
Configuration via Environment Variables
Configuration via environment variables is described as the preferred method in the 12-Factor App concept. Environment variables offer better protection against unintentional publication of secrets because, unlike configuration files, they are unlikely to be checked into version control (e.g., Git) by accident. They are agnostic of programming language and operating system, and they do not force a fixed grouping into dev, stage, or prod environments.
Manually Setting Environment Variables
In the simplest form, environment variables can be set via the console. In Linux, this can be done using:
export DB_POSTGRES_HOST=localhost
export DB_POSTGRES_PORT=5432
export APP_WEBSERVER_PORT=80
or in Windows, using:
set DB_POSTGRES_HOST=localhost
set DB_POSTGRES_PORT=5432
set APP_WEBSERVER_PORT=80
Environment variables can be read in Python using the os module:
import os
# access variables
db_postgres_host = os.getenv("DB_POSTGRES_HOST")
db_postgres_port = os.getenv("DB_POSTGRES_PORT")
app_webserver_port = os.getenv("APP_WEBSERVER_PORT")
Configuration Using a .env File
Since manually setting environment variables in the development environment can be quite tedious, many developers use a so-called .env file, which defines the environment variables. Of course, this gives up the protection against unintentionally publishing secrets, because the values are stored in a file again. Therefore, the .env file should always be listed in the .gitignore file. The content of a .env file could look as follows:
DB_POSTGRES_HOST=localhost
DB_POSTGRES_PORT=5432
APP_WEBSERVER_PORT=80
To load this .env file and access the variables, you can use the python-dotenv library, for instance. By default, this library loads the file but does not overwrite existing environment variables.
from dotenv import load_dotenv
import os
load_dotenv(".env")
db_postgres_host = os.getenv("DB_POSTGRES_HOST")
db_postgres_port = os.getenv("DB_POSTGRES_PORT")
app_webserver_port = os.getenv("APP_WEBSERVER_PORT")
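What "does not overwrite existing environment variables" means in practice can be illustrated with a simplified stand-in that uses only the standard library. This is not python-dotenv's implementation; load_env_text is a hypothetical helper that handles only plain KEY=VALUE lines and comments.

```python
import os


def load_env_text(text: str) -> None:
    """Hypothetical, simplified stand-in for a .env loader."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault keeps a value that is already set in the environment
        os.environ.setdefault(key.strip(), value.strip())
```

With this behaviour, a variable exported in the shell takes precedence over the value in the file, so a developer can override a single parameter without editing the .env file.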
Parsing Environment Variables Using Pydantic
The validation library pydantic offers a simple and elegant way to manage configuration files and environment variables in Python programs with Pydantic Settings (installable via the pydantic-settings package). It combines Pydantic's data validation with a flexible hierarchy of configuration sources, enabling robust management of application settings.
With validation, once the settings object is initialized without errors, it can be assumed that all necessary configuration parameters are available to the application. The predefined configuration parameters, including all data types, also enable type-checking and auto-completion, significantly simplifying the handling of configuration parameters during development.
Minimal example with pydantic-settings:
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    db_postgres_host: str
    db_postgres_port: int
    app_webserver_port: int

# load and parse settings
settings = Settings()

# access variables
settings.db_postgres_host
When and where do I load the application configuration?
Configuration initialization in an application should only occur once. All application components should only access a shared truth. For example, if two database clients are initialized at different times in the program code, it should not be possible for both clients to be initialized with different configurations in case the database configuration is altered in the meantime. This issue becomes particularly relevant when configurations are dynamically loaded from a parameter/secret store. While environment variables are typically set at the time of an application's deployment and therefore a subsequent change is unlikely (though not impossible, for example, due to a developer's manual intervention), it is normal for dynamically loaded configuration parameters to be altered during the program's runtime.
What areas in my program code are there to initialize the configuration?
Initialization in the module __init__.py
The __init__.py in the module's root directory, as the name suggests, is used for module initialization; hence, it is the ideal place to initialize the application configuration. The interpreter executes __init__.py only once, during the first import from the module. An example __init__.py could look like this:
"""Initialize application settings"""
from ._settings import Settings
settings = Settings()
__all__ = ["settings"]
The settings object is thus available to all parts of the program as a simple import and exists only once in the code.
from my_module import settings
Of course, the initialization can also take place in a separate settings.py or config.py file, as is found in many applications.
Initialization within a function
Another way to initialize the application configuration is to do it, for instance, at the application's entry point.
def main():
    settings = Settings()
    sub_function(settings)
The configuration object is then passed on as a whole or in parts to sub-functions. This approach initially involves additional development work, but it ensures that functions do not 'sideload' variables from the settings object. Here is an example to make that clear.
# Pass all variables
def init_client(host: str, port: int, user: str, password: str) -> Client:
    return Client(host=host, port=port, user=user, password=password)

# Side loading of configuration variables
from my_module import settings

def init_client(user: str, password: str) -> Client:
    return Client(host=settings.host, port=settings.port, user=user, password=password)
The first example is not only easier to read but also much easier to test. The example also illustrates why it can be sensible to define constants separately. By defining constants separate from the configuration object, I reduce the number of configuration parameters. Sideloading of constants is not an issue, as they do not need to be overwritten for tests.
Constants can, for instance, be defined in a separate constants.py file:
# constants.py
MAX_EMAIL_FIELD_LENGTH = 60

# utils.py
from constants import MAX_EMAIL_FIELD_LENGTH

def truncate_email(email: str) -> str:
    return email[:MAX_EMAIL_FIELD_LENGTH]
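Returning to the two init_client variants, the testability argument can be sketched with a stand-in Client class. The class is hypothetical, since the real client type is not shown in this article. The test needs no patching because every input arrives as an argument.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical stand-in for a real database client."""
    host: str
    port: int
    user: str
    password: str


def init_client(host: str, port: int, user: str, password: str) -> Client:
    return Client(host=host, port=port, user=user, password=password)


def test_init_client_uses_given_values() -> None:
    # No settings object has to be patched: the test decides every value
    client = init_client(host="localhost", port=5433, user="test", password="secret")
    assert client.host == "localhost"
    assert client.port == 5433
```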
Passing the settings object as a parameter also has some disadvantages, especially for larger applications. If the settings object is passed as a parameter, this can lead to almost every function of the application needing this additional parameter. This can make the code more complex and harder to read, especially if there are many functions that require settings. It can also mean that the settings object must be passed down through many levels of functions, which also increases complexity.
Whether the object is passed as a whole or in parts, threading it through the call hierarchy increases the complexity of the code and makes it harder to understand and maintain. For this reason, it is usually not the preferred method for overriding settings for tests. It is often easier and cleaner to have separate test configurations or to use setup and teardown functions in the tests.
How do I patch the application configuration for tests?
A prerequisite for good testability is, first of all, to avoid the sideloading of configuration parameters described in the previous section. Unit tests can then run without any configuration parameters having to be patched. If configuration parameters still need to be patched, for example, to enable a connection to a local test database, there are some pitfalls to consider.
Patching a configuration object initialized at import
The initialization of the application configuration is performed by the first import from the module. Since an import from the module to be tested is carried out before the patching process, there is no easy way to manipulate the application configuration before initialization. A test environment must therefore be created in which initialization can be carried out without problems. Patching the application configuration therefore occurs after initialization.
For initializing the application configuration for tests, it is advisable to maintain separate configuration files (e.g., test_config.toml or dev.env). Individual parameters can then be separately patched for each test case. Visual Studio Code, for example, offers the possibility to set separate .env files for the debug and prod environment (see the Visual Studio Code docs).
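Patching a single parameter for one test case could look like the following sketch, which uses patch.object from the standard library's unittest.mock. The Settings dataclass, the host name "prod-db.internal", and the connection_string function are assumptions made for illustration. With other settings classes, for example frozen ones, a different approach may be needed.

```python
from dataclasses import dataclass
from unittest.mock import patch


@dataclass
class Settings:
    db_postgres_host: str
    db_postgres_port: int


# Stands in for the object that is created once at import time
settings = Settings(db_postgres_host="prod-db.internal", db_postgres_port=5432)


def connection_string() -> str:
    # Reads the shared settings object at call time
    return f"{settings.db_postgres_host}:{settings.db_postgres_port}"


def test_connection_string_uses_patched_host() -> None:
    # patch.object replaces the attribute only inside the with block
    with patch.object(settings, "db_postgres_host", "localhost"):
        assert connection_string() == "localhost:5432"
    # The original value is restored afterwards
    assert connection_string() == "prod-db.internal:5432"
```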
Patching a configuration object initialized at runtime
Configuration objects that are initialized at runtime and load configuration variables dynamically from a parameter store, for instance, are easier to patch. In this case, it is advisable to patch the function that retrieves the parameters. During implementation, it must be ensured that the function for retrieving the configuration parameters is not called at the time of the module import, as this would make patching in advance impossible. This requirement again impacts the implementation. Here is an example:
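# settings.py
from dataclasses import dataclass

@dataclass
class Settings:
    param1: str

settings: Settings | None = None

def load_settings() -> Settings:
    """Load the settings on the first call and reuse them afterwards."""
    global settings
    if settings:
        return settings
    # retrieve_settings() is not shown here; it fetches the values, e.g. from a parameter store
    settings = retrieve_settings()
    return settings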
In this example, settings is defined in such a way that it can either be an object of the Settings class or None. When importing from this module, settings remains None until load_settings() has been called. settings is defined here as a global variable and can therefore exist only once. In this case, each use of the settings object would first have to check whether it is None.
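import settings as settings_module

def example_fun():
    # Access the variable through the module: "from settings import settings"
    # would copy the initial None and never see the later assignment.
    if not settings_module.settings:
        raise ValueError("Settings not loaded.")
    print(settings_module.settings.param1)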
This leads to surplus code, as the existence of the settings object must be checked with every access (which makes sense, as it might not have been initialized yet). It would be much easier to access the settings object through the helper function.
from settings import load_settings

def example_fun():
    print(load_settings().param1)
Of course, this function could also be hidden in a lovely property within a Settings class.
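One way such a property could look is sketched below. SettingsProvider is a hypothetical wrapper of my own, and the retrieval function is passed in as an argument so that a test can supply its own function instead of patching anything.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Settings:
    param1: str


class SettingsProvider:
    """Hypothetical wrapper that loads the settings on first access."""

    def __init__(self, retrieve: Callable[[], Settings]) -> None:
        # Nothing is retrieved here, so creating the provider at import time is cheap
        self._retrieve = retrieve
        self._settings: Settings | None = None

    @property
    def settings(self) -> Settings:
        # Load once, then return the cached object on every later access
        if self._settings is None:
            self._settings = self._retrieve()
        return self._settings
```

In a test, the provider can be created with a function that returns fixed values, for example SettingsProvider(lambda: Settings(param1="test")), while the application passes the function that queries the parameter store.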
Summary
The configuration of an application is an essential component and includes everything that varies between different environments (e.g., development, test, production). These configurations should not be set as constants in the code but should be loaded from outside. This increases flexibility and security, as sensitive information like passwords or API keys does not have to be stored in the code.
There are various ways to load these configurations in Python, either via special configuration file formats like INI, TOML, or YAML, or via environment variables. The use of environment variables has the advantage of being language- and operating system-independent, thus providing a universal solution.
The configuration initialization should occur exactly once, and all application components should access this one source. The initialization can either occur in a special initialization file or in a function that then passes the configuration objects to the different parts of the application.
For tests, it may be necessary to overwrite certain configurations. In such cases, it is often easier to create separate test configurations or use appropriate setup and teardown functions in the tests.
In conclusion, the configuration of an application is a critical aspect that needs to be carefully planned and implemented. Flexibility and security should always be emphasized to adapt the application to different environments and protect sensitive information.
2023-10-02