Categories
General

Achieving European AI Sovereignty

I do not usually write about socio-economic or political issues in this blog. However, I would like to discuss European AI sovereignty this time, because I believe it to be linked to the core topic of this blog: reliable software.

The Economic Need for AI Integration

Integrating AI into core business processes is a macroeconomic necessity for Europe. Companies must use AI to automate tasks, analyze data, and improve overall efficiency to remain competitive in the global market. In my opinion, the real economic value of AI is not generated by simply having access to raw technology. To me, it seems clear that the true value is created at the application level and by incorporating AI into repetitive business processes that benefit from it. Businesses need to build software and integrate AI directly into their workflows, enterprise tools, and consumer products. I am deeply convinced that European companies need a reliable and stable foundation so that they can focus their capital and engineering efforts on building these applications.

European Dependence on American AI

Right now, the foundation for European AI applications relies heavily on American technology. I see this reliance as a severe strategic vulnerability. To me, there is a parallel here to Europe’s past dependence on Russian gas. Germany imported 55% of its gas from Russia when Russia invaded Ukraine in February 2022. In 2025, Anthropic, OpenAI, Google and Meta captured a staggering estimated 96% of Enterprise LLM API market share. Just as relying on a single foreign state for energy proved to be a massive economic and political liability, I believe relying on the United States for core AI infrastructure carries geopolitical risks.

Many have been alarmed by recent price hikes from US providers, such as GitHub Copilot switching to usage-based billing. But there is also a greater, political risk. The core issue is not that individual American companies might raise their prices or change their terms of service. The real threat is that the US government could impose export controls or trade restrictions on AI technology. If Washington decides to restrict foreign access to advanced models or compute resources due to shifting political priorities, European businesses built entirely on American APIs could be cut off without warning.

The EU AI Action Plan

When I look at the European Union’s AI Action Plan, I see a well-intentioned attempt to address these technological gaps. However, I think the current strategy focuses too heavily on training new foundational models from scratch.

Training large AI models requires massive capital expenditure (CapEx) for compute resources. To me, this looks like a CapEx trap. Foundational models are rapidly becoming commodities, and I do not think Europe needs to outspend American hyperscalers in a brute-force training race.

Since the EU has many issues to solve (renewable energy transition, managing the industrial decline, etc.), which might compete for subsidies, I do not believe entering into the ruinous race for frontier models is a good allocation of public funds. Instead, I believe our funding and effort should focus on creating European infrastructure for existing models.

Sovereign Inference Infrastructure

Instead of training models, I believe the priority for Europe is building sovereign infrastructure for AI inference. Inference is the process of using a trained model to generate text, analyze data, or make decisions.

To achieve sovereignty, I think European applications need to run on data centers located on European soil that are specifically optimized for inference workloads. While the exact technical requirements for inference hardware might evolve, controlling this infrastructure ensures that European companies have reliable access to AI capabilities without the risk of foreign government intervention. In my view, the infrastructure is simply the enabler; it provides the stability needed for the application layer to thrive.

I do acknowledge that, at least initially, this would mean dependence on recent NVIDIA GPUs from the US or specialized NPUs (like Huawei’s Ascend) from China. This is a case where perfect should not be the enemy of good. The supply chain risk of modern hardware is also significant, but this concern can be addressed separately.

Concrete measures could for example include:

  • tax coupons for companies spending on EU-based AI platforms
  • direct subsidization of companies investing in local datacenter expansion
  • energy tax exemptions

All of these would of course need to be implemented EU-wide to protect the EU Single Market integrity. But these are just some of my proposals. I am convinced creative policymakers will find even better ones.

Trustworthy and Accessible AI

To mitigate geopolitical risks, European companies could rely on open-weights models. Because the model weights are publicly available, they can be downloaded and run independently.

My goal here is not to suggest that every company must build highly complex, air-gapped server rooms. Rather, I think the objective is to ensure that AI models are accessible and provided by trustworthy parties. Businesses need to know that the infrastructure running their core applications is transparent, reliable, and free from foreign political leverage.

Chinese Open-Source Models

When selecting open-weights models for inference, I know companies have several global options, including models developed in China. During his talk at the OMR festival, tech analyst Philipp Klöckner discussed the utility of these Chinese open-source models. He noted that they are often highly compute-efficient for inference, meaning they require less hardware to run and return outputs quickly.

In contrast, he also issued a warning about the associated risks. These models can contain embedded censorship to align with Chinese state guidelines. However, since these models are open in terms of structure and trained weights, it would also be an option to “de-censor” them by fine-tuning them with uncensored content.

There are also some open-source options from the US, such as OpenAI’s gpt-oss and Meta’s Llama families. The French company Mistral is also developing open-source models, which are somewhat in the middle of the pack on many benchmarks. From their online presence, it appears to me that they are focusing on full-stack AI offerings, including tailored AI solutions, instead of the most capable foundational models. Chinese models are certainly leading currently.

An overview of the wider AI model ecosystem can be found, for example, at llm-stats.com. I find the landscape of available open-source models to be highly attractive.

Inference-as-a-Service

Setting up and maintaining local inference infrastructure is complex. Most standard businesses simply do not have the operational capacity to manage it. To bridge this gap, I believe Europe needs strong Inference-as-a-Service (IaaS) providers.

These IaaS providers act as the trustworthy parties Europe needs. Scaleway is a notable European success story in this sector. As a cloud provider, they supply the necessary infrastructure for companies to deploy models securely within European borders or run the full operation with a simple-to-use API endpoint. Alongside Scaleway, I see emerging providers like Inceptron building services to help businesses run inference workloads efficiently. An overview can for example be obtained from the AI Atlas. The ecosystem has not quite grown large enough to convince me to ditch OpenRouter and subscribe to a European provider exclusively, but I see an opportunity. By utilizing these platforms, European developers (such as me) can get the accessible infrastructure required to power their software without relying on foreign APIs.

Conclusion: A Practical Blueprint for Europe

Achieving European AI sovereignty requires a practical approach. In my opinion, it does not mean competing in the expensive race to train the latest foundational models.

While it is hard to predict exactly how the global AI landscape will play out over the next few years, I am convinced that a more pragmatic strategy is our best path forward. By recognizing the political risks of relying on foreign AI infrastructure, Europe can adapt. The path forward involves utilizing open-weights models hosted by trustworthy, local IaaS providers. I believe this approach secures the technology stack and provides the exact stability European businesses need to capture the true economic value of AI at the application layer.

I am looking forward to being pleasantly surprised!

PS: Ironically, this post has been co-authored by Gemini.

Try to Falsify Your Theories Before You Act On Them

Edsger Dijkstra once stated that “[…] program testing can be used very effectively to show the presence of bugs but never to show their absence.” A passing test does not prove you are right. A failing test, however, gives you something solid: an assumption that did not survive contact with reality.

This asymmetry shapes how I think about writing software. Every line of code is based on a theory—about what a function does, how hardware behaves, what the user actually needs. The goal is not to prove yourself correct. It is to find out where you are wrong, as early and as cheaply as possible, before that wrongness gets baked into software you will ship.

Why formulation matters

Most programmers test their assumptions from time to time. You have a hunch something might be slow, so you measure it. You suspect a race condition, so you write a stress test. That is already good practice. But there is a difference between having an assumption in the back of your mind and stating it as an explicit theory. The difference matters in three ways.

First, a properly stated theory opens itself to critique. “This interrupt handler might miss events under load” is a vague worry. You cannot easily disprove it because it does not commit to anything specific. “If interrupts arrive no more often than once every 200 microseconds, the handler will still process every event” is a theory. It makes a concrete claim. You can write a test that generates interrupts at increasing rates and checks whether the dropout pattern matches the prediction. The precision invites falsification. The vagueness does not.
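To make the shape of such a check concrete, here is a minimal sketch in Python. It is a toy simulation, not a test against real hardware: the function name `count_missed_events`, the assumption that the handler is busy for a fixed 200 microseconds per event, and the rule that an event arriving while the handler is busy is lost are all my own simplifications for illustration.

```python
def count_missed_events(period_us: float, handler_time_us: float,
                        n_events: int = 1000) -> int:
    """Simulate a non-reentrant handler: an event that arrives while the
    handler is still busy with the previous one is counted as missed."""
    busy_until = 0.0
    missed = 0
    for i in range(n_events):
        arrival = i * period_us
        if arrival < busy_until:
            missed += 1                      # handler still busy: event lost
        else:
            busy_until = arrival + handler_time_us
    return missed


HANDLER_TIME_US = 200.0  # assumed processing time per event (hypothetical)

if __name__ == "__main__":
    # Sweep from slow to fast interrupt rates. The theory predicts zero
    # misses for every period of 200 us or longer, and misses below that.
    for period_us in (1000.0, 500.0, 250.0, 200.0, 150.0, 100.0, 50.0):
        missed = count_missed_events(period_us, HANDLER_TIME_US)
        print(f"period {period_us:7.1f} us -> {missed} missed events")
```

The important property is that the sweep can contradict the theory: a single missed event at a period of 200 microseconds or more would falsify it. On a real target, the simulated loop would be replaced by a signal generator or a stress harness.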

Second, unstated assumptions feed decision paralysis. When you have not laid out what you believe and how strongly you believe it, every option looks equally risky. You end up stuck because you cannot weigh one concern against another. Formulating “I am 80% confident this message queue blocked because of excessive load in the other process” gives you something to act on. It also tells you exactly where to invest more investigation if that 20% doubt turns out to matter. You move forward with strengthened confidence instead of standing still with diffuse anxiety.

Third, explicit theories make conversations with coworkers more productive. “I chose to apply a moving average before processing this sensor data because I found the raw readings to differ by 200 mV between cycles.” That sentence explains the reasoning. A colleague can now engage with the theory directly. Maybe they know something about the sensor or measurement that weakens it. Maybe they can suggest a simpler approach that still satisfies the constraint. What they cannot do is stare at the code and wonder what you were thinking. The theory documents the intent.

The day-to-day cycle

Once the theory is stated, the next step is finding the cheapest way to challenge it. The cost of being wrong rises sharply once code has been integrated, tested on hardware, and deployed. So you want the falsification attempt to happen as early and as lightly as possible.

I ran into this with a popular theory about performance among my coworkers. They had been inlining logic by hand because the theory was: “Function call overhead in this hot path will eat into the timing budget enough to matter.” That is a statement you can check. I collected some timing measurements comparing the inlined version against one with proper function calls. Every other aspect of the code had a larger impact than the manual inlining. The measured difference was noise. The theory was false. Thirty minutes saved me from continuing to write harder-to-read code for no reason.
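A measurement of this kind can be sketched with Python’s standard `timeit` module. The functions and the arithmetic below are hypothetical stand-ins, not the code from that project, and the result depends heavily on language, compiler, and machine. In an interpreted language such as Python, the call overhead may well be visible, which is one more reason to measure instead of assuming.

```python
import timeit


def scale(sample: int) -> int:
    """Small helper that the 'inlined' variant duplicates by hand."""
    return (sample * 3) >> 1


def process_with_calls(samples):
    return [scale(s) for s in samples]


def process_inlined(samples):
    return [(s * 3) >> 1 for s in samples]


def measure(func, samples, repeats: int = 5, number: int = 200) -> float:
    """Best-of-N time per call in seconds; the minimum is the least noisy."""
    timings = timeit.repeat(lambda: func(samples), repeat=repeats, number=number)
    return min(timings) / number


if __name__ == "__main__":
    data = list(range(10_000))
    # Both variants must compute the same thing, otherwise the comparison
    # says nothing about call overhead.
    assert process_with_calls(data) == process_inlined(data)
    t_calls = measure(process_with_calls, data)
    t_inline = measure(process_inlined, data)
    print(f"with calls: {t_calls * 1e6:.1f} us, inlined: {t_inline * 1e6:.1f} us")
```

Whatever the numbers turn out to be, compare the difference against the actual timing budget of the hot path before drawing a conclusion.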

The check does not need to be elaborate. A unit test that exercises a boundary condition is cheap. A small code experiment that runs on the bench is cheap. Asking the engineer next to you “does this match your understanding?” is cheap. The key is that the check must be capable of returning a clear “no” if the theory is wrong. If the test cannot fail, it cannot teach you anything.

I acknowledge that “cheap” is a relative term when it comes to safety-critical applications, or when the implications of a false “no” are otherwise large. Adjust your efforts accordingly.

Testing assumptions about the world

Not all theories are about code behaviour. Some are about the environment the code operates in. These domain assumptions cause just as much waste when they are wrong, but they can be harder to spot because you question them less frequently.

I recently needed to decide whether to keep supporting a particular hardware configuration. The configuration complicated the code with extra branches and special cases. My theory was: “Nobody is using this configuration in the field anymore.” I could have acted on that hunch, or I could have carried the complexity forward out of caution. Instead, I asked the team that manages customer configurations. Ten minutes of conversation confirmed the configuration had been absent from all deployments for years. The theory held, and we removed the code.

That is the same falsification logic applied to the problem domain rather than the implementation. The check was cheap—a conversation instead of a test script—but it served the same purpose. It let reality push back before I committed.

The distinction between essential and accidental complexity runs through this. A system that carries accidental complexity becomes harder to test, harder to modify, and harder to reason about. But you cannot tell which complexity is essential without testing your beliefs about the problem. Expert interviews, searching through field logs, checking with the team that handles customer reports—these are all ways to surface wrong assumptions about the world your code lives in.

Misguided effort

Wrong theories cost you in two ways. The first is direct: you build something based on a false belief, and it breaks. The second is more subtle: you build something based on a belief that is not false, but also not worth spending time on. You optimize a code path that runs once at boot. You guard against a timing condition that cannot occur with your scheduler. You design around a constraint that does not actually constrain anything.

This second category is harder to notice because the code works correctly. It just does not address anything relevant. The only defence is to state the belief clearly enough that you can ask: is this worth my attention right now? “Register spilling is the performance bottleneck” is a theory you can check. But the prior theory — “performance in this code path is currently a problem we need to solve” — also needs checking.

Summing up

The argument is not that you need certainty before you act. It is that the steps should be small, the theories should be written down or spoken aloud, and the checks should be cheap. When each piece of a system rests on a stated and examined belief, the whole holds up better. When something does go wrong, you can trace it back to a specific theory and re-examine it.

The habit builds over time. You start reaching for performance logs, the integration test, or the colleague with domain knowledge before you reach for the editor. You get better at phrasing your hunches in ways that can be shot down. A ten-minute check that saves a week of rework is not a distraction from the real work. It is the real work, done in the right order.

Observability is More Important Than You Think

By the Wikipedia definition, observability (in control theory) is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. This idea translates very well to software and has thus been borrowed for this context as well. So, as a helpful definition, let us consider observability to be the extent to which we can determine the state and operating mechanisms (which we will jointly call “internals” going forward) of a software system during runtime by the outputs it produces.

In this sense, a software system which does not provide any indication of its internals, a complete black box, would be one with the least observability. A software system which reveals every detail of its internals, a complete white box, would be considered a perfectly observable system. Any real software system falls in between.

How does it help with reliability?

Why would we consider observability a quality metric that helps with reliability? Observability is essential for reliability because reliability is about confidence that software behaves in accordance with its specified or implicit requirements, or with our own intentions. To gain such confidence, one obviously needs to make a determination about the software’s behavior. In some limited circumstances, a black-box determination is sufficient, e.g., when the software only needs to solve some tasks. As software engineers, however, we most often cannot ignore the internals. Broken software behind a currently working interface is not what we want. We want our software to follow the paths we have envisioned and to be “correct” in a much broader sense.

For a more concrete example, I have been working on some stock picking tools in my free time. Everything is very much based on probability theory, and while the outputs of the tools often yield a convincing-looking statistical property, the devil is often in the details. I quickly learned that through most steps, it makes sense to visualize the distribution of the data and put the final result into the context of the intermediate steps. For example, in the figure below it is clear that the mode of my predictions differs from the median in the observed data, which is indicated by a vertical line. There is a significant left-skew.

Such surprising observations have often been a great starting point to improve my models.

But also on more “low-level” or “enterprise” problems than a Jupyter notebook, getting the right cues at the right time can help tremendously by:

  • Cutting debugging times, because you know more about an issue when it gets reported.
  • Stumbling upon odd behavior, which warrants an investigation.
  • Helping developers less familiar with the code to get an intuition about how the software operates.

So next, we discuss what can be done to improve observability.

How to Achieve Observability?

The best thing about observability is that every bit helps. Sure, there is a limit beyond which the flood of information becomes overwhelming, but most software doesn’t tell you enough about itself. There is the concept of the “Three Pillars of Observability” (logs, metrics, and traces), and that certainly is a great framework. In this section, I will just present some ideas which I found to be important, but starting with good intentions and a decent plan (which you can later improve) will probably get you far. If a structured framework like the three pillars helps you with that, then more power to you.

Logging

A good log always helps. Make sure errors and unexpected events are marked clearly. Most of the time, a logging framework with loads of functionality can help you with that. Don’t be afraid to log many things at verbose logging levels. Find a mechanism that ensures you receive a log file every time an issue is reported, be it through an automated process or some organizational provisions.
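As a minimal sketch of such a setup, the following uses Python’s standard `logging` module. The logger name `myapp` and the helper `configure_logging` are placeholders of mine. The idea is that the file always receives the verbose output, so that a report can come with a useful log, while the console stays quiet unless something is wrong.

```python
import logging


def configure_logging(log_path: str, verbose: bool = False) -> logging.Logger:
    """Write everything to a log file; show only warnings and above on the
    console unless verbose output is requested."""
    logger = logging.getLogger("myapp")  # hypothetical application name
    logger.setLevel(logging.DEBUG)

    # Remove handlers from a previous call to avoid duplicated lines.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter(
        "%(asctime)s %(levelname)-8s %(name)s: %(message)s"
    )

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)       # keep all details in the file
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.DEBUG if verbose else logging.WARNING)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)
    return logger
```

A call such as `log = configure_logging("app.log")` followed by `log.debug(...)` and `log.error(...)` then leaves both entries in the file, with the error clearly marked by its level.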

Make it Visual

If your software creates complex structures or states, find or implement a mechanism that gives you a visual representation of that state. Whether it is a figure, a dashboard, or a graph, every bit helps.

Even if you are working on an embedded system or are running a headless server, you can probably store a CSV file and visualize it with a bit of Python afterwards. Be creative here.
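As a rough sketch of the CSV route, the following dumps samples to a file and renders a text histogram from it afterwards. The function names and the two-column layout (`index`, `value`) are assumptions of mine. I use a text histogram only to keep the example free of third-party dependencies; a plotting library would be the more natural choice on a workstation.

```python
import csv
from collections import Counter


def dump_samples(path: str, samples) -> None:
    """Write samples to a CSV file, as a device or headless server might."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "value"])
        writer.writerows(enumerate(samples))


def text_histogram(path: str, bin_width: int = 10) -> str:
    """Read the CSV back and render one bar of '#' characters per bin."""
    with open(path, newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]
    # Map every value to the lower edge of its bin and count per bin.
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        label = f"{start:>6}..{start + bin_width:<6}"
        lines.append(f"{label} {'#' * bins[start]}")
    return "\n".join(lines)
```

Calling `dump_samples("samples.csv", readings)` on the target and `print(text_histogram("samples.csv"))` on the development machine is often enough to spot an unexpected second peak or an outlier.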

Monitor What You Care About

Have some easy monitoring utilities to take measurements of the things you care about. Examples include execution times of critical sections, memory usage after initialization or in proportion to load, or the maximum iteration count until an event is finished. Sometimes even call counts or call frequencies can be helpful.

These don’t have to be complicated or even perfect. A small helper class or sometimes even just a macro can do the job. Make sure to make it a small snippet that you can just place wherever you need it.
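A minimal sketch of such a helper in Python could look like this. The class name `Probe` and its methods are hypothetical. It records call counts and execution times per named section, and it is small enough to drop in wherever a measurement is needed.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Probe:
    """Tiny measurement helper: call counts and timings per named section."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_s = defaultdict(float)
        self.max_s = defaultdict(float)

    @contextmanager
    def measure(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the measurement even if the section raised an exception.
            elapsed = time.perf_counter() - start
            self.calls[name] += 1
            self.total_s[name] += elapsed
            self.max_s[name] = max(self.max_s[name], elapsed)

    def report(self) -> dict:
        """Summary per section: call count, average and maximum time in ms."""
        return {
            name: {
                "calls": self.calls[name],
                "avg_ms": 1000.0 * self.total_s[name] / self.calls[name],
                "max_ms": 1000.0 * self.max_s[name],
            }
            for name in self.calls
        }
```

Usage is a single line around the code of interest, for example `with probe.measure("parse"): ...`, followed by `probe.report()` at shutdown or whenever a log entry is written.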

Provide a Way to Look at the Interpretation of Binary

At least optionally, whenever you are dealing with binary data that your software needs to interpret, your software should have means to tell you what it thinks was in that data. Consider a message with a 1-byte identifier and 4 bytes of data. 0x1e:BE:02:3C:AF is probably much less clear than COMMAND_UPDATE - FLAG_PRIORITY new value: 2 @ entry: 15535. This helps especially if your software’s interpretation is wrong.

The same applies to sufficiently complex plain text files (like large JSON files).
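For the binary message example, a decoder could be sketched as follows. The field layout is my assumption, chosen as one reading that is consistent with the example: one identifier byte, one flag byte, one value byte, and a 16-bit big-endian entry index (0x3CAF is 15535). The lookup tables are likewise hypothetical.

```python
import struct

# Hypothetical lookup tables; a real protocol would define many more entries.
COMMANDS = {0x1E: "COMMAND_UPDATE"}
FLAGS = {0xBE: "FLAG_PRIORITY"}


def describe_message(raw: bytes) -> str:
    """Return a human-readable interpretation of a 5-byte message."""
    if len(raw) != 5:
        return f"MALFORMED ({len(raw)} bytes): {raw.hex(':')}"
    # Assumed layout: identifier, flag, value (one byte each), then a
    # big-endian 16-bit entry index.
    command_id, flag, value, entry = struct.unpack(">BBBH", raw)
    command = COMMANDS.get(command_id, f"UNKNOWN_COMMAND(0x{command_id:02X})")
    flag_name = FLAGS.get(flag, f"UNKNOWN_FLAG(0x{flag:02X})")
    return f"{command} - {flag_name} new value: {value} @ entry: {entry}"


if __name__ == "__main__":
    print(describe_message(bytes.fromhex("1EBE023CAF")))
```

Unknown identifiers are printed rather than rejected on purpose: when the interpretation is wrong, the raw value is exactly what you want to see in the log.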

Summary

Observability is an essential precondition for reliability, because only if you can observe the internals of your software can you make a complete statement about its behavior. Hopefully, this article convinces you that observability is an important part of a reliable software system. And most importantly, you should be convinced that you have the means to improve the observability of the software you work on.

Software Quality in Academia

Most people who have studied or worked in an academic setting have come across pieces of software that are hard to understand and hard to maintain. I just handed in my master’s thesis and came across many instances of low-quality code during my studies. The most obvious shortcoming was an almost complete absence of tests. I rarely found tests in the source code related to scientific publications. On a side note, not once in my years of study was I required to provide any test case whatsoever.

Another big issue is that academic source code often lacks abstractions. The research tackles a very narrow problem, and reuse of the source code for other (related) problems does not seem to be intended. Interoperability is also often an issue (likely also because of the narrow research scope). Researchers could often easily turn their code base into a library, but do not do it.

I acknowledge that there are already good posts on these issues out there, often from people who are much more involved in academia. I liked the article “On the quality of academic software” by Daniel Lemire a lot. Also, “Why Academic Software Sucks” by Matthias Döring is a good (short) read. However, I would especially like to discuss the incentive structure in more detail, to identify causes and solutions.

Incentives

Academic researchers, be it post-docs, professors, or people working on a bachelor’s, master’s or PhD thesis, are judged by their publications or submissions. They are not judged by the code that supports their research. This incentivizes them to focus on academic writing instead of writing code. There is in general nothing wrong with that. Academic writing can act as incredibly accurate and concise documentation for a project. However, if low-quality software inhibits further research or reproduction, there is a problem.

Many people working at universities told me that academia is an incredibly competitive field (especially before you become a professor). The pressure to publish breakthrough results at a high pace is immense. In addition to the inherent quality issues arising from such pressure, this also fosters an environment of individualism and distrust. That means that researchers often work on their software completely isolated and have no interest in the success of their peers.

Academic researchers also often receive no economic benefit from providing high-quality software. Their research is mostly funded by (public) research grants, and those grants rarely come with software quality requirements.

Possible Solutions

Universities operate under very specific social, political and economic conditions. Those conditions are unlikely to change abruptly, so nothing I propose here as a possible solution can likely be implemented by any individual (contrary to the general claim this blog is based on). However, maybe these points help to identify positive change where it arises.

It should become the norm in academia to deliver high quality source code and interoperable and robust software. Publications should be judged together with supporting software, as that software is required to reproduce and extend the results of the publication.

Cooperation in academia should be encouraged. Working together on a problem helps to deliver better results. That holds both for the delivered software and for the overall results of the research. This can especially come in the form of inter-university open source projects, where an effort is made to keep a common codebase well maintained.

Lastly, I believe that where public funds are granted for software-centered research, high-quality open-source software should be the result. Such quality requirements should be part of research grants.

History Comments

Comments are a useful tool to give contextual information directly in the source code. They are most typically used as clarification comments and documentation comments (top Google result for code comments). However, there is a third use for which I have found comments helpful: documenting history!

Is shorter Code always better?

As developers, we are always searching for ways to make our code more concise, structured and understandable. Therefore, short code is usually preferred over longer code. This post explains why we generally find shorter code better and what some exceptions are.

Communicating Code

A big part of being a programmer is describing code to peers. However, describing abstract and formalized code constructs in natural language is not always an easy task. This article highlights some techniques that help communicate code to fellow programmers. The focus is on communicating the code “itself” instead of through its behavior, which is a whole different topic.

Technical Debt

Technical debt sounds scary. From what we know about financial debt, all debt carries the inherent risk that it starts consuming all the income or revenue one has. One becomes insolvent, and that is not at all what one wants to be. The same thing may happen with technical debt. If all development efforts are used up by maintenance tasks and no actual progress can be made, a project or organization reaches technical insolvency.

constexpr Variables

constexpr variables are a powerful feature of C++. They have the potential to improve runtime performance and also to convert some runtime errors into compile-time errors. However, they are sparingly used in code bases due to the constraints they impose on the developer. This post will argue that their use, especially in the context of templates, should be considered more often.

Property-Based Testing

Property-based testing is probably the technique that has changed the way I code the most. It is a powerful tool to describe the properties (that is, the behavior) of the units that software consists of. It greatly reduces the amount of test code that needs to be written, while increasing the coverage of edge cases at the same time. We will start by finding out that we actually have a problem with existing test methods.
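To give a first impression of the idea, here is a hand-rolled sketch in Python that uses only the standard `random` module. Dedicated property-based testing libraries generate and shrink inputs for you; this stand-in merely shows the principle. The run-length encoder is a hypothetical unit under test that I picked for illustration.

```python
import random


def run_length_encode(text: str) -> list:
    """Collapse runs of equal characters into (character, count) pairs."""
    encoded = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded


def run_length_decode(encoded: list) -> str:
    return "".join(ch * count for ch, count in encoded)


def check_properties(trials: int = 500, seed: int = 0) -> int:
    """Check two properties on many random inputs instead of a few
    hand-picked examples. Returns the number of inputs tried."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        # A two-letter alphabet makes long runs and edge cases likely.
        text = "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))
        encoded = run_length_encode(text)
        # Property 1: decoding the encoding returns the original input.
        assert run_length_decode(encoded) == text, repr(text)
        # Property 2: neighbouring runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(encoded, encoded[1:])), repr(text)
    return trials


if __name__ == "__main__":
    print(f"{check_properties()} random inputs satisfied both properties")
```

Note that the test states what must hold for every input, not what a particular input should produce. That is the shift in thinking the technique asks for.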