CAISA Lab

Highlights

2025-2026 Perspective: From Making AI Work to Making It Matter

The past decade of AI was largely driven by one question: how to make large language models work at all. How to scale them, stabilize them, and push their capabilities far enough to be usable.

The turn of this year feels different — because a longer research arc has become clear. It is no longer about whether these systems work, but about what we do with them once they do. How do we integrate them into human workflows responsibly? How do we make them robust, interpretable, and trustworthy under real-world uncertainty? And how do we embed them into science and public institutions in ways that scale—and last?

This shift, from making AI work to deciding how it should work for people and science, is what connects our recent results and what will guide our work over the next 5+ years. Almost everything my group has been building converges on two tightly connected research lines: Socially Aligned Artificial Intelligence, and AI for Accelerating Scientific Discovery, especially in physics.

What connects them is a shared methodological core: robustness, efficiency, and interpretability of foundation models. These properties are not optional. They are what make AI systems trustworthy in social settings and usable as scientific instruments. I am also glad we can embed this work into shared, centralized infrastructures for open AI science at HPC scale.

Socially Aligned Artificial Intelligence

Our Socially Aligned AI research asks how LLM systems model people, social interaction, and society—and how these capabilities can be measured, interpreted, and governed responsibly. A major milestone for our group this year is the award of the ERC Starting Grant “LLMpathy”, which will fund five new researchers across personal psychology, LLM reasoning, and agentic simulations. What excites me most is what it enables: social intelligence becomes a measurable research object. With LLMpathy, we can simulate, stress-test, and systematically analyze social reasoning in large language models, rather than relying on anecdotal behavior or narrow benchmarks. This builds directly on our completed junior research group Dynamically Social Discourse Analysis and on recent ACL and EMNLP publications.

Equally important is the community forming around this work. The approval of our Dagstuhl Seminar on “Social Artificial Intelligence” for summer 2026, with colleagues from Harvard, JHU, and CMU, and the upcoming ACM CHI workshop “Redefining Empathy” in April signal that the field is ready for deeper, more reflective conversations. For me, socially aligned AI is inseparable from AI safety and long-term resilience: the question is not whether AI will shape human systems, but how we design that integration to remain human-centered and sustainable.

AI for Scientific Discovery

Our work on AI for Scientific Discovery connects directly to alignment, as both depend on transparent, uncertainty-aware, and interpretable methods. Physics, in particular, is unforgiving. Models must cope with distribution shifts and integrate into workflows where assumptions are constantly challenged. In that sense, AI for physics is one of the most demanding testbeds for methods that steer, explain, and control transformer-based predictive systems.

A concrete example was our ECML PKDD Challenge “Colliding with Adversaries”, which created a shared evaluation space for machine learning and physics researchers and made conceptual challenges such as correlation attacks and physics-aware attacks visible in a way that mattered to both communities. Looking ahead, we aim to focus increasingly on physics foundation models in HEP, astroparticle physics, and astrophysics, and on LLM-based scientific agents that assist with analysis workflows and tool-augmented reasoning. My group will grow by four physics researchers this year to explore these opportunities.

This line of work is anchored in the newly acquired Dynaverse Excellence Cluster, where I serve as PI for AI for Astrophysics, and reinforced by my BMFTR ErUM-Data projects AISafety, AALearning, and Physics-LLM, which link our group with partners at Bonn, RWTH Aachen, TU Dortmund, DESY, Forschungszentrum Jülich, TUM, and the Leibniz Institute for Astrophysics Potsdam. AISafety and AALearning reframe adversarial learning as a scientific tool for physics-informed simulations, while Physics-LLM explores how large language models can support knowledge organization and reproducibility in large physics collaborations. Complementary DFG-funded work on geometric representation learning, the TRA Synergy Bubble AI for Astrophysics, and our new connection to the ELLIS network add methodological depth and international structure.

Open-Source Foundation Models

As chair of Lamarr NLP, I am glad to see a third strand mature significantly: our work on open-source foundation models.

For me, this is about making values like transparency, reproducibility, and controllability operational. In close collaboration with Fraunhofer IAIS, TU Dortmund, and Hessian.AI, research insights and infrastructure have come together. The JQL pipeline paper at EMNLP and the TACL paper on multilingual pruning address data quality, controllability, and efficiency, all prerequisites for serious open foundation models. Community matters here too. The Polyglot LLM Workshop we co-organize, and the fact that Nicolas Kluge will be joining our group after his award-winning work on Portuguese LLMs, strengthen our commitment to multilingual, open, and culturally grounded models. Looking forward, I see strong potential in combining state-space models with neural foundation models into hybrid architectures, especially for long-horizon reasoning.

Institutionally, this work is embedded in the Lamarr Institute and, via Fraunhofer IAIS, connected to broader new OSFM initiatives and the Jupiter AI Factory. Shared compute, tooling, and deployment pathways make it realistic to maintain and evolve models beyond individual projects.

People, Transitions, and Research Community

In numbers, this year brought my group 30 paper preprints, 20 international guest speakers, 3 DAAD visiting researchers, 6 new research grants, and one more professorship among our PhD alumni. But this year also marks a significant transition for the team. With the successful completion of five BMBF projects, several colleagues will be moving on at the end of the year, something that always feels bittersweet but also reflects the training role of the group. At the same time, we are entering a growth phase. Over the coming months, the group will welcome four new researchers in AI for Science and four new researchers funded through the ERC, complemented by Florian’s junior research group on AI safety, which will work closely with Nicolas.

Alongside research, the past year also emphasized community engagement and outreach. We helped shape international discourse by organizing the INLG conference in Hanoi, strengthened transatlantic exchange through visits to Canadian AI institutes, and actively participated in AI policy and public debates across Germany, from Düsseldorf and Bonn to Berlin. In addition, it was fascinating to meet young AI talents at the Bundes-KI Wettbewerb and the European Girls’ Olympiad in Informatics (EGOI).

Looking Forward

I am convinced that the next phase of AI will not be defined by scale alone, but by whether we can align powerful systems with human judgment, scientific rigor, and institutional responsibility.

With socially aligned AI, physics foundation models, scientific agents, hybrid architectures, and open infrastructures coming together, we are now in a position to ask not just what AI can do, but what it should be trusted to do—and under which conditions. That is the question we are choosing to work on now, while the technology works well enough for the answer to matter.

To our dedicated researchers, collaborators, and supporters: thank you for your unwavering commitment to pushing the boundaries of human knowledge. Here’s to another year of curiosity, innovation, and transformative research!
