CP2K: The Leading Open-Source Software Driving AI Model Development

Phys.org
January 19, 2026
Atomistic simulation software CP2K enables AI models

AI-Generated Summary

The CP2K open-source software, widely used for atomistic simulations, is crucial for training AI models in materials science. Recent contributions from CASUS and international collaborators have expanded its capabilities and user base. A new overview article details CP2K's practical applications, including its role in generating data for AI surrogates, enabling more complex scientific computations.

The CP2K open-source package is among the top three most widely used research software suites worldwide for simulating the behavior of atoms and molecules. Among other applications, CP2K plays an important role in generating data used to train artificial intelligence (AI)-based models that determine molecular energies and forces. Since its beginnings, the range of the package's methods and functions has grown steadily, thanks to contributions from the Center for Advanced Systems Understanding (CASUS) at Helmholtz-Zentrum Dresden-Rossendorf. Together with colleagues from Germany, Switzerland, the UK and Canada, the CASUS team has now summarized the current status in an overview article. The paper, published in the Journal of Physical Chemistry B, focuses on the practical application of CP2K and is directed at new users from theoretical chemistry, materials science and neighboring fields.

The CP2K package contains apps and algorithms for simulating the behavior of atoms and molecules based on "first principles." This means that it is built exclusively on fundamental physical models and does not require additional data from, for example, experiments. It uses classical and quantum mechanical approaches to calculate both the static properties and the dynamic behavior of systems ranging from individual atoms or molecules in gases or solutions to large crystal lattices and two-dimensional materials.

"The aim of CP2K is to predict average properties, such as those that arise in statistical mechanics and thermodynamics, of any substance made of interacting electrons and nuclei," says Dr. Frederick Stein, one of the paper's main contributors and research scientist at CASUS. "A distinguishing feature of CP2K is that from a wide variety of different classical and quantum mechanical energy and force methods, users can choose and combine the methods suitable for their needs."

"We can see that interest in CP2K has grown tremendously in recent years," says Prof. Thomas D. Kühne, Director of CASUS and leader of the "Theory of Complex Systems" research team. "If a scientific tool is in such high demand, we see it as our responsibility to maintain and expand the suite." In addition to implementing new features, most of them requested by the research community, the CASUS CP2K team also provides user support, contributes to publications on its use, promotes the software at conferences and workshops, and advises other developers on implementing new features.

In materials science, simulations are an indispensable tool for screening a large number of theoretically possible materials to identify a few promising candidates that can then be investigated experimentally. However, these types of simulations are computationally demanding. They require both high-performance computing hardware and software tailored to the hardware and the task to be computed. CP2K calculations have long accounted for a significant proportion of the total computing time in many large supercomputing centers.

"The software is highly scalable allowing efficient calculations with tens of thousands of central processing units or thousands of graphics processing units simultaneously," says co-author Dr. Johann Pototschnig, member of CASUS' CP2K team. "It is also optimized with regard to the physical models to be simulated and their algorithms. For example, one can reduce compute time by choosing the most efficient methods from both the classical or quantum mechanical realm."

Providing data needed to train AI models

CP2K is widely used to generate high-quality electronic-structure data for training AI models in atomistic and materials science. Such data are primarily generated computationally using dedicated software packages such as CP2K. Software of this kind is therefore a prerequisite for advanced AI methods in the field of theoretical chemistry and materials design.
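
To make the idea of training a model on computed energies and forces more concrete, here is a minimal Python sketch. It does not call CP2K. A simple Morse pair potential (an assumption chosen purely for illustration) stands in for the expensive first-principles calculation, and a least-squares polynomial stands in for the AI model; real workflows use many-atom configurations and far more flexible models such as neural networks. All function names and parameter values below are hypothetical.

```python
import math

# Stand-in for an expensive first-principles calculation.
# Assumption: a Morse pair potential in arbitrary units, NOT a CP2K call.
D, A, R0 = 1.0, 1.5, 1.0

def reference_energy(r):
    return D * ((1.0 - math.exp(-A * (r - R0))) ** 2 - 1.0)

def reference_force(r):
    # The force is the negative derivative of the energy with respect to r.
    e = math.exp(-A * (r - R0))
    return -2.0 * D * A * e * (1.0 - e)

def solve(matrix, rhs):
    """Solve a small dense linear system (Gaussian elimination, partial pivoting)."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x

# Sampled distance range and polynomial degree (hypothetical choices).
R_MIN, R_MAX, DEGREE = 0.8, 2.0, 6
CENTER, HALF_WIDTH = 0.5 * (R_MIN + R_MAX), 0.5 * (R_MAX - R_MIN)

def scaled(r):
    # Map r from [R_MIN, R_MAX] to [-1, 1] to keep the fit well conditioned.
    return (r - CENTER) / HALF_WIDTH

def fit_surrogate(radii, energies):
    """Least-squares polynomial fit to the reference energies (normal equations)."""
    rows = [[scaled(r) ** p for p in range(DEGREE + 1)] for r in radii]
    gram = [[sum(row[i] * row[j] for row in rows) for j in range(DEGREE + 1)]
            for i in range(DEGREE + 1)]
    moment = [sum(row[i] * e for row, e in zip(rows, energies))
              for i in range(DEGREE + 1)]
    return solve(gram, moment)

def surrogate_energy(coeffs, r):
    x = scaled(r)
    return sum(c * x ** p for p, c in enumerate(coeffs))

def surrogate_force(coeffs, r):
    # Differentiate the fitted polynomial; the chain rule gives the 1/HALF_WIDTH factor.
    x = scaled(r)
    dE_dx = sum(p * c * x ** (p - 1) for p, c in enumerate(coeffs) if p > 0)
    return -dE_dx / HALF_WIDTH

# "Training data": reference energies at 30 sampled distances.
train_r = [R_MIN + i * (R_MAX - R_MIN) / 29 for i in range(30)]
train_e = [reference_energy(r) for r in train_r]
coeffs = fit_surrogate(train_r, train_e)

# Compare surrogate and reference at distances not used for training.
test_r = [0.85 + 0.05 * i for i in range(23)]
max_energy_error = max(abs(surrogate_energy(coeffs, r) - reference_energy(r))
                       for r in test_r)
max_force_error = max(abs(surrogate_force(coeffs, r) - reference_force(r))
                      for r in test_r)
print("max energy error:", max_energy_error)
print("max force error:", max_force_error)
```

A polynomial fitted this way can only be trusted inside the range of distances it was trained on, which is one reason broad, high-quality reference data matter for such models.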

Beyond that, the CP2K package includes various AI surrogate models: machine-learned approximations of computationally expensive mappings that CP2K would otherwise compute explicitly. Instead of solving the full quantum-mechanical problem at every step, those AI surrogate models produce approximate results at a fraction of the computational cost. Using AI surrogates inside CP2K thus extends accessible time and length scales. Only the combination of large-scale simulations and AI models allows researchers to tackle the most complex computations.

The possibility of generating high-quality data for training AI models is certainly one reason for the increased interest in CP2K. This interest mainly stems from user groups previously not connected to CP2K. The new overview paper aims to make it easier for newcomers to get started with the complex topic of atomistic simulations in general and with how they are carried out using CP2K. The publication introduces the underlying methods and covers all capabilities of CP2K. Unlike other papers that focus on individual theoretical or application aspects, it aims to provide an overview that includes aspects of practical use. The effort was initiated by the CASUS team, who invited many other individuals and groups contributing to CP2K to take part.

Dr. Andreas Knüpfer, leader of the CASUS Scientific Computing Core, says, "This overview paper on CP2K reflects our department's mission to support cutting-edge research in various fields with our expertise in computational science. At the same time, it is important to make the results accessible and enable their transfer to other fields and application areas. Ultimately, this also benefits CP2K itself, because only with a broad community that is welcoming to new members, it can maintain the leading role it currently holds."
