Europe’s largest private AI lab Silo AI, with its large language model (LLM) arm SiloGen, is today announcing a large-scale initiative on open and trustworthy LLMs. SiloGen launches a consortium together with TurkuNLP, a research group at the University of Turku, to develop a family of open LLMs, including the world’s largest open source LLM. The initiative aims to ensure European digital sovereignty and democratize access to LLMs. Focusing on more accurate, trustworthy, and robust downstream applications of LLMs, SiloGen is also initiating the development of an LLM development suite.
To ensure digital sovereignty and democratize access to LLMs, SiloGen is launching a consortium with TurkuNLP whose key focus is developing the world’s largest open source language model, covering all official European languages.
In addition to compute access totaling approximately 15 million GPU hours, the initiative is dedicated to ensuring that the data used in these models accurately represents European languages, while also covering the English-speaking world. The initiative is conducted in close collaboration with key European institutions and agencies, and is committed to adhering to European regulations. Beyond Europe, the open source initiative will democratize access to LLMs and enable the development of use-case specific downstream applications.
The consortium, led by SiloGen and the University of Turku’s TurkuNLP research group, stands apart from many other initiatives in that it uniquely combines resources to build LLMs:
- World-class LLM team, including professors and leading scholars such as Filip Ginter, Jussi Karlgren, Sampo Pyysalo, Magnus Sahlgren and Aarne Talman, as well as others drawn from Silo AI’s more than 150 PhDs and 300 AI experts,
- Data resources covering all European languages and code, including High Performance Language Technologies (HPLT) data and other collected and curated data, and
- Access to compute, including software infrastructure to train LLMs and access to the LUMI supercomputer and other hardware and cloud services for LLM training.
In addition to a world-class team, the consortium has access to, and experience with, the LUMI supercomputer, which, as one of the European High-Performance Computing (EuroHPC) undertakings, is the third largest supercomputer in the world and the largest in Europe. Having built LLMs on LUMI for more than a year, the team has developed a distinctive software layer for training LLMs effectively and efficiently on the AMD-based hardware. As part of the EU-funded HPLT project, the data for this initiative has been collected and curated since early 2022 to provide a representative basis for LLM development. Combining all of this with a total of 15 million GPU hours, Silo AI and TurkuNLP are uniquely positioned to train a family of language models, including the world’s largest open LLM.
“We are honored to contribute to the development of open LLMs. The development of base models aligned with European values is imperative for our digital sovereignty. This initiative helps to ensure that underlying models are based on data and information representing the citizens and organizations of the region, and overall compliance with regulation, data privacy and other vital concerns. And eventually we need sovereignty on how downstream applications and value creation happen. This requires trusted and secure approaches to independent base models that enable fine-tuning for domain-specific needs. This way we can ensure digital sovereignty, while advancing technological development,”
says Peter Sarlin, CEO and co-founder of Silo AI.
“LLMs are rapidly reshaping how we access information and interact with technology. As their impact grows, it is increasingly important to ensure that the models are developed in a transparent and reproducible manner and made openly available to ensure accountability and equal access to the technology. From a European perspective, it is also critical that models are designed from the outset to prioritize multilinguality and an equitable approach to all languages. The High Performance Language Technologies (HPLT) project is addressing these goals through the creation of open European data resources and language models, and is delighted to partner in this consortium with SiloGen and Silo AI, an industry leader with shared goals,”
says Sampo Pyysalo, research fellow at the University of Turku and HPLT principal investigator.
Beyond open base models, SiloGen is also expanding its LLM development platform, to cater to the need to build more accurate, trustworthy and robust downstream applications. This extends Silo AI’s and SiloGen’s existing platform for data-centric AI development. The platform includes tooling for synthetic data generation, human feedback, and quality testing. It also comes with a long track record for natural language processing (NLP), vision and perception, as exemplified by projects together with Allianz, Honda, Rolls-Royce, Sandvik, Tietoevry and many more. As part of this large-scale initiative, SiloGen is committing to a significant investment and development effort to build additional platform features dedicated to ensuring trustworthy LLMs. Part of this investment is made possible through a grant from Finland’s innovation funding agency Business Finland.
In the years to come, the initiative will continue to develop both data resources and models that reflect regional characteristics, as evidenced by SiloGen’s collaboration with the Finnish public service media company Yle. Merja Ylä-Anttila, CEO of Yle, comments on the collaboration:
“For Yle it is of utmost importance that in the years to come we will have readily available access to language models that are based on our languages and that truly reflect our local culture. We are more than happy to be part of the exploration on how public service media companies around Europe can participate in the development of trustworthy AI technologies, including language models, that take the rich diversity of languages and cultures into consideration.”
The TurkuNLP research group’s extensive experience in NLP and LLMs aligns with Silo AI’s and SiloGen’s commitment to contribute to world-class research on generative AI. This alignment, combined with the resources of the consortium, provides a robust foundation for redefining the boundaries of what is possible in the world of open source language models. Together with the LLM development platform, this opens a unique path for companies to create value using independent, trusted and secure base models, with the possibility to fine-tune, instruct and control LLMs for domain-specific needs.