SiloGen launches a consortium to build the world’s largest open LLM

Europe’s largest private AI lab Silo AI, with its large language model (LLM) arm SiloGen, is today announcing a large-scale initiative on open and trustworthy LLMs. SiloGen is launching a consortium together with TurkuNLP, a research group at the University of Turku, to develop a family of open LLMs, including the world’s largest open source LLM. The initiative aims to ensure European digital sovereignty and democratize access to LLMs. To support more accurate, trustworthy, and robust downstream applications of LLMs, SiloGen is also beginning work on an LLM development suite.

The consortium’s key focus is to develop the world’s largest open source language model, covering all official European languages.

In addition to compute access totaling approximately 15 million GPU hours, the initiative is dedicated to ensuring that the data used in these models accurately represents European languages, while also covering the English-speaking world. The initiative is conducted in close collaboration with key European institutions and agencies and is committed to complying with European regulations. Beyond Europe, the open source initiative will democratize access to LLMs and enable the development of use-case-specific downstream applications.

The consortium, led by SiloGen and the University of Turku’s TurkuNLP research group, stands apart from many other initiatives in that it combines three key resources for building LLMs:

  • A world-class LLM team, including professors and leading scholars such as Filip Ginter, Jussi Karlgren, Sampo Pyysalo, Magnus Sahlgren and Aarne Talman, as well as others drawn from Silo AI’s more than 150 PhDs and 300 AI experts;
  • Data resources covering all European languages and code, including High Performance Language Technologies (HPLT) data and other collected and curated data; and
  • Access to compute, including software infrastructure to train LLMs, the LUMI supercomputer, and other hardware and cloud services for LLM training.

In addition to a world-class team, the consortium has access to, and experience with, the LUMI supercomputer, which, as one of the supercomputers of the European High Performance Computing Joint Undertaking (EuroHPC JU), is the third largest supercomputer in the world and the largest in Europe. Having built LLMs on LUMI for more than a year, the team has developed a distinctive software layer for training LLMs effectively and efficiently on its AMD-based hardware. As part of the EU-funded HPLT project, the data for this initiative has been collected and curated since early 2022 to provide a representative basis for LLM development. Combining all of this with a total of 15 million GPU hours, Silo AI and TurkuNLP are uniquely positioned to train a family of language models, including the world’s largest open LLM.

“We are honored to contribute to the development of open LLMs. The development of base models aligned with European values is imperative for our digital sovereignty. This initiative helps to ensure that underlying models are based on data and information representing the citizens and organizations of the region, and that they address compliance with regulation, data privacy and other vital concerns. Ultimately, we also need sovereignty over how downstream applications and value creation happen. This requires trusted and secure approaches to independent base models that enable fine-tuning for domain-specific needs. This way we can ensure digital sovereignty while advancing technological development,”

says Peter Sarlin, CEO and co-founder of Silo AI.

“LLMs are rapidly reshaping how we access information and interact with technology. As their impact grows, it is increasingly important to make sure that the models are developed in a transparent and reproducible manner and made openly available to ensure accountability and equal access to the technology. From a European perspective, it is also critical that models are designed from the outset to prioritize multilinguality and an equitable approach to all languages. The High Performance Language Technologies (HPLT) project is addressing these goals through the creation of open European data resources and language models, and is delighted to partner in this consortium with SiloGen and Silo AI, an industry leader with shared goals,”

says Sampo Pyysalo, research fellow at the University of Turku and HPLT principal investigator.

Beyond open base models, SiloGen is also expanding its LLM development platform to meet the need for more accurate, trustworthy and robust downstream applications. This extends Silo AI’s and SiloGen’s existing platform for data-centric AI development. The platform includes tooling for synthetic data generation, human feedback, and quality testing. It also builds on a long track record in natural language processing (NLP), vision and perception, as exemplified by projects with Allianz, Honda, Rolls-Royce, Sandvik, Tietoevry and many more. As part of this large-scale initiative, SiloGen is committing to a significant investment and development effort to build additional platform features dedicated to ensuring trustworthy LLMs. Part of this investment is made possible through a grant from Finland’s innovation funding agency Business Finland.

In the years to come, the initiative will continue to develop both data resources and models that reflect regional characteristics, as illustrated by SiloGen’s collaboration with the Finnish public service media company Yle. Merja Ylä-Anttila, CEO of Yle, comments on the collaboration:

“For Yle it is of utmost importance that in the years to come we will have readily available access to language models that are based on our languages and that truly reflect our local culture. We are more than happy to be part of the exploration on how public service media companies around Europe can participate in the development of trustworthy AI technologies, including language models, that take the rich diversity of languages and cultures into consideration.”

The TurkuNLP research group’s extensive experience in NLP and LLMs aligns with Silo AI’s and SiloGen’s commitment to contributing to world-class research on generative AI. This alignment, combined with the resources of the consortium, provides a robust foundation for redefining the boundaries of what is possible in the world of open source language models. Together with the LLM development platform, this opens a unique path for companies to create value using independent, trusted and secure base models, with the ability to fine-tune, instruct and control LLMs for domain-specific needs.

About

Silo AI

Silo AI is Europe’s largest private AI lab on a mission to ensure Europe has a flagship AI company. We’re a trusted AI partner that brings competitive advantage to product R&D. We build AI-driven solutions and products to enable smart devices, autonomous vehicles, industry 4.0, and smart cities. Silo AI provides its customers unique access to world-class AI models and expertise, as well as the Silo OS infrastructure to speed up AI development and deployment. With SiloGen, Silo AI is currently building market-leading open source LLMs, with the intent to ensure European digital sovereignty and democratize access to LLMs.
www.silo.ai

SiloGen

SiloGen is a large-scale initiative with the aim of building generative AI technology for Europe’s digital sovereignty. As Silo AI’s generative AI arm, SiloGen combines some of Europe’s leading generative AI and large language model (LLM) experts with access to data sources, powerful computational resources and infrastructure to train, run and operate LLMs. SiloGen has been operational since late 2022 and is currently working with clients like Allianz, Happeo, Sandvik and Tietoevry. As a trusted provider, SiloGen offers base and specialized models as well as a development platform to ensure accurate, trustworthy and robust downstream applications.

University of Turku

The University of Turku is an inspiring and international academic community of 25,000 students and staff in Southwest Finland. We build a sustainable future with multidisciplinary research, education, and collaboration. The University of Turku was ranked among the 301–400 best universities in the 2023 Academic Ranking of World Universities, also known as the Shanghai Ranking. Among Finnish universities, the University of Turku shared 2nd–3rd place.
www.utu.fi

Want to discuss how Silo AI could help your organization?

Get in touch with our AI experts.
Peter Sarlin, PhD
CEO & Co-Founder
peter.sarlin@silo.ai
+358 40 572 7670