SiloGen launches a consortium to build the world’s largest open LLM

Last updated: 20.8.2023

Europe’s largest private AI lab Silo AI, with its large language model (LLM) arm SiloGen, is today announcing a large-scale initiative on open and trustworthy LLMs. SiloGen launches a consortium together with TurkuNLP, a research group at the University of Turku, to develop a family of open LLMs, including the world’s largest open source LLM. The initiative aims to ensure European digital sovereignty and democratize access to LLMs. Focusing on more accurate, trustworthy, and robust downstream applications of LLMs, SiloGen is also initiating the development of an LLM development suite.

To ensure digital sovereignty and democratize access to LLMs, SiloGen launches a consortium together with TurkuNLP, with a key focus on developing the world’s largest open source language model, covering all official European languages.

In addition to compute access totaling approximately 15 million GPU hours, the initiative is dedicated to ensuring that the data used in these models accurately represents European languages, while also covering the English-speaking world. The initiative is conducted in close collaboration with key European institutions and agencies, and is committed to adhering to European regulations. Beyond Europe, the open source initiative will democratize access to LLMs and enable the development of use-case-specific downstream applications.

The consortium, led by SiloGen and the University of Turku’s TurkuNLP research group, stands apart from many other initiatives in that it uniquely combines resources to build LLMs:

A world-class LLM team, including professors and leading scholars such as Filip Ginter, Jussi Karlgren, Sampo Pyysalo, Magnus Sahlgren and Aarne Talman, as well as others drawn from Silo AI’s more than 150 PhDs and 300 AI experts,
Data resources covering all European languages and code, including High Performance Language Technologies (HPLT) data, and other collected and curated data, and
Access to compute, including software infrastructure to train LLMs and access to the LUMI supercomputer and other hardware and cloud services for LLM training.

In addition to a world-class team, the consortium has access to, and experience with, the LUMI supercomputer, which, as part of the European High-Performance Computing Joint Undertaking (EuroHPC), is the third-largest supercomputer in the world and the largest in Europe. Having built LLMs on LUMI for more than a year, the team has developed a distinctive software layer for training LLMs effectively and efficiently on the AMD-based hardware. As part of the EU-funded HPLT project, the data for this initiative has been collected and curated since early 2022 to provide a representative basis for LLM development. Combining all of this with a total of 15 million GPU hours, Silo AI and TurkuNLP are uniquely positioned to train a family of language models, including the world’s largest open LLM.

“We are honored to contribute to the development of open LLMs. The development of base models aligned with European values is imperative for our digital sovereignty. This initiative helps to ensure that underlying models are based on data and information representing the citizens and organizations of the region, and overall compliance with regulation, data privacy and other vital concerns. And eventually we need sovereignty on how downstream applications and value creation happen. This requires trusted and secure approaches to independent base models that enable fine-tuning for domain-specific needs. This way we can ensure digital sovereignty, while advancing technological development,” says Peter Sarlin, CEO and co-founder of Silo AI.

“LLMs are rapidly reshaping how we access information and interact with technology. As their impact grows, it is increasingly important to assure that the models are developed in a transparent and reproducible manner and made openly available to ensure accountability and equal access to the technology. From a European perspective, it is also critical that models are designed from the outset to prioritize multilinguality and an equitable approach to all languages. The High Performance Language Technologies (HPLT) project is addressing these goals through the creation of open European data resources and language models and is delighted to partner in this consortium with SiloGen and Silo AI, an industry leader with shared goals,” says Sampo Pyysalo, University of Turku research fellow and HPLT principal investigator.

Beyond open base models, SiloGen is also expanding its LLM development platform to meet the need for more accurate, trustworthy and robust downstream applications. This extends Silo AI’s and SiloGen’s existing platform for data-centric AI development. The platform includes tooling for synthetic data generation, human feedback, and quality testing. It also comes with a long track record in natural language processing (NLP), vision and perception, as exemplified by projects together with Allianz, Honda, Rolls-Royce, Sandvik, Tietoevry and many more. As part of this large-scale initiative, SiloGen is committing to a significant investment and development effort to build additional platform features dedicated to ensuring trustworthy LLMs. Part of this investment is made possible through a grant from Finland’s innovation funding agency Business Finland.

In the years to come, the initiative will continue to develop both data resources and models that reflect regional characteristics, as evidenced by SiloGen’s collaboration with the Finnish public service media company Yle. Merja Ylä-Anttila, CEO of Yle, comments on the collaboration:

“For Yle it is of utmost importance that in the years to come we will have readily available access to language models that are based on our languages and that truly reflect our local culture. We are more than happy to be part of the exploration on how public service media companies around Europe can participate in the development of trustworthy AI technologies, including language models, that take the rich diversity of languages and cultures into consideration.”

The TurkuNLP research group’s extensive experience in NLP and LLMs aligns with Silo AI’s and SiloGen’s commitment to contribute to world-class research on generative AI. This alignment, combined with the resources of the consortium, provides a robust foundation for redefining the boundaries of what is possible in the world of open source language models. Together with the LLM development platform, this opens a unique path for companies to create value using independent, trusted and secure base models, with the possibility to fine-tune, instruct and control LLMs for domain-specific needs.

About

Silo AI

Silo AI is a leading AI lab on a joint mission with AMD to shape the future of AI computing. We’re a trusted AI partner that brings competitive advantage to leadership AI solutions. We build AI to enable smart devices, autonomous vehicles, industry 4.0, and smart cities. Silo AI trains state-of-the-art open source AI models, and offers customers unique access to world-class AI capabilities and the SiloGen platform. With advanced compute, a full-stack AI platform and world-leading AI scientists, our approach empowers organizations to develop AI that they own and control.

www.silo.ai

SiloGen

SiloGen is a large-scale initiative with the aim of building generative AI technology for Europe’s digital sovereignty. As Silo AI’s generative AI arm, SiloGen combines some of Europe’s leading generative AI and large language model (LLM) experts with access to data sources, powerful computational resources and infrastructure to train, run and operate LLMs. SiloGen has been operational since late 2022 and is currently working with clients like Allianz, Happeo, Sandvik and Tietoevry. As a trusted provider, SiloGen offers base and specialized models as well as a development platform to ensure accurate, trustworthy and robust downstream applications.

University of Turku

The University of Turku is an inspiring and international academic community of 25,000 students and staff in Southwest Finland. We build a sustainable future with multidisciplinary research, education, and collaboration. The University of Turku was ranked among the 301–400 best universities in the 2023 Academic Ranking of World Universities, or the so-called Shanghai Ranking. Among Finnish universities, the University of Turku ranked in a shared position of 2nd-3rd.

www.utu.fi

Want to discuss how Silo AI could help your organization?

Get in touch with our AI experts.

Peter Sarlin, PhD

Co-founder

peter.sarlin@silo.ai
