Viking 7B: The first open LLM for the Nordic languages

Last updated: 15.5.2024

Together with the University of Turku’s research group TurkuNLP and HPLT, Silo AI, Europe’s largest private AI lab, is releasing the first multilingual large language model (LLM) for all Nordic languages. Viking 7B is a best-in-class open source model that is sensitive to local values and cultures, and it provides further evidence of the team’s novel approach to training capable LLMs for low-resource languages. It is a significant milestone on the journey towards a state-of-the-art LLM family for all European languages.

Following the completion of the language model Poro and the first checkpoint release of Viking, Silo AI and TurkuNLP at the University of Turku are now releasing the full 7B parameter version of Viking. At the same time, we are also releasing further checkpoints for the Viking 13B and Viking 33B models. In addition to the Nordic languages, Viking also covers English and programming languages. Evaluations indicate best-in-class performance in all Nordic languages, without compromising performance in English.

Viking relies on the same training approach as Poro, focusing on low-resource languages without compromising English, but extends it to cover Danish, Finnish, Norwegian, Icelandic, Swedish and programming languages. The model family also comes with an updated architecture and in a variety of model sizes.

Building on Silo AI's strategy to promote democratic access to LLMs and linguistic diversity across Europe, the collaboration with TurkuNLP utilizes the latest advancements in multilingual LLMs. Unlike most other LLM efforts, we prioritize performance in low-resource languages rather than treating these languages as an afterthought. The team has worked on determining the optimal training approaches and architectures to support this. This covers the model architecture for pre-training as well as approaches to training and data sampling, such as how often data in low-resource languages is reused during training and the incorporation of translated paired texts between high- and low-resource languages. Several of these strategies rely on a cross-lingual signal to enhance the model's understanding of the connections between languages, which has proven crucial in achieving superior performance for low-resource languages without compromising performance in English.
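To make the idea of data reuse frequencies concrete, here is a minimal sketch of how repeating low-resource data changes each language's share of the training mix. The token counts, the repeat factors and the function names are hypothetical and chosen only for illustration; they are not the actual Viking data mix or sampling code.

```python
def effective_tokens(unique_tokens, reuse_factor):
    """Tokens each language contributes once its data is repeated.

    unique_tokens: unique tokens available per language (hypothetical values).
    reuse_factor: how many times a language's data is seen during training;
                  languages not listed are seen once.
    """
    return {
        lang: count * reuse_factor.get(lang, 1)
        for lang, count in unique_tokens.items()
    }


def sampling_weights(unique_tokens, reuse_factor):
    """Share of the training mix per language after applying data reuse."""
    effective = effective_tokens(unique_tokens, reuse_factor)
    total = sum(effective.values())
    return {lang: tokens / total for lang, tokens in effective.items()}


# Hypothetical unique token counts, in billions of tokens.
unique = {"en": 500, "fi": 30, "is": 5}
# Hypothetical reuse: the low-resource languages are repeated four times.
reuse = {"fi": 4, "is": 4}

without_reuse = sampling_weights(unique, {})
with_reuse = sampling_weights(unique, reuse)

for lang in unique:
    print(f"{lang}: {without_reuse[lang]:.3f} -> {with_reuse[lang]:.3f}")
```

In this toy setup, repeating the smaller corpora raises their share of the mix while English remains the largest component. How many repetitions are useful before returns diminish is an empirical question that the sketch does not address.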

Silo AI and TurkuNLP are dedicated to developing models that not only excel in linguistic performance and inclusivity but are also attuned to local values and cultures. Such sensitivity ensures that these technological advancements serve as connectors, rather than dividers, in digital communication. It enhances Europe’s digital infrastructure, thereby accelerating the adoption of LLM-driven products and applications. This, in turn, fosters innovation across sectors and use cases throughout Europe, bolstering the continent's technological ecosystem.

Further emphasizing digital sovereignty, Viking is trained on the EuroHPC supercomputer LUMI, utilizing up to 4096 AMD MI250X GPUs. LUMI is not only Europe’s most powerful supercomputer and the 5th most powerful in the world, but also the 3rd greenest among the top 500 supercomputers. LUMI’s energy consumption is covered entirely by hydroelectric power, and its waste heat will account for about 20 percent of the district heating in the surrounding city of Kajaani.

With a purpose-built software layer to train models on AMD, Silo AI and TurkuNLP possess unmatched experience with training on AMD at scale, having shown that their theoretical predictions for throughput scaling materialize in weak and strong scaling experiments. As one of the seminal initiatives on AMD GPUs, this work shows that it is possible to achieve good throughput on the AMD-based LUMI when training the models with the team’s open source training framework on up to 4096 MI250X GPUs simultaneously.

Viking 7B completed and checkpoint performance

Today, training stands at 100% for Viking 7B, 85% for Viking 13B and 65% for Viking 33B. On common benchmarks, we observe evidence that Viking outperforms other open models (e.g. Falcon, GPT-SW3, Llama, Mistral and MPT). Results indicate best-in-class performance in low-resource languages vis-à-vis other open models, without compromising performance in English and programming languages. In our latest evaluations, Viking is benchmarked on a large number of relevant measures, including translated tests, MMLU, ARC-C and HellaSwag. While translated tests are commonly used (e.g. to demonstrate the multilinguality of Mistral Large) and provide indicative evidence, they don't fully capture the multilingual reasoning capabilities of language models. Another measure, perplexity, further corroborates Viking’s performance. Overall, the benchmark results indicate that Viking understands and generates the Nordic languages well, and the perplexity results indicate that it predicts text in these languages efficiently. Together, these findings support the viability of this approach to training multilingual models.
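For readers unfamiliar with perplexity, it is the exponential of the average negative log-likelihood that a model assigns to each token of a held-out text, so lower is better. The sketch below shows only this standard definition; the function name and the input values are illustrative and do not come from the Viking evaluation code.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Computes exp(-(1/N) * sum(log p_i)) over the N tokens of a text.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Illustrative input: a model that assigns probability 0.25 to every token
# is as uncertain as a uniform choice among four options.
uniform_guess = [math.log(0.25)] * 8
print(perplexity(uniform_guess))  # approximately 4.0
```

Because the value depends on the tokenizer, perplexities are only directly comparable between models that segment the text in the same way, or after normalizing per character or per word.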

Viking 7B/13B/33B: A modern architecture with more languages

Below is a summary of the key features of the Viking model family, which covers English, Finnish, Swedish, Norwegian, Danish, Icelandic and code. For transparency with respect to model architecture, data and other technical information, please refer to the official model cards (Viking 7B, Viking 13B, Viking 33B).

Research Checkpoints: Silo AI and TurkuNLP are committed to publishing checkpoints throughout the training process, providing transparency on the model training process.
Model architecture: Viking uses an architecture similar to Llama 2, with flash attention, rotary embeddings and grouped query attention, and supports a 4k sequence length.
Model sizes: 7B, 13B and 33B parameters
Multilingual capabilities: The models are designed to process English and Nordic languages, and have proficiency with a variety of programming languages. Additionally, they can perform basic translation between English and Nordic languages.
Dataset: The model family is trained with a dataset of 2 trillion tokens, including Danish, English, Finnish, Icelandic, Norwegian, Swedish and a variety of programming languages.
Open source: The model family is freely available under the Apache 2.0 License, which permits both commercial and research use.
Training hardware: Our models are trained on the LUMI supercomputer in Finland, using up to 4096 AMD MI250X GPUs.
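As an illustration of the grouped query attention mentioned in the list above, the following sketch shows how several query heads share one key/value head and what that means for the size of the key/value cache. The head counts, layer count and head dimension are hypothetical placeholders, not the Viking configuration (see the model cards for the real values); only the 4k sequence length is taken from the list.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Index of the key/value head that a given query head attends with."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    if not 0 <= q_head < n_q_heads:
        raise IndexError("q_head out of range")
    group_size = n_q_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_elements(seq_len, n_layers, n_kv_heads, head_dim):
    """Number of cached values for one sequence (keys plus values)."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim


# Hypothetical configuration, for illustration only.
N_Q_HEADS, N_KV_HEADS, N_LAYERS, HEAD_DIM = 32, 8, 32, 128
SEQ_LEN = 4096  # the 4k sequence length from the feature list

mapping = [kv_head_for_query_head(h, N_Q_HEADS, N_KV_HEADS) for h in range(8)]
print(mapping)  # first eight query heads map to KV heads 0 and 1

grouped = kv_cache_elements(SEQ_LEN, N_LAYERS, N_KV_HEADS, HEAD_DIM)
full = kv_cache_elements(SEQ_LEN, N_LAYERS, N_Q_HEADS, HEAD_DIM)
print(full // grouped)  # cache reduction factor versus one KV head per query head
```

With these placeholder numbers, four query heads share each key/value head, so the cache is a quarter of the size it would be with one key/value head per query head.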

Considerations for Use

The intended audience for Viking Research Checkpoints is academic and industry research. These checkpoints are not suitable for deployment in a production use case without further training, fine-tuning and testing. For more on Silo AI's SaaS-based custom LLMs, we invite you to familiarize yourself with the SiloGen platform.

Acknowledgments

We wish to thank the operators of the LUMI/EuroHPC supercomputer for computational resources and technical support, including AMD, HPE and CSC – the IT Center for Science, Finland. TurkuNLP researchers have received funding from the European Union’s Horizon Europe research and innovation programme High Performance Language Technologies (HPLT) under grant agreement No 101070350.


About

Silo AI

Silo AI is a leading AI lab on a joint mission with AMD to shape the future of AI computing. We’re a trusted AI partner that brings competitive advantage to leadership AI solutions. We build AI to enable smart devices, autonomous vehicles, industry 4.0, and smart cities. Silo AI trains state-of-the-art open source AI models, and offers customers unique access to world-class AI capabilities and the SiloGen platform. With advanced compute, a full-stack AI platform and world-leading AI scientists, our approach empowers organizations to develop AI that they own and control.

www.silo.ai

TurkuNLP

The TurkuNLP Group is a group of researchers at the University of Turku, with a research focus on various aspects of natural language processing, language technology and digital linguistics. TurkuNLP has contributed to a large number of open source NLP resources, such as FinBERT, WikiBERT, FinGPT, Turku Dependency Treebank, Universal Dependencies, Turku Neural Parsing Pipeline, Large internet corpora, Turku Paraphrase Corpus, Turku Sentiment Corpus, Wikidata normalization and TurkuONE. The University of Turku is an international academic community of 25,000 students and staff and was ranked among the 301–400 best universities in the 2023 Shanghai Ranking.

