Unveiling Hermes 3: The First Full-Parameter Fine-Tuned Llama 3.1 405B Model is on Lambda’s Cloud

August 15, 2024 • 3 min read

Try Hermes 3 for free with the New Lambda Chat Completions API and Lambda Chat.

Introducing Hermes 3: A new era for Llama fine-tuning

We are thrilled to announce our partner Nous Research’s launch of Hermes 3 —the first full-parameter fine-tune of Meta's groundbreaking Llama 3.1 405B model, trained on Lambda’s 1-Click Cluster. Designed for the open-source community, Hermes 3 is a neutrally-aligned generalist model with exceptional reasoning capabilities, now available for free through the new Lambda Chat Completions API and Lambda Chat interface.

Powered by an 8-node Lambda 1-Click Cluster, Nous Research achieved outstanding results in just a few short weeks. Hermes 3 meets or exceeds Llama 3.1 Instruct on Open Source LLM benchmarks (see table below).

"Lambda’s 1-Click Clusters make the experience of renting and using a multi-node cluster as simple and easy as renting and using a single node,"

-Jeffrey Quesnelle, co-founder of Nous Research

Hermes 3: A uniquely unlocked, uncensored, and steerable model

Hermes 3 is the latest advancement in Nous Research's series of models, which have been downloaded over 33 million times. This instruct-tuned model is specifically designed to be flexible and adept at following instructions. It excels in complex role-playing and creative writing, offering users more immersive character portrayals, deeper simulations, and unexpected fictional experiences.

Hermes 3 benchmarks

In addition to its creative capabilities, Hermes 3 is an invaluable tool for professionals requiring advanced reasoning and decision-making abilities. Its strategic planning and operational decision-making features include function-calling, step-labeled reasoning, and more.

Optimized for efficiency

Hermes 3 was meticulously trained using synthesized data and supervised fine-tuning on Meta’s Llama 3.1 405B base model. This was followed by reinforcement learning from human feedback (RLHF) and finally, quantization using Neural Magic’s FP8 method.

This optimization effectively reduces the model's VRAM and disk requirements by approximately 50%, allowing it to run on a single node.

“Since the start of my journey in AI I wanted to bring about the realization of an open source frontier level model that aligns to you, the user - not some corporation or higher authority before the user. Today, with Hermes 3 405B, we've achieved that goal, a model that is frontier level, but truly aligned to you.

Thanks to our hard work on data synthesis and post training research, we were able to make a dataset that is fully synthetic over almost a year in the making to train Hermes 3 - and will be releasing much more to come.”

-Teknium, cofounder of Nous Research

For those seeking dedicated access and flexibility, Hermes 3 can run on a single node (available on-demand on Lambda’s Cloud), or quickly scale to a multi-node 1-Click Cluster for further fine-tuning using Lambda's scalable cluster infrastructure.

Try Hermes 3 for free - for a limited time!

We’re excited to offer the AI/ML community free access to Hermes 3 through Lambda’s new Chat Completions API, fully compatible with the OpenAI API. It provides endpoints for creating completions, chat completions and listing models.

No complex setup is required—simply generate a Cloud API key from Lambda’s dashboard (sign-up) and start exploring with our documentation’s help.

For a more interactive experience, we’re also providing a simple chat interface: try your prompts in Lambda Chat!