OpenMOSS is an open research platform dedicated to making Large Language Models Helpful, Safe, and Interpretable.


We are committed to open-sourcing our work to the community.


AI Assistant MOSS

From Alignment to Super Alignment

Alignment is a key phase in the training of LLMs. The OpenMOSS team researches alignment methods to build helpful, harmless, and truthful AI assistants. Through weak-to-strong alignment, Self-Evolution, and other approaches, OpenMOSS aims to move toward super alignment.

Mechanistic Interpretability

We believe there is a silver lining: neural networks, especially Transformer models, can be mechanistically understood to an unprecedented extent. Humanity's understanding of neural networks is a spectrum. We may not reach the end, but we can always move forward.

Interpretable circuits connect interpretable features in the hidden states of neural networks and reveal the intriguing inner world of such alien intelligence.
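As a rough illustration of the feature idea above, the sketch below decomposes a hidden state into sparse, non-negative feature activations with a ReLU encoder and reconstructs it with a linear decoder, in the style of dictionary learning. All names, shapes, and the random weights are hypothetical, purely for illustration; this is not OpenMOSS's actual implementation.

```python
import numpy as np

# Hypothetical sizes: model hidden dimension and an overcomplete feature dictionary.
d_model, d_features = 8, 32

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(d_model, d_features))  # encoder weights (illustrative, random)
W_dec = rng.normal(size=(d_features, d_model))  # decoder weights (illustrative, random)
b_enc = np.zeros(d_features)

def encode(h):
    # ReLU encoder: maps a hidden state to sparse, non-negative feature activations.
    return np.maximum(h @ W_enc + b_enc, 0.0)

def decode(f):
    # Linear decoder: reconstructs the hidden state from feature activations.
    return f @ W_dec

h = rng.normal(size=d_model)   # stand-in for one hidden state from a Transformer
f = encode(h)                  # sparse feature vector, shape (d_features,)
h_hat = decode(f)              # reconstruction, shape (d_model,)
```

In practice such encoders and decoders are trained so that reconstructions are faithful while activations stay sparse; the learned features, and the circuits connecting them across layers, are what interpretability work then tries to read.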

Real Agent, Multimodal Agent

We are developing AI agents equipped with multimodal perception and generation capabilities. These agents can perceive through diverse channels, including vision, language, sound, and touch, enabling them to understand and interact with real environments.