Data Mixing Laws: Optimizing Data Mixture by Predicting Language Modeling Performance

Jiasheng Ye

How the mixture proportions of training data affect language modeling performance is quantitatively predictable. This prediction guides the tuning of data mixtures, so as to optimize pretrained model performance or avoid catastrophic forgetting in continual pretraining.

Sparse Dictionary Learning on Language Models: Infrastructure, Observations and Agenda

Xuyang Ge, Fukang Zhu, Junxuan Wang, Wentao Shu, Zhengfu He

We build a framework to systematize our research on Sparse Autoencoders (SAEs). We believe this framework will benefit the mechanistic interpretability community, and in particular help the Chinese community get into this field more easily. We also illustrate some impressive features and phenomena that GPT-2 Small exhibits.

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Jun Zhan

Based on the original GPT structure and multimodal discrete representations, AnyGPT unifies four modalities (text, voice, image, and music) and achieves any-to-any multimodal generation.

Evolutionary Agent in Evolving Social Norms

Shimin Li

In a milieu of evolving social norms, well-aligned agent groups thrive and propagate, while those inadequately aligned dwindle and are supplanted.

Can AI Assistants Know What They Don't Know?

Qinyuan Cheng

Can we enhance the truthfulness of AI assistants based on large language models in practical applications by aligning them in a way that allows them to recognize what they don't know and express this ignorance through language?

Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability

Zhengfu He

With dictionary learning, we dig into a small transformer trained on a synthetic task and find a number of human-understandable information flows inside it.