In a milieu of evolving social norms, well-aligned agent groups thrive and propagate, while those inadequately aligned dwindle and are supplanted.
The emergence of large language models (LLMs) has revolutionized the paradigm of artificial intelligence research and spurred a vast array of novel scenarios. Applications of LLMs, exemplified by agents, can further compensate for and enhance the capabilities of LLMs by integrating various augmenting components or tools. Concurrently, the abilities of intelligent agents that use LLMs as their decision-making core improve as the capabilities of LLMs grow. Once the complexity of the tasks an agent can perform exceeds the level of human oversight, designing effective agent alignment methods becomes crucial for AI safety. Furthermore, agents can alter the real physical world through their interactions with actual society; if these systems are not well regulated, they could pose a series of societal risks.
The prevailing methods for aligning LLMs primarily rely on Reinforcement Learning from Human Feedback (RLHF) or from AI Feedback (RLAIF). These approaches aim to align LLMs with predefined, static human values in order to reduce harmful outputs. In essence, however, they align the model with the values encoded in pre-selected data, and such alignment may be circumvented when the model is confronted with complex social contexts. Moreover, a more appropriate and advanced alignment objective might be societal values, which are not static but evolve over time.
Unlike LLM alignment, agent alignment requires greater consideration of environmental factors, because agents can interact with the environment and modify their behavior based on feedback.
Therefore, we propose an agent alignment method under evolving social norms. Rather than correcting model errors through supervised fine-tuning or RLHF, we reframe agent alignment as a survival-of-the-fittest process in a multi-agent society.
The main contents can be summarized as follows:
LLM alignment aims to bridge the gap between the next-word-prediction objective and human-defined values such as helpfulness, harmlessness, and honesty. Formally, human preferences or values are represented as the alignment target, and the agent is defined as an LLM-driven decision maker that interacts with its environment; the complete notation and definitions are given in the paper.
Current research on agents primarily focuses on endowing them with enhanced capabilities or the ability to perform a broader range of tasks, including how agents can self-improve. As agents' abilities continue to advance, research on agent supervision and alignment becomes increasingly important. We introduce EvolutionaryAgent, a framework for agent evolution and alignment in dynamic environments. We first formalize dynamic environments, following the notational conventions of prior work, and explain how these environments are constructed. We then define the agent and how its behavior in these dynamically changing environments is evaluated for adherence to social norms.
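As a rough illustration of the setup (not the paper's formal notation), a dynamic environment can be thought of as a time-indexed state that carries the social norms of the current generation. The Python sketch below is ours for illustration; the class name `EvolvingEnvironment` and its fields are assumptions, not the paper's definitions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class EvolvingEnvironment:
    """Minimal sketch of a dynamic environment whose social norms change over generations."""
    time_step: int = 0
    social_norms: str = ""                              # textual description of the current norms
    history: List[Tuple[int, str]] = field(default_factory=list)

    def advance(self, new_norms: str) -> None:
        """Archive the current norms and move to the next generation with updated norms."""
        self.history.append((self.time_step, self.social_norms))
        self.time_step += 1
        self.social_norms = new_norms
```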
We endeavor to simulate agents' characteristics and behavioral patterns in a real-world scenario. Our evaluation of these agents hinges on their behavioral trajectories and adherence to social norms, particularly in environments where such norms are subject to dynamic changes. Consequently, agents within the society are endowed with various personified attributes, encompassing personality, profession, and core values. Moreover, agents possess a fundamental memory function, serving as a repository for recording their actions, understanding the world, and receiving social feedback.
To elucidate, we establish a small-scale society termed \textit{EvolvingSociety}, wherein we define the environment, the agents, and the evolving social norms. Agents are characterized by distinct personas covering personality, profession, and core values. Agents spontaneously interact with the environment or with other agents within it; at each time step, an agent acts according to its persona and memory and records the resulting observations and social feedback.
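To make the agent abstraction concrete, a minimal sketch (in Python, with illustrative names rather than the paper's implementation) can represent each agent by its personified attributes plus a simple memory of actions and social feedback:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentProfile:
    """Personified attributes of an agent in EvolvingSociety (illustrative field names)."""
    name: str
    personality: str
    profession: str
    core_values: str

@dataclass
class Agent:
    """An LLM-driven agent with a simple memory of actions, observations, and social feedback."""
    profile: AgentProfile
    memory: List[str] = field(default_factory=list)
    fitness: float = 0.0

    def remember(self, event: str) -> None:
        """Store an action, observation, or piece of social feedback."""
        self.memory.append(event)

    def recent_memory(self, k: int = 5) -> List[str]:
        """Return the k most recent memory entries for prompting the underlying LLM."""
        return self.memory[-k:]
```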
We draw on the theory of Basic Values to characterize agents' core values; the formal definition is given in the paper.
The fitness of an agent reflects its alignment with the current social norms, which determines whether the agent can continue to exist in the current society. Agents with higher fitness are perceived as having an advantage in the evolutionary game; they survive into the next generation and have a higher probability of producing offspring agents. Conversely, agents with lower fitness rankings are likely to be outcompeted by the offspring of more dominant agents. To begin with, we calculate the fitness values of all agents; the top-ranked agents by fitness are then retained as parents for the next generation.
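A minimal sketch of this fitness-and-selection step, assuming the LLM observer is abstracted as a scoring callable; the surviving fraction of 0.2 below is an illustrative placeholder, not the paper's setting:

```python
from typing import Callable, List

def evaluate_fitness(agents: List["Agent"], norms: str,
                     score_fn: Callable[["Agent", str], float]) -> None:
    """Score every agent's behavioral trajectory against the current social norms.

    `score_fn` stands in for the LLM observer (e.g., GPT-3.5-Turbo) that rates how well
    an agent's recorded behavior adheres to `norms`; the concrete prompt is not shown here.
    """
    for agent in agents:
        agent.fitness = score_fn(agent, norms)

def select_parents(agents: List["Agent"], top_frac: float = 0.2) -> List["Agent"]:
    """Keep the top fraction of agents by fitness as parents for the next generation."""
    ranked = sorted(agents, key=lambda a: a.fitness, reverse=True)
    n_parents = max(1, int(len(ranked) * top_frac))
    return ranked[:n_parents]
```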
Further, the evolutionary process of organisms often involves various mutation behaviors. Mutation enables agents to produce offspring that are more likely to align with current social norms during reproduction. Therefore, in the mutation phase, the offspring's persona, career, and worldview may each mutate with a preset probability (the mutation rate).
The offspring produced by socially well-adapted agents are integrated into society, replacing the lowest-ranked agents by fitness.
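The reproduction step can then be sketched as follows, reusing the `Agent` class from the sketch above; `mutate_field` stands in for an LLM call that rewrites an attribute, and the `mutation_rate` and `bottom_frac` defaults are illustrative assumptions rather than the paper's hyperparameters:

```python
import copy
import random
from typing import Callable, List

def reproduce(parent: "Agent", mutation_rate: float,
              mutate_field: Callable[[str, str], str]) -> "Agent":
    """Create an offspring that inherits the parent's attributes and may mutate."""
    child = copy.deepcopy(parent)        # offspring inherits profile and memory (an illustrative choice)
    child.fitness = 0.0
    for attr in ("personality", "profession", "core_values"):
        if random.random() < mutation_rate:
            old_value = getattr(child.profile, attr)
            setattr(child.profile, attr, mutate_field(attr, old_value))
    return child

def replace_bottom(agents: List["Agent"], offspring: List["Agent"],
                   bottom_frac: float = 0.2) -> List["Agent"]:
    """Replace the lowest-fitness agents with newly produced offspring, keeping the population size fixed."""
    ranked = sorted(agents, key=lambda a: a.fitness, reverse=True)
    n_replace = min(len(offspring), int(len(ranked) * bottom_frac))
    survivors = ranked[: len(ranked) - n_replace]
    return survivors + offspring[:n_replace]
```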
Social norms are behavioral regularities grounded in shared societal beliefs; they typically emerge from a bottom-up process and evolve through trial and error. In the context of agent alignment, we neither allow the evolution of social norms to be disorderly or random nor intervene excessively in every step of their evolution. Instead, we provide directional guidance for the evolution of these norms: we define only the initial social norms and the direction in which they should evolve, and the norms of later generations then form from the behavioral trajectories of well-adapted agents, as sketched below.
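A hedged sketch of this directed norm evolution, where the observer LLM is abstracted as `llm_summarize` and the prompt wording is ours for illustration; only the initial norms and the evolution direction are supplied up front:

```python
from typing import Callable, List

def evolve_norms(current_norms: str, direction: str, top_agents: List["Agent"],
                 llm_summarize: Callable[[str], str]) -> str:
    """Derive the next generation's social norms from the behavior of high-fitness agents."""
    trajectories = "\n".join(
        f"- {event}" for agent in top_agents for event in agent.recent_memory()
    )
    prompt = (
        f"Current social norms:\n{current_norms}\n\n"
        f"Intended direction of evolution: {direction}\n\n"
        f"Representative behavior of well-adapted agents:\n{trajectories}\n\n"
        "Write the social norms for the next generation."
    )
    return llm_summarize(prompt)
```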
Through this survival-of-the-fittest evolution strategy, agents better aligned with social norms are preserved round after round of iteration.
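Putting the pieces together, one generation of this loop might look like the following sketch, which simply composes the helper functions sketched above; all default hyperparameters here are illustrative placeholders rather than the paper's settings:

```python
from typing import List, Tuple

def run_generation(agents: List["Agent"], norms: str, direction: str,
                   score_fn, mutate_field, llm_summarize,
                   mutation_rate: float = 0.1,
                   top_frac: float = 0.2,
                   bottom_frac: float = 0.2) -> Tuple[List["Agent"], str]:
    """One generation of the survival-of-the-fittest loop, composed from the sketches above."""
    evaluate_fitness(agents, norms, score_fn)                 # observer scores adherence to norms
    parents = select_parents(agents, top_frac)                # fittest agents survive and reproduce
    offspring = [reproduce(p, mutation_rate, mutate_field) for p in parents]
    agents = replace_bottom(agents, offspring, bottom_frac)   # offspring replace the least fit
    norms = evolve_norms(norms, direction, parents, llm_summarize)  # norms drift bottom-up
    return agents, norms
```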
For more, please refer to the paper.
We investigated the EvolutionaryAgent's performance with diverse models. For closed-source models, we tested GPT-3.5-turbo-1106, GPT-3.5-turbo-instruct, and Gemini-Pro as the foundation models for the agent. For open-source models, we utilized Vicuna-7B-v1.3, Mistral-7B-Instruct-v0.2, and Llama2-7B-Chat. Existing work demonstrates that powerful LLMs can serve as effective evaluators. Accordingly, we primarily utilized GPT-4 and GPT-3.5-Turbo as observer models, while also examining the efficacy of various other LLMs in this role. Unless otherwise specified, GPT-3.5-Turbo is the default evaluator, owing to its balance of efficiency, performance, and cost-effectiveness.
We employed six distinct LLMs, both open-source and closed-source, as the foundation model for the EvolutionaryAgent to validate its efficacy, as illustrated in Figure 3. The green lines represent the direct application of these LLMs to the evaluation questionnaires under the social norms of the current generation; these models neither incorporate previous generations' information into their memory nor utilize environmental feedback signals for iterative improvement.
EvolutionaryAgent outperforms other methods when adapting to changing social norms. Comparing approaches, ReAct suffers a drop in fitness at the first timestep of each new generation, indicating a decline in adaptability to evolving social norms: its fitness values at timesteps 2010, 2020, and 2030 trend downward compared with those in 2008, 2018, and 2028. Although ReAct can gradually adapt to the environment in subsequent years by treating observations as environmental feedback, each shift in era-specific norms affects it significantly. Thanks to its self-reflection mechanism, Reflexion adapts to the current static environment within a single generation better than ReAct. However, when norms evolve, Reflexion still suffers a rapid decline in fitness because its memory retains content from the previous generation, and these memories influence its actions in the following generation. In contrast, EvolutionaryAgent maintains relatively stable adaptation to the current era amid normative changes. This stability arises because, although individuals in EvolutionaryAgent also remember content from previous eras, the population is likely to contain some agents whose strategies are already well adapted to the social norms of the next generation.
EvolutionaryAgent is robust to model variations. When different LLMs serve as the foundation for agents, the EvolutionaryAgent supported by GPT-3.5-Turbo and Gemini-Pro not only maintains its fitness as the environment changes but further improves its adaptability. Our case analysis in Sec. \ref{sec: feedback_case} shows that these models both provide better environmental feedback and use that feedback more effectively to adapt to the current environment. Among the three open-source models, the agent based on Mistral performs best, indicating that a more capable foundation model is also better able to leverage environmental feedback for self-improvement.
To explore whether the EvolutionaryAgent can remain competent at specific downstream tasks while aligning with evolving social norms, we evaluated the agent's performance on several downstream tasks alongside its alignment with social norms. The agent was assessed on Vicuna Eval, Self-instruct Eval, and Lima Eval. To save costs, only 50 samples from each dataset were tested. The evaluation protocol on these three test sets follows MT-Bench, but the maximum score is scaled down to 7 to match the range of the alignment score. The results on the three downstream task datasets, shown in Figure 4, indicate that as the EvolutionaryAgent's alignment score continually increased, its scores on specific downstream tasks also improved. This suggests that the EvolutionaryAgent can align with social norms while still performing downstream tasks well.
We investigated the impact of scaling effects in LLMs on the EvolutionaryAgent. We selected open-source models at three different parameter scales: Llama2-7B, 13B, and 70B, along with GPT-3.5-Turbo. As evident from Figure 5 (a), there is a relative increase in the fitness values of the EvolutionaryAgent across different generations with an increase in model parameters or performance enhancement. This is attributed to higher-performing baseline models having a more comprehensive understanding of current social norms and making more advantageous decisions and statements for their development.
Given the distinct preferences and varying strengths of different LLMs, we further explored the performance of our method when using various LLMs or human observers. As observed in Figure 5 (b), the range of fitness scores generated by the EvolutionaryAgent shows significant variation when assessed by different LLMs. Notably, Gemini-Pro and Claude-2.1 yielded the most similar evaluation scores, with GPT-4 being the most conservative in its scoring. Additionally, GPT-4 demonstrated greater internal consistency in scoring within each generation, while Gemini-Pro and Claude-2.1 exhibited the greatest variability across different generations, and GPT-3.5-Turbo showed moderate variation. Regarding alignment with human preferences, GPT-4 closely matched human evaluation scores, suggesting it remains the optimal choice as an evaluator when cost is not a consideration.
Furthermore, to explore the impact of different operators in EvolutionaryAgent, we vary the population size and mutation rate of the agents. A larger number of agents implies greater diversity in society. As shown in Figure 6(a), EvolutionaryAgent consistently demonstrates strong adaptability to a changing society across varying agent counts, and with more agents it tends to achieve better outcomes at each time step: a larger population increases the likelihood that the community contains agents able to adapt to changing times. We further analyze the impact of different mutation rates on overall performance in Figure 6(b).
The career evolution of agents during the evolutionary process is illustrated in Figure 7. The base model for these agents is GPT-3.5-Turbo, and the observer is GPT-4; the mutation rate is set as described in the paper.
Upon setting the initial social norms and their evolutionary directions, social norms gradually form and evolve based on the behavioral trajectories of agents with higher fitness. Figure 8 illustrates three distinct scenarios of social norm evolution and the corresponding fitness levels of agents in these societies. Regardless of the trajectory along which social norms evolve or the aspects they initially emphasize, the EvolutionaryAgent can adapt to the changing social environment. Furthermore, the fitness levels under different evolutionary trajectories indicate that different social norms pose differing degrees of alignment difficulty for agents.
Many thanks to Dr. Sun Tianxiang for his constructive suggestions and feedback, which made this work more complete.
Many thanks to my mentor, Prof. Qiu Xipeng, for all the help and support, both financial and psychological.