A Critical Friend

MICHIEL BAKKER, TANTUM COLLINS, ALLISON DUETTMANN, SHU YANG LIN, ARIEL PROCACCIA, CHRIS SUMMERFIELD, DEGER TURAN, THIJS TURREL

Institutional Problem or Research Question

Describe what the open institutional problem or research question you’ve identified is, what features make it challenging, and how people deal with it currently.

Despite the high volume of reliable information and thoughtful analysis available for free on the Internet, contemporary political discourse and epistemic rigor on political issues remain, on average, poor. Failures of social media, including the propagation of fake news, and rising political partisanship are evidence of these challenges. In the right settings, however, people can be receptive to updating their views based on accurate information presented in a careful, personalized manner. A ‘critical friend’, someone whom their counterparts trust and see as knowledgeable, often performs this role by participating in a respectful discussion that encourages others to reconsider their opinions. However, such discussions take time, not everyone has a ‘critical friend’, and even the most informed and well-intentioned individuals will struggle to canvass the full range of political topics in a fair, comprehensive way. A service that provides this kind of engagement at scale, with an interface appealing enough to attract and sustain widespread use, could improve the accuracy of individual and collective opinions and decision-making and also reduce political friction.

Possible Solution

Describe what your proposed solution is and how it makes use of AI. If there’s a hypothesis you’re testing, what is it? What makes this approach particularly tractable? How would you implement your solution?

Large Language Models (LLMs) are well-suited to this task, since they can hold extended, free-form conversations grounded in deep familiarity with relevant facts and literature. 1

We have constructed an app through which people can improve the rigor of their thinking on complex topics via discussion with an LLM. The LLM offers (ideally) accurate, well-formed, persuasive and respectful depictions of opposing viewpoints in order to guide the user towards a firmer understanding of alternative perspectives.

On the back end, the system uses existing LLMs (currently GPT 3.5/4, with possible expansion to Claude soon) that we have supplied with prompts emphasizing the goals of and guardrails for engagement. Anecdotally, we have already found this useful for grappling with complicated topics. The appendix to this document includes our current prompt as well as some sample conversations.
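To make this concrete, here is a minimal sketch of how a single conversational turn could be routed through a hosted model behind a system prompt that encodes the goals and guardrails. The prompt text, function name, and model choice are illustrative placeholders, not our actual configuration (our current prompt appears in the appendix).

```python
# Minimal sketch of a single back-end turn, assuming the OpenAI Python SDK (v1+).
# The prompt below is a placeholder; the project's actual prompt is in the appendix.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CRITICAL_FRIEND_PROMPT = (
    "You are a 'critical friend': a respectful, well-informed interlocutor. "
    "Present the strongest good-faith version of viewpoints that oppose the "
    "user's stated position, acknowledge uncertainty, avoid fabricating facts, "
    "and never ridicule the user or the views under discussion."
)

def critical_friend_reply(history: list[dict]) -> str:
    """Return the model's next turn given prior user/assistant messages."""
    response = client.chat.completions.create(
        model="gpt-4",  # GPT-3.5/4 today; other providers could be swapped in
        messages=[{"role": "system", "content": CRITICAL_FRIEND_PROMPT}, *history],
        temperature=0.7,
    )
    return response.choices[0].message.content

print(critical_friend_reply(
    [{"role": "user", "content": "Climate policy is pointless; nothing we do will matter."}]
))
```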

A more sophisticated approach could involve fine-tuning the model based on some of the evaluation methods described below. Augmenting text outputs with additional visualizations such as discourse graphs could make arguments easier for users to parse.

This is not a novel idea. Other projects, such as DebateDevil, have used LLMs for similar ends. Media training sometimes uses LLMs to identify possible counterarguments, and the Discourse Graph community has experimented with combining graph-based visualizations and LLMs. Our project is just one of several forays in this direction.

1 Today’s LLMs occasionally struggle with separating true from false factual claims, but (a) they can still play an essential role in a truth-seeking process that also features additional diligence and (b) new advances continue to reduce these risks.

Method of Evaluation

Describe how you will know if your solution works, ideally at both a small and large scale. What resources and stakeholders would you require to implement and test your solution?

Defining criteria for a ‘good’ conversation that furthers the above goals is difficult, since productive (and counterproductive) exchanges can take many forms. We believe that surveys of system users and observers represent a good start, and we describe a set of candidate questions below (a sketch of how the pre- and post-conversation responses might be aggregated follows the observer questions). No single question will perfectly correlate with successful conversations, but in combination they capture many of the desiderata. Before pursuing widespread usage, it will be important to conduct these evaluations with a large and representative set of test users.

Questions for users about their own experience (items marked with an asterisk are asked both before and after the conversation)

Relative confidence (0-5) in propositions (e.g. ‘climate change is a serious problem for humanity’, ‘climate change is a greater problem than AI safety’)*

Relative empathy (0-5) for those who hold opposing propositions*

Degree (0-5) to which you believe you have revised your opinion

Level of comfort (0-5) in representing opposing propositions in ways that people who hold them and/or have expertise on the topic would find fair* 2

Assessment (0-5) of the quality of the LLM’s arguments (e.g. in terms of logic, factual accuracy, fairness)

Assessment (0-5) of the overall experience (boring / useful / etc)

Questions for observers examining a conversation 3

How healthy (0-5) was this dialogue?

To what extent (0-5) do you think this dialogue led the user to take the opposing arguments more seriously?

To what extent (0-5) do you think this dialogue led the user to have increased empathy for the opposing arguments?

To what extent (0-5) do you think that the agent represented arguments fairly?

To what extent (0-5) did the user’s depiction of the opposing arguments become more fair and/or accurate over the course of the conversation?

To what extent (0-5) do you think that [type of person] would find these arguments compelling?

To what extent (0-5) do you expect that [type of person] would emerge from this dialogue with improved or worsened epistemic rigor on this topic?

To what extent (0-5) do you think that [type of person] would emerge from this dialogue more open to opposing views?
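As a minimal sketch of how the starred pre/post items above might be aggregated across a cohort of test users, the snippet below computes the average shift per item. The field names and data layout are illustrative assumptions rather than a fixed schema.

```python
# Illustrative aggregation of the starred (pre/post) survey items.
# Field names and data layout are assumptions, not an actual schema.
from statistics import mean

def average_shifts(responses: list[dict]) -> dict[str, float]:
    """Average pre-to-post change (on the 0-5 scale) per starred item."""
    items = ["confidence", "empathy", "opposing_comfort"]
    return {
        item: mean(r[f"{item}_post"] - r[f"{item}_pre"] for r in responses)
        for item in items
    }

cohort = [
    {"confidence_pre": 4, "confidence_post": 3,
     "empathy_pre": 1, "empathy_post": 3,
     "opposing_comfort_pre": 2, "opposing_comfort_post": 4},
    {"confidence_pre": 5, "confidence_post": 5,
     "empathy_pre": 2, "empathy_post": 2,
     "opposing_comfort_pre": 3, "opposing_comfort_post": 3},
]
print(average_shifts(cohort))
# {'confidence': -0.5, 'empathy': 1.0, 'opposing_comfort': 1.0}
```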

In addition to collecting user feedback, one could also conduct studies that analyze revealed preferences before and after engagement with the model (e.g. do users diversify their news consumption?), and perform analysis in the model’s latent semantic space. For instance, one could observe how far a user’s views migrate over the course of a conversation.
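One way to operationalize that latent-space analysis, sketched below under the assumption of an off-the-shelf sentence-embedding model (here via the sentence-transformers library), is to embed the user's opening and closing statements of their position and report one minus their cosine similarity as a rough measure of how far the view has migrated.

```python
# Sketch of the latent-space view-migration measure described above.
# Library and model choice are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def view_shift(opening_statement: str, closing_statement: str) -> float:
    """Return 1 - cosine similarity: 0 means no movement, larger means more."""
    before, after = model.encode([opening_statement, closing_statement])
    return 1.0 - util.cos_sim(before, after).item()

print(view_shift(
    "Climate change is exaggerated and not worth major policy changes.",
    "Climate change is a real risk, though I still worry about policy costs.",
))
```

More refined variants could compare the user's statements against reference formulations of each position rather than only against the user's own earlier phrasing.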

Evaluations such as those listed above could help guide system design choices (e.g. choice of prompt, type of interface, etc.). This would require large-scale experimentation.

2 In addition to self-assessment, this could be examined by a third party or automated system that scores the user on how convincingly they can make cases supporting views opposite their own (i.e. an Ideological Turing Test).

3 An LLM could eventually perform these assessments. However, removing humans from the assessment loop creates some risk of harmful recursion, so this approach should not be relied on exclusively.

Risks and Additional Context

Describe (briefly), if someone is strongly opposed to your solution or if it is tried and fails, why do you think that was?

There are many possible failure modes for a project like this, including:

  1. Entrenchment and further polarization. The model may end up reinforcing perspectives rather than stimulating open-minded reconsideration.

  2. Misinformation. Systems optimizing purely for persuasiveness without effective guardrails might send users false information.

  3. Lack of user interest. Thoughtful introspection is often not as appealing as reinforcement and tribalism. A system like this will face an uphill struggle to attract and retain user interest.

  4. Limited real-world impact. People might engage with such a system for fun, but then revert to their social affiliations when it comes to making decisions in the real world.

  5. Repurposing for propaganda. Persuasiveness is far from an unequivocal good, and one could easily take the same capabilities needed to strengthen critical thinking and instead use them to convince people to agree with propositions that are untrue or not in their interest. (However, the Critical Friend project is not in a meaningful sense advancing the technical frontier – the capacity for persuasive propaganda is already latent within these models and can be fairly easily accessed.)

Next Steps

Outline the next steps of the project and a roadmap for future work. What are your biggest areas of uncertainty?

We have no immediate plans ourselves, but we encourage follow-up work, including in the following directions:

Work that relates LLMs to literature on misinformation

Further thoughts on other assessments of fairness for LLMs (e.g. akin to the formal desiderata developed in social choice)

Visualization advancements that make arguments easier to parse (see the Appendix below for an example of this)

Research on how to gamify these systems to make them more enticing to users

Integration into existing platforms such as Pol.is

Appendix

Discourse graphs

Example using ChatGPT-integrated Discourse graphs to change one’s view that AI does not pose an existential risk to humanity.

Screenshots of the Discourse graph, following the arguments in the left node:

Now, following the left node to the bottom, doubting that capabilities will be a problem for AI safety.

Now, following the left node, doubting that an uncontrolled intelligence explosion is possible.

Now, following the left node, doubting that misaligned objectives could harm humanity.
