Recommender Systems for Institutions

SAFFRON HUANG

Institutional Problem or Research Question

Describe what the open institutional problem or research question you’ve identified is, what features make it challenging, and how people deal with it currently.

How do institutions aggregate information and understand data? Information flows today tend to be uncoordinated, unending, incomplete, and unprincipled. The Internet has disaggregated content distribution, which makes it hard to stay on top of the many places where important information might be broadcast, but also creates space for new curation possibilities. This generates plenty of research opportunities around aggregating distributed Internet data for institutional sense-making (e.g. employing online reviews, forum comments, news articles, or opinion pieces to capture people’s opinions more accurately). The primary tools for such curation today are social media platforms such as Twitter and search engines such as Google, which do not necessarily optimise for nuance and fidelity. (Arguably, the engagement-driven optimisation of large social media platforms has actually decreased people’s overall understanding of the world and of each other.)

Institutions with obligations to constituents, such as governments, need to understand community-level, rather than just individual-level characteristics. Today, polls, roundtables and a handful of other processes help these organisations hear stakeholder voices and public opinion. In the US and many other countries, there are also open comment periods on proposed legislation. All of these have drawbacks: polls capture a shallow but broad sense of opinion; roundtables only let a few people in the room; and open comment periods are laborious, requiring the active solicitation of thousands of comments, usually followed by manual sorting. Can we supplement these methods by harnessing abundant online data for institutional sense-making? A more scoped version of the same question: what would an information recommendation algorithm or system for a public institution or representatives look like? Can we design, for example, a Twitter that usefully recommends content to politicians that truly reflects the views of their constituents?

Questions that this research space would elicit include:

  1. How can one specify viable algorithmic objectives for institutional sense-making purposes? For example, what kind of objective function could capture the goal of showing the most complete understanding of a community’s views/preferences, or the understanding of those views/preferences most relevant to the organisation (e.g. to the NHS, or to a Congressperson’s office)?

  2. What approaches to recommender systems are the most suitable for this purpose? For instance, content-based filtering seems potentially more appropriate than collaborative filtering.

  3. What are the input features, and would an effective system require data that social media platforms don’t already collect? (E.g. what data might provide more signal than a ‘Like’ button?)

  4. What data is it easiest or most useful to build from – could it be existing Twitter data, for example?

Possible Solution

Describe what your proposed solution is and how it makes use of AI. If there’s a hypothesis you’re testing, what is it? What makes this approach particularly tractable? How would you implement your solution?

One solution is to create a recommendation system for politicians built on top of an existing social media platform. Such a system could display both ‘top-down’ views organised around each politician’s political agenda and ‘bottom-up’ views derived by topic modelling input from constituents, surfacing areas of intense engagement and/or agreement that unite a diverse array of the community.

Let’s imagine starting with API data from a regular social media platform. We’d want two ‘layers’ of content — the ‘topics’ and the individual perspectives (e.g. a topic might be climate policy, while an individual perspective might be someone posting an update about the impact of the Inflation Reduction Act).
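The two-layer content model above could be sketched as a pair of simple data structures. This is only an illustrative skeleton under assumed names (`Topic`, `Perspective` and their fields are hypothetical, not any platform’s actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Perspective:
    """An individual post, reply, or comment from a constituent."""
    author_id: str
    text: str
    likes: int = 0

@dataclass
class Topic:
    """A cluster of perspectives around one subject, e.g. climate policy."""
    name: str
    perspectives: list = field(default_factory=list)

    def engagement(self) -> int:
        """Total likes across this topic's perspectives."""
        return sum(p.likes for p in self.perspectives)

climate = Topic("climate policy")
climate.perspectives.append(
    Perspective("u1", "Update on the local impact of the Inflation Reduction Act", likes=12)
)
```

In practice the perspective layer would be ingested from platform API data and the topic layer induced by the clustering steps described below, but a separation of this kind keeps agenda matching and summarisation decoupled from raw posts.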

First, we would define each politician’s constituency and political agenda. (These could be extracted from public information such as their campaign platform.) On the user side, we would collect data such as zip code (to match users with the right constituency, although zip codes may present privacy issues) as well as generated content (posts, replies, likes, etc.). The top-down part of the algorithm would look for posts within each constituency that are relevant to the politician’s agenda, surfacing opinion clusters for each agenda item.
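The top-down matching step could start as simply as ranking constituency posts by textual similarity to each agenda item. A minimal sketch, assuming bag-of-words cosine similarity as the relevance proxy (a real system would likely use learned embeddings instead):

```python
import re
from collections import Counter
from math import sqrt

def tokens(text: str) -> Counter:
    """Lowercased bag-of-words representation of a post."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_for_agenda(agenda_item: str, posts: list[str], top_k: int = 3) -> list[str]:
    """Return the constituency posts most similar to one agenda item."""
    q = tokens(agenda_item)
    return sorted(posts, key=lambda p: cosine(q, tokens(p)), reverse=True)[:top_k]

agenda = "expand renewable energy and climate policy"
posts = [
    "the new bus timetable is confusing",
    "we need more renewable energy investment in our town",
    "climate policy should prioritise home insulation",
]
top = rank_for_agenda(agenda, posts, top_k=2)
```

Here the two policy-related posts are surfaced and the unrelated one is filtered out; clustering the surfaced posts would then yield the per-item opinion clusters.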

The bottom-up part of the algorithm would perform topic modelling across posted and liked comments to discover what constituents are talking about organically, and to cluster them into topics. Bridging-based ranking could upweight comments that are popular across diverse viewpoints.
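The bridging-based upweighting could be sketched as follows, assuming users have already been assigned to opinion clusters (the scoring rule here — minimum per-cluster share of likes — is one illustrative choice among several, not a canonical definition):

```python
from collections import Counter

def bridging_score(liker_clusters: list[str]) -> float:
    """Score a comment by how evenly its likes span viewpoint clusters.

    A comment liked only within one cluster scores 0; a comment liked
    evenly across clusters scores highest (at most 1/num_clusters).
    """
    counts = Counter(liker_clusters)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return min(counts.values()) / total

# A comment liked mostly within one cluster vs. one liked across clusters:
partisan = bridging_score(["left"] * 9 + ["right"])
bridging = bridging_score(["left"] * 5 + ["right"] * 5)
```

Comments with high bridging scores would be promoted in the politician’s feed as candidate points of cross-community agreement.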

Platforms’ existing recommender systems can already surface popular topics, which one can assume would be useful for the politician. But many of these topics might be totally unrelated to political agendas. Feedback signals from politicians (e.g. marking insights as actionable, novel, or useful) could train ML models that bootstrap on the platforms’ existing recommender systems to predict topics of particular relevance to politicians.
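The feedback loop could be prototyped with something as simple as an online linear model over text features. This is a deliberately minimal stand-in (bag-of-words perceptron) for a model that would, in practice, take the platform recommender’s outputs as input features:

```python
import re
from collections import defaultdict

class FeedbackRelevanceModel:
    """Tiny online model trained on a politician's useful/not-useful labels."""

    def __init__(self):
        self.weights = defaultdict(float)

    @staticmethod
    def _feats(text: str) -> list[str]:
        return re.findall(r"[a-z']+", text.lower())

    def score(self, text: str) -> float:
        """Higher scores mean the text resembles previously useful items."""
        return sum(self.weights[t] for t in self._feats(text))

    def update(self, text: str, useful: bool, lr: float = 1.0) -> None:
        """Nudge word weights toward (or away from) the labelled item."""
        direction = lr if useful else -lr
        for t in self._feats(text):
            self.weights[t] += direction

model = FeedbackRelevanceModel()
model.update("housing policy town hall summary", useful=True)
model.update("celebrity gossip trending thread", useful=False)
```

After a handful of labels, the model already ranks agenda-adjacent content above generic trending content, which is the behaviour the feedback signal is meant to induce.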

Summarisation would be critical, since politicians generally care more about general sentiment than about individual comments. For instance, key statistics for each topic – how many people are posting or liking comments related to that topic – would be helpful. LLMs could support this summarisation and link back to source information.
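The key-statistics layer is straightforward to sketch. Assuming each post has been tagged with a topic, the headline numbers per topic (post count, distinct authors, total likes) could be aggregated like this — the record shape `(topic, author_id, likes)` is an assumption for illustration:

```python
def topic_stats(records):
    """Aggregate per-topic statistics from (topic, author_id, likes) records.

    Returns, for each topic: how many posts mention it, how many distinct
    people posted, and total likes -- the headline numbers a politician
    might scan before reading any individual comments.
    """
    stats = {}
    for topic, author, likes in records:
        s = stats.setdefault(topic, {"posts": 0, "authors": set(), "likes": 0})
        s["posts"] += 1
        s["authors"].add(author)
        s["likes"] += likes
    return {
        t: {"posts": s["posts"], "unique_authors": len(s["authors"]), "likes": s["likes"]}
        for t, s in stats.items()
    }

records = [
    ("housing", "u1", 4),
    ("housing", "u2", 1),
    ("housing", "u1", 2),
    ("transport", "u3", 5),
]
summary = topic_stats(records)
```

An LLM-generated narrative summary would sit on top of these statistics, with each claim linking back to the underlying posts.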

One could select different objective functions depending on context, or use a multi-objective function that proxies the goal of a more complete understanding of community views and preferences (e.g. a weighted combination of diversity of opinion, informational content, user engagement, relevance, and user trustworthiness). One possible objective function to use in this system is an information reconstruction objective, which encourages good summaries by measuring how well the input information can be predicted from the summary.

Method of Evaluation

Describe how you will know if your solution works, ideally at both a small and large scale. What resources and stakeholders would you require to implement and test your solution?

At both a small and large scale, a battery of metrics can be applied, such as click-through rates, politician feedback on how actionable/novel/useful information is, or ratings of summaries. User interviews and qualitative comparisons to existing news aggregation sources (websites, polls, etc.) could also play helpful roles.
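Two of the quantitative metrics above are trivial to compute once the logging is in place; a minimal sketch, with the field names assumed for illustration:

```python
def click_through_rate(impressions: int, clicks: int) -> float:
    """Fraction of recommended items that were actually opened."""
    return clicks / impressions if impressions else 0.0

def mean_rating(ratings: list) -> float:
    """Average of (e.g. 1-5) usefulness ratings on surfaced summaries."""
    return sum(ratings) / len(ratings) if ratings else 0.0

ctr = click_through_rate(impressions=200, clicks=30)
avg = mean_rating([4, 5, 3, 4])
```

The harder evaluation work is in the qualitative side — interviews and comparisons against polls and news aggregators — for which these metrics serve only as a sanity check.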

Risks and Additional Context

What are the biggest risks associated with this project? If someone is strongly opposed to your solution or if it is tried and fails, why do you think that was? Is there any additional context worth bearing in mind?

The biggest risk is that the system fails from a precision/recall perspective: highlighting the wrong content and overlooking important content.

Next Steps

Outline the next steps of the project and a roadmap for future work. What are your biggest areas of uncertainty?

The next step would be to try one of the approaches for a specific constituency, either the bottom-up or the top-down one, and see if it recommends sensible things.
