Institutional Generative Design via Rich Recommendations

TANTUM COLLINS

Institutional Problem or Research Question

Describe what the open institutional problem or research question you’ve identified is, what features make it challenging, and how people deal with it currently.

Organisational design focuses largely on matching people with one another and with other entities (projects, funding, etc.) in order to make the most of finite resources in service of a collective mission. Currently, most institutions use interpretable, human-designed structures to achieve this end. However, as anyone who has worked at a large organisation knows, these systems fall far short of optimality.

Consider a research lab that has grown from a dozen people to several hundred. In its smaller form, it functioned organically: everyone knew their colleagues’ interests and competences; people shared content and context easily and cycled fluidly between projects. As the lab grew, managing the flow of work and information in this way became intractable, so leadership instituted a traditional structure with a discrete taxonomy of teams and reporting lines. This hierarchy largely dictates who works on what and with whom, which resources they can access, what papers they read, and so on. There are of course friendships and collaborations that bridge teams, special interest associations (clubs, reading groups, etc.), and proverbial ‘watercooler chats’, but these play a marginal role. Although it works decently, the new structure generates plenty of false positives and negatives: almost everyone spends a decent amount of time consuming information irrelevant to them, and ends up missing some meetings and projects highly salient to their work.

Possible Solution

Describe what your proposed solution is and how it makes use of AI. If there’s a hypothesis you’re testing, what is it? What makes this approach particularly tractable? How would you implement your solution?

Machine-learning-powered recommender systems are well suited to the sorts of matchmaking problems that human-designed institutions have failed to address fully. Given access to enough relevant data, e.g. on individual interests, track record, satisfaction, productivity, etc. (much of which project management tools already passively collect), recommendation engines could almost certainly improve information flow and socialisation within organisations. We already see this in another context with platforms such as Netflix and Spotify: recommendation systems that harness ML-derived semantic representations of people and content tend to surface better suggestions than prior, more interpretable methods.
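As an illustration, here is a minimal sketch of embedding-based matchmaking. All names, vectors and the cosine-similarity scoring are hypothetical; a real system would learn embeddings of people and content from behavioural data rather than hand-code them.

```python
import numpy as np

def recommend(person_vec, item_vecs, item_ids, k=3):
    """Return the k items whose embeddings are most similar to the person's.

    person_vec: 1-D embedding of a person's interests.
    item_vecs:  2-D array, one embedding row per candidate item
                (papers, projects, potential collaborators, ...).
    """
    # Cosine similarity between the person and every candidate item.
    sims = item_vecs @ person_vec / (
        np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(person_vec)
    )
    top = np.argsort(sims)[::-1][:k]
    return [item_ids[i] for i in top]

# Toy example with hand-made 3-d "embeddings".
items = ["rl-paper", "nlp-paper", "bio-paper", "hci-paper"]
vecs = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.9, 0.0],
                 [0.0, 0.2, 0.9],
                 [0.4, 0.4, 0.2]])
alice = np.array([0.8, 0.2, 0.0])  # mostly interested in RL
print(recommend(alice, vecs, items, k=2))  # → ['rl-paper', 'hci-paper']
```

The same scoring can match people to people, projects, or meetings; only the embedding space changes.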

Within organisations, effective matches could help not only with accomplishing discrete packages of work, but also with longer-run improvements to organisational health, such as building rich lateral social connectivity, helping individuals learn and expand their horizons, etc. And advances in causal inference methods could surface novel relationships in observational data, for example by identifying the interactions that lead to new ideas or cause opinions to change.

A future version of the hypothetical research lab described above might completely lack teams or reporting lines; employees could simply peruse on a daily/weekly/monthly basis an interface that serves up papers to read, projects to work on, people to meet over coffee, and so on. This may sound alien and uninterpretable, but in some ways it just represents a return to the organic structure that almost all organisations exhibit in their early days; with ML-powered systems, such connectivity could scale, perhaps indefinitely. At the highest level, society could realise immense gains from imbuing large groups such as governments, political parties, activist collectives and so on with the efficiency of small teams.

Of course, there are many ways that this can go wrong. One of the most instructive categories of prior failure is the use of recommender systems in social media: in the quest for higher clickthrough rates, platforms served users information that entrenched biases, coarsened political discussion and was often untrue. Similarly, the sense of alienation produced by platform-enabled gig work illustrates some of the potential downsides of replacing middle management with ML-powered matchmaking systems that optimise purely for engagement or revenue. However, recommender systems need not have such crude objective functions. With respect to citizen polarisation, for instance, several proposals, such as bridging-based ranking (ranking content according to whether it has broad-based support across different demographics), show how recommender systems could have the opposite effect.
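Bridging-based ranking can be sketched in a few lines. Here the bridging score is simply the minimum approval rate across groups, one of several possible choices, and all the data are invented for illustration:

```python
def bridging_rank(items):
    """Rank items by cross-group support rather than raw popularity.

    items: dict mapping item -> dict of group -> approval rate in [0, 1].
    A bridging score rewards content every group likes (here: the minimum
    approval across groups); engagement-based ranking would just sum them.
    """
    def bridging_score(approvals):
        return min(approvals.values())
    return sorted(items, key=lambda it: bridging_score(items[it]), reverse=True)

# Hypothetical approval rates for two demographic groups.
posts = {
    "partisan-post": {"group_a": 0.95, "group_b": 0.05},
    "bridging-post": {"group_a": 0.60, "group_b": 0.55},
    "neutral-post":  {"group_a": 0.40, "group_b": 0.45},
}
print(bridging_rank(posts))
# → ['bridging-post', 'neutral-post', 'partisan-post']
```

Note how the partisan post, which would win on total engagement, falls to the bottom because one group strongly disapproves of it.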

More broadly, one can imagine different kinds of recommender systems based on different objectives for information processing, e.g. a multi-objective function that proxies the goal of a more complete understanding of community views and preferences. Within a research lab, for instance, one might include in an objective function measures of productivity but also of employee satisfaction. For a gig economy platform, the ideal function might feature measures of meaningful social relationships (e.g. between colleagues or with clients). What would a Twitter algorithm look like that was genuinely useful to a policymaker or civil society organisation seeking epistemically meaningful, representative information about their constituents or stakeholders?
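A minimal sketch of such a multi-objective score follows; the weights, metric names and values are all hypothetical, and in practice the metrics would need careful normalisation and validation:

```python
def composite_score(metrics, weights):
    """Weighted multi-objective score for a candidate match.

    metrics and weights are dicts keyed by objective name; all metric
    values are assumed pre-normalised to [0, 1]. The weights encode an
    organisation's priorities, e.g. productivity vs. satisfaction.
    """
    return sum(weights[k] * metrics[k] for k in weights)

# Hypothetical weighting for a research lab: value satisfaction and
# relationship-building alongside raw productivity.
weights = {"productivity": 0.5, "satisfaction": 0.3, "social_ties": 0.2}

match_a = {"productivity": 0.9, "satisfaction": 0.2, "social_ties": 0.1}
match_b = {"productivity": 0.7, "satisfaction": 0.8, "social_ties": 0.6}

print(composite_score(match_a, weights))  # productive but unhappy
print(composite_score(match_b, weights))  # the better all-round match
```

Under a pure productivity objective match_a would win; the broader objective favours match_b.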

A final risk that merits mentioning is privacy. In many cases, a system will be better able to suggest matches if it has access to more personal data, e.g. what projects people have worked on, how productive they are at different times of day, how they felt about various experiences, etc. This creates a tension between efficacy and privacy, especially in workplace settings where the information in question might be used to fulfil ends over which the individual to whom it pertains has only partial agency. To this end, transparent permissioning will be essential. Technical advances in privacy-preserving machine learning (PPML) could also play a critical role.
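One standard privacy-preserving building block is the Laplace mechanism from differential privacy. The sketch below releases an aggregate statistic (say, average hours worked) with calibrated noise so that no single employee's data can be inferred; all parameter values are illustrative:

```python
import math
import random

def dp_mean(values, lower, upper, epsilon, seed=None):
    """Differentially private mean via the Laplace mechanism (a sketch).

    Each value is clipped to [lower, upper]; Laplace noise scaled to the
    mean's sensitivity ((upper - lower) / n) divided by epsilon is added
    before release. Smaller epsilon means stronger privacy, more noise.
    """
    rng = random.Random(seed)
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / n
    scale = (upper - lower) / n / epsilon
    # Sample Laplace(0, scale) noise via inverse-CDF of a uniform draw.
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_mean + noise

# Hypothetical: weekly overtime hours for 100 employees, released with
# a privacy budget of epsilon = 1.
hours = [random.Random(1).uniform(0, 12) for _ in range(100)]
print(dp_mean(hours, lower=0, upper=12, epsilon=1.0, seed=7))
```

With larger teams the sensitivity shrinks, so aggregate reporting gets more accurate at the same privacy level.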

Method of Evaluation

Describe how you will know if your solution works, ideally at both a small and large scale. What resources and stakeholders would you require to implement and test your solution?

Assessment poses major challenges and varies widely depending on the particular tool in question. In some cases one can simply defer to immediate revealed preference such as click-through rates. However, in most situations the metric that really matters will contain many different elements, the challenge of credit assignment will be significant, and feedback will not be instant. And, as noted above, optimising for simple, interpretable metrics can lead to long-run dysfunction. As a result, careful crafting of objective functions, perhaps via alignment and safety techniques such as inverse RL, will be essential to measuring success.

Once one has assembled a satisfactory set of initial metrics, assessment is in theory fairly straightforward: for any given system, A/B testing should shed light on whether it improves outcomes relative to the status quo. In practice large sample sizes may be necessary due to the high variance of organisational goals and circumstances. If such tools were integrated into widely used communication and project management platforms (e.g. Slack), conducting testing at scale would be easy; for more bespoke tooling, running such experiments may pose challenges.
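A minimal sketch of such an A/B comparison, using a standard-library permutation test on invented satisfaction scores (real deployments would need far larger samples and pre-registered metrics):

```python
import random

def permutation_test(control, treatment, n_perm=10_000, seed=0):
    """Two-sample permutation test on the difference in mean outcomes.

    Returns (observed difference, one-sided p-value): the fraction of
    label shufflings producing a difference at least as large as the one
    observed, i.e. how surprising the improvement is under the null.
    """
    rng = random.Random(seed)
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)
    pooled = list(control) + list(treatment)
    n_t = len(treatment)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / len(control)
        if diff >= observed:
            hits += 1
    return observed, hits / n_perm

# Hypothetical weekly satisfaction scores under the status quo (control)
# and under recommender-driven matching (treatment).
control = [0.52, 0.55, 0.48, 0.60, 0.51, 0.49, 0.53, 0.57]
treatment = [0.61, 0.66, 0.58, 0.70, 0.63, 0.59, 0.64, 0.68]
diff, p = permutation_test(control, treatment)
print(f"mean improvement = {diff:.3f}, p ≈ {p:.4f}")
```

The same scaffolding works for any scalar outcome metric; the hard part, as noted above, is choosing metrics worth optimising.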

Risks and Additional Context

What are the biggest risks associated with this project? If someone is strongly opposed to your solution or if it is tried and fails, why do you think that was? Is there any additional context worth bearing in mind?

As described in the ‘Possible Solution’ section, the use of AI-enabled recommendation systems in organisational contexts poses a number of risks, including the consequences of poorly chosen objective functions, the potential for user alienation in service of organisational efficiency, and erosion of privacy.

Next Steps

Outline the next steps of the project and a roadmap for future work. What are your biggest areas of uncertainty?

Ultimately, realising this vision will require deployment and testing at scale in organisations. Some early efforts are underway in this direction, including some of Josh Tan’s work at Metagov. Several other strands of research can inform this, such as improvements to recommender systems in general.
