In November 2025, Ofcom published a discussion paper on so-called “answer engines”, the culmination of qualitative and quantitative research into the British public’s attitudes towards a recent shift: traditional search results are increasingly being replaced with consolidated answers from AI chatbots such as Gemini and ChatGPT. This shift matters for a number of reasons, notably because it simplifies knowledge acquisition in ways that could unintentionally mislead users.
Some of these downsides are articulated as four risks identified by Ofcom in its paper:
- The loss of context as well as incorrect or obscured source attributions, which could prevent users from evaluating the credibility of information
- The inflation of user trust due to conversational interfaces, which could render harmful or inaccurate information more persuasive and difficult to evaluate
- The potential for sycophantic responses, where GenAI search tools excessively align with user preferences and reinforce their views (even when incorrect or biased)
- The potential for jailbreaking attacks to provide access to harmful information, for example where users attempt to build context across multiple searches to extract information that would be blocked in a single query attempt
Together, these developments can create systemic risks that compromise the epistemic integrity of information on the web and democratic discourse by reducing informational transparency. As the findings of the Ofcom paper suggest, these risks stand to be amplified by the obscuring of answer sources’ origins and by the reinforcement of related filter bubbles and misinformation. Where answer engines may exacerbate these familiar problems of the social media age is through their propensity for what some researchers call “social sycophancy”, which they define as “when LLM outputs excessively preserve the user’s positive face”.
This is a matter of increasing concern to me and my colleagues on the CoCoDa project, because current UK regulation appears to have a gap precisely where sycophantic AI and filter bubbles intersect. Through our collective work, the CoCoDa project is looking for solutions to some of the systemic risks attributable to the high concentration of control and data held by dominant online platforms.
What’s new?
Filter bubbles and misinformation are known problems. Recent decades have seen extensive coverage of these threats, particularly in the context of social media recommender systems. For the purposes of this blog, it’s worth noting that, at the time of their research, Bakshy, Messing and Adamic suggested that while social media can contribute to the creation of filter bubbles, this may have less to do with algorithmically generated seclusion and more with users’ own curatorial choices. Through the social sycophancy described above, which reinforces pre-held beliefs, answer engines may therefore bridge this gap between algorithm and human, optimising the identification of information and sources that support users’ existing positions on particular subjects.
Some progress has been made in developing mitigations for filter bubbles in social media recommender systems. These mitigations, primarily structural and auditable measures such as user-facing personalisation controls and interface nudges, operate transparently at the platform level and already fall within existing regulatory frameworks such as the EU’s Digital Services Act (DSA). AI sycophancy mitigations are different. While constitutional AI and reinforcement learning approaches attempt to train honesty and helpfulness into models, the training mechanism that produces sycophancy is entangled with what makes models useful. This ‘helpfulness’ training can therefore override safety guardrails, even in normal use. Currently, no regulatory body has issued specific guidance on sycophancy as a distinct risk category, meaning answer engines represent an escalation into a regime where the existing mitigation toolkit does not yet apply.
Moderating content and speech is challenging; however, nothing in UK regulation currently covers this new issue neatly, despite some of the potential harms outlined by Ofcom having been raised as points of discussion in parliament. The relevant instruments at present include:
- The Online Safety Act (OSA) - focuses on illegal content and user-to-user harms. While the OSA acknowledges filter bubbles, it doesn’t regulate the algorithmic creation of echo chambers. Answer engines/AI chatbots also don’t fit neatly into existing categories for regulation and are not currently in scope of the OSA, which means a sycophantic answer engine operating independently of a regulated platform faces no obligations at all. The OSA’s harms framework is also focused on physical or psychological injury, as opposed to epistemic harm.
- UK General Data Protection Regulation (GDPR), Article 22 (rights related to automated decision making including profiling) - only applies where automated decisions have “legal or similarly significant effects.” The Information Commissioner’s Office (ICO) is explicit that the threshold for “significant effect” is high, so it is unlikely that answer engines providing individual inaccurate responses will trigger Article 22, no matter how cumulatively harmful these might be.
- Digital Markets Act (DMA) - addresses competition and consumer protection in digital markets but doesn’t engage with epistemic or informational integrity harms. The DMA focuses on ensuring that consumer interactions are not misleading or manipulative; however, its focus is on economic harm and commercial conduct.
- Digital Services Act (DSA) - while the DSA has practical bearing on UK operations for platforms serving EU users, it is concerned primarily with platform liability and content moderation. Its systemic risk provisions focus on disinformation and election integrity rather than the epistemic harms of sycophantic answer engines, and individual AI chatbot interactions are largely excluded from its scope as things stand.
Some sector-specific regulation, in domains such as financial services, imposes constraints on harmful advice; however, these constraints operate downstream, addressing liability for substantive advice once it has been given. The harms posed by answer engines occur further upstream, at the level of belief formation, which is precisely why no existing framework adequately captures them.
Why does this matter?
The threat to informational integrity is something regulators have struggled with, and continue to struggle with, because it falls outside the established categories of consumer harm, content moderation and competition. Added to this, we are still determining the scale of the potential problem. Recent reporting from the New York Times suggested that 1 in 10 answers generated by Google’s Gemini answer engine were erroneous. That rate may not sound high, but based on estimates that Google processes more than five trillion searches a year, it amounts to potentially tens of millions of erroneous answers every hour. Furthermore, research from Microsoft Security recently demonstrated how straightforward it can be to manipulate AI chatbots into providing unfounded or erroneous answers through AI recommendation poisoning. Systematic benchmarking reinforces this picture: findings from the CitizenQuery-UK work show that even high-performing models exhibit variance and low abstention rates on citizen queries about benefits, tax and legal entitlements. As a result, users of government information services still receive confident-sounding answers of highly uneven reliability. While answer accuracy and hardening against poisoning attacks are both areas that companies are actively looking to improve, current source transparency measures prevent users from determining whether the answers they receive are grounded in truth, or cobbled together from social media feeds or blogs whose authors have anointed themselves the best in their field.
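As a rough back-of-the-envelope check on that scale, the arithmetic below uses only the figures quoted above and assumes, purely for illustration, that every search returns an AI-generated answer; it is a sketch, not Google’s actual telemetry:

```python
# Back-of-the-envelope estimate of erroneous answers per hour,
# using only the figures quoted above (illustrative, not measured data).

searches_per_year = 5_000_000_000_000  # "more than five trillion searches a year"
error_rate = 0.10                      # NYT-reported ~1 in 10 erroneous answers
hours_per_year = 365 * 24

erroneous_per_year = searches_per_year * error_rate
erroneous_per_hour = erroneous_per_year / hours_per_year

print(f"Erroneous answers per hour: {erroneous_per_hour:,.0f}")
# ~57,000,000 -- i.e. tens of millions every hour, as stated above
```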
It is therefore encouraging that Ofcom has highlighted some of the risks posed by answer engines; corresponding solutions and measures to counteract these risks are, however, less forthcoming. This is concerning, as we have already seen the impact of inaction on filter bubbles in social media, an episode that also highlighted the difficulty of regulating emerging technologies capable of upending social conventions, for better and worse. It is therefore imperative that we learn from the past and provide researchers with access to the data they need to scrutinise the workings and effects of these answer engines while the window to mitigate these potential epistemic harms remains ajar.
What could be done?
While these may look like yet more intractable problems, there are remedies worth considering.
Before the regulatory framework catches up, there is meaningful groundwork that can be done at the data layer. Benchmark datasets and standardised evaluation methods represent a practical and actionable starting point; together, they provide the evidence base and mechanisms that a regulatory regime will require to take meaningful action. The aforementioned CitizenQuery-UK work has demonstrated that it is possible to benchmark answer engine performance at scale. Similarly, recent work by researchers on the AdversaRiskQA benchmark highlights that adversarial factuality in high-risk domains remains almost entirely understudied. Both papers demonstrate that the infrastructure for systematically interrogating answer engines doesn’t yet exist, but that it can be built and adopted. Establishing shared benchmarking standards now would mean that, if or when stronger regulatory obligations do arrive, there is already an evidence base capable of supporting them, as sketched below.
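To make this concrete, here is a minimal sketch of what a shared evaluation harness could look like. All names here are hypothetical illustrations, not the actual CitizenQuery-UK or AdversaRiskQA tooling; it simply shows how accuracy and abstention rate, the two measures discussed above, could be scored in a standardised way:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BenchmarkItem:
    query: str       # e.g. a citizen query about benefits, tax or entitlements
    reference: str   # a vetted correct answer to compare against

def evaluate(answer_engine: Callable[[str], str],
             items: list[BenchmarkItem],
             is_correct: Callable[[str, str], bool],
             abstain_marker: str = "I don't know") -> dict:
    """Score an answer engine on accuracy and abstention rate."""
    correct = abstained = 0
    for item in items:
        answer = answer_engine(item.query)
        if abstain_marker.lower() in answer.lower():
            abstained += 1  # the engine declined rather than guessed
        elif is_correct(answer, item.reference):
            correct += 1
    n = len(items)
    return {
        "accuracy": correct / n,
        # low abstention combined with low accuracy is the
        # "confident-sounding but unreliable" pattern noted above
        "abstention_rate": abstained / n,
    }
```

The value of standardisation lies less in any single harness than in the agreed item format and metrics: once those are shared, results from different labs and regulators become directly comparable.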
With regards to possible regulatory changes, Ofcom already has legal powers under the OSA to access information held by regulated companies and to remotely inspect how a platform’s algorithm works in real time, including what content is recommended to users. These powers could be extended via secondary legislation to require answer engine providers to submit algorithmic risk assessments to Ofcom covering sycophantic response patterns. As the NYT reporting suggests, these risk assessments would also benefit from independent, third-party auditing.
In addition, a requirement could be considered for providers of answer engines to demonstrate measures preventing systematic belief reinforcement. The House of Commons Science, Innovation and Technology Committee has already recommended that generative AI platforms should be required to provide risk assessments to Ofcom on the risks associated with different prompts and outputs, and to explain how their models respond to sensitive topics and the guardrails they have in place to mitigate associated harms. Extending this to cover sycophancy and belief reinforcement specifically could close the gap.
Finally, further consideration could be given to diversity or neutrality obligations for critical topics. As noted earlier in this article, existing regulation does not directly address the upstream epistemic harms associated with sycophantic answer engines. Ofcom’s existing due impartiality framework for broadcasting does not require strict neutrality, but does require that a sufficient range of views on a topic is covered. Wikipedia’s Neutral Point of View policy offers evidence that this can be operationalised, showing that proportional, source-grounded neutrality can be institutionalised at scale without mandating false balance. The AI safety literature also shows that technical safeguards against bias and toxicity are already being developed and deployed. A proportionate analogue for answer engines, applied to demonstrably high-risk topics such as elections or public health emergencies, would therefore build on precedents that already exist in both information governance and AI safety practice.
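One hypothetical way a “sufficient range of views” obligation could be made measurable, sketched here purely as an illustration and not as an Ofcom-endorsed metric, is to score how evenly an answer’s cited sources cover the distinct viewpoints on a contested topic:

```python
from collections import Counter
from math import log

def viewpoint_diversity(source_viewpoints: list[str]) -> float:
    """Normalised Shannon entropy over viewpoint labels attached to an
    answer's cited sources: 0.0 = one viewpoint only, 1.0 = evenly spread.
    Labels are assumed to come from an upstream stance classifier."""
    counts = Counter(source_viewpoints)
    n = sum(counts.values())
    if len(counts) <= 1:
        return 0.0
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(len(counts))  # normalise by the maximum entropy

# Example: an answer citing four sources, three taking one stance
print(viewpoint_diversity(["pro", "pro", "pro", "con"]))  # ~0.81
```

A real metric would need to weight viewpoints by source quality and prevalence, as Wikipedia’s NPOV does, rather than reward equal balance for its own sake; raw entropy alone would risk mandating exactly the false balance the due impartiality framework avoids.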
Conclusion
The shift from search to answer engines is a structural one, with consequences for how millions of people form beliefs about the world. While research into the potential impacts and downstream effects has begun, we do not yet have a clear understanding of how far-reaching these might be. The regulatory frameworks we have were not built for this, and the window to act thoughtfully won’t remain open for long. What is encouraging is that the tools to begin building accountability infrastructure exist, and that Ofcom has at least put a name to some of the problems. What’s needed now is to put measures in place, such as investing in the benchmarking and evaluation capacities that regulatory oversight will require. The lessons of our experience with social media should inform our actions now, rather than waiting until the harms are undeniable and entrenched.