Meta's leaked AI guidelines permit harmful chatbot content

August 21, 2025

Meta’s AI Policies Allow Harmful Content on Sensitive Topics

A recently leaked internal policy document has revealed that Meta, the parent company of Facebook, WhatsApp, and Instagram, permitted its AI chatbots to generate provocative and harmful content on sensitive topics such as sex, race, and celebrities. The 200-page “GenAI: Content Risk Standards” guide, which was approved by Meta’s legal and policy teams, outlined what behaviors were considered acceptable when developing AI systems for these platforms.

The guidelines allowed certain controversial outputs, including flirtatious or romantic roleplay involving minors, false medical claims, and racist arguments under specific conditions. These provisions sparked significant concern among users, regulators, and experts in the field of artificial intelligence.

Meta confirmed the authenticity of the document but removed sections that allowed sexualized conversations with minors after questions were raised earlier this month. A company spokesperson, Andy Stone, admitted that such interactions should never have been allowed and were inconsistent with official policies. He also acknowledged that enforcement of these rules had been inconsistent over time. The company declined to share the updated policy, and some controversial provisions remained intact. For example, rules allowing derogatory statements about certain racial groups if explicitly framed as fictional remained in place.

False Stories and Image Generation

The leaked document also revealed that Meta’s AI systems were permitted to create false stories about public figures, including serious and damaging health rumors, as long as disclaimers stated the information was untrue. This approach raised concerns about the potential for misinformation to spread rapidly through AI-generated content.

In addition, the standards addressed sexualized image requests involving celebrities. Rather than producing explicit content, the AI would offer humorous or unrelated substitutions. In one example, a request for a topless image of a famous singer was marked as unacceptable, and the AI instead produced an image of her holding a large fish. This practice highlighted how Meta’s policies attempted to balance user requests with content moderation guidelines.

Non-Lethal Violence and Ethical Concerns

Beyond sexual and racial content, the policy also permitted AI-generated depictions of non-lethal violence. According to the guidelines, images showing adults or even elderly people being punched or kicked, as well as children fighting, were allowed as long as the scenes avoided excessive gore or fatal injuries. More graphic requests, such as impalement or disembowelment, were considered unacceptable.

This framework raised concerns among experts about the ethical standards and moral implications of AI-generated harmful content. Legal scholars pointed out that there is a clear distinction between allowing users to post problematic material and enabling an AI system to generate it directly. They questioned why a global tech company would sanction certain harmful narratives in its chatbot designs, especially those involving racism or sexualized depictions of minors.

Ongoing Scrutiny and Risks

The leak has intensified scrutiny of how technology firms set boundaries for artificial intelligence and of the real-world risks these systems might pose when such limits are weak or inconsistently enforced. As AI plays a larger role in shaping online interactions, the need for transparent and ethical guidelines becomes increasingly urgent.

Experts argue that without stronger oversight and consistent enforcement, AI systems could perpetuate harmful content, erode trust in digital platforms, and contribute to societal divisions. The revelations about Meta’s policies underscore the importance of accountability in the development and deployment of AI technologies.

Chris Infotainment