
Important

The content filtering system isn't applied to prompts and completions processed by the Whisper model in Azure OpenAI Service. Learn more about the Whisper model in Azure OpenAI.

Azure OpenAI Service includes a content filtering system that works alongside core models, including DALL-E image generation models. This system works by running both the prompt and completion through an ensemble of classification models designed to detect and prevent the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Variations in API configurations and application design might affect completions and thus filtering behavior.

The content filtering models for the hate, sexual, violence, and self-harm categories have been specifically trained and tested on the following languages: English, German, Japanese, Spanish, French, Italian, Portuguese, and Chinese. The service can work in many other languages, but the quality might vary. In all cases, you should do your own testing to ensure that it works for your application.

In addition to the content filtering system, Azure OpenAI Service performs monitoring to detect content and/or behaviors that suggest use of the service in a manner that might violate applicable product terms. For more information about understanding and mitigating risks associated with your application, see the Transparency Note for Azure OpenAI. For more information about how data is processed for content filtering and abuse monitoring, see Data, privacy, and security for Azure OpenAI Service.

The following sections provide information about the content filtering categories, the filtering severity levels and their configurability, and API scenarios to consider in application design and implementation.

Content filtering categories


The content filtering system integrated in Azure OpenAI Service contains:

  • Neural multi-class classification models aimed at detecting and filtering harmful content. The models cover four categories (hate, sexual, violence, and self-harm) across four severity levels (safe, low, medium, and high). Content detected at the "safe" severity level is labeled in annotations but isn't subject to filtering and isn't configurable.
  • Other optional classification models aimed at detecting jailbreak risk and known content for text and code. These models are binary classifiers that flag whether user or model behavior qualifies as a jailbreak attack or matches known text or source code. The use of these models is optional, but use of the protected material code model might be required for Customer Copyright Commitment coverage.

Risk categories


 

  • Hate and fairness: Hate and fairness-related harms refer to any content that attacks or uses pejorative or discriminatory language with reference to a person or identity groups on the basis of certain differentiating attributes of these groups, including but not limited to race, ethnicity, nationality, gender identity and expression, sexual orientation, religion, immigration status, ability status, personal appearance, and body size. Fairness is concerned with ensuring that AI systems treat all groups of people equitably without contributing to existing societal inequities. Similar to hate speech, fairness-related harms hinge upon disparate treatment of identity groups.
  • Sexual: Sexual describes language related to anatomical organs and genitals, romantic relationships, acts portrayed in erotic or affectionate terms, pregnancy, physical sexual acts (including those portrayed as an assault or a forced sexual violent act against one's will), prostitution, pornography, and abuse.
  • Violence: Violence describes language related to physical actions intended to hurt, injure, damage, or kill someone or something; it also describes weapons, guns, and related entities, such as manufacturers, associations, and legislation.
  • Self-harm: Self-harm describes language related to physical actions intended to purposely hurt, injure, or damage one's body, or kill oneself.
  • Protected material for text*: Protected material text describes known text content (for example, song lyrics, articles, recipes, and selected web content) that can be outputted by large language models.
  • Protected material for code: Protected material code describes source code that matches a set of source code from public repositories, which can be outputted by large language models without proper citation of source repositories.


*If you're an owner of text material and want to submit text content for protection, file a request.

Prompt Shields


 

  • Prompt Shield for jailbreak attacks: Jailbreak attacks are user prompts designed to provoke the generative AI model into exhibiting behaviors it was trained to avoid or to break the rules set in the system message. Such attacks can vary from intricate role-play to subtle subversion of the safety objective.
  • Prompt Shield for indirect attacks: Indirect attacks, also referred to as indirect prompt attacks or cross-domain prompt injection attacks, are a potential vulnerability where third parties place malicious instructions inside of documents that the generative AI system can access and process. Requires document embedding and formatting.


 

Text content


Warning

The severity definitions in this document contain examples of harmful content that might be disturbing to some readers.

Severity definitions

Hate and fairness severity levels

Safe

Content is safe but may contain hate and fairness related terms used in generic and safe contexts such as:

  • Education
  • Media
  • Official statistics
  • History
  • Medicine
  • Science
  • Other similar contexts
"Black people should not be killed by police or mistreated just because of their skin color."
Low

Content that contains positive characterization or protection of the Identity groups, displays discrimination, stereotypes and prejudiced, judgmental, or opinionated views or attitudes related to hate speech or targeting identity groups. This includes:

  • Slurs in research papers, dictionaries, or media with a direct quote
  • General hate speech that targets objects, individuals or groups
  • Limited hate speech or depictions of negative sentiment in fictional contexts (for example, gaming, movies, literature)
"I don't like [identity group] because they are all so aggressive. They attacked us over and over again for no reason at all."
Medium

Content that contains insults, bullying, intimidation, de-humanization or derogation of individuals or groups. This includes:

  • Language that is mocking and imitating Identity groups attributes
  • Describing individuals or groups as inferior or deficient
  • Denying or demeaning identity groups existence
  • Texts that deny historical, well-documented, and violent events such as Holocaust, enslavement, or colonization
"The [real world event] is a fraud. What we've discovered is that sources have been lying about the number of victims. Here is proof."
High

Content that contains hate speech as propaganda threatening or calling for violence against Identity groups, radicalizes, recruits or incites people to join extremist efforts or spread hate speech. This includes:

  • Incendiary, offensive, and vulgar language targeting specific individuals or identity groups
  • Claims of superiority to justify discrimination, segregation, or exclusion
  • Texts that contain any form of glorification of Nazi and SS symbolism, KKK and the confederate flag outside of historical, educational, or artistic settings
  • Texts that display hate crime preparation, praise and glorification of violence against Identity groups, or that praise, glorify, or endorse genocide
"Millions of [identity group] were wiped out because they were the inferior race. Murdering them continues to be right and honorable because…"

Sexual severity levels

Safe

Content is safe but may contain sexually related terms used in generic and safe contexts such as:

  • Education
  • Media
  • Official statistics
  • History
  • Medicine
  • Science
  • Mentions of family or romantic relations
"Fewer young adults are having sex than in previous generations."
Low

Content that expresses prejudiced, judgmental or opinionated views on sexually related topics or mentions sexually related terms in fictional or real-world experiences that are not extremely graphic. This includes:

  • Mentions of sexual anatomy or sex scenes in fictional and artistic contexts
  • Medical prescriptions and diagnosis
  • Personal experiences, sexual problems and traumas with some graphic content including coping mechanisms and resources
"We made out for a while, and he started to feel me up. I felt a bulge in his pants."
Medium

Content that describes sexual arousal and activity in explicit and graphic terms, includes fantasies about sex and sexual abuse. This includes:

  • Mentions of sexual stimulation and masturbation
  • Pornography limited to business and prostitution, escort, etc. as careers without sex scenes
  • Advertisement for adult products such as sex toys
  • Personal experiences and descriptions related to sexual, erotic or tantric massages, strip clubs, erotic shows, BDSM attributes and practices
"We both climaxed at the same time."
High

Content describing solicitation of sexual commercial services, containing detailed descriptions of sexual activity, violent sexual abuse and rape, and non-consensual power exchanges, abuse, and voyeurism. This also includes content that is used to promote, endorse, or glorify extreme or illegal sexual acts. This includes:

  • Any depiction of Child Sexual Abuse Material (CSAM) including any explicit or potential reference to underage sex
  • Child grooming, child abuse, and other child exploitation
  • Explicit or potential references to incest, zoophilia, necrophilia, etc.
  • Human sex trafficking recruitment, including victim manipulation
"I was abused as a child by my uncle and I liked it. Nowadays, I make money by producing child pornography with my neighbor’s 6 year old son."

Violence severity levels

Safe

Content is safe but may contain terms related to violence used in generic and safe contexts such as:

  • Education
  • Media
  • Official statistics
  • History
  • Medicine
  • Science
  • Common objects when not used as weapons
  • Hunting or shooting sport and guns technical specifications
"Carnivores such as lions and tigers are prone to killing one another. Death occurs at the paws, claws, jaws, or tusks that might fatally gore one another during fights."
Low

Content in fictional contexts (for example, gaming, literature, arts) that depict violent or graphic scenes without glorification, endorsement or mal-intent. This includes:

  • Personal experiences of violence (without violent intent)
  • Biased, prejudiced, judgmental, or opinionated views on violence
  • Mentions of violent acts and death at low intensity
"Some people just need to be slapped."
Medium

Content that contains graphic depictions of violence and death at medium intensity as well as fantasies about violent acts or threats. This includes:

  • Texts that put the victim at fault or depict perpetrators in a positive light
  • Historical references to genocide and the Holocaust without endorsement or glorification
  • References to known mass shootings without endorsement or glorification
  • References to terrorist and violent extremist groups and events without endorsement or glorification
  • Texts that seek advice or recommendation or texts that provide instructions on how to perform a violent act against an animal, an individual, a group of individuals or an object or infrastructure that can result in individual or mass harm
  • Bullying and intimidation
"Go to another state or another city and kill any stranger; for example, shoot them with a silencer gun. Since you have not any connection with him, you will not get caught."
High

Content that contains terrorist or violent extremist propaganda and violent ideology on the Internet that describes, praises, supports, endorses or glorifies killings as part of terroristic activity, aids offers to terrorist organizations or causes. This includes:

  • Texts that seek instructions on affiliation with terrorist and violent extremist groups
  • Texts that provide instructions on how to recruit followers, influence people toward a terrorist cause or extremist ideology, adopt radical views or provide training on how terrorist and violent extremist groups operate, or provide information on their leaders and members
  • Texts that plan and coordinate attacks on individuals or groups of people, including shooting events, use of bombs and explosives on civilian infrastructure, or bladed weapon attacks when performed by non-combatants or non-terrorists
  • Content that references forced labor including children, domestic servitude, stalking, and forced medical procedures
"Let's call and threaten to explode the hospital during the weekend. I've got the explosives ready, and will have them installed tonight."

Self-harm severity levels

Safe

Content is safe but may contain terms related to self-harm used in generic and safe contexts such as:

  • Official statistics
  • Education
  • Media
  • Medical
  • Actions that describe accidental self-harm, first aid tips, etc.
"Binging and purging may be a warning sign for people with a history of extreme weight loss."
Low

Content that contains references to characters planning or committing suicide in a fictional context (for example, literature, movies) or contains depictions of self-harm acts or suicide at low intensity. This includes:

  • Personal biased, prejudiced, judgmental or opinionated views on self-harm
  • Associations with eating disorders, seeking or providing instructions for drastic and unhealthy weight loss
  • Personal experiences related to self-harm, including seeking professional help
"I'm really struggling with self-loathing and thoughts of self-harm to relieve my distress after my friend took his own life last month. Please can someone give me any useful advice?"
Medium

Content that mocks victims or survivors of self-harm such as eating disorders or suicide, as well as bullying and intimidation that incites and promotes self-harm acts. This includes:

  • Self-injury presented as a solution to a problem
  • Fantasies, desire, wishes or personal experiences about self-harm/suicide
  • Texts that seek or give advice, instructions or recommendation on how to perform self-harm
  • Bullying and intimidation that incites and promotes self-harm acts
"Go kill yourself, you are worthless."
High

Content that includes research for advice or provides instructions and methods on how to engage in self-harm that leads to death or how to commit suicide as well as glorification and endorsement of suicide, or other severe types of self-harm. This includes:

  • Texts that instruct how to write a goodbye message to your family
  • Self-decapitation
  • Burning alive

Image content


Warning

The severity definitions in this document contain examples of harmful content that might be disturbing to some readers.

Severity definitions

Hate and fairness severity levels

Safe

Content is safe but may contain hate and fairness related imagery with no sentiment or neutral sentiments in non-discriminatory non-offensive ways in the following contexts or those similar to:

  • Journalistic
  • Historical
  • Medical
  • Legal
  • Educational
  • Geographical

Examples:

  • Non-Confederate flags, e.g., flags of countries or states that feature crosses, such as an image of the Union Jack flag of Great Britain
  • Depictions of the emancipation of enslaved people in the US in historical photos

Low

Content that contains either positive sentiments, false positive sentiments, or negative, biased sentiments differentiating identity groups from others. This content is intended to trigger an emotional response. This includes:

  • Protection
  • Nazi symbolism, the confederate flag, Ku Klux Klan symbolism
  • Stereotyping
  • Employment discrimination
Examples:

  • Representations of the 'No Hate Speech' movement
  • Representations of deprioritized job candidates, such as candidates with disabilities, in interview settings
Medium

Content that contains negative sentiment action towards identity groups and others that leads to the promotion of action against an identity groups and sentiments of superiority. This includes:

  • Insult, isolation, mocking
  • Nazi flag, confederate flag, and LGBTQIA+ variations and merchandise
Examples:

  • Imagery that promotes social avoidance of [identity groups]
  • Imagery that depicts the racial improvement theory and eugenics
High

Content that contains negative sentiment and hate speech towards identity groups leading to hate speech generation or hate crimes. This includes:

  • Racism
  • Attacks on LGBTQIA+
  • Disablism
  • Sex-based discrimination
  • Glorified Nazi symbolism, Confederate flag, Ku Klux Klan symbolism
  • Violent confederate flag
Examples:

  • Antisemitic imagery
  • Imagery depicting the "Southern Cross" that features a blue saltire (diagonal cross) with affiliation of obvious violence (e.g., guns in display/use)

Sexual severity levels

Safe

Content is safe but contains sexually related imagery used in a general sense. This includes:

  • Family and romantic relationships
  • Non-sexual nudity
  • Clothing
  • Common objects
  • Non-sexual pose or activity
  • Animal mating
  • Sexual wellness
Examples:

  • Representations of hugging or making non-sexual physical contact
  • Representations depicting physical displays of affection, such as kissing without tongue and without nudity
Low

Content that contains sexually suggestive behaviors or acts. This includes:

  • Personal experiences
  • Fashion modeling
  • Nudity in artwork
  • Body art
  • Racy display
Depictions of people
Medium

Content that contains commercial and amateur erotic and/or pornographic content without visual photographic penetration and nude or semi-nude participants involved in sexual scenes meant to be sexually arousing. This includes:

  • Sexual modeling
  • Soft nudity in artwork
  • Sex jobs or shops
  • Obscured sexual activity
  • Porn production backstage
  • Pole dancing
Example: Depictions of intercourse that may be partially occluded but still clearly present in the image
High

Content that contains explicit detailed depictions of sexual organs and sexual acts, including illegal acts. This includes:

  • Amateur soft-core and hard-core pornography
  • Sexual solicitation
  • Voyeurism
  • Sex in artwork
  • BDSM with no consent or Rape
  • Child Sexual Exploitation and Abuse
  • Child grooming
  • Incest
Example: Depictions of engagements in sexual activities with animals or corpses

Violence severity levels

Safe

Content is safe but contains violence related imagery used in a general sense. This includes:

  • Sleeping or laying pose
  • Common objects
  • Non-violent activity
  • Ropes not as a noose
  • Bladed tools in use in kitchen/sewing
  • Wounds, scars and surgical treatment
  • Hunting and shooting sport
  • Toy and pretend guns
  • Combat practice
  • Object damage and vandalism
  • Non-branded terrorist symbolism
Example: Depictions of non-realistic weapons and bladed tools
Low

Content that contains violence related imagery and acts in real world or fictional contexts with low gore. This includes:

  • Guns, weapons, and bladed tools in display and in use
  • Non-realistic corpse
  • Non-realistic violence
  • Animated weapons
  • Light injury and personal experience
  • Ropes as a noose
  • Death in historical, educational, or funerary settings
  • Non-realistic terrorist symbolism
  • Bullfighting
Example: Depictions of ISIS/al-Qaeda flags
Medium

Content that contains moderate graphic violence and violent acts aimed at an identified target. This includes:

  • Autopsy
  • Crime or death scene
  • Animal death (natural or hunted), animal sacrifice, animal abuse
  • Enslavement
  • Genocide and mass killings in historical, educational settings
  • Slaughterhouse
  • Domestic violence and physical abuse
  • Bullying
  • Guns and weapons in use against a target
Example: Depictions of the use of force and coercion to aggressively dominate or intimidate an individual
High

Content that contains explicit detailed depictions of violence and violent acts in high gore. This includes:

  • Graphic injuries and death
  • Dismemberment, decapitation, burning alive
  • Corpse decomposition
  • Terrorism, Terrorist groups, and flags
  • Forced labor and servitude, forced medical procedure
  • Stalking
  • Corpse in terrorism and war crime
  • Shooting event
  • Other extremism
Example: Depictions of death when performed by terrorist groups identifiable by ISIS flags and symbols

Self-harm severity levels

Safe

Content is safe but contains self-harm related imagery used in a general sense. This includes:

  • Low body fat
  • Non-self-harm act
  • Accidental body injury
Example: Depictions of jumping with a parachute out of a plane
Low

Content that contains self-harm related imagery and acts in real world or fictional contexts with low gore. This includes:

  • Scars
  • Self-harm injury in recovery
  • Non-realistic self-harm
  • Emaciated individuals or animals
Example: Representations of self-harm-related acts, tools, injury, and emotional state
Medium

Content that contains moderate graphic imagery related to self-harm and suicide. This includes:

  • Autopsy
  • Crime or death scene
  • Bullying
Example: Representations of the use of force and coercion to aggressively dominate or intimidate an individual into self-harm
High

Content that contains explicit detailed depictions of self-harm and suicide in high gore. This includes:

  • Imminent self-harm act
  • Self-harm acts
  • Suicide
Example: Depictions of intentional suicide, where a person has committed suicide by jumping off a tall building

Configurability (preview)


The default content filtering configuration for the GPT model series is set to filter at the medium severity threshold for all four content harm categories (hate, violence, sexual, and self-harm) and applies to both prompts (text, multimodal text/image) and completions (text). This means that content detected at severity level medium or high is filtered, while content detected at severity level low isn't filtered by the content filters. For DALL-E, the default severity threshold is set to low for both prompts (text) and completions (images), so content detected at severity level low, medium, or high is filtered. The configurability feature is available in preview and allows customers to adjust the settings, separately for prompts and completions, to filter content for each content category at different severity levels, as described below:

  • Low, medium, high filtered — configurable for prompts: yes; configurable for completions: yes. Strictest filtering configuration; content detected at severity levels low, medium, and high is filtered.
  • Medium, high filtered — configurable for prompts: yes; configurable for completions: yes. Content detected at severity level low isn't filtered; content at medium and high is filtered.
  • High filtered — configurable for prompts: yes; configurable for completions: yes. Content detected at severity levels low and medium isn't filtered; only content at severity level high is filtered. Requires approval.¹
  • No filters — configurable for prompts and completions only if approved.¹ No content is filtered regardless of the severity level detected. Requires approval.¹


 

¹ For Azure OpenAI models, only customers who have been approved for modified content filtering have full content filtering control and can turn off content filters. Apply for modified content filters via this form: Azure OpenAI Limited Access Review: Modified Content Filtering. For Azure Government customers, apply for modified content filters via this form: Azure Government - Request Modified Content Filtering for Azure OpenAI Service.

Configurable content filters for inputs (prompts) and outputs (completions) are available for the following Azure OpenAI models:

  • GPT model series
  • GPT-4 Turbo Vision GA* (turbo-2024-04-09)
  • GPT-4o
  • DALL-E 2 and 3


*Only available for GPT-4 Turbo Vision GA; doesn't apply to GPT-4 Turbo Vision preview

Content filtering configurations are created within a resource in Azure AI Studio and can be associated with deployments. Learn more about configurability here.

Customers are responsible for ensuring that applications integrating Azure OpenAI comply with the Code of Conduct.

Scenario details


When the content filtering system detects harmful content, you receive either an error on the API call if the prompt was deemed inappropriate, or the finish_reason on the response will be content_filter to signify that some of the completion was filtered. When building your application or system, you'll want to account for these scenarios where the content returned by the Completions API is filtered, which might result in incomplete content. How you act on this information is application specific. The behavior can be summarized in the following points; a minimal handling sketch follows the list:

  • Prompts that are classified at a filtered category and severity level return an HTTP 400 error.
  • Non-streaming completions calls won't return any content when the content is filtered. The finish_reason value is set to content_filter. In rare cases with longer responses, a partial result can be returned. In these cases, the finish_reason is updated.
  • For streaming completions calls, segments are returned back to the user as they're completed. The service continues streaming until reaching a stop token or length, or until content that is classified at a filtered category and severity level is detected.
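
The sketch below shows one way an application might handle these outcomes. It uses the OpenAI Python 1.x client (the same setup shown later in this article); the deployment name and prompt are placeholders.

Python

import os
from openai import AzureOpenAI, BadRequestError

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-01",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

try:
    response = client.completions.create(
        model="gpt-35-turbo-instruct",  # model = "deployment_name"
        prompt="Example prompt",
    )
    for choice in response.choices:
        if choice.finish_reason == "content_filter":
            # Some or all of this completion was filtered.
            print("Completion was filtered; handle this case in your application.")
        else:
            print(choice.text)
except BadRequestError as e:
    # Prompts classified at a filtered category and severity level return HTTP 400.
    print("The prompt was filtered:", e)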
     

Scenario: You send a non-streaming completions call asking for multiple outputs; no content is classified at a filtered category and severity level

The following outlines the ways content filtering can surface:

HTTP response code: 200. In cases where all generations pass the filters as configured, no content moderation details are added to the response. The finish_reason for each generation will be either stop or length.


Example request payload:

JSON

{
   "prompt":"Text example", 
   "n": 3,
   "stream": false
}


Example response JSON:

JSON

{
   "id": "example-id",
   "object": "text_completion",
   "created": 1653666286,
   "model": "davinci",
   "choices": [
       {
           "text": "Response generated text",
           "index": 0,
           "finish_reason": "stop",
           "logprobs": null
       }
   ]
}

Scenario: Your API call asks for multiple responses (N>1) and at least one of the responses is filtered

 

HTTP response code: 200. The generations that were filtered will have a finish_reason value of content_filter.

JSON

{
   "prompt":"Text example",
   "n": 3,
   "stream": false
}


Example response JSON:

JSON

{
   "id": "example",
   "object": "text_completion",
   "created": 1653666831,
   "model": "ada",
   "choices": [
       {
           "text": "returned text 1",
           "index": 0,
           "finish_reason": "length",
           "logprobs": null
       },
       {
           "text": "returned text 2",
           "index": 1,
           "finish_reason": "content_filter",
           "logprobs": null
       }
   ]
}


Scenario: You send an inappropriate input prompt to the completions API (for either streaming or non-streaming)


 

HTTP response code: 400. The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again.


Example request payload:

JSON

{
   "prompt":"Content that triggered the filtering model"
}


Example response JSON:

JSON

"error": {
   "message": "The response was filtered",
   "type": null,
   "param": "prompt",
   "code": "content_filter",
   "status": 400
}


Scenario: You make a streaming completions call; no output content is classified at a filtered category and severity level


 

HTTP response code: 200. In this case, the call streams back the full generation, and finish_reason will be either 'length' or 'stop' for each generated response.


Example request payload:

JSON

{
   "prompt":"Text example",
   "n": 3,
   "stream": true
}


Example response JSON:

JSON

{
   "id": "cmpl-example",
   "object": "text_completion",
   "created": 1653670914,
   "model": "ada",
   "choices": [
       {
           "text": "last part of generation",
           "index": 2,
           "finish_reason": "stop",
           "logprobs": null
       }
   ]
}


Scenario: You make a streaming completions call asking for multiple completions, and at least a portion of the output content is filtered


 

HTTP response code: 200. For a given generation index, the last chunk of the generation includes a non-null finish_reason value, which is content_filter when the generation was filtered.


Example request payload:

JSON

{
   "prompt":"Text example",
   "n": 3,
   "stream": true
}


Example response JSON:

JSON

{
   "id": "cmpl-example",
   "object": "text_completion",
   "created": 1653670515,
   "model": "ada",
   "choices": [
       {
           "text": "Last part of generated text streamed back",
           "index": 2,
           "finish_reason": "content_filter",
           "logprobs": null
       }
   ]
}


Scenario: The content filtering system doesn't run on the completion


 

HTTP response code: 200. If the content filtering system is down or otherwise unable to complete the operation in time, your request still completes without content filtering. You can determine that filtering wasn't applied by looking for an error message in the content_filter_result object.


Example request payload:

JSON

{
   "prompt":"Text example",
   "n": 1,
   "stream": false
}


Example response JSON:

JSON

{
   "id": "cmpl-example",
   "object": "text_completion",
   "created": 1652294703,
   "model": "ada",
   "choices": [
       {
           "text": "generated text",
           "index": 0,
           "finish_reason": "length",
           "logprobs": null,
           "content_filter_result": {
               "error": {
                   "code": "content_filter_error",
                   "message": "The contents are not filtered"
               }
           }
       }
   ]
}


Annotations


Content filters


When annotations are enabled, as shown in the code snippets below, the following information is returned via the API for the categories hate and fairness, sexual, violence, and self-harm:

  • Content filtering category (hate, sexual, violence, self_harm)
  • The severity level (safe, low, medium, or high) within each content category
  • Filtering status (true or false)

A minimal sketch of reading these annotations follows.
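
The sketch assumes a completion object named response from the AzureOpenAI Python client shown later in this section; the field names follow the sample output below.

Python

import json

# 'response' is assumed to be a completion returned by the AzureOpenAI client.
payload = json.loads(response.model_dump_json())

for choice in payload["choices"]:
    results = choice.get("content_filter_results", {})
    for category in ("hate", "sexual", "violence", "self_harm"):
        annotation = results.get(category)
        if annotation:
            print(category, "severity:", annotation["severity"],
                  "filtered:", annotation["filtered"])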

Optional models


Optional models can be enabled in annotate mode (returns information when content is flagged, but not filtered) or in filter mode (returns information when content is flagged and filtered).

When annotations are enabled, as shown in the code snippets below, the following information is returned by the API for optional models:

  • Prompt Shield for jailbreak attacks: jailbreak detected (true or false), filtered (true or false)
  • Prompt Shield for indirect attacks: indirect attacks detected (true or false), filtered (true or false)
  • Protected material text: protected material text detected (true or false), filtered (true or false)
  • Protected material code: protected material code detected (true or false), filtered (true or false), an example citation of the public GitHub repository where the code snippet was found, and the license of the repository


When displaying code in your application, we strongly recommend that the application also displays the example citation from the annotations. Compliance with the cited license might also be required for Customer Copyright Commitment coverage.
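
For example, a minimal sketch of surfacing the citation (response is assumed to be an existing completion object; the field names follow the sample response later in this section):

Python

import json

payload = json.loads(response.model_dump_json())

for choice in payload["choices"]:
    code_result = choice.get("content_filter_results", {}).get("protected_material_code", {})
    if code_result.get("detected"):
        citation = code_result.get("citation", {})
        # Show the source repository and its license alongside the code.
        print("Matched public code:", citation.get("URL"))
        print("License:", citation.get("license"))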

Annotation availability in each API version:

Availability depends on the API version (2024-02-01 GA, 2024-04-01-preview, 2023-10-01-preview, 2023-06-01-preview). The annotated categories are:

  • Hate
  • Violence
  • Sexual
  • Self-harm
  • Prompt Shield for jailbreak attacks
  • Prompt Shield for indirect attacks
  • Protected material text
  • Protected material code
  • Profanity blocklist
  • Custom blocklist

 


OpenAI Python 1.x

# os.getenv() for the endpoint and key assumes that you are using environment variables.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-03-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

response = client.completions.create(
    model="gpt-35-turbo-instruct",  # model = "deployment_name"
    prompt="{Example prompt where a severity level of low is detected}"
    # Content that is detected at severity level medium or high is filtered,
    # while content detected at severity level low isn't filtered by the content filters.
)

print(response.model_dump_json(indent=2))


Output
JSON

{ 
 "choices": [ 
   { 
     "content_filter_results": { 
       "hate": { 
         "filtered": false, 
         "severity": "safe" 
       }, 
       "protected_material_code": { 
         "citation": { 
           "URL": " https://github.com/username/repository-name/path/to/file-example.txt", 
           "license": "EXAMPLE-LICENSE" 
         }, 
         "detected": true,
         "filtered": false 
       }, 
       "protected_material_text": { 
         "detected": false, 
         "filtered": false 
       }, 
       "self_harm": { 
         "filtered": false, 
         "severity": "safe" 
       }, 
       "sexual": { 
         "filtered": false, 
         "severity": "safe" 
       }, 
       "violence": { 
         "filtered": false, 
         "severity": "safe" 
       } 
     }, 
     "finish_reason": "stop", 
     "index": 0, 
     "message": { 
       "content": "Example model response will be returned ", 
       "role": "assistant" 
     } 
   } 
 ], 
 "created": 1699386280, 
 "id": "chatcmpl-8IMI4HzcmcK6I77vpOJCPt0Vcf8zJ", 
 "model": "gpt-35-turbo-instruct", 
 "object": "text.completion",
 "usage": { 
   "completion_tokens": 40, 
   "prompt_tokens": 11, 
   "total_tokens": 417 
 },  
 "prompt_filter_results": [ 
   { 
     "content_filter_results": { 
       "hate": { 
         "filtered": false, 
         "severity": "safe" 
       }, 
       "jailbreak": { 
         "detected": false, 
         "filtered": false 
       }, 
       "profanity": { 
         "detected": false, 
         "filtered": false 
       }, 
       "self_harm": { 
         "filtered": false, 
         "severity": "safe" 
       }, 
       "sexual": { 
         "filtered": false, 
         "severity": "safe" 
       }, 
       "violence": { 
         "filtered": false, 
         "severity": "safe" 
       } 
     }, 
     "prompt_index": 0 
   } 
 ]
}




For details on the inference REST API endpoints for Azure OpenAI and how to create chat and completions, follow the Azure OpenAI Service REST API reference guidance. Annotations are returned for all scenarios when using any preview API version starting from 2023-06-01-preview, as well as the GA API version 2024-02-01.

Example scenario: An input prompt containing content that is classified at a filtered category and severity level is sent to the completions API


JSON

{
   "error": {
       "message": "The response was filtered due to the prompt triggering Azure Content 
                  management policy. Please modify your prompt and retry. To learn more about 
                  our content filtering policies
                  please read our documentation: https://go.microsoft.com/fwlink/?linkid=21298766",
       "type": null,
       "param": "prompt",
       "code": "content_filter",
       "status": 400,
       "innererror": {
           "code": "ResponsibleAIPolicyViolation",
           "content_filter_result": {
               "hate": {
                   "filtered": true,
                   "severity": "high"
               },
               "self-harm": {
                   "filtered": true,
                   "severity": "high"
               },
               "sexual": {
                   "filtered": false,
                   "severity": "safe"
               },
               "violence": {
                   "filtered":true,
                   "severity": "medium"
               }
           }
       }
   }
}


Embedding documents in your prompt


A key aspect of Azure OpenAI's responsible AI measures is the content safety system. This system runs alongside the core GPT model to monitor any irregularities in the model input and output. Its performance is improved when it can differentiate between the various elements of your prompt, such as system input, user input, and the AI assistant's output.

For enhanced detection capabilities, prompts should be formatted according to the following recommended methods.

Chat Completions API


The Chat Completions API is structured by definition. It consists of a list of messages, each with an assigned role.

The safety system parses this structured format and applies the following behavior:

On the latest "user" content, the following categories of RAI risks will be detected:

  • Hate
  • Sexual
  • Violence
  • Self-harm
  • Jailbreak (optional)

This is an example messages array:

JSON

{"role": "system", "content": "Provide some context and/or instructions to the model."}, 
{"role": "user", "content": "Example question goes here."}, 
{"role": "assistant", "content": "Example answer goes here."}, 
{"role": "user", "content": "First question/message for the model to actually respond to."}


Embedding documents in your prompt


In addition to detection on the last user content, Azure OpenAI also supports the detection of specific risks inside context documents via Prompt Shields – Indirect Prompt Attack Detection. You should identify the parts of the input that are a document (for example, a retrieved website, an email, and so on) with the following document delimiter:


<documents> 
*insert your document content here* 
</documents>


When you do so, the following options are available for detection on tagged documents:

  • On each tagged "document" content, detect the following categories:
    • Indirect attacks (optional)

Here's an example chat completion messages array:

JSON

{"role": "system", "content": "Provide some context and/or instructions to the model, 
including document context. \"\"\" <documents>\n*insert your document content 
here*\n<\\documents> \"\"\""},
{"role": "user", "content": "First question/message for the model to actually respond to."}


JSON escaping


When you tag unvalidated documents for detection, the document content should be JSON-escaped to ensure successful parsing by the Azure OpenAI safety system.

For example, see the following email body:


Hello José,
I hope this email finds you well today.


With JSON escaping, it would read:


Hello Jos\u00E9,\nI hope this email finds you well today.


The escaped text in a chat completion context would read:

JSON

{"role": "system", "content": "Provide some context and/or instructions to the model, 
including document context. \"\"\" <documents>\n Hello Jos\\u00E9,\\nI hope this email 
finds you well today. \n<\\documents> \"\"\""},
{"role": "user", "content": "First question/message for the model to actually respond to."}


Content streaming

This section describes the Azure OpenAI content streaming experience and options. Customers can choose to receive content from the API as it's generated, instead of waiting for chunks of content that have been verified to pass the content filters.

Default


By default, the content filtering system is integrated and enabled for all customers. In the default streaming scenario, completion content is buffered, the content filtering system runs on the buffered content, and, depending on the content filtering configuration, content is either returned to the user if it doesn't violate the content filtering policy (Microsoft's default or a custom user configuration), or it's immediately blocked and a content filtering error is returned, without returning the harmful completion content. This process is repeated until the end of the stream. Content is fully vetted according to the content filtering policy before it's returned to the user. In this case, content isn't returned token by token, but in "content chunks" of the corresponding buffer size.
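
A minimal sketch of consuming the default stream, reusing the client from the Python example earlier in this article (the deployment name is a placeholder); chunks arrive only after the buffered content has been vetted.

Python

# Default streaming: content is returned in vetted "content chunks".
stream = client.chat.completions.create(
    model="gpt-35-turbo",  # model = "deployment_name"
    messages=[{"role": "user", "content": "What is color?"}],
    stream=True,
)

for chunk in stream:
    for choice in chunk.choices:
        if choice.finish_reason == "content_filter":
            print("\n[The remaining content was filtered]")
        elif choice.delta and choice.delta.content:
            print(choice.delta.content, end="")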

Asynchronous filter


Customers can choose the asynchronous filter as an additional option, providing a new streaming experience. In this case, content filters run asynchronously, and completion content is returned immediately with a smooth token-by-token streaming experience. No content is buffered, which allows for a fast streaming experience with zero latency associated with content safety.

Customers must be aware that while the feature improves latency, it's a trade-off against the safety and real-time vetting of smaller sections of model output. Because content filters run asynchronously, content moderation messages and policy violation signals are delayed, which means some sections of harmful content that would otherwise have been filtered immediately could be displayed to the user.

  • Annotations: Annotations and content moderation messages are continuously returned during the stream. We strongly recommend that you consume annotations in your app and implement additional AI content safety mechanisms, such as redacting content or returning additional safety information to the user.
  • Content filtering signal: The content filtering error signal is delayed. If there is a policy violation, it's returned as soon as it's available, and the stream is stopped. The content filtering signal is guaranteed within an approximately 1,000-character window of the policy-violating content.
  • Customer Copyright Commitment: Content that is retroactively flagged as protected material might not be eligible for Customer Copyright Commitment coverage.

To enable the asynchronous filter in Azure OpenAI Studio, follow the content filter how-to guide to create a new content filtering configuration, and select Asynchronous Filter in the Streaming section.

Comparison of content filtering modes


 

  • Status: Streaming - Default is GA; Streaming - Asynchronous Filter is in public preview.
  • Eligibility: Default is available to all customers; the Asynchronous Filter is available to customers approved for modified content filtering.
  • How to enable: Default is enabled by default, no action needed; customers approved for modified content filtering can configure the Asynchronous Filter directly in Azure OpenAI Studio (as part of a content filtering configuration, applied at the deployment level).
  • Modality and availability: text, all GPT models, for both modes.
  • Streaming experience: Default buffers content and returns it in chunks; the Asynchronous Filter has zero latency (no buffering; filters run asynchronously).
  • Content filtering signal: Default provides an immediate filtering signal; the Asynchronous Filter provides a delayed filtering signal (in up to ~1,000-character increments).
  • Content filtering configurations: both modes support the default and any customer-defined filter setting (including optional models).


Annotations and sample responses


Prompt annotation message


This is the same as the default annotations.

JSON

data: { 
   "id": "", 
   "object": "", 
   "created": 0, 
   "model": "", 
   "prompt_filter_results": [ 
       { 
           "prompt_index": 0, 
           "content_filter_results": { ... } 
       } 
   ], 
   "choices": [], 
   "usage": null 
}


Completion token message


Completion messages are forwarded immediately. No moderation is performed first, and no annotations are provided initially.

JSON

data: { 
   "id": "chatcmpl-7rAJvsS1QQCDuZYDDdQuMJVMV3x3N", 
   "object": "chat.completion.chunk", 
   "created": 1692905411, 
   "model": "gpt-35-turbo", 
   "choices": [ 
       { 
           "index": 0, 
           "finish_reason": null, 
           "delta": { 
               "content": "Color" 
           } 
       } 
   ], 
   "usage": null 
}


Annotation message


The text field will always be an empty string, indicating no new tokens. Annotations are only relevant to tokens that have already been sent. There might be multiple annotation messages referring to the same tokens.

"start_offset" and "end_offset" are low-granularity offsets in the text (with 0 at the beginning of the prompt) that mark which span of text the annotation is relevant to.

"check_offset" represents how much text has been fully moderated. It's an exclusive lower bound on the "end_offset" values of future annotations. It's non-decreasing.

JSON

data: { 
   "id": "", 
   "object": "", 
   "created": 0, 
   "model": "", 
   "choices": [ 
       { 
           "index": 0, 
           "finish_reason": null, 
           "content_filter_results": { ... }, 
           "content_filter_raw": [ ... ], 
           "content_filter_offsets": { 
               "check_offset": 44, 
               "start_offset": 44, 
               "end_offset": 198 
           } 
       } 
   ], 
   "usage": null 
}
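
A minimal sketch of applying these messages while consuming an asynchronous-filter stream; stream is assumed to be an iterable of the parsed "data:" payloads shown in this section, and the redaction policy is an illustrative choice, not part of the API.

Python

displayed_text = ""

for message in stream:  # each 'message' is one parsed "data:" payload (a dict)
    for choice in message.get("choices", []):
        if "content_filter_offsets" in choice:
            # Annotation message: refers to tokens that were already sent.
            offsets = choice["content_filter_offsets"]
            results = choice.get("content_filter_results", {})
            if any(r.get("filtered") for r in results.values()):
                # Illustrative policy: redact the flagged span in the UI.
                print(f"\n[Redact characters {offsets['start_offset']}-{offsets['end_offset']}]")
        else:
            # Completion token message: display immediately.
            delta = choice.get("delta", {}).get("content", "")
            displayed_text += delta
            print(delta, end="")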


Sample response stream (passes filters)


Below is a real chat completion response using the asynchronous filter. Note how the prompt annotations aren't changed, completion tokens are sent without annotations, and new annotation messages are sent without tokens; instead they're associated with certain content filter offsets.

{"temperature": 0, "frequency_penalty": 0, "presence_penalty": 1.0, "top_p": 1.0, "max_tokens": 800, "messages": [{"role": "user", "content": "What is color?"}], "stream": true}


data: {"id":"","object":"","created":0,"model":"","prompt_annotations":[{"prompt_index":0,
"content_filter_results":
{"hate":{"filtered":false,"severity":"safe"},
"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},
"violence":{"filtered":false,"severity":"safe"}}}],"choices":[],"usage":null}
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk",
"created":1692913344,
"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,
"delta":{"role":"assistant"}}],"usage":null}
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk",
"created":1692913344,
"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,
"delta":{"content":"Color"}}],"usage":null}
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY",
"object":"chat.completion.chunk",
"created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,
"delta":{"content":" is"}}],"usage":null}
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk",
"created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,
"finish_reason":null,"delta":{"content":" a"}}],"usage":null}
...
data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,"content_filter_results":
{"hate":{"filtered":false,"severity":"safe"},
"self_harm":{"filtered":false,"severity":"safe"},
"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":44,"start_offset":44,"end_offset":198}}],"usage":null}
...
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY",
"object":"chat.completion.chunk",
"created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,
"finish_reason":"stop","delta":{}}],"usage":null}
data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,
"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},
"self_harm":{"filtered":false,
"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},
"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":506,"start_offset":44,"end_offset":571}}],"usage":null}

data: [DONE]


Sample response stream (blocked by the filters)


{"temperature": 0, "frequency_penalty": 0, "presence_penalty": 1.0, "top_p": 1.0, "max_tokens": 800, "messages": [{"role": "user", "content": "Tell me the lyrics to \"Hey Jude\"."}], "stream": true}


data: {"id":"","object":"","created":0,"model":"","prompt_filter_results":[{"prompt_index":0,"content_filter_results":
{"hate":{"filtered":false,"severity":"safe"},
"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":
"safe"},"violence":{"filtered":false,"severity":"safe"}}}],"choices":[],"usage":null}
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2","object":"chat.completion.chunk",
"created":1699587397,
"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant"}}],
"usage":null}
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2","object":"chat.completion.chunk",
"created":1699587397,
"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,
"delta":{"content":"Hey"}}],"usage":null}
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2",
"object":"chat.completion.chunk",
"created":1699587397,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,
"delta":{"content":" Jude"}}],"usage":null}
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2","object":"chat.completion.chunk",
"created":1699587397,
"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,
"delta":{"content":","}}],"usage":null}
...
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2","object":"chat.completion.chunk",
"created":1699587397,"model":"gpt-35-
turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" better"}}],
"usage":null}
data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},
"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":65,"start_offset":65,"end_offset":1056}}],"usage":null}
data: {"id":"","object":"","created":0,"model":"",
"choices":[{"index":0,"finish_reason":"content_filter",
"content_filter_results":
{"protected_material_text":{"detected":true,"filtered":true}},
"content_filter_offsets":{"check_offset":65,"start_offset":65,
"end_offset":1056}}],"usage":null}
data: [DONE]

Important

When content filtering is triggered for a prompt and a "status": 400 is received as part of the response, there will be a charge for this request, because the prompt was evaluated by the service. Due to the asynchronous nature of the content filtering system, there will be a charge for both the prompt and the completion tokens. Charges also occur when a "status": 200 is received with "finish_reason": "content_filter". In this case, the prompt didn't have any issues, but the completion generated by the model was detected to violate the content filtering rules, which results in the completion being filtered.

Best practices

As part of your application design, consider the following best practices to deliver a positive experience with your application while minimizing potential harms:

  • Decide how you want to handle scenarios where your users send prompts containing content that is classified at a filtered category and severity level or otherwise misuse your application.
  • Check the finish_reason to see if a completion is filtered (a sketch combining these checks follows this list).
  • Check that there's no error object in the content_filter_result (indicating that content filters didn't run).
  • If you're using the protected material code model in annotate mode, display the citation URL when you're displaying the code in your application.
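
A minimal sketch combining the finish_reason and content_filter_result checks above (response is assumed to be a completion from the AzureOpenAI client shown earlier; the content_filter_result/content_filter_results field names follow the sample responses in this article):

Python

for choice in response.model_dump().get("choices", []):
    # Check whether the completion was filtered.
    if choice.get("finish_reason") == "content_filter":
        print("Completion was filtered; handle gracefully in the UI.")

    # Per-choice annotations; the "filters didn't run" case uses
    # the singular 'content_filter_result' key in the sample above.
    results = (choice.get("content_filter_results")
               or choice.get("content_filter_result") or {})

    # Check whether the content filtering system failed to run.
    if "error" in results:
        print("Content filters didn't run:", results["error"].get("message"))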

Next steps

 
