Organizations deploying video surveillance systems face a critical challenge: processing continuous video streams while maintaining accurate situational awareness. Traditional monitoring approaches that use rule-based detection or basic computer vision frequently miss important events or generate excessive false positives, leading to operational inefficiencies and alert fatigue.

In this post, we show how to build a fully deployable solution that processes video streams using OpenCV and Amazon Bedrock, with contextual scene understanding and automated responses through Amazon Bedrock Agents. This solution extends the capabilities demonstrated in the post on automating chatbots for document and data retrieval using Amazon Bedrock Agents and Knowledge Bases. In this post, we apply Amazon Bedrock Agents to real-time video analysis and event monitoring.

Benefits of using Amazon Bedrock Agents for video monitoring

The following figure shows example video stream inputs from different monitoring scenarios. With contextual scene understanding, users can search for specific events.

A front door camera will capture many events throughout the day, but some are more interesting than others—having context if a package is being delivered or removed (as in the following package example) limits alerts to urgent events.

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI companies through a single API. Using Amazon Bedrock, you can build secure, responsible generative AI applications. Amazon Bedrock Agents extends these capabilities by enabling applications to execute multi-step tasks across systems and data sources, making it ideal for complex monitoring scenarios. The solution processes video streams through these key steps:

  1. Extract frames when motion is detected from live video streams or local files.
  2. Analyze context using multimodal FMs.
  3. Make decisions using agent-based logic with configurable responses.
  4. Maintain searchable semantic memory of events.

You can build this intelligent video monitoring system using Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases in an automated solution. The complete code is available in the GitHub repo.

Limitations of current video monitoring systems

Organizations deploying video monitoring systems face a fundamental dilemma. Despite advances in camera technology and storage capabilities, the intelligence layer that interprets video feeds often remains rudimentary. This creates a challenging situation in which security teams must make significant trade-offs in their monitoring approach. Current video monitoring solutions typically force organizations to choose between:

  • Simple rules that scale but generate excessive false positives
  • Complex rules that require constant maintenance and customization
  • Manual monitoring that relies on human attention and doesn't scale
  • Point solutions that only address specific scenarios and lack flexibility

These trade-offs create fundamental barriers to effective video monitoring, affecting safety, security, and operational efficiency across industries. Based on our work with customers, we have identified three key challenges arising from these limitations:

Alert fatigue – Traditional motion detection and object recognition systems raise an alert for every detected change or recognized object. Security teams quickly become overwhelmed by the volume of notifications for normal activity. This reduces attention when truly critical events occur, lowers security effectiveness, and increases operational costs through constant human verification of false positives.

Limited contextual understanding – Rule-based systems fundamentally struggle with nuanced scene interpretation. Even sophisticated traditional systems have a limited understanding of the environments they monitor because they lack contextual awareness and can't readily do the following:

  • Distinguish between normal and suspicious behavior
  • Understand temporal patterns, such as events that recur weekly
  • Account for environmental context, such as time of day or location
  • Correlate multiple events that might indicate a pattern

Lack of semantic memory – Traditional systems lack the ability to build and use knowledge over time. They can't do the following:

  • Establish baselines of routine vs. anomalous events
  • Provide natural language search across historical data
  • Support reasoning about emerging patterns

Without these capabilities, you can't derive cumulative benefits from your monitoring infrastructure or perform sophisticated retrospective analysis. Addressing these challenges effectively requires a fundamentally different approach. By combining the contextual understanding capabilities of FMs with a structured framework for event classification and response, you can build a more intelligent monitoring system. Amazon Bedrock Agents provides an ideal platform for this next-generation approach.

Solution overview


You can address these monitoring challenges by building a video monitoring solution with Amazon Bedrock Agents. The system intelligently triages events, filtering out routine activity and escalating situations that require human attention, helping reduce alert fatigue while improving detection accuracy. The solution uses an Amazon Bedrock agent to analyze motion detected in video and alert the user when an event of interest occurs, based on the provided instructions. This allows the system to intelligently filter out trivial events that might trigger motion detection, such as wind or birds, and direct the user's attention only to events of interest. The following diagram illustrates the solution architecture.

The solution uses three primary components to address the core challenges: agents as escalators, a video processing pipeline, and Amazon Bedrock Agents. We discuss these components in more detail in the following sections.

The solution uses the AWS Cloud Development Kit (AWS CDK) to deploy the solution components. The AWS CDK is an open source software development framework for defining cloud infrastructure as code and provisioning it through AWS CloudFormation.

Agents as escalators

The first component uses an Amazon Bedrock agent to examine detected motion events. The agent has the following capabilities:

  • Provides natural language understanding of scenes and activities for contextual interpretation
  • Maintains temporal awareness across frame sequences to understand event progression
  • References historical patterns to distinguish anomalous events from routine ones
  • Reasons contextually about behavior, considering factors such as time of day, location, and sequence of actions

We implemented a tiered response framework that classifies events by severity and required action (a minimal dispatch sketch follows the list):

  • Level 0: Log only – The system records normal or expected activity. For example, when delivery personnel arrive during business hours or a recognized vehicle enters the driveway, these events are logged for pattern analysis and future reference, but require no immediate action. They remain searchable in the event history.
  • Level 1: Human notification – This level handles unusual but non-critical events that require human attention. An unrecognized vehicle parked nearby, unexpected visitors, or unusual movement patterns trigger notifications to security personnel. These events require human verification and assessment.
  • Level 2: Immediate response – This level is reserved for critical safety and security events. Unauthorized access attempts, detected smoke or fire, or suspicious behavior trigger automated response actions through API calls. The system notifies personnel with event information and context by SMS or email.
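The following is a minimal sketch of how this tiered dispatch could look in Python. The AlertLevel enum, handle_event helper, and topic ARN are illustrative assumptions, not names from the repository.

```
import json
from enum import IntEnum

import boto3

sns = boto3.client("sns")
ALERT_TOPIC_ARN = "arn:aws:sns:us-west-2:123456789012:monitoring-alerts"  # placeholder


class AlertLevel(IntEnum):
    LOG_ONLY = 0      # record for pattern analysis, no notification
    NOTIFY_HUMAN = 1  # unusual but non-critical, route to personnel
    IMMEDIATE = 2     # critical safety event, trigger automated response


def handle_event(event: dict) -> None:
    """Dispatch an analyzed event according to its alert level."""
    level = AlertLevel(event["alert_level"])
    if level >= AlertLevel.NOTIFY_HUMAN:
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject=f"[Level {int(level)}] {event['reason']}",
            Message=json.dumps(event),
        )
    # Level 0 events are only logged; they stay searchable in event history.
    print(f"Logged level {int(level)} event: {event['reason']}")
```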

The solution provides an interactive processing and monitoring interface through a Streamlit application. Using the Streamlit UI, users can provide instructions and interact with the agent.

The application consists of the following key features (a minimal UI sketch follows the list):

  • Live stream or video file input – The application accepts M3U8 stream URLs from webcams or security feeds, or local video files in common formats (MP4, AVI). Both are processed using the same motion detection pipeline that saves triggered events to Amazon Simple Storage Service (Amazon S3) for agent analysis.
  • Custom instructions – Users can provide specific monitoring guidance, such as “Alert me about unknown individuals near the loading dock after hours” or “Focus on vehicle activity in the parking area.” These instructions adjust how the agent interprets detected motion events.
  • Notification configuration – Users can specify contact information for different alert levels. The system uses Amazon Simple Notification Service (Amazon SNS) to send emails or text messages based on event severity, so different personnel can be notified for potential issues vs. critical situations.
  • Natural language queries about past events – The interface includes a chat component for historical event retrieval. Users can ask “What vehicles have been in the driveway this week?” or “Show me any suspicious activity from last night,” receiving responses based on the system’s event memory.
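Assuming standard Streamlit widgets, the inputs above could be wired together roughly as follows; the widget labels and the start_pipeline and ask_agent helpers are hypothetical stand-ins for the application's actual functions.

```
import streamlit as st


def start_pipeline(source, instructions, email):
    """Hypothetical hook that starts the motion detection pipeline."""
    st.success(f"Monitoring started for {source}")


def ask_agent(question: str) -> str:
    """Hypothetical hook that forwards a query to the Bedrock agent."""
    return f"(agent response to: {question})"


source = st.text_input("M3U8 stream URL")
uploaded = st.file_uploader("...or upload a video file", type=["mp4", "avi"])
instructions = st.text_area(
    "Monitoring instructions",
    "Alert me about unknown individuals near the loading dock after hours.",
)
email = st.text_input("Notification email")

if st.button("Start monitoring"):
    start_pipeline(source or uploaded, instructions, email)

# Chat component for natural language queries over past events
if question := st.chat_input("Ask about past events"):
    st.write(ask_agent(question))
```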

Video processing pipeline

The solution uses several AWS services to capture and prepare video data for analysis through a modular processing pipeline. The solution supports multiple types of video sources, including live streams (such as M3U8 URLs from webcams or security feeds) and local video files.

When using streams, OpenCV’s VideoCapture component handles the connection and frame extraction. For testing, we’ve included sample event videos demonstrating different scenarios. The core of the video processing is a modular pipeline implemented in Python. Key components include:

  • SimpleMotionDetection – Identifies movement in the video feed
  • FrameSampling – Captures sequences of frames over time when motion is detected
  • GridAggregator – Organizes multiple frames into a visual grid for context
  • S3Storage – Stores captured frame sequences in Amazon S3

This multi-process framework optimizes performance by running components concurrently and maintaining a queue of frames to process. The video processing pipeline organizes captured frame data in a structured way before passing it to the Amazon Bedrock agent for analysis:

  • Frame sequence storage – When motion is detected, the system captures a sequence of frames over 10 seconds. These frames are stored in Amazon S3 using a timestamp-based path structure (YYYYMMDD-HHMMSS) that allows for efficient retrieval by date and time. If motion lasts longer than 10 seconds, multiple events are created. (A storage and grid sketch follows this list.)
  • Image grid format – Rather than processing individual frames separately, the system arranges multiple sequential frames into a grid format (typically 3×4 or 4×5). This presentation provides temporal context and is sent to the Amazon Bedrock agent for analysis. The grid format enables understanding of how motion progresses over time, which is critical for accurate scene interpretation.
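The following sketch illustrates the timestamp-keyed storage layout and grid assembly described above; the key prefix, grid width, and helper names are illustrative assumptions.

```
from datetime import datetime, timezone

import numpy as np


def s3_key_for_event(prefix: str = "events") -> str:
    """Build a YYYYMMDD-HHMMSS key so events can be retrieved by date and time."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{prefix}/{stamp}/grid.jpg"


def build_grid(frames: list[np.ndarray], cols: int = 4) -> np.ndarray:
    """Tile sequential frames row by row into one image for temporal context.

    Assumes len(frames) is a multiple of cols and all frames share a shape.
    """
    rows = [np.hstack(frames[i:i + cols]) for i in range(0, len(frames), cols)]
    return np.vstack(rows)
```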

The following figure is an example of an image grid sent to the agent. Package theft is difficult to identify with classic image models. The large language model's (LLM's) ability to reason over a sequence of images allows it to make observations about intent.

The video processing pipeline’s output—timestamped frame grids stored in Amazon S3—serves as the input for the Amazon Bedrock agent components, which we discuss in the next section.

Amazon Bedrock agent components

The solution integrates multiple Amazon Bedrock services to create an intelligent analysis system:

  • Core agent architecture – The agent orchestrates these key workflows:
    • Receives frame grids from Amazon S3 on motion detection
    • Coordinates multi-step analysis processes
    • Makes classification decisions
    • Triggers appropriate response actions
    • Maintains event context and state
  • Knowledge management – The solution uses Amazon Bedrock Knowledge Bases with Amazon OpenSearch Serverless to:
    • Store and index historical events
    • Build baseline activity patterns
    • Enable natural language querying
    • Track temporal patterns
    • Support contextual analysis
  • Action groups – The agent has access to several actions defined through API schemas:
    • Analyze grid – Process incoming frame grids from Amazon S3
    • Alert – Send notifications through Amazon SNS based on severity
    • Log – Record event details for future reference
    • Search events by date – Retrieve past events based on a date range
    • Look up vehicle (Text-to-SQL) – Query the vehicle database for information

For structured data queries, the system uses the FM's ability to convert natural language to SQL. This enables the following (a query sketch follows the list):

  • Querying Amazon Athena tables containing event records
  • Retrieving information about registered vehicles
  • Generating reports from structured event data
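As a sketch of how a generated SQL statement can be executed against Athena with boto3, consider the following; the database name and output location are placeholders, not the values created by the stack.

```
import time

import boto3

athena = boto3.client("athena")


def run_generated_sql(sql: str) -> list:
    """Execute model-generated SQL on Athena and return the result rows."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "event_records"},  # placeholder
        ResultConfiguration={"OutputLocation": "s3://query-results-bucket/"},
    )["QueryExecutionId"]
    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
```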

These components work together to create a comprehensive system that can analyze video content, maintain event history, and support both real-time alerting and retrospective analysis through natural language interaction.

Video processing framework

The video processing framework implements a multi-process architecture for handling video streams through composable processing chains.

Modular pipeline architecture

The framework uses a composition-based approach built around the FrameProcessor abstract base class.

Processing components implement a consistent interface with a process(frame) method that takes a Frame and returns a potentially modified Frame:

```
from abc import ABC, abstractmethod
from typing import Optional


class FrameProcessor(ABC):
    @abstractmethod
    def process(self, frame: Frame) -> Optional[Frame]: ...
```

The Frame class encapsulates the image data along with timestamps, indexes, and extensible metadata:

```
from dataclasses import dataclass, field

from numpy import ndarray


@dataclass
class Frame:
    buffer: ndarray  # OpenCV image array
    timestamp: float
    index: float
    fps: float
    metadata: dict = field(default_factory=dict)
```
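For illustration, frames read from an OpenCV capture can be wrapped in this class before entering a processing chain; the stream URL below is a placeholder.

```
import time

import cv2

# Frame is the dataclass defined above.
cap = cv2.VideoCapture("https://example.com/stream.m3u8")  # placeholder URL
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if the stream omits FPS
index = 0
while cap.isOpened():
    ok, buffer = cap.read()
    if not ok:
        break
    frame = Frame(buffer=buffer, timestamp=time.time(), index=index, fps=fps)
    index += 1
    # frame is now ready to pass through a FrameProcessorChain
cap.release()
```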

Customizable processing chains

The architecture supports configuring multiple processing chains that can be connected in sequence. The solution uses two primary chains. The detection and analysis chain processes incoming video frames to identify events of interest:

```
chain = FrameProcessorChain([
    SimpleMotionDetection(motion_threshold=10_000, frame_skip_size=1),
    FrameSampling(timedelta(milliseconds=250), threshold_time=timedelta(seconds=2)),
    GridAggregator(shape=(13, 3))
])
```

The storage and notification chain handles the storage of identified events and invocation of the agent:

```
storage_chain = FrameProcessorChain([
    S3Storage(bucket_name=TARGET_S3_BUCKET, prefix=S3_PREFIX, s3_client_provider=s3_client_provider),
    LambdaProcessor(get_response=get_response, monitoring_instructions=config.monitoring_instructions)
])
```

You can modify these chains independently to add or replace components based on specific monitoring requirements.

Component implementation

The solution includes several processing components that demonstrate the framework's capabilities. You can modify each processing step or add new ones. For example, for motion detection we use a simple pixel difference, but you can refine the motion detection functionality as needed, or follow the same format to implement other detection algorithms, such as object detection or scene segmentation.
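As a minimal sketch of that pixel-difference approach, the following illustrative processor drops frames without significant change; the class name and threshold defaults are assumptions, not the repository's SimpleMotionDetection implementation.

```
from typing import Optional

import cv2
import numpy as np

# FrameProcessor and Frame are the classes defined earlier in this post.


class PixelDiffMotionDetection(FrameProcessor):
    def __init__(self, motion_threshold: int = 10_000):
        self.motion_threshold = motion_threshold
        self.previous = None

    def process(self, frame: Frame) -> Optional[Frame]:
        gray = cv2.cvtColor(frame.buffer, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (21, 21), 0)
        if self.previous is None:
            self.previous = gray
            return None  # nothing to compare against yet
        diff = cv2.absdiff(self.previous, gray)
        self.previous = gray
        changed = int(np.count_nonzero(diff > 25))  # pixels that changed
        if changed < self.motion_threshold:
            return None  # no significant motion; drop the frame
        frame.metadata["changed_pixels"] = changed
        return frame
```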

Additional components include the FrameSampling processor to control capture timing, the GridAggregator to create visual frame grids, and storage processors that save event data and trigger agent analysis. These can be customized or replaced as needed (a sample preprocessing step follows the list). For example:

  • Modify existing components – Adjust thresholds or parameters to tune for specific environments
  • Create alternative storage backends – Direct output to different storage services or databases
  • Implement preprocessing and postprocessing steps – Add image enhancement, data filtering, or additional context generation
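As one example of a preprocessing step in this style, a hypothetical processor could normalize low-light frames before detection; the class below is illustrative, not part of the repository.

```
from typing import Optional

import cv2

# FrameProcessor and Frame are the classes defined earlier in this post.


class HistogramEqualization(FrameProcessor):
    """Boost contrast so night-time footage triggers detection more reliably."""

    def process(self, frame: Frame) -> Optional[Frame]:
        ycrcb = cv2.cvtColor(frame.buffer, cv2.COLOR_BGR2YCrCb)
        ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])  # equalize luma only
        frame.buffer = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
        return frame
```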

Finally, the LambdaProcessor serves as the bridge to the Amazon Bedrock agent by invoking an AWS Lambda function that sends the information in a request to the deployed agent. From there, the Amazon Bedrock agent takes over and analyzes the event and takes action accordingly.

Agent implementation

After you deploy the solution, an Amazon Bedrock agent alias becomes available. This agent functions as an intelligent analysis layer, processing captured video events and executing appropriate actions based on its analysis. You can test the agent and view its reasoning trace directly on the Amazon Bedrock console, as shown in the following screenshot.

This agent will lack some of the metadata supplied by the Streamlit application (such as current time) and might not give the same answers as the full application.

Invocation flow

The agent is invoked through a Lambda function that handles the request-response cycle and manages session state. The function finds the highest published version ID, uses it to invoke the agent, and parses the response. A sketch of this flow follows.
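The following boto3 sketch approximates that flow; the alias-selection logic and IDs are illustrative assumptions rather than the repository's exact code.

```
import uuid

import boto3

agents = boto3.client("bedrock-agent")
runtime = boto3.client("bedrock-agent-runtime")


def invoke_agent(agent_id: str, prompt: str) -> str:
    # Pick the most recently updated alias (a stand-in for "highest version").
    aliases = agents.list_agent_aliases(agentId=agent_id)["agentAliasSummaries"]
    alias_id = sorted(aliases, key=lambda a: a["updatedAt"])[-1]["agentAliasId"]
    response = runtime.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=str(uuid.uuid4()),  # session state is keyed on this ID
        inputText=prompt,
    )
    # The completion arrives as an event stream; concatenate the text chunks.
    return "".join(
        event["chunk"]["bytes"].decode("utf-8")
        for event in response["completion"]
        if "chunk" in event
    )
```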

Action groups

The agent’s capabilities are defined through action groups implemented using the BedrockAgentResolver framework. This approach automatically generates the OpenAPI schema required by the agent.

When the agent is invoked, it receives an event object that includes the API path and other parameters, which the agent framework uses to route the request to the appropriate handler. You can add new actions by defining additional endpoint handlers following the same pattern and generating a new OpenAPI schema:

```
if __name__ == "__main__":
    print(app.get_openapi_json_schema())
```
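For instance, a new endpoint in this pattern might look like the following; the path and handler are hypothetical additions, not actions that ship with the solution.

```
from aws_lambda_powertools.event_handler import BedrockAgentResolver

app = BedrockAgentResolver()


@app.get("/camera_status", description="Returns the health of the camera feed")
def camera_status() -> dict:
    # A real handler would inspect request parameters and query system state.
    return {"camera": "front_door", "status": "online"}


def lambda_handler(event: dict, context) -> dict:
    # Route the incoming agent event to the matching endpoint handler.
    return app.resolve(event, context)


if __name__ == "__main__":
    # Regenerate the OpenAPI schema after adding endpoints.
    print(app.get_openapi_json_schema())
```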

Text-to-SQL integration

Through its action group, the agent is able to translate natural language queries into SQL for structured data analysis. The system reads data from assets/data_query_data_source, which can include various formats like CSV, JSON, ORC, or Parquet.

This capability enables users to query structured data using natural language. As demonstrated in the following example, the system translates natural language queries about vehicles into SQL, returning structured information from the database.

The database connection is configured through a SQLAlchemy engine. Users can connect to existing databases by updating the create_sql_engine() function to use their connection parameters.
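A sketch of what that change could look like follows; the connection strings are placeholders, and the Athena URL assumes the PyAthena SQLAlchemy dialect is installed.

```
from sqlalchemy import create_engine


def create_sql_engine():
    # Example: Athena via the PyAthena SQLAlchemy dialect (placeholder values)
    return create_engine(
        "awsathena+rest://@athena.us-west-2.amazonaws.com:443/"
        "event_records?s3_staging_dir=s3://query-results-bucket/"
    )
    # Or any other SQLAlchemy-supported database, for example:
    # return create_engine("postgresql://user:pass@host:5432/security_db")
```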

Event memory and semantic search

The agent maintains a detailed memory of past events, storing event logs with rich descriptions in Amazon S3. These events become searchable through both vector-based semantic search and date-based filtering. As shown in the following example, temporal queries make it possible to retrieve information about events within specific time periods, such as vehicles observed in the past 72 hours.

The system’s semantic memory capabilities enable queries based on abstract concepts and natural language descriptions. As shown in the following example, the agent can understand abstract concepts like “funny” and retrieve relevant events, such as a person dropping a birthday cake.
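Under the hood, this kind of semantic lookup can be expressed as a knowledge base retrieval; the following sketch uses the Amazon Bedrock Retrieve API with a placeholder knowledge base ID.

```
import boto3

runtime = boto3.client("bedrock-agent-runtime")

results = runtime.retrieve(
    knowledgeBaseId="KB12345678",  # placeholder ID
    retrievalQuery={"text": "funny incidents involving a dropped cake"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
for hit in results["retrievalResults"]:
    # Each result carries the stored event description and a relevance score.
    print(hit["score"], hit["content"]["text"])
```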

Events can be linked together by the agent to identify patterns or related incidents. For example, the system can correlate separate sightings of individuals with similar characteristics. In the following screenshots, the agent connects related incidents by identifying common attributes like clothing items across different events.

This event memory store allows the system to build knowledge over time, providing increasingly valuable insights as it accumulates data. The combination of structured database querying and semantic search across event descriptions creates an agent with a searchable memory of all past events.

Prerequisites

Before you deploy the solution, complete the following prerequisites:

  1. Configure AWS credentials using aws configure. Use either the us-west-2 or us-east-1 AWS Region.
  2. Enable access to Anthropic’s Claude 3.x models, or another supported Amazon Bedrock Agents model you want to use.
  3. Make sure you have the required dependencies installed; the deployment steps that follow use Git, Python 3 with pip, and the AWS CDK Toolkit (cdk).

Deploy the solution

The AWS CDK deployment creates the following resources:

  • Storage – S3 buckets for assets and query results
  • Amazon Bedrock resources – Agent and knowledge base
  • Compute – Lambda functions for actions, invocation, and updates
  • Database – Athena database for structured queries, and an AWS Glue crawler for data discovery

Deploy the solution with the following commands:

```
#1. Clone the repository and navigate to folder
git clone https://github.com/aws-samples/sample-video-monitoring-agent.git && cd sample-video-monitoring-agent
#2. Set up environment and install dependencies
python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
#3. Deploy AWS resources
cdk bootstrap && cdk deploy
#4. Run the streamlit app
cd code/streamlit_app && streamlit run app.py
```

On Windows, replace the second line with the following code:

```
python -m venv .venv && .venv\Scripts\activate.bat && pip install -r requirements.txt
```

Clean up

To destroy the resources you created and stop incurring charges, run the following command:

```
cdk destroy
```

Future enhancements

The current implementation demonstrates the potential of agent-based video monitoring in a home security setting, but there are many potential applications.

Sample use cases

The following examples showcase the application of the solution to various scenarios.

Small business

```
{
  "alert_level": 0,
  "timestamp": "2024-11-20T15:24:15Z",
  "reason": "Vehicle arrival in driveway",
  "description": "Standard vehicle arrival and parking sequence. Vehicles present: Black Nissan Frontier pickup (parked), silver Honda CR-V (arriving), and partial view of blue vehicle in foreground. Area features: Gravel driveway surface, two waste bins (County Waste and recycling), evergreen trees in background. Sequence shows Honda CR-V executing normal parking maneuver: approaches from east, performs standard three-point turn, achieves final position next to pickup truck. Daytime conditions, clear visibility. Vehicle condition: Clean, well-maintained CR-V appears to be 2012-2016 model year, no visible damage or unusual modifications. Movement pattern indicates familiar driver performing routine parking. No suspicious behavior or safety concerns observed. Timestamp indicates standard afternoon arrival time. Waste bins properly positioned and undisturbed during parking maneuver."
}
```

Industrial

```
{
  "alert_level": 2,
  "timestamp": "2024-11-20T15:24:15Z",
  "reason": "Warehouse product spill/safety hazard",
  "description": "Significant product spill incident in warehouse storage aisle. Location: Main warehouse aisle between high-bay racking systems containing boxed inventory. Sequence shows what appears to be liquid or container spill, likely water/beverage products based on blue colored containers visible. Infrastructure: Professional warehouse setup with multi-level blue metal racking, concrete flooring, overhead lighting. Incident progression: Initial frames show clean aisle, followed by product falling/tumbling, resulting in widespread dispersal of items across aisle floor. Hazard assessment: Creates immediate slip/trip hazard, blocks emergency egress path, potential damage to inventory. Area impact: Approximately 15-20 feet of aisle space affected. Facility type appears to be distribution center or storage warehouse. Multiple cardboard boxes visible on surrounding shelves potentially at risk from liquid damage."
}
```

Backyard

```
{
  "alert_level": 1,
  "timestamp": "2024-11-20T15:24:15Z",
  "reason": "Wildlife detected on property",
  "description": "Adult raccoon observed investigating porch/deck area with white railings. Night vision/IR camera provides clear footage of animal. Subject animal characteristics: medium-sized adult raccoon, distinctive facial markings clearly visible, healthy coat condition, normal movement patterns. Sequence shows animal approaching camera (15:42PM), investigating area near railing (15:43-15:44PM), with close facial examination (15:45PM). Final frame shows partial view as animal moves away. Environment: Location appears to be elevated deck/porch with white painted wooden railings and balusters. Lighting conditions: Nighttime, camera operating in infrared/night vision mode providing clear black and white footage. Animal behavior appears to be normal nocturnal exploration, no signs of aggression or disease."
}
```

Home safety

```
{
  "alert_level": 2,
  "timestamp": "2024-11-20T15:24:15Z",
  "reason": "Smoke/possible fire detected",
  "description": "Rapid development of white/grey smoke visible in living room area. Smoke appears to be originating from left side of frame, possibly near electronics/TV area. Room features: red/salmon colored walls, grey couch, illuminated aquarium, table lamps, framed artwork. Sequence shows progressive smoke accumulation over 4-second span (15:42PM – 15:46PM). Notable smoke density increase in upper left corner of frame with potential light diffusion indicating particulate matter in air. Smoke pattern suggests active fire development rather than residual smoke. Blue light from aquarium remains visible throughout sequence providing contrast reference for smoke density."
}
```

Further extensions

Additionally, you can extend the FM capabilities using the following approaches:

  • Fine-tuning for specific monitoring contexts – Adapt models to recognize domain-specific objects, behaviors, and scenes
  • Optimized prompts for specific use cases – Create specialized instructions that optimize agent performance in particular settings, such as industrial facilities, retail spaces, or residential environments

You can also extend the agent's ability to take action, for example:

  • Direct control of smart home and building systems – Integrate with Internet of Things (IoT) device APIs to control lights, locks, or alarm systems
  • Integration with safety and security protocols – Connect to existing security infrastructure to follow established procedures
  • Automated response workflows – Create multi-step action sequences that can be triggered by specific events

You can also consider enhancing the event memory system:

  • Long-term pattern recognition – Identify patterns that recur over extended periods
  • Cross-camera correlation – Link observations from multiple cameras to track movement through a space
  • Anomaly detection based on historical patterns – Automatically identify deviations from established baselines

Finally, consider extending the monitoring capabilities beyond fixed cameras:

  • Robot vision systems – Apply the same intelligence to mobile robots that patrol or inspect areas
  • Drone-based monitoring – Process aerial footage for comprehensive site monitoring
  • Mobile security applications – Extend the platform to process feeds from security personnel's body cameras or mobile devices

These enhancements could transform the system from a passive monitoring tool into an active participant in security operations, with an increasingly deep understanding of both normal patterns and anomalous events.

Conclusion

The agents-as-escalators approach represents a significant advancement in video monitoring, combining the contextual understanding capabilities of FMs with the action-oriented framework of Amazon Bedrock Agents. By filtering the signal from the noise, this solution addresses the critical problem of alert fatigue while enhancing safety and security monitoring capabilities. With this solution, you can:

  • Reduce false positives while maintaining high detection sensitivity
  • Provide human-readable event descriptions and classifications
  • Maintain searchable records of all activity
  • Scale monitoring capabilities without corresponding growth in human resources

The combination of intelligent triage, tiered response, and semantic memory makes monitoring systems more effective and efficient, augmenting human capabilities rather than replacing them. Try the solution today and experience how Amazon Bedrock Agents can transform your video monitoring from simple motion detection to intelligent scene understanding.
