Organizations deploying video surveillance systems face a critical challenge: processing continuous video streams while maintaining accurate situational awareness. Traditional monitoring approaches that use rule-based detection or basic computer vision frequently miss important events or generate excessive false positives, leading to operational inefficiencies and alert fatigue.

In this post, we show how to build a fully deployable solution that processes video streams using OpenCV and Amazon Bedrock, with contextual scene understanding and automated responses through Amazon Bedrock Agents. This solution extends the capabilities demonstrated in the post on automating chatbots for document and data retrieval using Amazon Bedrock Agents and Knowledge Bases. In this post, we apply Amazon Bedrock Agents to real-time video analysis and event monitoring.

Benefits of using Amazon Bedrock Agents for video monitoring

The following figure shows example video stream inputs from different monitoring scenarios. With contextual scene understanding, users can search for specific events.

A front door camera will capture many events throughout the day, but some are more interesting than others—having context if a package is being delivered or removed (as in the following package example) limits alerts to urgent events.

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI companies through a single API. Using Amazon Bedrock, you can build secure, responsible generative AI applications. Amazon Bedrock Agents extends these capabilities by enabling applications to execute multi-step tasks across systems and data sources, making it ideal for complex monitoring scenarios. The solution processes video streams through these key steps:

  1. Extract frames when motion is detected from live video streams or local files.
  2. Analyze context using multimodal FMs.
  3. Make decisions using agent-based logic with configurable responses.
  4. Maintain searchable semantic memory of events.

You can build this intelligent video monitoring system using Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases in an automated solution. The complete code is available in the GitHub repo.

Limitations of current video monitoring systems

Organizations deploying video monitoring systems face a fundamental dilemma. Despite advances in camera technology and storage capabilities, the intelligence layer that interprets video feeds often remains rudimentary. This creates a challenging situation in which security teams must make significant trade-offs in their monitoring approach. Current video monitoring solutions typically force organizations to choose between:

  • Simple rules that scale but generate excessive false positives
  • Complex rules that require constant maintenance and customization
  • Manual monitoring that relies on human attention and doesn't scale
  • Point solutions that only address specific scenarios and lack flexibility

These trade-offs create fundamental barriers to effective video monitoring, affecting safety, security, and operational efficiency across industries. Based on our work with customers, we have identified three key challenges arising from these limitations:

Alert fatigue – Traditional motion detection and object recognition systems raise an alert for every detected change or recognized object. Security teams quickly become overwhelmed by the volume of notifications for normal activity. This reduces attention when truly critical events occur, lowers security effectiveness, and increases operational costs through constant human verification of false positives.

Limited contextual understanding – Rule-based systems fundamentally struggle with nuanced scene interpretation. Even sophisticated traditional systems have a limited understanding of the environments they monitor because they lack contextual awareness and can't readily do the following:

  • Distinguish between normal and suspicious behavior
  • Understand temporal patterns, such as events that recur weekly
  • Account for environmental context, such as time of day or location
  • Correlate multiple events that might indicate a pattern

Lack of semantic memory – Traditional systems lack the ability to build and use knowledge over time. They can't do the following:

  • Establish baselines of routine vs. anomalous events
  • Provide natural language search across historical data
  • Support reasoning about emerging patterns

Without these capabilities, you can't derive cumulative benefits from your monitoring infrastructure or perform sophisticated retrospective analysis. Addressing these challenges effectively requires a fundamentally different approach. By combining the contextual understanding capabilities of FMs with a structured framework for event classification and response, you can build a more intelligent monitoring system. Amazon Bedrock Agents provides an ideal platform for this next-generation approach.

Solution overview


You can address these monitoring challenges by building a video monitoring solution with Amazon Bedrock Agents. The system intelligently triages events, filtering out routine activity and escalating situations that require human attention, helping reduce alert fatigue while improving detection accuracy. The solution uses an Amazon Bedrock agent to analyze motion detected in video and alert the user when an event of interest occurs, based on the provided instructions. This allows the system to intelligently filter out trivial events that might trigger motion detection, such as wind or birds, and direct the user's attention only to events of interest. The following diagram illustrates the solution architecture.

The solution uses three primary components to address the core challenges: agents as escalators, a video processing pipeline, and Amazon Bedrock Agents. We discuss these components in more detail in the following sections.

The solution uses the AWS Cloud Development Kit (AWS CDK) to deploy the solution components. The AWS CDK is an open source software development framework for defining cloud infrastructure as code and provisioning it through AWS CloudFormation.

Agents as escalators

The first component uses an Amazon Bedrock agent to examine detected motion events. The agent has the following capabilities:

  • Provides natural language understanding of scenes and activities for contextual interpretation
  • Maintains temporal awareness across frame sequences to understand event progression
  • References historical patterns to distinguish anomalous events from routine ones
  • Reasons contextually about behavior, considering factors such as time of day, location, and sequence of actions

We implemented a tiered response framework that classifies events by severity and required action (a minimal dispatch sketch follows the list):

  • Level 0: Log only – The system records normal or expected activity. For example, when delivery personnel arrive during business hours or a recognized vehicle enters the driveway, these events are logged for pattern analysis and future reference, but require no immediate action. They remain searchable in the event history.
  • Level 1: Human notification – This level handles unusual but non-critical events that require human attention. An unrecognized vehicle parked nearby, unexpected visitors, or unusual movement patterns trigger notifications to security personnel. These events require human verification and assessment.
  • Level 2: Immediate response – This level is reserved for critical safety and security events. Unauthorized access attempts, detected smoke or fire, or suspicious behavior trigger automated response actions through API calls. The system notifies personnel with event information and context by SMS or email.
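The following is a minimal sketch of how this tiered dispatch could look in Python. The AlertLevel enum, handle_event helper, and topic ARN are illustrative assumptions, not names from the repository.

```
import json
from enum import IntEnum

import boto3

sns = boto3.client("sns")
ALERT_TOPIC_ARN = "arn:aws:sns:us-west-2:123456789012:monitoring-alerts"  # placeholder


class AlertLevel(IntEnum):
    LOG_ONLY = 0      # record for pattern analysis, no notification
    NOTIFY_HUMAN = 1  # unusual but non-critical, route to personnel
    IMMEDIATE = 2     # critical safety event, trigger automated response


def handle_event(event: dict) -> None:
    """Dispatch an analyzed event according to its alert level."""
    level = AlertLevel(event["alert_level"])
    if level >= AlertLevel.NOTIFY_HUMAN:
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject=f"[Level {int(level)}] {event['reason']}",
            Message=json.dumps(event),
        )
    # Level 0 events are only logged; they stay searchable in event history.
    print(f"Logged level {int(level)} event: {event['reason']}")
```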

The solution provides an interactive processing and monitoring interface through a Streamlit application. Using the Streamlit UI, users can provide instructions and interact with the agent.

The application consists of the following key features (a minimal UI sketch follows the list):

  • Live stream or video file input – The application accepts M3U8 stream URLs from webcams or security feeds, or local video files in common formats (MP4, AVI). Both are processed using the same motion detection pipeline that saves triggered events to Amazon Simple Storage Service (Amazon S3) for agent analysis.
  • Custom instructions – Users can provide specific monitoring guidance, such as “Alert me about unknown individuals near the loading dock after hours” or “Focus on vehicle activity in the parking area.” These instructions adjust how the agent interprets detected motion events.
  • Notification configuration – Users can specify contact information for different alert levels. The system uses Amazon Simple Notification Service (Amazon SNS) to send emails or text messages based on event severity, so different personnel can be notified for potential issues vs. critical situations.
  • Natural language queries about past events – The interface includes a chat component for historical event retrieval. Users can ask “What vehicles have been in the driveway this week?” or “Show me any suspicious activity from last night,” receiving responses based on the system’s event memory.
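Assuming standard Streamlit widgets, the inputs above could be wired together roughly as follows; the widget labels and the start_pipeline and ask_agent helpers are hypothetical stand-ins for the application's actual functions.

```
import streamlit as st


def start_pipeline(source, instructions, email):
    """Hypothetical hook that starts the motion detection pipeline."""
    st.success(f"Monitoring started for {source}")


def ask_agent(question: str) -> str:
    """Hypothetical hook that forwards a query to the Bedrock agent."""
    return f"(agent response to: {question})"


source = st.text_input("M3U8 stream URL")
uploaded = st.file_uploader("...or upload a video file", type=["mp4", "avi"])
instructions = st.text_area(
    "Monitoring instructions",
    "Alert me about unknown individuals near the loading dock after hours.",
)
email = st.text_input("Notification email")

if st.button("Start monitoring"):
    start_pipeline(source or uploaded, instructions, email)

# Chat component for natural language queries over past events
if question := st.chat_input("Ask about past events"):
    st.write(ask_agent(question))
```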

Video processing pipeline

The solution uses several AWS services to capture and prepare video data for analysis through a modular processing pipeline. The solution supports multiple types of video sources, including live streams (such as M3U8 URLs from webcams or security feeds) and local video files.

When using streams, OpenCV’s VideoCapture component handles the connection and frame extraction. For testing, we’ve included sample event videos demonstrating different scenarios. The core of the video processing is a modular pipeline implemented in Python. Key components include:

  • SimpleMotionDetection – Identifies movement in the video feed
  • FrameSampling – Captures sequences of frames over time when motion is detected
  • GridAggregator – Organizes multiple frames into a visual grid for context
  • S3Storage – Stores captured frame sequences in Amazon S3

This multi-process framework optimizes performance by running components concurrently and maintaining a queue of frames to process. The video processing pipeline organizes captured frame data in a structured way before passing it to the Amazon Bedrock agent for analysis:

  • Frame sequence storage – When motion is detected, the system captures a sequence of frames over 10 seconds. These frames are stored in Amazon S3 using a timestamp-based path structure (YYYYMMDD-HHMMSS) that allows for efficient retrieval by date and time. If motion lasts longer than 10 seconds, multiple events are created. (A storage and grid sketch follows this list.)
  • Image grid format – Rather than processing individual frames separately, the system arranges multiple sequential frames into a grid format (typically 3×4 or 4×5). This presentation provides temporal context and is sent to the Amazon Bedrock agent for analysis. The grid format enables understanding of how motion progresses over time, which is critical for accurate scene interpretation.
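The following sketch illustrates the timestamp-keyed storage layout and grid assembly described above; the key prefix, grid width, and helper names are illustrative assumptions.

```
from datetime import datetime, timezone

import numpy as np


def s3_key_for_event(prefix: str = "events") -> str:
    """Build a YYYYMMDD-HHMMSS key so events can be retrieved by date and time."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{prefix}/{stamp}/grid.jpg"


def build_grid(frames: list[np.ndarray], cols: int = 4) -> np.ndarray:
    """Tile sequential frames row by row into one image for temporal context.

    Assumes len(frames) is a multiple of cols and all frames share a shape.
    """
    rows = [np.hstack(frames[i:i + cols]) for i in range(0, len(frames), cols)]
    return np.vstack(rows)
```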

The following figure is an example of an image grid sent to the agent. Package theft is difficult to identify with classic image models. The large language model's (LLM's) ability to reason over a sequence of images allows it to make observations about intent.

The video processing pipeline’s output—timestamped frame grids stored in Amazon S3—serves as the input for the Amazon Bedrock agent components, which we discuss in the next section.

Amazon Bedrock agent components

The solution integrates multiple Amazon Bedrock services to create an intelligent analysis system:

  • Core agent architecture – The agent orchestrates these key workflows:
    • Receives frame grids from Amazon S3 on motion detection
    • Coordinates multi-step analysis processes
    • Makes classification decisions
    • Triggers appropriate response actions
    • Maintains event context and state
  • Knowledge management – The solution uses Amazon Bedrock Knowledge Bases with Amazon OpenSearch Serverless to:
    • Store and index historical events
    • Build baseline activity patterns
    • Enable natural language querying
    • Track temporal patterns
    • Support contextual analysis
  • Action groups – The agent has access to several actions defined through API schemas:
    • Analyze grid – Process incoming frame grids from Amazon S3
    • Alert – Send notifications through Amazon SNS based on severity
    • Log – Record event details for future reference
    • Search events by date – Retrieve past events based on a date range
    • Look up vehicle (Text-to-SQL) – Query the vehicle database for information

For structured data queries, the system uses the FM's ability to convert natural language to SQL. This enables the following (a query sketch follows the list):

  • Querying Amazon Athena tables containing event records
  • Retrieving information about registered vehicles
  • Generating reports from structured event data
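As a sketch of how a generated SQL statement can be executed against Athena with boto3, consider the following; the database name and output location are placeholders, not the values created by the stack.

```
import time

import boto3

athena = boto3.client("athena")


def run_generated_sql(sql: str) -> list:
    """Execute model-generated SQL on Athena and return the result rows."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "event_records"},  # placeholder
        ResultConfiguration={"OutputLocation": "s3://query-results-bucket/"},
    )["QueryExecutionId"]
    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
```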

These components work together to create a comprehensive system that can analyze video content, maintain event history, and support both real-time alerting and retrospective analysis through natural language interaction.

Video processing framework

The video processing framework implements a multi-process architecture for handling video streams through composable processing chains.

Modular pipeline architecture

The framework uses a composition-based approach built around the FrameProcessor abstract base class.

Processing components implement a consistent interface with a process(frame) method that takes a Frame and returns a potentially modified Frame:

```
from abc import ABC, abstractmethod
from typing import Optional


class FrameProcessor(ABC):
    @abstractmethod
    def process(self, frame: Frame) -> Optional[Frame]: ...
```

The Frame class encapsulates the image data along with timestamps, indexes, and extensible metadata:

```
from dataclasses import dataclass, field

from numpy import ndarray


@dataclass
class Frame:
    buffer: ndarray  # OpenCV image array
    timestamp: float
    index: float
    fps: float
    metadata: dict = field(default_factory=dict)
```
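For illustration, frames read from an OpenCV capture can be wrapped in this class before entering a processing chain; the stream URL below is a placeholder.

```
import time

import cv2

# Frame is the dataclass defined above.
cap = cv2.VideoCapture("https://example.com/stream.m3u8")  # placeholder URL
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if the stream omits FPS
index = 0
while cap.isOpened():
    ok, buffer = cap.read()
    if not ok:
        break
    frame = Frame(buffer=buffer, timestamp=time.time(), index=index, fps=fps)
    index += 1
    # frame is now ready to pass through a FrameProcessorChain
cap.release()
```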

Customizable processing chains

The architecture supports configuring multiple processing chains that can be connected in sequence. The solution uses two primary chains. The detection and analysis chain processes incoming video frames to identify events of interest:

```
chain = FrameProcessorChain([
    SimpleMotionDetection(motion_threshold=10_000, frame_skip_size=1),
    FrameSampling(timedelta(milliseconds=250), threshold_time=timedelta(seconds=2)),
    GridAggregator(shape=(13, 3))
])
```

The storage and notification chain handles the storage of identified events and invocation of the agent:

```
storage_chain = FrameProcessorChain([
    S3Storage(bucket_name=TARGET_S3_BUCKET, prefix=S3_PREFIX, s3_client_provider=s3_client_provider),
    LambdaProcessor(get_response=get_response, monitoring_instructions=config.monitoring_instructions)
])
```

You can modify these chains independently to add or replace components based on specific monitoring requirements.

Component implementation

The solution includes several processing components that demonstrate the framework's capabilities. You can modify each processing step or add new ones. For example, for motion detection we use a simple pixel difference, but you can refine the motion detection functionality as needed, or follow the same format to implement other detection algorithms, such as object detection or scene segmentation.
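As a minimal sketch of that pixel-difference approach, the following illustrative processor drops frames without significant change; the class name and threshold defaults are assumptions, not the repository's SimpleMotionDetection implementation.

```
from typing import Optional

import cv2
import numpy as np

# FrameProcessor and Frame are the classes defined earlier in this post.


class PixelDiffMotionDetection(FrameProcessor):
    def __init__(self, motion_threshold: int = 10_000):
        self.motion_threshold = motion_threshold
        self.previous = None

    def process(self, frame: Frame) -> Optional[Frame]:
        gray = cv2.cvtColor(frame.buffer, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (21, 21), 0)
        if self.previous is None:
            self.previous = gray
            return None  # nothing to compare against yet
        diff = cv2.absdiff(self.previous, gray)
        self.previous = gray
        changed = int(np.count_nonzero(diff > 25))  # pixels that changed
        if changed < self.motion_threshold:
            return None  # no significant motion; drop the frame
        frame.metadata["changed_pixels"] = changed
        return frame
```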

Additional components include the FrameSampling processor to control capture timing, the GridAggregator to create visual frame grids, and storage processors that save event data and trigger agent analysis. These can be customized or replaced as needed (a sample preprocessing step follows the list). For example:

  • Modify existing components – Adjust thresholds or parameters to tune for specific environments
  • Create alternative storage backends – Direct output to different storage services or databases
  • Implement preprocessing and postprocessing steps – Add image enhancement, data filtering, or additional context generation
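As one example of a preprocessing step in this style, a hypothetical processor could normalize low-light frames before detection; the class below is illustrative, not part of the repository.

```
from typing import Optional

import cv2

# FrameProcessor and Frame are the classes defined earlier in this post.


class HistogramEqualization(FrameProcessor):
    """Boost contrast so night-time footage triggers detection more reliably."""

    def process(self, frame: Frame) -> Optional[Frame]:
        ycrcb = cv2.cvtColor(frame.buffer, cv2.COLOR_BGR2YCrCb)
        ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])  # equalize luma only
        frame.buffer = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
        return frame
```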

Finally, the LambdaProcessor serves as the bridge to the Amazon Bedrock agent by invoking an AWS Lambda function that sends the information in a request to the deployed agent. From there, the Amazon Bedrock agent takes over and analyzes the event and takes action accordingly.

Agent implementation

After you deploy the solution, an Amazon Bedrock agent alias becomes available. This agent functions as an intelligent analysis layer, processing captured video events and executing appropriate actions based on its analysis. You can test the agent and view its reasoning trace directly on the Amazon Bedrock console, as shown in the following screenshot.

This agent will lack some of the metadata supplied by the Streamlit application (such as current time) and might not give the same answers as the full application.

Invocation flow

The agent is invoked through a Lambda function that handles the request-response cycle and manages session state. The function finds the highest published version ID, uses it to invoke the agent, and parses the response. A sketch of this flow follows.
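The following boto3 sketch approximates that flow; the alias-selection logic and IDs are illustrative assumptions rather than the repository's exact code.

```
import uuid

import boto3

agents = boto3.client("bedrock-agent")
runtime = boto3.client("bedrock-agent-runtime")


def invoke_agent(agent_id: str, prompt: str) -> str:
    # Pick the most recently updated alias (a stand-in for "highest version").
    aliases = agents.list_agent_aliases(agentId=agent_id)["agentAliasSummaries"]
    alias_id = sorted(aliases, key=lambda a: a["updatedAt"])[-1]["agentAliasId"]
    response = runtime.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=str(uuid.uuid4()),  # session state is keyed on this ID
        inputText=prompt,
    )
    # The completion arrives as an event stream; concatenate the text chunks.
    return "".join(
        event["chunk"]["bytes"].decode("utf-8")
        for event in response["completion"]
        if "chunk" in event
    )
```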

Action groups

The agent’s capabilities are defined through action groups implemented using the BedrockAgentResolver framework. This approach automatically generates the OpenAPI schema required by the agent.

When the agent is invoked, it receives an event object that includes the API path and other parameters, which the agent framework uses to route the request to the appropriate handler. You can add new actions by defining additional endpoint handlers following the same pattern and generating a new OpenAPI schema:

```
if __name__ == "__main__":
    print(app.get_openapi_json_schema())
```
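For instance, a new endpoint in this pattern might look like the following; the path and handler are hypothetical additions, not actions that ship with the solution.

```
from aws_lambda_powertools.event_handler import BedrockAgentResolver

app = BedrockAgentResolver()


@app.get("/camera_status", description="Returns the health of the camera feed")
def camera_status() -> dict:
    # A real handler would inspect request parameters and query system state.
    return {"camera": "front_door", "status": "online"}


def lambda_handler(event: dict, context) -> dict:
    # Route the incoming agent event to the matching endpoint handler.
    return app.resolve(event, context)


if __name__ == "__main__":
    # Regenerate the OpenAPI schema after adding endpoints.
    print(app.get_openapi_json_schema())
```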

Text-to-SQL integration

Through its action group, the agent is able to translate natural language queries into SQL for structured data analysis. The system reads data from assets/data_query_data_source, which can include various formats like CSV, JSON, ORC, or Parquet.

This capability enables users to query structured data using natural language. As demonstrated in the following example, the system translates natural language queries about vehicles into SQL, returning structured information from the database.

The database connection is configured through a SQLAlchemy engine. Users can connect to existing databases by updating the create_sql_engine() function to use their connection parameters.
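A sketch of what that change could look like follows; the connection strings are placeholders, and the Athena URL assumes the PyAthena SQLAlchemy dialect is installed.

```
from sqlalchemy import create_engine


def create_sql_engine():
    # Example: Athena via the PyAthena SQLAlchemy dialect (placeholder values)
    return create_engine(
        "awsathena+rest://@athena.us-west-2.amazonaws.com:443/"
        "event_records?s3_staging_dir=s3://query-results-bucket/"
    )
    # Or any other SQLAlchemy-supported database, for example:
    # return create_engine("postgresql://user:pass@host:5432/security_db")
```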

Event memory and semantic search

The agent maintains a detailed memory of past events, storing event logs with rich descriptions in Amazon S3. These events become searchable through both vector-based semantic search and date-based filtering. As shown in the following example, temporal queries make it possible to retrieve information about events within specific time periods, such as vehicles observed in the past 72 hours.

The system’s semantic memory capabilities enable queries based on abstract concepts and natural language descriptions. As shown in the following example, the agent can understand abstract concepts like “funny” and retrieve relevant events, such as a person dropping a birthday cake.
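Under the hood, this kind of semantic lookup can be expressed as a knowledge base retrieval; the following sketch uses the Amazon Bedrock Retrieve API with a placeholder knowledge base ID.

```
import boto3

runtime = boto3.client("bedrock-agent-runtime")

results = runtime.retrieve(
    knowledgeBaseId="KB12345678",  # placeholder ID
    retrievalQuery={"text": "funny incidents involving a dropped cake"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
for hit in results["retrievalResults"]:
    # Each result carries the stored event description and a relevance score.
    print(hit["score"], hit["content"]["text"])
```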

Events can be linked together by the agent to identify patterns or related incidents. For example, the system can correlate separate sightings of individuals with similar characteristics. In the following screenshots, the agent connects related incidents by identifying common attributes like clothing items across different events.

This event memory store allows the system to build knowledge over time, providing increasingly valuable insights as it accumulates data. The combination of structured database querying and semantic search across event descriptions creates an agent with a searchable memory of all past events.

Prerequisites

Before you deploy the solution, complete the following prerequisites:

  1. Configure AWS credentials using aws configure. Use either the us-west-2 or us-east-1 AWS Region.
  2. Enable access to Anthropic’s Claude 3.x models, or another supported Amazon Bedrock Agents model you want to use.
  3. Make sure you have the required dependencies installed; the deployment steps that follow use Git, Python 3 with pip, and the AWS CDK Toolkit (cdk).

Deploy the solution

The AWS CDK deployment creates the following resources:

  • Storage – S3 buckets for assets and query results
  • Amazon Bedrock resources – Agent and knowledge base
  • Compute – Lambda functions for actions, invocation, and updates
  • Database – Athena database for structured queries, and an AWS Glue crawler for data discovery

Deploy the solution with the following commands:

```
#1. Clone the repository and navigate to folder
git clone https://github.com/aws-samples/sample-video-monitoring-agent.git && cd sample-video-monitoring-agent
#2. Set up environment and install dependencies
python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
#3. Deploy AWS resources
cdk bootstrap && cdk deploy
#4. Run the streamlit app
cd code/streamlit_app && streamlit run app.py
```

On Windows, replace the second line with the following code:

```
python -m venv .venv && .venv\Scripts\activate.bat && pip install -r requirements.txt
```

Clean up

To destroy the resources you created and stop incurring charges, run the following command:

```
cdk destroy
```

Future enhancements

The current implementation demonstrates the potential of agent-based video monitoring in a home security setting, but there are many potential applications.

Sample use cases

The following examples showcase the application of the solution to various scenarios.

Small business

```
{
  "alert_level": 0,
  "timestamp": "2024-11-20T15:24:15Z",
  "reason": "Vehicle arrival in driveway",
  "description": "Standard vehicle arrival and parking sequence. Vehicles present: Black Nissan Frontier pickup (parked), silver Honda CR-V (arriving), and partial view of blue vehicle in foreground. Area features: Gravel driveway surface, two waste bins (County Waste and recycling), evergreen trees in background. Sequence shows Honda CR-V executing normal parking maneuver: approaches from east, performs standard three-point turn, achieves final position next to pickup truck. Daytime conditions, clear visibility. Vehicle condition: Clean, well-maintained CR-V appears to be 2012-2016 model year, no visible damage or unusual modifications. Movement pattern indicates familiar driver performing routine parking. No suspicious behavior or safety concerns observed. Timestamp indicates standard afternoon arrival time. Waste bins properly positioned and undisturbed during parking maneuver."
}
```

Industrial

```
{
  "alert_level": 2,
  "timestamp": "2024-11-20T15:24:15Z",
  "reason": "Warehouse product spill/safety hazard",
  "description": "Significant product spill incident in warehouse storage aisle. Location: Main warehouse aisle between high-bay racking systems containing boxed inventory. Sequence shows what appears to be liquid or container spill, likely water/beverage products based on blue colored containers visible. Infrastructure: Professional warehouse setup with multi-level blue metal racking, concrete flooring, overhead lighting. Incident progression: Initial frames show clean aisle, followed by product falling/tumbling, resulting in widespread dispersal of items across aisle floor. Hazard assessment: Creates immediate slip/trip hazard, blocks emergency egress path, potential damage to inventory. Area impact: Approximately 15-20 feet of aisle space affected. Facility type appears to be distribution center or storage warehouse. Multiple cardboard boxes visible on surrounding shelves potentially at risk from liquid damage."
}
```

Backyard

```
{
  "alert_level": 1,
  "timestamp": "2024-11-20T15:24:15Z",
  "reason": "Wildlife detected on property",
  "description": "Adult raccoon observed investigating porch/deck area with white railings. Night vision/IR camera provides clear footage of animal. Subject animal characteristics: medium-sized adult raccoon, distinctive facial markings clearly visible, healthy coat condition, normal movement patterns. Sequence shows animal approaching camera (15:42PM), investigating area near railing (15:43-15:44PM), with close facial examination (15:45PM). Final frame shows partial view as animal moves away. Environment: Location appears to be elevated deck/porch with white painted wooden railings and balusters. Lighting conditions: Nighttime, camera operating in infrared/night vision mode providing clear black and white footage. Animal behavior appears to be normal nocturnal exploration, no signs of aggression or disease."
}
```

Home safety

```
{
  "alert_level": 2,
  "timestamp": "2024-11-20T15:24:15Z",
  "reason": "Smoke/possible fire detected",
  "description": "Rapid development of white/grey smoke visible in living room area. Smoke appears to be originating from left side of frame, possibly near electronics/TV area. Room features: red/salmon colored walls, grey couch, illuminated aquarium, table lamps, framed artwork. Sequence shows progressive smoke accumulation over 4-second span (15:42PM – 15:46PM). Notable smoke density increase in upper left corner of frame with potential light diffusion indicating particulate matter in air. Smoke pattern suggests active fire development rather than residual smoke. Blue light from aquarium remains visible throughout sequence providing contrast reference for smoke density."
}
```

Further extensions

Additionally, you can extend the FM capabilities using the following approaches:

  • Fine-tuning for specific monitoring contexts – Adapt models to recognize domain-specific objects, behaviors, and scenes
  • Optimized prompts for specific use cases – Create specialized instructions that optimize agent performance in particular settings, such as industrial facilities, retail spaces, or residential environments

You can also extend the agent's ability to take action, for example:

  • Direct control of smart home and building systems – Integrate with Internet of Things (IoT) device APIs to control lights, locks, or alarm systems
  • Integration with safety and security protocols – Connect to existing security infrastructure to follow established procedures
  • Automated response workflows – Create multi-step action sequences that can be triggered by specific events

You can also consider enhancing the event memory system:

  • Long-term pattern recognition – Identify patterns that recur over extended periods
  • Cross-camera correlation – Link observations from multiple cameras to track movement through a space
  • Anomaly detection based on historical patterns – Automatically identify deviations from established baselines

Finally, consider extending the monitoring capabilities beyond fixed cameras:

  • Robot vision systems – Apply the same intelligence to mobile robots that patrol or inspect areas
  • Drone-based monitoring – Process aerial footage for comprehensive site monitoring
  • Mobile security applications – Extend the platform to process feeds from security personnel's body cameras or mobile devices

These enhancements could transform the system from a passive monitoring tool into an active participant in security operations, with an increasingly deep understanding of both normal patterns and anomalous events.

Conclusion

The agents-as-escalators approach represents a significant advancement in video monitoring, combining the contextual understanding capabilities of FMs with the action-oriented framework of Amazon Bedrock Agents. By filtering the signal from the noise, this solution addresses the critical problem of alert fatigue while enhancing safety and security monitoring capabilities. With this solution, you can:

  • Reduce false positives while maintaining high detection sensitivity
  • Provide human-readable event descriptions and classifications
  • Maintain searchable records of all activity
  • Scale monitoring capabilities without corresponding growth in human resources

The combination of intelligent triage, tiered response, and semantic memory makes monitoring systems more effective and efficient, augmenting human capabilities rather than replacing them. Try the solution today and experience how Amazon Bedrock Agents can transform your video monitoring from simple motion detection to intelligent scene understanding.
