AI Vision Feast: Object Detection Delight

Main Dish (Idea):

This recipe creates an automated object detection system. It takes a folder of images as input, processes them using a pre-trained YOLO model served via a FastAPI endpoint, and then outputs the bounding boxes, labels, and confidence scores for each detected object, along with showing the image with the boudning boxes.

Ingredients (Concepts & Components):

Freshly captured frames: Images from a folder (frames directory). These are the visual input for our dish.
FastAPI Marinade: A FastAPI application to create an API endpoint, serving as the communication layer.
YOLO Reduction: A pre-trained YOLO model (yolo11s.pt) acts as the primary “flavoring” – the object detection algorithm.
Image Processing Glaze: PIL (Pillow) and NumPy are used to convert and prepare the images for the YOLO model.
OpenCv Garnish: Displays the image with bounding boxes.
Requests Digest: Used for sending HTTP requests to the API endpoint.
JSON Reduction: JSON for structuring and returning the processed detection data.

Cooking Process (How It Works):

Frame Harvesting:
- Gather all the images from the designated frames folder. This is our initial mise en place.
API Infusion (Backend – FastAPI):
- The FastAPI app starts, loading the yolo11s.pt model, which preps the object-detecting sauce.
- It defines a /predict POST endpoint, ready to receive image ingredients.
- Upon receiving an image:
  - The image (in bytes) is read and converted into a NumPy array.
  - The YOLO model then processes the image, identifying objects and providing bounding boxes, labels, and confidence scores.
  - The results are formatted into a JSON response with a list of "detected_objects".
Image Relay (Frontend – Image Sending):
- The get_image_files function scans the designated folder (frames).
- Loop through each image file found.
- For each image:
  - Image Transfer: The image is sent as a file to the /predict endpoint of the FastAPI server, starting the detection process.
  - Data Receiving and Decoding: Receives the JSON response from the API, containing the list of detected objects with their bounding boxes, labels, and confidence scores.
  - Bounding Box Display: Show the image with bounding boxes
  - Result Printing: Displaying the result of objects detected.

Serving Suggestion (Outcome):

This “AI Vision Feast” delivers a system that automatically processes images, identifies objects within them, and provides a structured JSON output detailing the object locations, labels, and confidence. It allows for automated understanding of image content, making it useful for a wide range of applications, from surveillance and robotics to image analysis and content moderation, all from a simple folder of images.

Leave a Reply Cancel reply