Locate Image Objects with Einstein Object Detection

Use the Einstein Object Detection API to train deep-learning models that recognize and count multiple distinct objects within an image. The API identifies objects within an image and provides details such as the size and location of each object.

For each object or set of objects identified in an image, the API returns the coordinates of the object’s bounding box and a class label. It also returns the probability that the object matches the class label. Example scenarios for the Object Detection API include locating logos in images and counting products on shelves.
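The bounding box is what gives you each object's size and location. As a minimal sketch, assuming the coordinates are pixel values and using the minX, minY, maxX, and maxY field names from the sample response in this section, the width, height, area, and center of a box follow directly from its corners. The BoundingBox class is hypothetical and not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Corners of a detected object's box, taken from a detection response.

    Assumption: values are pixel coordinates with min_x <= max_x and min_y <= max_y.
    """

    min_x: int
    min_y: int
    max_x: int
    max_y: int

    @classmethod
    def from_json(cls, box: dict) -> "BoundingBox":
        # Map the response's camelCase keys onto Python field names.
        return cls(box["minX"], box["minY"], box["maxX"], box["maxY"])

    @property
    def width(self) -> int:
        return self.max_x - self.min_x

    @property
    def height(self) -> int:
        return self.max_y - self.min_y

    @property
    def area(self) -> int:
        return self.width * self.height

    @property
    def center(self) -> tuple[float, float]:
        # Midpoint of the box on each axis.
        return ((self.min_x + self.max_x) / 2, (self.min_y + self.max_y) / 2)


# Box for "Bran Cereal" from the sample response in this section.
box = BoundingBox.from_json({"minX": 325, "minY": 300, "maxX": 483, "maxY": 402})
print(box.width, box.height, box.area, box.center)  # 158 102 16116 (404.0, 351.0)
```

Comparing areas or centers across boxes is one way to reason about relative size and position on a shelf image.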

Object Detection Flow

Einstein Object Detection is part of Einstein Vision, so the calls that you make are similar to the calls for image classification and multi-label models. However, an object detection model differs from a multi-label model. A multi-label model returns the probability that particular objects appear in an image. In contrast, an object detection model also identifies where each of those objects is located within the image.

Start with the Dataset

To create a detection model, you start with the dataset. When you create the dataset, specify image-detection as the type. This cURL call creates a dataset.
curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" -H "Content-Type: multipart/form-data" -F "path=http://<DATA_URL>/products.zip" -F "type=image-detection" https://api.einstein.ai/v2/vision/datasets/upload

To create the model, train the detection dataset using the same endpoint that you use for a standard classification model: https://api.einstein.ai/v2/vision/train.
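Because the upload and training calls are both multipart POST requests with the same headers, one helper can build either. The following is a minimal sketch using only Python's standard library; it builds the requests without sending them. The helper names and the model name "Shelf Products Model" are hypothetical, and the training request assumes the train endpoint accepts name and datasetId form fields, as the standard Einstein Vision training call does.

```python
import urllib.request
import uuid

API_BASE = "https://api.einstein.ai/v2/vision"


def encode_multipart(fields: dict[str, str]) -> tuple[bytes, str]:
    """Encode plain text fields as multipart/form-data, like curl's -F option."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    return "".join(parts).encode("utf-8"), f"multipart/form-data; boundary={boundary}"


def build_request(endpoint: str, token: str, fields: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) an authenticated multipart POST to an Einstein Vision endpoint."""
    body, content_type = encode_multipart(fields)
    return urllib.request.Request(
        f"{API_BASE}/{endpoint}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Cache-Control": "no-cache",
            "Content-Type": content_type,
        },
    )


# Step 1: upload the zip file as an image-detection dataset (same fields as the cURL call).
upload = build_request(
    "datasets/upload",
    "<TOKEN>",
    {"path": "http://<DATA_URL>/products.zip", "type": "image-detection"},
)

# Step 2: train a model from that dataset. Assumption: the train endpoint takes
# "name" and "datasetId" fields; <DATASET_ID> is the ID of the dataset you created.
train = build_request(
    "train",
    "<TOKEN>",
    {"name": "Shelf Products Model", "datasetId": "<DATASET_ID>"},
)
```

To send either request, replace the placeholders with real values and pass the request object to urllib.request.urlopen.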

Classify an Image

To classify an image with a detection model, send it to the /detect endpoint. The image you pass in can be a file on a local drive or referenced by a URL.
curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" -H "Content-Type: multipart/form-data" -F "modelId=<YOUR_MODEL_ID>" -F "sampleLocation=http://web.yoursite.com/<IMAGE_FILE>.png" https://api.einstein.ai/v2/vision/detect
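A URL and a local file travel in different form fields. This sketch builds the /detect request for either case with Python's standard library, without sending it. It assumes a local image goes in a file field named sampleContent (the cURL example above shows only sampleLocation for URLs), and the function name build_detect_request is hypothetical.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Optional

DETECT_URL = "https://api.einstein.ai/v2/vision/detect"


def build_detect_request(
    token: str,
    model_id: str,
    *,
    image_url: Optional[str] = None,
    image_path: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a /detect request for an image URL or a local file.

    Exactly one of image_url or image_path must be given. Assumption: local files
    go in a "sampleContent" file field, mirroring curl's -F "sampleContent=@file".
    """
    if (image_url is None) == (image_path is None):
        raise ValueError("pass exactly one of image_url or image_path")

    boundary = uuid.uuid4().hex
    body = bytearray()

    def add_part(name: str, value: bytes, filename: Optional[str] = None, mime: Optional[str] = None) -> None:
        # One multipart section: header lines, a blank line, then the raw value.
        disposition = f'form-data; name="{name}"'
        if filename is not None:
            disposition += f'; filename="{filename}"'
        body.extend(f"--{boundary}\r\nContent-Disposition: {disposition}\r\n".encode())
        if mime is not None:
            body.extend(f"Content-Type: {mime}\r\n".encode())
        body.extend(b"\r\n" + value + b"\r\n")

    add_part("modelId", model_id.encode())
    if image_url is not None:
        add_part("sampleLocation", image_url.encode())
    else:
        path = Path(image_path)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        add_part("sampleContent", path.read_bytes(), filename=path.name, mime=mime)
    body.extend(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        DETECT_URL,
        data=bytes(body),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Cache-Control": "no-cache",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Same request as the cURL call above, using a URL-referenced image.
req = build_detect_request(
    "<TOKEN>", "<YOUR_MODEL_ID>", image_url="http://web.yoursite.com/<IMAGE_FILE>.png"
)
```

With a real token and model ID, urllib.request.urlopen(req) sends the request, and json.load on the response parses a body like the one that follows.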

Suppose a company uses the API to detect products on store shelves. After it sends an image for detection, it receives a response like the following JSON. The labels you see depend on the labels in your model. For each detected object, the response returns a label, a probability, and the coordinates of a bounding box that specifies where in the image the object was detected.
{
        "probabilities": [
            {
                "label": "Bran Cereal",
                "probability": 0.9994634,
                "boundingBox": {
                    "minX": 325,
                    "minY": 300,
                    "maxX": 483,
                    "maxY": 402
                }
            },
            {
                "label": "Out of Stock",
                "probability": 0.99834275,
                "boundingBox": {
                    "minX": 536,
                    "minY": 322,
                    "maxX": 647,
                    "maxY": 385
                }
            },
            {
                "label": "Oat Cereal",
                "probability": 0.9977386,
                "boundingBox": {
                    "minX": 697,
                    "minY": 356,
                    "maxX": 789,
                    "maxY": 395
                }
            },
            {
                "label": "Protein Mix",
                "probability": 0.99745244,
                "boundingBox": {
                    "minX": 40,
                    "minY": 289,
                    "maxX": 150,
                    "maxY": 332
                }
            },
            {
                "label": "Corn Flakes",
                "probability": 0.9832312,
                "boundingBox": {
                    "minX": 390,
                    "minY": 101,
                    "maxX": 431,
                    "maxY": 134
                }
            },
            {
                "label": "Other",
                "probability": 0.94256777,
                "boundingBox": {
                    "minX": 368,
                    "minY": 350,
                    "maxX": 447,
                    "maxY": 408
                }
            }
        ]
}
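In the shelf scenario, a typical next step is to keep only confident detections, count them per label, and pull out the regions labeled Out of Stock. The sketch below does this with Python's standard library on a trimmed copy of the sample response. The 0.95 cutoff and the function name summarize are illustrative assumptions, and the labels are specific to this example model.

```python
import json
from collections import Counter

# Trimmed copy of the sample response above (three of the six detections).
response_text = """
{"probabilities": [
  {"label": "Bran Cereal", "probability": 0.9994634,
   "boundingBox": {"minX": 325, "minY": 300, "maxX": 483, "maxY": 402}},
  {"label": "Out of Stock", "probability": 0.99834275,
   "boundingBox": {"minX": 536, "minY": 322, "maxX": 647, "maxY": 385}},
  {"label": "Other", "probability": 0.94256777,
   "boundingBox": {"minX": 368, "minY": 350, "maxX": 447, "maxY": 408}}
]}
"""


def summarize(response: dict, min_probability: float = 0.95) -> tuple[Counter, list]:
    """Count confident detections per label and collect the Out of Stock boxes.

    min_probability is a hypothetical cutoff chosen for illustration;
    tune it for your own model.
    """
    confident = [d for d in response["probabilities"] if d["probability"] >= min_probability]
    counts = Counter(d["label"] for d in confident)
    gaps = [d["boundingBox"] for d in confident if d["label"] == "Out of Stock"]
    return counts, gaps


counts, gaps = summarize(json.loads(response_text))
print(dict(counts))  # {'Bran Cereal': 1, 'Out of Stock': 1}
print(gaps)          # [{'minX': 536, 'minY': 322, 'maxX': 647, 'maxY': 385}]
```

The Other detection (probability 0.94256777) falls below the 0.95 cutoff and is dropped; lowering the threshold keeps it.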