
V7 & Voxel51: Fine-Tune Your Datasets for Precise Model Training

We are excited to announce the launch of an integration between Voxel51 and V7! This technical collaboration empowers customers to drive maximum value from their data and maximize annotation efficiency—while decreasing data volumes.
December 14, 2023

Building scalable datasets necessary for high-quality AI products requires stellar data management pipelines.

V7 and Voxel51, two best-in-class AI data platforms, have joined forces to help their customers fine-tune how they prepare training data.

This integration enables joint customers to get the most value from training data and maximize annotation efficiency—while decreasing data volumes. Joint users can curate and improve datasets, easily annotate and reannotate data, and transfer it seamlessly between the platforms. All of this will help improve model accuracy, minimize annotation costs, and speed up time-to-market.

The V7 and Voxel51 integration is currently in beta.

Solving training data challenges for better AI development

Machine learning products are only as good as the data they’re trained on. Particularly in the noisy AI product space, cutting corners can cost companies a competitive advantage. In fact, according to the 2022 IBM Global AI Adoption Index, 24% of businesses cite “too much data complexity” as one of the top barriers to AI adoption.

One of the deceptively simple solutions to boost model accuracy is to increase the number of training samples. However, large datasets can be a double-edged sword. The time and resources needed for data collection and annotation, as well as the high infrastructural demands of storage and computing, can hinder any project—by inflating budgets and extending time-to-production.

That’s why it can be better to train models on smaller, carefully curated datasets. However, the data selection process is not without its challenges. Selecting data of the highest quality, ensuring equal class distribution, and monitoring existing datasets for mistakes are enormous strains on resources—unless you’re supported by appropriate software.

V7 and Voxel51: Toward best-in-class dataset management

To address these challenges, V7 and Voxel51 have combined their efforts to help customers build smart, scalable, and high-quality dataset management pipelines.

This integration makes it easy to explore, visualize, and understand datasets, as well as streamline annotation—to improve efficiency, build better-performing models, and maximize the ROI of your training data operations. Now, users can easily curate new datasets, calibrate, correct, and augment existing ones, and leverage active learning workflows. 

V7 is a powerful AI data engine enabling better AI products to reach the market faster. Used by enterprise customers worldwide, including Continental, Wanzl, and Boston Scientific, V7’s unique workflows enable faster and more accurate labeling. Features such as auto-annotation, model visualization, advanced video labeling, bespoke workflow design, intelligent QA, and elite labeling task forces converge to offer a scalable solution that prioritizes impactful AI development.

Voxel51 is the company behind FiftyOne, the leading open-source toolkit for building high-quality datasets and computer vision models. AI teams around the world rely on FiftyOne and FiftyOne Teams to visualize, curate, manage, and QA data, and automate the workflows that support enterprise machine learning.

Key benefits of the V7 and Voxel51 partnership

Together, these two platforms provide customers with cutting-edge solutions primed to deliver top-tier AI products.

Dataset curation for smarter annotation

FiftyOne by Voxel51 helps users identify the most relevant samples from datasets to send to V7 for annotation. It does this by providing a variety of tools and workflows to:

  • explore and balance your datasets by class and metadata distribution
  • visualize, de-duplicate, sample, and pre-label your data distributions using embeddings
  • perform automated pre-labeling with off-the-shelf or custom models

And more. These workflows enable the creation of diverse, representative data subsets while minimizing data volume—getting you the most out of your annotation budget and boosting model performance.
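
To make the embedding-based de-duplication idea concrete, here is a minimal, library-free sketch. It is not FiftyOne's implementation: the function names, the greedy strategy, and the 0.98 similarity threshold are assumptions chosen for illustration, and the embeddings are made-up toy vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(embeddings, threshold=0.98):
    """Greedily keep a sample only if it is not too similar to any kept sample.

    embeddings: dict mapping sample ID -> embedding vector (list of floats)
    threshold: hypothetical cutoff; samples at or above it count as duplicates
    Returns the kept sample IDs in input order.
    """
    kept = []
    for sample_id, vector in embeddings.items():
        if all(cosine_similarity(vector, embeddings[k]) < threshold for k in kept):
            kept.append(sample_id)
    return kept


# Toy embeddings (illustrative values only)
example_embeddings = {
    "img_001": [1.0, 0.0, 0.0],
    "img_002": [0.999, 0.01, 0.0],  # near-duplicate of img_001
    "img_003": [0.0, 1.0, 0.0],
}
selected = deduplicate(example_embeddings)
```

In this toy example only `img_001` and `img_003` would be sent for annotation. In practice the embeddings would come from an off-the-shelf or custom model, and the threshold would need tuning per dataset.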

Dataset improvements for model fine-tuning and evaluation 

This integration clears the way for efficiently optimizing and augmenting existing datasets to boost model performance even further.

FiftyOne enables powerful image and object-level annotation review and QA workflows. The platform’s embedding visualization, compatible with off-the-shelf and custom models, can be used to help analyze the quality of the annotations and the dataset—to weed out mistakes and find areas for improvement.

By adding model predictions to a dataset for comparison with ground truth, or using the built-in similarity search features and vector database integrations, you can identify the difficult samples with targeted precision and pinpoint inaccurate annotations. FiftyOne’s sample- and label-level tags, as well as saved views, make it easy to mark samples for reannotation back in V7.

Seamless data transfers

Data can be easily sent back and forth between Voxel51 and V7 via an API—to reduce the time and effort needed for transfers and ensure top data security. The integration will allow for seamless conversion of all data formats, retaining all existing annotations (including labels made in other tools).

Support for all data formats 

V7 supports all data in its native formats, be it images, videos, or medical imagery. With auto-annotation, specialized video labeling, and SAM integration, labeling teams can annotate data faster without sacrificing quality.

Notably, V7 supports DICOM, NIfTI, and WSI imagery, showcasing its commitment to delivering fit-for-purpose infrastructure for industries of all types.

How to use V7 and Voxel51 together 

The integration between Voxel51’s FiftyOne and V7 Darwin is provided by darwin_fiftyone.

It enables FiftyOne users to send subsets of their datasets to Darwin for annotation and review. The annotated data can then be imported back into FiftyOne.

Let’s go through a quick rundown of how to set it up.

Set up FiftyOne

To start, you need to install FiftyOne, an open-source tool for building high-quality datasets and computer vision models.

pip install fiftyone

V7 Darwin configuration

Connect FiftyOne with V7’s Darwin to start annotating files. Here’s how to integrate with the Darwin backend:

1. Install the backend

pip install darwin-fiftyone

2. Configure FiftyOne to use darwin-fiftyone

cat ~/.fiftyone/annotation_config.json
{
  "backends": {
    "darwin": {
      "config_cls": "darwin_fiftyone.DarwinBackendConfig",
      "api_key": "d8mLUXQ.**********************"
    }
  }
}
Note: Replace the api_key placeholder with a valid API key generated from Darwin.
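
If you would rather create this file from a script than edit it by hand, the sketch below shows one possible approach. It relies only on the file layout shown above; the function name `write_darwin_backend_config` is hypothetical and is not part of FiftyOne or darwin-fiftyone. Any other settings already in the file are preserved.

```python
import json
from pathlib import Path


def write_darwin_backend_config(config_path, api_key):
    """Add or update the "darwin" backend entry, keeping other settings intact.

    Hypothetical helper for illustration; returns the config that was written.
    """
    config_path = Path(config_path)
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())

    backends = config.setdefault("backends", {})
    backends["darwin"] = {
        "config_cls": "darwin_fiftyone.DarwinBackendConfig",
        "api_key": api_key,
    }

    # Create ~/.fiftyone (or whichever parent directory) if it is missing
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

For example, you could call it with `Path.home() / ".fiftyone" / "annotation_config.json"` and a key read from an environment variable, so the key never ends up in version control.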

Loading example data in FiftyOne

Let’s start by loading example data into FiftyOne.

import fiftyone.zoo as foz
import fiftyone as fo

dataset = foz.load_zoo_dataset("quickstart", dataset_name="quickstart-example")
session = fo.launch_app(dataset)

Annotation

Now let’s load this data into V7 for annotation and refinement.

To illustrate, let's upload all samples from this dataset into a Darwin dataset named "quickstart-example".

If the dataset doesn't already exist in Darwin, it will be created.

dataset.annotate(
    "annotation_job_key",
    label_field="ground_truth",
    attributes=["iscrowd"],
    launch_editor=True,
    backend="darwin",
    dataset_slug="quickstart-example",
    external_storage="example-darwin-storage-slug"
)

After the annotations and reviews are completed in Darwin, you can fetch the updated data as follows:

dataset.load_annotations("annotation_job_key")

API

In addition to the standard arguments provided by dataset.annotate(), we also support:

  • backend="darwin": Selects the Darwin annotation backend
  • dataset_slug: The name of the dataset to use or create in Darwin
  • external_storage: The slugified name of the Darwin external storage; all samples should exist in this external storage
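
If you launch annotation runs from several places, it can help to build these arguments in one spot. The helper below is a hypothetical convenience wrapper, not part of darwin-fiftyone; it also assumes that external_storage can be omitted when you are not using cloud-backed media.

```python
def darwin_annotate_kwargs(dataset_slug, external_storage=None, **standard_kwargs):
    """Collect keyword arguments for dataset.annotate() with the Darwin backend.

    Hypothetical helper: it only assembles a dict and makes no API calls.
    """
    if not dataset_slug:
        raise ValueError("dataset_slug must be a non-empty string")

    kwargs = dict(standard_kwargs)
    kwargs["backend"] = "darwin"
    kwargs["dataset_slug"] = dataset_slug
    # Assumption: external_storage is only needed for cloud-backed media
    if external_storage is not None:
        kwargs["external_storage"] = external_storage
    return kwargs


kwargs = darwin_annotate_kwargs(
    "quickstart-example",
    label_field="ground_truth",
    attributes=["iscrowd"],
    launch_editor=True,
)
# With FiftyOne and darwin-fiftyone installed, you would then call:
# dataset.annotate("annotation_job_key", **kwargs)
```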

Model training and next steps

The data annotated in V7 can now be further reviewed and refined before it’s used for model training.

FiftyOne has a variety of tools to enable easy integration with your model training pipelines. You can easily export your data in common formats like COCO or YOLO suitable for use with most training packages. FiftyOne also provides workflows for popular model training libraries such as PyTorch, PyTorch Lightning Flash, and TensorFlow.

What’s more, with FiftyOne’s new plugins architecture, custom training workflows can be directly integrated into the FiftyOne App and become available at the click of a button. The delegated execution feature lets you process the workflows on dedicated compute nodes.

Once a model is trained, you can easily run inference, load the model predictions back into FiftyOne, and evaluate them against the ground truth annotation.

import fiftyone.zoo as foz

# Load an existing dataset with predictions
dataset = foz.load_zoo_dataset("quickstart")

# Evaluate model predictions
dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)

FiftyOne’s evaluation and filtering capabilities make it easy to spot discrepancies between model predictions and ground truth, including annotation errors or difficult samples where your model underperforms.
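
To illustrate the idea behind this kind of comparison, the sketch below flags ground-truth boxes that no prediction overlaps sufficiently. It is a deliberately simplified, library-free illustration and not how evaluate_detections() is implemented: it ignores class labels and confidence scores, and the function names and the 0.5 IoU threshold are assumptions. Boxes use the [x, y, width, height] layout with relative coordinates.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def unmatched_ground_truth(gt_boxes, pred_boxes, iou_threshold=0.5):
    """Return indices of ground-truth boxes with no sufficiently overlapping prediction."""
    return [
        i
        for i, gt in enumerate(gt_boxes)
        if all(iou(gt, pred) < iou_threshold for pred in pred_boxes)
    ]


# Toy example: the second ground-truth box has no matching prediction
gt = [[0.1, 0.1, 0.2, 0.2], [0.6, 0.6, 0.2, 0.2]]
preds = [[0.1, 0.1, 0.2, 0.2]]
missed = unmatched_ground_truth(gt, preds)
```

A missed ground-truth box like this can mean either that the model struggles with the object or that the annotation itself is wrong, which is why such samples are worth a manual look.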

Tag annotation mistakes in FiftyOne for reannotation in V7, and add the difficult samples to the next iteration of your training set. Take a snapshot of your dataset and move on to the next round of improvements.

Loading cloud-backed media and working with dataset versioning

If you’re a FiftyOne Teams customer and work with cloud-backed files, you will be able to load items directly from your cloud storage.

The two steps you need to complete are:

  • Set up V7 external storage settings with your cloud provider
  • Add the slugified V7 storage name to your darwin-fiftyone config in the external_storage config field.

The name will be identical to the name field in the V7 external storage settings.

Once you follow these steps, you are ready to connect your cloud-backed media to FiftyOne Teams and V7 Darwin.

Note: The Darwin cloud-backed media registration process is handled as a part of the annotate method.

FiftyOne Teams also includes dataset versioning, which means every annotation and model you ran can now be captured and versioned in a history of dataset snapshots. Dataset snapshots in FiftyOne Teams can be created, browsed, linked to, and re-materialized with ease, without complex naming conventions or manual tracking of versions.

What’s next?

Voxel51's platform, FiftyOne, supercharges your machine learning workflows by enabling you to visualize datasets and interpret models faster and more effectively. With V7 added to the equation, you’ll streamline the process of annotating data and training more accurate models even further.

We are excited about the continuing collaboration between Voxel51 and V7, and we look forward to exploring new capabilities and adding even more value to the AI product development of V7 and Voxel51’s joint customers.

Do you have any feedback, comments, or questions regarding the integration? Let us know. Our mission is to help you move the needle in the AI space—so we want to offer you the best-in-class solutions. 

Reach out to partners@v7labs.com for more information—our Partnership team will be happy to help. 