Adding color and texture to Machine Learning development using Edge

22nd Apr 2024 dRISK

By Federico Arenas Lopez


How the painting looks now: Introduction


As Machine Learning (ML) models become a core part of our daily workflows, the need for developers to have fine control over their behavior has never been greater. Perhaps the most evident example is Autonomous Vehicle (AV) development, where the stakes are high: the better we can train the behavior of self-driving vehicles, the safer people will be once those vehicles are deployed.


dRISK Edge is the only tool out there that lets ML developers manipulate their models end-to-end graphically and intuitively. Edge integrates easily with ML development workflows by letting users load their datasets, understand their distribution in multiple dimensions, and ultimately control their model’s performance with unprecedented granularity. In this short blog post we will illustrate this capability by training and evaluating a DEtection TRansformer (DETR) object detection architecture in an AV perception development workflow, using a subsample of the nuImages dataset. We will use Edge to look at DETR’s performance through a completely new lens, which will let us surface failure modes that could otherwise remain hidden.


For starters, Edge’s uniqueness lies in how intuitively you can load a complete dataset and understand its broad strokes while being able to dive into the finer details of its composition. In our object detection case, this means looking at the entire dataset, while being able to inspect the different images and annotation structure.


3000 samples from the nuImages dataset loaded into Edge (left) and 4 different frames and their annotations (right).


Adding more color and texture means better grasp of your data


While Edge’s graphical interface lets you appreciate all the detail of your data at once, this alone is not enough: to truly understand a dataset, developers need to add more color and texture so they can view it in a crisp, comprehensive way. In the AV perception case, the annotation data holds a large well of potential that can be used to look at an object detection dataset in completely new tones.


To achieve this, Edge lets developers easily add new dimensions that are latent in the dataset. For instance, using the bounding box information we can add features that let us see (1) how close the vehicles are in each frame, (2) how crowded they are, and (3) how many of them are in the frame. These three features, which we will call “closeness”, “crowdedness”, and “number of vehicles”, already account for a vast repository of phenomenology that lets us grasp the nuImages dataset with more understanding and control. We can examine this phenomenology in detail by looking at the feature distributions in Edge, as we do below.


New “closeness”, “crowdedness”, and “number of vehicles” features added to the entire dataset (left) and their different histogram distributions (right).
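To make the three features more concrete, here is a minimal Python sketch of how they could be derived from bounding boxes. The exact definitions used in Edge are not given in this post, so everything here is an assumption for illustration: boxes are taken to be (x_min, y_min, x_max, y_max) in pixels, the image size defaults to 1600x900, “closeness” is approximated by the largest box area as a fraction of the image, and “crowdedness” by the mean pairwise overlap (IoU) between boxes. The function names are hypothetical.

```python
from itertools import combinations


def box_area(box):
    """Area of an (x_min, y_min, x_max, y_max) box; 0 if the box is empty."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def iou(a, b):
    """Intersection over union of two boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def frame_features(boxes, image_width=1600, image_height=900):
    """Assumed proxy features for one frame, computed from vehicle boxes only."""
    image_area = image_width * image_height
    # Proxy: a vehicle that fills more of the image is assumed to be closer.
    closeness = max((box_area(b) for b in boxes), default=0.0) / image_area
    # Proxy: heavily overlapping boxes are assumed to indicate a crowded scene.
    pairs = list(combinations(boxes, 2))
    crowdedness = sum(iou(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "closeness": closeness,
        "crowdedness": crowdedness,
        "number_of_vehicles": len(boxes),
    }
```

Calling frame_features on the list of vehicle boxes of each frame yields one small feature dictionary per frame, which is the kind of per-sample column that can then be viewed as a histogram.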


Expressive paintings to understand, and communicate, your model’s performance


While looking at the three distributions is helpful, this is not very compact. Edge’s ability to make embeddings from multiple dimensions lets users look at all of their features in one single, compact representation. After training our DETR ResNet-50 model for 100 epochs on a randomly sampled split of our dataset, we can use both the t-SNE and PCA embeddings to look at the distribution of performance along the features we created. This workflow empowers ML developers with a concise and interpretable method for inspecting, and communicating, their model’s performance under diverse phenomena. 
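For readers who want a feel for what such an embedding does, the sketch below projects a per-frame feature table onto its first two principal components using NumPy. This is not Edge’s implementation, which is not described here; it is a plain PCA via singular value decomposition, with the function name and the standardization step chosen as assumptions. A t-SNE embedding could be obtained in a similar spirit with a library such as scikit-learn’s TSNE.

```python
import numpy as np


def pca_embedding(features, n_components=2):
    """Project rows of `features` onto their top principal components."""
    x = np.asarray(features, dtype=float)
    # Standardize each feature so that no single scale dominates
    # (an assumption; other scalings are possible).
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # leave constant features at zero instead of dividing by 0
    x = x / std
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T
```

Each row of the result is a 2D point for one frame, so frames with similar closeness, crowdedness, and number of vehicles tend to land near each other.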


Concretely, we can color our embeddings by the mAP (mean Average Precision) calculated over a randomly sampled validation set. By adding the performance measure as a third dimension, encoded in color, Edge enables us to surface regions where our network is underperforming. Some of these regions reveal failure modes such as “poor performance at detecting faraway vehicles” or “poor performance on frames with a high number of vehicles”.


Performance metrics loaded into the dataset (left), and side-by-side embeddings of the entire dataset and the validation split colored by performance. Red means poor performance and blue means good performance.
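The same idea can be approximated outside a graphical tool by grouping validation frames by a feature and averaging their score per group. The sketch below is only an illustration under assumptions: it presumes a per-frame score is available under a hypothetical "map" key, and the bin edges and example values are made up rather than taken from the experiment above.

```python
from bisect import bisect_right
from collections import defaultdict


def mean_score_by_bin(frames, feature, bin_edges, score_key="map"):
    """Average the per-frame score within each bin of `feature`.

    Bin 0 holds values below bin_edges[0]; bin i holds values in
    [bin_edges[i - 1], bin_edges[i]); the last bin holds the rest.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bin index -> [score sum, count]
    for frame in frames:
        idx = bisect_right(bin_edges, frame[feature])
        totals[idx][0] += frame[score_key]
        totals[idx][1] += 1
    return {idx: total / count for idx, (total, count) in sorted(totals.items())}


# Hypothetical frames: values are invented for illustration only.
frames = [
    {"closeness": 0.01, "map": 0.2},
    {"closeness": 0.03, "map": 0.4},
    {"closeness": 0.10, "map": 0.6},
    {"closeness": 0.30, "map": 0.9},
]
by_closeness = mean_score_by_bin(frames, "closeness", bin_edges=[0.05, 0.2])
```

A bin whose average score is clearly lower than the others points to a candidate failure mode, which can then be inspected frame by frame.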


By looking at the embedding in more detail, we can quickly locate frames where high-risk vehicles are not being detected, or frames where vehicles are not being detected simply because they are behind a fence. This level of granularity is key for developing safe AVs, and is an inherent characteristic of Edge.


Zoom in to specific failure modes found by inspecting the validation embedding. Red means poor performance and blue means good performance.


Users of Edge can benefit from a compact and transparent way of identifying and communicating the shortcomings of their models, in order to address them systematically, for example by drawing more data from the regions of poor performance or by tuning their models to increase overall detection performance.


Going further: Contrasting colors from different models


Another workflow Edge facilitates is comparing, and communicating, the performance of multiple ML models at once. Our platform prides itself on the ability to concisely express the shortcomings and strengths of one model compared to another, in the context of the features developers care about most. In practice, we trained DETR ResNet-101 and compared its performance to the above results from DETR ResNet-50. We can color our embeddings by the difference in mAP, as shown below.


(left) Performance delta distribution over the validation set and (right) t-SNE and PCA embeddings colored by the delta in performance between both models


Looking at the results under the new contrasting colors quickly shows us that the detection performance stayed the same or improved for around 80% of the samples in the dataset. Moreover, we can home in on the specific failure modes and see that most of them were mitigated by going from a ResNet-50 architecture to a ResNet-101 architecture!
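The comparison above boils down to a per-sample difference between two sets of scores. Here is a minimal sketch of that computation, assuming each model’s per-frame scores are stored in a dictionary keyed by a frame identifier; the function names, keys, and example values are hypothetical and are not the numbers from our experiment.

```python
def performance_delta(scores_a, scores_b):
    """Per-sample score difference (model B minus model A) on shared samples."""
    shared = scores_a.keys() & scores_b.keys()
    return {sample_id: scores_b[sample_id] - scores_a[sample_id] for sample_id in shared}


def fraction_same_or_improved(deltas, tolerance=0.0):
    """Share of samples whose score did not drop by more than `tolerance`."""
    if not deltas:
        return 0.0
    kept = sum(1 for delta in deltas.values() if delta >= -tolerance)
    return kept / len(deltas)


# Hypothetical per-frame scores for two models (illustration only).
resnet50_scores = {"frame_1": 0.5, "frame_2": 0.7, "frame_3": 0.2, "frame_4": 0.9}
resnet101_scores = {"frame_1": 0.6, "frame_2": 0.7, "frame_3": 0.5, "frame_4": 0.8}

deltas = performance_delta(resnet50_scores, resnet101_scores)
share = fraction_same_or_improved(deltas)
```

The per-sample deltas are what gets mapped to color in the embedding, while the single fraction summarizes how many samples stayed the same or improved.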


The painting after adding more color: Conclusion


In this blog post we covered how, in a very compact way, we can interface with Edge to train and evaluate DETR for an object detection application. We showed how Edge can enhance ML developers’ workflows by increasing their level of control over a network’s performance, in an interpretable manner.

By using Edge, ML developers, just like painters, can exploit the latent color that is in their data to ultimately understand and improve their model’s overall interpretability and performance. This is applicable to any retraining workflow, be it in Computer Vision or Natural Language Processing. If you’re an AI developer and you’d like to try out Edge, you can do so at