
A Closer Look at Motion Prediction with dRISK Edge

17th Jun 2024 dRISK

By Hugh Blayney

 

Introduction

Here at dRISK, we want to help people understand the “brains” behind AI. In this blog post, we’ll use dRISK’s Edge software to examine an Autonomous Vehicle (AV) use case, and diagnose an underlying data problem.

We have a history of working on complex challenges within the field of AVs; one such area is motion prediction: given the historical motion of an entity and information about the scene around it, can you predict where it’ll go next? The setup for this problem is typically to look at the other entities in a scene (i.e. not your own car) and predict how they are about to move. That prediction is then fed into a planning module, which dictates the future movement of the AV.

A paper that caught our eye with an interesting approach to the problem was “Multimodal Trajectory Prediction Conditioned on Lane-Graph Traversals” (PGP). To rephrase the title somewhat: the method explores paths through a graph representation of lanes in order to help predict future trajectories. Even better, the authors provide a great open-source implementation of their model for us to play around with. We’ll use this as a test model and see what we can learn from it.
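To make the idea concrete, here’s a toy sketch of what a traversal through a directed lane graph could look like. This is our illustration, not PGP’s actual implementation: in PGP a learned policy, rather than a uniform random choice, decides which successor lane to take, and the sampled paths then condition the trajectory decoder.

```python
import random
import networkx as nx

# Hypothetical lane graph: nodes are lane segments, edges are legal transitions.
G = nx.DiGraph()
G.add_edges_from([
    ("approach", "straight_1"), ("approach", "right_turn"),
    ("straight_1", "straight_2"), ("straight_2", "exit_north"),
    ("right_turn", "exit_east"),
])

def sample_traversal(graph, start, max_steps=4):
    """Random walk over lane successors (a stand-in for PGP's learned policy)."""
    path = [start]
    for _ in range(max_steps):
        successors = list(graph.successors(path[-1]))
        if not successors:
            break
        path.append(random.choice(successors))
    return path

print(sample_traversal(G, "approach"))  # e.g. ['approach', 'right_turn', 'exit_east']
```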

 

How do you know what the predictor is doing?

A natural first question: how good is this model? There are some common metrics in the literature, and PGP performs very well on them, achieving SoTA on NuScenes when it was released. But what is the model bad at? Where does it fail? How can we make it better? We could get an intuition for the model’s performance by visualizing some outputs where it doesn’t perform so well. But it would be even better if we could see these outputs the same way the model “sees” them, organized according to the model’s internal “world view”. We’ll achieve this by looking at all of the scenarios at once, laid out according to the embedding the model assigns them.

We can visualize this in dRISK Edge, a web app that allows you to quickly find complex patterns in arbitrarily structured data. Visualizing these high-dimensional embeddings in two dimensions using t-SNE gives us the following view:

Each of these points represents a “test scenario”, and they are arranged according to the model’s representation of them just before the trajectory prediction begins – so we’re visualizing how the model has represented the first 2 seconds of each scenario.
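As a rough sketch of this reduction step (the file name below is our own stand-in for however you export the model’s per-scenario embeddings), scikit-learn’s t-SNE takes us from the high-dimensional embedding to the 2D layout:

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical export: one PGP embedding vector per test scenario, shape (n, d).
embeddings = np.load("pgp_scenario_embeddings.npy")

# PCA initialization matters later on: it anchors the global layout of the map.
xy = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(embeddings)

# xy has shape (n, 2): one point per scenario, ready to load into Edge
# alongside the scenario annotations.
```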

To actually look at each scenario in detail and build our intuition, we use the dRISK Scenario Gym, an extremely lightweight simulator for quickly exploring driving scenarios. For each test scenario we render a video like the one shown below: the ground truth trajectory of the vehicle in question is the red box, and the 10 predicted trajectories are shown as lighter red boxes, with each trajectory’s opacity corresponding to the probability the model assigns it.
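The videos themselves come from the Scenario Gym, but the overlay logic is easy to sketch statically. Here’s a minimal matplotlib version, assuming a (T, 2) ground-truth trajectory and (10, T, 2) predictions with per-mode probabilities (all synthetic here):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_modes(ground_truth, predictions, probs):
    """ground_truth: (T, 2); predictions: (k, T, 2); probs: (k,)."""
    fig, ax = plt.subplots()
    for traj, p in zip(predictions, probs):
        # Lighter red for less probable modes: opacity tracks model probability.
        ax.plot(traj[:, 0], traj[:, 1], color="red",
                alpha=0.15 + 0.85 * p / probs.max())
    ax.plot(ground_truth[:, 0], ground_truth[:, 1],
            color="darkred", linewidth=2, label="ground truth")
    ax.set_aspect("equal")
    ax.legend()
    return fig

# Synthetic demo data standing in for a real scenario export.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 30)
gt = np.stack([t * 20, np.zeros_like(t)], axis=1)
preds = gt + rng.normal(scale=1.0, size=(10, 30, 2)).cumsum(axis=1) * 0.3
probs = rng.dirichlet(np.ones(10))
plot_modes(gt, preds, probs)
plt.show()
```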

Armed now with the power of visualization – and also dRISK’s taxonomy of automatic scenario annotations connected using Edge – let’s first see if we can get an idea of what this embedding represents. If we color by the ego maximum speed annotation (red is greater), we see that it corresponds closely to the x-axis:

And if we highlight our binary metrics corresponding to left turns / right turns / going straight, we see that these closely match the y-axis:

This broad structure exists because the t-SNE embedding uses PCA initialization, so the two largest principal components of the PGP embedding correspond to speed and turning. In a sense, this means these concepts are the most important in the model’s embedding, which intuitively sounds about right. But t-SNE also does a good job of preserving local structure, clustering together scenarios that are nearby in the PGP embedding space. This is very useful if we want to spot patterns, or clusters, of failures that aren’t captured by the largest principal components.
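One quick way to sanity-check this reading, rather than eyeballing colors, is to correlate the first principal component of the embeddings with the speed annotation directly. The data below is synthetic, standing in for the real embedding and annotation exports:

```python
import numpy as np

# Synthetic stand-ins: embeddings whose dominant variance direction encodes speed.
rng = np.random.default_rng(0)
speed_signal = rng.normal(size=500)
embeddings = rng.normal(scale=0.3, size=(500, 32))
embeddings[:, 0] += 3.0 * speed_signal
max_speed = speed_signal + rng.normal(scale=0.2, size=500)  # noisy annotation

# First principal component via SVD of the centered embedding matrix.
centered = embeddings - embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ vt[0]

# A strong correlation (up to SVD sign ambiguity) confirms PC1 ~ speed.
print(f"corr(PC1, ego max speed) = {np.corrcoef(pc1, max_speed)[0, 1]:.2f}")
```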

Let’s now look at where PGP doesn’t perform so well. We’ll start by coloring the scenarios by the NuScenes minimum Average Displacement Error over the top 10 predictions (minADE 10): the average deviation from the ground truth trajectory of the best of the model’s 10 predictions. (Strictly, we’ll color by its log, to stop outliers squashing the interesting part of the color spectrum.) Our embedding now looks like this:

Here “more red” corresponds to greater minADE 10 and, at a high level, worse performance. Immediately, we see some clusters of poor performance. Let’s put the human in the loop (that’s us!) and start to develop some intuition using the dRISK Scenario Gym renders:
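For reference, minADE over the top k modes is straightforward to compute. A minimal sketch (the nuScenes devkit contains the canonical implementation):

```python
import numpy as np

def min_ade(predictions, ground_truth):
    """minADE_k. predictions: (k, T, 2); ground_truth: (T, 2); returns a scalar.

    For each mode, average the Euclidean distance to the ground truth over
    the prediction horizon, then keep the best (minimum) mode.
    """
    per_mode_ade = np.linalg.norm(predictions - ground_truth, axis=-1).mean(axis=-1)
    return per_mode_ade.min()

# Coloring by the log keeps a handful of extreme scenarios from
# compressing the rest of the color range:
# colors = np.log([min_ade(p, gt) for p, gt in scenario_outputs])
```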

There are some outliers here – scenarios where the underlying “ground truth” trajectory is unusual and the model understandably performs poorly. But the majority of poor performance is on what looks like a single intersection. We can view spatial/geographical data in dRISK Edge as well; we now explore the same embedding in the context of the scenarios’ real-world locations:

Looking at the earlier Scenario Gym renders, we can immediately see the pattern in these high-error scenarios: the underlying lane graph is incorrect; it appears to be missing a right-turn lane. Since the PGP architecture depends on traversals over the underlying lane graph, it makes sense that this error causes the predictions to stray significantly from the true trajectories. This particular data failure hits this model architecture particularly hard.

 

Conclusion

Here we used Edge to quickly investigate the performance of an AV motion prediction model, but we’ve seen Edge successfully used for a whole host of use cases involving complicated, heterogeneous and unstructured data. If you want to check it out, you can find a free demo version here. Give it a go, load in whatever data you’re interested in, and if you want to get in touch, we’d love to hear from you!