Grid Raven downscales wind to specific locations with the help of deep learning. In this post we’ll open the ‘black box’ to explore how the machine predicts the wind.
The wind downscaling model has several ‘branches’ of input: LIDAR data (highly detailed elevation maps, accurate to 1m), the Numerical Weather Prediction (NWP) for a grid of locations (the output of physics simulators, run many times a day on supercomputers), and some spatial and temporal metadata about the prediction. From this, the model is expected to predict, to the exact metre, the wind in a specific location - not a simple task by any means. To achieve this, you’d expect the model to analyse every metre of terrain with its trees, houses and roads, anything at all that could change the wind direction even slightly, all along power lines crossing entire countries. But what is it really learning from this data? Can its results be trusted, and what can we do to nudge the model in the right direction? To answer these questions, we can use various methods for interpreting the workings of these ‘black box’ models.
One method for exploring the workings of the model is perturbation analysis, where we ‘get rid’ of inputs and see how much this affects the model’s prediction of windspeed in both the East and North directions, across 20 different samples.
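To make the idea concrete, here is a minimal sketch of perturbation analysis in Python. It is not Grid Raven’s implementation: `toy_model`, the feature names, the sample values and the choice to ‘get rid’ of an input by replacing it with a zero baseline (assuming standardised inputs) are all assumptions made for illustration.

```python
from statistics import mean

def toy_model(features):
    """Hypothetical stand-in for the real model: returns (east, north) wind in m/s."""
    east = (0.8 * features["nwp_wind_east"]
            + 0.3 * features["surface_pressure"]
            - 0.5 * features["terrain_slope"])
    north = (0.8 * features["nwp_wind_north"]
             + 0.2 * features["surface_pressure"]
             + 0.4 * features["terrain_slope"])
    return east, north  # "latitude" is deliberately ignored by this toy model

def perturbation_impact(model, samples, baseline):
    """Mean absolute change in each output when one feature is set to its baseline."""
    impact = {}
    for name in baseline:
        east_changes, north_changes = [], []
        for sample in samples:
            east, north = model(sample)
            perturbed = dict(sample)          # copy so the original sample is untouched
            perturbed[name] = baseline[name]  # 'get rid' of this one input
            p_east, p_north = model(perturbed)
            east_changes.append(abs(p_east - east))
            north_changes.append(abs(p_north - north))
        impact[name] = (mean(east_changes), mean(north_changes))
    return impact

# Invented, standardised samples (the post uses 20 real ones).
samples = [
    {"nwp_wind_east": 1.0, "nwp_wind_north": -0.5, "surface_pressure": 0.4,
     "terrain_slope": 0.2, "latitude": 1.5},
    {"nwp_wind_east": -2.0, "nwp_wind_north": 1.0, "surface_pressure": -0.6,
     "terrain_slope": 0.1, "latitude": -0.3},
]
baseline = {name: 0.0 for name in samples[0]}  # assumption: zero means 'input removed'

impact = perturbation_impact(toy_model, samples, baseline)
# Rank features by their combined impact on the East and North outputs.
ranking = sorted(impact, key=lambda name: sum(impact[name]), reverse=True)
for name in ranking:
    east, north = impact[name]
    print(f"{name:18s} east={east:.3f} north={north:.3f}")
```

A feature the model never uses, like `latitude` in this toy, scores exactly zero, which is the kind of signal the analysis is meant to surface. The choice of baseline matters: zeros, dataset means or shuffled values can each give a different picture.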
As seen in the above results, the surface pressure and windspeed inputs from the NWP are both key to the model’s prediction, consistently changing it when they are missing. The model also treats the surface model and the slope of the terrain as the next most significant factors for windspeed, which is another good sign.
To gain greater insight into how each feature affects the result, we can look at the perturbation results for specific samples. Take the example below, from a Finnish weather station on top of a hill on a windy day.
For this example, the model predicted the wind direction to within just 3 degrees! Let’s consider the following decision plot for the model output predicting windspeed to the East:
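Since the model outputs windspeed as East and North components, a direction error like the one quoted above has to be derived from them. The sketch below shows one way to do that. It assumes the meteorological convention (the direction the wind blows from, in degrees clockwise from north), which the post does not state, and the component values are invented rather than taken from the Finnish station.

```python
import math

def wind_direction_deg(east, north):
    """Direction the wind blows FROM, in degrees clockwise from north (assumed convention)."""
    return math.degrees(math.atan2(-east, -north)) % 360.0

def angular_error_deg(predicted, observed):
    """Smallest absolute angle between two directions, between 0 and 180 degrees."""
    return abs((predicted - observed + 180.0) % 360.0 - 180.0)

def wind_speed(east, north):
    """Magnitude of the wind vector in the same units as its components."""
    return math.hypot(east, north)

# Invented components in m/s, for illustration only.
predicted_dir = wind_direction_deg(-1.0, -4.0)
observed_dir = wind_direction_deg(-1.2, -4.0)
error = angular_error_deg(predicted_dir, observed_dir)
print(f"predicted {predicted_dir:.1f} deg, observed {observed_dir:.1f} deg, error {error:.1f} deg")
```

Wrapping the difference into the 0 to 180 range matters near north: a prediction of 358 degrees against an observation of 1 degree is 3 degrees off, not 357.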
As usual, the model is using surface pressure and windspeed as the main weather features for the prediction, as well as both the surface model and terrain slope. But here, a strange ‘zig zag’ shape appears.
If we start from the bottom of the diagram and work our way up, we can see that removing the first 7 features had no or negligible effect on the model prediction. The next 4 had little effect (each changing the prediction by at most 0.05 m/s) and can be ignored. But from ‘Windspeed going north’ upwards, the impact of each feature on the model’s decision becomes increasingly significant. If we consider just the inputs that tell the model about the topography, ‘surface model’ and ‘slope of the terrain’, we can see that they have a positive impact on the prediction, moving it towards -0.25 m/s. The weather inputs, on the other hand, have a negative effect, pushing the prediction towards -0.55 m/s, where the decision finishes.
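The bottom-to-top reading described above amounts to a running sum: start from a base value, then add each feature’s contribution in order of increasing magnitude. The sketch below reproduces that arithmetic. The base value, feature names and contribution values are invented to mimic the ‘zig zag’ shape and are not read off the plot.

```python
def decision_path(base_value, contributions):
    """Running prediction as contributions are added from least to most influential."""
    ordered = sorted(contributions.items(), key=lambda item: abs(item[1]))
    path = [("base value", base_value)]
    running = base_value
    for name, delta in ordered:
        running += delta
        path.append((name, running))
    return path

def group_total(contributions, names):
    """Net contribution of a group of features, e.g. all topography inputs."""
    return sum(contributions[name] for name in names)

# Invented contributions in m/s to the East windspeed output.
contributions = {
    "latitude": 0.0,
    "windspeed_north": -0.05,
    "terrain_slope": 0.08,
    "surface_pressure": -0.10,
    "surface_model": 0.12,
    "windspeed_east": -0.20,
}
path = decision_path(-0.40, contributions)
for name, value in path:
    print(f"{name:18s} {value:+.2f}")

topography = group_total(contributions, ["surface_model", "terrain_slope"])
weather = group_total(contributions, ["windspeed_north", "surface_pressure", "windspeed_east"])
print(f"topography {topography:+.2f} m/s, weather {weather:+.2f} m/s")
```

When consecutive contributions alternate in sign, the running value swings back and forth, which is what produces a zig-zag line. Summing contributions by group makes the opposing pull of topography and weather explicit.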
This back-and-forth between different types of input is a good indication that the model is considering the features in a desirable way. It shows that the model takes into account the interactions between weather and topography, even when they pull in opposite directions, which should lead to more reliable and accurate predictions. However, this example also hints at some issues with the model: it gives little weight to the terrain model in a case where you’d expect the height and shape of the hill to be a significant factor, and it does not appear to consider latitude or longitude, which are generally key indicators of the type of weather to expect in a region.
While this is just a single example, repeated indications like the ones described above can hint at larger issues or confirm improvements in the model’s performance. The hypotheses drawn from them can help shape future model development targets and strategies. For instance, comparing what different models pay attention to across a wide range of samples was a key factor in the decision to focus on improving how the model processes LIDAR data, leading to the continued development of masked autoencoders as an alternative pre-training method.