Challenge: The objective of this project is to create a machine learning model trained on accurate WRF-PartMC data that predicts climate-relevant aerosol properties from only the features that current GCMs can output.
Problem Statement
To start solving this problem, we first normalized our input and output data, both to reduce floating-point error and to even out the large differences in magnitude among the input parameters.
Method 1: Normalizing Using Mean and STD
- If a value is zero, we replace it with the minimum non-zero value so that its logarithm is defined.
- When calculating the mean and standard deviation, we use log(value) to preserve floating-point precision.
- All variables except 'z' are converted to log space.
- For each variable, the global mean is subtracted and the result is divided by the standard deviation.
- cos(Time) is added as an additional feature.
- Used for the MLP and TabNet models.
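The steps of Method 1 can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code. The names `log_transform`, `normalize_global`, `cos_time_feature`, and the example column `num_conc` are hypothetical. Two assumptions go beyond the description above: the minimum non-zero value is taken per variable, and cos(Time) is computed over a 24-hour period with time in hours.

```python
import math
from statistics import fmean, pstdev


def log_transform(values):
    """Replace zeros with the smallest positive value, then take the natural log.

    Assumption: values are non-negative and the floor is chosen per variable.
    """
    floor = min(v for v in values if v > 0)
    return [math.log(v if v > 0 else floor) for v in values]


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)
    return [(v - mean) / std for v in values], mean, std


def normalize_global(columns, linear_columns=("z",)):
    """Normalize each variable with its global mean and standard deviation.

    Every variable is moved to log space first, except those listed in
    `linear_columns` (here 'z', as in the description above).
    """
    normalized, stats = {}, {}
    for name, values in columns.items():
        x = list(values) if name in linear_columns else log_transform(values)
        normalized[name], mean, std = standardize(x)
        stats[name] = (mean, std)  # keep these to invert the transform later
    return normalized, stats


def cos_time_feature(time_hours, period=24.0):
    """Assumed form of the cos(Time) feature: one cycle per `period` hours."""
    return [math.cos(2.0 * math.pi * t / period) for t in time_hours]
```

Keeping the per-variable `(mean, std)` pairs makes it possible to map model predictions back from normalized log space to physical units.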
Method 2: Normalizing at Each Height
- Variables are converted to log space, as in Method 1.
- The mean and standard deviation are calculated at each height instead of globally.
- This dataset was used for the final TabNet model.
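A rough sketch of the per-height variant, again using only the standard library. The function name `normalize_per_height` and the representation of height as a per-sample level label are assumptions made for illustration; the project's data layout may differ.

```python
import math
from collections import defaultdict
from statistics import fmean, pstdev


def normalize_per_height(values, heights):
    """Log-transform one variable, then standardize it separately per height.

    `values` and `heights` are parallel sequences: heights[i] is the level
    label of sample i. Assumption: values are non-negative, and zeros are
    replaced by the smallest positive value before taking the log.
    """
    floor = min(v for v in values if v > 0)
    logged = [math.log(v if v > 0 else floor) for v in values]

    # Collect the log values that belong to each height level.
    by_height = defaultdict(list)
    for x, h in zip(logged, heights):
        by_height[h].append(x)

    # One (mean, std) pair per height instead of a single global pair.
    stats = {h: (fmean(xs), pstdev(xs)) for h, xs in by_height.items()}

    normalized = [(x - stats[h][0]) / stats[h][1] for x, h in zip(logged, heights)]
    return normalized, stats
```

Compared with global statistics, this removes the mean vertical profile of each variable, so the model sees deviations from what is typical at a given height.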
Next, we visualized the correlation of the input variables with the output variables, first at a single timestamp and then over the full time range provided.
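The correlation step could look something like the sketch below. It assumes the Pearson coefficient, which the description above does not specify, and the helper names `pearson` and `correlation_by_timestamp` are hypothetical. It computes one coefficient per timestamp, which can then be plotted over time.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_by_timestamp(x, y, timestamps):
    """Correlate one input with one output separately at every timestamp."""
    groups = defaultdict(lambda: ([], []))
    for a, b, t in zip(x, y, timestamps):
        groups[t][0].append(a)
        groups[t][1].append(b)
    return {t: pearson(xs, ys) for t, (xs, ys) in sorted(groups.items())}
```

Calling `pearson` on the samples of a single timestamp gives the single-time view, and `correlation_by_timestamp` extends it across the time range.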
Read Our Full presentation HERE
Sunny Tang, Kedar Phadke, Chu-Chun Chen, Labdhi Jain