Convolutional Neural Networks as Efficient Emulators for Atmospheric Models

Author: Berlin Chen
Mentors: Hai Nguyen, Derek Posselt
Editor: Michael Yao


We used Convolutional Neural Networks (CNNs) to emulate the physics of the atmosphere, bypassing the explicit solution of partial-differential equations (PDEs) and thereby cutting computational cost. This matters because the models traditionally used to produce reliable weather forecasts require computationally expensive calibration.
We trained the CNNs on a 4-dimensional (longitude x latitude x height x time) geophysical dataset, with a separate CNN for each height index. In a series of experiments conducted after implementation, we found that zero-padding the data, varying the time scale, and changing the sample space had little effect on the CNNs’ performance, and that there was little correlation between training-data size and error. We also observed that, given the same training-data size, CNNs with a more complex configuration (more sets of weights) actually performed worse.

1 Background

1.1 Variability Quantification

Reliability of weather forecasts has a variety of important consequences. Reliable weather forecasts bring people the convenience of knowing when to bring an umbrella or when to stay at home rather than go to the supermarket. Our economy also has a stake in reliable weather forecasts, as natural disasters incur billions of dollars in damage each year [1]. ^1

Improving the reliability of weather forecasts, however, can be a challenge. In particular, initial conditions of the forecast model (such as referential sea-level pressure) can differ from reality. Because the underlying mechanisms governing the atmospheric behavior can be unstable, small changes in initial conditions can lead to wildly different predictions.

One way to characterize variability is to treat each parameter not as a single value, but rather as a probability distribution over values. In this context, prediction is the calibration of the probabilities of a parameter taking certain values in the future, given current measurements of some parameters. Bayes’ theorem allows us to quantify this information:

P(A | B) \; \propto \; P(A) \cdot P(B | A)

where P(A|B) denotes the probability of A having certain value given that B has certain value. In our model we let B be the parameter we would like to predict and A be the parameter corresponding to a control variable.

Repeated simulations evaluating the sensitivity of B in response to changes in A can be carried out by numerically solving the partial-differential equations (PDEs) governing the underlying physics; however, this method comes with a high computational cost ^2. As a potential solution, we speculated that techniques from machine learning would let us scale down the computer runtime without significantly compromising accuracy.

1.2 Machine Learning and Neural Networks

Machine learning essentially describes the process of approximating input-output relationships. Toddlers learn by observing and drawing connections (as opposed to memorizing formal rules). Analogously, we can design a procedure that teaches a computer to acquire information about a function, instead of devising hard-and-fast rules for arriving at a solution. In practice, this means that, instead of computing desired outputs from given inputs, one feeds the computer many input-output pairs and lets it learn the mapping. This technique is especially relevant when the function is too complex to solve analytically ^3.

The machine-learning technique that we used to realize this concept is called a \textit{neural network}. For the most part, a neural network can be represented as a computational graph. One can think of a computational graph as an assembly line consisting of a collection of workers, each doing a separate task ^4. Moreover, there is an established workflow among the workers, such that a worker may work on something produced by another worker. In our case, instead of manufacturing material goods, each worker in a computational graph does a simple task such as adding two numbers. Although each worker contributes only a simple task, cooperation among the workers can result in complex calculations, which is why neural networks are so expressive ^5. Figure 1 gives an example of a computational graph.

Figure 1
Figure 1: An example of a computational graph [4]. One can think of each bubble as a worker doing a simple task such as addition of two numbers (or, in the case of the bottom two bubbles, delivering a number). The arrows specify the workflow among these workers.
In neural networks, the nodes in the graph (the workers) are organized into non-mutually exclusive subsets, called \textit{layers}. Figure 2 illustrates an example layer. The nodes in a layer, in turn, can be classified as \textit{inputs}, \textit{weights}, and \textit{outputs}, with inputs being the parents of weights and weights being the parents of outputs (a node is a \textit{parent} of another node when the latter’s calculation depends on the former’s output). Whereas the weights are held constant during a run, each output is computed by taking the dot product of its parent weights with the inputs and then transforming the result with some function (called the \textit{activation function}), so that the output of one layer can be used as an input to another layer.

Figure 2
Figure 2: An example layer of a neural network (taken from [5] with modification). To better visualize such a layer (which is a computational graph), it is flipped sideways, and the bubbles are redrawn into boxes and circles to distinguish between input and output layers (weights are omitted).
Every neural network has an \textit{input layer} and an \textit{output layer}. The input layer contains all the nodes with no incoming edges, while the output layer contains all the nodes with no outgoing edges. A layer that is neither the input layer nor the output layer is called a \textit{hidden layer}. In this context, a run of the neural network means feeding data into the input layer and observing the values produced at the output layer.
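The layer computation described above (dot product of weights with inputs, followed by an activation function) can be sketched in a few lines of NumPy. The weights, bias, and sigmoid activation here are illustrative assumptions, not the configuration used in the project.

```python
import numpy as np

# Minimal sketch of one fully-connected layer with a sigmoid activation.
def layer_forward(inputs, weights, bias):
    # Dot product of the weights with the inputs, then the activation
    # function, so the output can feed the next layer.
    z = weights @ inputs + bias
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

x = np.array([0.5, -1.0])                 # input-layer values
W = np.array([[0.2, 0.8], [-0.5, 0.3]])   # one weight per input-output edge
b = np.array([0.0, 0.1])
hidden = layer_forward(x, W, b)           # hidden-layer outputs
```

Stacking several such calls, with each layer’s output fed to the next, gives the feedforward network described in the text.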

To implement a typical neural network model, we must first train the model on many input-target pairs by running the neural network with some or all of the inputs. We measure the neural network’s performance using some error measure (such as the mean squared error, or MSE, of the neural network’s outputs with respect to the corresponding targets). To minimize the error measure, we update the neural network’s weights. ^6
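The train-measure-update loop above can be sketched for the simplest possible "network", a single linear map, trained by plain gradient descent on the MSE. The toy data and learning rate are illustrative assumptions; the project's CNNs are far larger, but the loop has the same shape.

```python
import numpy as np

# Toy input-target pairs generated from a known linear relationship.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 input samples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                         # targets

w = np.zeros(3)                        # initial weights
lr = 0.1                               # learning rate (assumed)
for _ in range(200):
    pred = X @ w
    mse = np.mean((pred - y) ** 2)     # the error measure
    grad = 2 * X.T @ (pred - y) / len(y)
    w -= lr * grad                     # update weights to reduce the MSE
```

After the loop, `w` has moved close to `true_w`: the weights were updated repeatedly in the direction that shrinks the error measure.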

Generally, a model is only as good as the training data, and so the neural-network approach is domain specific and cannot be easily generalized from, say, one geographical region to another [4]. The model, however, is advantageous in speed. Using machine-learning techniques, we bypass the task of solving PDEs, which is traditionally where the bulk of the computation takes place. Consequently, the runtime of neural-network predictions can be orders of magnitude less than that of PDE solvers. Hence, if our speculation is sound, neural networks could address the need to speed up repeated simulations with differing initial values, which, as mentioned earlier, is a crucial contributor to the characterization of weather-forecast uncertainty.

1.3 Convolutional Neural Nets (CNN)

The distinctive feature of a CNN, the type of neural network we used for this project, is the inclusion of one or more \textit{convolutional layers}. To understand how the output of a convolutional layer is related to its inputs, one can imagine repeatedly applying a matrix of weights (with dimensions less than or equal to the input’s) across the input space, at each position taking a dot product and passing the result through an activation function.
It has been shown that a CNN can achieve remarkable accuracy when applied to the task of labeling features in spatially meaningful data (such as images and videos) [8]. Since the atmospheric behavior captured in the kinds of data we use is also spatially meaningful, a CNN is a promising candidate.
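The "sliding" application of a weight matrix described above can be written out directly. This is a minimal valid (no-padding) convolution with a ReLU activation; the 3x3 horizontal-gradient kernel is an illustrative assumption, not one of the project's learned filters.

```python
import numpy as np

# One convolutional filter slid across a 2-D input, with ReLU activation.
def conv2d_valid(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product of the weight matrix with one input patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)        # ReLU activation

img = np.arange(25, dtype=float).reshape(5, 5)
grad_kernel = np.array([[-1., 0., 1.]] * 3)   # horizontal-gradient filter
feature_map = conv2d_valid(img, grad_kernel)  # shape (3, 3)
```

Note that the output is smaller than the input: the filter cannot be centered on the edges, which is exactly the boundary loss that motivates the padding experiments in Section 2.2.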

1.4 Nature Run Dataset

For our experiments, we used the Nature Run dataset. It was generated by the model at 5-minute intervals from 00 UTC 22 Nov 2006 to 06 UTC 23 Nov 2006 (a total of 30 hours) ^7, and comprises 4-D grids of size 2701 x 1999 x 74 x 360 (longitude x latitude x height x time), corresponding to observations that are 1.3 km apart, 74 levels tall ^8, and span 30 hours. Each grid contains a certain geophysical measurement, such as water vapor. For our experiment, we took the grids containing water vapor, air temperature, and winds, and merged them along the third (z) dimension, giving 74 \cdot 4 = 296 vertical indices. For computational feasibility (and also to reduce correlation between data points), we kept every tenth step in the first two dimensions. Consequently, the dataset we used is of size 270 x 200 x 296 x 360.
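The vertical merging and horizontal subsampling described above amount to a concatenation along the z axis followed by strided slicing. The sketch below uses random stand-in arrays at scaled-down horizontal and temporal sizes, assuming (as the 74·4 count implies) that the winds contribute two component grids alongside water vapor and air temperature.

```python
import numpy as np

nx, ny, nz, nt = 50, 40, 74, 6            # scaled-down illustrative sizes
# Stand-ins for water vapor, air temperature, and two wind components.
fields = [np.random.rand(nx, ny, nz, nt) for _ in range(4)]

merged = np.concatenate(fields, axis=2)   # stack along the z dimension
subsampled = merged[::10, ::10]           # keep every 10th point in x and y
```

With the full-size grids (2701 x 1999 x 74 x 360 per field), the same two lines yield the 270 x 200 x 296 x 360 array used in the experiments.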

2 Experiments

2.1 Preliminary Results

Prior to designing and running the experiments, we trained the CNNs on the interval from the 0^{th} to the 1^{st} hour of Nature Run and visualized the prediction of the 2nd time step (5 minutes from the beginning of Nature Run) with the input being the 1st time step (the beginning of Nature Run). Figure 3 illustrates the heat maps produced by the CNN trained to predict data with z index equal to 11, which corresponds to a vertical layer of water-vapor data. The values of both the prediction and the data we hope to predict (which we will refer to as the “truth” or “target”) are standardized.

Figure 3
Figure 3: Prediction of a single time step with input being some training data. This shows the heatmaps produced at the 11^{th} vertical (z) index. The heatmap to the left is CNN’s prediction, and the heatmap to the right is the target. The heatmap in the middle is their difference. All values are standardized.

To measure loss, we divided the variance of the residual (Y-\hat{Y}) by the variance of the ground truth (we will call this measure the \textit{loss}). This measure allowed us both to quantify how well the model fits and to detect instances where the algorithm is merely guessing randomly (in which case the measure equals 1). The data illustrated in Figure 3 have a loss value of 0.0078. We judged that the low loss value and the visual similarity of the heatmaps mean the CNNs performed quite well in minimizing the training error. Note, however, that this conveys little information about the generalizability of the CNNs.
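The loss measure just defined is a one-liner, shown here on a toy vector to make its two reference points concrete: 0 for a perfect fit and (for a prediction uninformative about the truth, such as a constant guess at the mean) 1.

```python
import numpy as np

# Loss = variance of the residual divided by variance of the ground truth.
def loss(truth, prediction):
    return np.var(truth - prediction) / np.var(truth)

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = loss(y, y)                          # 0.0 for a perfect fit
guess = loss(y, np.full_like(y, y.mean()))    # 1.0 for a constant guess
```

Predicting the mean everywhere leaves the residual with exactly the truth's variance, which is why a value near 1 flags a model that has learned nothing.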

Additionally, we trained the CNNs on the interval from the 6th to the 30th hour of Nature Run and let them predict the same 24-hour interval they were trained on (so the input corresponding to the zeroth time step is the ground truth, while all subsequent inputs are predictions made by the CNNs), in order to gauge how well the CNNs generalize to inputs generated by themselves. Figure 5 illustrates snapshots taken at various time steps, along with a plot of the loss as a function of the number of time steps predicted ^9. We observed that the CNN started with fairly accurate predictions (somewhat expected, since little generalization is needed to accurately predict the first few time steps); however, after around the 24^{th} prediction, the quality of prediction diminished, as evident from the blurring of the heatmaps.
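The repeated prediction scheme above is an autoregressive rollout: after the first step, the network's own output becomes its next input. The sketch below uses a fixed linear damping map as a stand-in for the trained CNN, which is an assumption made purely so the loop is runnable.

```python
import numpy as np

# Stand-in for the trained CNN: maps one time step to the next.
def model(state):
    return 0.9 * state                 # placeholder dynamics, not the real CNN

state = np.ones((4, 4))                # ground-truth initial time step
trajectory = [state]
for _ in range(24):                    # predict 24 further time steps
    state = model(state)               # feed the prediction back in as input
    trajectory.append(state)
```

Because each step consumes the previous step's output, any per-step error is compounded over the rollout, which is consistent with the degradation observed after many predictions.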

2.2 Variable Testing

Having these preliminary results, we ran a series of five experiments in an effort to understand how various parameters affect the behavior of the CNN.

First, we tested whether using zero padding significantly affects the CNN’s prediction accuracy. \textit{Zero padding} is a technique that keeps the input dimension consistent by expanding the dimensions of the input data and filling the empty entries with zeros. Zero padding is relevant in our pipeline because each prediction involves data points at the “edges,” where we could not obtain enough information to make a prediction. Consequently, we need zero padding to address this loss, but we would like to assess its effect on the CNN’s prediction accuracy.

To establish a basis for comparison, we introduced another padding scheme: instead of filling in zeros, we filled in the ground truth (\textit{true padding}). Using the same (default) CNN configuration and the same training/prediction dataset, we could compare the predictions under the two padding schemes. Based on the results (see Figure 6), we observed very little difference in the CNNs’ performance between zero padding and true padding. ^{10}
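The two padding schemes can be illustrated side by side for a 2-pixel border: zero padding fills the border with zeros, while true padding fills it with the corresponding ground-truth values. The 6x6 grid and border width are illustrative assumptions.

```python
import numpy as np

truth = np.arange(36, dtype=float).reshape(6, 6)  # illustrative ground truth
interior = truth[2:-2, 2:-2]          # region the CNN can actually predict

# Zero padding: restore the original shape with zeros on the border.
zero_padded = np.pad(interior, 2, mode="constant", constant_values=0.0)

# True padding: restore the original shape with ground truth on the border.
true_padded = truth.copy()
true_padded[2:-2, 2:-2] = interior
```

Both arrays have the shape of the original grid and agree on the interior; they differ only on the border, so comparing the two isolates the effect of the padding choice.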

Second, we tested whether training the CNNs to output the spatial data two or more time steps away from the input data (the default is one time step) would yield better results (\textit{i.e.} whether a larger time increment would lead to higher prediction accuracy). It is possible that predictions on a coarser time scale yield better results, since a finer mesh could cause the CNNs to primarily pick up noise in the data instead of the desired physical interactions.

To examine this possibility, we trained two versions of the CNNs, \textit{ceteris paribus}: in the first version, the CNNs were trained to predict data 1 time step away from the input, whereas in the second version, the CNNs were trained to predict data 2 time steps away from the input. From our results (see Figure 7), we did not observe any noticeable improvement in prediction accuracy for the CNNs trained with the larger time increment.^{11}

Third, we tested how well the model learned by our CNNs extrapolates to time intervals outside the training data’s time interval (ideally, the learned model would generalize to any given time interval). To that end, we compared predictions whose first input is taken from the training data with predictions whose first input lies outside the training time-step interval. The comparison entails using the same CNN models to predict different ranges of time steps, one inside and one outside the training time-step interval, and observing the discrepancy in accuracy. Based on the results (see Figure 8), we observed very little difference in the CNNs’ performance: even as predictions were made on different time intervals, the quality of prediction did not differ significantly. This suggests that the time interval chosen for prediction has little influence on the quality of predictions.

Fourth, we assessed the performance of the CNNs as a function of the available sample size. This was motivated by the observation that training the CNNs on all available training data until convergence was infeasible given our time budget and computing resources. To that end, we trained the CNNs on 1, 2, 5, and 10 percent of all available training data (all configurations have a training time-step interval of six hours).

Based on our results (see Figure 9), we observed a lack of correlation between prediction accuracy and training-data size. Although this does not tell us about the performance of CNNs trained on more than 10 percent of the total available data, it at least provides a reason to believe that CNNs trained with 2 percent of the data do not perform consistently worse than those trained with 10 percent.

Finally, we assessed how the CNN’s prediction accuracy would be affected by making the CNN’s architecture more complex ^{12}, in the hope that, given the complexity of the task, a more complex architecture could develop a more accurate model that yields better performance.

From Figure 10, it seems clear that the CNNs with higher complexity have lower prediction accuracy. A possible explanation is that more complex CNNs have more parameters to tune, and so the space of all possible models (the \textit{hypothesis space}) is larger. This makes the CNNs more prone to overfitting, explaining the lower prediction accuracy.

Figure 4 provides a visual summary of all experiments mentioned.

Figure 4
Figure 4: Visual summary of the five experiments aimed at better understanding how various parameters affect the CNNs’ prediction accuracy. We found that adding zero padding, adjusting the time increment, varying the time interval for prediction, and varying the training sample size all had little noticeable effect on prediction accuracy. A more complex CNN, in fact, resulted in a decrease in prediction accuracy.
Figure 5
Figure 5: Visualization of repeated prediction over a 24-hour interval. Here t refers to the temporal index. For each temporal index, the upper left panel is the CNN’s prediction, the upper right is the target, the lower left is the difference between prediction and truth, and the lower right is the loss as a function of temporal indices. (Note that the y-axis is inverted.)
Figure 6
Figure 6: RMSE plots showing the predictions by CNNs with zero padding and by CNNs with true padding (Experiment 1). Each value on the X-axis (level) refers to a CNN trained to predict the Nature Run dataset at a given vertical index. The Y-axis shows the CNN’s non-standardized RMSE (to reflect the physical scale of the data). Vertical lines are added to distinguish one type of data from another. This interpretation of the X- and Y-axes applies to Figures 7, 8, 9, and 10 as well.
Figure 7
Figure 7: RMSE plots showing the predictions by CNNs trained to predict data 1 time index away from the input and by CNNs trained to predict data 2 time indices away from the input (Experiment 2). In addition to comparing predictions on the same data, we compared predictions across the same number of predictions made. Figure 7e shows the result of both configurations making a single prediction (we also gathered the CNNs’ RMSE on the test data).
Figure 8
Figure 8: RMSE plots showing the CNNs predicting two time intervals, one overlapping with the training time interval and the other lying outside of it (Experiment 3).
Figure 9
Figure 9: RMSE plots showing the predictions by CNNs trained with various available sample sizes (Experiment 4). In addition to the repeated predictions, a plot shows the predictions with inputs being the test data.
Figure 10
Figure 10: RMSE plots showing the predictions by CNNs configured with different numbers of filters (sliding windows) (Experiment 5).

Future Work

Based on our findings, we are confident that CNNs could be refined into an emulator that efficiently quantifies uncertainty, a task crucial to reliable weather forecasts. Beyond the questions addressed in this research, other inquiries are relevant to the assessment of our CNN implementation. For instance, we would like to know whether tuning hyperparameters, such as the learning rate of the CNNs, would lead to better performance. Also, for the CNNs to be an effective emulator over longer time intervals, we would like to further investigate the principal factors that contribute to the blurring of predictions over time. So far, we have provided only qualitative assessments, and further research is needed to identify the kinds of quantitative assessments suitable for our purpose. Additionally, we would like to assess the CNNs’ performance beyond the Nature Run dataset.


The author would like to thank his mentors, Hai Nguyen and Derek Posselt, for their mentorship, expertise, and support. The author also would like to thank Amy Braverman, Principal Statistician at JPL, for making this opportunity possible and for offering much helpful advice. The author owes much to the helpful feedback and support he received from JPL’s Uncertainty Quantification group, as well as the technical support of Bob P. Thurstans, a member of the Technical Staff at JPL’s Science Division. The author is also grateful for the abundant resources and opportunities available to him as a participant of the Caltech SURF program. Finally, the author would like to thank his family and colleagues, with whom he shares the same workspace, for their constant presence and support.

[1] Denning, Marty. “The Impact of Accurate Weather Forecasting Data For Business.” The Weather Company. October 27, 2015. Accessed October 20, 2017.

[2] Rappaport, Edward N. “Loss of Life in the United States Associated with Recent Atlantic Tropical Cyclones.” \textit{Bulletin of the American Meteorological Society} 81, no. 9 (March 10, 2000): 2065-073. doi:10.1175/1520-0477(2000);2.

[3] Smith, Ralph C. \textit{Uncertainty quantification: theory, implementation, and applications}. Philadelphia: SIAM, 2014.

[4] Olah, Christopher. “Calculus on Computational Graphs: Backpropagation.” Colah’s blog. August 31, 2015. Accessed October 21, 2017.

[5] Mas, Jean, and Juan Flores. “The application of artificial neural networks to the analysis of remotely sensed data.” \textit{International Journal of Remote Sensing} 29 (February 2008): 617-63. Accessed October 20, 2017.

[6] Hagan, Martin T., Howard B. Demuth, Mark Hudson. Beale, and Orlando De Jesús. \textit{Neural network design}. S. l.: S. n., 2016.

[7] Skamarock, William C., Joseph B. Klemp, Jimy Dudhia, David O. Gill, Dale M. Barker, Micheal G. Duda, Xiang-Yu Huang, Wei Wang, and Jordan G. Powers. “A Description of the Advanced Research WRF Version 3.” NCAR TECHNICAL NOTE, June 2008.

[8] Karpathy, Andrej. “Convolutional Neural Networks (CNNs / ConvNets).” CS231n Convolutional Neural Networks for Visual Recognition. Accessed October 20, 2017.


^1 Of course, natural disasters are also responsible for the loss of lives. According to [2], freshwater floods can claim thousands of lives in the Americas.

^2 Some numerical solvers have a runtime of O(n^3), with n being the size of the input data.

^3 An example of such a relationship would be the change in rainfall as a function of a change in the value of a model parameter (\textit{e.g.} the distribution of water vapor in the geographical region).

^4 More precisely, a computational graph is a directed acyclic graph in which each node represents a variable that is either a constant or an operation on the variables represented by its parent nodes.

^5 The “neural networks” discussed here are limited to feedforward neural networks, the type of neural networks used in this paper.

^6 This can be done using backpropagation together with an optimization scheme such as stochastic gradient descent.

^7 The ensemble of physics schemes used are Morrison 2-moment microphysics, RRTMG shortwave and longwave radiation, MYJ PBL, Monin-Obukhov surface layer, and Unified Noah land surface.

^8 The vertical (z) dimension is measured in the \sigma coordinate. For more information about this measure, see section 2.1 of [7].

^9 Note that we had not considered using zero-padding at this point, and so the heatmaps and the loss were calculated exclusive of the edges of the data.

^{10} Although true padding has lower RMSE at certain z indices such as z=200, for the most part the RMSE of CNNs with zero padding matches closely with the RMSE of CNNs with true padding. This suggests that zero padding has little influence on the CNN performance.

^{11} Although we observed that the CNNs trained with the larger time increment (for convenience, “t+2”; we call the CNNs with the smaller time increment “t+1”) had consistently lower RMSEs, we also observed that the RMSE of t+2 is not consistently lower than t+1’s when comparing across a single prediction, as shown in Figure 7e. Based on this, we conjectured that the driving factor behind the lower RMSE of t+2 is that errors compound less, since fewer iterations of t+2 are needed to generate a given prediction, and not that t+2 is a better model than t+1.

^{12} In particular, we increased the number of filters, such that the first convolutional layer has 64 filters and the second has 128, as opposed to 32 and 64.

Detection of Volume Changes in Greenland’s Marine Terminating Glaciers

Increased melting of marine-terminating glaciers of the Greenland Ice Sheet could lead to unstable dynamic ice mass loss, further accelerating global sea level rise. To better understand the historical context of the present-day widespread glacier front retreat, we mapped frontal positions for two years, 1994 and 2017, for 146 and 251 major outlet glaciers, respectively. Front position locations were identified using optical remote sensing imagery from the Landsat 5 (1994) and Landsat 8 (2017) satellites. Of the fronts surveyed, 86% retreated between 1994 and 2017, and 62% retreated between 2015 (when the most recent glacier front positions were recorded) and 2017. In addition, ice surface elevation differences for four dynamically thinning glaciers were calculated using data collected in 2016 and 2017 by the Glacier and Ice Surface Topography Interferometer (GLISTIN-A) as part of NASA’s Oceans Melting Greenland (OMG) mission. In each case, ice surface lowering in excess of 10 m was observed near the glacier front within the 10 km-wide GLISTIN-A swath. This evaluation of glacier change advances our understanding of ocean-driven Greenland ice mass loss.

Author: Gurjot Kohli
Mentors: Ala Khazendar & Ian Fenty
Editor: Hana Kim
Jet Propulsion Laboratory, California Institute of Technology

OMG, INTRODUCTION (What’s happening to Greenland?)

Increased melting of marine-terminating glaciers of the Greenland Ice Sheet could lead to dynamic ice mass loss, further speeding up global sea level rise. Dynamic change refers to rapid, potentially unstable changes in a glacier’s volume flux. This dynamic outlet-glacier response includes acceleration, thinning, and frontal calving, in which large chunks of ice break off and separate from the glacier as it retreats.^1 Thermodynamic melting of the vertical face of marine-terminating glaciers (Figure 1) requires a dynamic ocean circulation process in which dense, warm, saline waters flow toward the glacier at depth and induce subsurface melting before cooling, freshening, and eventually flowing away from the glacier in a shallower layer.^2 Depending on many factors, including glacier bed and fjord geometry, melting of a glacier from its base may not trigger an immediate dynamical response. This difference in dynamical glacier responses to similar thermodynamic melting stresses can give rise to variability between proximal glaciers, with some retreating and others remaining stable.^1 Over long timescales, with respect to overall loss of ice mass, there is a relationship between thinning and glacier front retreat, as a change in one proportionally changes the other.^3

From 1992 to 2011, the Greenland ice sheet’s rate of mass loss more than quadrupled, from 51 to 211 gigatons per year. Thus, scientists are eager to collect more data on Greenland and to better understand subaqueous melting and calving processes. NASA’s Oceans Melting Greenland (OMG) mission is investigating the impacts of ocean melting on the ice sheet’s marine-terminating glaciers through ocean, glacier, and bathymetry campaigns. Specific to ice observations, OMG aims to observe year-to-year changes in the heights and extents of nearly all of Greenland’s major marine-terminating glaciers.^4 Given this large influx of data, this is the first time the project has been able to capture, synthesize, and analyze such data to further studies of anomalous glacial regions across the ice sheet.

In support of the OMG mission, the goal of this research project was to measure volume changes of Greenland’s marine-terminating glaciers. These volume changes are determined by frontal position change and ice surface elevation. This was achieved by creating a detailed inventory of Greenland glacier front positions in 1994 and 2017 using Landsat imagery. The inventory was used to characterize the distribution of glacier front position changes for two periods: 2017 vs. 1994 and 2017 vs. 2015. To contextualize recent changes, glacier front positions in 1994 were catalogued where Landsat imagery permitted. These 1994 front positions expand an existing database that was limited to northwest Greenland. To calculate ice surface elevation changes for four rapidly thinning glaciers, we used the OMG GLISTIN-A interferometric radar. These four glaciers were chosen based on the large surface lowering rates (∆h/∆t) measured between 2005 and 2007 that fall within GLISTIN-A swaths available for both 2016 and 2017.^6 Our analysis of version 3 of OMG’s GLISTIN-A data shows continued rapid thinning of the four outlet glaciers, exceeding 10 m/yr.

METHODOLOGY (How was it figured out?)

Digitization of glacier front positions

Glacier front positions were traced (digitized) manually after examining Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS) and Landsat 5 Thematic Mapper (TM) satellite scenes in conjunction with other data (e.g. a Greenland base layer built from a mosaic of Landsat 7 scenes). Analysis of these data and the mapping of glacier front positions were done using the QGIS open-source geographic information system software.

For each glacier front, candidate Landsat scenes were identified by their path/row designation as defined by the descending orbital track in the World Reference System 2 (WRS-2). After identifying the path and row of candidate scenes, the corresponding image files were downloaded from the USGS Earth Explorer portal. Scenes were filtered by acquisition date range, land scene cover and cloud cover tolerances, data type, and sensor identifier. After importing the geographically referenced Landsat scenes of a glacier front into QGIS, the front position was identified and then manually traced by defining a set of connected points. The resulting set of points defines a “vector” layer that is saved in the common ESRI shapefile format. This technique produced a comprehensive dataset of glacier front positions for 2017 and 1994.

Glacier front position mapping and change analysis

Over the course of this study, over 200 glacier fronts were mapped for 2017 and 1994. Where possible, we mapped the same glaciers whose 2015 positions had previously been identified by the OMG team (Figure 2). Landsat imagery provided high spatial resolution (30 m) and minimal cloud/scene cover (below 10-20%) for mapping the frontal positions of glaciers. For 2017, Landsat images were found between March and April, whereas for 1994 the images were collected between July and September owing to the lack of available March and April scenes. Finding new 2017 scenes and digitizing older 1994 scenes provides a more complete historical record of glacier front position change in Greenland. With these new front data, we can better contextualize recent changes and understand the behavior of the glaciers in greater depth (Figure 4).

In analyzing the extent of 2017 glacier front position changes across Greenland relative to the 1994 and 2015 positions, we defined six categories to describe the type of change observed (Figure 5): 1) clear advance; 2) clear retreat; 3) no change or static; 4) undetermined, due to a limited number of scenes that cannot be clarified by mosaic Landsat scenes; 5) advance/retreat, where part of the glacier has advanced and another part retreated; and 6) not available, where scenes cannot be found for comparison. The categories of advance and retreat are used only if a front position’s movement exceeds 150 m. Note that we cannot distinguish between interannual and seasonal front position variations. We identified retreat in 183 out of 214 glaciers between 1994 and 2017 and in 161 out of 260 between 2015 and 2017. Although 55 out of 260 glaciers advanced between 2015 and 2017, most front changes are categorized as retreat.
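The 150 m threshold and the single-displacement cases of the category scheme above can be sketched as a small classifier. The sign convention (positive displacement = advance) is a simplifying assumption, and the mixed advance/retreat and undetermined cases, which require per-segment judgment, are omitted:

```python
def classify_front_change(displacement_m, threshold_m=150.0):
    """Categorize a glacier front position change: advance or retreat
    is assigned only when the movement exceeds 150 m."""
    if displacement_m is None:
        return "not available"     # no scenes found for comparison
    if displacement_m > threshold_m:
        return "clear advance"
    if displacement_m < -threshold_m:
        return "clear retreat"
    return "no change or static"

print(classify_front_change(-420.0))  # -> clear retreat
print(classify_front_change(90.0))    # -> no change or static
```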

Ice surface elevation change

Four dynamically thinning glaciers around Greenland were studied based on available GLISTIN-A flight lines that overlapped with surface elevation lowering data acquired during an earlier mission [5]. These glaciers (Hagen Brae, Harald Moltke Brae, Tracy Gletscher, Helheimgletscher) had negative elevation change rates (m a-1) derived between 2005 and 2007 [5]. First, Hagen Brae (Figure 6) shows surface lowering of 10 m upstream of the frontal position. Harald Moltke Brae (Figure 7) follows the same pattern. For Tracy Gletscher (Figure 8), we find both thinning and retreat. However, Heilprin Gletscher, a small dynamically thinning glacier, shows inconsistent patches of surface lowering that are not as continuous as those of its neighbor, Tracy Gletscher. The fjord into which both Tracy and Heilprin glaciers flow needs to be studied further in terms of its bathymetry, salinity, and temperature profiles to understand why these two comparable glaciers have such different elevation changes. Lastly, Helheimgletscher (Figure 9), a large dynamically thinning glacier, has three tributaries that feed into the channel where the front is contained. Although the GLISTIN-A swath analyzed is upstream of the front, we observe that the glacier is thinning at a more rapid rate than over the last 23 years.
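The elevation change rate compared between epochs is simply the surface height difference divided by the time span. A minimal sketch, with entirely hypothetical elevation values rather than actual GLISTIN-A measurements:

```python
def elevation_change_rate(elev_start_m, elev_end_m, years):
    """Mean surface elevation change rate (m per year) between two
    epochs, e.g. comparing a GLISTIN-A swath against an earlier
    elevation product. Negative values indicate surface lowering."""
    return (elev_end_m - elev_start_m) / years

# Hypothetical: 10 m of surface lowering over 23 years.
rate = elevation_change_rate(850.0, 840.0, 23.0)
print(round(rate, 3))  # -> -0.435
```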


Our newly mapped glacier front positions expand the record of glacier evolution over the past quarter century. Currently, the glacier front data are undergoing the submission process to the Physical Oceanography Distributed Active Archive Center (PO.DAAC) at JPL. Having our data stored at PO.DAAC will allow further dissemination of more comprehensive frontal position changes to those who require such a database for their research on Greenland. Furthermore, these data will be useful in guiding OMG project scientists should they need to alter GLISTIN-A swath positions based on ice front advance or retreat. These frontal position shapefiles, all corresponding Landsat images, and reference spreadsheet files are in the shared directory of the OMG Linux server for all team members to use. In terms of the height data, there are still 80 elevation comparisons to run beyond the 4 completed.

This internship addressed the key concept of understanding the effect of subaqueous melt and overall glacial ice mass loss in terms of ocean–ice interactions. Determining the level of glacier volume change throughout Greenland provides further data to compare against other measurements made within the OMG mission, including multibeam sonar and water column profiles of ocean temperature and salinity collected by Airborne eXpendable Conductivity, Temperature, and Depth (AXCTD) probes. Both bathymetry and ocean hydrographic data are expected to be useful for investigations into the processes of subaqueous melting [4]. Increasing knowledge of this phenomenon is imperative to learning more about global sea level rise and the significant role oceans play in shaping ice masses around the world. In particular, this project has motivated a study of ocean–ice interactions with respect to Tracy and Heilprin Gletschers, as they are adjacent systems that experience drastically different rates of dynamic ice loss.


This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, and was sponsored by the Summer Internship Program/Dr. Joshua Willis and the National Aeronautics and Space Administration. I would like to thank Dr. Willis for all his support on this project and for providing academic career guidance; Dr. Ian Fenty and Dr. Ala Khazendar for acting as my mentors, helping with day-to-day issues, and planning the trajectory of this internship; and Keri Smith for organizing my arrival at JPL and providing logistical assistance. I would also like to thank Sean Hardman for setting up my account on the Linux server and sharing his knowledge of the OMG team’s computing infrastructure, and Jessica Hausman of the PO.DAAC group for helping me publish my glacier front data and supplementary data sets.


  1. Truffer, M., and R. J. Motyka, 2016. Where glaciers meet water: Subaqueous melt and its relevance to glaciers in various settings. Rev. Geophys., 54, 220–239. doi:10.1002/2015RG000494.
  2. Straneo, F., and P. Heimbach, 2013. North Atlantic warming and the retreat of Greenland’s outlet glaciers. Nature, 504(7478), p. 36.
  3. Felikson, D., T. Bartholomaus, G. Catania, N. Korsgaard, K. Kjaer, M. Morlighem, B. Noel, M. van den Broeke, L. Stearns, E. Shroyer, D. Sutherland, and J. Nash, 2017. Inland thinning on the Greenland Ice Sheet controlled by outlet glacier geometry. Nature Geoscience, 10(5), 366–369.
  4. Fenty, I., J. K. Willis, A. Khazendar, S. Dinardo, R. Forsberg, I. Fukumori, D. Holland, M. Jakobsson, D. Moller, J. Morison, A. Münchow, E. Rignot, M. Schodlok, A. F. Thompson, K. Tinto, M. Rutherford, and N. Trenholm, 2016. Oceans Melting Greenland: Early results from NASA’s ocean–ice mission in Greenland. Oceanography, 29(4), 72–83.
  5. Pritchard, H. D., R. J. Arthern, D. G. Vaughan, and L. A. Edwards, 2009. Extensive dynamic thinning on the margins of the Greenland and Antarctic ice sheets. Nature, 461(7266), p. 971.
  6. Rignot, E., and J. Mouginot, 2012. Ice flow in Greenland for the International Polar Year 2008–2009. Geophys. Res. Lett., 39, L11501. doi:10.1029/2012GL051634.





Machine Learning for Cybersecurity: Network-based Botnet Detection Using Time-Limited Flows

Author: Stephanie Ding
Mentor: Julian Bunn
Editor: Sherry Wang


Botnets are collections of connected, malware-infected hosts that can be controlled by a remote attacker. They are one of the most prominent threats in cybersecurity, as they can be used for a wide variety of purposes, including denial-of-service attacks, spam, and bitcoin mining. We propose a two-stage, machine-learning-based method for distinguishing between botnet and non-botnet network traffic, with the aim of reducing false positives by examining both network-centric and host-centric traffic characteristics. In the first stage, we examine network flow records generated over limited time intervals, which provide a concise but partial summary of the complete network traffic profile, and use supervised learning to classify flows as malicious or benign based on a set of extracted statistical features. In the second stage, we perform unsupervised clustering on internal hosts involved in previously identified malicious communications to determine which hosts are most likely to be botnet-infected. Using existing datasets, we demonstrate the feasibility of our method and implement a proof-of-concept, real-time detection system that aggregates the results of multiple classifiers to identify infected hosts.


In the twenty-first century, networked devices have become an integral part of any business or organization, supporting an extensive number of applications and services such as access to the World Wide Web, multimedia sharing, file storage, instant messaging, and email. The growing number and complexity of networked devices mean that they are increasingly targeted by cybercriminals, who exploit vulnerable devices for malicious purposes. This is particularly true with the advent of the Internet of Things (IoT), which presents new potential attack vectors: modern malware is now capable of taking over devices such as surveillance cameras and ‘smart’ household devices with built-in network capabilities. One common use for compromised devices is to integrate them into a botnet – a collection of connected hosts (“bots” or “zombies”) infected with malware that allows them to be controlled by a remote host (the “botmaster”). Botnets are powerful assets for attackers due to their versatility: they can be employed for a wide variety of purposes, such as generating a large amount of traffic towards an existing service in an attempt to prevent legitimate traffic from being processed (known as a Distributed Denial-of-Service, or DDoS, attack), email phishing and spam, and, with the advent of cryptocurrencies in recent years, distributed bitcoin mining. This versatility makes botnets one of the most prominent threats in the cybersecurity field.

Presently, botnets are responsible for the majority of online email spam, identity theft, phishing, fraud, ransomware, and denial-of-service attacks, and are substantial sources of damage and financial loss for organizations and businesses. Neutralizing the potential impact and harmful behaviors of botnets has become an essential step in almost all malware mitigation strategies, creating a need for rapid and effective identification of botnet infections.

Background and related work

Historically, botnet detection was achieved by setting up “honeypots” or “honeynets” – security mechanisms that appear to contain data and to be a legitimate part of some network, but are in fact isolated systems designed to detect and/or counteract intrusion attempts – and developing specific signatures for various types of botnets in order to defend against future attacks of the same type. These signatures are then used for payload analysis techniques such as deep packet inspection (DPI), which requires individually inspecting every packet transmitted on the network and matching for malicious packet signatures. While payload inspection techniques usually achieve a high level of identification accuracy, they have several downsides: detailed analysis at the packet level often exposes private information sent by network users, signature-based detection methods are slow to adapt to new and emerging botnet attacks, and the development of large-scale honeypots is a significant time and economic investment. Furthermore, inspecting each packet sent through the network is extremely time- and resource-intensive due to the sheer volume of packets that must be processed, making packet-inspection techniques unsuitable for real-time detection.

Network-centric, traffic analysis-based schemes have become increasingly popular as an alternative to signature-based detection schemes, as they do not suffer from the same limitations as payload inspection techniques. These methods often focus on examining network flows, where a flow is conventionally defined as a sequence of packets over some period of time, grouped by source internet protocol (IP) address, source port, destination IP address, destination port, and protocol – essentially a summary of a communication channel between two hosts. The underlying assumption of flow-based network analysis is that botnet traffic is distinguishable from regular network traffic in some manner that can be determined through statistical features, irrespective of individual packet contents. This makes a flow-based approach less susceptible to encryption or obfuscation techniques and vastly reduces the amount of data that needs to be processed. Furthermore, many types of bots may exhibit similar patterns of behavior despite having different signatures, making a flow-based approach more generalizable.

Strayer et al. [1] were among the first to demonstrate the use of supervised machine learning to identify botnets using internet relay chat (IRC) for communication. They were able to successfully classify transmission control protocol (TCP) flows with low (< 3%) false positive and false negative rates, although their method modeled TCP as the primary communication channel of botnet traffic. Similarly, Masud et al. [2] were able to detect botnet traffic using a flow-based machine learning approach, performing classification on host-level forensic data and deep packet inspection in order to differentiate between benign and botnet traffic. Saad et al. [3] proposed a new approach to botnet detection, focused on identifying traffic during the period of the botnet life cycle prior to the attack being launched (termed the ‘command and control’ or C&C stage by the authors), and applied machine learning to this subset of network traffic in order to detect peer-to-peer (P2P) botnets, utilizing both host-based and flow-based traffic features. Camelo et al. [4] presented a new method of identifying botnet activity by combining features from several data feeds, such as domain name server (DNS) responses and live communication directed to C&C servers, and performing machine learning on a graph representation of the data, allowing them to identify botnets as singly-connected components of known malicious servers (domains) and infected machines (IPs) to a reasonable degree of accuracy. The success of these methods confirms that botnet traffic exhibits certain characteristics and communication patterns that can be exploited using classification techniques.


Figure 2.1.1

Figure 2.1.1. Overview of our detection method in a proposed system.

Our detection method comprised two stages, allowing us to examine both network-centric features of the traffic and similarities in traffic generated by multiple infected hosts. For each window, we extracted a set of flow features from each flow and then applied a two-stage machine learning process to determine infected network hosts. In stage 1, we utilized supervised learning, training a classifier on existing datasets to perform binary classification and determine which flows are likely to be botnet flows. In stage 2, we applied unsupervised learning, clustering the hosts involved in the identified botnet flows into two clusters, benign and anomalous, based on a separate set of host-based features. We detected whether a host was infected by examining a sequence of windows and analyzing the evolution of the host’s cluster membership over time. Finally, to demonstrate the potential real-world applications of our detection method, we built a proof-of-concept application designed for use in a network monitoring situation and illustrated how our detection scheme can be applied for real-time detection.



The dataset used was the CTU-13 dataset [5], a publicly available, labelled dataset developed by researchers at the Czech Technical University containing thirteen separate scenarios of mixed botnet, background, and normal traffic. In each scenario, the researchers captured the network traffic of the entire university network at an edge router for a period of time, during which a botnet infection was simulated on one or several networked virtual machines by running a specific type of malware. Each scenario contained a packet capture (.pcap) of the botnet traffic only, and a truncated .pcap of the full packet capture, which included the complete headers of each packet but omitted packet payloads to protect the anonymity of network users. The full packet capture was also processed using the utility Argus (the Audit Record Generation and Utilization System) [6] to generate bidirectional network flow summaries (in .binetflow format), with a limited number of features output for each flow. Each .binetflow file was labelled by the researchers to identify particular communications as either a botnet flow, a normal flow, or a background flow.

We examined seven of the thirteen scenarios – scenarios 5, 6, 7, 9, 11, 12 and 13 – with a particular emphasis on scenario 9, as it was one of the larger captures and contained the most infected hosts. The selected scenarios covered a diverse range of botnets that vary in the number of infected network hosts, protocols, and malicious actions, and were representative of a large majority of behaviors found in modern botnets. Figure 3.1.1 describes each scenario in further detail:

Scenario | Botnet name | Infected hosts | Capture duration (hrs) | Protocol | Behaviors and characteristics
5        | Virut       | 1              | 11.63                  | HTTP     | Spam, port scan, web proxy scanner.
6        | Menti       | 1              | 2.18                   | HTTP     | Port scan, proprietary C&C, RDP.
7        | Sogou       | 1              | 0.38                   | HTTP     | Connects to Chinese hosts.
9        | Neris       | 10             | 5.18                   | IRC      | Spam, click fraud, port scanning, bitcoin miner.
11       | Rbot        | 3              | 0.26                   | IRC      | ICMP DDoS.
12       | NSIS.ay     | 3              | 1.21                   | P2P      | Synchronization.
13       | Virut       | 1              | 16.36                  | HTTP     | Spam, port scan, captcha and web mail.

Figure 3.1.1. Details of selected scenarios in the CTU-13 dataset.

Stage 1: Supervised learning (binary classification)

3.3.1 Flow extraction and feature selection

A feature is a characteristic of a flow over a period of time, which may either be extracted directly from packet headers (for example, source and destination IP) or calculated from the packet captures (such as packet interarrival time, standard deviation of packet size, or ratio of packet asymmetry between source and destination). We utilized the flow exporter Argus (also used by the authors of the CTU-13 dataset), which generates bidirectional network flow data with detailed statistics about each flow, including reachability, load, connectivity, duration, rate, jitter, and other metrics. Forty features were selected to describe each flow, based on domain knowledge and some assumptions about the behavior of botnets.
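To make the distinction between header-derived and computed features concrete, the sketch below derives a few illustrative per-flow statistics of the kind Argus exports. This is a simplified stand-in, not the actual 40-feature Argus output:

```python
from statistics import mean, stdev

def flow_features(packet_sizes, timestamps):
    """Compute a handful of illustrative statistical features for one
    flow: counts and byte totals come straight from headers, while
    interarrival-time statistics are calculated from the capture."""
    iats = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    return {
        "n_packets": len(packet_sizes),
        "total_bytes": sum(packet_sizes),
        "mean_pkt_size": mean(packet_sizes),
        "std_pkt_size": stdev(packet_sizes) if len(packet_sizes) > 1 else 0.0,
        "mean_iat": mean(iats) if iats else 0.0,
    }

# Hypothetical flow: four packets with sizes in bytes and times in seconds.
f = flow_features([60, 1500, 60, 1500], [0.0, 0.1, 0.2, 0.3])
print(f["n_packets"], f["total_bytes"])  # -> 4 3120
```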

Figure 3.1.2

Figure 3.1.2. The 40 selected flow features ranked by relative importance in classification.

The features comprising the standard 5-tuple (source IP address, source port, destination IP address, destination port, protocol) were used to define the flow ID for each flow. Almost all features were numeric, with the exception of two categorical values, direction and state, which were mapped to discrete integer values through a direct enumeration of unique values. As we wished to select a set of network-agnostic, universal features for the flow feature vector, the 5-tuple features were not included in the feature vectors used to train the classifiers, as different networks may use a variety of ports and services and have different IP addresses.
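The direct-enumeration encoding of the categorical fields can be sketched as follows; the first-seen ordering is an assumption (any stable ordering would serve), and the direction strings are illustrative:

```python
def enumerate_categories(values):
    """Map categorical flow fields (e.g. Argus direction or state
    strings) to discrete integers by direct enumeration of the unique
    values, in first-seen order."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values], mapping

encoded, table = enumerate_categories(["->", "<-", "->", "<->"])
print(encoded)  # -> [0, 1, 0, 2]
```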

3.3.4 Training models on limited time intervals

For each scenario, a training dataset was generated with a ratio of 1:10 botnet to non-botnet flows. This ratio was chosen due to the highly imbalanced nature of the captures, which contained significantly more background and normal flows (around 90% to 97% of the entire dataset) than botnet flows (around 0.15% to 8.11%). We found the 1:10 ratio to be sufficiently similar to the real datasets while containing adequate samples of botnet flows to obtain good classifier accuracy, although models trained with different botnet to non-botnet flow ratios demonstrated that it is possible to alter model performance and attain a higher TPR at the cost of more false positives, and vice versa.

Dataset generation was performed by first splitting each packet capture into 300 second windows. Each window had a 150 second overlap with the previous window, to maintain some temporal continuity between flows split across multiple windows. Argus was then used to generate the flows for each window and compute the 40 features for each flow. As the captures vary in length, some contain significantly more flows than others. On shorter captures, a training dataset with 1,000 botnet flow samples and 10,000 non-botnet flow samples was generated, while on longer captures a training dataset containing 10,000 botnet flow samples and 100,000 non-botnet flow samples was generated. For each capture, a small portion of the generated dataset was used for training a supervised learning classifier, while the remaining flows were reserved for testing and validation of the trained classifier.
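The overlapping-window scheme (300 s windows stepped by 150 s) can be sketched as:

```python
def window_bounds(capture_start, capture_end, width=300.0, step=150.0):
    """Generate (start, end) times for overlapping analysis windows:
    300 s wide with a 150 s overlap, as used for dataset generation.
    The final window is truncated at the end of the capture."""
    bounds = []
    t = capture_start
    while t < capture_end:
        bounds.append((t, min(t + width, capture_end)))
        t += step
    return bounds

# A hypothetical 15-minute capture yields six overlapping windows.
ws = window_bounds(0.0, 900.0)
print(len(ws), ws[0], ws[-1])  # -> 6 (0.0, 300.0) (750.0, 900.0)
```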

We initially tested a variety of supervised learning algorithms (Naïve Bayes, Support Vector Machines, Decision Trees, Random Forests) on our dataset to explore the differences in classifier performance and to confirm our hypothesis that the selected 40 features facilitated distinction between botnet and non-botnet flows, ultimately selecting random forest classifiers due to their robustness and superior performance on our dataset. Using the training dataset, a random forest classifier with 100 estimators was trained to perform binary classification on flow feature vectors generated over 300 second windows, i.e., to output 0 if it determines a feature vector to be representative of a non-botnet flow, or 1 if it determines it to be representative of a botnet flow.
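A minimal sketch of the stage-1 setup using scikit-learn: a 100-estimator random forest performing binary classification on flow feature vectors. The two-dimensional features and labels below are synthetic stand-ins, not CTU-13 data, and the `random_state` is set only for reproducibility of the sketch:

```python
from sklearn.ensemble import RandomForestClassifier

# Synthetic flow feature vectors (e.g. packet count, mean interarrival time).
X_train = [[10, 0.1], [12, 0.2], [11, 0.15],     # benign-like flows, label 0
           [500, 5.0], [520, 4.8], [510, 5.2]]   # botnet-like flows, label 1
y_train = [0, 0, 0, 1, 1, 1]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# 0 = predicted non-botnet flow, 1 = predicted botnet flow.
preds = [int(p) for p in clf.predict([[11, 0.12], [505, 5.1]])]
print(preds)  # -> [0, 1]
```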

Stage 2: Unsupervised learning (clustering)

Stage 2 of the detection process applied unsupervised learning to the results of stage 1 and examined host-based, rather than flow-based, characteristics. In each window, the classifier was first used to identify potential botnet flows. An internal host is defined as being involved in botnet communications if it is either the source IP or the destination IP of a flow predicted by the classifier to be a botnet flow. For all hosts involved in botnet communications, seven host-based features were computed:

  1. The total number of predicted botnet flows that the host is involved in.
  2. The total number of outgoing packets from the host involved in botnet flows.
  3. The total number of incoming packets to the host involved in botnet flows.
  4. The total number of incoming bytes to the host involved in botnet flows.
  5. The total number of outgoing bytes from the host involved in botnet flows.
  6. The total number of unique destination ports the host is communicating with.
  7. The total number of unique destination IPs the host is communicating with.
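The seven host features above can be computed directly from the flagged flow records. In this sketch the flow field names (src, dst, pkts, bytes, dport) are illustrative simplifications of the Argus record fields, and directionality is reduced to whether the host is the flow's source or destination:

```python
def host_features(host_ip, flows):
    """Compute the seven host-based features for one host over the
    flows flagged as botnet by the stage-1 classifier."""
    involved = [f for f in flows if host_ip in (f["src"], f["dst"])]
    out = [f for f in involved if f["src"] == host_ip]   # outgoing flows
    inc = [f for f in involved if f["dst"] == host_ip]   # incoming flows
    return [
        len(involved),                    # 1. predicted botnet flows
        sum(f["pkts"] for f in out),      # 2. outgoing packets
        sum(f["pkts"] for f in inc),      # 3. incoming packets
        sum(f["bytes"] for f in inc),     # 4. incoming bytes
        sum(f["bytes"] for f in out),     # 5. outgoing bytes
        len({f["dport"] for f in out}),   # 6. unique destination ports
        len({f["dst"] for f in out}),     # 7. unique destination IPs
    ]

# Hypothetical flagged flows for an internal host 10.0.0.5.
flows = [
    {"src": "10.0.0.5", "dst": "1.2.3.4", "pkts": 8, "bytes": 900, "dport": 6667},
    {"src": "10.0.0.5", "dst": "5.6.7.8", "pkts": 3, "bytes": 200, "dport": 6667},
    {"src": "9.9.9.9", "dst": "10.0.0.5", "pkts": 2, "bytes": 120, "dport": 80},
]
print(host_features("10.0.0.5", flows))  # -> [3, 11, 2, 120, 1100, 1, 2]
```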

These features were standardized such that each feature had a mean of 0 and a standard deviation of 1, and were then used to form a feature vector for each host. Agglomerative hierarchical clustering (a ‘bottom-up’ method of cluster analysis, in which each data point begins in its own cluster and clusters are progressively merged higher up in the hierarchy) was applied to the host feature vectors, using the standard Euclidean distance as the distance metric between vectors and Ward linkage (minimum variance criterion), which ensures that at each merging step, the pair of clusters whose merger leads to the minimum increase in total intra-cluster variance is chosen.
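The standardization step can be sketched as a per-column z-score (using the population standard deviation, an assumption since the original tooling is unspecified); the scaled columns would then feed the agglomerative clustering step, e.g. SciPy's Ward linkage:

```python
from statistics import mean, pstdev

def standardize(columns):
    """Scale each feature column to mean 0 and standard deviation 1
    prior to clustering. Constant columns are mapped to zeros."""
    out = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        out.append([(x - m) / s if s else 0.0 for x in col])
    return out

scaled = standardize([[1.0, 2.0, 3.0]])
print([round(v, 3) for v in scaled[0]])  # -> [-1.225, 0.0, 1.225]
```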



4.1.1 Stage 1

Due to the imbalanced distribution of botnet and non-botnet flows within the datasets, accuracy alone is not a good measure of classifier performance, as high accuracy can be achieved simply by classifying the majority of flows as non-botnet, there being significantly more non-botnet flows than botnet flows. Thus, in stage 1, the performance of the binary classifier was instead evaluated with metrics such as the true positive rate and true negative rate. Here we define a true positive as a flow vector with a botnet flow ID, during a time window when the botnet is running, that is classified as a botnet flow. Similarly, a true negative is a flow vector with a non-botnet flow ID that is correctly classified as a non-botnet flow.
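These rates can be computed directly from the confusion-matrix counts; the example values are the scenario 9 host totals reported in Figure 4.1.7:

```python
def tpr(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def tnr(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# Scenario 9 totals from Figure 4.1.7: TP=427, FN=161, TN=4810, FP=151.
print(round(100 * tpr(427, 161), 2))   # -> 72.62
print(round(100 * tnr(4810, 151), 2))  # -> 96.96
```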

A separate classifier was trained for each scenario examined, and the performance of the classifier was evaluated on the specific dataset it was trained on. The following table describes the number of flow vectors extracted from each dataset:

Scenario | Unique botnet flow IDs | Unique normal flow IDs | Unique background flow IDs | Number of windows | Average flows per window | Total botnet flow vectors | Total non-botnet flow vectors
5        | 856                    | 4631                   | 113667                     | 13                | 32310                    | 2326                      | 417704
6        | 4621                   | 7238                   | 394056                     | 52                | 48911                    | 10960                     | 2532403
7        | 44                     | 1666                   | 103950                     | 9                 | 47119                    | 308                       | 423766
9        | 93438                  | 27749                  | 1100291                    | 125               | 67714                    | 435076                    | 8029131
11       | 8155                   | 613                    | 91436                      | 7                 | 45301                    | 8855                      | 308249
12       | 1972                   | 7448                   | 265712                     | 30                | 40159                    | 9234                      | 1195532
13       | 32866                  | 27183                  | 943512                     | 393               | 18136                    | 98997                     | 7028311

Figure 4.1.1. Statistics about flow vectors extracted from each scenario.

Figure 4.1.2

Figure 4.1.2. Total TPR/TNR across all scenarios examined. True positive rate (TPR) was calculated as TPR = TP / (TP + FN), and true negative rate (TNR) was calculated as TNR = TN / (TN + FP). The total TPR/TNR was calculated from the sum of the true positives and true negatives across all windows in a dataset, while the average TPR/TNR reflects the average of the individual TPR/TNR values calculated on each window separately.


Figure 4.1.3. Average TPR/TNR across all scenarios examined. As the botnet is not active in the first few windows for many scenarios, these values are calculated only during time windows when the botnet is running.

4.1.2 Stage 2

In the unsupervised learning stage, we made the heuristic assumption that the anomalous/botnet cluster has the higher intra-cluster variance, which held for the datasets examined. The performance evaluation of stage 2 reflects the number of local IP addresses that are correctly clustered. Note that in this context, a false negative is a botnet host that was not present in the anomalous cluster during a window when the botnet was running – either because it was clustered incorrectly into the majority cluster, or because its flows were not identified by the classifier as botnet flows.
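The variance heuristic for labeling the two clusters can be sketched as below; for simplicity each cluster here holds one-dimensional feature values, whereas the real host vectors are seven-dimensional:

```python
from statistics import pvariance

def anomalous_cluster(clusters):
    """Return the index of the cluster to label 'anomalous', using the
    heuristic that the anomalous/botnet cluster has the higher
    intra-cluster variance. Singleton clusters get variance 0."""
    variances = [pvariance(c) if len(c) > 1 else 0.0 for c in clusters]
    return variances.index(max(variances))

benign = [0.10, 0.12, 0.11, 0.09]   # tightly grouped host features
anomalous = [0.5, 2.0, 9.0]         # widely scattered host features
print(anomalous_cluster([benign, anomalous]))  # -> 1
```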

Using t-distributed stochastic neighbor embedding (t-SNE), a dimensionality reduction algorithm that facilitates visualization of clustering results, we clearly observed separation between points corresponding to botnet and non-botnet hosts during windows when the botnet was active, even though the clustering algorithm failed to identify the correct clusters in all cases. This indicates underlying structural differences between botnet and non-botnet traffic that could potentially be better modelled with other clustering methods in the future.

Figure 4.1.4. t-SNE of host feature vectors on some windows in scenario 9 when the botnet was inactive. Red dots denote true positives, blue dots denote true negatives, red crosses indicate false positives, and blue crosses denote false negatives. A lack of structure in the underlying data is observed and the anomalous cluster consists of a few isolated edge points.

Figure 4.1.5. t-SNE of host feature vectors on some windows in scenario 9 when the botnet was active.

Due to the temporal nature of the windowed captures, the true positive rate, false positive rate, and false negative rate were measured across all time windows for each scenario (Figure 4.1.6). Note that some scenarios (5, 6, 7, 13) have only a single infected host, so if multiple false positives occur, this is shown as a false positive rate of >1.


Figure 4.1.6. Graphs of stage 2 TPR/FPR/FNR over time windows.

Scenario | Total IPs | TP  | FP  | FN  | TN   | TPR     | TNR
5        | 252       | 9   | 11  | 1   | 231  | 90.00%  | 95.45%
6        | 513       | 50  | 4   | 0   | 459  | 100.00% | 99.14%
7        | 173       | 3   | 8   | 2   | 160  | 60.00%  | 95.24%
9        | 5549      | 427 | 151 | 161 | 4810 | 72.62%  | 96.96%
11       | 59        | 11  | 7   | 3   | 38   | 78.57%  | 84.44%
12       | 643       | 44  | 12  | 31  | 556  | 58.67%  | 97.89%
13       | 3471      | 392 | 50  | 0   | 3029 | 100.00% | 98.38%

Figure 4.1.7. The total true positives, true negatives, false positives and false negatives for each scenario, which is a sum of the respective values across all time windows.

The classifier performance in stage 1 demonstrated that the selected 40 features were capable of distinguishing between botnet and non-botnet flows to a high degree of accuracy (> 89%), even if the flows were only generated over fixed-time windows and provided a limited perspective of the entire network traffic. The true negative rate was consistently high, but this is likely a reflection of the nature of the training set, in which there were significantly more non-botnet flows.

The classifier for scenario 12 had a significantly lower total TPR than the other scenarios. This may have been because scenario 12 featured a P2P botnet, which, unlike a traditional botnet with a single C&C server, communicates with a list of trusted servers (including other infected machines) in a decentralized manner. Its traffic is likely to be less structured and to have greater variance, appearing more similar to background traffic and thus making the classification task more difficult. In contrast, scenarios 6 and 11, a traditional C&C botnet and an ICMP DDoS botnet respectively, were much more likely to have botnet traffic that is structured and distinguishable from regular network traffic, and thus the classifiers trained for these scenarios had better performance.

In stage 2, true infected hosts were generally clustered into the anomalous cluster consistently after the botnet began running, while non-infected hosts (false positives) were not consistently identified as such across consecutive windows. The detection scheme was highly effective on scenarios 5, 6, and 13, each of which contained one infected host, and maintained a TPR of 1.0 throughout the duration of the botnet’s execution. Scenarios 7 and 11 also featured a TPR of 1.0 when the bot began running, but the detection rate dropped over subsequent windows; the performance of the detector may have been affected by the relatively short duration of these captures. Scenario 9 featured a more realistic example of botnet traffic on a network, as it contained ten infected hosts. After the botnet began running, a gradual increase in the detection rate was observed, reaching 1.0 during the later stages of the botnet’s execution.

Scenario 12 contained 3 infected hosts. The TPR for scenario 12 fluctuated between 0.33 and 0.66 during the majority of the botnet’s execution, indicating that 1 or 2 of the 3 infected hosts were consistently detected. This may have been due to the lower accuracy of the scenario 12 classifier in stage 1, which impacts the number of flows identified as botnet flows. It is likely that some botnet flows were incorrectly classified by the classifier as benign, resulting in a host not being identified as involved in botnet communications.


The results of this project show that it is possible to distinguish between botnet and non-botnet network flows to a high degree of accuracy (> 0.89 TPR), even if the flows are generated over limited time windows and provide an incomplete representation of the complete network traffic profile. This makes time-limited flows suitable for the purpose of real-time detection. Furthermore, the two-stage process of classification and clustering is able to effectively identify infected hosts for several classes of malware. Although our current clustering method is not able to produce the ideal clusters correctly in all windows, t-SNE visualization of the host data points indicates strong separability between the botnet hosts and non-botnet hosts. This has implications for further research in this area as clustering could be improved by using different algorithms in order to detect the underlying structure.

One limitation of our work is that the classifiers are trained on existing botnet data, making our detection method potentially vulnerable to new and emerging types of botnets, which may have different traffic patterns. Furthermore, as we rely on statistical features of flows for classification, attackers may evade detection by varying these characteristics if they are known. Additionally, our criterion for labelling the normal and anomalous clusters is presently a heuristic that holds for the datasets we examined but is not true of all networks; a more robust method of identifying the normal and anomalous clusters is necessary to generalize this detection scheme to other types of botnets.

We envision our detection method being used as part of a real-time network monitoring solution. To demonstrate its potential practical applications, we have developed a prototype real-time detection system and tested it on CTU scenario 9, successfully identifying all 10 true infected hosts with a minimal number of false positives. For more information, see the project log at or the GitHub repository.




I would like to acknowledge my mentor, Dr. Julian Bunn, for giving me the opportunity to participate in the Caltech Summer Undergraduate Research Fellowship (SURF) program. Thank you so much for your endless patience, continued support, insight, and guidance throughout this project.

I would also like to thank Bruce Nickerson and his family for the honor of being named as the J. Weldon Green SURF Fellow for 2017. Thank you for your commitment to supporting aspiring undergraduate researchers at Caltech – this project was only made possible by your generous support and contribution towards the SURF program.

Finally, I would also like to thank Arun Viswanathan and Dr. K. Mani Chandy for their assistance and input on technical aspects of this project.

Interview with Professor Maxwell J. Robb

Interview by Jonathan Chan, Associate Editor


Fun trivia:

  • Favorite food: sushi
  • Favorite genre of music: punk rock
  • Favorite molecule: naphthopyran
  • Favorite TV show: obliged to say Breaking Bad
  • Favorite scientist(s): Craig Hawker and Jeffrey Moore (former research advisers)

1. What type of research does your lab work on, and how might this research change the world?

The research I do is in the field of synthetic polymer chemistry. We’re interested in making soft materials that are functional – they respond to their environment by changing color or changing some other property. In particular, we’re interested in developing polymers that undergo a chemical change under mechanical forces. Under stress, these molecules, called mechanophores, might fluoresce, emit light, or undergo a change in their electrical conductance. In terms of applications, we spend a lot of time developing molecules that will undergo different chemical reactions at discrete force thresholds. If each of these molecules produces a different color change, for example, under a different force, then we can obtain a visual indication for the level of stress in a material. Overall, our research aims to take advantage of mechanical stimulation to design materials with unusual and fascinating properties for a wide variety of different applications.

2. What computational tools does your lab use to inform its research? How good is the match between theory and experiment?

We’re predominantly an experimental group. But before we get into the lab, we do some relatively simple computational calculations on new potential mechanophore molecules. These calculations only take about a day or two to complete, and they allow us to predict the effect of elongation at the single-molecule level. Essentially, we are asking, “What happens to the molecule when we pull it apart?” These computations are a powerful tool because they help steer us in the right direction in determining whether a molecule will have the properties that we are trying to access. Once we have some positive computational results, then we go into the lab and try to synthesize these molecules. We are still trying to answer the question of whether our computational models are reliable. So far, these models have only been applied to a relatively small number of molecules. We’re working to determine if the models we use are effective in predicting molecular behavior more generally.

3. How does Caltech compare to previous institutions that you’ve worked in? How would you compare yourself as an undergrad to a “typical” Caltech undergrad?

Caltech is incredibly supportive of all faculty, including junior faculty like me. It’s also supportive of all members of the community. I think this support structure, in addition to how much the students have a role in governing how the institution operates, makes Caltech different than other institutions. I was an undergraduate student at a science and engineering school – the Colorado School of Mines. It was a relatively small school of about 4000-5000 undergrads, so I think my undergraduate experience was similar to Caltech’s in terms of the overall environment, the small size, and the high level of interaction I had with faculty.

4. Tell me about your typical day.

The nice thing about doing this job is that every day is a little different. When I come into work, I typically read literature in the morning for 30 or 45 minutes – anything that looks interesting. Then, I’ll typically respond to emails – that unfortunately takes a lot of time. I try to spend as much time as possible thinking about the research projects in my group and coming up with new research directions. When my group was just getting started, I spent a lot of time in the lab – sometimes doing experiments myself, but mostly working alongside my students and postdocs to help them with their experiments.  Now, I spend a lot of time meeting with my group members to discuss their research and help them evaluate the data that they’re collecting.  Then there’s preparing lecture notes and content for my courses.  And of course, lots of writing!

5. What types of pressures do you deal with as a professor and how do you manage that?

Financially, I am in the fortunate position in which I have startup support from Caltech. This support is designed to get my group’s research off the ground. However, there is a constant pressure to think about how to support our research through external grants. That’s something that professors have on their minds all the time – trying to find out which funding organizations are interested in the work we do, and to write proposals to secure that funding. The other pressure that I face, particularly as a pre-tenure faculty member, is making sure that the research we do will have a big impact on the community. As a professor, you have pressure from your peers and your senior colleagues – sometimes direct, but often it’s indirect.  We need to be doing cutting-edge research that’s pushing the boundaries of our field. So that’s what we strive for. I’m reminded of my favorite quote from Arnold Beckman – it’s a guiding philosophy for my work, and I try to emphasize it regularly to my students: “There is no satisfactory substitute for excellence.”

6. What do you think the next “big” discovery will be in polymer chemistry?

I think that controlled radical polymerization deserves to win a Nobel Prize. That’s not the next “big discovery” per se, but it would be the next “big recognition.” The problem is that so many people have contributed to and influenced the field that it’s difficult for the Nobel Committee to come up with three undisputed visionaries. The ability to control polymerizations and synthesize polymers through radical methods has allowed polymer chemistry to impact so many scientific disciplines. One of the longstanding goals in polymer chemistry has been to develop sequence-defined polymers – much like how proteins are defined by an exact sequence of amino acids. Currently, with synthetic polymers, we can’t control the exact sequence and the composition of each repeating unit in that polymer structure with the precision, for example, of proteins or other natural polymers. Being able to do that is certainly a “holy grail” of polymer chemistry. With that said, there’s still the open question of whether being able to control the sequence even matters – maybe a highly sequence-controlled polymer would not perform better than polymers in which the sequences are less well-defined.

7. What interests do you have outside of science, and how much time do you spend on those interests?

I like outdoor activities a lot. I like to hike. I like to golf. In October, there’s a CCE division golf tournament in which grad students, staff members, and a couple faculty members participate. I played terribly in that, but it was fun. On a Sunday, I try to go on a hike. It doesn’t happen all that often, but I try to because I think a balance between work and leisure is really important; you need to get outside of your head sometimes and take a break from your academic routine. That way, you can come back to your work refreshed and ready to focus.

8. Today, America is undoubtedly the world’s scientific leader. Will it still be the leader in science in the next fifty years and beyond?

It’s an important question. I hope so. It’s very dependent on policy makers in this country. As scientists, we need to do a much better job of convincing our fellow nonscientist citizens why scientific research and science education are important. We need to convince them what science can do for the economy and for different sectors outside academia. I think that America will remain a leader in science inasmuch as we have the support of funding agencies and policymakers. Funding for more fundamental research is critical as well. A lot of grant-funded research now is focused on direct applications – we are overly concerned with how new technology is going to immediately affect the community in the next month or year. We need to look beyond that into developing new understandings of the world that might lead to advances we haven’t yet conceived. Another policy-related issue is our ability to recruit and retain the most talented people from around the world. Diversity has always contributed to the strength of our science and it’s something we need to continue working toward. It’s scary to think that the future of American leadership in science is uncertain right now, but I still believe that the U.S. will continue to lead. Caltech is one of the great beacons for science and so we are in a position – and have a responsibility – to lead by example and continue the tradition of excellence in American science and ingenuity.

Development and Implementation of Captive Trajectories in the NOAH Water Tunnel Laboratory

Abstract: The NOAH Water Tunnel Laboratory is currently used to improve our understanding of various aspects of turbulence and fluid mechanics. The goal of this project is to understand the capabilities of a newly installed technology and the opportunities it presents for enhancing the use of the NOAH Laboratory in the future. The recently installed Captive Trajectory System (CTS), a cyber-mechanical system capable of moving and rotating within the water tunnel, allows us to explore new methods of simulating objects moving in complex, turbulent fluids. By harnessing the ability to program how the CTS behaves, the system was shown to be effective at modeling the motion of an object in real time as variable forces were applied to it. Through simple, controlled examples, we demonstrated the ability of the CTS to accurately model various types of motion, such as that of a mass-spring-damper system or a planet orbiting a sun. In more complex examples, the CTS was able to simulate the general behavior of an airfoil in the wake of a cylinder with vortex shedding. The examples explored over the course of the project have proven that the CTS is a useful experimental tool and will open the door to new methods of studying turbulence and unsteady aerodynamics in the future.

Author: Alex Wuschner
Mentor: Dr. Beverley McKeon
Co-Mentor: Maysam Shamai
Editor: Sophie Ding

Introduction: For many years, scientists and engineers have studied the phenomenon of turbulence intensely, aiming to learn how it is governed, how it can be predicted, and how we can effectively engineer the world with it in mind. In the NOAH Water Tunnel Laboratory, the newly installed Captive Trajectory System (CTS) has been utilized to study just that. Over the course of the summer of 2017, the capabilities of this system were explored and demonstrated through examples of increasing complexity. The CTS is a programmable cyber-mechanical machine capable of moving and rotating attached objects within the NOAH Water Tunnel; by the end of the summer, it was used to study how airfoils move and behave when subjected to a turbulent phenomenon known as vortex shedding, which is characterized by a line of vortices alternating in orientation and is depicted in Figure 1 below.

Figure 1: Vortex shedding, a common turbulent phenomenon in which vortices are “shed” off an object with alternating orientation, is easily reproduced by flowing water over a cylinder.

Vortex shedding is a common phenomenon that is easily reproduced, such as by flowing water at a high enough speed past a cylinder. After observing how an airfoil’s motion was influenced by vortex shedding by allowing it to move freely in the water tunnel, the CTS was programmed in an attempt to replicate the observed motion. By closely matching the motion of the CTS to the observed motion, the principles that govern the turbulent interaction with the airfoil could be better understood.

Since turbulence has many seemingly unpredictable properties, the CTS was programmed not to repeat the same exact path of motion with each trial, but rather to respond and react in approximately real time to the forces acting on the airfoil due to its interaction with the stream of vortices. This method of capturing an object’s path of motion, known as a captive trajectory, was implemented in multiple “example cases” that replicate well-understood situations, such as an object moving while attached to a spring or an object orbiting another according to the laws of gravity, as a means of demonstrating the accuracy and precision with which the CTS is capable of operating. Using a force sensor and gathering position and velocity data from the CTS at small time steps (sampling at about 200 Hz), the CTS constantly adapts its trajectory in response to small changes in the state of the system. These measurements, combined with the small time steps and the other factors implemented in the programming of the CTS, allowed the captive trajectories to accurately mimic the type of motion predicted by theory or observation.
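The control loop described above can be sketched in a few lines. This is a simplified illustration, not the actual CTS software; `read_force` and `command_position` are hypothetical stand-ins for the real force-sensor and motor interfaces.

```python
import time

DT = 1.0 / 200.0  # control-loop period (~200 Hz, as described above)

def captive_trajectory_loop(read_force, command_position, mass, steps):
    """Advance a virtual test body in real time from measured forces.

    `read_force` and `command_position` are hypothetical stand-ins for
    the CTS force-sensor and motor interfaces.
    """
    x, v = 0.0, 0.0                 # state: position (m) and velocity (m/s)
    for _ in range(steps):
        f = read_force()            # measured force at this instant (N)
        a = f / mass                # Newton's second law
        x += v * DT                 # explicit Euler: position from old velocity,
        v += a * DT                 # then velocity from current acceleration
        command_position(x)         # command the stage to the new position
        time.sleep(DT)              # wait out the remainder of the time step
    return x, v
```

Because each iteration uses the force measured at that instant, the trajectory is never pre-scripted: any change in the measured force immediately alters the commanded motion.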

Developing a Mass-Spring-Damper Captive Trajectory: Before its application to complex, multi-dimensional systems, the CTS was first used to model simpler, well-understood physical systems. The first captive trajectory developed modeled a mass-spring-damper system restricted to one degree of freedom. Since this model is well understood, the response of the CTS was compared to theoretical expectations to assess the captive trajectory’s accuracy.

Both the measured applied forces as well as virtual forces were included in the design of this captive trajectory. Taking into account a restoring spring force proportional to the displacement from equilibrium and a damping force proportional to the velocity, the equation governing the motion of the CTS due to virtual forces is given as:

mx'' = -kx - bx'        (1)

in which k and b are virtually defined coefficients, and m is the virtually defined mass of the test body.

The measured applied forces were received directly from the force sensor. Including the measured forces, F(t), the overall equation governing the motion of the CTS was:

mx'' = -kx - bx' + F(t)        (2)

An explicit Euler method was used to calculate the subsequent position and velocity at each time-step.
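As a rough sketch of this update scheme, the following integrates equation (2) with an explicit Euler step; the values of m, k, and b are illustrative, not the coefficients used on the CTS.

```python
def simulate_msd(m, k, b, f_of_t, x0=0.0, v0=0.0, dt=0.005, t_end=20.0):
    """Explicit Euler integration of m x'' = -k x - b x' + F(t)  (eq. 2)."""
    x, v, t = x0, v0, 0.0
    xs = []
    while t < t_end:
        a = (-k * x - b * v + f_of_t(t)) / m  # acceleration from eq. (2)
        x += v * dt                           # position from the old velocity
        v += a * dt                           # velocity from the old acceleration
        t += dt
        xs.append(x)
    return xs

# Step response: a constant force, as when a weight is placed on the sensor.
# With these (illustrative) coefficients the system settles near F/k.
trace = simulate_msd(m=1.0, k=10.0, b=2.0, f_of_t=lambda t: 1.0)
```

Applying and then removing a constant force in `f_of_t` reproduces the qualitative behavior shown in Figure 2: damped oscillations toward a new equilibrium, then back toward zero.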

After collecting and analyzing the trajectory data, as displayed in Figure 2, it was clear that the amplitudes and periods of the oscillation matched those predicted by theory very well.

Figure 2: The response of the CTS simulating a mass-spring-damper system. A 295g weight was placed on the force sensor at approximately t = 15 seconds and then removed at approximately t = 30 seconds.

Aside from its good agreement with theoretical results, this first implementation of a captive trajectory also served to verify the responsiveness of the CTS, which will be essential for the more complex experiments the CTS will eventually be used for, in which hydrodynamic forces vary in both space and time.

Figure 3: A comparison of the actual and desired CTS response.  The trajectory that is predicted by the software appears in red while the trajectory that is actuated appears in blue.  The lines appear dashed due to the overlap of the different colors.

Figure 3 compares the commanded position and velocity with the actuated position and velocity. The differences between the commanded and implemented outputs were very small, demonstrating that the CTS could closely follow the trajectory it was simulating.

Developing a Gravitational Model Captive Trajectory: Once the mass-spring-damper trajectory was successfully designed and implemented, the next step involved the implementation of a captive trajectory with multiple degrees of freedom. As a result of the previously described benefits of modelling a well-known system, a gravitational model was chosen as the type of motion that this captive trajectory would replicate.  After defining a large, virtual, stationary mass located at the approximate center of the tunnel, the CTS was programmed to govern the motion of a smaller mass elsewhere in the coordinate system based on gravitational forces.

In the absence of measured forces, the equation governing the virtual force used in the CTS is given by:

F = -kMm/r^2        (3)

in which M is the virtually defined mass of the central stationary body, m is the virtually defined mass of the test body, r is the distance between the two, and k is a constant.

The virtually defined forces separated into components are given as:

F_x = -kMm x/r^3        (4)

F_y = -kMm y/r^3        (5)

Each of these components is numerically integrated to yield the commanded velocity and position in each direction.

After the trajectory was designed, it was implemented on the CTS in the form of a captive trajectory program. The initial conditions of the trajectory could be varied to produce circular, elliptical, and hyperbolic orbits, as predicted by theory. Examples of the orbits implemented on the CTS are shown below in Figure 4.

Figure 4: Examples of circular, elliptical, and hyperbolic orbits implemented on the CTS.
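A minimal sketch of this integration, using the component forces of equations (4) and (5) and the same explicit Euler step as the mass-spring-damper trajectory; the constants and initial conditions are illustrative, not the values used on the CTS.

```python
import math

def orbit(kMm=1.0, m=1.0, x=1.0, y=0.0, vx=0.0, vy=1.0, dt=1e-3, steps=20000):
    """Integrate the component forces of eqs. (4) and (5) with explicit Euler.

    F_x = -kMm*x/r**3 and F_y = -kMm*y/r**3 point toward the central
    mass at the origin; the constants here are illustrative.
    """
    path = []
    for _ in range(steps):
        r = math.hypot(x, y)
        ax = -kMm * x / (m * r**3)   # F_x / m
        ay = -kMm * y / (m * r**3)   # F_y / m
        x += vx * dt                 # position from the old velocity
        y += vy * dt
        vx += ax * dt                # velocity from the old acceleration
        vy += ay * dt
        path.append((x, y))
    return path

# With vy = sqrt(kMm / (m * r)) = 1, the initial conditions give a nearly
# circular orbit; other initial speeds give ellipses or hyperbolas.
```

Changing only the initial velocity switches between the circular, elliptical, and hyperbolic cases, mirroring how the initial conditions were varied on the CTS.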

When the trajectories were manually manipulated by forces applied during the middle of the trajectory, the CTS immediately responded to both the direction and magnitude of the applied force. Figure 5 shows an example in which the initial conditions defined a circular trajectory, but after a couple of orbits an applied force pushed the CTS into an elliptical orbit.

Figure 5: The implementation of a gravitational trajectory in which initial conditions were set to create a circular orbit, and then an applied force pushed the CTS into an elliptical orbit.

Like the mass-spring-damper trajectory, the gravitational trajectory demonstrated the system’s potential to quickly and accurately respond to changes in both measured and virtual forces.

Modelling Unsteady Aerodynamics: During the spring of 2017, Morgan Hooper and Ben Barthel used the NOAH Laboratory to observe the free response of an airfoil in the wake of a cylinder with vortex shedding [4]. While vortex shedding has been heavily documented, its interaction with airfoils has not. Thus, it was considered an interesting case to study, and one that would serve as a way to compare the behavior of the CTS to a corresponding physical system.

The experimental setup allowed for the observation of a NACA 0018 airfoil that was free to exhibit heaving motion along the spanwise axis as well as pitching motion about the vertical z-axis [4]. All other degrees of freedom were restricted. For reasons described in Hooper and Barthel (2017), the experimental setup was slightly revised during the current study to reduce friction, using a second linear motion cart, an improved rotary bearing, and other minor improvements. A Sharp GP2Y0A21YK0F IR distance sensor and a Vishay HE-351 Hall-effect rotary encoder were used to measure the translation and pitch of the airfoil during experiments [4]. The free-response airfoil assembly is shown in Figure 6.

Figure 6: The free response airfoil assembly. Linear rolling carts are used to enable heaving motion, and the airfoil is allowed to pitch about its central shaft.  Position and forces are measured using appropriate sensors.

A 10.1 cm diameter PVC cylinder was placed upstream of the airfoil in the center of the test section in order to generate vortices.  The airfoil mounting assembly was placed downstream, and for each run the airfoil was set to start at the center of the test section with no angle of attack relative to the flow.  The experimental setup is shown in Figure 7. The free response of the airfoil was studied at multiple tunnel speeds.

Figure 7: The free-response experimental setup.  Alternating vortices are shed from a cylinder upstream of an airfoil free to pitch and heave.

A combination of applied and virtually defined forces was considered very carefully when designing the captive trajectory program.  The attempts to simulate this type of motion differed from those of the mass-spring-damper and gravitational trajectories in multiple ways, the most important being that in this captive trajectory, forces were measured on an actual test object.  To do so, a NACA 0018 airfoil was mounted to the CTS and remained inside the water tunnel during the course of the trajectory. This addition required the consideration of several other factors when attempting to predict the airfoil’s motion. A diagram of the mounted airfoil assembly is shown in Figure 8.

Figure 8:  The airfoil assembly mounted on the CTS.  The airfoil is attached to the CTS by a shaft whose axis coincides with the leading edge.  The force sensor mounted at the top is able to measure forces on the airfoil and torques about the leading edge.

In the captive trajectory model, hydrodynamic forces were assumed to be the mechanism controlling the airfoil’s motion. In the force sensor coordinate system, forces were measured along both the x- and y-axes, and the equations governing the heaving and pitching motion of the airfoil take the form:

m y'' = F_y(t),    I θ'' = τ(t)        (6)

in which m and I are the virtually defined mass and moment of inertia of the airfoil, F_y is the measured hydrodynamic force along the heaving axis, and τ is the measured torque about the pitching axis.

As would be intuitively expected, larger masses and moments of inertia responded to changing hydrodynamic forces in a less dramatic manner than tests with a smaller virtual mass and moment of inertia.

For the majority of parameters, the captive trajectory program produced a response that shared many similarities with that of the free-response setup.  Oscillations consistent with the frequency of the vortex shedding were observed in both the heaving motion along the y-axis and the pitching motion. The relationships between the heaving and pitching oscillations showed similarities to the pattern observed in the experimental setup.  As the mass and moment of inertia were lowered toward the real parameter values, the trajectories became more and more similar to those observed in the experimental trials. Figure 9 shows a plot of the airfoil’s heaving and pitching position.

Figure 9: The trajectory predicted and implemented by the CTS for an airfoil with a virtual mass of 10 kg and a virtual moment of inertia of 0.05 kg·m2.  The trajectory ended when the position of the airfoil exceeded the 0.1m software-defined heaving position bound at about t = 76 seconds.

As the virtual parameters were reduced even further to approach the actual values, the CTS was no longer able to predict and actuate the correct trajectory of the airfoil.  This was determined to be caused by the physical, rather than virtual, constraints of the CTS and the force sensor.

When the force data from the sensor is examined, it becomes evident that, given the typical forces produced by the vortices on the airfoil, no vortex produces a force that would accelerate the airfoil beyond the acceleration bounds of the CTS motors.  However, as noted previously, the force sensor is susceptible to noise, and this sensor noise is most likely the cause of the unrealistically large accelerations.

Before each trajectory began, a delay time was included in which the force sensor was able to read initial forces on the force sensor.  At this time, the airfoil was in its starting position and the forces being read were those of the alternating vortices being shed on the stationary airfoil.

Figure 10: The pitching torques read on the airfoil before and after the initiation of the trajectory at about t = 65 seconds.  Initially, as the stationary airfoil experiences vortex shedding, the force sensor picks up measurements with relatively little noise.  Once the trajectory initiated and the airfoil began to move, the measurement noise was significantly amplified despite the vortices being of the same strength as before.

While the airfoil was stationary, the noise in the force sensor data was noticeable, but relatively insignificant compared to the overall fluctuation of forces with each passing vortex. However, as the airfoil began its trajectory, the level of noise read by the force sensor dramatically increased to the point at which the noise amplitude surpassed force fluctuations produced by the vortices, leading to choppy movement of the airfoil. This can be seen in Figure 10.

It is evident that the increased noise stemming from the choppy movement of the airfoil caused the captive trajectory program to predict acceleration values that exceeded the acceleration constraints.

As can be observed from its motion, the airfoil vibrates throughout its trajectory.  This is likely due to noise in the force sensor: small fluctuations in the force readings translate directly into fluctuations in acceleration, which produce a shaky type of motion. To solve this issue, the logical next step was to filter out the force sensor noise so that the force readings translated into motion would be much smoother.  A low pass filter of the force sensor readings was implemented, as this type of filter is effective at removing the effects of the small, random fluctuations of the sensor readings.

Multiple considerations were accounted for when choosing a filter. As mentioned before, the filter needed to be a low pass filter and it also needed to have minimal phase lag, otherwise known as the time delay between changes in the actual forces and changes in force measurement readings.  If the phase lag of the filter is too large, the airfoil will not immediately respond to the actual forces (as the force readings lag in time) and will produce a trajectory that fails to model the system.

Figure 11: The effect of a low-pass moving average filter on noise amplification due to vibration.  As the size of the filter is increased, the vibration of the CTS is reduced.

To assess the effect of a filter on the airfoil vibrations, a simple moving average filter was implemented in the captive trajectory program.  Figure 11 shows the changes in the raw force sensor readings as the size of the filter was increased to average over a greater number of past force values.

As the filter size increases, the force sensor readings show that the vibration is noticeably decreased. However, for large filter sizes, the airfoil motion noticeably lags behind the varying forces caused by the vortices, and thus this simple moving average filter is not sufficient.  Future work must include the development of a low pass filter that is able to effectively reduce the system vibration without introducing significant phase lag.
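The trade-off between smoothing and phase lag can be illustrated with a simple causal moving-average filter applied to a synthetic signal; the sampling rate, forcing frequency, and noise level below are illustrative stand-ins, not measured values from the CTS.

```python
import math
import random

def moving_average(samples, window):
    """Causal moving-average filter: each output is the mean of the last
    `window` inputs, so it smooths noise but delays the signal by roughly
    (window - 1) / 2 samples -- the phase lag discussed above."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)
        out.append(sum(samples[lo:i + 1]) / (i + 1 - lo))
    return out

random.seed(0)
dt = 0.005                                   # ~200 Hz sampling, as on the CTS
t = [k * dt for k in range(2000)]
# A 2 Hz sinusoid standing in for the vortex-shedding force, plus sensor noise.
signal = [math.sin(2 * math.pi * 2.0 * ti) for ti in t]
noisy = [s + random.gauss(0, 0.5) for s in signal]

# A 40-sample window (0.2 s) visibly smooths the noise, but delays the
# output by about 20 samples (0.1 s) and attenuates the forcing itself.
smoothed = moving_average(noisy, window=40)
```

Growing the window reduces the residual noise but increases both the delay and the attenuation of the underlying forcing, which is exactly why a better low pass filter with minimal phase lag is left to future work.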

Conclusions: The progress made on the development of captive trajectories in the NOAH Laboratory has laid a foundation with which the CTS may be utilized as a tool for the study of free-response aerodynamic behavior. The mass-spring-damper and the gravitational models successfully proved the CTS’ ability to respond to forces in real-time to produce captive trajectories.  While the captive trajectory modelling the airfoil in the wake of a cylinder with vortex shedding is still being refined, it has also provided a lot of insight into the complexities involved in attempting to replicate a system without an analytical solution to describe the motion of the model. Future work will be directed towards the design of a low pass filter to properly filter out the high frequency, zero mean noise from the force sensor while minimizing phase lag.  This will allow the CTS to better simulate systems with lower virtual mass or large force fluctuations.


  1. Smits, A. J. & Marusic, I. Wall-bounded turbulence. Physics Today 66,25–30 (2013).
  2. Overview. xerial / sqlite-jdbc – Bitbucket Available at: (Accessed: 17th June 2018)
  3. Mackowski, A. & Williamson, C. Developing a cyber-physical fluid dynamics facility for fluid–structure interaction studies. Journal of Fluids and Structures 27, 748-757 (2011).
  4. Hooper, M. & Barthel, B. Free-response of an Airfoil to Periodic Vortex Shedding (2017).

Acknowledgments: The opportunity to conduct the research presented in this paper was made possible by Dr. Beverley McKeon and the McKeon Research Group in GALCIT as a part of the Caltech SURF Program.  A special thanks goes to Maysam Shamai for guidance and assistance throughout the project, as well as to Morgan Hooper and Ben Barthel for their contributions to the design of the free-response experimental setup in NOAH Water Tunnel.  I would also like to thank Will Dickson for his contributions towards the design and construction of the CTS, as well as his assistance throughout the various stages of the project. Final appreciation goes to the Caltech Associates for contributing towards the funding of this research and the opportunity this project has allowed me.