Igniting the Future of Quantitative Trading
About Us
Our conglomerate, driven by AI technology, excels in high-frequency trading.
Legal
Inclusivity, compliance, and transparency have always been the guiding principles we adhere to.
R&D
Our trading method is based on algorithmic research and development.
Core Technology
The Firefly Trading System is our core competitive advantage, as it encompasses all of our trading accomplishments.
Business
We consistently uphold the principles of transparency, compliance, and rationality to ensure that our business collaborations meet the highest industry standards.
Members
To become a member, you need to comply with our Fund's KYC and anti-money laundering policies.
Artificial Intelligence Research
In artificial intelligence, learning and extracting patterns from data in order to accomplish specific tasks typically involves the following steps: first, a model is established; then a large amount of sample data is used to train it. Training means adjusting and optimizing the model's parameters based on the provided data, with the aim of making accurate predictions or decisions on unseen data.
The core principles of using artificial intelligence
In machine learning and deep learning, it is common to construct a model, which is a mathematical function or algorithm, to represent the relationship between input data and output results. Subsequently, a large amount of training data is utilized as a dataset to adjust and optimize the model's parameters, enabling it to learn and extract patterns and features from the data.
During the training process, the model adjusts its parameters based on the provided training data to minimize the discrepancy (loss function) between predicted results and actual results. Through iterative optimization algorithms, the model gradually fine-tunes its parameters to improve its accuracy on the training data. Once the model has been trained and performs well on the training set, it can be used to make predictions or decisions on unseen data. By applying the learned patterns and features, the model infers new data and strives to accurately predict their corresponding outputs.
This process embodies the fundamental concept of learning and extracting patterns from data, which is the essence of artificial intelligence approaches for accomplishing specific tasks. By training the model, we can extract valuable information and structures from the data, enabling the model to perform accurate predictions, classifications, regressions, and other tasks.
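As a minimal sketch of this train-then-predict cycle (the scikit-learn classifier and the toy numbers below are our own illustrative assumptions, not part of our trading stack), a model is fitted on labelled samples and then applied to unseen data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy training set: 4 samples, 3 features, binary target (made-up values)
X_train = np.array([[0.2, 1.1, 3.0],
                    [1.5, 0.3, 2.2],
                    [0.9, 2.4, 0.1],
                    [2.0, 0.7, 1.8]])
y_train = np.array([0, 1, 0, 1])

# training adjusts the model's parameters to minimize the loss on these samples
model = LogisticRegression().fit(X_train, y_train)

# the learned patterns are then applied to unseen data
X_new = np.array([[1.0, 0.5, 2.0]])
print(model.predict(X_new))
```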
A toolbox for integrating different frameworks
Utilizing Neural Networks for Deep Learning (Python)
We will show how to build Neural Networks with Python and how to explain Deep Learning through visualization, creating an explainer for model predictions.
Deep Learning is a type of machine learning that imitates the way humans gain certain types of knowledge, and it has become more popular over the years than standard models. While traditional algorithms are linear, Deep Learning models, generally Neural Networks, are stacked in a hierarchy of increasing complexity and abstraction (hence the “deep” in Deep Learning).
Neural Networks are based on a collection of connected units (neurons), which, just like the synapses in a brain, can transmit a signal to other neurons, so that, acting like interconnected brain cells, they can learn and make decisions in a more human-like manner.
Often data scientists first have to simplify these complex algorithms for the business, and then explain and justify the results of the models, which is not always simple with Neural Networks. The best way to do it is through visualization.
Environment setup: TensorFlow vs PyTorch.
Artificial Neural Networks breakdown: input, output, hidden layers, activation functions.
Deep Learning with deep neural networks.
Model design with TensorFlow/Keras.
Visualization of Neural Networks with Python.
Model training & testing.
Explainability with SHAP.
There are two main libraries for building Neural Networks: TensorFlow (developed by Google) and PyTorch (developed by Facebook). They can perform similar tasks, but the former is more production-ready while the latter is good for building rapid prototypes because it is easier to learn.
Those two libraries are favored by the community and by businesses because they can leverage the power of NVIDIA GPUs. That is very useful, and sometimes necessary, for processing big datasets like a corpus of text or a gallery of images.
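As a quick, hedged sanity check (the output depends entirely on your installation), TensorFlow can report whether a GPU is visible before training starts:

```python
import tensorflow as tf

# an empty list means training will fall back to the CPU
print(tf.config.list_physical_devices("GPU"))
```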
ANNs are made of layers with an input and an output dimension. The latter is determined by the number of neurons (also called “nodes”), computational units that connect the weighted inputs through an activation function (which helps the neuron switch on/off). The weights, as in most machine learning algorithms, are randomly initialized and optimized during training to minimize a loss function.
The layers can be grouped as:
The input layer has the job of passing the input vector to the Neural Network. If we have a matrix of 3 features (shape N x 3), this layer takes 3 numbers as input and passes the same 3 numbers to the next layer.
Hidden layers represent the intermediary nodes; they apply several transformations to the numbers in order to improve the accuracy of the final result, and their output is defined by the number of neurons.
The output layer returns the final output of the Neural Network. If we are doing a simple binary classification or regression, the output layer shall have only 1 neuron (so that it returns only 1 number); in the case of a multiclass classification with 5 different classes, it shall have 5 neurons.
The simplest form of ANN is the Perceptron, a model with one layer only, very similar to the linear regression model. Asking what happens inside a Perceptron is equivalent to asking what happens inside a single node of a multi-layer Neural Network… let’s break it down.
Let’s say we have a dataset of N rows, 3 features and 1 target variable (i.e. binary 1/0):
Just like in every other machine learning use case, we are going to train a model to predict the target using the features row by row. Let’s start with the first row:
What does “training a model” mean? Searching for the best parameters in a mathematical formula that minimize the error of your predictions. In regression models (e.g. linear regression) you have to find the best weights; in tree-based models (e.g. random forest) it’s about finding the best splitting points…
Usually, the weights are randomly initialized and then adjusted as the learning proceeds. Here I’ll just set them all to 1:
So far we haven't done anything different from a linear regression (which is pretty straightforward for the business to understand). Now, here’s the upgrade from a linear model Σ(xi*wi)=Y to a non-linear one f(Σ(xi*wi))=Y … enter the activation function.
The activation function defines the output of that node. There are many, and one can even create custom functions; you can find the details in the official documentation and have a look at this cheat sheet. If we set a simple linear function in our example, then we would have no difference from a linear regression model.
We shall use a binary step activation function that returns 1 or 0 only:
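(The NumPy sketch below illustrates this forward pass; the feature values are made up, and the weights are all set to 1 as above.)

```python
import numpy as np

x = np.array([0.4, 1.2, 2.5])   # 3 features of the first row (made-up values)
w = np.ones(3)                  # weights initialized to 1

def binary_step(z):
    # returns 1 if the weighted sum is positive, 0 otherwise
    return 1 if z > 0 else 0

z = np.sum(x * w)               # linear combination Σ(xi*wi)
output = binary_step(z)         # activation f(Σ(xi*wi))
print(z, output)
```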
We have the output of our Perceptron, a single-layer Neural Network that takes some inputs and returns 1 output. Now the training of the model would continue by comparing the output with the target, calculating the error and optimizing the weights, reiterating the whole process again and again.
And here’s the common representation of a neuron:
One could say that all the Deep Learning models are Neural Networks but not all the Neural Networks are Deep Learning models. Generally speaking, “Deep” Learning applies when the algorithm has at least 2 hidden layers (so 4 layers in total including input and output).
Imagine replicating the neuron process 3 times simultaneously: since each node (weighted sum & activation function) returns a value, we would have the first hidden layer with 3 outputs.
Now let’s do it again using those 3 outputs as the inputs for the second hidden layer, which returns 3 new numbers. Finally, we shall add an output layer (1 node only) to get the final prediction of our model.
Remember that the layers can have a different number of neurons and a different activation function, and in each node, weights are trained to optimize the final result. That’s why the more layers you add, the bigger the number of trainable parameters gets.
Now you can review the full picture of a Neural Network:
Bias: inside each neuron, the linear combination of inputs and weights includes also a bias, similar to the constant in a linear equation, therefore the full formula of a neuron is
f( Σ(xi * wi) + bias )
Backpropagation: during training, the model learns by propagating the error back into the nodes and updating the parameters (weights and biases) to minimize the loss.
Gradient Descent: the optimization algorithm used to train Neural Networks which finds the local minimum of the loss function by taking repeated steps in the direction of steepest descent.
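To make these ideas concrete, here is a sketch of a single backpropagation and gradient-descent step written with TensorFlow’s GradientTape; the toy batch and the one-neuron model are assumptions for illustration only:

```python
import tensorflow as tf

# toy batch: 4 samples, 3 features, binary targets (made-up values)
X = tf.constant([[0.1, 0.5, 2.0],
                 [1.2, 0.3, 0.7],
                 [2.1, 1.1, 0.2],
                 [0.4, 2.2, 1.5]])
y = tf.constant([[0.0], [1.0], [1.0], [0.0]])

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(3,))
])
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    y_pred = model(X, training=True)   # forward pass
    loss = loss_fn(y, y_pred)          # discrepancy between predictions and targets

# backpropagation: gradients of the loss with respect to weights and biases
grads = tape.gradient(loss, model.trainable_variables)
# gradient descent: one step in the direction of steepest descent
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```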
The easiest way to build a Neural Network with TensorFlow is with the Sequential class of Keras. Let’s use it to make the Perceptron from our previous example, so a model with only one Dense layer. It is the most basic layer as it feeds all its inputs to all the neurons, each neuron providing one output.
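A sketch of that Perceptron, matching the 3-feature example above (the layer and model names are our own):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Perceptron: one Dense layer with a single neuron fed by 3 inputs
model_perceptron = keras.Sequential(name="Perceptron", layers=[
    layers.Dense(units=1, input_shape=(3,), activation="linear", name="dense")
])
model_perceptron.summary()
```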
The summary function provides a snapshot of the structure and the size (in terms of parameters to train). In this case, we have just 4 (3 weights and 1 bias), so it’s pretty light.
If you want to use an activation function that is not already included in Keras, like the binary step function that I showed in the visual example, you gotta get your hands dirty with raw TensorFlow:
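(The sketch below is one possible implementation, assuming the same 3-feature Perceptron; binary_step_activation is our own helper name, built on the Keras backend.)

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def binary_step_activation(x):
    # return 1 where the weighted sum is positive, 0 otherwise
    return K.switch(x > 0, tf.ones_like(x), tf.zeros_like(x))

model_perceptron = tf.keras.Sequential(name="Perceptron", layers=[
    tf.keras.layers.Dense(units=1, input_shape=(3,),
                          activation=binary_step_activation, name="dense")
])
```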
Now let’s try to move from the Perceptron to a Deep Neural Network. Probably you are gonna ask yourself some questions:
1. How many layers? The right answer is “try different variants and see what works”. I usually work with 2 Dense hidden layers with Dropout, a technique that reduces overfitting by randomly setting inputs to 0. Hidden layers are useful for capturing the non-linearity of the data, so if you don’t need non-linearity then you can avoid hidden layers. Too many hidden layers will lead to overfitting.
2. How many neurons? The number of hidden neurons should be between the size of the input layer and the size of the output layer. My rule of thumb is (number of inputs + 1 output)/2.
3. What activation function? There are many and we can’t say that one is absolutely better. Anyway, the most used one is ReLU, a piecewise linear function that returns the output only if it’s positive, and it is mainly used for hidden layers. Besides, the output layer must have an activation compatible with the expected output. For example, the linear function is suited for regression problems while the Sigmoid is frequently used for classification.
We will assume an input dataset of N features and 1 binary target variable (i.e. a classification use case).
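Under those assumptions, a sketch of the deep network with the Sequential class could look like the following; n_features and the layer sizes are illustrative choices, not values from the text:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 10  # N, the number of input features (assumed for illustration)

model_deepnn = keras.Sequential(name="DeepNN", layers=[
    # hidden layer 1: roughly (inputs + 1 output) / 2 neurons
    layers.Dense(units=int(round((n_features + 1) / 2)), input_shape=(n_features,),
                 activation="relu", name="h1"),
    layers.Dropout(rate=0.2, name="drop1"),
    # hidden layer 2: roughly half of the previous hidden layer
    layers.Dense(units=int(round((n_features + 1) / 4)), activation="relu", name="h2"),
    layers.Dropout(rate=0.2, name="drop2"),
    # output layer: 1 neuron with a Sigmoid for the binary target
    layers.Dense(units=1, activation="sigmoid", name="output"),
])
model_deepnn.summary()
```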
Please note that the Sequential class isn’t the only way to build a Neural Network with Keras. The Model class gives more flexibility and control over the layers, and it can be used to build more complex models with multiple inputs/outputs. There are two major differences:
The Input layer needs to be specified while in the Sequential class it’s implied in the input dimension of the first Dense layer.
The layers are saved as objects and can be applied to the outputs of other layers, like: output = layer(…)(input)
This is how you can use the Model class to build our Perceptron and DeepNN:
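(The following is a sketch under the same assumptions as before: 3 input features for the Perceptron and n_features for the DeepNN; the original code is not reproduced here.)

```python
from tensorflow import keras
from tensorflow.keras import layers

# Perceptron
inputs = layers.Input(shape=(3,), name="input")
outputs = layers.Dense(units=1, activation="linear", name="dense")(inputs)
model_perceptron = keras.Model(inputs=inputs, outputs=outputs, name="Perceptron")

# DeepNN
n_features = 10  # assumed for illustration
inputs = layers.Input(shape=(n_features,), name="input")
h = layers.Dense(units=int(round((n_features + 1) / 2)), activation="relu", name="h1")(inputs)
h = layers.Dropout(rate=0.2, name="drop1")(h)
h = layers.Dense(units=int(round((n_features + 1) / 4)), activation="relu", name="h2")(h)
h = layers.Dropout(rate=0.2, name="drop2")(h)
outputs = layers.Dense(units=1, activation="sigmoid", name="output")(h)
model_deepnn = keras.Model(inputs=inputs, outputs=outputs, name="DeepNN")

model_perceptron.summary()
model_deepnn.summary()
```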
One can always check if the number of parameters in the model summary is the same as the one from Sequential.
Remember, visualization is our best ally. I prepared a function to plot the structure of an Artificial Neural Network from its TensorFlow model.
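The original helper is not reproduced here; the following is a simplified sketch of the same idea, drawing one column of nodes per layer with matplotlib:

```python
import matplotlib.pyplot as plt

def visualize_nn(model, figsize=(8, 6)):
    """Simplified sketch: draw one column of nodes per layer of a Keras model."""
    # neurons per layer: the input size followed by the units of each Dense layer
    layer_sizes = [model.input_shape[-1]]
    layer_sizes += [layer.units for layer in model.layers if hasattr(layer, "units")]

    fig, ax = plt.subplots(figsize=figsize)
    ax.axis("off")

    # place the nodes of each layer on a vertical line
    positions = []
    for i, size in enumerate(layer_sizes):
        ys = [j - (size - 1) / 2 for j in range(size)]
        positions.append([(i, y) for y in ys])

    # connect every node to every node of the next layer
    for left, right in zip(positions[:-1], positions[1:]):
        for x1, y1 in left:
            for x2, y2 in right:
                ax.plot([x1, x2], [y1, y2], color="grey", linewidth=0.5, zorder=1)

    # draw the nodes on top of the connections
    for column in positions:
        xs, ys = zip(*column)
        ax.scatter(xs, ys, s=400, color="steelblue", zorder=2)

    plt.show()

# usage: visualize_nn(model_perceptron); visualize_nn(model_deepnn)
```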
Let’s try it out on our 2 models, first the Perceptron:
then the Deep Neural Network:
TensorFlow provides a tool for plotting the model structure as well; we usually use it for more complex Neural Networks with more complicated layers (CNN, RNN, …).
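A sketch of how it can be called (plot_model requires the pydot and graphviz packages to be installed):

```python
from tensorflow.keras.utils import plot_model

# writes a diagram of the layers, including their input/output shapes
plot_model(model_deepnn, to_file="deepnn.png", show_shapes=True, show_layer_names=True)
```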
Finally, it’s time to train our Deep Learning model. In order for it to run, we must “compile” it, or, to put it another way, define the Optimizer, the Loss function, and the Metrics. We usually use the Adam optimizer, a replacement optimization algorithm for gradient descent and one of the most popular adaptive optimizers. The other arguments depend on the use case.
In (binary) classification problems, you should use a (binary) Cross-Entropy loss, which compares each of the predicted probabilities to the actual class output. As for the metrics, we like to monitor both the Accuracy and the F1-score, a metric that combines Precision and Recall (the latter must be implemented, as it is not already included in TensorFlow).
On the other hand, in regression problems, we usually set the MAE as the loss and the R-squared as the metric.
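A sketch of both setups follows; the F1 and R-squared helpers are common hand-rolled implementations built on the Keras backend (their names are our own), used here because they are not always available as ready-made Keras metrics:

```python
from tensorflow.keras import backend as K

def recall_metric(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + K.epsilon())

def precision_metric(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + K.epsilon())

def f1_metric(y_true, y_pred):
    precision = precision_metric(y_true, y_pred)
    recall = recall_metric(y_true, y_pred)
    return 2 * (precision * recall) / (precision + recall + K.epsilon())

def r2_metric(y_true, y_pred):
    ss_res = K.sum(K.square(y_true - y_pred))
    ss_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - ss_res / (ss_tot + K.epsilon())

# classification: binary cross-entropy loss, accuracy and F1 as metrics
model_deepnn.compile(optimizer="adam", loss="binary_crossentropy",
                     metrics=["accuracy", f1_metric])

# regression alternative: MAE loss with R-squared as the metric
# model_regression.compile(optimizer="adam", loss="mean_absolute_error", metrics=[r2_metric])
```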
Before starting the training, we also need to decide the Epochs and Batches: since the dataset might be too large to be processed all at once, it is split into batches (the higher the batch size, the more memory space you need). Backpropagation and the consequent parameter updates happen every batch. An epoch is one pass over the full training set. So, if you have 100 observations and the batch size is 20, it will take 5 batches to complete 1 epoch. The batch size should be a power of 2 (commonly 32, 64, 128, 256) because computers usually organize memory in powers of 2. I tend to start with 100 epochs and a batch size of 32.
During the training, we would expect to see the metrics improving and the loss decreasing epoch by epoch. Moreover, it’s good practice to keep a portion of the data (20%-30%) for validation. In other words, the model will set apart this fraction of data to evaluate the loss and metrics at the end of each epoch, outside the training.
We can launch and visualize the training as follows:
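(The sketch below assumes X and y hold the training features and targets, which are not defined in this text.)

```python
import matplotlib.pyplot as plt

# 100 epochs, batches of 32, 30% of the data kept aside for validation
history = model_deepnn.fit(X, y, batch_size=32, epochs=100,
                           validation_split=0.3, verbose=0)

# loss and validation loss epoch by epoch
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.legend()
plt.show()
```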
Those plots are taken from two actual use cases which compare standard machine learning algorithms with Neural Networks.
Having trained and tested a simple model, we can go further and build an explainer to show whether our deep learning model really is a black box.
SHAP works well in conjunction with neural networks: for a given prediction, it can estimate the contribution of each feature to the model's predicted value. Basically, it answers the question “why does the model say this is a 1 and not a 0?”.
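A sketch of how such an explainer can be built with the shap library; X and X_test are assumed to be the training and test features (not defined in this text), and the model-agnostic KernelExplainer is an alternative if DeepExplainer does not support your TensorFlow version:

```python
import shap

# a sample of the training data serves as the background distribution
background = X[:100]
explainer = shap.DeepExplainer(model_deepnn, background)

# contribution of each feature to the predictions on a few test rows
shap_values = explainer.shap_values(X_test[:10])
shap.summary_plot(shap_values, X_test[:10])
```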
We demonstrated the design and construction of both deep and non-deep artificial neural networks using the simplest approach. We broke down the process step by step, highlighting what happens within individual neurons and more generally within layers. We then utilized TensorFlow to create various neural networks, starting from perceptrons to more complex architectures. Subsequently, we trained deep learning models and evaluated their interpretability in classification and regression use cases.
The trading model we actually designed is significantly more complex and involves countless parameters that need to be considered, and the volume of data is far larger. In addition, the models must be continuously updated and retrained on new data to ensure the effectiveness of the trading model sequence within the core system.
We are always on the path of seeking exploration and discovering answers. That is what drives our work.