Data Science – AlgoTrading101 Wiki

What is Negative Correlation?

Jovan Medford — Mon, 20 Jul 2020 21:55:35 +0000

Negative correlation occurs when the rise in one item accompanies a fall in another.

Example of Negative Correlation

When gas prices go up, stocks of shipping companies tend to fall, and vice versa.

When interest rates go up, bond prices tend to fall, and vice versa.

In the short run, when stock prices go up, bond prices tend to fall, and vice versa.

Negative correlation between a shipping stock and oil prices

Why is this Important to You?

Negatively correlated assets provide diversification to an investment portfolio.

Imagine you had a portfolio made up entirely of positively correlated instruments. If the price of one instrument rose, the prices of all the others would theoretically rise as well. Holding such a portfolio is essentially putting all of your eggs in one basket: if you were wrong about the market direction, you would lose on all of your investments.

The current market conditions completely determine the success of a non-diversified portfolio. As a trader, your goal is to use strategies that take control of making profits regardless of the market situation. Therefore, managing risk is essential to increasing your effectiveness.

Note, however, that decreasing your risk in this way often lowers the highest potential return. For instance, if you held equal amounts of two perfectly negatively correlated assets that move by the same percentage, your return would be 0, because the profit from one asset would be canceled out by the loss in the other.

On the flip-side, managing your risk increases your probability of having a positive return. This is where the risk/reward trade-off comes from.
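The effect of correlation on risk can be sketched with the standard two-asset portfolio volatility formula. This is only an illustration: the function name `portfolio_volatility`, the 50/50 weights and the 20% volatilities are assumptions chosen for the example, not figures from this article.

```python
import math

def portfolio_volatility(w1, w2, vol1, vol2, corr):
    """Volatility of a two-asset portfolio with weights w1 and w2."""
    variance = (
        (w1 * vol1) ** 2
        + (w2 * vol2) ** 2
        + 2 * w1 * w2 * vol1 * vol2 * corr
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

# Hypothetical inputs: two assets, each with 20% volatility, held 50/50.
for corr in (1.0, 0.0, -1.0):
    print(corr, round(portfolio_volatility(0.5, 0.5, 0.2, 0.2, corr), 4))
```

With a correlation of 1 the portfolio is as volatile as each asset on its own, while with a correlation of -1 the two positions offset each other and the volatility drops to roughly zero.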

Some risks are inherent to a sector which is also something to be mindful of. For instance, in a year of drought, the entire farming industry will be affected. This emphasizes the importance of holding assets across multiple sectors to achieve a balanced portfolio.

Understanding Negative Correlation

To be a little bit more precise, correlation is a statistical concept that measures the linear relationship between two variables.

The correlation coefficient is a number between -1 and 1 which describes the strength and direction of a correlation.

Two perfectly positively correlated variables have a correlation coefficient of 1.

Two perfectly negatively correlated variables have a coefficient of -1.

If the coefficient is 0 then we say there is no linear relationship. We say no linear relationship since the correlation coefficient cannot determine whether there are other more complex relationships.
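As a rough sketch, the (Pearson) correlation coefficient can be computed by hand as follows. The price lists are made-up numbers used only to show the two extremes discussed above, and the helper name `pearson_correlation` is an assumption.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical prices: the airline falls by 3 whenever oil rises by 5.
oil = [50, 55, 60, 65, 70]
airline = [40, 37, 34, 31, 28]
print(pearson_correlation(oil, airline))   # close to -1

# A symmetric, non-linear relationship: clearly related, yet no linear link.
parabola = [(x - 60) ** 2 for x in oil]
print(pearson_correlation(oil, parabola))  # close to 0
```

The second case shows why a coefficient of 0 only rules out a linear relationship: the parabola depends entirely on the oil price, but the coefficient does not pick it up.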

How do you chart a correlation?

In practice, we can visualize the correlation between two stocks by fitting a linear regression line through historical price data. The sign of the correlation coefficient matches the direction of the regression line (in red) in the example below: a negative correlation gives a downward-sloping line.

This graph is a little extreme but it shows the relationship between oil prices and the price of airline stock. Since the price of fuel is such a major factor in the airline costs, it plays a direct role in profitability of airline operations. Therefore, as the cost of oil rises, the price of airline stock falls.

The post What is Negative Correlation? appeared first on AlgoTrading101 Wiki.

What are Parametric Equations?

Lucas Liew — Thu, 16 Jul 2020 11:21:02 +0000

Parametric equations are math statements that describe a relationship between 2 items via a common third item.

Examples of Parametric Equations

Parametric equations come in pairs.

Understanding Parametric Equations

To understand parametric equations, you need to understand regular mathematical equations.

If you don’t understand regular equations, detour over here to learn them: What is an Equation?

Simplifying a parametric equation

If an apple costs 2 dollars and a banana costs 1 dollar, one apple can buy you two bananas.

Thus, in math terms, Apple = 2 Bananas.

I.e. Apple = 2 x Banana. Or in other words,

Apple = 2 Banana

Similarly, suppose an orange costs 4 dollars and a banana still costs 1 dollar.

Orange = 4 Banana

This tells us that we can get 2 apples for 1 orange.

2 Apple = Orange

Converting Fruits to Math

Let’s use a single letter to represent the names apple, banana and orange. I will call them y, t and x respectively.

Thus, Apple = 2 Banana becomes

y = 2t

Orange = 4 Banana becomes

x = 4t

In this case, the pair of equations are a parametric equation because they describe the relationship between y (apples) and x (oranges) using a 3rd item t (banana).

In the above math equations, the common item t is known as the parameter, while x and y are the variables it links together.

Merging a parametric equation

From the above fruit example, we saw that 2 Apple = Orange.

Therefore, 2 Apple = Orange becomes

2y = x

The above equation is a merged form of the parametric equations:

y = 2t
x = 4t

To merge a parametric equation into a single equation, rearrange the common parameter to one side. We shall put the parameter t on its own, on the right side.

y / 2 = t
x / 4 = t

Now equate both left sides. Hence,

y / 2 = x / 4

Multiply both sides by 4 to beautify it.

2y = x
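The merging step can be checked numerically. In this small sketch, t stands for the banana, y for the apple and x for the orange, as defined earlier; the function names are assumptions made for the example.

```python
def apple_value(t):
    """y = 2t: one apple is worth two bananas."""
    return 2 * t

def orange_value(t):
    """x = 4t: one orange is worth four bananas."""
    return 4 * t

for t in (1, 2, 5, 10):
    y = apple_value(t)
    x = orange_value(t)
    # The merged equation 2y = x holds whatever value t takes.
    print(t, y, x, 2 * y == x)
```

Whatever value the parameter t takes, two apples are always worth one orange, which is exactly what the merged equation says.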

Why is it Important to You?

Parametric equations help us understand the relationship between 2 variables when both are related to a common third variable (the parameter).

In finance, we use parametric equations to understand the relationship between different financial products.

One use of parametric equations is to check the sensitivity of stocks against the overall market movement.

Knowing these sensitivities will allow us to size our bets when running a trading strategy that involves buying and shorting (To short a stock is to bet that it will drop).

One example is the pair trading strategy.

Sensitivity to the overall market

The overall market is a term that refers to the majority of stocks. This often refers to a group of stocks that represent the local stock market.

In the US, the most popular group of stocks is the S&P500. The S&P500 is a group of 500 major stocks in the US.

When the overall market moves, individual stocks will likely move in a similar manner.

However, each stock might move to a different extent.

Let’s assume that when the S&P500 moves by 1%, Tesla moves by 2% and General Motors (GM) moves by 0.5%.

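Thus, the parametric equations are:

Tesla move = 2 × S&P500 move
GM move = 0.5 × S&P500 move

The S&P500 is our common factor. Therefore:

Tesla move / 2 = GM move / 0.5

Beautifying it…

Tesla move = 4 × GM move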

We now know that a 4% move in Tesla is equivalent to a 1% move in GM.

Using this information in a Pair Trading Strategy

A pair trading strategy involves buying one stock and shorting another at the same time.

The idea of this strategy is to cancel out the exposure to the overall market while betting that one stock does better than the other.

If we were to buy Tesla and short GM, we could buy 1 share of Tesla and short 4 shares of GM (assuming, for simplicity, that both shares trade at the same price).

This way, if the S&P500 moves up 1%, we should gain 2% on our Tesla share while our short GM position should lose the equivalent of 2% (0.5% * 4 shares).

Netting them (2% – 2%) will result in no movement for our pair trade.
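The sizing logic can be written as a short sketch. The sensitivities of 2 and 0.5 are the hypothetical figures from the example above, the function names are assumptions, and the calculation assumes positions of equal value per unit.

```python
def hedge_ratio(sensitivity_long, sensitivity_short):
    """Units of the short leg needed per unit of the long leg."""
    return sensitivity_long / sensitivity_short

def net_market_exposure(market_move, sensitivity_long, sensitivity_short, ratio):
    """Net move of the pair (in % of one unit) for a given market move in %."""
    long_leg = sensitivity_long * market_move
    short_leg = -sensitivity_short * market_move * ratio
    return long_leg + short_leg

# Tesla moves 2% and GM moves 0.5% for every 1% move in the S&P500.
ratio = hedge_ratio(2.0, 0.5)
print(ratio)                                       # 4.0
print(net_market_exposure(1.0, 2.0, 0.5, ratio))   # 0.0
```

In practice the two stocks rarely trade at the same price, so the ratio is usually applied to the dollar value of each leg rather than to the share count.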

Our pair trade should not fluctuate when the overall markets move, but will work in our favor if Tesla’s business performs better than GM.

This is the bet we are making in this hypothetical scenario.

Stochastic Calculus

Lucas Liew — Fri, 07 Jun 2019 09:00:21 +0000

Definition

Stochastic calculus is a way to conduct regular calculus when there is a random element.

Regular calculus is the study of how things change and the rate at which they change.

Description

Think of stochastic calculus as the analysis of regular calculus + randomness.

Regular Calculus

Regular calculus studies the rate at which things change.

Just a normal chart

At W, there is 0 change
At X, there is an increasing increase
At Y, there is a constant increase
At Z, there is a decreasing increase

Credits to coolmath.com!

The red lines indicate the rate of increase at the black dots.

As the red lines become steeper, the increase in values is going up at a faster pace.

Imagine that we are climbing up a ladder, but now we are climbing up at a faster pace.

Randomness

Let’s talk about randomness before combining this with the earlier section on regular calculus.

This is what a bunch of random charts look like:

Randomness charts over time

This behavior is described as Brownian motion.

This means that each individual path is random, but if we collect the end values of enough paths, their distribution resembles a bell shape. In other words, the end values are normally distributed.

Brownian motion. Image credits to link.springer.com.

This randomness is not so random after all. The end result of all these random movements is a bell shaped output. (See the bell shape by tilting your head to the right.)

That means that most of the data points end in the middle while the rest are spread out across the sides.

More info on Normal Distribution: Normal Distribution – MathIsFun

In Brownian motion, the values can be negative. However, stock prices can’t be negative.

Thus, in finance, we use geometric Brownian motion to model our stock prices.

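Geometric Brownian motion (GBM) is essentially regular Brownian motion with a drift, applied to percentage changes instead of absolute changes. Because each move is proportional to the current value, the values stay positive.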

Geometric Brownian motion. Image credits to link.springer.com.

The end result of all these GBM movements is a skewed bell shaped output.

This skewed bell-shaped curve no longer resembles a normal distribution. It now resembles a log-normal distribution.

Top: Log-normal distribution. Bottom: Normal Distribution.
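The difference between the two processes can be explored with a small simulation. This is a sketch under assumptions: the drift, volatility, start value and number of paths are arbitrary, and the GBM end value uses the standard closed-form solution over one unit of time rather than anything specific to this article.

```python
import math
import random

def simulate_end_values(n_paths, n_steps, drift, vol, start, seed=0):
    """End values of Brownian motion and GBM paths over one unit of time."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    bm_ends, gbm_ends = [], []
    for _ in range(n_paths):
        w = 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, math.sqrt(dt))  # one Brownian increment
        # Plain Brownian motion (scaled by vol): can fall below zero.
        bm_ends.append(start + vol * w)
        # GBM: exponentiate a drifted Brownian motion, so it stays positive.
        gbm_ends.append(start * math.exp((drift - 0.5 * vol ** 2) + vol * w))
    return bm_ends, gbm_ends

bm, gbm = simulate_end_values(n_paths=1000, n_steps=50, drift=0.05, vol=0.8, start=1.0)
print("lowest Brownian end value:", min(bm))
print("lowest GBM end value:", min(gbm))
```

Plotting a histogram of `bm` should give a roughly symmetric bell shape, while a histogram of `gbm` should be skewed to the right, in line with the log-normal shape described above.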

Stochastic Calculus = Regular Calculus + Randomness

When we zoom in on a curve chart, we get a nice curve line. We can then measure the rate of increase using those slopes.

A regular non-random chart

The curve is smooth

Now let’s look at a chart with randomness.

A price chart with randomness

If we zoom in, we see that it looks… somewhat the same.

After zooming in, it still looks random.

We can keep zooming in but we will not be able to find a smooth curve. Without a smooth curve, we can’t draw those slope lines productively.

Thus, normal calculus will fail here. This is why we need stochastic calculus.

Stochastic Calculus Mathematics

The main aspects of stochastic calculus revolve around Itô calculus, named after Kiyoshi Itô.

The main equation in Itô calculus is Itô’s lemma. This equation takes into account Brownian motion.

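Itô’s lemma:

dX = A dt + B dW

Explanation: Change in X (dX) = Constant A * change in time (dt) + Constant B * change due to randomness as modeled by Brownian motion (dW).

Strictly speaking, this equation describes an Itô process, the kind of random process that Itô’s lemma is applied to.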

Which means the change in the value of a variable = some constant value over time + change due to randomness multiplied by another constant.

More info on the derivation of Itô’s lemma: Derivation of Itô’s lemma by Math Partner

A variation of Itô’s lemma that uses GBM is:

dX = A X dt + B X dW

Before we explain it, let’s replace X (a regular variable) with S (stock price) so that you can visualize this better.

In this case, we try to link the equation to finance. Let S be stock price.

dS = A S dt + B S dW

Explanation: Change in S (dS) = Constant A * Current S * change in time (dt) + Constant B * Current S * change due to randomness as modeled by Brownian motion (dW)

Which means the change in the stock price = current stock price multiplied by some constant value over time + current stock price multiplied by the change due to randomness and by another constant.

That should intuitively make sense as over time, the change of the stock price is based on some overall trend (the Constant A part) and an element of randomness (the Constant B part and randomness part).

Constant A and Constant B are usually derived by analyzing historical market data.
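One common way to estimate the two constants from price history is through log returns. The sketch below is an assumption about how this could be done, not the method used by any particular desk: the prices are made up, `dt` is the length of one time step in years, and the function name is hypothetical.

```python
import math
import statistics

def estimate_gbm_constants(prices, dt):
    """Estimate Constant A (trend) and Constant B (randomness) from prices."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    # Constant B: standard deviation of log returns, scaled to one unit of time.
    b_const = statistics.stdev(log_returns) / math.sqrt(dt)
    # Constant A: mean log return per unit of time plus half the variance.
    a_const = statistics.mean(log_returns) / dt + 0.5 * b_const ** 2
    return a_const, b_const

# Hypothetical daily closing prices, assuming 252 trading days a year.
prices = [100.0, 101.5, 100.8, 102.3, 103.1, 102.6]
a_const, b_const = estimate_gbm_constants(prices, dt=1 / 252)
print(a_const, b_const)
```

With only a handful of prices the estimates are very noisy, and they change depending on which historical window is used.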

Finance and Stochastic Calculus

This is where we relate everything we’ve just said to finance.

In 1900, Louis Bachelier, a mathematician, first introduced the idea of using Brownian motion to model stock prices.

His theory was later built upon by Paul Samuelson, who proposed the geometric version (GBM), and by Fischer Black, Myron Scholes and Robert Merton in their work on options pricing. Scholes and Merton won the 1997 Nobel Prize in Economics for it.

Essentially, these mathematicians argue that GBM can be used to model stock prices because it is said that:

The GBM process has only positive values. Stock prices only have positive values.
The percentage change of a GBM process in the next time period has nothing to do with its changes in the last time period. Similarly, it is said that the change in the stock price in the next time period has nothing to do with the last time period.
The GBM chart is rough and random. Stock prices look rough and random.
Calculations with GBM processes are relatively easy

However, those points above are debatable.

In reality, the randomness and volatility change over time. In GBM, the volatility is assumed to be constant.
In reality, there are sudden jumps in prices. In GBM, there are not.
In reality, the stock prices may not be random and log-normally distributed in the long run. In GBM, they are.

Stochastic calculus, as applied to finance, is a form of pseudoscience. There are assumptions that may not hold in real life. Some of the assumptions are there for the convenience of mathematical modelling.

Black Scholes Model – Application to Finance

The most famous application of stochastic calculus to finance is to price options (options are special financial instruments that give the holder the choice to buy or sell an asset at a certain price).

The main intuition is that the price of an option is the cost of hedging it.

By hedging, we mean that we can separately create a combination of stocks and cash to mimic the market exposure of the option.

Thus, the cost of this hedging process should be the price that option is worth.

Price of option = cost of hedging with stock and cash.

Now, we can calculate the price of the option if we assume that the stock can be modeled using Itô’s lemma, which brings us back to the equation above:

dS = A S dt + B S dW

Using the above equation and the fact that the price of the option = cost of hedging with stock and cash, we can derive our Black-Scholes equation

Black-Scholes Equation

We are not going to do the derivation here as it is too technical.

Here is the derivation: Paul Wilmott on Quantitative Finance, Chapter 5, Black-Scholes

Once you solve that equation and turn it into a form that we can plug in figures and use, you’ll get the Black-Scholes Formula:
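For a European call option on a stock that pays no dividends, the formula is commonly written as C = S N(d1) - K e^(-rT) N(d2), where N is the standard normal cumulative distribution function. A minimal Python sketch of that form follows; the parameter names and the sample inputs are assumptions made for illustration.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(spot, strike, years, rate, vol):
    """Price of a European call on a non-dividend-paying stock."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (
        vol * math.sqrt(years)
    )
    d2 = d1 - vol * math.sqrt(years)
    return spot * normal_cdf(d1) - strike * math.exp(-rate * years) * normal_cdf(d2)

# Hypothetical inputs: stock at 100, strike 100, 1 year to expiry,
# 5% risk-free rate and 20% volatility.
print(round(black_scholes_call(100.0, 100.0, 1.0, 0.05, 0.2), 2))  # 10.45
```

Here `vol` plays the role of Constant B from the stock price equation, while the stock's own trend (Constant A) does not appear: in the hedging argument it is replaced by the risk-free rate.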

This is how you get from the equation to the formula:
Solution of the Black-Scholes Equation – University of Nebraska (warning: It gets technical)

Backtesting Biases and Risks

Lucas Liew — Wed, 15 May 2019 16:22:44 +0000

Definition

Backtesting biases refer to how the results of a trading strategy backtest can be misleading.

Description

Here are the 8 common biases:

Black Swan Reconciliation
Survivorship Bias
Spreads
Cost of carry/Holding costs
Inaccurate Price Simulation
Change in Contract Specifications
Look-ahead Bias
Curve-Fitting and Optimization Bias

Black swans in real life

Bias 1 – Black Swan Reconciliation

Black swan events refer to events that come as a surprise and have a huge impact.

Brokers and exchanges may alter the prices of assets after volatile price moves (black swan events). There are 2 types of alteration.

Type 1 – Changing the fill price

After an unexpected large price move, brokers and exchanges might change the prices that you got filled on your trades.

Example

EURUSD is trading at 1.1300. A black swan event occurs and EURUSD spikes up 2000 pips (to 1.3300) (1 pip = $0.0001).

You long EURUSD 1000 pips into the 2000 pips move. You are long EURUSD at 1.2300. It is now trading at 1.3300. You close the trade at a 1000 pips profit.

A few hours after the trade, you receive an email saying that “In view of this unexpected event, all trades will be cleared at 1.1800 price”.

Your 1000 pips profit becomes a 500 pips loss. If you were heavily leveraged, your account gets wiped out.
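The arithmetic of this example can be checked with a few lines of code. The helper name `pips` is an assumption; the prices are the ones from the example.

```python
PIP = 0.0001  # 1 pip in EURUSD

def pips(entry, exit_price):
    """Profit or loss in pips for a long position."""
    return round((exit_price - entry) / PIP)

print(pips(1.2300, 1.3300))  # 1000: the profit you thought you had
print(pips(1.2300, 1.1800))  # -500: the result after the trade is re-priced
```

The re-pricing turns a long entered at 1.2300 into a losing trade, because the new clearing price of 1.1800 sits below the entry price.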

Real example: Saxo Trades Lawsuits With Clients After Swiss Currency Turmoil

Type 2 – Changing their historical price

After an unexpected large price move, brokers and exchanges might not change the prices that you got filled on your trades.

However, they alter the price on the historical charts and data. Thus, the prices you see in your charts are different (almost always worse) than the prices you get in live trading.

In your backtests, you might have bought Apple shares at $180, but in real life, you would have gotten those shares at $250.

Bias 2 – Survivorship Bias

Survivorship bias, or survival bias, refers to the fact that people overlook entities/processes that failed because they only see successful entities/processes.

Example

We are selecting a bunch of stocks to trade. We create a list of criteria to identify potentially successful stocks.

Next, we filter the universe of stocks listed in the US based on these criteria.

And with that, survivorship bias just got to us. This universe of stocks only includes stocks that have survived. There may be stocks that were delisted but fit our criteria.

We need to consider those stocks as well to give us an idea of how sound our strategy is.
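A toy calculation shows how much the picture can change once delisted stocks are included. All tickers and returns below are invented for the illustration.

```python
# Hypothetical annual returns for stocks that matched our criteria.
returns = {"AAA": 0.12, "BBB": 0.08, "CCC": 0.15, "DDD": -1.00, "EEE": -0.60}
delisted = {"DDD", "EEE"}  # no longer in today's list of US stocks

def average(values):
    values = list(values)
    return sum(values) / len(values)

survivors_only = average(r for ticker, r in returns.items() if ticker not in delisted)
full_universe = average(returns.values())

print(round(survivors_only, 4))  # what a backtest on today's listings sees
print(round(full_universe, 4))   # what the strategy would really have earned
```

In this made-up case the survivors alone average a gain of about 11.7%, while the full universe averages a loss of 25%.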

Bias 3 – Spreads

The difference between the price we can buy at (ask price) and the price we can sell at (bid price) is called the spread.

Spreads change in real time. They depend on the buyers and sellers on exchanges, or on brokers.

During volatile events, spreads usually widen, sometimes by 100 times.

Without accurate bid and ask data, these spread widening events will make our backtests inaccurate.
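To see how much the spread can matter, compare a trade priced at the mid price with the same trade priced at the quotes you would actually get. The quotes below are hypothetical, and the function name is an assumption.

```python
def long_trade_pnl(entry_bid, entry_ask, exit_bid, exit_ask, use_spread=True):
    """P&L per unit of a long trade: buy at the ask, sell at the bid."""
    if use_spread:
        return exit_bid - entry_ask
    # Naive backtest: assume we always trade at the mid price.
    entry_mid = (entry_bid + entry_ask) / 2
    exit_mid = (exit_bid + exit_ask) / 2
    return exit_mid - entry_mid

# Hypothetical EURUSD quotes: a normal 2-pip spread on entry,
# and a spread that has widened to 100 pips on exit.
quotes = (1.1299, 1.1301, 1.1300, 1.1400)
print(long_trade_pnl(*quotes, use_spread=False))  # looks like a profit
print(long_trade_pnl(*quotes, use_spread=True))   # actually a small loss
```

A backtest that only has mid or last-traded prices would record the first number, even though the second is what the account would have shown.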

Bias 4 – Cost of carry/Holding costs

If you are leveraged (you trade a size larger than your capital by borrowing from the broker), shorting or trading a derivative, you might need to pay interest to hold your positions.

This interest represents the fees needed to cover the capital loaned to you, or the costs to hold any underlying assets.

These holding costs might vary without warning during the lifetime of a trade. Hence, it is difficult to estimate these costs in your backtest.

Example

The usual interest cost to short a stock is less than 2% a year.

However, for a period in early 2019, the cost to short Tilray, a cannabis stock, shot up to over 800% a year.
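A rough calculation shows what such a jump means for a position. This sketch assumes simple (non-compounded) interest on a 365-day year and a hypothetical $10,000 short held for 30 days; real brokers may calculate the fee differently.

```python
def borrow_cost(position_value, annual_rate, days_held):
    """Simple interest paid to hold a borrowed position."""
    return position_value * annual_rate * days_held / 365

print(round(borrow_cost(10_000, 0.02, 30), 2))  # 16.44 at 2% a year
print(round(borrow_cost(10_000, 8.00, 30), 2))  # 6575.34 at 800% a year
```

A backtest that assumes the usual 2% would understate the holding cost of this trade by several thousand dollars.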

Bias 5 – Inaccurate Price Simulation

Not all backtesters replicate the exact historical price movement; some use simulated price movements instead.

This might not be significant if you make a few trades a year and analyze the market using end-of-day data.

However, your backtest results will be greatly skewed if your strategy is related to scalping (price action and movement) and fires many trades per day on lower timeframe data.

Bias 6 – Change in Contract Specifications

An exchange or broker may change the contract specifications (i.e. details) of their products.

For instance, they may increase margin requirements, change the settlement specifications or contract size of their products. These may lead to jumps in market prices.

The main takeaway here is – in such cases, do not take a price change at face value. Your P&L may not change proportionally to a price change.

For instance, increasing the margin requirement for silver may cause silver prices to fall. In your backtest, your short silver position may look like it is doing well. However, if you had traded that move in real life, you might have received a margin call and been forced to close the position.

Bias 7 – Look-ahead Bias

Look-ahead bias involves having prior knowledge of how the market behaves before running a backtest.

Example

You want to run a strategy that takes advantage of trends. You look for assets that trend and discard those that don’t trend.

You then run a backtest on these assets using a trending strategy. Unsurprisingly, your strategy does well.

These tests are not useful as you have only chosen assets that you know would have done well in your backtests.

Bias 8 – Curve-Fitting and Optimization Bias

Curve fitting is the process of adapting a trading system so closely to the past that it becomes ineffective in the future.

Optimizing strategies too closely to past data will result in inflexibility to adapt to the future. Hence, it leads to poor performance in the future.

We need to adapt our trading strategies to signals in historical data, not noise.

Curve fitting data points
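The difference between fitting the signal and fitting the noise can be sketched with two models trained on the same points. Everything here is an assumption made for illustration: the data is invented (roughly y = x plus noise), the "simple" model is a least-squares line, and the "curve-fitted" model is a polynomial forced through every training point.

```python
def fit_line(xs, ys):
    """Least-squares straight line: captures the overall trend only."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_exact_polynomial(xs, ys):
    """Lagrange polynomial that passes through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.3, 0.8, 2.4, 2.7, 4.5, 4.6]  # hypothetical: about y = x plus noise

line = fit_line(train_x, train_y)
wiggly = fit_exact_polynomial(train_x, train_y)

print(mean_abs_error(line, train_x, train_y))    # small but not zero
print(mean_abs_error(wiggly, train_x, train_y))  # essentially zero
print(line(6), wiggly(6))                        # predictions for unseen x = 6
```

The polynomial looks perfect on past data, yet its prediction for the unseen point is far from the underlying trend, while the plain line stays close to it.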

Big Data

Lucas Liew — Tue, 30 Apr 2019 18:45:37 +0000

Definition

Big data is a field that involves analyzing and managing huge amounts of data.

Description

Similar to smaller data sets, the usual aim of big data is to derive insights from large data sets.

There isn’t a specific size to determine if a data set is big enough to be considered big data.

A data set can be considered big data if the organization has difficulty using traditional methods, software and databases to manage its data.

Characteristics of Big Data

Volume

This refers to the quantity of data.

Velocity

This refers to the speed at which the data is received and needs processing.

Data from real-time sources usually requires much faster management and processing capabilities, especially when the insights from the data need to be extracted quickly.

Variety

This refers to the type of data. The common types are:

Text
Numbers
Audio
Imagery
Video

Another way to categorize data is structured vs unstructured data.

Structured data is organised and formatted in a way that is easily searchable, processed and analyzed.

Unstructured data has no pre-defined organization or format. This makes it harder to search, process and analyze.

Veracity

This refers to accuracy of the data.

Value

This refers to how much useful insight can be derived from the data.

Variability

This refers to the consistency of the flow of data. The creation of some data peaks during certain times, days or months, but slows down during other times.

Complexity

This refers to how complex it is to clean, match, link and manage the data. This characteristic is especially important when there are multiple data sources.

Big data in Industries

Big data is common in the following industries:

Manufacturing
Media
Government
Social Media
Finance
Healthcare
Insurance
Technology

Examples of Big Data in Action

Millions of surveillance cameras capture videos of the public across the country. Machine learning is then used to identify faces.
Spotify tracks the data of its users. It then analyzes this data to recommend the users music they might like.
Uber generates and uses a huge amount of data regarding drivers, their vehicles, locations and every trip from every vehicle. This data is analyzed to predict demand, supply and the location of drivers, and to decide whether to slap on a surcharge.

Links to Complicated Explanations

Big Data – Wikipedia

Data Science

Lucas Liew — Tue, 30 Apr 2019 08:44:35 +0000

Definition

Data science is a field that focuses on extracting useful information from data.

Description

The aim of data science is to get predictive or useful information from data.

Data science has become a buzzword that can be broadly used to represent business analytics, business intelligence and predictive modeling.

3 Concepts of Data Science

Data science combines the fields of strategy, statistics and programming.

Strategy

Since the aim of data science is to extract useful information for a certain goal, data scientists need to understand the goal well.

Examples of such goals are to:

Improve business revenue
Lower business costs
Find trading opportunities in the markets
Solve engineering tasks
Create self-driving cars

Once the data scientist understands the goal and its underlying mechanics, he or she will be able to devise an appropriate strategy to analyze and extract information that will be useful for that goal.

Statistics

The data scientist needs good knowledge of statistics in order to analyze the data in an appropriate way.

Misusing statistics might lead to results that are misleading or erroneous.

Machine learning and big data management are complementary skills here.

Programming

Programming skills are needed for the data scientist to apply their statistical skill to the data.

Examples of Data Science

Google uses its vast amount of data to determine which search results are the most relevant.
Netflix applies machine learning to its users’ data to determine which shows they are more likely to be keen on.
Paypal analyzes its users and their transactions to spot possible fraud.

Links to Complicated Explanations

Data Science – Wikipedia

Machine Learning

Lucas Liew — Thu, 11 Apr 2019 09:58:28 +0000

Definition

Machine learning techniques enable computers to do things without being told explicitly how to do them.

Description

The essence of machine learning is the ability of computers to learn by analyzing data or through their own experience.

Traditional Computing Rules:

If an image has 4 legs, fur, pointy ears and whiskers, label it as a cat.

Machine Learning Rules:

We give the computer 1000 cat pictures and 1000 pictures that are not cats. After analyzing these 2000 pictures, the computer will be able to tell if a picture contains a cat.

Advantages of Machine Learning

Being able to analyze large quantities of data without being explicitly told what to look for
Being able to understand texts (in large quantities and different languages)
Being able to interpret images
Being able to come up with creative solutions
Being able to analyze and output a prediction fast

Machine Learning Training Techniques

Machine learning techniques are essentially methods to train a computer. A computer has to be trained before it can perform on its own.

There are 3 main types of training techniques – 1) Supervised Learning, 2) Unsupervised Learning and 3) Reinforcement Learning

Supervised Learning

We train our computers with data that is labelled correctly.

The above cat example uses a supervised training method. The computer analyzes the labelled cat data and creates a set of rules on its own to decide what defines a cat.
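A tiny sketch of supervised learning is shown below, using a 1-nearest-neighbor rule (the simplest form of K Nearest Neighbors) instead of real image analysis. The two numeric features per picture and all of the values are invented for the example.

```python
def nearest_neighbor_label(labelled_examples, features):
    """Return the label of the training example closest to `features`."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    closest = min(labelled_examples, key=lambda ex: squared_distance(ex[0], features))
    return closest[1]

# Hypothetical labelled data: (weight in kg, ear pointiness from 0 to 1).
labelled_examples = [
    ((4.0, 0.9), "cat"),
    ((5.0, 0.8), "cat"),
    ((30.0, 0.3), "not cat"),
    ((25.0, 0.2), "not cat"),
]

print(nearest_neighbor_label(labelled_examples, (4.5, 0.85)))  # cat
print(nearest_neighbor_label(labelled_examples, (28.0, 0.25))) # not cat
```

The labels are what make this supervised: the computer never receives a definition of a cat, only examples that have already been marked correctly.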

Unsupervised Learning

The computer is given a set of data without labels, and it has to make sense of it.

Unsupervised learning is mainly used to find patterns and common traits between the data points.

For instance, a computer is given 1000 unlabeled pictures of horses and 1000 unlabeled pictures of dogs. It is then tasked to divide the pictures into 2 piles.
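The two-pile task can be sketched with a basic k-means loop (k = 2). As a simplification, each picture is reduced to a single made-up number (the animal's height in cm); real image clustering would work on far richer features.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two piles with a basic k-means loop."""
    centers = [min(values), max(values)]  # simple deterministic starting point
    piles = [[], []]
    for _ in range(iterations):
        piles = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            piles[nearest].append(v)
        # Move each center to the average of its pile (keep it if the pile is empty).
        centers = [sum(p) / len(p) if p else c for p, c in zip(piles, centers)]
    return piles

# Hypothetical unlabeled data: heights of the animals in 8 pictures.
heights = [150, 45, 160, 50, 155, 40, 165, 55]
small_pile, large_pile = two_means(heights)
print(sorted(small_pile))  # [40, 45, 50, 55]
print(sorted(large_pile))  # [150, 155, 160, 165]
```

The computer ends up with two sensible piles without ever being told which pile is "dog" and which is "horse".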

Reinforcement Learning

The computer is told what its objective is, then tries to figure out the best way to achieve it.

For example, we are trying to teach a robot with 2 legs to walk. The robot uses reinforcement learning to walk in many different ways until it finds the optimal way to move.

In 2019, Google’s DeepMind unveiled AlphaStar, a StarCraft II-playing program trained in part using reinforcement learning. It defeated one of the world’s top StarCraft II players.

Programming Languages

The common programming languages used to code machine learning techniques are:

Python
C++
Javascript
R

Difference between Machine Learning (ML) and Artificial Intelligence (AI)

AI is a broad concept that covers the idea that machines can do tasks and behave in ways that we consider smart and independent.

ML is concerned with getting machines to improve and learn through data or experience.

Examples of Machine Learning Use Cases

ML enables your email system to differentiate spam and legitimate emails
ML enables a computer to recognize your voice and understand your commands
ML enables your surveillance cameras to recognize millions of faces a day
ML helps social media companies identify what your likes and dislikes are

Examples of Popular ML Training Techniques/Algorithms

Naïve Bayes Classifier Algorithm
K Means Clustering Algorithm
Support Vector Machine Algorithm
Apriori Algorithm
Linear Regression
Logistic Regression
Artificial Neural Networks
Random Forests
Decision Trees
K Nearest Neighbors
Convolutional Neural Network
Recurrent Neural Network
