This post describes how Xcode manages *Project Working Directories* and what options we have to set them up correctly. By the end you will know how to set up custom build locations for your compiled binaries.

**Definitions:**

- Project Working Directory – the working directory associated with the process created by executing the built binary
- Project Directory – the directory containing the project source code
- Products Directory – the directory where built products are placed

**Option 1 – default Project Working Directory**

If you create a fresh new *command line tool* project you will get the following directory structure:

The default build scheme builds and places the *debug* binary at:

/Users/[my_user]/Library/Developer/Xcode/DerivedData/wd_example-[some_random_string]/Build/Products/Debug/wd_example

Now consider the following code snippet, which simply reads a text file's contents. What's worth paying attention to here is that we provided just a relative file path to the *std::ifstream::open()* method!

#include <fstream>
#include <iostream>
#include <string>

using std::cout;
using std::endl;
using std::ifstream;
using std::ios_base;
using std::string;

int main()
{
    ifstream myFile;
    myFile.open("HelloFile.txt", ios_base::in);
    if (myFile.is_open())
    {
        cout << "File open successful. It contains: " << endl;
        string fileContents;
        while (myFile.good())
        {
            getline(myFile, fileContents);
            cout << fileContents << endl;
        }
        cout << "Finished reading file, will close now" << endl;
        myFile.close();
    }
    else
        cout << "open() failed: check if file is in right folder" << endl;
    return 0;
}

Let’s create a *HelloFile.txt* test file in that wild Xcode default *Project Working Directory* and see if it works.

/Users/[my_user]/Library/Developer/Xcode/DerivedData/wd_example-[some_random_string]/Build/Products/Debug/wd_example/HelloFile.txt

Building’n’Running (CMD + R) in Xcode we get:

File was read successfully.

**Option 2 – Xcode copies file(s) during build phase to Project Working Directory**

Another solution is to let Xcode copy your file from your *Project Directory* to the *Project Working Directory* during the build phase.

- Copy *HelloFile.txt* to the *Project Directory*:

- Reference the file with your project:

- Open your project’s settings via project navigator:

- Select Build Phases:

- Select Copy Files:

You need to set Destination to *Products Directory* because the *Products Directory* is the *Project Working Directory* by default. Next click “+” and add *HelloFile.txt*. Note that your files don’t actually need to be strictly in the *Project Directory*; they can be in an arbitrary location. Uncheck “Copy only when installing”, otherwise the files will not be copied during the debug build process.

- After Build’n’Run (CMD + R) you can check that your file was indeed copied to the *Products Directory*:

**Option 3 – set custom Project Working Directory and Build directory**

Let’s create a custom project structure, adding a *data/* directory where we can place our *HelloFile.txt*.

- Use the *New Group* option from the context menu, which creates a new directory and adds (references) it to the project:

- Copy *HelloFile.txt* to the *data/* directory and add it to the project! You should end up with a structure like this:

- Set a custom *Project Working Directory*:

- Set the *custom working directory* to your *main.cpp* location:

- Before we can test whether the C++ program finds *HelloFile.txt* in the *data/* directory, we need to slightly change the file’s relative path:

- After Build’n’Run (CMD + R) the program finds and reads *HelloFile.txt*.

**Now we will set up a custom Products Directory.**

- Open Xcode Preferences:

- Open the *Locations* tab and set *Derived Data* to “Relative”. The *DerivedData/* directory should already be set.

- Open the *Advanced* settings and set *Build Location* to *Shared Folder* – *Build/*

- Try to Build’n’Run your project (CMD + R) and you should get a *DerivedData/* directory created where your *.xcodeproj* file resides. Note that we have set the build location globally, so compiled binaries of every new project will follow this directory structure.

TIP: If you want to set custom build paths and locations on a per-project basis you need to edit the build paths in File > Project Settings…

You learned how to set up working directories and locations for compiled binaries. However, one method wasn’t discussed – using absolute paths in the program. This method is not recommended because it reduces code portability, and it should be used only during the development stage, if ever.

We are currently working on a time-series database solution for collecting high-frequency crypto-exchange data – namely tick data and one-minute orderbook snapshots. We’ve developed REST API collector bots which continuously fetch data from numerous REST API endpoints and save them to a database. This solution would work in a perfect world, but that’s not where we live. During traffic peaks the endpoints are often unreachable and connections to them time out regularly, which in turn results in missing valuable data. The first upgrade of the collector infrastructure was to deploy collector bots on two hosts and synchronize both databases. Did it work? Well, it did, but we can do better. We deploy a third collector node which scrapes not via the REST API but via Websocket (WS).

There are quite significant differences in collector implementation when collecting via REST or WS. Let’s compare both APIs with respect to high-frequency data.

When fetched via the REST API the process is very straightforward. You usually make an HTTP GET request with orderbook depth and pair parameters and in turn get a JSON-formatted response. Saving the data into the database is then a matter of proper parsing. Parameter options may vary from exchange to exchange, but orderbook depth and pair are considered standard.
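As a sketch, parsing such a response into rows ready for a database insert might look like this (the field names and response shape are assumptions for illustration; real exchanges differ):

```python
import json

def parse_orderbook(response_text, depth=25):
    """Parse a JSON orderbook response into rows ready for a database insert.

    Assumes a generic {"bids": [[price, amount], ...], "asks": [...]} shape;
    real exchanges differ in field names and nesting.
    """
    book = json.loads(response_text)
    rows = []
    for side in ('bids', 'asks'):
        for price, amount in book[side][:depth]:
            rows.append((side, float(price), float(amount)))
    return rows

raw = '{"bids": [["9500.1", "0.5"]], "asks": [["9501.2", "1.2"]]}'
print(parse_orderbook(raw))  # → [('bids', 9500.1, 0.5), ('asks', 9501.2, 1.2)]
```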

When fetched via Websocket the whole process is a little more involved. Firstly we need to subscribe to the proper channel on the WS server provided by the exchange via a subscription message, which usually contains settings regarding the data stream. Then we have a WS connection established and we are able to receive messages. What do these messages look like for orderbook data? The convention is to get an initial complete orderbook snapshot in the first message. In every subsequent message we get just the data that changed, and we have to incorporate these update messages into the orderbook instance in the client program and thus keep it up to date.
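The snapshot-then-deltas flow can be sketched in a few lines (a toy update convention for illustration, not any particular exchange’s schema):

```python
def apply_update(book, side, price, amount):
    """Apply one delta message to an in-memory orderbook.

    Toy convention (an assumption, not a real exchange schema):
    amount == 0 deletes the price level, otherwise it sets the new amount.
    """
    if amount == 0:
        book[side].pop(price, None)
    else:
        book[side][price] = amount

# First message: the complete initial snapshot.
book = {'bids': {9500.0: 1.5}, 'asks': {9501.0: 2.0}}
# Subsequent messages: deltas only.
apply_update(book, 'bids', 9500.0, 0)    # level removed
apply_update(book, 'asks', 9502.0, 0.7)  # level added
print(book)  # → {'bids': {}, 'asks': {9501.0: 2.0, 9502.0: 0.7}}
```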

When fetched via the REST API we make a GET request, generally with pair and since parameters. We can simply remember the last saved tick (according to the trade id or timestamp) and fetch only the new data. Unfortunately these parameters are not available on all exchanges, which sometimes try to be original and introduce new exotic parameters or a new *since* technique.
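The bookkeeping is simple enough to sketch (the trade shape is an assumption for illustration):

```python
def new_trades(fetched, last_seen_id):
    """Keep only trades newer than the last stored one, keyed by trade id.

    `fetched` is assumed to be a list of dicts with an integer 'id' field,
    as returned by a typical REST trades endpoint.
    """
    return [t for t in fetched if t['id'] > last_seen_id]

trades = [{'id': 100, 'price': 9500.0}, {'id': 101, 'price': 9501.5}]
print(new_trades(trades, last_seen_id=100))  # → [{'id': 101, 'price': 9501.5}]
```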

The WS tick data stream is simpler than its orderbook brother. Every new message we get from the WS server is a new trade which has occurred on the exchange.

REST API calls are subject to HTTP request limits. If you hit the limit you get your IP blocked, usually for a few minutes. On the other hand, if the exchange’s REST API server goes down we are still able to retrieve the historical ticks afterwards, exactly thanks to the *since* parameter, if available. In the case of the orderbook we are out of luck, because historical orderbook snapshots are impossible to get directly from exchanges.

WS messages flow in a stream, hence there are no API limits and you can receive messages up to real-time speed. If the WS exchange server goes down and your stream is interrupted, you are unable to retrieve the missed ticks. What was once broadcast is never broadcast again. Sometimes exchanges broadcast messages with useless data, which can cause serious network traffic overhead. WS is also slightly trickier to implement, because you work with streams and not with *discrete data packages*.

In a nutshell – imagine you need to make 20 HTTP requests to different hosts. Let’s imagine that each of these requests is part of some function – a *coroutine*. Now we have 20 coroutines which constitute a *task*. If you went the classical synchronous way you would execute the above-mentioned coroutines one after another, which would result in a final processing time of 20x the request <—> response waiting time + 20x the request/response CPU processing time on your machine. This is a huge waste of time. If you manage this task asynchronously, you start a coroutine and, as soon as the program finds out it’s waiting (and not processing), it immediately starts processing the next coroutine, and so on. When the request in one of the coroutines completes, the program goes back into it and continues processing the code after the “waiting” point. Note that the program goes back to the “waiting” points randomly in time; requests are not handled sequentially as in synchronous programming. On the other hand, the “waiting” points must be declared in the code. Such leaping behaviour is called *context switching*.
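The difference is easy to demonstrate with asyncio itself: the sketch below fires 20 simulated 0.1-second requests concurrently, so the whole task finishes in roughly 0.1 s instead of roughly 2 s:

```python
import asyncio
import time

async def fetch(i):
    # Simulated request: the await is the declared "waiting" point where
    # control is returned to the event loop (context switch).
    await asyncio.sleep(0.1)
    return i

async def task():
    # All 20 coroutines wait concurrently instead of one after another.
    return await asyncio.gather(*(fetch(i) for i in range(20)))

start = time.monotonic()
results = asyncio.run(task())
elapsed = time.monotonic() - start
print(len(results), elapsed < 1.0)  # → 20 True
```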

We say that tasks (or particular coroutines) which cause the program they are called from to wait (and not process) are blocking, or I/O bound. Network communication tasks can be considered blocking, as can reading/writing operations. Now it’s clear that even communication over the WS protocol is a blocking task and that asynchronous programming makes sense for it.

This code rebuilds multiple orderbooks using asyncio coroutines. There is no waiting on blocking tasks (waiting for messages after the *receive()* call). I’m using the Bitfinex WS API here. The program continuously updates the orderbooks of the opted pairs, keeping them in the global *orderbooks* instance. From there they are printed out every 10 seconds in a pretty table, structured as you are used to seeing on exchange platforms. But the point here is that we have the orderbooks available in real-time in the *orderbooks* global variable!

I built the following solution on Python’s asyncio library. If you are not familiar with it, I recommend this tutorial, which is in my humble opinion the shortest and most accessible one you can find on the internet.

import aiohttp
import asyncio
import ujson

from tabulate import tabulate
from copy import deepcopy

# Pairs which generate orderbook for.
PAIRS = [
    'BTCUSD',
    'ETCBTC',
    # 'ETCUSD',
    # 'ETHBTC',
    # 'ETHUSD',
    # 'XMRBTC',
    # 'XMRUSD',
    # 'ZECBTC',
    # 'ZECUSD'
]

# If there are n pairs we need to subscribe to n websocket channels.
# This is the subscription message template.
# For details about settings refer to https://bitfinex.readme.io/v2/reference#ws-public-order-books.
SUB_MESG = {
    'event': 'subscribe',
    'channel': 'book',
    'freq': 'F1',
    'len': '25',
    'prec': 'P0'
    # 'pair': <pair>
}


def build_book(res, pair):
    """ Updates orderbook.
    :param res: Orderbook update message.
    :param pair: Updated pair.
    """
    global orderbooks
    # Filter out subscription status messages.
    if res.data[0] == '[':
        # String to json
        data = ujson.loads(res.data)[1]
        # Build orderbook
        # Observe the structure of orderbook. The prices are keys for corresponding count and amount.
        # Structuring data in this way significantly simplifies orderbook updates.
        if len(data) > 10:
            bids = {
                str(level[0]): [str(level[1]), str(level[2])]
                for level in data if level[2] > 0
            }
            asks = {
                str(level[0]): [str(level[1]), str(level[2])[1:]]
                for level in data if level[2] < 0
            }
            orderbooks[pair]['bids'] = bids
            orderbooks[pair]['asks'] = asks
        # Update orderbook and filter out heartbeat messages.
        elif data[0] != 'h':
            # Example update message structure [1765.2, 0, 1] where we have [price, count, amount].
            # Update algorithm pseudocode from Bitfinex documentation:
            # 1. - When count > 0 then you have to add or update the price level.
            #   1.1 - If amount > 0 then add/update bids.
            #   1.2 - If amount < 0 then add/update asks.
            # 2. - When count = 0 then you have to delete the price level.
            #   2.1 - If amount = 1 then remove from bids.
            #   2.2 - If amount = -1 then remove from asks.
            data = [str(data[0]), str(data[1]), str(data[2])]
            if int(data[1]) > 0:  # 1.
                if float(data[2]) > 0:  # 1.1
                    orderbooks[pair]['bids'].update({data[0]: [data[1], data[2]]})
                elif float(data[2]) < 0:  # 1.2
                    orderbooks[pair]['asks'].update({data[0]: [data[1], str(data[2])[1:]]})
            elif data[1] == '0':  # 2.
                if data[2] == '1':  # 2.1
                    if orderbooks[pair]['bids'].get(data[0]):
                        del orderbooks[pair]['bids'][data[0]]
                elif data[2] == '-1':  # 2.2
                    if orderbooks[pair]['asks'].get(data[0]):
                        del orderbooks[pair]['asks'][data[0]]


async def print_books():
    """ Prints orderbooks snapshots for all pairs every 10 seconds. """
    global orderbooks
    while 1:
        await asyncio.sleep(10)
        for pair in PAIRS:
            bids = [[v[1], v[0], k] for k, v in orderbooks[pair]['bids'].items()]
            asks = [[k, v[0], v[1]] for k, v in orderbooks[pair]['asks'].items()]
            bids.sort(key=lambda x: float(x[2]), reverse=True)
            asks.sort(key=lambda x: float(x[0]))
            table = [[*bid, *ask] for (bid, ask) in zip(bids, asks)]
            headers = ['bid:amount', 'bid:count', 'bid:price',
                       'ask:price', 'ask:count', 'ask:amount']
            print('orderbook for {}'.format(pair))
            print(tabulate(table, headers=headers))


async def get_book(pair, session):
    """ Subscribes for orderbook updates and fetches updates. """
    print('enter get_book, pair: {}'.format(pair))
    pair_dict = deepcopy(SUB_MESG)
    pair_dict.update({'pair': pair})
    async with session.ws_connect('wss://api.bitfinex.com/ws/2') as ws:
        ws.send_json(pair_dict)
        while 1:
            res = await ws.receive()
            # print(pair_dict['pair'], res.data)  # debug
            build_book(res, pair)


async def main():
    """ Driver coroutine. """
    async with aiohttp.ClientSession() as session:
        coros = [get_book(pair, session) for pair in PAIRS]
        # Append coroutine for printing orderbook snapshots every 10s.
        coros.append(print_books())
        await asyncio.wait(coros)


orderbooks = {pair: {} for pair in PAIRS}
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Let’s start at the bottom of the script. We declare an empty *orderbooks* dictionary which serves as a shared variable between the orderbook-updating and orderbook-reading coroutines. Then we create the asyncio event loop and start it, which in turn executes the main() coroutine. The main() coro initializes the aiohttp *session* which we will use for all WS connections in the script. Next we create a list of get_book() coroutines already equipped with their arguments. The rest of the main() coro just appends the printing coro to the *coros* list and registers the contents of *coros* on the event loop. As soon as execution reaches *await asyncio.wait(coros)*, control is switched back to the event loop, because that is a blocking call.

Now we have as many *get_book()* coros running as there are *PAIRS* uncommented before code execution, plus one printing *print_books()* coro.

Let’s enter the execution of one of the *get_book()* coros. The first run of every *get_book()* coro prepares the subscription message, initiates the websocket connection and sends the subscription message. Every *get_book()* coro has its own websocket connection, and every *get_book()* fetches orderbook messages for one currency pair.

Now comes the important part. When code control reaches *res = await ws.receive()* in the *while* loop, it immediately returns control to the event loop, hence another *get_book()* coro can be executed, and so on. Now imagine that all of our *get_book()* coros have been started and all of them are waiting at that statement for a response message^{[1]}. When *res = await ws.receive()* is unblocked (receives a message) in an arbitrary *get_book()* instance, the waiting event loop grasps the opportunity and immediately continues code execution after the unblocked *res = await ws.receive()* statement, and so on.

When control reaches the *build_book()* function call, it simply continues program execution sequentially until *build_book()* is complete. Then another iteration of the *get_book()* while loop starts with the same blocking mechanism as described above. The *build_book()* function is a matter of websocket message parsing. It creates the orderbook from the initial message containing the complete orderbook snapshot, and thereafter incorporates each subsequent update message into the global orderbook instance. That way we have an ever-updating orderbook instance available and we can use it for whatever we want.

For demonstration purposes we registered the *print_books()* coroutine on the event loop, which prints out orderbook snapshots from the global *orderbooks* variable. Note that the code in its *while* loop is blocked periodically for 10 seconds by await asyncio.sleep(). In other words, the content of the while loop is unblocked (available for execution on the event loop) every 10 seconds.

1. This situation actually never occurs, or the waiting time is negligible, simply because the blocking *res = await ws.receive()* call in one of the *get_book()* coros unblocks before the setup code in all the other *get_book()* coros finishes.

Before using authenticated endpoints, be sure you have created a key file with your API access key on the first line and your secret key on the second line:

#include <iostream>
#include "BitfinexAPI.hpp"
#include <fstream>

using std::cout;
using std::endl;
using std::ifstream;
using std::string;

int main(int argc, char *argv[])
{
    const char *keyFilePath = "/Path/to/your/file/with/API-key-secret";
    ifstream ifs(keyFilePath);
    if (!ifs.is_open())
    {
        cout << "Can't open file: " << keyFilePath << endl;
        return -1;
    }
    else
    {
        string accessKey, secretKey;
        getline(ifs, accessKey);
        getline(ifs, secretKey);
        BitfinexAPI bfxAPI(accessKey, secretKey);
        string response;
        int errCode = 0;

        /////////////////////////////////////////////////////////////////////////
        // Examples
        // Note that default values are not mandatory. See BitfinexAPI.hpp
        // for details.
        /////////////////////////////////////////////////////////////////////////

        /// Public endpoints ///

        // errCode = bfxAPI.getTicker(response, "btcusd");
        // errCode = bfxAPI.getStats(response, "btcusd");
        // errCode = bfxAPI.getFundingBook(response, "USD", 50, 50);
        // errCode = bfxAPI.getOrderBook(response, "btcusd", 50, 50, 1);
        // errCode = bfxAPI.getTrades(response, "btcusd", 0L, 50);
        // errCode = bfxAPI.getLends(response, "USD", 0L, 50);
        // errCode = bfxAPI.getSymbols(response);
        // errCode = bfxAPI.getSymbolDetails(response);

        /// Authenticated endpoints ///

        // Account //
        // errCode = bfxAPI.getAccountInfo(response);
        // errCode = bfxAPI.getSummary(response);
        // errCode = bfxAPI.deposit(response, "bitcoin", "deposit", 1);
        // errCode = bfxAPI.getKeyPermissions(response);
        // errCode = bfxAPI.getMarginInfos(response);
        // errCode = bfxAPI.getBalances(response);
        // errCode = bfxAPI.transfer(response, 0.1, "BTC", "trading", "deposit");
        // errCode = bfxAPI.withdraw(response); // configure withdraw.conf file before use

        // Orders //
        // errCode = bfxAPI.newOrder(response, "btcusd", 0.01, 983, "sell", "exchange limit", 0, 1,
        //                           0, 0, 0);
        //
        // How to create vOrders object for newOrders() call
        // BitfinexAPI::vOrders orders =
        // {
        //     {"btcusd", 0.1, 950, "sell", "exchange limit"},
        //     {"btcusd", 0.1, 950, "sell", "exchange limit"},
        //     {"btcusd", 0.1, 950, "sell", "exchange limit"}
        // };
        // errCode = bfxAPI.newOrders(response, orders);
        //
        // errCode = bfxAPI.cancelOrder(response, 13265453586LL);
        //
        // How to create ids object for cancelOrders() call
        // BitfinexAPI::vIds ids =
        // {
        //     12324589754LL,
        //     12356754322LL,
        //     12354996754LL
        // };
        // errCode = bfxAPI.cancelOrders(response, ids);
        //
        // errCode = bfxAPI.cancelAllOrders(response);
        // errCode = bfxAPI.replaceOrder(response, 1321548521LL, "btcusd", 0.05, 1212, "sell",
        //                               "exchange limit", 0, 0);
        // errCode = bfxAPI.getOrderStatus(response, 12113548453LL);
        // errCode = bfxAPI.getActiveOrders(response);

        // Positions //
        // errCode = bfxAPI.getActivePositions(response);
        // errCode = bfxAPI.claimPosition(response, 156321412LL, 150);

        // Historical data //
        // errCode = bfxAPI.getBalanceHistory(response, "USD", 0L, 0L, 500, "all");
        // errCode = bfxAPI.getDWHistory(response, "BTC", "all", 0L, 0L, 500);
        // errCode = bfxAPI.getPastTrades(response, "btcusd", 0L, 0L, 500, 0);

        // Margin funding //
        // errCode = bfxAPI.newOffer(response, "USD", 12000, 25.2, 30, "lend");
        // errCode = bfxAPI.cancelOffer(response, 12354245628LL);
        // errCode = bfxAPI.getOfferStatus(response, 12313541215LL);
        // errCode = bfxAPI.getActiveCredits(response);
        // errCode = bfxAPI.getOffers(response);
        // errCode = bfxAPI.getTakenFunds(response);
        // errCode = bfxAPI.getUnusedTakenFunds(response);
        // errCode = bfxAPI.getTotalTakenFunds(response);
        // errCode = bfxAPI.closeLoan(response, 1235845634LL);

        /////////////////////////////////////////////////////////////////////////
        /////////////////////////////////////////////////////////////////////////

        cout << "Response: " << response << endl;
        cout << "Error code: " << errCode << endl;
        ifs.close();
        return 0;
    }
}

**Notice**

This parser is intended to be an educational project. Use of CME settlement files for any other purpose without CME permission is forbidden. If you want to use this project in a serious real-world application you have to use another source of daily prices or contact CME about your intentions. Note that in practice analysts have prices coming from multiple data vendors so they can cross-check their accuracy.

I’m using Debian on my workstation, so this walkthrough deals with PSQL installation on a Debian (or Debian-based) machine. Even if you install PSQL on a different OS, have a quick look at the user and database names I will be using. Remember that there are front-end clients for PSQL, such as the web-based phpPgAdmin or the standalone pgAdmin, which you can use for a better overview of your databases. Anyway, if you are not familiar with SQL, I recommend glancing over some short tutorial.

Fire up a console, then get and install the PSQL server and client. Installation needs root privileges.

$ sudo apt-get install postgresql postgresql-client

The installation created a new system user, *postgres*. This is the superuser for all PSQL-related administration. Switch to the *postgres* user and start the PSQL command line tool *psql*. The default password for the *postgres* user is “postgres”.

$ sudo -u postgres bash
$ psql

In the psql command line, change the *postgres* default password.

=> \password postgres

2x *Ctrl-d* switches you back to the bash console. Create a new system user, *r_client*. This is the user for our R tool later. Provide just a password and leave the other fields blank.

$ sudo adduser r_client

Create a new database user.

$ sudo -u postgres createuser r_client

Create a new database.

$ sudo -u postgres createdb -O r_client cme_data

Note that the system user name and the database user name are the same. If you want to use different system/db names you need to configure the pg_ident.conf file and define custom system/db name maps.

Because I want to access the database from a remote machine, I need to configure the PostgreSQL server to allow remote connections. If you are using the PSQL server and PSQL clients (R, pgAdmin) on the same machine, I recommend doing the following configuration as well, because I noticed that the PSQL server doesn’t listen on the defined port by default.

Locate and edit the *pg_hba.conf* file.

$ locate pg_hba.conf
/etc/postgresql/9.4/main/pg_hba.conf
.
$ sudo -u postgres nano /etc/postgresql/9.4/main/pg_hba.conf

Append the following line (change the IP and subnet according to yours):
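For reference, the line has the standard *pg_hba.conf* shape. For the database and user created above and a /24 subnet like the one used later in this post, it could read as follows (the subnet and auth method are assumptions; adjust them to your network):

```
host    cme_data    r_client    192.168.88.0/24    md5
```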

Enable database listening on all local interfaces and the default port 5432.

$ locate postgresql.conf
/etc/postgresql/9.4/main/postgresql.conf
.
$ sudo -u postgres nano /etc/postgresql/9.4/main/postgresql.conf

Find and set the following directives:
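The relevant *postgresql.conf* directives are these (listening on all interfaces is an assumption matching the remote-access setup above; restrict *listen_addresses* if everything runs on one machine):

```
listen_addresses = '*'    # listen on all local interfaces
port = 5432               # default PostgreSQL port
```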

Restart server.

$ sudo service postgresql restart

Install and load the CRAN packages (I’m using the RStudio IDE).

> install.packages(c("RPostgreSQL", "data.table", "DBI", "R6", "stringr"))
> library("RPostgreSQL")
> library("data.table")
> library("DBI")
> library("R6")
> library("stringr")

So now we have the PSQL database server running and properly configured for remote connections, and setting up R was quite straightforward.

Now, with the database ready, we need to work out what the data scheme will actually look like: what the structure of the tables is and what the relations between them are. Let’s work with the following simple design.

My *cme_data* database consists of four tables – daily_price, data_vendor, symbol, exchange. Have a look at the particular column descriptions. Note that I’m using *table:column_name* notation and that not all columns are described, as their names are considered self-explanatory.

**daily_price**

- *id* – Serial number as primary key. (example: 1, 2, 3, 4, …)
- *data_vendor_name* – Data vendor name with foreign key *data_vendor:name*. (example: IB, GLOBEX)
- *exchange_symbol* – Instrument symbol as denoted by the exchange, with foreign key *symbol:instrument*. (example: AC)
- *vendor_symbol* – Instrument symbol as denoted by the data vendor. (example: EH)
- *complete_symbol* – Instrument symbol with contract month and year identifier. (example: EHJ13)
- *contract_month* – Contract month of the given instrument in the form MMMYY. (example: APR13)

**symbol**

- *name* – Full name of the instrument. (example: ETHANOL)
- *product_group* – Sector the instrument belongs to. (example: Energy)
- *currency* – Currency the instrument is traded in.

Such a design allows the daily_price table to hold multiple OHLCV records with the same date and instrument but different data vendors.

Before we start using R, clone the *cme_parser* repository from my github and change the R working directory to the *cme_parser* directory via the setwd() function. I will use *sql* files as input to the R functions and scripts. Customize and run the following R script. It creates the desired tables and the trigger functions which automatically update the time in the *modified* column.

### Connection
library(DBI)
library(RPostgreSQL)

drv <- dbDriver("PostgreSQL")
# con <- dbConnect(drv)  # default connection for localhost
con <- dbConnect(drv,
                 dbname = "cme_data",
                 host = "192.168.88.202",  # use PSQL server IP or "localhost"
                 port = 5432,
                 user = "r_client",
                 password = "yourPassword")

## Create tables
fileName <- "DBqueries/create_tables.sql"
query <- readChar(fileName, file.info(fileName)$size)
dbExecute(con, query)

## Create function trigger
fileName <- "DBqueries/update_trigger.sql"
query <- readChar(fileName, file.info(fileName)$size)
dbExecute(con, query)

## Check created tables and field names
tables <- dbListTables(con)
for (i in 1:length(tables)) {
  cat(dbListFields(con, tables[i]))
  cat("\n-----------------------------------------------------------------------------\n")
}

# Disconnect from database
dbDisconnect(con)

With the tables ready we can start populating them. First we fill the tables that provide foreign keys for other tables. We can’t fill the *daily_price* or *symbol* table first because some of their columns are related to other tables. In other words, if we want to insert data into a constrained column in *daily_price*, PSQL will try to match our input data with the data in the “foreign key” table *symbol*, which is obviously empty, so it would cause an error. Make sure your *con* connection object is still active and run the following code.

### Populating data_vendor table
fileName <- "DBqueries/populate_data_vendor_table.sql"
query <- readChar(fileName, file.info(fileName)$size)
dbExecute(con, query)

### Populating exchange table
fileName <- "DBqueries/populate_exchange_table.sql"
query <- readChar(fileName, file.info(fileName)$size)
dbExecute(con, query)

The *symbol* table is a little bit trickier to fill. As you know, the full instrument symbol consists of three parts – exchange symbol, contract month and contract year. So, for example, for March 2017 US T-Bonds we have the complete symbol ZB H 17 (spaces added just for clarity). For current database purposes I chose 54 instruments from various markets. Have a look at the input file with the particular instrument details which we use for query construction. Note that the file has Matlab syntax; about a year ago I made this project in Matlab, so I’m reusing some material arranged before. Step through the following script and fill the symbol table.

### Populating symbol table

## Input file parse
out <- readLines("data/symbols.txt")
out1 <- gsub("[;'{}]", "", out[out != ""], fixed = FALSE)
out2 <- strsplit(out1, " = ")

## Char to list
l <- vector("list", length(out2) / 7)
rl <- 0
for (r in seq(1, length(out2), 7)) {
  cl <- 1
  rl <- rl + 1
  for (c in seq(r, r + 6, 1)) {
    l[[rl]][cl] <- out2[[c]][2]
    cl <- cl + 1
  }
}

## List to dataframe
df <- do.call(rbind.data.frame, l)
colnames(df) <- c("symbol", "months", "exchange", "name",
                  "product_group", "currency", "born_year")
df <- data.frame(lapply(df, as.character), stringsAsFactors = FALSE)

## Generating INSERT INTO query
source("qGen.R")
source("twodigityears.R")
q <- apply(df, 1, qGen, fromYear = 1980, toYear = 2060)  # set desired time frame
q1 <- unlist(q, recursive = FALSE)
q1[[length(q1)]] <- gsub(",$", ";", q1[[length(q1)]])
q2 <- c("INSERT INTO symbol (exchange_abbrev, instrument, name, product_group, currency, created, modified) VALUES", q1)
q3 <- paste(q2, collapse = "\n")
# write(q3, "q2Output.txt")  # uncomment just for debugging purposes
dbExecute(con, q3)

# Disconnect from database
dbDisconnect(con)

With the foreign-key tables filled for the main table – daily_price – we can finally download, parse and export data to the database. Use my *Parser* R6 class. If you are interested in the *Parser* implementation, check the *readme* file in the repository.

### Parser script
setwd("/Users/jvr23/Documents/R/CME_Parser")
source("Parser.R")

# Initialization of new R6 class object
p <- Parser$new()

# Configuring database connection
p$set("db", list("192.168.88.202", 5432, "cauneasi54Ahoj"))

# parse method of Parser object downloads and parses settlement report files
p$parse()

# exportQuery method constructs SQL query and exports data to the database
p$exportQuery()

# Uncomment when running from CRON or other scheduler
q("no")

Let’s quickly check our freshly exported data.

dbGetQuery(con, "SELECT * FROM daily_price ORDER BY complete_symbol LIMIT 30;")

The output should look something like this (the created and modified columns are not shown).

Settlement prices are posted at approximately 6:00 p.m. CT, which is around 23:00 UTC. That means we could schedule *parserScript.r* to be executed every day at 23:11 UTC, say, with the *cron* tool. Before we do so we need to create and configure *.Rprofile* to load the needed libraries every time *r* is started.

$ cd
$ nano .Rprofile

Paste following lines into the file and save (ctrl+o, enter, ctrl+x).

.First <- function() {
  library("RPostgreSQL")
  library("data.table")
  library("DBI")
  library("R6")
  library("stringr")
}

.Last <- function() {
}

Test the *parserScript.r* execution. (**Every time you test-run p$parse() and p$exportQuery(), make sure you truncate the daily_price table and change the date in log/reportdate.log!**)

$ r -f /Full/Path/to/CME_Parser/parserScript.R
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
.
.
.
Loading required package: DBI
> ### Parser script
.
.
.
> p$parse()
Downloading data:.... OK
Loading settlement files:..... OK
Checking settlement reports date:... OK
Generating symbols for download:.......... OK
Generating SQL query rows:...... OK
> # exportQuery method constructs SQL query and exports data to the database
> p$exportQuery()
[1] TRUE
> # Uncomment when running from CRON
> q("no")

If the execution was successful we can add an entry to the crontab.

$ crontab -e

Add following line.

11 23 * * 1,2,3,4,5 /full/path/to/r -f /Full/Path/to/CME_Parser/parserScript.R > /dev/null 2>&1

Congratulations! You now have an open-source home database solution for daily market data.

An advantage of conditional variance models is that they better describe the following time-series properties:

- Returns of an asset have positive *excess kurtosis*^{[1]}, which means their PDF peak is sharper than the normal PDF peak. The tails of the returns PDF often embody higher probability density than the PDF shoulders; such a PDF has the well-known fat tails.
- Volatility tends to cluster into periods of higher and lower volatility. This effect means that volatility at some time must depend on its historical values with some degree of dependence.
- Return fluctuations have an asymmetric impact on volatility. Volatility changes more after a downward return move than after an upward return move.

Consider the general form of a conditional variance model

(1) $y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t = \sigma_t e_t$

Firstly we see that the value of the dependent variable $y_t$ consists of a mean $\mu_t$ and an innovation $\varepsilon_t$. In practice $\mu_t$ can be chosen as the conditional mean of $y_t$, such that $\mu_t = E[y_t \mid F_{t-1}]$, where $F_{t-1}$^{[2]} is arbitrary historical information affecting the value of $y_t$. In other words, we model every $\mu_t$ by a suitable linear regression model or using an AR^{[3]} process. Often it is sufficient to use just a fixed value $\mu_t = \mu$ or $\mu_t = 0$. The innovation $\varepsilon_t$ consists of the variance (volatility) root $\sigma_t$, where $\sigma_t^2 = \mathrm{Var}(y_t \mid F_{t-1})$^{[4]}, and an i.i.d. random variable $e_t$ from the normal or $t$-distribution, with $E[e_t] = 0$ and $\mathrm{Var}(e_t) = 1$.

From fragmented notation above we can write general conditional variance model as is known in econometrics literature

(2)

Observe that innovations are not correlated but are dependent through term (later we will briefly see contains lagged ). If is non-linear function then model is non-linear in mean, conversely if is non-linear then model is non-linear in variance which in turn means that is changing non-linearly with every through a function of . Since now we should know what autoregressive conditional heteroskedasticity means.

After the discussion above we can quickly formulate the ARCH model, which was introduced by Robert Engle in 1982. If we take (2) and specify the condition (based on the historical information $F_{t-1}$) for $\sigma_t^2$, we get the ARCH(m) model

(3) $\varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{m} \alpha_i \varepsilon_{t-i}^2$

where $z_t$ are i.i.d. random variables with normal or $t$-distribution, zero mean and unit variance. As mentioned earlier, in practice we can drop the mean term, thus getting

$y_t = \varepsilon_t$

or regress the mean with exogenous explanatory variables as

$y_t = \beta_0 + \sum_k \beta_k x_{k,t} + \varepsilon_t$

or use any other suitable model. Recall that a significant fluctuation in past innovations will notably affect the current volatility (variance). Regarding positivity and stationarity of the variance $\sigma_t^2$, the coefficients in the (3) condition have to satisfy the following constraints

$\alpha_0 > 0, \qquad \alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i < 1$
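To build intuition for (3), here is a small simulation sketch. It is in Python for illustration (the article's own code is MATLAB), and the function name and coefficient values are mine, not from the post: past squared innovations feed the current variance, which produces the volatility clustering described above.

```python
import math
import random

def simulate_arch(alpha0, alphas, n, seed=42):
    """Simulate ARCH(m) innovations: eps_t = sigma_t * z_t,
    sigma_t^2 = alpha0 + sum_i alphas[i] * eps_{t-i}^2, z_t ~ N(0, 1)."""
    m = len(alphas)
    rng = random.Random(seed)
    eps = [0.0] * m          # pre-sample innovations set to zero
    sigma2 = []
    for _ in range(n):
        # conditional variance from the last m squared innovations
        s2 = alpha0 + sum(a * eps[-(i + 1)] ** 2 for i, a in enumerate(alphas))
        sigma2.append(s2)
        eps.append(math.sqrt(s2) * rng.gauss(0.0, 1.0))
    return eps[m:], sigma2

# illustrative coefficients satisfying alpha0 > 0, alphas >= 0, sum(alphas) < 1
eps, sigma2 = simulate_arch(0.1, [0.3, 0.2], 5000)
```

The sample variance of `eps` should hover around the unconditional variance $\alpha_0 / (1 - \sum_i \alpha_i) = 0.2$, while a plot of `sigma2` shows clusters of calm and turbulent stretches.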

The GARCH model was introduced by Robert Engle's PhD student Tim Bollerslev in 1986. Both GARCH and ARCH models allow for a leptokurtic distribution of innovations and volatility clustering (conditional heteroskedasticity) in time series, but neither of them adjusts for the leverage effect. So what is the advantage of GARCH over ARCH? The ARCH model often requires a high order, so many parameters have to be estimated, which in turn demands more computing power. Moreover, the bigger the order is, the higher the probability of breaking the aforementioned constraints.

GARCH is an “upgraded” ARCH in the sense that it allows the current volatility to depend directly on its lagged values. GARCH(m, n) is defined as

(4) $\varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{m} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{n} \beta_j \sigma_{t-j}^2$

where $z_t$ are i.i.d. random variables with normal or $t$-distribution, zero mean and unit variance. The parameter constraints are very similar to those of the ARCH model,

$\alpha_0 > 0, \qquad \alpha_i \ge 0, \qquad \beta_j \ge 0, \qquad \sum_{i=1}^{m} \alpha_i + \sum_{j=1}^{n} \beta_j < 1$

In practice even GARCH(1, 1) with three parameters can describe complex volatility structures and is sufficient for most applications. We can forecast the future volatility of the GARCH(1, 1) model using

$\sigma_{t+k}^2 = \bar{\sigma}^2 + (\alpha_1 + \beta_1)^k \left(\sigma_t^2 - \bar{\sigma}^2\right)$

where

$\bar{\sigma}^2 = \dfrac{\alpha_0}{1 - \alpha_1 - \beta_1}$

is the unconditional variance of the innovations $\varepsilon_t$. Observe that since $\alpha_1 + \beta_1 < 1$, for $k \to \infty$ we get $\sigma_{t+k}^2 \to \bar{\sigma}^2$. So the prediction of volatility goes asymptotically with time to the unconditional variance. If you are interested in how the mentioned results are derived, and in further properties of GARCH and ARCH, I recommend you read this friendly written lecture paper.
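The GARCH(1, 1) recursion and the forecast formula above fit in a few lines; a Python sketch for illustration (function names and coefficients are mine, not from the post):

```python
def garch11_path(eps, alpha0, alpha1, beta1):
    """Filter sigma_t^2 = alpha0 + alpha1*eps_{t-1}^2 + beta1*sigma_{t-1}^2,
    starting the recursion from the unconditional variance."""
    var_bar = alpha0 / (1.0 - alpha1 - beta1)
    sigma2 = [var_bar]
    for e in eps[:-1]:
        sigma2.append(alpha0 + alpha1 * e ** 2 + beta1 * sigma2[-1])
    return sigma2

def garch11_forecast(sigma2_t, alpha0, alpha1, beta1, k):
    """k-step-ahead variance forecast: reverts geometrically toward the
    unconditional variance at rate (alpha1 + beta1)."""
    var_bar = alpha0 / (1.0 - alpha1 - beta1)
    return var_bar + (alpha1 + beta1) ** k * (sigma2_t - var_bar)
```

For $k = 0$ the forecast equals the current variance; as $k$ grows it decays toward $\bar{\sigma}^2$, exactly the asymptotic behavior noted above.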

Finally we get to the model which adjusts even for asymmetric responses of volatility to innovation fluctuations. GJR-GARCH was developed by Glosten, Jagannathan and Runkle in 1993. It is sometimes referred to as T-GARCH or TARCH if just ARCH with the GJR modification is used. GJR-GARCH(p, q, r) is defined as follows

$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i}^2 + \sum_{k=1}^{r} \gamma_k I[\varepsilon_{t-k} < 0]\, \varepsilon_{t-k}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$

where $\gamma_k$ are the leverage coefficients and $I[\cdot]$ is the indicator function. Observe that for $\gamma_k > 0$ negative innovations give additional value to volatility; thus we achieve the adjustment for the asymmetric impact on volatility discussed at the beginning of the article. For $\gamma_k = 0$ we get the GARCH(m = p, n = q) model, and for $\gamma_k < 0$ we get the exotic result where upward swings in return or price have a stronger impact on volatility than downward moves. It should be mentioned that in most implementations of GJR-GARCH we will find GJR-GARCH(p, q), where the leverage order $r$ is automatically considered equal to the order $p$. The parameter constraints are again very similar to GARCH; we have

$\alpha_0 > 0, \qquad \alpha_i \ge 0, \qquad \alpha_i + \gamma_i \ge 0, \qquad \beta_j \ge 0, \qquad \sum_{i=1}^{p} \alpha_i + \dfrac{1}{2}\sum_{k=1}^{r} \gamma_k + \sum_{j=1}^{q} \beta_j < 1$

The prediction for GJR-GARCH(1, 1) can be estimated as

$\sigma_{t+k}^2 = \bar{\sigma}^2 + \left(\alpha_1 + \tfrac{1}{2}\gamma_1 + \beta_1\right)^k \left(\sigma_t^2 - \bar{\sigma}^2\right), \qquad \bar{\sigma}^2 = \dfrac{\alpha_0}{1 - \alpha_1 - \tfrac{1}{2}\gamma_1 - \beta_1}$

(assuming symmetrically distributed innovations, so that the indicator fires half of the time on average).

If you are interested in analytical solutions for predictions of non-linear conditional variance models, read Franses & van Dijk (2000).
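The asymmetry is easy to see in code. A one-lag GJR variance recursion, sketched in Python with illustrative names and parameters (the leverage term fires only for negative innovations):

```python
def gjr11_path(eps, alpha0, alpha1, gamma1, beta1):
    """GJR-GARCH(1,1) variance filter: the indicator I[eps_{t-1} < 0]
    adds gamma1 extra weight to negative innovations."""
    var_bar = alpha0 / (1.0 - alpha1 - gamma1 / 2.0 - beta1)
    sigma2 = [var_bar]        # start from the unconditional variance
    for e in eps[:-1]:
        leverage = gamma1 if e < 0 else 0.0
        sigma2.append(alpha0 + (alpha1 + leverage) * e ** 2 + beta1 * sigma2[-1])
    return sigma2
```

A shock of −1 raises the next-period variance by exactly $\gamma_1$ more than a shock of +1 of the same size, which is the leverage effect in miniature.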

Estimation of linear GARCH and non-linear GARCH models is done using MLE, QMLE, or robust estimation methods.

*Notation:* so far I was using $y_t$ as the dependent variable; from now on let the dependent variable be the return series $r_t$.

I will demonstrate the GARCH(m, n) estimation procedure on returns of the bitcoin daily price series, which I used in an earlier post about volatility range estimators. Let's have a look at the input data.

C = BFX_day1_OHLCV(:,4);
date = BFX_day1_date;
%% Returns. Note that we don't know return for C(1) so we drop first element
r = double((log(C(2:end)./C(1:end-1)))*100); % scaled returns in [%] for numerical stability
e = r - mean(r); % innovations after simple linear regression of returns
C = C(2:end);
date = date(2:end);
%% Plot C and r
% C
figure1 = figure;
subplot1 = subplot(2,1,1,'Parent',figure1);
hold(subplot1,'on');
plot(date,C,'Parent',subplot1);
ylabel('Closing price');
box(subplot1,'on');
set(subplot1,'FontSize',16,'XMinorGrid','on','XTickLabelRotation',45,'YMinorGrid','on');
% r
subplot2 = subplot(2,1,2,'Parent',figure1);
hold(subplot2,'on');
plot(date,r,'Parent',subplot2);
ylabel('returns [%]');
box(subplot2,'on');
set(subplot2,'FontSize',16,'XMinorGrid','on','XTickLabelRotation',45,'YMinorGrid','on');

*Fig.1 Volatility clusters in returns series are obvious at the first glance.*

Let's examine the character of the returns mean. Is it conditioned by its lagged values? ACF, PACF and the Ljung-Box test^{[5]} help us with this decision. Note that the series is stationary with mean very close to zero. We could use the raw returns everywhere instead of the innovations, but it is correct to use the innovations/residuals.

%% Autocorrelation of returns innovations - ACF, PACF, Ljung-Box test
% ACF
figure2 = figure;
subplot3 = subplot(2,1,1,'Parent',figure2);
hold(subplot3,'on');
autocorr(e); % input to ACF are innovations after simple linear regression of returns
% PACF
subplot4 = subplot(2,1,2,'Parent',figure2);
hold(subplot4,'on');
parcorr(e); % input to PACF are innovations after simple linear regression of returns
% Ljung-Box test
[hLB,pLB] = lbqtest(e,'Lags',3);

*Fig.2 Returns innovation series exhibits autocorrelation.*

ACF and PACF show us that returns are autocorrelated. We can also reject the Ljung-Box null hypothesis at a small $p$-value^{[6]}, thus there is at least one non-zero autocorrelation coefficient.

Next we will check for conditional heteroskedasticity of returns by examining the autocorrelation of the squared innovations.

%% Conditional heteroskedasticity of returns - ACF, PACF, Engle's ARCH test
% ACF
figure3 = figure;
subplot5 = subplot(2,1,1,'Parent',figure3);
hold(subplot5,'on');
autocorr(e.^2);
% PACF
subplot6 = subplot(2,1,2,'Parent',figure3);
hold(subplot6,'on');
parcorr(e.^2);

*Fig.3 Squared innovation series exhibits autocorrelation, which tells us that the variance of returns is significantly autocorrelated; thus returns are conditionally heteroskedastic.*

Let's assure ourselves by conducting Engle's ARCH test^{[7]}.

% ARCH test
[hARCH, pARCH] = archtest(e,'lags',2);

The ARCH test rejects the null hypothesis with a ridiculously small $p$-value in favor of the alternative, so squared returns innovations are autocorrelated – returns are conditionally heteroskedastic. Consider the 'lags' input into the ARCH test: what do we expect the $p$-value to be? The bigger the lag we choose, the bigger the $p$-value we can expect. This is naturally caused by the fact that we include more and more values that are not significantly autocorrelated in the ARCH test, therefore the probability of not rejecting the null grows.

Here we go – we made sure that we can apply (1) to the given data. After a slight modification we have

$r_t = \mu_t + \varepsilon_t$

where $\mu_t$ is the conditional mean and $\varepsilon_t$ is the conditional innovation. We describe our returns by AR-GARCH models by setting up the ARIMA model objects.

%% AR-GARCH model, ARIMA object
MdlG = arima('ARLags',2,'Variance',garch(1,1)); % normal innovations
MdlT = arima('ARLags',2,'Variance',garch(1,1)); % t-distributed innovations
MdlT.Distribution = 't';

This model corresponds to

(5) $r_t = c + \phi_2 r_{t-2} + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t$

(6) $\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$

where we suppose that $z_t$ is $N(0,1)$- or $t_\nu$-distributed and i.i.d. We will compare the quality of both models using information criteria.

Let's proceed to parameter estimation.

%% Parameters estimation
% normal innovations
EstMdlG = estimate(MdlG,r);
% t-distributed innovations
EstMdlT = estimate(MdlT,r);

which gives us the following results for normally distributed innovations:

ARIMA(2,0,0) Model:
--------------------
Conditional Probability Distribution: Gaussian

                           Standard         t
Parameter      Value        Error        Statistic
---------   -----------   ----------   -----------
Constant    -0.0567801    0.172784     -0.328618
AR{2}       -0.0656668    0.0493691    -1.33012

GARCH(1,1) Conditional Variance Model:
----------------------------------------
Conditional Probability Distribution: Gaussian

                           Standard         t
Parameter      Value        Error        Statistic
---------   -----------   ----------   -----------
Constant     1.30152      0.216776      6.00402
GARCH{1}     0.831643     0.02415      34.4366
ARCH{1}      0.0868795    0.0155263     5.59563

Hence we can rewrite (5) and (6) as

$r_t = -0.0568 - 0.0657\, r_{t-2} + \varepsilon_t, \qquad \sigma_t^2 = 1.3015 + 0.0869\, \varepsilon_{t-1}^2 + 0.8316\, \sigma_{t-1}^2$

where we have just one unknown – the volatility (conditional variance) of returns – which we can infer recursively. We found out that the constant in (5) is statistically insignificant, thus it has no explanatory power for returns as the dependent variable. Moreover, it seems that the innovations' autocorrelation is not strong enough to give statistical significance to the AR{2} coefficient in (5). The t-test^{[8]} t-statistics for the constant and AR{2} don't fall into the critical region, so we can't reject the hypothesis of zero explanatory power for these two coefficients.

For innovations from the $t$-distribution we get:

ARIMA(2,0,0) Model:
--------------------
Conditional Probability Distribution: t

                           Standard         t
Parameter      Value        Error        Statistic
---------   -----------   ----------   -----------
Constant    -0.0651421    0.0895812    -0.727185
AR{2}       -0.0728623    0.0350649    -2.07793
DoF          2.78969      0.325684      8.56564

GARCH(1,1) Conditional Variance Model:
----------------------------------------
Conditional Probability Distribution: t

                           Standard         t
Parameter      Value        Error        Statistic
---------   -----------   ----------   -----------
Constant     1.33968      0.575195      2.32909
GARCH{1}     0.742378     0.0554331    13.3923
ARCH{1}      0.257622     0.102234      2.51993
DoF          2.78969      0.325684      8.56564

Hence we can rewrite (5) and (6) as

$r_t = -0.0651 - 0.0729\, r_{t-2} + \varepsilon_t, \qquad \sigma_t^2 = 1.3397 + 0.2576\, \varepsilon_{t-1}^2 + 0.7424\, \sigma_{t-1}^2$

where $z_t \sim t_{2.79}$. In this case all estimated parameters except the constant in (5) are statistically significant.

We can also try to model the variance using just a pure GARCH(1, 1) with the mean in (5) set to a constant $\mu_t = 0$.

%% GARCH without mean offset (\mu_t = 0)
% model objects (these definitions are implied by the text but were missing in the original listing)
Mdloffset0G = garch(1,1);
Mdloffset0T = garch(1,1);
Mdloffset0T.Distribution = 't';
% normally distributed innovations
EstMdlMdloffset0G = estimate(Mdloffset0G,r);
% t-distributed innovations
EstMdlMdloffset0T = estimate(Mdloffset0T,r);

We get similar results as with the AR-GARCH approach, because AR(2) plays an insignificant role in AR-GARCH:

GARCH(1,1) Conditional Variance Model:
----------------------------------------
Conditional Probability Distribution: Gaussian

                           Standard         t
Parameter      Value        Error        Statistic
---------   -----------   ----------   -----------
Constant     1.35151      0.219309      6.16257
GARCH{1}     0.824422     0.024637     33.4627
ARCH{1}      0.0918974    0.0164784     5.57683

GARCH(1,1) Conditional Variance Model:
----------------------------------------
Conditional Probability Distribution: t

                           Standard         t
Parameter      Value        Error        Statistic
---------   -----------   ----------   -----------
Constant     1.30701      0.549996      2.3764
GARCH{1}     0.740395     0.0547972    13.5115
ARCH{1}      0.259605     0.100135      2.59256
DoF          2.82167      0.329997      8.55058

So which model should we choose now? The model with $t$-distributed innovations seems promising. Let's examine it quantitatively by AIC^{[9]} and BIC^{[10]}. Before we can compare our models, we need to infer the value of the log-likelihood objective function for each model. We can also extract the final conditional variances – volatilities.

%% Infering volatility and log-likelihood objective function value from estimated AR-GARCH model
[~,vG,logLG] = infer(EstMdlG,r);
[~,vT,logLT] = infer(EstMdlT,r);
%% Comparing fitted models using AIC, BIC
% inputs: values of loglikelihood objective functions for particular model, number of parameters
% and length of time series
[aic,bic] = aicbic([logLG,logLT],[5,6],length(r))

we get

aic =
   3752.3348   3485.1211
bic =
   3774.9819   3512.2976
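For reference, the two criteria that `aicbic` returns are $AIC = 2k - 2\ln L$ and $BIC = k\ln n - 2\ln L$, where $k$ is the number of parameters and $n$ the sample length; lower is better. A minimal Python sketch (the function name is mine):

```python
import math

def aic_bic(logL, k, n):
    """Akaike and Bayesian information criteria for a fitted model with
    log-likelihood logL, k parameters and n observations; lower wins."""
    aic = 2 * k - 2 * logL
    bic = k * math.log(n) - 2 * logL
    return aic, bic
```

Note that BIC's $k\ln n$ penalty grows with the sample size, so it punishes extra parameters harder than AIC on long series.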

So both AIC and BIC indicate that the AR-GARCH with $t$-distributed innovations should be chosen.

Now we specify and estimate AR-GJR-GARCH, adjusting for asymmetric volatility responses, and compare it with the better-performing AR-GARCH with $t$-distributed innovations using AIC and BIC. We will define just the version with $t$-distributed innovations.

%% AR-GJR-GARCH, ARIMA object
MdlGJR_T = arima('ARLags',2,'Variance',gjr(1,1));
MdlGJR_T.Distribution = 't';
%% Parameters estimation
% t-distributed innovations
EstMdlGJR_T = estimate(MdlGJR_T,r);

ARIMA(2,0,0) Model:
--------------------
Conditional Probability Distribution: t

                            Standard         t
Parameter       Value        Error        Statistic
-----------  -----------   ----------   -----------
Constant     -0.0784738    0.0895587    -0.876227
AR{2}        -0.0714758    0.0347855    -2.05476
DoF           2.77131      0.325323      8.51864

GJR(1,1) Conditional Variance Model:
--------------------------------------
Conditional Probability Distribution: t

                            Standard         t
Parameter       Value        Error        Statistic
-----------  -----------   ----------   -----------
Constant      1.34392      0.580147      2.31653
GARCH{1}      0.74785      0.0542454    13.7864
ARCH{1}       0.201164     0.0961537     2.09211
Leverage{1}   0.101972     0.113445      0.898867
DoF           2.77131      0.325323      8.51864

%% Infering volatility from estimated AR-GJR-GARCH model
[~,v_GJR_G,logL_GJR_G] = infer(EstMdlGJR_T,r);
%% Comparing fitted models using BIC, AIC
[aic2,bic2] = aicbic([logLT,logL_GJR_G],[6,7],length(r));

we get

aic2 =
   3483.12113743012   3483.95446404909
bic2 =
   3505.76823162144   3511.13097707866

therefore the original AR-GARCH slightly outperforms AR-GJR-GARCH. Actually this is already obvious from the output of the AR-GJR-GARCH estimate, because the leverage coefficient is statistically insignificant.

Our resulting conditional mean and variance model is the AR-GARCH with $t$-distributed innovations in the following form

$r_t = -0.0651 - 0.0729\, r_{t-2} + \varepsilon_t, \qquad \sigma_t^2 = 1.3397 + 0.2576\, \varepsilon_{t-1}^2 + 0.7424\, \sigma_{t-1}^2, \qquad z_t \sim t_{2.79}$

Let's plot the closing prices along with the volatilities from the AR-GARCH with $t$-distributed innovations and the AR-GARCH with normally distributed innovations.

%% plot results
% Closing prices
figure4 = figure;
subplot7 = subplot(2,1,1,'Parent',figure4);
hold(subplot7,'on');
plot(date,C);
ylabel('Closing price');
set(subplot7,'FontSize',16,'XMinorGrid','on','XTickLabelRotation',45,'YMinorGrid','on','ZMinorGrid','on');
% volatility AR-GARCH, innovations t-distributed
subplot8 = subplot(2,1,2,'Parent',figure4);
hold(subplot8,'on');
plot(date,vT);
% volatility AR-GARCH, innovations normally distributed
plot(date,vG);
ylabel('volatility');
legend({'$\varepsilon_t$ $t$-distributed','$\varepsilon_t$ normally distributed'},'Interpreter','latex');
set(subplot8,'FontSize',16,'XMinorGrid','on','XTickLabelRotation',45,'YMinorGrid','on','ZMinorGrid','on');

*Fig.4 Comparison of generated volatilities. The difference between the distributions is obvious.*

Download all the code in one piece as the GARCHestimation.m matlab script.

For the purpose of this text we consider excess kurtosis as

$K_e = \dfrac{\mu_4}{\sigma^4} - 3$

where $\mu_4$ is the fourth central moment about the mean and $\sigma^4$ is clearly the squared variance. The PDF of a random variable with $K_e < 0$, $K_e = 0$ or $K_e > 0$ is respectively said to be platykurtic, mesokurtic or leptokurtic. ARCH models allow for leptokurtic distributions of innovations and returns.

2. $\sigma$-algebra

Suppose we have a vector of variances and a vector of values of the dependent variable, and let $t$ denote time. We can state that these two vectors contain information. Now consider arbitrary functions of them; the information they generate is considered to be the $\sigma$-algebra generated by the given set of vectors. So if we have whatever conditional variable, it just means that we suppose its value depends on some other values through a function.

3. AR process

The auto-regressive process AR(p) is defined as

$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t$

After a slight modification we can use AR(p) for the conditional mean as

$\mu_t = c + \sum_{i=1}^{p} \phi_i y_{t-i}$

4. Just common notations for one and the same value,

$\sigma_t^2 = \mathrm{Var}[y_t \mid F_{t-1}] = \mathrm{Var}[\varepsilon_t \mid F_{t-1}] = \mathrm{E}[\varepsilon_t^2 \mid F_{t-1}]$

– give a thought to the last equality.

5. Ljung-Box test

Tests whether any of a given group of autocorrelations of a time series is different from zero.

$H_0$: The data are independently distributed (up to the specified lag).

$H_1$: The data are not independently distributed; some autocorrelation coefficient is non-zero.

6. $p$-value

The familiar value discussed by many students and practitioners forevermore. There are many intuitive interpretations of this value, some of them correct, some of them not. I recommend you find the one which fits your mind best. For me personally it's "the greatest significance level up to which we would not reject the null hypothesis". Sometimes it is said to be a *plausibility*, because the smaller the $p$-value is, the less acceptable the null hypothesis is.

7. Engle’s ARCH test

Engle's ARCH test assesses the significance of ARCH effects in a given time series. The residual series need not be autocorrelated itself but can be autocorrelated in the squared residuals; if so, we get the familiar ARCH effect. Note that innovations figure as residuals.

$H_0$: The squared residuals are not autocorrelated – no ARCH effect.

$H_1$: The squared residuals are autocorrelated – the given time series exhibits the ARCH effect.

8. T-test

We use the one-sample t-test with hypotheses defined as

$H_0$: $\theta = 0$.

$H_1$: $\theta \ne 0$.

where $\theta$ is the parameter in question.

9. AIC

Akaike’s Information Criterion (AIC) provides a measure of model quality. The most accurate model has the smallest AIC.

10. BIC

The Bayesian information criterion (BIC), or Schwarz criterion, is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred.

Volatility is the degree of variation of a price series over time, as measured by the standard deviation of returns.

Why is volatility of vast importance in the financial world? One of the main reasons is that it is used as a measure of *risk*. Greater volatility of an asset means a riskier opportunity for a potential investor, and vice versa. In practice, traders can use the output of volatility models to set the leverage (or any other parameter) of their positions, so volatility can help optimize a trading strategy. Usually the PnL of a trading strategy is a function of volatility, or at least the variance of PnL is. We could explore such a volatility-PnL relation via regression analysis, but that is not the aim of this article.

Sometimes called realized volatility or simple moving average (SMA). The historically oldest approach to volatility comes directly from the definition. We just select a rolling window of length $N$ over the time series and calculate volatility as the *sample standard deviation of returns over the given period* from time $t-N+1$ to $t$. That is

(1) $\sigma_t = \sqrt{\dfrac{1}{N-1} \sum_{i=0}^{N-1} \left(r_{t-i} - \bar{r}_t\right)^2}$

where $\bar{r}_t$ is simply the *sample mean of returns over the given period*, calculated over the same window from $t-N+1$ to $t$,

$\bar{r}_t = \dfrac{1}{N} \sum_{i=0}^{N-1} r_{t-i}$

and $r_t$ is the particular asset return over one time unit

$r_t = \ln\dfrac{P_t}{P_{t-1}}$

Observe that this model has one big disadvantage: it assigns equal weights to all terms in the sum in (1). Suppose there was some short-term significant swing in volatility at some time. Then every calculation of $\sigma_t$ whose window contains that moment will include the outlier with the same weight. This causes plateauing of the estimate. Download the function for the calculation here. I will demonstrate the plateauing effect on the bitcoin daily price time series.

*Fig.1 On these two plots we can see that a sudden fluctuation of returns causes the plateauing effect and the model gives a strongly biased volatility estimate.*

The market convention is to quote volatility in annualized terms. For daily historical volatility we have $\sigma_{\mathrm{daily}}$, and with roughly 252 trading days per year we can annualize it as $\sigma_{\mathrm{ann}} = \sigma_{\mathrm{daily}} \sqrt{252}$.
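Equation (1) together with the annualization convention can be sketched as follows. This is Python for illustration (the downloadable function in the post is MATLAB), and the window length and sample returns are assumptions:

```python
import math

def sma_volatility(returns, window):
    """Rolling sample standard deviation of returns (equation (1));
    every observation inside the window gets equal weight."""
    out = []
    for t in range(window - 1, len(returns)):
        chunk = returns[t - window + 1 : t + 1]
        mean = sum(chunk) / window
        var = sum((r - mean) ** 2 for r in chunk) / (window - 1)
        out.append(math.sqrt(var))
    return out

# annualize a daily estimate with the sqrt-of-time rule
daily = sma_volatility([0.01, -0.02, 0.015, 0.0, -0.01], 3)
annualized = [s * math.sqrt(252) for s in daily]
```

The equal weighting inside `chunk` is exactly what produces the plateauing: an outlier return stays in the window for `window` steps with full weight.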

If we slightly modify (1) by considering $\bar{r}_t = 0$, we get rid of one estimation. This can clearly be done just for series of logarithmic returns, not for price series. Thus we get the *simple close-close volatility estimator*

$\sigma_t = \sqrt{\dfrac{1}{N} \sum_{i=0}^{N-1} r_{t-i}^2}$

*Fig.2 Comparison of SMA and SMA with the close-close approach. Both estimates follow each other tightly.*

The EWMA model is an extended version of the aforementioned simple historical volatility and a partial solution to the plateauing issue. The EWMA approach was developed by J.P. Morgan within the RiskMetrics methodology framework and is defined as follows

(2) $\sigma_t^2 = (1-\lambda) \sum_{i=1}^{\infty} \lambda^{i-1} \left(P_{t-i} - \bar{P}\right)^2$

or after rearranging

(3) $\sigma_t^2 = \lambda \sigma_{t-1}^2 + (1-\lambda)\left(P_{t-1} - \bar{P}\right)^2$

where $P$ is an asset price, $\bar{P}$ is the mean of the asset price and $\lambda$ is the decay factor such that $0 < \lambda < 1$. In (2) note that as $i \to \infty$ we have $\lambda^{i-1} \to 0$, so deeper observations get smaller and smaller weights.

If we want the EWMA model for logarithmic returns, we just swap $(P_{t-i} - \bar{P})$ in (2) and (3) for $r_{t-i}$, and in practice we can drop the mean term after we have checked that $\bar{r} \approx 0$ holds. Hence we get

$\sigma_t^2 = \lambda \sigma_{t-1}^2 + (1-\lambda)\, r_{t-1}^2$

Observe how $\lambda$ and $(1-\lambda)$ weight the terms in the model. A greater $\lambda$ makes the model more affected by the last variance; in other words, the model tends to revert to its previous volatility level (variance). Conversely, a small $\lambda$ gives more weight to the last return. $\lambda$ is sometimes called memory because it directly affects how much the variance depends on its previous value. The RiskMetrics methodology suggests using $\lambda = 0.94$ for daily data, which is widely used by practitioners.
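The return-based recursion (3) is a one-liner per step; a Python sketch for illustration (the default $\lambda = 0.94$ is the RiskMetrics daily value; the seeding of the initial variance is my own choice, not from the post):

```python
def ewma_volatility(returns, lam=0.94, init_var=None):
    """EWMA volatility: sigma_t^2 = lam*sigma_{t-1}^2 + (1-lam)*r_{t-1}^2.
    Seeds the recursion with init_var, else with the first squared return."""
    var = init_var if init_var is not None else returns[0] ** 2
    sigma = []
    for r in returns:
        sigma.append(var ** 0.5)   # volatility is known before seeing r_t
        var = lam * var + (1.0 - lam) * r ** 2
    return sigma
```

An outlier return still enters the estimate, but its weight then decays geometrically instead of staying at full strength for a whole window, which is why the plateau disappears.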

*Fig.3 Exponentially decaying EWMA is still biased by outliers but gives a much better volatility estimate than SMA.*

You can download the Matlab function for the EWMA volatility estimate here.

In 1980, the physicist Michael Parkinson showed in his paper that we can project additional information into the volatility estimate by using not just close prices but price extremes. He proposed using log differences between highs and lows over a specified time window. Such an estimator is five times more efficient than the historical SMA approach (for the same amount of input data, the PE variance is 1/5 of the SMA variance). The rolling PE is given by

$\sigma_{P,t} = \sqrt{\dfrac{1}{4N\ln 2} \sum_{i=0}^{N-1} \left(\ln\dfrac{H_{t-i}}{L_{t-i}}\right)^2}$

where $\sigma_{P,t}$ is the PE volatility computed over the given time window from $t-N+1$ to $t$, and $H$, $L$ are the corresponding high and low.

*Fig.4 Even though the Parkinson estimator is significantly more precise in terms of variance, it tends to underestimate volatility, as seen in the picture above. It should be used in combination with other estimators which don't underestimate. We can include PE in a volatility composite.*

The plots above were made using my PEvol() function.
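The rolling PE formula translates directly into a short sketch. Python for illustration; the helper name is mine, while the post's own implementation is the MATLAB PEvol():

```python
import math

def parkinson_volatility(highs, lows, window):
    """Parkinson range estimator over a rolling window:
    sigma = sqrt( sum(ln(H/L)^2) / (4 * N * ln 2) )."""
    out = []
    for t in range(window - 1, len(highs)):
        s = sum(math.log(highs[i] / lows[i]) ** 2
                for i in range(t - window + 1, t + 1))
        out.append(math.sqrt(s / (4.0 * window * math.log(2.0))))
    return out
```

Note the estimator uses only the intraperiod range, so a bar with high equal to low contributes exactly zero.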

In 1980, Garman and Klass realized that utilizing all of the OHLC information must give an even more precise volatility estimate than PE. It can be explained as an optimal (smallest variance) combination of SMA and PE. The G-K estimator is 7.4x more efficient than SMA. Considering our rolling window of size $N$, the G-K estimator is written as

(4) $\sigma_{GK,t} = \sqrt{\dfrac{1}{N} \sum_{i=0}^{N-1} \left[\dfrac{1}{2}\left(\ln\dfrac{H_{t-i}}{L_{t-i}}\right)^2 - \left(2\ln 2 - 1\right)\left(\ln\dfrac{C_{t-i}}{O_{t-i}}\right)^2\right]}$

where $O$, $H$, $L$, $C$ are respectively the open, high, low and close at each time in the particular rolling window. The second term in the brackets in (4) can be neglected since it is very small.

*Fig.5 A disadvantage of the G-K estimator is that it is trend dependent. As we used bitcoin price data with a strong drift, the G-K estimation gives overestimated volatility values.*

Download Matlab function for Garman-Klass estimation.
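The practical form of (4) can be sketched as follows (Python for illustration, my own naming, and keeping both terms rather than neglecting the small one):

```python
import math

def garman_klass_volatility(opens, highs, lows, closes, window):
    """Garman-Klass OHLC estimator over a rolling window:
    sigma^2 = mean( 0.5*ln(H/L)^2 - (2*ln2 - 1)*ln(C/O)^2 )."""
    k = 2.0 * math.log(2.0) - 1.0
    out = []
    for t in range(window - 1, len(closes)):
        s = 0.0
        for i in range(t - window + 1, t + 1):
            s += 0.5 * math.log(highs[i] / lows[i]) ** 2
            s -= k * math.log(closes[i] / opens[i]) ** 2
        out.append(math.sqrt(s / window))
    return out
```

The negative close-to-open term is what corrects the range term for the drift inside each bar, though, as Fig.5 shows, a strong trend across bars still biases the estimate upward.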

The R-S (Rogers-Satchell) volatility estimator was published in 1991. This estimator is independent of the drift and is computed as

(5) $\sigma_{RS,t} = \sqrt{\dfrac{1}{N} \sum_{i=0}^{N-1} \left[\ln\dfrac{H_{t-i}}{C_{t-i}} \ln\dfrac{H_{t-i}}{O_{t-i}} + \ln\dfrac{L_{t-i}}{C_{t-i}} \ln\dfrac{L_{t-i}}{O_{t-i}}\right]}$

*Fig.6 The R-S estimator closely follows the SMA estimate during low-volatility periods. The difference between the estimators during volatility peaks is due to the wide trading ranges in these periods and the SMA estimator's failure to incorporate this fact.*

Matlab function for R-S estimate can be downloaded here.
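The drift independence of (5) is visible in a sketch: a bar that opens at its low and closes at its high (pure trend, no range around the path) contributes zero. Python for illustration, my own naming:

```python
import math

def rogers_satchell_volatility(opens, highs, lows, closes, window):
    """Rogers-Satchell drift-independent estimator:
    sigma^2 = mean( ln(H/C)*ln(H/O) + ln(L/C)*ln(L/O) )."""
    out = []
    for t in range(window - 1, len(closes)):
        s = 0.0
        for i in range(t - window + 1, t + 1):
            s += math.log(highs[i] / closes[i]) * math.log(highs[i] / opens[i])
            s += math.log(lows[i] / closes[i]) * math.log(lows[i] / opens[i])
        out.append(math.sqrt(s / window))
    return out
```

Each product pairs the distance of an extreme from the close with its distance from the open, so a monotone move from open to close cancels out while genuine back-and-forth range does not.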

The discussion above shows that historical volatility estimators can be sorted into

- mean-deviation estimators (SMA, EWMA)
- close-close estimators (SMA(), EWMA())
- range estimators (P-E, G-K, R-S).

A great paper on the properties of range-based estimators, with further details, can be downloaded here.

Implied volatility (IV) is the volatility of an asset derived from changes in the value of the corresponding option, in such a way that if we input IV into an option pricing model, it returns a theoretical value equal to the current option value. Contrary to historical volatility, IV is the volatility forecast for the price of the underlying asset from the current time $t$ to the option expiration $T$. Let's quickly introduce the Black-Scholes option pricing model. There are multiple ways to derive the BSE and you can review them here. The brief solution of the BSE can be found here. So the familiar BSE form is

(1) $\dfrac{\partial V}{\partial t} + \dfrac{1}{2}\sigma^2 S^2 \dfrac{\partial^2 V}{\partial S^2} + rS \dfrac{\partial V}{\partial S} - rV = 0$

where

$V$ – option value,

$S$ – underlying asset spot price,

$r$ – risk-free interest,

$t$ – current time, $T$ – date of expiry, $T - t$ – time to expiry,

$\sigma$ – volatility (in our case implied volatility).

The solution of (1) for the value of a call or put option at time $t$ can be expressed as

(2) $C_t = S\,\Phi(d_1) - K e^{-r(T-t)}\,\Phi(d_2), \qquad P_t = K e^{-r(T-t)}\,\Phi(-d_2) - S\,\Phi(-d_1)$

$d_1 = \dfrac{\ln(S/K) + \left(r + \sigma^2/2\right)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t}$

where

$K$ is the strike price of the option and $\Phi(d_{1,2})$ is the value of the normal CDF at the particular points $d_1$, $d_2$. Recall that the normal CDF is written

$\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du$

Do you see how deeply $\sigma$ is buried? Since there is no closed-form solution for $\sigma$ in (2), the IV can't be expressed analytically. That's why we use numerical methods and approximations to evaluate implied volatilities. Widely used numerical approaches are the Newton-Raphson and secant methods. An approximation of IV can be estimated by the Bharadia-Christopher-Salkin (1996) model or the Corrado-Miller (1996) model. Among the latest and most effective estimators we can count Hallerbach (2004), Jaeckel (2006) and Jaeckel (2013).

For the purpose of this article, let's show how to implement just the most basic B-C-S model and the secant method.

The B-C-S model is derived as

(3) $\sigma_{BCS} = \sqrt{\dfrac{2\pi}{T-t}}\; \dfrac{V - \delta}{S - \delta}, \qquad \delta = \dfrac{S - K e^{-r(T-t)}}{2}$

where

$T - t$ – time to option expiration,

$V$ – current option value,

$S$ – underlying asset spot price,

$K$ – option strike price ($K e^{-r(T-t)}$ is the discounted strike price).

The secant numerical method is given by

(4) $\sigma_{i+1} = \sigma_i - \left(BS(\sigma_i) - V\right) \dfrac{\sigma_i - \sigma_{i-1}}{BS(\sigma_i) - BS(\sigma_{i-1})}$

where

$BS(\sigma)$ is the theoretical option value from (2).

The algorithm itself works as follows (suppose the computation for one specific $t$):

1. Compute $\sigma_{BCS}$ using (3).
2. Compute the theoretical value of the option using (2).
3. Set the initial values (initial guess) for the secant method, that is $\sigma_{i-1}$ and $\sigma_i$.
4. And the corresponding $BS(\sigma_{i-1})$ and $BS(\sigma_i)$.
5. These temporary variables will change in the secant method loop. Keep in mind that the iterator in (4) of the secant loop is $i$ (not $t$!).
6. Compute the approximation error (the product term in (4)).
7. Compute the new $\sigma_{i+1}$ using (4).
8. Compute the new $BS(\sigma_{i+1})$ using (2).
9. If error < accuracy threshold, then the last $\sigma$ is our result from the secant method, alias the implied volatility. Proceed to the next $t$.
10. If error > accuracy threshold, go back to step 6 (proceed to the next secant iteration) and don't forget that you need to recompute the new values. Use the $\sigma$ from the last secant iteration (note that the initial values from steps 3 and 4 will not be used in the 2nd secant iteration). On every new iteration, the $i$ variables become the $i-1$ variables and new $i$ variables are computed.

I will use historical EOD option chain data for BAC stock.

*Fig.3 Example option chain .csv file for BAC stock.*

If you are wading through my intro to volatility thoroughly, you may want to download the example option chain. Option data are very expensive and tough to store and manipulate, so it is nearly impossible to find them for free. Now we need to prepare the data for the IV calculation as follows:

- Use the optionparse() function to import the .csv option chain into a matlab table.

table = optionparse('path/to/datafile','ivolatility','equity');

- Now we extract the dates $t$ and the adjusted stock spot prices, and we choose call options with strike $K = 15$, time to expiry around half a year, and the 'risk-free' interest rate $r = 0.1\%$, which was the T-Bill rate in early 2015. Let me point out that in practice the "constant" time to expiry will not be constant, because we will not find options at every time $t$ which have exactly half a year to expiration. Only a finite number of options is offered at every $t$, so our time to expiry will be variable. At every $t$ I choose the first option matching the expiration and strike filters. The following code finds the aforementioned options.

% find dates
[~,ia,ib] = unique(table.data.date,'first','legacy');
t = table.data.date(ia);
S = table.data.adjusted_stock_close_price(ia);
% time to expiration in N/252 fraction, for simplicity I don't consider holidays, in practice you should
Tte = yearfrac(table.data.date,table.data.expiration,13);
% search options with expiration T_t and strike K
T_t_const = 0.5;
K = 15;
% expiration filter
ic_start = ia + accumarray(ib,Tte,[],@(x) find(x>=T_t_const,1)) - 1;
ic_end = ia + accumarray(ib,Tte,[],@(x) find(x<=T_t_const+0.5,1,'last')) - 1;
idExpiration = zeros(size(Tte));
% strike filter
id = zeros(length(ic_start),1);
for i=1:length(ic_start)
    temp = find(table.data.strike(ic_start(i):ic_end(i)) == K,1);
    if ~isempty(temp)
        id(i,1) = find(table.data.strike(ic_start(i):ic_end(i)) == K,1);
    else
        id(i,1) = NaN;
    end
end
% uncomment proper option
idTK = ic_start + id - 1; % call options
% idTK = ic_start + id; % put options
% time to expiry at each time t
T_t_var = Tte(idTK);
% let the value of the option be (ask+bid)/2. V is vector with filtered
% options
V = (table.data.ask(idTK) + table.data.bid(idTK))/2;

- Our variables and constants are ready. I will demonstrate the IV computation for the first date, t = 1.

1. Calculate initial IV estimate by B-C-S model (3).

t = 1;
r = 0.001;
delta = (1/2)*(S(t) - K*exp(-r*T_t_var(t))); % (3)
sigma_BCS(t,1) = sqrt(2*pi/T_t_var(t)) * (V(t)-delta)/(S(t)-delta); % (3) note: 2*pi/T_t, not 2*pi-T_t

2. Compute theoretical value of an option using (2).

d1 = (log(S(t)/K) + (r+1/2*sigma_BCS(t,1)^2)*T_t_var(t)) / (sigma_BCS(t,1)*sqrt(T_t_var(t))); % (2)
d2 = d1 - sigma_BCS(t,1)*sqrt(T_t_var(t)); % (2)
BS(t) = S(t)*cdf('normal',d1,0,1) - K*exp(-r*T_t_var(t))*cdf('normal',d2,0,1); % (2) for call option

3. Set the initial values (initial guess) for the secant method. Note that I smuggled in the initial error value and the secant method accuracy.

err_thr = 1e-4; % secant method error tolerance(accuracy)
err = 1; % initial error value
temp_sigma(1:2,1) = [0,sigma_BCS(t,1)];

4. And the corresponding theoretical option values.

temp_BS(1:2,1) = [0,BS(t,1)];

5. Secant method, steps described in the code.

while err >= err_thr
    % error update - step 6
    err = (temp_BS(2,1) - V(t)) * ...
        ((temp_sigma(2,1) - temp_sigma(1,1)) / (temp_BS(2,1) - temp_BS(1,1))); % (4) error corresponding to last computed sigma
    % sigma_{i+1} update - step 7
    temp_sigma(3,1) = temp_sigma(2,1) - err; % new sigma from secant iteration
    temp_sigma(1) = []; % old sigma no more needed in next loop
    % BS(sigma_{i+1}) update - step 8
    d1 = (log(S(t)/K) + (r+1/2*temp_sigma(2,1)^2)*T_t_var(t)) / (temp_sigma(2,1)*sqrt(T_t_var(t))); % (2)
    d2 = d1 - temp_sigma(2,1)*sqrt(T_t_var(t)); % (2)
    temp_BS(3,1) = S(t)*cdf('normal',d1,0,1) - K*exp(-r*T_t_var(t))*cdf('normal',d2,0,1); % (2) for call option
    temp_BS(1) = []; % old BS(sigma) no more needed in next loop
end

9. If error < accuracy threshold, then the last $\sigma$ is our implied volatility.

sigma_secant(t,1) = temp_sigma(2,1);

10. If error > accuracy threshold, go back to step 6 (next secant iteration).

You can download complete working BCS_secant() function here.
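The whole walkthrough condenses into a short self-contained sketch. This is Python for illustration (the post's implementation is the MATLAB BCS_secant()); I seed the secant method with the B-C-S guess and a scaled copy of it, a slightly different choice of starting points than steps 3-4 above, since $\sigma = 0$ cannot be plugged into (2):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, T, sigma):
    """Black-Scholes call value, equation (2)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_vol_secant(V, S, K, r, T, tol=1e-8, max_iter=100):
    """Secant iteration on f(sigma) = BS(sigma) - V, seeded with the
    B-C-S approximation (3)."""
    delta = 0.5 * (S - K * math.exp(-r * T))
    sigma1 = math.sqrt(2.0 * math.pi / T) * (V - delta) / (S - delta)  # B-C-S
    sigma0 = 0.5 * sigma1                 # perturbed second starting point
    f0 = bs_call(S, K, r, T, sigma0) - V
    f1 = bs_call(S, K, r, T, sigma1) - V
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:
            break
        sigma0, sigma1 = sigma1, sigma1 - f1 * (sigma1 - sigma0) / (f1 - f0)
        f0, f1 = f1, bs_call(S, K, r, T, sigma1) - V
    return sigma1

# round trip: price an option at sigma = 0.25, then recover the IV from the price
price = bs_call(100.0, 100.0, 0.01, 0.5, 0.25)
iv = implied_vol_secant(price, 100.0, 100.0, 0.01, 0.5)
```

Because the B-C-S guess already lands close to the true IV for near-the-money options, the secant loop typically converges in a handful of iterations.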

*Fig.4 B-C-S-secant computed daily implied volatility for Bank of America stock in 2015.*

We can slowly move to the next concept of practical importance, which ensues directly from the implied volatility calculation. We have computed one IV value for every $t$, and we supposed that the strike $K$ and the time to expiry $T - t$ were fixed ("fixed" in quotes, because $T - t$ cannot be strictly constant). Clearly we can relax this assumption and let $K$ and $T$ be variables. For every $t$ we then have not just one point but a set of points which constitutes the *implied volatility surface*. If we fix just $T$ at some time $t$, we obtain values constituting the *implied volatility smile or implied volatility skew*; if we fix just $K$ at some time $t$, we obtain the *implied volatility term structure*.

Multitudes of volatility indexes (composites) are constructed using various setups of the IV computation. Some of them average the IVs of stocks in indexes such as the S&P 500, DJIA or NASDAQ, some of them average similar ETFs, etc. As mentioned before, such indexes measure the expectation of volatility. For example, have a look at the CBOE volatility indexes overview.


Besides an ordinary crypto-currency spot exchange, Bitfinex also offers margin trading with BTC, LTC, ETH and USD. What does that mean? It means that you can borrow fiat or crypto from a liquidity provider (lender) at some interest rate, trade with the borrowed money, and therefore effectively leverage your positions. If you want to go short BTC/USD (sell BTC/USD), you simply borrow BTC and then sell it for USD. Conversely, if you want to go long BTC/USD (buy BTC/USD), you take a USD loan and buy BTC.

- The current leverage is 3.3x – which means you can trade with 330% of your net balance, so 330% = 100% of your own balance + 230% from a loan.
- The maintenance equity margin is 15% – that is, when the aforementioned 100% net balance drops below 15%, your positions get force-liquidated.
- The expiration for loans is selected by the lender and is 2-30 days. A loan can be prematurely closed by the borrower only.
- The Bitfinex fee for providing the liquidity lending feature is 15% of the interest rate. The fee is deducted from the lender's earnings.
- Interest is paid once every 24 hours to the loan creditor (lender).
- The smallest time for which the interest rate applies is 1 hour; any loan lasting less than 1 hour is treated as if it lasted 1 hour.

Particular loans are “traded” in a very similar fashion to BTC/USD on an ordinary spot exchange. Just instead of price as the primary order book driver we have an *interest rate per day*. Technically we call such an order book a FIFO continuous double auction, and such a market a *money market*. Which orders get filled first? Clearly those with the best interest rate, hence the *interest rate per day* is the primary order book driver. What if there are more orders with the same interest rate? We need a secondary order book driver, which in our case is the *time of order arrival*. This mechanism is called FIFO – first in, first out from the queue of orders with the same *interest rate per day*. We could just as easily use LILO (last in, last out) instead of FIFO, but that’s not the convention :). “Continuous” stands for the fact that the order book is constantly updated and any lender or borrower can send an order at any time. This way the *interest rate per day* is “fully” driven by the supply and demand of the money market participants.
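To make the rate-then-time priority concrete, here is a minimal Python sketch of a FIFO lending book. It is illustrative only, not Bitfinex’s actual matching engine; all class and method names are mine.

```python
import heapq

class LendingBook:
    """Toy FIFO lending book: rate is the primary key, arrival order the tie-break."""

    def __init__(self):
        self._heap = []   # min-heap of (daily_rate, arrival_seq, [remaining_volume])
        self._seq = 0     # monotonically increasing arrival counter (the FIFO part)

    def add_offer(self, daily_rate, volume):
        heapq.heappush(self._heap, (daily_rate, self._seq, [volume]))
        self._seq += 1

    def take_loan(self, volume):
        """Match a borrow request: best (lowest) rate first, oldest first within a rate."""
        fills = []
        while volume > 0 and self._heap:
            rate, seq, vol_box = self._heap[0]
            take = min(volume, vol_box[0])
            fills.append((rate, take))
            vol_box[0] -= take
            volume -= take
            if vol_box[0] == 0:
                heapq.heappop(self._heap)   # offer fully consumed, drop it
        return fills

book = LendingBook()
book.add_offer(0.05, 100)   # arrived first at 0.05 %/day
book.add_offer(0.05, 200)   # same rate, arrived second
book.add_offer(0.04, 50)    # better rate, arrived last
print(book.take_loan(180))  # -> [(0.04, 50), (0.05, 100), (0.05, 30)]
```

Note how the 0.04 offer jumps the queue (better rate), while the two 0.05 offers are consumed in arrival order.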

Let me quickly note that there also exist order books that are not FIFO continuous double auctions. What can they look like?

For example, instead of the FIFO matching algorithm we can have matching on a *pro-rata* basis. Imagine five orders with five volumes at one price level in the order book. Now a market order arrives at the same price as our price level, but with not enough volume to sweep through all five limit orders. With FIFO matching this volume is matched only against the orders with the highest time priority (the earliest orders), but with pro-rata matching the market order’s volume is matched against ALL five limit orders, distributed proportionally to the sizes of those five limit orders. This matching algorithm eliminates the need to “be first in the queue”, hence it significantly reduces order cancellations and exchange server load. On US stock markets ~96% of orders are cancelled, so it does matter.
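The pro-rata split is just a proportional allocation; a two-line Python sketch (function name and numbers are mine, and real exchanges add rounding and minimum-fill rules on top of this):

```python
def pro_rata_fill(resting_sizes, incoming_volume):
    """Split an incoming market order across ALL resting orders at a price level,
    proportionally to each resting order's size. Rounding rules vary by exchange."""
    total = sum(resting_sizes)
    return [incoming_volume * s / total for s in resting_sizes]

# Five resting orders at one level, incoming market order for 100 units:
print(pro_rata_fill([10, 20, 30, 40, 100], 100))
# -> [5.0, 10.0, 15.0, 20.0, 50.0]  (each order gets half its size; total resting = 200)
```

Under FIFO the same 100 units would have filled the first few orders completely and left the rest untouched.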

Instead of continuous matching there can be discrete matching of orders. We simply collect orders and match them, say, every hour. Non-matched orders are carried over to the next matching. Discrete matching works like a capacitor: it absorbs excessive volatility.

I used basic statistics and a few assumptions to design my bot. Let’s have a look at it.

Sample lending book:

The central question is “Where do I place my limit order?” Do I need to place it at the top of the offer side? Of course I don’t. Say I want the loan offer to be executed within 1 hour with at least 85% probability. That way I can slightly improve the interest rate at which I lend my funds, by eliminating the loan offers at the lowest rates.

I can estimate the interest rate of an order that will be executed with at least 85% probability within 1 hour in the following way.

Download historical lending data from here. I’m using a download script and cron because I want a fresh .csv file for the volume threshold calculation every day. If you are wondering what that mysterious “volume threshold” means: it’s the cumulative volume level next to which sits our desired interest rate, the one with 85% probability of execution within 1 hour. It will become clearer as we move ahead.

Example historical tick data .csv:

I compute the volume threshold from the last 10 days of data. That should be enough to give the estimate some predictive power. Let’s divide our tick data into 30-minute interval bins and sum the corresponding volumes in each interval. The following picture shows the discrete PDF with the 0.38 quantile marked.
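The binning and the quantile lookup can be sketched in a few lines of Python. This is my own simplification, not the threshold_calc.m code: timestamps are unix seconds, the sample ticks are made up, and the quantile is a crude empirical one.

```python
from collections import defaultdict

def bin_volumes(ticks, bin_seconds=1800):
    """Sum tick volumes into 30-minute bins; ticks are (unix_ts, volume) pairs."""
    bins = defaultdict(float)
    for ts, vol in ticks:
        bins[ts // bin_seconds] += vol
    return sorted(bins.values())

def volume_threshold(binned, p_exec):
    """Volume level exceeded in roughly p_exec of the bins,
    i.e. the (1 - p_exec) empirical quantile of the binned volumes."""
    k = int(len(binned) * (1 - p_exec))
    return binned[k]

ticks = [(0, 5.0), (100, 7.0), (1900, 2.0), (3700, 9.0), (5500, 1.0)]
bins = bin_volumes(ticks)            # -> [1.0, 2.0, 9.0, 12.0]
print(volume_threshold(bins, 0.62))  # -> 2.0
```

With 10 days of real tick data the binned sample has ~480 entries, so the empirical quantile is much less crude than in this toy run.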

Let’s build some intuition behind the histogram. If I send a loan offer at cumulative volume level 34887, there is a 62% probability of my order being executed within 30 minutes. In other words, there is a 62% probability that the following 30-minute volume will be more than 34887, and therefore our order will be executed. This probability holds for a 30-minute interval. What’s the probability of the order being executed within 1 hour, etc.? I use Bernoulli’s formula for the binomial probability of getting at least *k* successes in *n* trials:

P(X ≥ k) = Σ_{i=k}^{n} C(n, i) · p^i · (1 − p)^(n − i)

So if I send my loan offer at the 34887 volume level (the volume threshold), it should get executed within 1 hour with probability 2 · 0.62 · 0.38 + 0.62² = 85.6%. That’s it. See the appendix for the intuition behind Bernoulli’s formula.
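A quick numerical check of the 85.6% figure, using the binomial formula directly and, equivalently, the complement 1 − (1 − p)ⁿ (the function name is mine):

```python
from math import comb

def p_at_least(k, n, p):
    """Binomial probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.62                               # per-30-minute execution probability
print(round(p_at_least(1, 2, p), 4))   # -> 0.8556, the 85.6% in the text
print(round(1 - (1 - p)**2, 4))        # -> 0.8556, same via the complement
```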

I run the volume threshold calculation script once a day. The output of this script is, naturally, the volume threshold, which is in turn an input for the lending script itself.

Before you use my lending script you need to download the Bitfinex API MATLAB client. I will not go into details on how to use this library because it’s quite straightforward. In a nutshell, you need to generate a key and corresponding secret (on the BFX platform) and then insert both into the “key_secret” file. When you are finished, check the “examples.m” file. I also recommend reading the short documentation that comes with the library. It should take you just a few minutes.

So we have a freshly calculated volume threshold, and we have the API client and keys/secrets in their place. Now we automate lending_script.m, threshold_calc.m and SizeCorrectionwget.sh by making entries in crontab:

*/11 * * * * /usr/local/bin/matlab -nodesktop -nosplash -r lending_script > /dev/null 2>&1
5 0 * * * /usr/local/bin/matlab -nodesktop -nosplash -r threshold_calc > /dev/null 2>&1
55 23 * * * /home/maple/Documents/MATLAB/scripts/SizeCorrectionwget.sh

lending_script.m runs every 11 minutes, threshold_calc.m once a day at 00:05, and SizeCorrectionwget.sh once a day at 23:55. Note that a few custom paths need to be set in the scripts. How you set them is completely up to you.

If you have waded through the lending_script.m code, you will have noticed a piece of logging code. It simply appends a status line from the last script execution to the selected log file. If you want your log files split by week, month, etc., then I recommend the logrotate utility; otherwise you will end up with just one long log file :).

Example log file:

There are two types of entries in the log file. The first type – Unused funds … – is logged when there are no funds available to lend. Threshold is the volume threshold discussed above. Ratewmean is the volume weighted average rate of all outstanding loans, calculated as

r_vw = ( Σ_{i=1}^{n} v_i · r_i ) / ( Σ_{i=1}^{n} v_i )

where:

v_i – volume of loan i

r_i – price (rate) of loan i

r_vw – volume weighted average rate

n – number of outstanding loans

The second value in brackets is 0.85 · r_vw, i.e. the net interest rate p.a. after fees are deducted.
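The volume weighted rate and its after-fee counterpart take only a few lines of Python; the loan volumes and rates below are made up for illustration, and the 15% fee is the Bitfinex lending fee mentioned earlier.

```python
def vw_rate(volumes, rates):
    """Volume weighted average rate: sum(v_i * r_i) / sum(v_i)."""
    return sum(v * r for v, r in zip(volumes, rates)) / sum(volumes)

volumes = [1000.0, 500.0, 1500.0]   # dollars lent per outstanding loan
rates = [10.0, 12.0, 8.0]           # % p.a. per loan
gross = vw_rate(volumes, rates)
net = 0.85 * gross                  # Bitfinex keeps 15% of the interest
print(round(gross, 2), round(net, 2))   # -> 9.33 7.93
```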

The second type of line is logged when there are funds available for lending. Amount is the amount lent in dollars, rate is the interest rate in % p.a. (%/100 per day), and period is the chosen expiration time.

Here is a figure comparing my volume weighted portfolio rate with the volume weighted BFX rate as a benchmark. The BFX rate is computed exactly as *r_{vw}* above, except that we take the lending tick data and all prices with their respective volumes.

It can be clearly seen that in times of low BTC price volatility both rates closely follow each other. The volume threshold is very close to the best ask. The rates started to diverge during May, when the BTC price became more volatile. A key role in the divergence is played by a lending script feature which sets different expirations for different rates. Currently it’s hardcoded into the script, but I can imagine it could be optimized. The script sets a 30-day expiration on loans with interest above 20% p.a. (fees excluded).

Let’s translate the

P(X ≥ k) = Σ_{i=k}^{n} C(n, i) · p^i · (1 − p)^(n − i)

formula.

Let p be the probability of success, hence q = 1 − p must be the probability of failure (funds were lent vs. funds weren’t lent), and let n = 2 be the number of trials with at least k = 1 success. Substitute “and” for multiplication and “or” for addition. Then we can write our event as

*“probability of at least one success in two trials” = “probability of exactly one success in two trials” or “probability of exactly two successes in two trials”.*

**Exactly one success** means that there will be exactly one success and one failure in two trials. We also have C(2, 1) = 2 ways of arranging such an event. Hence

*“probability of exactly one success in two trials”* = 2 · p · q.

**Exactly two successes** are clear. This event means success in the 1st trial and success in the 2nd trial. Note that we have just C(2, 2) = 1 way of doing so. Hence

*“probability of exactly two successes in two trials”* = p · p = p².

That way we can derive the general formula for at least k successes in n trials:

P(X ≥ k) = Σ_{i=k}^{n} C(n, i) · p^i · (1 − p)^(n − i)


The tick2bar() function assumes you have already imported your tick data into Matlab. If you haven’t and your tick data still resides in a text file, then you need to import it via an import function such as csvread(), textscan() or dlmread(). Check Matlab’s ways to import text files or the low-level I/O text import.

Arguments for tick2bar() function are:

- *dates* – vector in datetime data type
- *price* – vector in an arbitrary numeric type
- *amount* – vector in an arbitrary numeric type (this value is the traded volume corresponding to the particular tick/price)
- *barlenght* – integer (this option sets the duration of a bar such that each bar lasts *barlenght* *bartype* units)
- *bartype* – one of the characters ‘s’, ‘m’, ‘h’, ‘d’, ‘M’, ‘Y’ (this option is just the unit for the *barlenght* scalar)

So for a 10-minute bar output we simply choose *barlenght* = 10 and *bartype* = ‘m’.

Your input should look something like this:

Output from the tick2bar() has two components – datetime vector and OHLCV matrix.

Suppose we want to convert the tick data into the aforementioned 10-minute bars, so we call the function as follows:

[times,OHLCV] = tick2bar(times,price,amount,10,'m');
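For readers who want to prototype the same tick-to-OHLCV aggregation outside MATLAB, here is a rough Python sketch. It is my own simplification, not a port of tick2bar(): it takes timestamps as unix seconds instead of datetimes, supports only minute bars, and assumes the ticks are already sorted by time.

```python
from collections import OrderedDict

def ticks_to_bars(times, prices, amounts, barlength_minutes):
    """Aggregate sorted ticks into (bar_start, open, high, low, close, volume) bars."""
    span = barlength_minutes * 60
    bars = OrderedDict()
    for t, p, a in zip(times, prices, amounts):
        key = t // span
        if key not in bars:
            bars[key] = [p, p, p, p, a]     # open, high, low, close, volume
        else:
            bar = bars[key]
            bar[1] = max(bar[1], p)         # high
            bar[2] = min(bar[2], p)         # low
            bar[3] = p                      # close is the last tick seen in the bar
            bar[4] += a                     # accumulate volume
    return [(k * span, *v) for k, v in bars.items()]

times = [0, 120, 540, 660, 700]
prices = [10.0, 12.0, 9.0, 11.0, 11.5]
amounts = [1.0, 2.0, 1.0, 3.0, 0.5]
for bar in ticks_to_bars(times, prices, amounts, 10):
    print(bar)   # (bar_start, open, high, low, close, volume)
```

Empty intervals simply produce no bar here, whereas some bar conventions emit a flat bar instead; pick whichever your downstream analysis expects.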

Download tick2bar().
