Bitcoin Price Prediction Using Weka
Hoang Pham Truc Phuong, hptphuong@gmail.com, is the author of this article and he contributes to RobustTechHouse Blog for our Machine Learning column. RobustTechHouse is a web & mobile app development house focusing on Financial (Fintech) and ECommerce sectors and likes to dabble with data analysis and machine learning too.
In this post, I will experiment with real bitcoin price data using Weka and try to forecast the trend of bitcoin price.
1. Data Preprocessing
1.1 Data overview
The data is the bitcoin price from “2015-03-12 22:16:48” to “2015-04-12 22:23:48”, sampled every minute. Some indicators used are SMA-5 and SMA-10. A sample of the data:
We will use data from “2015-03-12 22:16:48” to “2015-04-12 22:03:48” to train the model and try to predict price movements in the next 20 minutes.
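For readers unfamiliar with the indicators: an n-period simple moving average (SMA) is just the mean of the last n prices. The snippet below is only a small, self-contained illustration of that calculation (the class name and sample prices are made up for illustration); the actual SMA-5 and SMA-10 columns come precomputed in the dataset.

// Minimal illustration: an n-period simple moving average is the mean of the last n prices.
public class SmaExample {
    static double[] sma(double[] prices, int period) {
        double[] out = new double[prices.length];
        double sum = 0;
        for (int i = 0; i < prices.length; i++) {
            sum += prices[i];
            if (i >= period) {
                sum -= prices[i - period]; // drop the price that left the window
            }
            out[i] = sum / Math.min(i + 1, period); // average over the current window
        }
        return out;
    }

    public static void main(String[] args) {
        double[] prices = {295.47, 295.44, 295.33, 295.31, 295.31, 295.45};
        System.out.println(java.util.Arrays.toString(sma(prices, 5)));
    }
}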
1.2 Convert data to ARFF
The original data is kept in CSV format, and Weka allows us to import data in CSV format. However, the original data is provided in the following format:
amount,price,time,Trend,priceSMA5,priceSMA10
295.47,2.42125956,"2015-03-12 22:16:48",down,2.064252512,1.246701952
295.44,0.2,"2015-03-12 22:17:48",down,2.096251912,1.2658019520000001
295.33,0.09037846,"2015-03-12 22:18:48",up,1.090327604,1.245839798
295.31,0.88,"2015-03-12 22:19:48",up,0.8663276040000002,1.1542462480000002
295.31,3.207,"2015-03-12 22:20:48",down,1.3597276040000001,1.473946248
295.45,1.0,"2015-03-12 22:21:48",up,1.075475692,1.5698641020000002
295.43,2.387679,"2015-03-12 22:22:48",down,1.513011492,1.8046317020000002
295.42,0.2,"2015-03-12 22:23:48",up,1.5349358000000002,1.312631702
If we import it into Weka directly, we get the result shown below.
As you can see, the type of “time” is nominal, which means the data is treated as a plain string. In Weka’s ARFF format, the right type for a datetime field is “date”. Let’s help Weka recognize the right type for “time” by massaging the data a little.
Finally, we save the file with a new extension (.arff). This step helps Weka recognize the right type for the date-time field. We can now import the data again.
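If you prefer to script the conversion, Weka’s converter classes can do the CSV-to-ARFF step; the following is only a sketch, and the file paths are placeholders. The header edit for the “time” attribute is then done by hand, as described above.

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

// Sketch: convert the CSV export to ARFF with Weka's converters.
public class CsvToArff {
    public static void main(String[] args) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("data/bitcoin.csv")); // placeholder path
        Instances data = loader.getDataSet();           // read all rows into memory

        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("data/bitcoin.arff"));   // placeholder path
        saver.writeBatch();                             // write the ARFF file
        // Then edit the "time" line in the ARFF header by hand, e.g.:
        // @attribute time date "yyyy-MM-dd HH:mm:ss"
    }
}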
2. Select Attributes
To start forecasting, you need to do the “Feature selection” step to improve your accuracy, reduce overfitting and decrease your training time. For feature selection, Weka provides an “Attribute selection” tool. The process is separated into two parts:
- Attribute Evaluator: Method by which attribute subsets are assessed.
- Search Method: Method by which the space of possible subsets is searched.
In this exercise, I choose “CorrelationAttributeEval” as my evaluator, and here is the result:
From the result, I realized that the “amount” field does not carry much information for price-trend prediction, so I can remove it. In the Preprocess tab, we can select the first attribute (amount) and click Remove:
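The same ranking and removal can also be scripted. The sketch below pairs CorrelationAttributeEval with the Ranker search and then drops the first attribute; the file path is a placeholder and it assumes “trend” is set as the class attribute.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.CorrelationAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Sketch: rank attributes with CorrelationAttributeEval, then drop "amount".
public class SelectAttributesRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff");   // placeholder path
        data.setClassIndex(data.attribute("trend").index());     // class attribute = trend

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CorrelationAttributeEval());
        selector.setSearch(new Ranker());                        // rank all attributes
        selector.SelectAttributes(data);
        System.out.println(selector.toResultsString());

        Remove remove = new Remove();
        remove.setAttributeIndices("1");                         // "amount" is attribute 1
        remove.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, remove);
        System.out.println(reduced.numAttributes() + " attributes remain");
    }
}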
3. Build model to predict trend of Bitcoin price
After selecting the features we need, we begin to build the model. Here, I build several models to test. This is the list of models I will try:
- J48 model: Class for generating a pruned or unpruned C4.5 decision tree
- J48 consolidated model: Uses the Consolidated Tree Construction (CTC) algorithm.
- MultilayerPerceptron: a neural network trained with backpropagation to classify instances. It uses a sigmoid activation function at every node.
- SVM: support vector machine.
We use the classify feature of Weka. Choose the “Classify” tab and the classify panel will appear:
3.1 Setting test options:
Before you run the classification algorithm, you need to set test options in the ‘Test options’ box (a scripted equivalent using Weka’s Evaluation class is sketched after this list). The test options available to you are:
- Use training set: Evaluates the classifier on how well it predicts the class of the instances it was trained on.
- Supplied test set: Evaluates the classifier on how well it predicts the class of a set of instances loaded from a file. Clicking on the ‘Set…’ button brings up a dialog allowing you to choose the file to test on.
- Cross-validation: Evaluates the classifier by cross-validation, using the number of folds that are entered in the ‘Folds’ text field.
- Percentage split: Evaluates the classifier on how well it predicts a certain percentage of the data, which is held out for testing. The amount of data held out depends on the value entered in the ‘%’ field.
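For readers driving Weka from code rather than the Explorer GUI, here is a rough sketch of how the “use training set” and “cross-validation” options map onto Weka’s Evaluation class. The file path and the choice of J48 as a stand-in classifier are assumptions, not part of the original workflow.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: the GUI test options expressed with Weka's Evaluation class.
public class TestOptionsSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff"); // placeholder path
        data.setClassIndex(data.attribute("trend").index());   // class attribute = trend

        J48 tree = new J48();
        tree.buildClassifier(data);

        // "Use training set": evaluate on the same data the model was trained on.
        Evaluation onTrain = new Evaluation(data);
        onTrain.evaluateModel(tree, data);
        System.out.println(onTrain.toSummaryString("=== Training set ===", false));

        // "Cross-validation": 10 folds, as entered in the 'Folds' text field.
        Evaluation cv = new Evaluation(data);
        cv.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(cv.toSummaryString("=== 10-fold CV ===", false));
    }
}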
3.2 Choosing a classifier:
A classifier is the model which you will apply to your data.
First you need to choose the attribute you want to classify. Here I choose “trend” with the test option “use training set”.
Click on the ‘Choose’ button in the ‘Classifier’ box just below the tabs and select the model you need. Here are the models whose results I show below: J48, J48 consolidated, NBTree and MultilayerPerceptron.
3.3 Results for J48:
Choose J48 in the classifier box, then click on the white text box containing “J48”; the “GenericObjectEditor” dialog will appear:
Change “confidenceFactor” to 1, click OK and then click Start, and we get the results:
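The equivalent run can also be scripted through the Java API. The following is only a sketch (placeholder file path, “amount” assumed already removed), not the exact procedure used in the post.

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: J48 with confidenceFactor = 1.0, evaluated on the training set.
public class J48Run {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff"); // placeholder path
        data.setClassIndex(data.attribute("trend").index());   // class attribute = trend

        J48 tree = new J48();
        tree.setConfidenceFactor(1.0f); // -C 1.0, as set in the GenericObjectEditor
        tree.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(tree, data); // "use training set"
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString()); // confusion matrix
    }
}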
The classifier model is a pruned decision tree in textual form, produced on the full training data. The tree has size 983 with 492 leaves. It took about 26 seconds to build the model. Prediction accuracy on the training set is mediocre at just 58.5231%.
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 1.0 -M 2
Relation: bitCoindatatrain_withtrend-weka.filters.unsupervised.attribute.Remove-R1-2
Instances: 44608
Attributes: 5
    price
    time
    trend
    sma5
    sma10
Test mode: evaluate on training data

=== Classifier model (full training set) ===
Number of Leaves  : 492
Size of the tree : 983
Time taken to build model: 26.24 seconds

=== Evaluation on training set ===
Time taken to test model on training data: 0.52 seconds

=== Summary ===
Correctly Classified Instances       26106               58.5231 %
Incorrectly Classified Instances     18502               41.4769 %
Kappa statistic                          0.3543
Mean absolute error                      0.3549
Root mean squared error                  0.4213
Relative absolute error                 80.6928 %
Root relative squared error             89.8292 %
Coverage of cases (0.95 level)          99.8184 %
Mean rel. region size (0.95 level)      93.6088 %
Total Number of Instances            44608

=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.691    0.292    0.579      0.691   0.630      0.387  0.771     0.631     up
                 0.742    0.308    0.583      0.742   0.653      0.420  0.794     0.672     down
                 0.222    0.048    0.627      0.222   0.328      0.264  0.684     0.482     noChange
Weighted Avg.    0.585    0.233    0.593      0.585   0.558      0.366  0.756     0.607

=== Confusion Matrix ===
     a     b     c   <-- classified as
 11325  4147   929 |  a = up
  3575 12148   640 |  b = down
  4664  4547  2633 |  c = noChange
3.4 Results for J48 consolidated:
Repeat the same steps as above, changing the model from J48 to J48Consolidated, and we will get the result below.
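J48Consolidated ships as a Weka package, so when scripting it you typically load it by class name. The sketch below builds it from the option string shown in the run information further down; it assumes the package is installed, and the file path is a placeholder.

import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: build J48Consolidated (Weka package classifier) from its option string.
public class J48ConsolidatedRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff"); // placeholder path
        data.setClassIndex(data.attribute("trend").index());   // class attribute = trend

        // Option string copied from the run information below.
        String[] opts = Utils.splitOptions(
                "-C 1.0 -M 2 -Q 1 -RM-C -RM-N 99.0 -RM-B -2 -RM-D 50.0");
        Classifier cls = AbstractClassifier.forName(
                "weka.classifiers.trees.J48Consolidated", opts);
        cls.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(cls, data); // evaluate on the training set
        System.out.println(eval.toSummaryString());
    }
}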
=== Run information ===
Scheme: weka.classifiers.trees.J48Consolidated -C 1.0 -M 2 -Q 1 -RM-C -RM-N 99.0 -RM-B -2 -RM-D 50.0
Relation: bitCoindatatrain_withtrend-weka.filters.unsupervised.attribute.Remove-R1-2
Instances: 44608
Attributes: 5
    price
    time
    trend
    sma5
    sma10
Test mode: evaluate on training data

=== Classifier model (full training set) ===
J48Consolidated pruned tree
[RM] N_S=f(99% of coverage)=4 %Min=balanced Size=maxSize (without replacement)
True coverage achieved: 0.9956748784095607
Number of Leaves  : 2721
Size of the tree : 5441
Time taken to build model: 183.75 seconds

=== Evaluation on training set ===
Time taken to test model on training data: 0.51 seconds

=== Summary ===
Correctly Classified Instances       28541               63.9818 %
Incorrectly Classified Instances     16067               36.0182 %
Kappa statistic                          0.4495
Mean absolute error                      0.307
Root mean squared error                  0.3921
Relative absolute error                 69.8055 %
Root relative squared error             83.6128 %
Coverage of cases (0.95 level)          99.4822 %
Mean rel. region size (0.95 level)      82.6145 %
Total Number of Instances            44608

=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.687    0.225    0.640      0.687   0.663      0.456  0.831     0.741     up
                 0.713    0.208    0.665      0.713   0.688      0.498  0.849     0.766     down
                 0.473    0.117    0.593      0.473   0.527      0.385  0.793     0.631     noChange
Weighted Avg.    0.640    0.190    0.637      0.640   0.636      0.452  0.827     0.721

=== Confusion Matrix ===
     a     b     c   <-- classified as
 11269  3013  2119 |  a = up
  2975 11665  1723 |  b = down
  3373  2864  5607 |  c = noChange
We see it does better than J48, with higher accuracy at 63.9818%, a higher F-measure and smaller errors.
3.5 Results for NBTree:
Accuracy is lower in this case, at 54.8535%:
=== Run information ===
Scheme: weka.classifiers.trees.NBTree
Relation: bitCoindata_withtrend-weka.filters.unsupervised.attribute.Remove-R1
Instances: 44628
Attributes: 5
    price
    time
    trend
    sma5
    sma10
Test mode: evaluate on training data

=== Classifier model (full training set) ===
NBTree
------------------
: NB0
Leaf number: 0 Naive Bayes Classifier
Number of Leaves  : 1
Size of the tree : 1
Time taken to build model: 1.69 seconds

=== Evaluation on training set ===
Time taken to test model on training data: 0.41 seconds

=== Summary ===
Correctly Classified Instances       24480               54.8535 %
Incorrectly Classified Instances     20148               45.1465 %
Kappa statistic                          0.3004
Mean absolute error                      0.3583
Root mean squared error                  0.4351
Relative absolute error                 81.4648 %
Root relative squared error             92.7917 %
Coverage of cases (0.95 level)          98.4337 %
Mean rel. region size (0.95 level)      91.0609 %
Total Number of Instances            44628

=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.640    0.305    0.550      0.640   0.592      0.327  0.742     0.607     up
                 0.688    0.309    0.563      0.688   0.619      0.367  0.757     0.642     down
                 0.228    0.086    0.491      0.228   0.312      0.191  0.686     0.430     noChange
Weighted Avg.    0.549    0.248    0.539      0.549   0.528      0.306  0.733     0.573

=== Confusion Matrix ===
     a     b     c   <-- classified as
 10507  4251  1655 |  a = up
  3948 11268  1154 |  b = down
  4649  4491  2705 |  c = noChange
3.6 Results for MultilayerPerceptron
Weka allows us to build a neural network by hand through the GUI option, e.g. set GUI to true as below:
Click “OK”, then choose “Start” and you will see the shape of your “network”.
Here are some options which will affect the network’s accuracy:
- Number of epochs: an epoch is one pass in which all of the training vectors are each used once to update the weights.
- Learning rate: how much an updating step influences the current value of the weights.
- Momentum: a technique that helps the network escape local minima.
Set the number of epochs to 5000 and keep the default learning rate; the sketch below shows the equivalent settings in code.
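Roughly the same settings can be applied through the MultilayerPerceptron class in code. This is a sketch only, mirroring the GUI options described above; the file path is a placeholder.

import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: MultilayerPerceptron with 5000 epochs and default learning rate/momentum.
public class MlpRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff"); // placeholder path
        data.setClassIndex(data.attribute("trend").index());   // class attribute = trend

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setTrainingTime(5000); // number of epochs (-N)
        mlp.setLearningRate(0.3);  // default learning rate (-L)
        mlp.setMomentum(0.2);      // default momentum (-M)
        mlp.setGUI(true);          // opens the interactive network view, as in the post
        mlp.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(mlp, data); // evaluate on the training set
        System.out.println(eval.toSummaryString());
    }
}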
Here is the result for MLP:
=== Run information ===
Scheme: weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a -G -R
Relation: bitCoindatatrain_withtrend-weka.filters.unsupervised.attribute.Remove-R1-2
Instances: 44608
Attributes: 5
    price
    time
    trend
    sma5
    sma10
Test mode: evaluate on training data

=== Classifier model (full training set) ===

=== Evaluation on training set ===
Time taken to test model on training data: 0.27 seconds

=== Summary ===
Correctly Classified Instances       23746               53.2326 %
Incorrectly Classified Instances     20862               46.7674 %
Kappa statistic                          0.2612
Mean absolute error                      0.3884
Root mean squared error                  0.4413
Relative absolute error                 88.3102 %
Root relative squared error             94.1102 %
Coverage of cases (0.95 level)          99.9843 %
Mean rel. region size (0.95 level)      99.9641 %
Total Number of Instances            44608

=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.852    0.517    0.489      0.852   0.622      0.337  0.738     0.570     up
                 0.593    0.219    0.611      0.593   0.602      0.376  0.759     0.632     down
                 0.006    0.003    0.421      0.006   0.011      0.021  0.564     0.328     noChange
Weighted Avg.    0.532    0.271    0.516      0.532   0.452      0.267  0.700     0.529

=== Confusion Matrix ===
     a     b    c   <-- classified as
 13972  2390   39 |  a = up
  6603  9707   53 |  b = down
  7978  3799   67 |  c = noChange
We might reach better accuracy with more epochs, but the overall accuracy is still low at 53.2326%.