Bitcoin Price Prediction Using Weka

 


 

Hoang Pham Truc Phuong (hptphuong@gmail.com) is the author of this article and contributes to the Machine Learning column of the RobustTechHouse Blog. RobustTechHouse is a web & mobile app development house that focuses on the Financial (Fintech) and ECommerce sectors and likes to dabble in data analysis and machine learning too.

 

In this post, I will experiment with real bitcoin price data using Weka and try to forecast the trend of the bitcoin price.

 

1. Data Preprocessing

1.1 Data overview

The data is the bitcoin price, sampled every minute from “2015-03-12 22:16:48” to “2015-04-12 22:23:48”. Two indicators are included: SMA-5 and SMA-10, the simple moving averages over the last 5 and 10 samples. A sample of the data:

[Screenshot: sample of the data]
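As a rough illustration of how an indicator such as SMA-5 can be derived, here is a minimal Python sketch of a trailing simple moving average. The function name and the choice to return None until a full window is available are assumptions made for this example; the input values are the first six entries of the “price” column from the sample rows in section 1.2.

```python
from collections import deque


def simple_moving_average(values, window):
    """Trailing SMA for each position; None until `window` values have been seen."""
    averages = []
    recent = deque(maxlen=window)  # keeps only the last `window` values
    for value in values:
        recent.append(value)
        if len(recent) == window:
            averages.append(sum(recent) / window)
        else:
            averages.append(None)
    return averages


# First six values of the "price" column in the sample rows (section 1.2).
prices = [2.42125956, 0.2, 0.09037846, 0.88, 3.207, 1.0]
sma5 = simple_moving_average(prices, 5)
print(sma5[4], sma5[5])
```

The fifth and sixth results agree with the priceSMA5 values listed for 22:20:48 and 22:21:48 in the sample (about 1.3597 and 1.0755), which suggests the indicator in the file is a plain trailing mean.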

 

We will use the data from “2015-03-12 22:16:48” to “2015-04-12 22:03:48” to train the model and try to predict price movements over the next 20 minutes.
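One possible way to separate the training window from the final 20 minutes is to split the rows on the cutoff timestamp. The sketch below is only an illustration: the function name and the dummy rows are hypothetical, and only the cutoff value comes from the text above.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_by_time(rows, cutoff):
    """Split (timestamp_string, payload) rows into train (<= cutoff) and test (> cutoff)."""
    train, test = [], []
    for stamp, payload in rows:
        when = datetime.strptime(stamp, TIME_FORMAT)
        if when <= cutoff:
            train.append((stamp, payload))
        else:
            test.append((stamp, payload))
    return train, test


cutoff = datetime.strptime("2015-04-12 22:03:48", TIME_FORMAT)

# Hypothetical rows: one per minute, starting 5 minutes before the cutoff.
# The payload is a dummy index standing in for the real price fields.
start = cutoff - timedelta(minutes=5)
rows = [((start + timedelta(minutes=i)).strftime(TIME_FORMAT), i) for i in range(26)]

train, test = split_by_time(rows, cutoff)
print(len(train), len(test))
```

With one row per minute, the held-out part contains the 20 rows after the cutoff, ending at “2015-04-12 22:23:48”.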

1.2 Convert data to ARFF

The original data is kept in CSV format, which Weka can import. However, the original data is provided in the following format:

amount,price,time,Trend,priceSMA5,priceSMA10
295.47,2.42125956,"2015-03-12 22:16:48",down,2.064252512,1.246701952
295.44,0.2,"2015-03-12 22:17:48",down,2.096251912,1.2658019520000001
295.33,0.09037846,"2015-03-12 22:18:48",up,1.090327604,1.245839798
295.31,0.88,"2015-03-12 22:19:48",up,0.8663276040000002,1.1542462480000002
295.31,3.207,"2015-03-12 22:20:48",down,1.3597276040000001,1.473946248
295.45,1.0,"2015-03-12 22:21:48",up,1.075475692,1.5698641020000002
295.43,2.387679,"2015-03-12 22:22:48",down,1.513011492,1.8046317020000002
295.42,0.2,"2015-03-12 22:23:48",up,1.5349358000000002,1.312631702

If we import it into Weka directly, we get the result shown below.

[Screenshot: attribute list after importing the CSV file directly]

 

As you can see, the type of “time” is nominal, which means the value is treated as a plain string. In Weka, the right type for date-time values is the ARFF “date” attribute type. Let's help Weka recognize the right type for “time” by massaging the data a little.

[Screenshot: the edited data file]

 

Finally, we save the file with a new extension (.arff). This step helps Weka recognize the date-time type. We can now import the data again.

[Screenshot: attribute list after importing the ARFF file]

 

2. Select Attributes

Before forecasting, it helps to perform a feature selection step, which can improve accuracy, reduce overfitting and decrease training time. For feature selection, Weka provides an “Attribute selection” tool. The process is separated into two parts:

  • Attribute Evaluator: Method by which attribute subsets are assessed.
  • Search Method: Method by which the space of possible subsets is searched.

In this exercise, I chose “CorrelationAttributeEval” as my evaluator. Here is the result:

[Screenshot: CorrelationAttributeEval ranking of the attributes]

 

From the result, I concluded that the “amount” field does not carry much information for price trend prediction, so it can be removed. In the Preprocess tab, select the first attribute (amount) and click Remove:

[Screenshot: removing the “amount” attribute in the Preprocess tab]
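To give an idea of what a correlation-based evaluator measures, here is a small Python sketch that scores a numeric attribute by its Pearson correlation with one-vs-rest indicators of the class, weighted by class frequency. This is a simplified illustration under my own assumptions, not Weka's exact implementation, and the toy values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def class_correlation(values, labels):
    """Frequency-weighted average of |r| between the attribute and each class indicator."""
    n = len(labels)
    score = 0.0
    for cls in sorted(set(labels)):
        indicator = [1.0 if label == cls else 0.0 for label in labels]
        weight = sum(indicator) / n
        score += weight * abs(pearson(values, indicator))
    return score


# Hypothetical toy data: one attribute that separates the classes, one that is constant.
labels = ["down", "down", "down", "up", "up", "up"]
informative = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
constant = [5.0] * 6

print(class_correlation(informative, labels))
print(class_correlation(constant, labels))
```

An attribute with a score close to zero, like the constant one here, is a candidate for removal.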

 

3. Build model to predict trend of Bitcoin price

After selecting the features we need, we begin to build the model. Here, I build several models to compare. This is the list of models I will try:

  • J48 model: class for generating a pruned or unpruned C4.5 decision tree.
  • J48 consolidated model: uses the Consolidated Tree Construction (CTC) algorithm, which builds a single tree from a set of subsamples.
  • MultilayerPerceptron: a neural network that uses backpropagation to classify instances. It uses a sigmoid activation at every node.
  • SVM: support vector machine.

We use the classification feature of Weka. Choose the “Classify” tab and the classify window will appear:

[Screenshot: the Classify tab]

 

3.1 Setting test option:

Before you run the classification algorithm, you need to set test options. Set test options in the ‘Test options’ box. The test options that are available to you are:

  • Use training set: Evaluates the classifier on how well it predicts the class of the instances it was trained on.
  • Supplied test set: Evaluates the classifier on how well it predicts the class of a set of instances loaded from a file. Clicking on the ‘Set…’ button brings up a dialog allowing you to choose the file to test on.
  • Cross-validation: Evaluates the classifier by cross-validation, using the number of folds that are entered in the ‘Folds’ text field.
  • Percentage split: Evaluates the classifier on how well it predicts a certain percentage of the data, which is held out for testing. The amount of data held out depends on the value entered in the ‘%’ field.
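The difference between the last two options can be sketched with plain index arithmetic. The Python below is a simplified illustration, assuming that the ‘%’ value is the share of instances used for training (the rest being held out) and ignoring the shuffling and stratification a real tool may apply; the function names are made up for this example.

```python
def percentage_split(n_instances, train_percent):
    """Return (train_indices, test_indices) with the first train_percent % for training."""
    n_train = round(n_instances * train_percent / 100)
    indices = list(range(n_instances))
    return indices[:n_train], indices[n_train:]


def cross_validation_folds(n_instances, n_folds):
    """Return one (train_indices, test_indices) pair per fold; each instance is tested once."""
    folds = []
    for k in range(n_folds):
        test = list(range(k, n_instances, n_folds))  # every n_folds-th instance
        held_out = set(test)
        train = [i for i in range(n_instances) if i not in held_out]
        folds.append((train, test))
    return folds


split_train, split_test = percentage_split(10, 66)
print(len(split_train), len(split_test))

folds = cross_validation_folds(10, 5)
for fold_train, fold_test in folds:
    print(fold_test)
```

In cross-validation every instance ends up in exactly one test fold, so the reported figure is an average over models that never saw the instances they are scored on, unlike the “Use training set” option.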

 

3.2 Choosing a classifier:

A classifier is the model which you will apply to your data.

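First you need to choose the attribute that you want to classify. Here I chose “trend”, with the test option “Use training set”. Note that evaluating on the training set tends to overstate accuracy on unseen data.

[Screenshot: class attribute and test option selection]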

 

Click on the ‘Choose’ button in the ‘Classifier’ box just below the tabs and select the model you need. The results below cover J48, J48 consolidated, NBTree and MultilayerPerceptron.

 

 

3.3 Results for J48:

Choose J48 in the classifier box and click on the white box containing “J48”; the “GenericObjectEditor” dialog will appear.

Change “confidenceFactor” to 1, click OK, then click Start to get the results:

[Screenshot: J48 results]

The classifier model is a pruned decision tree in textual form that was produced on the full training data. The tree has size 1045 with 523 leaves. It took about 12 seconds to build the model. Prediction accuracy is mediocre at just 58.5933%. Note that the run log reproduced below reports slightly different figures: size 983 with 492 leaves, 26.24 seconds and 58.5231%.

 

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 1.0 -M 2
Relation:     bitCoindatatrain_withtrend-weka.filters.unsupervised.attribute.Remove-R1-2
Instances:    44608
Attributes:   5
              price
              time
              trend
              sma5
              sma10
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

Number of Leaves  :     492
Size of the tree :      983

Time taken to build model: 26.24 seconds

=== Evaluation on training set ===

Time taken to test model on training data: 0.52 seconds

=== Summary ===

Correctly Classified Instances       26106               58.5231 %
Incorrectly Classified Instances     18502               41.4769 %
Kappa statistic                          0.3543
Mean absolute error                      0.3549
Root mean squared error                  0.4213
Relative absolute error                 80.6928 %
Root relative squared error             89.8292 %
Coverage of cases (0.95 level)          99.8184 %
Mean rel. region size (0.95 level)      93.6088 %
Total Number of Instances            44608

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.691    0.292    0.579      0.691    0.630      0.387    0.771     0.631     up
                 0.742    0.308    0.583      0.742    0.653      0.420    0.794     0.672     down
                 0.222    0.048    0.627      0.222    0.328      0.264    0.684     0.482     noChange
Weighted Avg.    0.585    0.233    0.593      0.585    0.558      0.366    0.756     0.607

=== Confusion Matrix ===

     a     b     c   <-- classified as
 11325  4147   929 |     a = up
  3575 12148   640 |     b = down
  4664  4547  2633 |     c = noChange
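The summary figures can be recomputed from the confusion matrix alone. The following Python sketch uses the standard definitions of accuracy, Cohen's kappa, precision and recall; the function name is my own, and the matrix values are copied from the J48 output above (rows are actual classes, columns are predicted classes).

```python
def summarize(matrix, labels):
    """Return accuracy, Cohen's kappa and per-class precision/recall for a confusion matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    accuracy = correct / total

    row_sums = [sum(row) for row in matrix]  # actual instances per class
    col_sums = [sum(row[j] for row in matrix) for j in range(size)]  # predictions per class

    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    per_class = {
        label: {
            "precision": matrix[i][i] / col_sums[i],
            "recall": matrix[i][i] / row_sums[i],
        }
        for i, label in enumerate(labels)
    }
    return accuracy, kappa, per_class


# Confusion matrix from the J48 run above.
j48_matrix = [
    [11325, 4147, 929],
    [3575, 12148, 640],
    [4664, 4547, 2633],
]
accuracy, kappa, per_class = summarize(j48_matrix, ["up", "down", "noChange"])
print(f"accuracy = {accuracy:.4%}, kappa = {kappa:.4f}")
print(per_class["noChange"])
```

The recomputed accuracy and kappa match the summary (58.5231 % and 0.3543), and the low recall for noChange (about 0.222) shows that most noChange instances are classified as up or down.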

 

 

3.4 Results for J48 consolidated:

Repeat the steps above, replacing the J48 model with J48 consolidated, to get the result below:

=== Run information ===

Scheme:       weka.classifiers.trees.J48Consolidated -C 1.0 -M 2 -Q 1 -RM-C -RM-N 99.0 -RM-B -2 -RM-D 50.0
Relation:     bitCoindatatrain_withtrend-weka.filters.unsupervised.attribute.Remove-R1-2
Instances:    44608
Attributes:   5
              price
              time
              trend
              sma5
              sma10
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

J48Consolidated pruned tree

[RM] N_S=f(99% of coverage)=4 %Min=balanced Size=maxSize (without replacement)
True coverage achieved: 0.9956748784095607

Number of Leaves  :     2721
Size of the tree :      5441

Time taken to build model: 183.75 seconds

=== Evaluation on training set ===

Time taken to test model on training data: 0.51 seconds

=== Summary ===

Correctly Classified Instances       28541               63.9818 %
Incorrectly Classified Instances     16067               36.0182 %
Kappa statistic                          0.4495
Mean absolute error                      0.307
Root mean squared error                  0.3921
Relative absolute error                 69.8055 %
Root relative squared error             83.6128 %
Coverage of cases (0.95 level)          99.4822 %
Mean rel. region size (0.95 level)      82.6145 %
Total Number of Instances            44608

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.687    0.225    0.640      0.687    0.663      0.456    0.831     0.741     up
                 0.713    0.208    0.665      0.713    0.688      0.498    0.849     0.766     down
                 0.473    0.117    0.593      0.473    0.527      0.385    0.793     0.631     noChange
Weighted Avg.    0.640    0.190    0.637      0.640    0.636      0.452    0.827     0.721

=== Confusion Matrix ===

     a     b     c   <-- classified as
 11269  3013  2119 |     a = up
  2975 11665  1723 |     b = down
  3373  2864  5607 |     c = noChange

 

 

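This looks better than J48: accuracy is higher at 63.9818%, the F-measure is higher and the errors are smaller. The cost is a much larger tree (size 5441 versus 983) and a longer build time (183.75 versus 26.24 seconds).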

 

3.5 Results for NBTree:

Accuracy is lower in this case, at 54.8535%. Note that this run used 44628 instances rather than 44608.

=== Run information ===

Scheme:       weka.classifiers.trees.NBTree
Relation:     bitCoindata_withtrend-weka.filters.unsupervised.attribute.Remove-R1
Instances:    44628
Attributes:   5
              price
              time
              trend
              sma5
              sma10
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

NBTree
------------------
: NB0

Leaf number: 0 Naive Bayes Classifier

Number of Leaves  :     1
Size of the tree :      1

Time taken to build model: 1.69 seconds

=== Evaluation on training set ===

Time taken to test model on training data: 0.41 seconds

=== Summary ===

Correctly Classified Instances       24480               54.8535 %
Incorrectly Classified Instances     20148               45.1465 %
Kappa statistic                          0.3004
Mean absolute error                      0.3583
Root mean squared error                  0.4351
Relative absolute error                 81.4648 %
Root relative squared error             92.7917 %
Coverage of cases (0.95 level)          98.4337 %
Mean rel. region size (0.95 level)      91.0609 %
Total Number of Instances            44628

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.640    0.305    0.550      0.640    0.592      0.327    0.742     0.607     up
                 0.688    0.309    0.563      0.688    0.619      0.367    0.757     0.642     down
                 0.228    0.086    0.491      0.228    0.312      0.191    0.686     0.430     noChange
Weighted Avg.    0.549    0.248    0.539      0.549    0.528      0.306    0.733     0.573

=== Confusion Matrix ===

     a     b     c   <-- classified as
 10507  4251  1655 |     a = up
  3948 11268  1154 |     b = down
  4649  4491  2705 |     c = noChange

 

3.6 Results for MultilayerPerceptron

Weka allows us to build a neural network by hand through the GUI option: set “GUI” to true, as shown below:

[Screenshot: MultilayerPerceptron options with GUI set to true]

Click “OK”, then choose “Start” and you will see the shape of your network:

[Screenshot: the neural network GUI]

 

Here are some options that affect the network's accuracy:

  • Number of epochs: an epoch is one pass in which every training vector is used once to update the weights.
  • Learning rate: how much each update step changes the current value of the weights.
  • Momentum: a technique that helps the network escape local minima.
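How these three options interact can be sketched with a one-weight toy problem. The Python below is a generic gradient descent loop with momentum, not Weka's implementation; the function name and the toy objective are assumptions for illustration, and the defaults of 0.3 and 0.2 mirror the -L and -M values that appear in the run log in this section.

```python
def train(gradient, weight, learning_rate=0.3, momentum=0.2, epochs=100):
    """Gradient descent with momentum on a single weight."""
    previous_step = 0.0
    for _ in range(epochs):
        # The new step follows the negative gradient, scaled by the learning rate,
        # plus a fraction (momentum) of the previous step.
        step = -learning_rate * gradient(weight) + momentum * previous_step
        weight += step
        previous_step = step
    return weight


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
final_weight = train(lambda w: 2 * (w - 3), weight=0.0)
print(final_weight)
```

More epochs give the loop more updates, a larger learning rate makes each update bigger, and the momentum term carries part of the previous update forward, which can help the weights move through flat regions or shallow local minima.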

 

 

 

Set the number of epochs to 5000 and keep the default learning rate. (Note that the run log below shows -N 500, i.e. 500 epochs.)

Here is the result for MLP:

 

 

=== Run information ===

Scheme:       weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a -G -R
Relation:     bitCoindatatrain_withtrend-weka.filters.unsupervised.attribute.Remove-R1-2
Instances:    44608
Attributes:   5
              price
              time
              trend
              sma5
              sma10
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

=== Evaluation on training set ===

Time taken to test model on training data: 0.27 seconds

=== Summary ===

Correctly Classified Instances       23746               53.2326 %
Incorrectly Classified Instances     20862               46.7674 %
Kappa statistic                          0.2612
Mean absolute error                      0.3884
Root mean squared error                  0.4413
Relative absolute error                 88.3102 %
Root relative squared error             94.1102 %
Coverage of cases (0.95 level)          99.9843 %
Mean rel. region size (0.95 level)      99.9641 %
Total Number of Instances            44608

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.852    0.517    0.489      0.852    0.622      0.337    0.738     0.570     up
                 0.593    0.219    0.611      0.593    0.602      0.376    0.759     0.632     down
                 0.006    0.003    0.421      0.006    0.011      0.021    0.564     0.328     noChange
Weighted Avg.    0.532    0.271    0.516      0.532    0.452      0.267    0.700     0.529

=== Confusion Matrix ===

     a     b     c   <-- classified as
 13972  2390    39 |     a = up
  6603  9707    53 |     b = down
  7978  3799    67 |     c = noChange

Accuracy may improve with more epochs, but overall accuracy is still low at 53.2326%, and the network almost never predicts noChange (TP rate 0.006).

 

 

 
