Bitcoin Price Prediction Using Weka
Hoang Pham Truc Phuong, hptphuong@gmail.com, is the author of this article and he contributes to RobustTechHouse Blog for our Machine Learning column. RobustTechHouse is a web & mobile app development house focusing on Financial (Fintech) and ECommerce sectors and likes to dabble with data analysis and machine learning too.
In this post, I will experiment with real bitcoin price data using Weka and try to forecast the trend of bitcoin price.
1. Data Preprocessing
1.1 Data overview
The data is the bitcoin price from “2015-03-12 22:16:48” to “2015-04-12 22:23:48”, sampled every minute. Some indicators used are SMA-5 and SMA-10. A sample of the data:
We will use data from “2015-03-12 22:16:48” to “2015-04-12 22:03:48” to train the model and try to predict price movements in the next 20 minutes.
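For readers unfamiliar with the indicators: an n-period simple moving average (SMA) is just the mean of the last n prices. The snippet below is only a small, self-contained illustration of that calculation (the class name and sample prices are made up for illustration); the actual SMA-5 and SMA-10 columns come precomputed in the dataset.

// Minimal illustration: an n-period simple moving average is the mean of the last n prices.
public class SmaExample {
    static double[] sma(double[] prices, int period) {
        double[] out = new double[prices.length];
        double sum = 0;
        for (int i = 0; i < prices.length; i++) {
            sum += prices[i];
            if (i >= period) {
                sum -= prices[i - period]; // drop the price that left the window
            }
            out[i] = sum / Math.min(i + 1, period); // average over the current window
        }
        return out;
    }

    public static void main(String[] args) {
        double[] prices = {295.47, 295.44, 295.33, 295.31, 295.31, 295.45};
        System.out.println(java.util.Arrays.toString(sma(prices, 5)));
    }
}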
1.2 Convert data to ARFF
The original data is kept in CSV format, and Weka allows us to import data in CSV format. However, the original data is provided in the following format:
amount,price,time,Trend,priceSMA5,priceSMA10
295.47,2.42125956,"2015-03-12 22:16:48",down,2.064252512,1.246701952
295.44,0.2,"2015-03-12 22:17:48",down,2.096251912,1.2658019520000001
295.33,0.09037846,"2015-03-12 22:18:48",up,1.090327604,1.245839798
295.31,0.88,"2015-03-12 22:19:48",up,0.8663276040000002,1.1542462480000002
295.31,3.207,"2015-03-12 22:20:48",down,1.3597276040000001,1.473946248
295.45,1.0,"2015-03-12 22:21:48",up,1.075475692,1.5698641020000002
295.43,2.387679,"2015-03-12 22:22:48",down,1.513011492,1.8046317020000002
295.42,0.2,"2015-03-12 22:23:48",up,1.5349358000000002,1.312631702
If we import it into Weka directly, we get the result shown below.
As you can see, the type of “time” is nominal, which means the data is treated as a plain string. In Weka’s ARFF format, the right type for a datetime field is “date”. Let’s help Weka recognize the right type for “time” by massaging the data a little.
Finally, we save the file with a new extension (.arff). This step helps Weka recognize the right type for the date-time field. We can now import the data again.
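If you prefer to script the conversion, Weka’s converter classes can do the CSV-to-ARFF step; the following is only a sketch, and the file paths are placeholders. The header edit for the “time” attribute is then done by hand, as described above.

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

// Sketch: convert the CSV export to ARFF with Weka's converters.
public class CsvToArff {
    public static void main(String[] args) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("data/bitcoin.csv")); // placeholder path
        Instances data = loader.getDataSet();           // read all rows into memory

        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("data/bitcoin.arff"));   // placeholder path
        saver.writeBatch();                             // write the ARFF file
        // Then edit the "time" line in the ARFF header by hand, e.g.:
        // @attribute time date "yyyy-MM-dd HH:mm:ss"
    }
}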
2. Select Attributes
To start forecasting, you need to do the “Feature selection” step to improve your accuracy, reduce overfitting and decrease your training time. For feature selection, Weka provides an “Attribute selection” tool. The process is separated into two parts:
- Attribute Evaluator: Method by which attribute subsets are assessed.
- Search Method: Method by which the space of possible subsets is searched.
In this exercise, I choose “CorrelationAttributeEval” as my evaluator, and here is the result:
From the result, I realized that the “amount” field does not carry much information for price-trend prediction, so I can remove it. In the Preprocess tab, we can select the first attribute (amount) and click Remove:
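The same ranking and removal can also be scripted. The sketch below pairs CorrelationAttributeEval with the Ranker search and then drops the first attribute; the file path is a placeholder and it assumes “trend” is set as the class attribute.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.CorrelationAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Sketch: rank attributes with CorrelationAttributeEval, then drop "amount".
public class SelectAttributesRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff");   // placeholder path
        data.setClassIndex(data.attribute("trend").index());     // class attribute = trend

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CorrelationAttributeEval());
        selector.setSearch(new Ranker());                        // rank all attributes
        selector.SelectAttributes(data);
        System.out.println(selector.toResultsString());

        Remove remove = new Remove();
        remove.setAttributeIndices("1");                         // "amount" is attribute 1
        remove.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, remove);
        System.out.println(reduced.numAttributes() + " attributes remain");
    }
}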
3. Build model to predict trend of Bitcoin price
After selecting the features we need, we begin to build the model. Here, I build several models to test. This is the list of models I will try:
- J48 model: Class for generating a pruned or unpruned C4.5 decision tree
- J48 consolidated model: Uses the Consolidated Tree Construction (CTC) algorithm.
- MultilayerPerceptron: a neural network trained with backpropagation to classify instances. It uses a sigmoid activation function at every node.
- SVM: support vector machine.
We use the classify feature of Weka. Choose the “Classify” tab and the classify panel will appear:
3.1 Setting test options:
Before you run the classification algorithm, you need to set test options in the ‘Test options’ box (a scripted equivalent using Weka’s Evaluation class is sketched after this list). The test options available to you are:
- Use training set: Evaluates the classifier on how well it predicts the class of the instances it was trained on.
- Supplied test set: Evaluates the classifier on how well it predicts the class of a set of instances loaded from a file. Clicking on the ‘Set…’ button brings up a dialog allowing you to choose the file to test on.
- Cross-validation: Evaluates the classifier by cross-validation, using the number of folds that are entered in the ‘Folds’ text field.
- Percentage split: Evaluates the classifier on how well it predicts a certain percentage of the data, which is held out for testing. The amount of data held out depends on the value entered in the ‘%’ field.
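For readers driving Weka from code rather than the Explorer GUI, here is a rough sketch of how the “use training set” and “cross-validation” options map onto Weka’s Evaluation class. The file path and the choice of J48 as a stand-in classifier are assumptions, not part of the original workflow.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: the GUI test options expressed with Weka's Evaluation class.
public class TestOptionsSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff"); // placeholder path
        data.setClassIndex(data.attribute("trend").index());   // class attribute = trend

        J48 tree = new J48();
        tree.buildClassifier(data);

        // "Use training set": evaluate on the same data the model was trained on.
        Evaluation onTrain = new Evaluation(data);
        onTrain.evaluateModel(tree, data);
        System.out.println(onTrain.toSummaryString("=== Training set ===", false));

        // "Cross-validation": 10 folds, as entered in the 'Folds' text field.
        Evaluation cv = new Evaluation(data);
        cv.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(cv.toSummaryString("=== 10-fold CV ===", false));
    }
}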
3.2 Choosing a classifier:
A classifier is the model which you will apply to your data.
First you need to choose the attribute you want to classify. Here I choose “trend” with the test option “use training set”.
Click on the ‘Choose’ button in the ‘Classifier’ box just below the tabs and select the model you need. Here are the models whose results I show below: J48, J48 consolidated, NBTree and MultilayerPerceptron.
3.3 Results for J48:
Choose J48 in the classifier box, then click on the white text box containing “J48”; the “GenericObjectEditor” dialog will appear:
Change “confidenceFactor” to 1, click OK and then click Start, and we get the results:
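The equivalent run can also be scripted through the Java API. The following is only a sketch (placeholder file path, “amount” assumed already removed), not the exact procedure used in the post.

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: J48 with confidenceFactor = 1.0, evaluated on the training set.
public class J48Run {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff"); // placeholder path
        data.setClassIndex(data.attribute("trend").index());   // class attribute = trend

        J48 tree = new J48();
        tree.setConfidenceFactor(1.0f); // -C 1.0, as set in the GenericObjectEditor
        tree.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(tree, data); // "use training set"
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString()); // confusion matrix
    }
}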
The classifier model is a pruned decision tree in textual form, produced on the full training data. The tree has size 983 with 492 leaves. It took about 26 seconds to build the model. Prediction accuracy on the training set is mediocre at just 58.5231%.
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 1.0 -M 2
Relation: bitCoindatatrain_withtrend-weka.filters.unsupervised.attribute.Remove-R1-2
Instances: 44608
Attributes: 5
    price
    time
    trend
    sma5
    sma10
Test mode: evaluate on training data

=== Classifier model (full training set) ===
Number of Leaves  : 492
Size of the tree : 983
Time taken to build model: 26.24 seconds

=== Evaluation on training set ===
Time taken to test model on training data: 0.52 seconds

=== Summary ===
Correctly Classified Instances       26106               58.5231 %
Incorrectly Classified Instances     18502               41.4769 %
Kappa statistic                          0.3543
Mean absolute error                      0.3549
Root mean squared error                  0.4213
Relative absolute error                 80.6928 %
Root relative squared error             89.8292 %
Coverage of cases (0.95 level)          99.8184 %
Mean rel. region size (0.95 level)      93.6088 %
Total Number of Instances            44608

=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.691    0.292    0.579      0.691   0.630      0.387  0.771     0.631     up
                 0.742    0.308    0.583      0.742   0.653      0.420  0.794     0.672     down
                 0.222    0.048    0.627      0.222   0.328      0.264  0.684     0.482     noChange
Weighted Avg.    0.585    0.233    0.593      0.585   0.558      0.366  0.756     0.607

=== Confusion Matrix ===
     a     b     c   <-- classified as
 11325  4147   929 |  a = up
  3575 12148   640 |  b = down
  4664  4547  2633 |  c = noChange
3.4 Results for J48 consolidated:
Repeat the same steps as above, changing the model from J48 to J48Consolidated, and we will get the result below.
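J48Consolidated ships as a Weka package, so when scripting it you typically load it by class name. The sketch below builds it from the option string shown in the run information further down; it assumes the package is installed, and the file path is a placeholder.

import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: build J48Consolidated (Weka package classifier) from its option string.
public class J48ConsolidatedRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff"); // placeholder path
        data.setClassIndex(data.attribute("trend").index());   // class attribute = trend

        // Option string copied from the run information below.
        String[] opts = Utils.splitOptions(
                "-C 1.0 -M 2 -Q 1 -RM-C -RM-N 99.0 -RM-B -2 -RM-D 50.0");
        Classifier cls = AbstractClassifier.forName(
                "weka.classifiers.trees.J48Consolidated", opts);
        cls.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(cls, data); // evaluate on the training set
        System.out.println(eval.toSummaryString());
    }
}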
=== Run information ===
Scheme: weka.classifiers.trees.J48Consolidated -C 1.0 -M 2 -Q 1 -RM-C -RM-N 99.0 -RM-B -2 -RM-D 50.0
Relation: bitCoindatatrain_withtrend-weka.filters.unsupervised.attribute.Remove-R1-2
Instances: 44608
Attributes: 5
    price
    time
    trend
    sma5
    sma10
Test mode: evaluate on training data

=== Classifier model (full training set) ===
J48Consolidated pruned tree
[RM] N_S=f(99% of coverage)=4 %Min=balanced Size=maxSize (without replacement)
True coverage achieved: 0.9956748784095607
Number of Leaves  : 2721
Size of the tree : 5441
Time taken to build model: 183.75 seconds

=== Evaluation on training set ===
Time taken to test model on training data: 0.51 seconds

=== Summary ===
Correctly Classified Instances       28541               63.9818 %
Incorrectly Classified Instances     16067               36.0182 %
Kappa statistic                          0.4495
Mean absolute error                      0.307
Root mean squared error                  0.3921
Relative absolute error                 69.8055 %
Root relative squared error             83.6128 %
Coverage of cases (0.95 level)          99.4822 %
Mean rel. region size (0.95 level)      82.6145 %
Total Number of Instances            44608

=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.687    0.225    0.640      0.687   0.663      0.456  0.831     0.741     up
                 0.713    0.208    0.665      0.713   0.688      0.498  0.849     0.766     down
                 0.473    0.117    0.593      0.473   0.527      0.385  0.793     0.631     noChange
Weighted Avg.    0.640    0.190    0.637      0.640   0.636      0.452  0.827     0.721

=== Confusion Matrix ===
     a     b     c   <-- classified as
 11269  3013  2119 |  a = up
  2975 11665  1723 |  b = down
  3373  2864  5607 |  c = noChange
We see it does better than J48, with higher accuracy at 63.9818%, a higher F-measure and smaller errors.
3.5 Results for NBTree:
Accuracy is lower in this case, at 54.8535%:
=== Run information ===
Scheme: weka.classifiers.trees.NBTree
Relation: bitCoindata_withtrend-weka.filters.unsupervised.attribute.Remove-R1
Instances: 44628
Attributes: 5
    price
    time
    trend
    sma5
    sma10
Test mode: evaluate on training data

=== Classifier model (full training set) ===
NBTree
------------------
: NB0
Leaf number: 0 Naive Bayes Classifier
Number of Leaves  : 1
Size of the tree : 1
Time taken to build model: 1.69 seconds

=== Evaluation on training set ===
Time taken to test model on training data: 0.41 seconds

=== Summary ===
Correctly Classified Instances       24480               54.8535 %
Incorrectly Classified Instances     20148               45.1465 %
Kappa statistic                          0.3004
Mean absolute error                      0.3583
Root mean squared error                  0.4351
Relative absolute error                 81.4648 %
Root relative squared error             92.7917 %
Coverage of cases (0.95 level)          98.4337 %
Mean rel. region size (0.95 level)      91.0609 %
Total Number of Instances            44628

=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.640    0.305    0.550      0.640   0.592      0.327  0.742     0.607     up
                 0.688    0.309    0.563      0.688   0.619      0.367  0.757     0.642     down
                 0.228    0.086    0.491      0.228   0.312      0.191  0.686     0.430     noChange
Weighted Avg.    0.549    0.248    0.539      0.549   0.528      0.306  0.733     0.573

=== Confusion Matrix ===
     a     b     c   <-- classified as
 10507  4251  1655 |  a = up
  3948 11268  1154 |  b = down
  4649  4491  2705 |  c = noChange
3.6 Results for MultilayerPerceptron
Weka allows us to build a neural network by hand through the GUI option, e.g. set GUI to true as below:
Click “OK”, then choose “Start” and you will see the shape of your “network”.
Here are some options which will affect the network’s accuracy:
- Number of epochs: an epoch is one pass in which all of the training vectors are each used once to update the weights.
- Learning rate: how much an updating step influences the current value of the weights.
- Momentum: a technique that helps the network escape local minima.
Set the number of epochs to 5000 and keep the default learning rate; the sketch below shows the equivalent settings in code.
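Roughly the same settings can be applied through the MultilayerPerceptron class in code. This is a sketch only, mirroring the GUI options described above; the file path is a placeholder.

import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: MultilayerPerceptron with 5000 epochs and default learning rate/momentum.
public class MlpRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/bitcoin.arff"); // placeholder path
        data.setClassIndex(data.attribute("trend").index());   // class attribute = trend

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setTrainingTime(5000); // number of epochs (-N)
        mlp.setLearningRate(0.3);  // default learning rate (-L)
        mlp.setMomentum(0.2);      // default momentum (-M)
        mlp.setGUI(true);          // opens the interactive network view, as in the post
        mlp.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(mlp, data); // evaluate on the training set
        System.out.println(eval.toSummaryString());
    }
}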
Here is the result for MLP:
=== Run information ===
Scheme: weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a -G -R
Relation: bitCoindatatrain_withtrend-weka.filters.unsupervised.attribute.Remove-R1-2
Instances: 44608
Attributes: 5
    price
    time
    trend
    sma5
    sma10
Test mode: evaluate on training data

=== Classifier model (full training set) ===

=== Evaluation on training set ===
Time taken to test model on training data: 0.27 seconds

=== Summary ===
Correctly Classified Instances       23746               53.2326 %
Incorrectly Classified Instances     20862               46.7674 %
Kappa statistic                          0.2612
Mean absolute error                      0.3884
Root mean squared error                  0.4413
Relative absolute error                 88.3102 %
Root relative squared error             94.1102 %
Coverage of cases (0.95 level)          99.9843 %
Mean rel. region size (0.95 level)      99.9641 %
Total Number of Instances            44608

=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.852    0.517    0.489      0.852   0.622      0.337  0.738     0.570     up
                 0.593    0.219    0.611      0.593   0.602      0.376  0.759     0.632     down
                 0.006    0.003    0.421      0.006   0.011      0.021  0.564     0.328     noChange
Weighted Avg.    0.532    0.271    0.516      0.532   0.452      0.267  0.700     0.529

=== Confusion Matrix ===
     a     b    c   <-- classified as
 13972  2390   39 |  a = up
  6603  9707   53 |  b = down
  7978  3799   67 |  c = noChange
We might reach better accuracy with more epochs, but the overall accuracy is still low at 53.2326%.