Intro Primer For WEKA Machine Learning Software


 

Weka is a machine learning and data mining workbench. Its name is an acronym for the Waikato Environment for Knowledge Analysis. It contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces that make it convenient to experiment with machine learning and data mining models on your own data.

Hoang Pham Truc Phuong (hptphuong@gmail.com) is the author of this article and contributes to the RobustTechHouse Blog's Machine Learning column. RobustTechHouse is a web & mobile app development house focusing on the Financial (Fintech) and ECommerce sectors, and it likes to dabble in data analysis and machine learning too.

[Updated on 25 May 2015] Also see our follow up post on Intro Primer To WEKA Explorer For Machine Learning

Why Weka?

Weka supports several standard data mining tasks and provides many standard algorithms for them, from simple baselines to sophisticated methods. All of Weka’s techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes. Here are some main features of Weka:

 

Data Preprocessing

Weka supports various file formats, e.g. CSV and Matlab, as well as its own file format (ARFF). It also supports most common database management systems (DBMS), including HSQL, SQL Server, MySQL and PostgreSQL, through Java database connections (JDBC). For data preprocessing, Weka has over 75 filtering methods, ranging from basic operators to advanced ones such as principal component analysis.
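
As a rough illustration of the ARFF format mentioned above, the shell sketch below writes a tiny, made-up dataset to a file and counts its attributes and rows. The file name weather.arff and all values are assumptions chosen for the example; only the @relation, @attribute and @data keywords come from the format itself.

```shell
#!/bin/sh
# Write a tiny, made-up dataset in ARFF format (file name and values are illustrative).
cat > weather.arff <<'EOF'
% Lines starting with % are comments.
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
rainy,70,yes
EOF

# Quick sanity check: count the declared attributes and the data rows.
attributes=$(grep -c '^@attribute' weather.arff)
rows=$(sed '1,/^@data/d' weather.arff | grep -c .)
echo "attributes=$attributes rows=$rows"
```

Each @attribute line declares one column (nominal values in braces, or numeric), and every line after @data is one instance. A file like this can be loaded from the Preprocess tab of the Explorer.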

 

Classification

Weka has a large number of classification methods. Classifiers can be divided into “Bayesian” methods (Naive Bayes, Bayesian nets etc.), lazy methods (nearest neighbor and variants), rule-based methods (decision tables, OneR, RIPPER), tree learners (C4.5, which Weka implements as J48, Naive Bayes trees, M5 etc.), function-based learners (linear regression, SVMs, Multilayer Perceptron, Gaussian processes) and miscellaneous methods.

 

Clustering

Weka has most of the classic clustering algorithms, such as SimpleKMeans, hierarchical clustering and simple expectation maximization (EM).

 

Attribute Selection

The set of attributes used is essential for classification performance. Various selection criteria and search methods are available.

 

Data Visualization

Data can be inspected visually by plotting attribute values against the class, or against other attribute values. Classifier output can be compared to training data in order to detect outliers and observe classifier characteristics and decision boundaries. For specific methods, there are specialized tools for visualization, such as a tree viewer for any method that produces classification trees, a Bayes network viewer with automatic layout, and a dendrogram viewer for hierarchical clustering.

 

Time Series Forecasting

Time series forecasting is a newer feature, available from Weka 3.7.x (the developer version). Weka supports many methods for predicting time series: function-based learners (Gaussian processes, linear regression, multilayer perceptron neural networks, SMOreg support vector regression), lazy methods (k-nearest neighbours, locally weighted learning and KStar) and trees (random forest, random tree).

 

From my experience, here are some reasons that make Weka a good toolbox for Machine Learning:
1. Easy-to-use graphical user interfaces.
2. Includes a wide range of well-established machine learning algorithms.
3. Free availability under the GNU General Public License.
4. Portability, since it is fully implemented in the Java programming language and runs on almost any modern computing platform.
5. A comprehensive collection of data pre-processing and modelling techniques.

 

Intro to the Weka GUI

1. Download and Install

Download Weka from the Weka download link. At the time of writing there are two versions: the stable version (3.6.12) and the developer version (3.7.12). I personally prefer the developer version because it allows me to install more packages, e.g. time series forecasting.

After downloading, unzip the zip file and run this command (-Xmx1000M sets the maximum Java heap size to 1000 MB):
> java -Xmx1000M -jar weka.jar

 

[Figure: Weka version 3.6 (Stable Version)]

[Figure: Weka Version 3.7 (Developer Version)]

 

The screenshots above show the subtle differences between the stable and developer versions.

To connect to a DBMS, do the following:

1. Download the Java connector (JDBC driver) compatible with your DBMS, e.g. mysql-connector-java, sql-connector-java
2. Use this syntax to run Weka with the driver on the classpath:

> java -Xmx1000M -cp weka_path:java_connection_path weka.gui.GUIChooser

Here is the example I used to connect to MySQL:
> java -Xmx1000M -cp /home/phuong/weka-3-7-12/weka.jar:/home/phuong/java_conn/mysql-connector-java-5.1.34-bin.jar weka.gui.GUIChooser

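The launch command above can be wrapped in a small script, sketched here under a few assumptions: the default jar locations are placeholders modelled on the example paths, and the WEKA_JAR, JDBC_JAR and SEP variable names are invented for this sketch. If either jar is missing, the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical locations: point these at your own weka.jar and JDBC driver jar.
WEKA_JAR="${WEKA_JAR:-$HOME/weka-3-7-12/weka.jar}"
JDBC_JAR="${JDBC_JAR:-$HOME/java_conn/mysql-connector-java-5.1.34-bin.jar}"

# Classpath separator: ':' on Linux and macOS. On Windows, Java expects ';' instead.
SEP="${SEP:-:}"

# Both jars must be on the classpath so Weka can find the database driver.
CLASSPATH_ARG="${WEKA_JAR}${SEP}${JDBC_JAR}"

if [ -f "$WEKA_JAR" ] && [ -f "$JDBC_JAR" ]; then
  # Both jars exist: launch the GUI chooser with the driver available.
  java -Xmx1000M -cp "$CLASSPATH_ARG" weka.gui.GUIChooser
else
  # Otherwise just show the command that would be run.
  echo "java -Xmx1000M -cp $CLASSPATH_ARG weka.gui.GUIChooser"
fi
```

Keeping the separator in a variable makes the same script usable on Windows, where classpath entries are separated by ; rather than :.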
 

2. Weka Explorer

In Weka Explorer, you can visualize and clean your data and try algorithms for clustering, classification and forecasting. Some features differ between the stable and developer versions of Weka. Here, I am using “Weka Explorer” in the developer version.

[Figure: Weka Explorer]

 

 

The Explorer interface is divided into 11 different tabs in two rows (the top row contains 5 features and the bottom row 6). The top row exists only in the developer version.

  • RConsole: An extension that combines Weka with the R language and reuses many of R's functions.
[Figure: RConsole]

 

  • Parallel Coordinates Plot: a common way of visualizing high-dimensional geometry and analyzing multivariate data.
[Figure: Parallel Coordinates Plot]

 

  • Projection Plot: Apply algorithms such as clustering and visualize the results directly on the plot.
  • Visualize 3D: Plot your data in 3D space!
[Figure: Visualize 3D]

 

  • Forecasting: Used for time series forecasting. You will find well-known algorithms here, such as SVMs and regression.

 

[Figure: Forecast: Graph]

[Figure: Forecast: Output & Evaluation]

 

  • Preprocess: Load a dataset and manipulate the data into a form that you want to work with.
[Figure: Preprocess]

 

  • Classify: Select and run classification and regression algorithms to operate on your data.
  • Cluster: Select and run clustering algorithms on your dataset.
  • Associate: Run association algorithms to extract insights from your data.
  • Select Attributes: Run attribute selection algorithms on your data to select those attributes that are relevant to the feature you want to predict.
  • Visualize: Visualize the relationship between attributes.

 

3. Weka Experimenter

Unlike Weka Explorer, which is used for analysis and for experimenting with individual algorithms, “Weka Experimenter” is for designing experiments with your selection of algorithms and datasets, running them and analyzing the results. For example, the user can create an experiment that runs several schemes against a series of datasets and then analyse the results to determine if one of the schemes is statistically better than the other schemes.

[Figure: Weka Experimenter]
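
One way to picture what an experiment covers is a grid of schemes and datasets, each evaluated with cross-validation. The shell sketch below walks that grid using Weka's command-line classifiers; the jar path and dataset file names are placeholders, and it only prints the commands when those files are not present. It does not reproduce the Experimenter's statistical comparison of the results, which is the part that tells you whether one scheme is reliably better.

```shell
#!/bin/sh
# Hypothetical inputs: adjust the jar path and dataset names to your setup.
WEKA_JAR="${WEKA_JAR:-weka.jar}"
SCHEMES="weka.classifiers.trees.J48 weka.classifiers.bayes.NaiveBayes weka.classifiers.rules.OneR"
DATASETS="iris.arff glass.arff"

RUNS=0
for dataset in $DATASETS; do
  for scheme in $SCHEMES; do
    if [ -f "$WEKA_JAR" ] && [ -f "$dataset" ]; then
      # -t names the training file; -x 10 asks for 10-fold cross-validation.
      java -cp "$WEKA_JAR" "$scheme" -t "$dataset" -x 10
    else
      # Dry run when the jar or the dataset is missing.
      echo "would run: java -cp $WEKA_JAR $scheme -t $dataset -x 10"
    fi
    RUNS=$((RUNS + 1))
  done
done
echo "covered $RUNS scheme/dataset combinations"
```

With three schemes and two datasets the loop covers six combinations. The Experimenter manages this kind of grid for you and adds the analysis step on top.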

 

 

4. Knowledge Flow

Knowledge Flow helps you create a process to apply machine learning. It helps you graphically design your process and run the design that you created. The analysis process goes like this: loading and transforming of input data, followed by running of algorithms and then presentation of results.

[Figure: Weka Knowledge Flow Environment]

 

 


 

Conclusion

Here we provided an Intro Primer For WEKA Machine Learning Software. We hope you found it useful.

 

