We’ve covered a lot of ground in this course. Congratulations on getting this far, and double congratulations if you’ve managed to do all the Quizzes! I encourage you to keep …
We begin by looking at a real-world challenge: the IDRC (International Diffuse Reflectance Conference) Shootout challenge. The training data – called “calibration data” – and test data are linked to …
So far, we’ve been using Python from within Weka. However, in this lesson we work the other way round and invoke Weka from within Python. This allows you to take …
Peter shows how to create visualizations from Weka’s Jython console using the open-source library JFreeChart. First he plots the errors made by LinearRegression on a dataset, indicating the size …
Peter demonstrates writing three Python scripts for Weka that apply the J48 classifier to the anneal dataset. The first builds a classifier and outputs the model, the second evaluates a classifier …
Peter Reutemann introduces scripting, and then demonstrates a Weka package that opens an editor in which you can write and execute Python scripts. Finally he writes a script for loading …
Mike Mayo shows that with appropriate features, Weka can be used to classify images. The imageFilters package processes image files to extract features, and implements 10 different feature sets. You …
There are other useful KnowledgeFlow templates for Distributed Weka. One computes a correlation matrix for input to Principal Component Analysis; another runs a parallel version of the k-means clustering algorithm. …
Map tasks produce models and a Reduce task aggregates them. Reduce strategies differ for Naive Bayes and other model types. We saw in the last lesson that Naive Bayes and …
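The map/reduce split described above can be sketched in plain Python. This is only an illustration, not Distributed Weka's implementation: it assumes nominal attributes and treats a Naive Bayes "model" as nothing more than frequency counts, so that the Reduce step can merge partial models by adding counts. The function names `map_task` and `reduce_task` and the toy weather-style data are hypothetical.

```python
from collections import Counter

def map_task(rows):
    """Build a partial model from one partition: class counts and
    (attribute index, attribute value, class) counts."""
    class_counts = Counter()
    value_counts = Counter()
    for *values, label in rows:
        class_counts[label] += 1
        for i, v in enumerate(values):
            value_counts[(i, v, label)] += 1
    return class_counts, value_counts

def reduce_task(partials):
    """Aggregate partial models by adding their counts together."""
    class_counts, value_counts = Counter(), Counter()
    for cc, vc in partials:
        class_counts.update(cc)
        value_counts.update(vc)
    return class_counts, value_counts

# Hypothetical toy data: (outlook, windy, play)
data = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "yes", "no"),
    ("overcast", "no", "yes"),
    ("rainy", "no", "yes"),
    ("sunny", "no", "no"),
]

# Three partitions stand in for the data seen by three map tasks.
partitions = [data[:2], data[2:4], data[4:]]
merged = reduce_task(map_task(p) for p in partitions)
single_pass = map_task(data)

# The merged counts match those from a single pass over all the data.
print(merged == single_pass)
```

Because counts are additive, the aggregated model in this sketch is identical to one built on the whole dataset. Model types without such additive statistics cannot be merged this way, which is one reason a Reduce strategy may need to differ between model types.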
There are many options when configuring a Distributed Weka job. The ArffHeaderSparkJob’s configuration panel has two tabs: Spark configuration, whose options relate to how the cluster is configured, including how …
Having installed Distributed Weka, you can interact with it in the KnowledgeFlow environment. New components such as ArffHeaderSparkJob, WekaClassifierSparkJob, and WekaClassifierEvaluationSparkJob become available. In addition, example knowledge flows are provided …
Mark Hall from Pentaho introduces a plugin that runs Weka on a cluster of machines. It uses the “map-reduce” framework, and operates with both Spark and Hadoop. It comprises two …
Tools implemented in R can preprocess data before passing it on to Weka learning algorithms. The KnowledgeFlow’s RScriptExecutor component executes a user-supplied R script. Data can be loaded using …
Pamela Douglas from UCLA introduces the problem of classifying functional MRI data. An fMRI scan records signals over time from 100,000 voxels covering the brain region, which creates a huge …
Weka’s MLR classifier includes many of the learning algorithms that are available in the R environment. Choosing the MLRClassifier in the Explorer’s Classify panel gives access to 75 classification methods …