public final class WikipediaXmlSplitter extends Object
The Bayes example package provides helper classes for training the Naive Bayes classifier
on the Twenty Newsgroups data. See PrepareTwentyNewsgroups
for details on running the trainer and on
formatting the Twenty Newsgroups data correctly for training.
The easiest way to prepare the data is to run the Ant target in core/build.xml:
ant extract-20news-18828
This target passes the following arguments:
-p ${working.dir}/20news-18828/ -o ${working.dir}/20news-18828-collapse -a ${analyzer} -c UTF-8
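The ${working.dir} and ${analyzer} references in that argument line are Ant properties, which Ant substitutes before the program runs. As a rough illustration, the following sketch expands Ant-style ${name} references and then pairs each flag with its value. ArgLineDemo, expand, parseFlags, and the property values in main are hypothetical, chosen only for this example; they are not part of Mahout or Ant, and the parser assumes every flag takes exactly one value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArgLineDemo {

    // Matches Ant-style property references such as ${working.dir}.
    private static final Pattern PROPERTY = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its value; undefined names are left as-is, as Ant does. */
    static String expand(String line, Map<String, String> props) {
        Matcher m = PROPERTY.matcher(line);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String value = props.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Pairs each "-x" flag with the token after it; assumes every flag takes one value. */
    static Map<String, String> parseFlags(String line) {
        String[] tokens = line.trim().split("\\s+");
        Map<String, String> flags = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length - 1; i++) {
            if (tokens[i].startsWith("-")) {
                flags.put(tokens[i], tokens[i + 1]);
                i++; // skip the value just consumed
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        String argLine = "-p ${working.dir}/20news-18828/ -o ${working.dir}/20news-18828-collapse"
                + " -a ${analyzer} -c UTF-8";
        Map<String, String> props = new LinkedHashMap<>();
        props.put("working.dir", "/tmp/mahout-work"); // hypothetical value
        props.put("analyzer", "org.apache.lucene.analysis.standard.StandardAnalyzer"); // hypothetical
        String expanded = expand(argLine, props);
        System.out.println(expanded);
        parseFlags(expanded).forEach((flag, value) -> System.out.println(flag + " = " + value));
    }
}
```

Like Ant, expand leaves a reference to an undefined property (for example ${analyzer} when no value is supplied) as literal text instead of failing, which makes a missing property easy to spot in the printed line.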
To run the Wikipedia examples (this assumes you have built the Mahout job JAR):
ant enwiki-files
bin/hadoop jar $MAHOUT_HOME/target/mahout-examples-0.x \
  org.apache.mahout.classifier.bayes.WikipediaXmlSplitter \
  -d $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml \
  -o $MAHOUT_HOME/examples/work/wikipedia/chunks/ -c 64
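The command splits one very large Wikipedia dump into smaller files that Hadoop can process in parallel. The following is a minimal sketch of that idea, not Mahout's implementation. It assumes that -c gives an approximate chunk size in megabytes, that each article is a <page>...</page> element spanning whole lines, and that each chunk should be wrapped in a simplified <mediawiki> root (without the namespace attributes a real dump carries) so it remains well-formed XML. ChunkSketch, split, and the chunk-NNNN.xml file names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

public class ChunkSketch {

    // Hypothetical wrapper so that each chunk is a well-formed XML document on its own.
    static final String HEADER = "<mediawiki>\n";
    static final String FOOTER = "</mediawiki>\n";

    /**
     * Groups whole <page> elements into chunks and hands each finished chunk to sink.
     * A chunk is closed once its body reaches maxChars characters, so a page is never split;
     * the character count only approximates the size in bytes.
     */
    static void split(Iterable<String> lines, long maxChars, Consumer<String> sink) {
        StringBuilder chunk = new StringBuilder();
        StringBuilder page = null; // non-null while we are inside a <page> element
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("<page>")) {
                page = new StringBuilder();
            }
            if (page == null) {
                continue; // skip the dump's own root tags and <siteinfo> block
            }
            page.append(line).append('\n');
            if (trimmed.endsWith("</page>")) {
                chunk.append(page);
                page = null;
                if (chunk.length() >= maxChars) {
                    sink.accept(HEADER + chunk + FOOTER);
                    chunk.setLength(0);
                }
            }
        }
        if (chunk.length() > 0) {
            sink.accept(HEADER + chunk + FOOTER); // flush the last, possibly smaller, chunk
        }
    }

    // Hypothetical usage: java ChunkSketch <dump.xml> <outputDir> <chunkSizeMB>
    public static void main(String[] args) throws IOException {
        Path dump = Paths.get(args[0]);
        Path outDir = Paths.get(args[1]);
        long maxChars = Long.parseLong(args[2]) * 1024 * 1024;
        Files.createDirectories(outDir);
        int[] count = {0};
        try (BufferedReader reader = Files.newBufferedReader(dump, StandardCharsets.UTF_8)) {
            Iterable<String> lines = reader.lines()::iterator;
            split(lines, maxChars, text -> {
                count[0]++;
                Path out = outDir.resolve(String.format("chunk-%04d.xml", count[0]));
                try {
                    Files.write(out, text.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        System.out.println("Wrote " + count[0] + " chunk(s) to " + outDir);
    }
}
```

Handing each chunk to a callback as soon as it is complete keeps memory use bounded by roughly one chunk, which matters for a full English Wikipedia dump; collecting every chunk in a list first would not scale.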
public static void main(String[] args) throws IOException
Throws: IOException
Copyright © 2008–2015 The Apache Software Foundation. All rights reserved.