Databases
ARFF databases for classification with Weka

The databases available for download here are in ARFF, the file format used by the Weka data mining software.
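For readers who have not seen an ARFF file, the sketch below reads a small one using only the Python standard library. It is a minimal illustration, not a full ARFF parser: it assumes dense rows and simple unquoted attribute names. The function name `read_arff`, the sample relation, and the attribute and class names are all hypothetical and are not taken from the files on this page.

```python
import csv
import io


def read_arff(text):
    """Parse a simple dense ARFF document into (relation, attributes, rows).

    Assumption: no sparse rows and no spaces inside attribute names.
    This is a sketch, not a complete implementation of the format.
    """
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    rows = []        # each row is a list of strings
    in_data = False
    for raw in io.StringIO(text):
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        if in_data:
            # Data rows are comma-separated; ARFF allows single-quoted values.
            reader = csv.reader([line], quotechar="'", skipinitialspace=True)
            rows.append(next(reader))
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# Hypothetical excerpt: names and values are made up for illustration.
SAMPLE = """\
% comment lines start with a percent sign
@relation raisins
@attribute mean_red numeric
@attribute area numeric
@attribute class {type_a,type_b,type_c}

@data
0.61,1520,type_a
0.34,1310,type_b
"""

relation, attributes, rows = read_arff(SAMPLE)
print(relation, len(attributes), rows[0])
```

The header declares one attribute per line, and the last attribute here is the nominal class label, so each data row ends with the class of the instance.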

I used the databases for my master's thesis project. Each database contains image descriptors (of shape, color or texture) computed by a feature extraction algorithm written in Matlab.

The images were either produced by me and my colleagues at the LIVIA or downloaded from the internet (to see the source, follow the link in the Reference column of the following table).

The table below describes the 8 ARFF files available for download. Use this link to download all 8 files at once.

File | Nb. classes | Nb. instances per class | Total instances | Reference | Description | Best median of error obtained
cereals_500.arff | 6 | 500 | 3000 | * See below | 6 types of cereals, segmented | SVM2 : 1.5% error
cereals_950.arff | 6 | 950 | 5700 | * See below | Same as the preceding file, but with more instances per class | SVM2 : 1.5% error
leaves_60.arff | 3 | 60 | 180 | Caltech | 3 types of leaves, no segmentation | SVM2 : 26.67% error
cropped_leaves_60.arff | 3 | 60 | 180 | * See below | 3 types of leaves, cropped images (rectangles) | GPB, SVM or MP (equal) : 6.67% error
digits_33.arff | 10 | 33 | 330 | * See below | Computer characters, gray-scale, segmented | MP : 5.45% error
knots_27.arff | 6 | 27 | 162 | Oulu | Wood defects, segmented | MP : 20.37% error
pollen_196.arff | 7 | 196 | 1372 | Bangor | 7 types of pollen, gray-scale, segmented | SVM2 : 4.15% error
raisins_450.arff | 3 | 450 | 1350 | * See below | 3 types of raisins, segmented | SVM : 0.22% error

* The images used to produce these databases were made by students and researchers of the LIVIA. The image files are not available here because of their size. If you would like to use them, don't hesitate to contact me.

The databases in the table contain 60 features for the RGB images, 48 for the gray-scale images and 43 for the RGB images without segmentation (leaves only). The last column, Best median of error obtained, shows the median classification error over 50 experiments, each using a random split of the data into 2/3 for training and 1/3 for testing, for the best-performing classifier among those tested (all from Weka).
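The evaluation protocol described above (50 random splits, 2/3 training and 1/3 test, median of the error) can be sketched as follows. This is an illustration under stated assumptions, not the code used for the thesis: the nearest-centroid classifier is a simple stand-in for the Weka classifiers, the toy data is synthetic, and the names `median_error` and `nearest_centroid_error` are hypothetical.

```python
import random
import statistics


def nearest_centroid_error(train, test):
    """Stand-in classifier (NOT one of the Weka classifiers): assign each
    test vector to the class whose mean training vector is closest."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def predict(features):
        return min(centroids, key=lambda lab: squared_distance(features, centroids[lab]))

    wrong = sum(1 for features, label in test if predict(features) != label)
    return wrong / len(test)  # fraction of misclassified test instances


def median_error(instances, error_fn, runs=50, train_fraction=2 / 3, seed=0):
    """Median test error over `runs` independent random train/test splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(runs):
        shuffled = instances[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        errors.append(error_fn(shuffled[:cut], shuffled[cut:]))
    return statistics.median(errors)


# Synthetic toy data: 3 well-separated classes, 30 instances each, 2 features.
data_rng = random.Random(1)
toy = [
    ([data_rng.gauss(center, 0.5), data_rng.gauss(-center, 0.5)], f"class_{center}")
    for center in (0, 5, 10)
    for _ in range(30)
]

result = median_error(toy, nearest_centroid_error)
print(f"median error over 50 runs: {result:.2%}")
```

Reporting the median rather than the mean makes the figure less sensitive to a few unlucky splits, which matters most for the smaller databases where the test set holds only a few dozen instances.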
For questions or comments, please contact me.

Created by: Yan. Last modification: Monday 18 May 2009 [00:55:51 UTC] by Yan