
This data was used in a competition on click-through rate prediction jointly hosted by Criteo and Kaggle in 2014. The script for transforming the data to LIBFFM and LIBSVM formats is provided in the link below. The data in LIBSVM format is now hosted on an AWS S3 machine owned by Criteo.

Transform from multiclass into binary class.

Instance-wise normalization to mean zero and variance one. Then feature-wise normalization to mean zero and variance one.
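The two normalization steps mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: the instance-wise step is applied first and the feature-wise step second, the population variance (division by n) is used, and the function names and the toy matrix are hypothetical rather than taken from the original preprocessing code.

```python
import math

def standardize(values):
    """Shift and scale a list of floats to mean zero and variance one."""
    n = len(values)
    mean = sum(values) / n
    # Population variance (division by n) is an assumption of this sketch.
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return [0.0] * n  # a constant vector can only be centered
    std = math.sqrt(var)
    return [(v - mean) / std for v in values]

def normalize_instance_wise(rows):
    """Standardize each row (instance) independently."""
    return [standardize(row) for row in rows]

def normalize_feature_wise(rows):
    """Standardize each column (feature) independently."""
    columns = [standardize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]

# Hypothetical toy data: 3 instances, 3 features.
data = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0], [0.0, 5.0, 1.0]]
normalized = normalize_feature_wise(normalize_instance_wise(data))
```

Because the feature-wise step runs last in this sketch, the columns of `normalized` have mean zero and variance one, while the rows are in general no longer exactly standardized.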
Divide by 10 to get the deltaG_total value computed by the Dynalign algorithm.

Note that the original data has column 1 containing the sample ID. Also, 16 instances with missing values are removed.

This data was used in a competition on click-through rate prediction jointly hosted by Avazu and Kaggle in 2014. The participants were asked to learn a model from the first 10 days of advertising log, and predict the click probability for the impressions on the 11th day. The data sets here are generated by applying our winning solution without some complicated components. To reproduce this data, you can execute our code and see the results in the directory "base."

For better test scores, we divide the data into two disjoint groups "app" and "site," and conduct training and prediction tasks on the two groups independently. Specifically, each instance has either "site_id=85f751fd" or "app_id=ecad2386," and these two feature values never co-occur. Thus we can split the data set according to them.

The organizers do not disclose the test labels, so the labels in the test sets are not meaningful. To obtain a test score, please use the code provided below to generate and submit a file to the competition site.

Because the data are time dependent, cross validation is not suitable for parameter selection. We provide a training-validation split (e.g., the validation file "avazu-app.val") by taking the last 4,218,938 training instances for validation.

avazu-submit.zip (code to generate a submission file).

In the version of the Adult data provided here, continuous features are discretized into quantiles, and each quantile is represented by a binary feature. Also, a categorical feature with m categories is converted to m binary features. Details on how each feature is converted can be found at the beginning of each file.
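The conversion of continuous and categorical features into binary features can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the number of quantiles, the way cut points are chosen, the helper names, and the sample values are hypothetical, and the actual per-feature conversions are the ones documented at the beginning of each file.

```python
from bisect import bisect_right

def quantile_edges(values, n_bins):
    """Return n_bins - 1 cut points splitting the values into roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]

def one_hot(index, size):
    """Binary vector of the given size with a single 1 at position index."""
    return [1 if i == index else 0 for i in range(size)]

def encode_continuous(value, edges):
    """Map a continuous value to one binary feature per quantile bin."""
    return one_hot(bisect_right(edges, value), len(edges) + 1)

def encode_categorical(value, categories):
    """Map a categorical value with m categories to m binary features."""
    return one_hot(categories.index(value), len(categories))

# Hypothetical sample values for one continuous and one categorical feature.
ages = [22, 25, 31, 38, 41, 47, 53, 60]
edges = quantile_edges(ages, 4)                       # [31, 41, 53]
age_bits = encode_continuous(38, edges)               # [0, 1, 0, 0]

categories = ["Private", "Self-emp", "Gov"]
work_bits = encode_categorical("Gov", categories)     # [0, 0, 1]
```

Each original feature contributes exactly one active binary feature per instance, which is why data encoded this way is sparse and fits the LIBSVM format well.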
The original Adult data set has 14 features, among which six are continuous and eight are categorical.

Raw materials (e.g., original texts) are also available. These data sets are from UCI, Statlog, StatLib and other collections. For most sets, we linearly scale each attribute to [-1,1] or [0,1]. Some training data are further separated into "training" (tr) and "validation" (val) sets. Details can be found in the description of each data set. To read data via MATLAB, you can use "libsvmread" in the LIBSVM package.
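Outside MATLAB, a file in LIBSVM format can also be read with a few lines of code. The following is a minimal sketch, not part of the LIBSVM package: it assumes the usual sparse layout of one instance per line, written as a label followed by `index:value` pairs, and a single numeric label (multi-label files with comma-separated labels would need extra handling). The function names and the sample text are hypothetical.

```python
def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features

def parse_libsvm(text):
    """Parse every non-empty line of a LIBSVM-format string."""
    return [parse_libsvm_line(line) for line in text.splitlines() if line.strip()]

# Hypothetical two-instance sample; features that are not listed are zero.
sample = "+1 3:1 11:1 14:1\n-1 1:0.5 7:-0.25\n"
rows = parse_libsvm(sample)
```

Storing each instance as a dictionary keeps the sparsity of the file; a dense matrix can be built afterwards if the number of features is small.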
LIBSVM Data: Classification (Binary Class)

This page contains many classification, regression, multi-label and string data sets stored in LIBSVM format.