The dprep Package: Release 3.0

New release on November 19, 2015. Download dprep from CRAN.

dprep was developed by Professor Edgar Acuna and his students in the CASTLE research group at the University of Puerto Rico-Mayaguez.

This is a library of more than thirty R functions for normalization, handling of missing values, discretization, outlier detection, feature selection, data visualization, and classification. Most of the methods handle datasets with numerical and categorical attributes.

There are 18 datasets included in the library.

Normalization methods: Z-score normalization (znorm), Min-Max normalization (mmnorm), Decimal scaling (dscale), Sigmoidal normalization (signorm), Softmax normalization (softnorm).
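As a rough illustration of what three of these normalizations compute, here is a minimal sketch in Python using the usual textbook definitions. It is not dprep code: the function names z_score, min_max, and decimal_scale are made up for this example, and znorm, mmnorm, and dscale may differ in details such as the standard deviation estimator or how the class column is treated.

```python
import statistics


def z_score(values):
    """Center on the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample estimator (n - 1); an assumption
    return [(v - mean) / sd for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale linearly so the smallest value maps to new_min, the largest to new_max."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def decimal_scale(values):
    """Divide by 10**j, with j the smallest integer such that max|v| / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


column = [2.0, 4.0, 6.0, 8.0]
print(z_score(column))
print(min_max(column))
print(decimal_scale([120, -350, 999]))
```

A constant column makes z_score and min_max divide by zero, so a real implementation needs a guard for that case.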

Methods for handling missing values: Cleaning of rows/columns with a high proportion of missing values (clean), Imputation by mean, median, or mode (for categorical features) (ce.mimp), k-NN imputation (ec.knnimp, for categorical and numerical data).
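The idea behind mean, median, and mode imputation can be sketched in a few lines of Python. This is only an illustration, not dprep code: impute_column is a hypothetical helper, missing entries are assumed to be encoded as None, and the sketch ignores class labels, so ce.mimp may well behave differently.

```python
import statistics


def impute_column(values, method="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if method == "mean":
        fill = statistics.mean(observed)
    elif method == "median":
        fill = statistics.median(observed)
    elif method == "mode":
        fill = statistics.mode(observed)  # suitable for categorical features
    else:
        raise ValueError("unknown method: " + method)
    return [fill if v is None else v for v in values]


print(impute_column([1.0, None, 3.0, 8.0], "mean"))
print(impute_column(["a", "b", None, "b"], "mode"))
```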

Discretization methods: Equal width bins (disc.ew), Equal frequency bins (disc.ef), Holte's One R (disc.1r), chiMerge, Entropy discretization with MDL stopping rule (disc.mentr).
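The two unsupervised schemes in this list differ in how they place the cut points, which a small Python sketch can make concrete. The helper names equal_width_bins and equal_frequency_bins are invented for this example, and the code is an assumption-laden illustration of the general technique rather than the implementation used in dprep.

```python
def equal_width_bins(values, k):
    """Split the range [min, max] into k intervals of equal width; return bin indices."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # constant column: everything falls in one bin
    # The maximum lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bin indices so each bin holds roughly len(values) / k observations."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # Bins are assigned by rank, so tied values may end up in adjacent bins.
        bins[i] = min(rank * k // n, k - 1)
    return bins


data = [1, 2, 3, 4, 5, 100]
print(equal_width_bins(data, 3))
print(equal_frequency_bins(data, 3))
```

With a single extreme value such as 100, equal width bins put almost all observations in the first interval, while equal frequency bins keep the groups balanced.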

Feature selection methods: ReliefF (relief), LVF, Finco, Sequential Forward Selection (sfs), Sequential Floating Forward Selection (sffs).

Outlier detection methods: Mahaout, Robout, Bay's algorithm, LOF (maxlof).

Visualization: Imagmiss, Parallel plot, Surveyplot, Radviz, Starcoord, Star coordinates in 3D.

Cross-validation error estimation: Naive Bayes, LDA, Logistic Regression, k-NN, Rpart, Neural networks.
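To show what a cross-validated error estimate involves, the following Python sketch computes the misclassification rate of a k-nearest-neighbor classifier over n folds. It is a generic illustration under stated assumptions, not dprep's routine: knn_predict and cv_error are hypothetical names, distances are Euclidean, and folds are assigned by index without shuffling so the result is reproducible.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Predict the majority label among the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), i)
        for i, x in enumerate(train_x)
    )
    votes = [train_y[i] for _, i in dists[:k]]
    # most_common breaks ties in favor of the label seen first (the nearest point).
    return Counter(votes).most_common(1)[0][0]


def cv_error(xs, ys, n_folds, k=1):
    """Fraction of points misclassified when each fold is held out in turn."""
    n = len(xs)
    errors = 0
    for fold in range(n_folds):
        # Deterministic folds by index; real use would normally shuffle first.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_x, train_y, xs[i], k) != ys[i]:
                errors += 1
    return errors / n


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a"] * 4 + ["b"] * 4
print(cv_error(points, labels, n_folds=4, k=1))
```

Setting n_folds equal to the number of observations gives leave-one-out cross-validation.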

Multivariate normality tests: Mardia, Van Valen (vvalen).

A dprep manual

Caroline Rodriguez's thesis on data preprocessing