Statistical Pattern Recognition Group

Preprocessing and visualization techniques in data mining

We have developed dprep, an R package for data preprocessing that includes normalization, discretization, handling of missing values, outlier detection, feature selection, and visualization for large datasets. Algorithms for instance selection and a GUI will be included soon.
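As a rough illustration of two of the preprocessing steps listed above, the following Python sketch shows min-max normalization and equal-width discretization of a single numeric feature. It is not code from dprep and does not reflect its interface; the function names `min_max_normalize` and `equal_width_bins` are hypothetical and chosen only for this example.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, 30, 40, 60]  # hypothetical feature values
print(min_max_normalize(ages))
print(equal_width_bins(ages, 2))
```

Equal-width binning is only one of several discretization strategies; it is used here because it is the simplest to state.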

People: Edgar Acuna, Caroline Rodriguez, Luis Daza, and Sindy Diaz.

Parallel Data Mining

We are implementing parallel algorithms for several knowledge discovery and data mining tasks, among them outlier detection, visualization, and the computation of nonparametric classifiers, metaclassifiers, and clustering methods.

People: Roxana Aparicio and Edgar Acuna.

Improving classification of gene expression data

We are extending several methodologies for improving the classification of gene expression data. These methodologies include:
i) Generalizations of Partial Least Squares (PLS) that use a nonparametric classifier instead of linear regression in the outer step, aimed at dimensionality reduction in supervised classification. This method is intended to improve on PLS discriminant analysis and PLS logistic regression.
ii) Extensions of supervised principal components for regression to classification problems.
iii) Building classifiers by applying shrinkage estimators, such as the lasso and the garrote, to logistic regression.

People: Edgar Acuna and Luz Marina Muniz.

Collaborators: Ana Patricia Ortiz (Puerto Rico Cancer Center), Jose Vega (University of Puerto Rico School of Medicine), and Idhaliz Flores (Ponce Medical School, Puerto Rico).

Applications of rough sets to knowledge discovery

We are investigating several applications of rough set theory to knowledge discovery tasks, including discretization, imputation, and feature selection in supervised classification. Emphasis will be given to datasets from bioinformatics.

People: Edgar Acuna and Frida Coaquira.

Cluster validation techniques

We are searching for efficient cluster validation methods.

People: Edgar Acuna and Roxana Aparicio.

Bayesian network classifiers

We are working on extensions of Bayesian networks to datasets containing features of mixed types. We are also looking for applications of Bayesian networks to multi-relational data mining.

People: Edgar Acuna.

Multi-relational data mining

Data mining algorithms look for patterns in data. While most existing data mining approaches look for patterns in a single data table, multi-relational data mining approaches look for patterns that involve multiple tables from a relational database. We are looking for extensions of data mining tasks to the multi-relational case.
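To make the single-table versus multi-table distinction concrete, here is a small Python sketch with two related tables. It shows one simple workaround, aggregating the related table into the main one so that a single-table method can be applied; this is only an illustration and not the group's approach, since multi-relational methods aim to find patterns across the tables directly. The table contents and the function name `flatten` are hypothetical.

```python
from collections import defaultdict

# Two hypothetical tables linked by customer_id (one customer, many orders).
customers = [
    {"customer_id": 1, "region": "north"},
    {"customer_id": 2, "region": "south"},
]
orders = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 20.0},
]


def flatten(customers, orders):
    """Summarize each customer's orders as extra columns on the customer row."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]
        counts[order["customer_id"]] += 1
    return [
        {
            **customer,
            "n_orders": counts[customer["customer_id"]],
            "total_amount": totals[customer["customer_id"]],
        }
        for customer in customers
    ]


for row in flatten(customers, orders):
    print(row)
```

Aggregation of this kind discards detail, such as the individual order amounts, which is one motivation for methods that work on the relational structure itself.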

People: Trilce Encarnacion, Karen Aparicio, and Edgar Acuna.
