PDF versions are (mostly) available for download by clicking on the name of the paper.
Submitted to NASA, together with Vidya Manian, Axel Santos Figueroa and Jose Melendez. ($600 000). Not Funded
[45] Goodness-of-fit Tests
90 minute seminar at the 2nd Pan European advanced statistics school.
[44] The effect of gender, SES and school socioeconomic composition
with Alexandra Morales Reyes, José C Pérez Vargas and Jessimar Siberón.
Abstract: A study on the relationship between economic level and gender and the achievement in a standard English test of eleven grades in Puerto Rico.
[43] Destrezas de español de los estudiantes puertorriqueños
with Alexandra Morales Reyes, José C Pérez Vargas and Jessimar Siberón.
Abstract: A study on the relationship between economic level and gender and the achievement in a standard Spanish test of eleven grades in Puerto Rico.
Accepted for publication in Borealis – An International Journal of Hispanic Linguistics.
[42] HOSA Workshop on R
[41] Testing Goodness of Fit
An invited seminar in the CERN-PHYSTAT series.
Testing Goodness of Fit (PDF of talk)
Testing Goodness of Fit (Zoom link to video)
[40] Statistical Analysis of Linguistic Task Experiments: Moving from Analysis of Variance to a more suitable analysis.
Abstract: Historically a common type of experiment in linguistics has been analyzed using ANOVA. We show that this is not correct and propose a better analysis method.
with Alexandra Morales Reyes, Department of Humanities, UPRM
submitted to the journal Behavioral Research Methods.
[39] Simultaneous Goodness-of-Fit Testing
Abstract: We present a method that runs a number of standard goodness-of-fit tests and rejects the null if any of them does. It yields a p value that is uniform under the null hypothesis no matter how many tests are run. This is achieved by adjusting the p value via simulation.
A draft version is available from the arXiv server, see link above. The paper has been submitted to Computational Statistics for publication.
The online calculator is available at https://drrolke.shinyapps.io/sgoftest/. An explanation of how to use the online calculator is at http://academic.uprm.edu/wrolke/simgof.explained.pdf
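The idea in the abstract of [39] can be illustrated with a minimal sketch. This is not the code from the paper or the online calculator: the choice of two tests (Kolmogorov-Smirnov and Cramér-von Mises), the Uniform(0,1) null, and the function names `simultaneous_pvalue`, `min_p` and `stats_of` are all assumptions made for illustration. The sketch takes the smallest of the individual p values and then calibrates it against its own simulated null distribution, which is one common way to adjust a p value by simulation.

```python
import random

def ks_stat(u):
    """Kolmogorov-Smirnov distance between a sorted sample u and Uniform(0,1)."""
    n = len(u)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))

def cvm_stat(u):
    """Cramer-von Mises statistic of a sorted sample u against Uniform(0,1)."""
    n = len(u)
    return 1 / (12 * n) + sum((x - (2 * i + 1) / (2 * n)) ** 2 for i, x in enumerate(u))

TESTS = (ks_stat, cvm_stat)  # illustrative choice of tests

def stats_of(sample):
    u = sorted(sample)
    return [test(u) for test in TESTS]

def min_p(stats, null_stats):
    """Smallest per-test p value, each estimated from simulated null statistics."""
    B = len(null_stats)
    return min(
        sum(row[j] >= s for row in null_stats) / B
        for j, s in enumerate(stats)
    )

def simultaneous_pvalue(sample, B=500, rng=None):
    """Adjusted p value: how often the smallest p value under the null
    is at least as small as the one observed."""
    rng = rng or random.Random(0)
    n = len(sample)
    # Simulate B data sets from the null and record every test statistic.
    null_stats = [stats_of([rng.random() for _ in range(n)]) for _ in range(B)]
    observed = min_p(stats_of(sample), null_stats)
    null_min_p = [min_p(row, null_stats) for row in null_stats]
    return sum(m <= observed for m in null_min_p) / B

if __name__ == "__main__":
    data = [0.01 * i for i in range(1, 31)]  # clearly not uniform on (0, 1)
    print(simultaneous_pvalue(data))
```

Adding another test only means appending its statistic to `TESTS`; the calibration step stays the same regardless of how many tests are combined.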
[38] Intro to some interesting things you can do with R
A workshop on some interesting R packages.
Workshop on statistical methods, SIDIM 2020, Cayey, Puerto Rico.
[37] So, how many people did Maria kill?
Abstract: A discussion of the six publications on the death toll of Hurricane Maria in Puerto Rico in 2017.
Talk at SIDIM 2020, Cayey, Puerto Rico.
[36] A Chi-square Goodness-of-Fit Test for Continuous Distributions against a known Alternative
Abstract: We study a novel binning scheme for the chi-square goodness of fit test for continuous data. Simulation studies show good power in many standard cases.
with Cristian Gutierrez Gongora, partially based on the Master's thesis he wrote with me.
published in Computational Statistics.
DOI: 10.1007/s00180-020-00997-x
[35] Basic Statistics for Machine Learning
Abstract: Talk given at Machine Learning Hackathon, UPRM.
[34] Modeling Excess Deaths After a Natural Disaster with Application to Hurricane Maria
Abstract: In this study we consider two models for the estimation of excess counts: the profile likelihood method and a log-linear model. The latter turns out to be more flexible and, qualitatively, increases precision. It also allows inference on the duration of the Hurricane Maria (HM) effect.
with Dr. Roberto Rivera, published in Statistics in Medicine.
DOI: 10.1002/sim.8314
[33] R! An Introduction
Abstract: A two-day workshop introducing R to non-statisticians.
[32] Estimating the death toll of Hurricane Maria
Abstract: An estimate of the number of people who died in Puerto Rico due to Hurricane Maria in 2017.
with Dr. Roberto Rivera, published in Significance, February 2018.
DOI: 10.1111/j.1740-9713.2018.01102.x
A talk on the general goodness of fit testing problem. I discuss some of the history and controversies of the main methods as well as some newer developments.
Presented at TerascaleStatistics School 2017 at DESY, Hamburg, Germany
A shorter version which I presented in our Mathematics Department Seminar is available here:
Testing Goodness of Fit - Seminar
[30] Introduction to R
An introduction to R. Presented at TerascaleStatistics School 2017 at DESY, Hamburg, Germany
[29] GOFer
An introduction to the online goodness-of-fit testing app GOFer.
CMS Statistics meeting, CERN, Geneva, Switzerland
[28] R / Shiny workshop
together with the Puerto Rico Chapter of the American Statistical Association.
To get started, also download workshop.zip
[27] Limit Setting Methods for the On/Off Problem
In the talk I discuss a number of different methods for calculating confidence intervals for the On-Off problem. The methods include all those in common use today. I derive explicit formulas for the limits and calculate the true coverage and the expected lengths of these methods.
CMS Statistical Meeting, November 2015, Large Hadron Collider at CERN, Geneva, Switzerland
[26] Generalized Linear Models Workshop
Abstract: a workshop with an introduction to simple and multiple regression as well as the generalized linear model.
Expo in Statistics 2015, C3Tec, Caguas, PR, together with the Puerto Rico Chapter of the American Statistical Association
[25] What’s wrong with Hypothesis Testing?
Abstract: Talk at Department of Mathematical Sciences UPRM seminar. Hypothesis testing has been a large part of Statistics for almost a century and is one of the most common methodologies in many fields of research. Yet in 2015 a major journal in Psychology announced that they will no longer publish any papers that include a hypothesis test. So, what’s wrong with hypothesis testing?
[24] A Comparison of Limit Setting Methods for the On-Off Problem
Abstract: We study the frequentist properties of confidence intervals for the On-Off problem. The methods include all those in common use today. We derive explicit formulas for the limits and calculate the true coverage and the expected lengths of these methods.
Published in Nuclear Instruments and Methods A
DOI: 10.1016/j.nima.2015.10.028
For an online limits calculator go to https://wolfgangrolke.shinyapps.io/OnOffLimitsCalculator
[23] Some Features of R You Might Not Yet Know
Abstract: Talk at SIDIM XXX about startup customization in R using Dropbox, .First and .Rprofile. I also discuss Rcpp and Shiny.
[22] What Country is the All-Time Best in the World Cup?
Abstract: I analyze data from all the World Cup tournaments held to answer this question. The main statistical point of the paper is that there is no single obviously correct answer. In just about any statistical (or even scientific) research we have to make a number of essentially subjective choices. The best we can do is to make it clear what those choices were and why we made them the way we did.
[21] Identifying Students at Risk
Abstract: An analysis of UPR-Mayaguez student data to develop a method for identifying students at a high risk of not returning for the second year or of not graduating.
[20] The Power to See: A New Graphical Test of Normality
with Sivan Aldor-Noima, Lawrence D. Brown, Andreas Buja and Robert A. Stine
Abstract: Many statistical procedures assume the underlying data generating process involves Gaussian errors. Among the popular tests for normality, only the Kolmogorov-Smirnov test has a graphical representation. Alternative tests, such as the Shapiro-Wilk test, offer little insight as to how the observed data deviate from normality. In this paper we discuss a simple new graphical procedure which provides simultaneous confidence bands for a normal quantile-quantile plot. These bands define a test of normality and are narrower in the tails than those related to the Kolmogorov-Smirnov test. Correspondingly the new procedure has greater power to detect deviations from normality in the tails.
Published in The American Statistician (2013), Vol 67/4
DOI: 10.1080/00031305.2013.847865
A free copy of the paper is available here.
The routines in R are available here.
R routines for probability plots of general distributions can be found here.
(these routines require a fully specified null distribution)
[19] Report on Madrid Conference
Abstract: Report presented to the CMS Collaboration at CERN, Geneva on the topics discussed at the Madrid Conference on Issues in Hypothesis Testing, June 2012
[18] Estimating a Signal In the Presence of an Unknown Background
with Angel Lopez
Abstract: We describe a new method for fitting distributions to data which only requires knowledge of the parametric form of either the signal or the background but not both. The unknown distribution is fit using a non-parametric kernel density estimator. The method returns parameter estimates as well as limits on those estimates. Simulation studies show that these estimates are unbiased and that the limits on the estimates are correct.
Published in Nuclear Instruments and Methods in Physics Research A, (2012) Volume 685, p. 16-21.
DOI: 10.1016/j.nima.2012.05.029
C Code for Semiparametric Fitting
Abstract: C++ code for the semiparametric fitting discussed in [18]
[17] Solution to Banff 2 Challenge Based on Likelihood Ratio Test
Abstract: We describe our solution to the Banff 2 challenge problems as well as the outcomes.
See also Tom Junk’s page at https://www-cdf.fnal.gov/~trj/bc2results.pdf
[16] A Test for Equality of Distributions in High Dimensions
with Angel Lopez
Abstract: We present a method which tests whether or not two datasets (one of which could be Monte Carlo generated) might come from the same distribution. Our method works in arbitrarily high dimensions.
[15] A Shared Spatial Cache Model for Mobile Environments
with Fernando J. Maymi (West Point Military Academy) and Manuel Rodriguez-Martinez (UPRM)
Abstract: In many scenarios, particularly in military and emergency response operations, mobile nodes that are in close proximity to each other exhibit a high degree of data affinity. For example, all soldiers in the same region, regardless of their specialty, will want to know all nearby threats, as well as all friendly assets. Since relaying queries to a distant server is costly in terms of bandwidth and battery power, it would be ideal to use local resources that are only a hop away. In this paper we propose a shared spatial cache that can be thought of as residing in a region rather than in any given node. Each node that participates in the cache holds an expendable part of the data, so that the loss of any node or small group of nodes can be tolerated with little or no degradation of service. We describe the analytical models that verify our claims and show the results of extensive simulations that validate our models under simulated but realistic conditions.
Published in the Proceedings of MobiDE’2010, Ninth International ACM Workshop on Data Engineering for Wireless and Mobile Access, June 6th, 2010, Indianapolis, Indiana, USA (in conjunction with SIGMOD/PODS 2010)
eConf C030908 (2003) MOBT002
with J. Lundberg, J. Conrad, and A. Lopez
Abstract: A C++ class was written for the calculation of frequentist confidence intervals using the profile likelihood method. Seven combinations of Binomial, Gaussian, Poissonian and Binomial uncertainties are implemented. The package provides routines for the calculation of upper and lower limits, sensitivity and related properties. It also supports hypothesis tests which take uncertainties into account. It can be used in compiled C++ code, in Python or interactively via the ROOT analysis framework.
DOI: 10.1016/j.cpc.2009.11.001
[13] A Test for the Presence of a Signal, with Multiple Channels and Marked Poisson
with A. Lopez
Abstract: We describe a statistical hypothesis test for the presence of a signal based on the likelihood ratio statistic. We derive the test for several cases of interest and also show that for those cases the test works very well, even far out in the tails of the distribution. We also study extensions of the test to cases where there are multiple channels.
[12] Limits and Confidence Intervals in the Presence of Nuisance Parameters
with A. Lopez and J. Conrad
Abstract: We study the frequentist properties of confidence intervals computed by the method known to statisticians as the Profile Likelihood. It is seen that the coverage of these intervals is surprisingly good over a wide range of possible parameter values for important classes of problems, in particular whenever there are additional nuisance parameters with statistical or systematic errors.
Published in Nuclear Instruments and Methods in Physics Research A, 551/2-3, 2005, pp. 493-503
DOI: 10.1016/j.nima.2005.05.068
The routines to carry out the calculations are available here.
with A. Lopez
Abstract: We describe a statistical hypothesis test for the presence of a signal. The test allows the researcher to fix the signal location and/or width a priori, or perform a search to find the signal region that maximizes the signal. The background rate and/or distribution can be known or might be estimated from the data. Cuts can be used to bring out the signal.
Published in Proceedings of PHYSTAT2003: Statistical Problems in Particle Physics, Astrophysics and Cosmology, SLAC, p41-44.
[10] Search for Rare and Forbidden 3-body Di-muon Decays of the Charmed Mesons D+ and D+s
A high energy physics paper using the analysis tools developed in [1], [2], [3] and [5]
DOI: 10.1016/j.physletb.2003.07.079
with Andreas Buja.
Abstract: We describe and illustrate a simple Monte Carlo technique for carrying out simultaneous inference with arbitrarily many statistics. Special cases of the technique have appeared in the literature, but there exists widespread unawareness of the simplicity and broad applicability of this solution to simultaneous inference. Simultaneous inference for multiple statistics gives the appearance of an ill-posed search problem because it is not clear how to choose among the too many possibilities of simultaneous coverage regions. The problem can, however, be simplified by restricting the search to a one-parameter family of nested regions and selecting the region whose estimated coverage probability equals the desired value. Natural one-parameter families are readily available. The technique applies whenever inference is based on a single distribution. A nonexhaustive list of examples of such distributions is: 1) fixed distributions such as standard normals when diagnosing distributional assumptions, 2) conditional null distributions in exact tests with Neyman structure, in particular permutation tests, 3) bootstrap distributions for bootstrap confidence regions, 4) Bayesian posterior distributions for high-dimensional posterior probability regions, or 5) predictive distributions for multiple prediction intervals.
Technical report, Wharton School of Business
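The calibration step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not code from the technical report: the nested one-parameter family is taken to be rank-based boxes (in each coordinate, the interval between the k-th smallest and k-th largest simulated value), and the function name `simultaneous_band` is hypothetical. The sketch picks the tightest box whose Monte Carlo coverage is still at least the desired level.

```python
import random

def simultaneous_band(draws, level=0.95):
    """Return (band, coverage) for Monte Carlo draws, where each draw is a
    vector of m statistics and band is a list of m (low, high) intervals."""
    B, m = len(draws), len(draws[0])
    columns = [sorted(d[j] for d in draws) for j in range(m)]

    # Depth of a draw: its smallest two-sided rank over all coordinates.
    # A draw lies inside the box with parameter k exactly when depth >= k.
    depth = [B] * B
    for j in range(m):
        order = sorted(range(B), key=lambda b: draws[b][j])
        for rank, b in enumerate(order):
            depth[b] = min(depth[b], rank, B - 1 - rank)

    # Shrink the box (increase k) while the estimated coverage stays >= level.
    k = 0
    while k + 1 <= (B - 1) // 2 and sum(d >= k + 1 for d in depth) / B >= level:
        k += 1

    band = [(col[k], col[B - 1 - k]) for col in columns]
    coverage = sum(d >= k for d in depth) / B
    return band, coverage

if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed example: 5 independent standard normal statistics per draw.
    draws = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(1000)]
    band, coverage = simultaneous_band(draws)
    print(coverage)
    for low, high in band:
        print(round(low, 2), round(high, 2))
```

The same function applies to any of the listed settings (permutation, bootstrap, posterior or predictive draws), since it only needs the simulated vectors and never the distribution itself.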
[8] A Glossary of Selected Statistical Terms
with Harrison Prosper and Jim Linneman
Abstract: This glossary brings together some statistical concepts that physicists may happen upon in the course of their work. The aim is not absolute mathematical precision—few physicists would tolerate such a burden. Instead, (one hopes) there is just enough precision to be clear. We begin with an introduction and a list of notations. We hope this will make the glossary, which is in alphabetical order, somewhat easier to read.
Published in Proceedings Of The Conference On: Advanced Statistical Techniques in Particle Physics, Institute for Particle Physics Phenomenology, University of Durham, UK (2002), 314-330
[7] Bias-Corrected Confidence Intervals for Rare Searches
with A. Lopez
Abstract: A short version of [3].
Published in Proceedings Of The Conference On: Advanced Statistical Techniques in Particle Physics, Institute for Particle Physics Phenomenology, University of Durham, UK (2002), 44-48
[6] Statistical Analysis of the SELEX Double Charm Signals
with A. Lopez
Abstract: A discussion of the statistical significance of some discoveries claimed by the SELEX collaboration.
[5] Correcting the Minimization Bias in Searches for Small Signals
with A. Lopez
Abstract: We discuss a method for correcting the bias in the limits for small signals if those limits were found based on cuts that were chosen by minimizing a criterion such as sensitivity. This type of bias is commonly present when a “minimization” and an “evaluation” are done at the same time. We propose to use a variant of the statistical bootstrap to adjust the limits. A Monte Carlo study shows that these new limits have correct coverage.
Published in Nuclear Instruments and Methods in Physics Research A, vol 503/3, 2003, pp 617 - 624
DOI: 10.1016/S0168-9002(03)00428-5
[4] Setting Limits for Poisson Rates in the Presence of Noise
Abstract: A short version of [3].
Published in Proceedings of SIDIM 2000
[3] Confidence Intervals and Upper Bounds for Small Signals in the Presence of Background Noise
with A. Lopez
Abstract: We discuss a new method for setting limits on small signals in the presence of background noise. The method is based on a combination of a two dimensional confidence region and the large sample approximation to the likelihood ratio test statistic. It automatically quotes upper limits for small signals and two-sided confidence intervals for larger samples. We show that this method gives the correct coverage and also has good power.
Published in Nuclear Instruments and Methods in Physics Research A, V.458, 2001, 745-758
DOI: 10.1016/S0168-9002(00)00935-9
[2] Stock Abundance and Potential Yield of the Queen Conch Resource in Belize
with R. Appeldoorn
Report to the CARICOM Fisheries Research Assessment and Management Program, 1997
[1] Continuous-time Markov Processes in Geology
Journal of Mathematical Geology, Vol 23, # 3, April 1991
DOI: 10.1007/BF02065784
PhD Thesis 1992 University of Southern California, Los Angeles
Thesis Advisor: Dr. Josef Watkins