Estimating statistical power for within-cluster randomized studies

Part I: Binary Outcomes

This statistical program was developed to solve a frequent problem in the design of trials where there is a binary outcome (for example, whether children improve their health status or not) and where the data are clustered (that is, not independent of each other, as when different groups of the children receive their medical care from the same site or practitioner). In designing this sort of study, it is very difficult to estimate the required sample size. Continuous outcomes (for example, how much the children’s function score improved) may have a normal distribution, and their correlation structure can be easily specified and incorporated into calculations. For clustered binary outcomes, by contrast, no obvious mathematical formula could be derived to calculate design power while taking both the randomization and the correlation structure into account.

Ms. Yi Lu, a doctoral student at Johns Hopkins, working with Professor Mei-Cheng Wang of the Department of Biostatistics, has applied the method proposed by Lunn and Davies (1998) to simulate correlated binary data for a within-center randomization design. They developed a program, which runs in the statistical software “R,” that estimates power given a sample size, intracluster correlation, and outcome rates in the intervention and control groups. In practice, investigators can identify suitable sample sizes and the corresponding power for their specific study by trying different combinations of sample size, intracluster correlation, and outcome rates.
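To make the simulation idea concrete, here is a minimal sketch in Python (not the R program itself, whose internals are not described here). It uses the exchangeable construction from Lunn and Davies (1998): each subject copies a shared cluster-level Bernoulli draw with probability equal to the square root of the intracluster correlation, and otherwise receives an independent draw. Everything else is an assumption made for illustration: the function names `correlated_binary` and `simulate_power` are hypothetical, the two arms within a cluster are linked through a single shared uniform draw, and the analysis is a simple normal-approximation test on the within-cluster differences in outcome proportions. The actual program may link the arms differently and use a different test.

```python
import random
from statistics import NormalDist, stdev


def correlated_binary(m, p, rho, shared_u, rng):
    """Draw m Bernoulli(p) outcomes with pairwise correlation rho.

    Each subject copies the shared cluster-level outcome with probability
    sqrt(rho); otherwise it gets its own independent Bernoulli(p) draw.
    """
    gamma = rho ** 0.5
    z = 1 if shared_u < p else 0  # shared cluster-level Bernoulli(p) outcome
    return [
        z if rng.random() < gamma else int(rng.random() < p)
        for _ in range(m)
    ]


def simulate_power(n_clusters, m_per_arm, p_control, p_treat, rho,
                   reps=1000, alpha=0.05, seed=None):
    """Estimate power as the fraction of simulated trials that reject H0.

    Assumed analysis (illustrative only): a two-sided z-test on the mean
    within-cluster difference in outcome proportions. Needs n_clusters >= 2.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        diffs = []
        for _ in range(n_clusters):
            u = rng.random()  # assumption: one uniform per cluster links both arms
            treat = correlated_binary(m_per_arm, p_treat, rho, u, rng)
            ctrl = correlated_binary(m_per_arm, p_control, rho, u, rng)
            diffs.append(sum(treat) / m_per_arm - sum(ctrl) / m_per_arm)
        mean_diff = sum(diffs) / n_clusters
        sd = stdev(diffs)
        if sd == 0:
            reject = mean_diff != 0  # degenerate case: all clusters identical
        else:
            reject = abs(mean_diff / (sd / n_clusters ** 0.5)) > z_crit
        rejections += int(reject)
    return rejections / reps


if __name__ == "__main__":
    # Try a few candidate designs, as one would when planning a study.
    for n_clusters in (10, 20, 40):
        power = simulate_power(n_clusters, m_per_arm=10, p_control=0.30,
                               p_treat=0.45, rho=0.05, reps=500, seed=1)
        print(f"clusters={n_clusters:>3}  estimated power={power:.3f}")
```

Because the normal approximation can be slightly liberal when the number of clusters is small, a sketch like this is better suited to comparing candidate designs than to fixing a final sample size.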

Downloadable versions of the program and instructions are available. As an orientation to using the program, it is important to remember that because the results are based on simulations, two runs using identical parameters may yield slightly different power estimates. A practical approach is to begin with a relatively small number of replications (for example, 1000, which is the default value as the program is supplied) to see how fast the program runs. The number of replications can then be increased to get more stable estimates once the likely combination of sample size, correlation, and outcome probabilities has been determined.
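The run-to-run variation described above can be quantified. A simulated power estimate is a proportion of rejections across independent replications, so its standard error follows the usual binomial formula. The short Python sketch below illustrates this; the function names `power_standard_error` and `reps_for_margin` are hypothetical helpers, not part of the program, and the 1.96 multiplier assumes an approximate 95% margin.

```python
import math


def power_standard_error(power, reps):
    """Binomial standard error of a power estimate from `reps` replications."""
    return math.sqrt(power * (1.0 - power) / reps)


def reps_for_margin(power, margin, z=1.96):
    """Smallest number of replications for which z * SE is at most `margin`."""
    return math.ceil(power * (1.0 - power) * (z / margin) ** 2)


if __name__ == "__main__":
    for reps in (1000, 10000):
        se = power_standard_error(0.80, reps)
        print(f"reps={reps:>6}: standard error about {se:.4f}")
```

For a true power near 0.80, 1000 replications give a standard error of about 0.013, so two runs can plausibly differ by a few percentage points; 10,000 replications reduce the standard error to 0.004.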

“R” is available at no charge and can be used on many different computer platforms. Ms. Lu’s program can be used with minimal knowledge of R itself. Once R is installed, the only other specialized knowledge required is how to tell R in which directory it can find the estimation program.

  • Lunn, A.D. and Davies, S.J. 1998. A note on generating correlated binary variables, Biometrika, 85, 487-490.

Part II: Continuous Outcomes

Downloadable versions of the program and instructions are available for addressing the same issues described above, but for continuous outcomes.