A simulation study shows that the score test and the Wald test in Cox regression inflate the type I error rate in randomized clinical trials with unequal allocation.

Keywords: Unequal randomization allocation; Cox regression; Log rank test; Wald test; Score test; Type I error rate

Introduction

Randomization with a fixed unequal allocation probability is commonly used in clinical trials. For example, a 1:2 (control vs. experimental) ratio may be used if early non-randomized data suggest that the experimental treatment is superior to the control treatment. Giving patients a greater than 50% chance of receiving the new therapy may facilitate recruitment and provides more information about the experimental treatment. While this approach has obvious advantages, its potential issues deserve due attention; they have been discussed from many angles, such as ethics, power, and sensitivity. We will look at this issue from the perspective of type I error rate control when Cox regression is used to analyze the data.

The log rank test is commonly used to compare treatments with respect to time-to-event endpoints such as progression-free survival or overall survival. When the proportional hazards assumption holds, the log rank test is equivalent to the score test in Cox regression [1,2]. Alternatively, the comparison can be carried out with the Wald test; the two tests generally give consistent results [3]. However, their finite-sample properties may be difficult to derive because of censoring. Cox regression is a maximum partial likelihood approach, and there are many examples of biased finite-sample maximum likelihood estimators, such as the variance estimator of a normal sample [4]. The bias of the Cox regression estimator in finite samples has been noted by Langner et al. [5], who studied the relationship between the bias and the baseline risk. In this paper, we explore the relationship between the randomization ratio and the type I error rate in Cox regression, and we use a simulation to demonstrate it.
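The cited example of a biased finite-sample MLE is easy to reproduce: for a normal sample of size n, the maximum likelihood variance estimator divides by n and has expectation (n−1)/n·σ². The following quick check is our own illustrative sketch, not part of the paper's program:

```python
import numpy as np

# Bias of the normal-sample variance MLE: E[sigma2_hat_MLE] = (n-1)/n * sigma^2.
rng = np.random.default_rng(0)
n, reps = 5, 200_000
samples = rng.normal(loc=0.0, scale=1.0, size=(reps, n))  # true sigma^2 = 1
mle_mean = samples.var(axis=1, ddof=0).mean()       # divides by n (the MLE)
unbiased_mean = samples.var(axis=1, ddof=1).mean()  # divides by n - 1
print(mle_mean, unbiased_mean)  # MLE averages near (n-1)/n = 0.8
```

With n = 5, the MLE underestimates the true variance by about 20% on average, while the n−1 divisor removes the bias.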

Simulations

In the simulation, we consider an oncology trial with two arms: an experimental arm and a control arm. Both arms have a median progression-free survival (PFS) of 5 months. For simplicity, we assume all patients enter the study at the same time and are then randomly assigned to one of the two treatments with a fixed allocation probability. All patients are followed until disease progression or death, with no early dropouts. The primary objective is to determine whether the experimental treatment prolongs PFS compared to the control treatment. The primary analysis is a one-sided log rank test with a type I error rate of 0.025. The simulation program is provided in the Appendix.
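For exponential event times, a median PFS of 5 months corresponds to a rate of ln(2)/5 per month, since the exponential median equals ln(2)/rate. A minimal sketch of this data-generation step (the function name is ours, not from the Appendix):

```python
import numpy as np

def simulate_pfs(n, median_months, rng):
    """Draw n exponential PFS times (months) with the given median."""
    rate = np.log(2) / median_months  # exponential median = ln(2) / rate
    return rng.exponential(scale=1.0 / rate, size=n)

rng = np.random.default_rng(1)
times = simulate_pfs(100_000, 5.0, rng)
print(np.median(times))  # close to 5
```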

The simulation is done based on the following algorithm.
a. Data are randomly generated from exponential distributions with the known median PFS for both arms.
b. The two arms are compared via Cox regression; log(HR), its standard error, and rejection or not by the one-sided log rank test and the Wald test are recorded.
c. Steps a and b are repeated 100,000 times, and log(HR), the standard error, and the type I error rate are estimated by averaging the simulation results (Table 1).
Notes for Table 1:
i. Estimated from the simulation.
ii. Calculated as 1/√[r(1−r)d], where r = (number of patients in the control arm)/(total number of patients) and d = total number of events.
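Because every patient in this design is followed to an event, the score/log rank test of step b can be sketched without a survival library. Below is a minimal pure-NumPy log rank Z statistic for the no-censoring, no-ties case; it is an illustrative sketch (our own function, not the Appendix program):

```python
import numpy as np

def logrank_z(times_control, times_exp):
    """Two-sample log rank Z statistic (no censoring, no tied times).
    Positive Z = control has more early events than expected, i.e.
    evidence that the experimental arm prolongs survival."""
    times = np.concatenate([times_control, times_exp])
    # 1 = control, 0 = experimental, walked in event-time order
    group = np.concatenate([np.ones(len(times_control)),
                            np.zeros(len(times_exp))])[np.argsort(times)]
    n1 = float(len(times_control))  # controls still at risk
    at_risk = float(len(times))     # all patients still at risk
    U = V = 0.0
    for g in group:
        U += g - n1 / at_risk                    # observed - expected
        V += n1 * (at_risk - n1) / at_risk**2    # hypergeometric variance
        n1 -= g
        at_risk -= 1.0
    return U / np.sqrt(V)
```

For example, with a single patient per arm, control time 1 and experimental time 2, the statistic works out to exactly 1.0. With no ties, this score statistic coincides with the score test for the treatment indicator in Cox regression.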

#Patients (Events)    Random  Log(HR)              Log(HR)                 Type I Error     Type I Error
(Control vs Exp Arm)  Ratio   (Standard Error)[i]  Standard Deviation[ii]  Rate, Log rank   Rate, Wald
300:300               1:1     -8.4e-5 (0.0820)     0.0816                  0.02553          0.02541
240:360               2:3     -1.6e-5 (0.0837)     0.0833                  0.02553          0.02543
200:400               1:2     -0.0011 (0.0870)     0.0866                  0.02681          0.02664
150:450               1:3     -0.0023 (0.0947)     0.0943                  0.02832          0.02816
400:800               1:2     -0.0007 (0.0614)     0.0612                  0.02669          0.02664
800:1600              1:2     -0.0004 (0.0433)     0.0433                  0.02594          0.02591
2000:4000             1:2     -0.0001 (0.0274)     0.0274                  0.02516          0.02514

Table 1: Summary of the simulation results.
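The large-sample approximation 1/√[r(1−r)d] for the standard deviation of log(HR), with r the control fraction and d the total number of events, can be checked numerically against the table's second log(HR) column (our own verification, using the table's settings):

```python
import math

def approx_sd_log_hr(r, d):
    """Large-sample SD of log(HR): 1/sqrt(r*(1-r)*d),
    where r = control fraction and d = total number of events."""
    return 1.0 / math.sqrt(r * (1.0 - r) * d)

# (control fraction, total events) for each row of Table 1
rows = [(0.5, 600), (0.4, 600), (1/3, 600), (0.25, 600),
        (1/3, 1200), (1/3, 2400), (1/3, 6000)]
sds = [round(approx_sd_log_hr(r, d), 4) for r, d in rows]
print(sds)  # [0.0816, 0.0833, 0.0866, 0.0943, 0.0612, 0.0433, 0.0274]
```

The formula reproduces every tabulated standard deviation, confirming that the extra variability under unequal allocation comes from the smaller value of r(1−r) when r moves away from 1/2.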

In terms of the type I error rate, the log rank test and the Wald test give similar results. When the randomization ratio is 1:1 with 600 events, no inflation of the type I error rate is observed. However, when the randomization ratio is 1:2 with 600 events, the type I error rate is inflated to about 0.027. The inflation diminishes as the sample size increases, but it becomes negligible only when the sample size is impractically large. When the randomization is further imbalanced to 1:3 with 600 events, the type I error rate increases further, to about 0.028.