Genetic factors associated with reasons for clinical trial stoppage
Many drug discovery projects are started but few progress fully through clinical trials to approval. Previous work has shown that human genetics support for the therapeutic hypothesis increases the chance of trial progression. Here, we applied natural language processing to classify the free-text reasons for 28,561 clinical trials that stopped before their endpoints were met. We then evaluated these classes in light of the underlying evidence for the therapeutic hypothesis and target properties. We found that trials are more likely to stop because of a lack of efficacy in the absence of strong genetic evidence from human populations or genetically modified animal models. Furthermore, certain trials are more likely to stop for safety reasons if the drug target gene is highly constrained in human populations and if the gene is broadly expressed across tissues. These results support the growing use of human genetics to evaluate targets for drug discovery programs.
The drug discovery endeavor is dominated by high attrition rates, and failure remains the most likely outcome throughout the pipeline. A diverse set of factors can lead to failure, with lack of efficacy or unforeseen safety issues reportedly explaining 79% of setbacks in the clinic. New approaches adopted across the industry have aimed to improve success rates by systematically assessing the available evidence throughout the research and clinical pipelines. Support from human genetic evidence has been repeatedly associated with successful clinical trial progression, ultimately supporting two-thirds of the drugs approved by the US Food and Drug Administration (FDA) in 2021 (ref. 9). Further understanding of the reasons for success or failure in clinical trials could help to reduce future attrition.
Systematically assessing the reasons for success or failure in clinical trials can be hampered by many factors. Several surveys have demonstrated a bias towards reporting positive results, with 78.3% of trials in the literature reporting successful outcomes. Successful clinical trials are published significantly faster than trials reporting negative results. However, access to negative results is crucial, not only for revealing efficacy tendencies and safety liabilities but also for retrospective review and benchmarking of predictive methods, including machine learning.
Since 2007, the FDA has required the submission of clinical trial results to ClinicalTrials.gov, a free-to-access global databank aimed at registering clinical research studies and their results. For trials halted before their scheduled endpoint, ClinicalTrials.gov provides a free-text reason for the termination, suspension or withdrawal. A team of researchers previously classified the reasons for 3,125 stopped trials and found that only 10.8% of trials stopped because of a clear negative outcome. By contrast, the majority (54.5%) fell into a set of reasons characterized as neutral in relation to the therapeutic hypothesis, such as patient recruitment or other business or administrative reasons.
Here, we extended that work by training a natural language processing (NLP) model to classify stopping reasons and used this model to classify 28,561 stopped trials. We integrated our classification with evidence associating the drug target and disease from the Open Targets Platform, revealing that trials stopped for lack of efficacy or safety reasons were less supported by genetic evidence. Furthermore, oncology trials involving drugs for which the target gene is constrained in human populations were more likely to stop for safety reasons, whereas drugs with targets with tissue-selective expression were less likely to pose safety risks. These observations confirm and extend previous studies recognizing the value of genetic information and selective expression in target selection.
To catalog the reasons behind the withdrawal, termination or suspension of clinical studies, we classified every free-text reason submitted to ClinicalTrials.gov using an NLP classifier. To build a training set for our model, we revisited the manual classification of 3,124 stopped trials reported in a previous publication, which was based on the submissions available in ClinicalTrials.gov in May 2010. The authors of that article assigned every study a maximum of three classes following an ontological structure (Supplementary Table). Each class was also assigned a higher-level category representing the outcome implications for the clinical project. For example, 33.7% of the studies were classified as stopped owing to ‘insufficient enrollment’, a neutral outcome because it is expected to be independent of the therapeutic hypothesis. When inspecting submitted reasons belonging to the same curated category, we observed a strong linguistic similarity, as revealed by clustering the cosine similarity of the sentence embeddings (Extended Data). Studies stopped because of reasons linked to lack of efficacy and studies stopped because of futility have a linguistic similarity of 0.98, with both classes manually classified as ‘negative’ outcomes. Based on this clustering, we redefined the classification by merging semantically similar classes represented by low numbers of annotated sentences. Moreover, we added 447 studies that were stopped as a result of the COVID-19 pandemic (Supplementary), resulting in a total of 3,571 studies manually classified into at least one of 17 stop reasons and explained by six different higher-level outcome categories.
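The similarity measure behind this clustering can be illustrated with a minimal sketch. It assumes that each class is summarized by the mean of its sentence embeddings before comparison, which is an illustrative choice and not necessarily the procedure used in the study; the three-dimensional vectors and class labels below are hypothetical stand-ins for real sentence embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_embedding(vectors):
    """Element-wise mean of a list of vectors (a simple class centroid)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical toy embeddings; real sentence embeddings have hundreds of dimensions.
toy_embeddings = {
    "lack_of_efficacy": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "futility": [[0.85, 0.15, 0.2], [0.9, 0.1, 0.1]],
    "insufficient_enrollment": [[0.1, 0.9, 0.3], [0.2, 0.8, 0.4]],
}

# Assumption: one centroid per class, compared pairwise.
centroids = {label: mean_embedding(vecs) for label, vecs in toy_embeddings.items()}
sim_negative = cosine_similarity(centroids["lack_of_efficacy"], centroids["futility"])
sim_mixed = cosine_similarity(
    centroids["lack_of_efficacy"], centroids["insufficient_enrollment"]
)
```

In this toy setting, the two ‘negative’ classes are far more similar to each other than either is to ‘insufficient enrollment’, which is the kind of pattern that would motivate merging semantically close classes.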
By leveraging the consistent language used by the submitters, we fine-tuned the BERT model for the task of clinical trial classification into stop reasons. Overall, the model showed strong predictive power in the cross-validated set (Fmicro = 0.91), performing strongly for the most frequent classes, such as ‘insufficient enrollment’ (F = 0.98) or ‘COVID-19’ (F = 1.00), but demonstrating decreased performance on linguistically complex reasons, such as trials stopped because of another study (F = 0.71) (Supplementary).
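For readers unfamiliar with the micro-averaged F score, the following sketch shows how it can be computed for a multi-label task in which each study may carry several stop reasons. The function name, the class labels and the four toy studies are hypothetical; the sketch illustrates the metric only and is not the evaluation code used in the study.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1: pool true/false positives and false negatives
    over all studies and classes before computing precision and recall."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both inputs must describe the same studies")
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not annotated
        fn += len(truth - pred)   # labels annotated but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for four studies (one set of labels per study).
truth = [
    {"insufficient_enrollment"},
    {"covid19"},
    {"negative", "safety"},
    {"another_study"},
]
predicted = [
    {"insufficient_enrollment"},
    {"covid19"},
    {"negative"},
    {"business"},
]
score = micro_f1(truth, predicted)
```

Because counts are pooled across classes, frequent classes such as ‘insufficient enrollment’ dominate the micro-averaged score, so per-class F values remain informative for rarer reasons.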
To further evaluate the model, we manually curated an additional set of 1,675 stop reasons from randomly selected studies that were not included in the training set. Overall, the performance against the unseen data was lower but comparable to that of the cross-validated model (Fmicro ranging from 0.70 to 0.83 depending on the choice of the annotator) (Supplementary), demonstrating real-world performance and reduced risk of overfitting. Interestingly, the curators demonstrated a relatively low agreement for many classes in which the machine-learning model also showed relatively weak performance, such as studies stopped because of insufficient data or met endpoint.
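Agreement between curators can be quantified in several ways. As one illustration, the sketch below computes Cohen's kappa for two annotators; it assumes a single label per study, which simplifies the multi-label setting described here, and the text does not state that this statistic was the one used. The labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators assigning one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six stop reasons by two curators.
curator_1 = ["enrollment", "enrollment", "data", "endpoint", "data", "enrollment"]
curator_2 = ["enrollment", "enrollment", "endpoint", "endpoint", "data", "data"]
kappa = cohens_kappa(curator_1, curator_2)
```

A kappa near 1 indicates agreement well beyond chance, whereas values near 0 indicate agreement no better than expected from the label frequencies alone.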
Reasons reflect operational, clinical and biological constraints
Classification of the 28,561 stopped trials submitted to ClinicalTrials.gov before 27 November 2021 was performed using our NLP model fine-tuned on all the manually curated sentences. In total, 99% of the trials were classified with at least one of the 17 potential reasons and mapped to one of six different higher-level outcomes. ‘Insufficient enrollment’ remained the most common reason to stop a trial (36.67%), with other reasons before the accrual of any study results also occurring in a large number of studies. A total of 977 trials (3.38%) were classified as stopped because of ‘safety or side effects’, and 2,197 studies (7.6%) were stopped because of ‘negative’ reasons, such as those questioning the efficacy or value (futility). The incidence of each stop reason reflects the purpose of each phase (
Extended Data). Studies stopped because of ‘negative’ outcomes more often impacted phase II (odds ratio (OR) = 1.9, P = 2.4 × 10^-38) and phase III (OR = 2.6, P = 3.64 × 10^-55), whereas studies stopped as a result of ‘safety or side effects’ declined in relative incidence after phase I (OR = 2.4, P = 9.63 × 10^-23) (Supplementary). Trials stopped because of the relocation of the study or key staff occurred more than twice as often during early phase I, highlighting the importance of good clinical practices during the foundational stages. Of the studies that provided a stop reason, 48% were indicated for oncology. This large proportion is likely to be the combined result of the specific weight of oncology indications in the aggregated portfolio (27% of drug approvals in 2022) and the reported large incidence of clinical failures in oncology (32%) compared to other indications.
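The odds ratios and P values reported for each phase summarize 2 × 2 contingency tables (for example, stopped for a ‘negative’ reason or not, against phase II or any other phase). The sketch below shows one way to compute such statistics. The use of a two-sided Fisher's exact test is an assumption, because the text above does not name the test, and the counts are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2 x 2 table [[a, b], [c, d]]."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities
    of all tables with the same margins that are no more likely than the
    observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Hypothetical counts: rows = stopped for a 'negative' reason (yes / no),
# columns = phase of interest (yes / no).
a, b, c, d = 3, 1, 1, 3
example_or = odds_ratio(a, b, c, d)
example_p = fisher_exact_two_sided(a, b, c, d)
```

With only eight hypothetical trials the association is far from significant despite a large odds ratio, which illustrates why the very small P values reported above depend on the thousands of trials available for each comparison.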