How an Algorithm Is Helping Ovarian Cancer Research
Advancements in ovarian cancer research are helping data scientists study cancer patients more accurately. A team of researchers at the MD Anderson Cancer Center in Houston recently created a five-step algorithm to help identify newly diagnosed epithelial ovarian cancer cases in insurance claims data.
However, the algorithm is not intended to detect ovarian cancer in patients the way a screening test does, said Amy Roskin, M.D., a board-certified OB-GYN and chief medical officer at Seven Starling, an online care center for postpartum mental health headquartered in New York City.
"The algorithm in the study is for identifying patients who were found to have incident ovarian cancer based on attributes of submitted claims data," Roskin said.
She explained that the researchers are using the algorithm for epidemiological purposes, to distinguish incidence (new cases) from prevalence (new and preexisting cases).
What does that mean? Let's take a step back to the beginning and dig into why clinicians did the research in the first place.
The background behind the study
Larissa Meyer, M.D., an OB-GYN in Houston, was one of the study's researchers. She is also an associate professor of gynecologic oncology who provides surgical care and chemotherapy to women with gynecologic cancers.
Meyer explained that one of her areas of study is researching outcomes from real-world evidence.
"If you think about how we generate knowledge for cancer care, a lot of it comes from clinical trials," Meyer said. "But clinical trials limit the type of patients who can [qualify], usually only including the healthiest patients."
She said there's good evidence to suggest that many clinical trials are not representative of all women who have cancer.
"So many women who are older or from ethnic minorities may not necessarily have the same representation in clinical trials," Meyer explained.
"I try to generate data about 'real-world outcomes' or things happening in 'real-world practice' as opposed to the highly controlled situations in clinical trials," she said.
Researchers often generate real-world evidence from administrative or claims-based databases, such as insurance billing records.
"What I was trying to do is figure out a way to create an algorithm where I could have a sense of security regarding how accurate we were at actually identifying women with newly diagnosed ovarian cancer from these large administrative data sets," Meyer said.
Creating an algorithm
For this study, the research team used a combination of two data sets:
- The National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) Program, which provides information on cancer statistics
- Medicare, the federal health insurance program primarily for people who are 65 or older
The team then created an algorithm with five simple steps that could help identify women with newly diagnosed ovarian cancer from claims data alone. The team then tested how the algorithm performed across several groups of patients, including women without cancer, women with other cancers and women with newly diagnosed ovarian cancer, Meyer said.
"We were able to create an algorithm that could give other researchers reassurance that they were studying who they wanted to study, in this scenario, women with newly diagnosed ovarian cancer," Meyer said.
Researchers can apply the algorithm to a claims-based data set that lacks pathology, stage or grade information and still identify women with newly diagnosed ovarian cancer more accurately than they could with a single diagnosis code.
When put to the test, the algorithm had an overall sensitivity of 89.9 percent, meaning it caught about nine in 10 women who truly had newly diagnosed ovarian cancer. Its positive predictive value, the share of women it flagged who actually had a new diagnosis, was 93.8 percent. Its specificity and negative predictive value were both above 99.9 percent.
"Meaning, if you didn't have ovarian cancer, the algorithm didn't mistakenly put you in the group of women with ovarian cancer," Meyer explained.
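To make these four metrics concrete, here is a minimal Python sketch of how they are computed once an algorithm's output has been compared against a gold-standard source and tallied into four counts. The class and function names are hypothetical, and the counts are invented purely to illustrate the arithmetic; they are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical tallies from comparing algorithm labels with a gold standard."""
    true_pos: int   # flagged by the algorithm and confirmed as new cases
    false_pos: int  # flagged by the algorithm but not confirmed
    false_neg: int  # confirmed new cases the algorithm missed
    true_neg: int   # neither flagged nor confirmed


def validation_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the four standard validation metrics as proportions (0 to 1)."""
    return {
        # Share of true new cases that the algorithm caught.
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # Share of women without a new diagnosis whom the algorithm correctly left out.
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        # Share of flagged women who truly had a new diagnosis.
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        # Share of unflagged women who truly did not have a new diagnosis.
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Invented counts for illustration only; not taken from the study.
counts = ConfusionCounts(true_pos=90, false_pos=6, false_neg=10, true_neg=99_894)
for name, value in validation_metrics(counts).items():
    print(f"{name}: {value:.2%}")
```

With these made-up counts, only six of nearly 100,000 women without a new diagnosis are wrongly flagged, so specificity stays above 99.9 percent even though about 6 percent of the 96 flagged women are false positives. When women without the disease vastly outnumber those with it, specificity and positive predictive value can differ noticeably.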
Most women with ovarian cancer are diagnosed with advanced disease. The algorithm performed better when women had stage III or stage IV cancer, Meyer said.
"We were able to select 93.5 percent of women who had advanced disease versus around 84 percent of women with early stage, such as stage I or II disease," she said.
In other words, researchers who use this algorithm on claims-based data sets should miss relatively few women with newly diagnosed ovarian cancer, although early-stage cases are more likely to be missed.
The purpose of this research
The algorithm developed in this study is a tool built for other researchers, especially those who use administrative claims databases to conduct ovarian cancer research.
Meyer said some people who do this type of research are not clinicians; they may be epidemiologists or other researchers with doctoral degrees who lack a nuanced clinical background. An algorithm like this one builds clinical knowledge into cohort selection, helping researchers identify more accurate cohorts than they would by relying on a single diagnosis code.
When reading real-world evidence or trying to figure out how well a certain intervention works in the real world as opposed to a highly selective trial, researchers can feel a certain degree of confidence if they're using the algorithm to select the patient cohort to study, Meyer explained.
"As a researcher, you can have more confidence that you have an accurate cohort to study," she said. "This type of study is pretty removed from a patient or family member, but it guides good quality research."
What's next?
Meyer said the algorithm does have limitations. The study's "gold standard" data came from patients older than 65, but the median age at diagnosis of ovarian cancer is 63.
"Which means we don't actually know how well it performs in a younger patient population," she said.
A next step for this research would be to test how well the algorithm performs on data from younger patients.
"We will try to do this in a multi-institutional data set or in our own data to see how well it performs against younger patients," Meyer said. "But it would be great to see how this works on large data sets from countries that have data with younger patients linked to billing data."
Essentially, this new algorithm helps researchers have more confidence in the patient cohorts they draw from administrative databases.
"Because if you don't have good quality in terms of getting that original study cohort together, then you can never be as confident about your findings," Meyer said.