Prediction of anti-cancer drug activity by cell line sensitivity signatures and ensemble machine learning methods

Chemotherapy is one of the most widely used cancer treatments. Developing chemotherapeutic anti-cancer drugs is both slow and costly. Many compounds fail to become drugs even after rigorous and costly experiments, for reasons such as adverse effects or poor solubility. Some drugs are effective in only a subset of patients. Because some effects of a drug cannot be known precisely before treatment, the life expectancy of some cancer patients is shortened. Owing to technological and scientific developments, the data characterizing drugs and malignant cells grow every day. Most of these data come from cancer cell lines, which serve as a model for cancer. Recently, large-scale databases have been built using these cell lines. These databases include large-scale genomic data that characterize cell lines, such as DNA methylation, copy number variations and genome-wide expression profiles. In addition, activity data for a large number of drugs on a large number of cell lines can be downloaded from these databases. The availability of such large-scale data is an important motivation for applying machine learning algorithms to this field. Therefore, in this project, we will first try to identify which gene expression changes cause cancer cells to die, using a novel method that we propose. Then, we will apply machine learning methods that we consider suitable for the task but that have not yet been applied to this problem and these data. We will represent the relationship between drugs and cell lines by a novel signature that we call the cell line sensitivity signature. This signature will be computed in terms of the gene expression changes induced by drugs that are active on cell lines. We expect to make accurate predictions by comparing the cell line sensitivity signature with the genes that are affected by drugs. Then, we will apply ensemble models (or meta models) to produce a model that performs better than its base models.
Deep neural networks, which have not previously been used with these databases, are also among the base models. Finally, in joint work with Hacettepe University, we aim to evaluate the results in vitro and validate their usability (Funding: TUBITAK, Grant no. 115E274).
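As an illustration of the ensemble (meta model) idea mentioned above, the following is a minimal sketch of stacking. It is not the project's actual pipeline: the data are synthetic stand-ins for expression features and drug activity values, the two base learners (ridge regression and k-nearest neighbours) replace the base models named in the abstract, including the deep neural networks, and all function names are hypothetical. A meta model is trained on out-of-fold predictions of the base models, so that it learns how to combine them without seeing predictions made on their own training data.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression (intercept is also penalised, for brevity)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regression using squared Euclidean distance."""
    def predict(Z):
        d = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

# Hypothetical base learners; each maps (X, y) to a predict function.
BASE_LEARNERS = [
    lambda X, y: fit_ridge(X, y, alpha=1.0),
    lambda X, y: fit_knn(X, y, k=5),
]

def fit_stack(X, y, n_folds=5, seed=0):
    """Train base learners and a meta model on their out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.zeros((len(X), len(BASE_LEARNERS)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        for j, learner in enumerate(BASE_LEARNERS):
            oof[held_out, j] = learner(X[train], y[train])(X[held_out])
    meta = fit_ridge(oof, y, alpha=1e-3)               # meta model combines base outputs
    base = [learner(X, y) for learner in BASE_LEARNERS]  # refit on all training data
    return lambda Z: meta(np.column_stack([m(Z) for m in base]))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic data (assumption): 10 features standing in for expression values,
# target standing in for a drug activity measurement.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 10))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * np.sin(3.0 * X[:, 2]) + 0.1 * rng.normal(size=250)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

model = fit_stack(X_train, y_train)
preds = model(X_test)
stack_rmse = rmse(y_test, preds)
print("stacked RMSE:", stack_rmse)
for name, learner in zip(["ridge", "knn"], BASE_LEARNERS):
    print(name, "RMSE:", rmse(y_test, learner(X_train, y_train)(X_test)))
```

Whether the stacked model actually outperforms each base model depends on the data and on how complementary the base models are, so the comparison printed at the end should be read as a check rather than a guarantee.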