Working Life: computational biologist Dr Shamith Samarajiwa
Dr Shamith Samarajiwa’s computational biology group is the newest team at the MRC Cancer Unit. His group develops multi-disciplinary data science, data engineering and computational biology solutions to understand the complex biological systems involved in carcinogenesis.
Career in brief
- PhD in Molecular Immunology and Computational Genomics
- Established the first bioinformatics group at the Monash Institute
- Six years at the Cancer Research UK Cambridge Institute
- Established a new computational biology group at the MRC Cancer Unit
This is an exciting time to be dealing with biomedical data. In a world poised and waiting for personalised medicine, computational biology will help us to detect cancer sooner by realising the potential of big datasets. There are millions of datasets already out there but these are completely underutilised.
I’m surprised we don’t yet understand some of the fundamental aspects of carcinogenesis. Across the millions of datasets already available, and those that are being generated, we should have enough information to understand how these processes are regulated. The problem is that datasets are being mined at only the shallowest depth and much biological insight is unexplored or undiscovered.
It is so much easier to generate large datasets now than even just a few years ago, but we need more well-trained data scientists to join the field and help us understand these complex datasets.
I gained my early data analysis skills working in the computing and informatics industry but I was keen to work in science and took a role as an analytical chemist at Unilever before studying biomedicine at Monash University.
I found that my computing skills were fairly rare in the field and I would frequently get drawn into other projects. I had the opportunity to be involved in some of the early bacterial genome sequencing projects and one that involved sequencing the malaria parasite genome for making DNA vaccines.
During my PhD, my computing skills drew me into a project that involved analysing microarray gene expression data and I found myself working as a bioinformatician for a consortium of seven Australian universities, on top of my PhD project. We were looking for anti-inflammatory markers in chronic inflammatory diseases and had generated huge amounts of data from DNA microarrays.
The Monash Institute was one of the first research institutes in Melbourne to have its own microarray scanner, and as it had no bioinformaticians, I had to offer my services! This work eventually led me to form the first bioinformatics group at the Monash Institute once I had completed my PhD.
My group built bioinformatics resources to understand and analyse interferons, a group of immune proteins that act as the first line of defence against pathogens and are released in response to the presence of microbes and tumour cells.
I had been working with bioinformatics methods since they were in their infancy. I wanted to develop new approaches and deal with more complex problems. After a couple of years, I moved to the UK to work with Professor Simon Tavaré at the Cancer Research UK Cambridge Institute to do just that.
Being exposed to Bayesian statistics, machine learning, advanced computational biology, and large -omic datasets broadened the types of problems I could tackle.
I worked on computational methods to integrate prior knowledge and different cancer data sets to improve our ability to extract meaningful biology. This work allows us to build toolkits to tackle new problems as they arise.
From there, I moved to the MRC Cancer Unit to establish a new computational biology group.
I have a strong interest in studying immune and inflammatory responses, epigenetic changes and gene regulation in cancer development, and we are applying data science and computational biology methods to understand the complex systems involved in different aspects of carcinogenesis.
One area of our research is the p53 transcription factor and its target genes, which we know to be important in a number of carcinogenic processes. There are over 80,000 papers published on this protein and 5,000 more are released every year. We built software to trawl through all this information and identify relevant interactions which we can then feed back into our analysis of cancer datasets.
We generated a map of the p53 targets identified by our Rcade software, taking advantage of over 300 external high-throughput genomic and proteomic datasets. Our data revealed the importance of p53 to the integrity of entire gene networks.
Across my research and practice, a key factor for me is scientific reproducibility. This starts with encouraging experimentalists to involve statisticians in designing their experiments and follows right through to the computational analysis of data sets to ensure transparency and consistency in the use of software. I started this at Monash and we are implementing it now at the MRC Cancer Unit.
The outstanding research done at the MRC Cancer Unit and surrounding biomedical campus, combined with the opportunity to make use of my existing networks and collaborations in Cambridge, played a critical role in my decision to move to the unit.
In time, we would like to build computational methods and tools that extract meaningful biology from big biomedical datasets. This will allow us to generate hypotheses and make predictions to better understand the complex processes involved in carcinogenesis.
As told to Mary-Clare Cathcart and Sylvie Kruiniger