“I am grateful to have gained so much in such a short space of time.”
During my placement at UCL, I was fortunate enough to gain an insight into Computational Biology under the supervision of Jack Humphrey. Having been introduced to the functions of FUS and TDP-43, I soon learnt that, as with many proteins, changes in the expression of the genes that code for these RNA- and DNA-binding proteins can be argued to characterise many neurodegenerative diseases. In this case, the diseases looked at were ALS and Frontotemporal Dementia. The efficiency with which gene regulation could be analysed using programming software such as RStudio was apparent from the beginning, and this became my main method for manipulating sample data.
As practice for writing code, I made use of a publicly available differential expression dataset that compared gene expression between mouse brains treated with one of two antisense oligonucleotides (ASOs). One of the ASOs was a random sequence and the other was specific to the FUS transcript. I found the process of plotting the resulting graphs particularly complex, even though I am part of a generation that is said to be ‘tech-savvy’. Practising through an online coding course, as well as creating sample plots whilst at UCL, was enough to show me the depth of understanding required to make the most of one week, let alone a PhD or career. Nevertheless, after altering the R script several times I was able to see that the greater the log10(Base Mean) value, the closer the Log2FoldChange value was to zero, indicating a smaller change in expression, although the areas of clustering can suggest similarities between sub-sections of the data. The resulting plot is shown below.
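For readers unfamiliar with the two axes of that plot, here is a minimal sketch of how they can be calculated. It is written in Python rather than the R used during the placement, the gene names and counts are invented for illustration and are not from the dataset, and the simple averaging stands in for the more careful normalisation that differential expression software performs.

```python
import math

# Hypothetical normalised counts per gene: (control ASO, FUS-targeting ASO).
# These numbers are invented for illustration, not taken from the dataset.
counts = {
    "geneA": (10.0, 40.0),
    "geneB": (1000.0, 1100.0),
    "geneC": (50000.0, 50500.0),
}


def ma_point(control, treated):
    """Return (log10 of the mean count, log2 fold change) for one gene."""
    base_mean = (control + treated) / 2
    return math.log10(base_mean), math.log2(treated / control)


ma_points = {gene: ma_point(c, t) for gene, (c, t) in counts.items()}

# Print genes from lowest to highest average expression.
for gene, (x, y) in sorted(ma_points.items(), key=lambda kv: kv[1][0]):
    print(f"{gene}: log10(base mean) = {x:.2f}, log2 fold change = {y:.2f}")
```

In this made-up example the gene with the lowest average count has the largest fold change and the most highly expressed gene sits closest to zero, which is the same pattern described above.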
Arguably the most complicated task was set towards the end of the week. The proposition was that mutations in the TDP-43 gene would impact the RNA-binding ability of the TDP-43 protein, and I used pre-existing data to analyse whether such mutations would act as a knockdown. I found that the hypothesis was difficult to support based on this dataset and, as is to be expected in science, more data would need to be analysed. R skills such as vector arithmetic helped me to reach this judgement, as they give rise to additional data such as log10(gene length), which was calculated from the given data to produce a plot with fewer points. Also, the ‘for()’ loop loaded the datasets into one plot using the same commands, but did so in such a way that each dataset was still uniquely identified with the assistance of different colours, as you can see in the picture above.
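The two ideas in that paragraph, transforming a whole column at once and looping over datasets while giving each its own colour, can be sketched roughly as follows. This is Python rather than the original R script, and the dataset names, gene lengths, fold changes and colours are all hypothetical.

```python
import math

# Hypothetical datasets: name -> list of (gene length in bases, log2 fold change).
# All values are invented for illustration.
datasets = {
    "dataset_1": [(1200, 0.4), (56000, -1.1)],
    "dataset_2": [(830, 0.1), (240000, -0.7), (15000, 0.9)],
}
colours = ["red", "blue", "green"]  # arbitrary palette, one colour per dataset

points = []  # one combined collection, ready to hand to a plotting function
for (name, rows), colour in zip(datasets.items(), colours):
    # Transform every gene length in one step, the counterpart of
    # R's vectorised log10(gene_length).
    log_lengths = [math.log10(length) for length, _ in rows]
    fold_changes = [lfc for _, lfc in rows]
    for x, y in zip(log_lengths, fold_changes):
        # Tag each point with its dataset and colour so it stays identifiable.
        points.append({"dataset": name, "colour": colour, "x": x, "y": y})

for point in points:
    print(point)
```

Because the same loop body runs for every dataset, adding another one only means adding another entry to `datasets` (and a colour to the palette), not writing more plotting commands.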
My placement allowed me to witness how science calls for patience and the ability to ask the right questions in order to manipulate and evaluate the data that could add another piece to the puzzle. With thanks to Jack for giving up his time, everyone at UCL for their hospitality (and smoothies), and In2scienceUK for providing me with the placement, I am grateful to have gained so much in such a short space of time.