Big data approaches can be applied to genomic data. The human reference genome consists of about 3.2 billion symbols.
There are many problems related to the processing of genomic data in bioinformatics, genetic engineering
and development of personalised treatment. Experimental data often consist of sets of short (100 symbols)
sequences (reads) that need to be aligned against a reference genome. Because of mutations, these
reads may contain single substitutions, insertions and deletions, so alignment algorithms must account for those
variations. An alignment run may take many hours, so optimising it is highly desirable.
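
To make the cost concrete, here is a minimal sketch of the most naive approach: slide the read along the reference and count mismatches at every position. It is an illustration only, not one of the algorithms discussed in the talk. The function name `naive_align` and the toy sequences are hypothetical, and the sketch handles substitutions only (no insertions or deletions).

```python
def naive_align(reference: str, read: str, max_subs: int) -> list[tuple[int, int]]:
    """Return (position, mismatches) for every placement of the read
    that differs from the reference by at most max_subs substitutions."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = 0
        for ref_sym, read_sym in zip(reference[pos:pos + len(read)], read):
            if ref_sym != read_sym:
                mismatches += 1
                if mismatches > max_subs:
                    break  # too many substitutions, abandon this position
        else:
            # Loop finished without break: the placement is acceptable.
            hits.append((pos, mismatches))
    return hits


# Toy example (hypothetical data): one substitution allowed.
print(naive_align("ACGTACGTTACG", "ACGA", max_subs=1))
```

The work grows with the product of reference length and read length for every read, which is why a scan like this is impractical for billions of reference symbols and millions of reads.
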
Several algorithms have been proposed that aim to be fast and efficient. Many algorithms in current use were
developed a decade ago and were designed to work around hardware limitations such as memory size and
slow access to storage. Modern workstations have improved significantly and open up possibilities for
A standard alignment is initially performed by finding common substrings (seeds) of a read and a reference
genome. Once such locations are identified within a reference sequence, various comparison procedures
are performed. Using long seeds reduces the number of locations to be considered, but
we may then miss some locations because of substitutions. Spaced seeds may help
us find all locations. For a given read length and a given number of allowed substitutions, we may
generate a seed that is guaranteed to find all those locations. The weight (number of ones) of these spaced
seeds is usually twice the weight of a contiguous seed used in standard algorithms. For each spaced seed,
optimal SIMD (Single Instruction, Multiple Data) instructions are provided to speed up processing.
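
The guarantee described above can be checked by brute force on a small case. The sketch below assumes a seed is written as a string of `1` (position must match) and `0` (position is ignored), and that only substitutions occur. The function names and the toy seeds are hypothetical and are not taken from the speaker's work.

```python
from itertools import combinations


def seed_hits(seed: str, read_len: int, mismatches: tuple[int, ...]) -> bool:
    """True if some placement of the seed inside the read avoids
    every mismatch position at all of its '1' positions."""
    ones = [i for i, c in enumerate(seed) if c == "1"]
    bad = set(mismatches)
    for shift in range(read_len - len(seed) + 1):
        if all(shift + i not in bad for i in ones):
            return True
    return False


def is_guaranteed(seed: str, read_len: int, k: int) -> bool:
    """True if the seed finds a hit for every possible choice of
    k substituted positions in a read of length read_len."""
    return all(
        seed_hits(seed, read_len, m)
        for m in combinations(range(read_len), k)
    )


# Toy case: reads of length 10 with one substitution allowed.
print(is_guaranteed("11111", 10, 1))     # contiguous seed, weight 5
print(is_guaranteed("111111", 10, 1))    # contiguous seed, weight 6
print(is_guaranteed("11011011", 10, 1))  # spaced seed, weight 6
```

In this toy case the contiguous seed of weight 6 can be defeated by a single substitution near the middle of the read, while the spaced seed of the same weight still hits for every placement of the substitution. Exhaustive checking like this is only feasible for short reads and few substitutions.
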
A Teams link will be sent nearer the time to Local Fellows and others on the RSSNI mailing list.
All welcome, of course - if you are not on our list and wish to attend please request the link from the Secretary in advance.
Speaker: Dr. Sofya Titarenko,
Senior Lecturer in Mathematics at the University of Huddersfield