Authors
Veeram Venkata Siva Prasad and Gunisetti Loshma, Sri Vasavi Engineering College, India
Abstract
Bioinformatics and computational biology are rooted in the life sciences as well as in computer and information sciences and technologies. Bioinformatics applies the principles of information sciences and technologies to make vast, diverse, and complex life sciences data more understandable and useful. Computational biology uses mathematical and computational approaches to address theoretical and experimental questions in biology. Short read sequence assembly is one of the most important steps in the analysis of biological data. Many open source software packages are available for short read sequence assembly, and MAQ is one that is widely used by the research community. In general, the data sets generated by next generation sequencers are very large and require tremendous computational resources. The algorithm used for short read sequence assembly is NP-hard, which makes it computationally expensive and time consuming. In addition, MAQ is single-threaded software: it does not exploit multi-core or distributed computing, and it does not scale. In this paper we report HPC-MAQ, which addresses the NP-hard challenges of genome reference assembly and makes MAQ parallel and scalable through Hadoop, a software framework for distributed computing.
Keywords
High Performance Computing, Hadoop, Whole Genome Reference Assembly, Computational Quantitative Biology, HPC-MAQ.