
The present book, “Bioinformatics in Agriculture and Allied Sectors”, contains 26 chapters, including fundamental chapters on soil informatics, proteomics, sequence alignment, homology modeling, drug design and applications of bioinformatics in the aquaculture and avian sectors. Every effort has been made to present the material in a simplified way so that it is easy to understand and useful for students, scientists and professionals engaged in bioinformatics research, development and education.
Preface
Bioinformatics, a recently emerged frontier discipline, is the convergence of analytical and computational tools with the biological sciences. It now offers great scope for entrepreneurship. The integration of three biological processes, namely DNA sequence determining protein sequence, protein sequence determining protein structure and protein structure determining protein function, brings us closer to the long-term goal of completely understanding the biology of organisms through the development of bioinformatics tools and software. The Government of India is also giving considerable impetus to the development of this area through the Department of Biotechnology. The present book, “Bioinformatics in Agriculture and Allied Sectors”, contains 26 chapters, including fundamental chapters on soil informatics, proteomics, sequence alignment, homology modeling, drug design and applications of bioinformatics in the aquaculture and avian sectors. Every effort has been made to present the material in a simplified way so that it is easy to understand and useful for students, scientists and professionals engaged in bioinformatics research, development and education.
It gives us immense pleasure to express our heartfelt gratitude to Dr. T. Madhan Mohan, Adviser, Department of Biotechnology, Ministry of Science and Technology, Government of India, for his inspiring guidance, encouragement and support in organizing three national workshop-cum-training programmes on bioinformatics at OUAT; this book is an outcome of those workshops. We are highly grateful to Dr. T. Mohapatra, Secretary, DARE and Director General, ICAR, Government of India, for writing the foreword for this book. We would also like to acknowledge the support and encouragement received from Dr. R.K. Mahapatra, Chief Librarian, OUAT, and to thank M/s New India Publishing Agency, New Delhi, for publishing this book in a beautiful form.
Bioinformatics is a recently emerged frontier discipline that helps in analyzing biological data using information technology and statistical software packages. In a broader sense, bioinformatics is the application of computer and information technology (IT) to the management of biological information. This interdisciplinary field, which crosses the boundaries between branches of science such as biology, biotechnology, mathematics, computer science and information technology, has a great influence on advanced bioscience research, particularly research dealing with genes, proteins and cell biology. Bioinformatics drew public attention with the announcement of the draft sequence of the human genome by the then US President Bill Clinton, along with Craig Venter and Francis Collins, in 2000 (Uma and Sathyanathan, 2004).
Bioinformatics is a multidisciplinary approach that combines computational and biological expertise to analyze biological data and so advance research and development. It drew public attention with the announcement of the draft sequence of the human genome by the then US President Bill Clinton, along with Craig Venter and Francis Collins, in 2000. Bioinformatics and life sciences research are gaining attention in the government, industrial and academic sectors. In 1987, India became the first country in the world to establish a biotechnology information system (BTIS) network, through the Department of Biotechnology, creating an infrastructure that enables it to harness biotechnology through the application of bioinformatics. Through this branch of science, biological data are acquired, stored and then analyzed, and huge volumes of bioscience data can be kept for future use. By exploring computational tools, databases and new strategies against pests and diseases, resistance genes can be identified and plants with desired traits can be engineered to enhance agricultural productivity. Bioinformatics can be utilized in agriculture in two ways, viz., (i) at the molecular level for application in breeding strategies and (ii) as agri-informatics for land use planning, farmers' window, early warning systems, impact assessment and environmental applications.
Soil is formed by the disintegration and decomposition of rocks, minerals and organic matter. Soil is a habitat for plant growth and serves as a storehouse of water and nutrients for plants. Soil is not the same everywhere: it varies morphologically, physically, chemically and mineralogically from one point to another in both horizontal and vertical directions. The development of soil informatics is necessary to classify soils, to assess their productivity for various crops and to use soil for purposes such as land capability classification, soil irrigability classification, soil rating and soil mapping. Morphologically, soil may appear red, yellow, black, brown or ashy in colour. Soil may vary from sandy to clayey in texture. Depending upon its nutrient reserve capacity, a soil may be fertile or infertile. On the basis of climate, a soil may be classified as laterite soil, red earth and red loam soil, podzol soil or desert soil. Different kinds of soil contain minerals of various types and amounts and so exhibit distinct physicochemical properties. Although various methods have been used in the past to classify soils, the most modern and scientific method is based on the dominant pedogenic processes acting at a place to form a soil. The new method of soil classification has two important characteristics: firstly, classification is based on certain diagnostic parameters that can be estimated quantitatively, and secondly, depending on these diagnostic parameters, a scientific method is followed to classify a soil from higher to lower levels, namely order, suborder, great group, subgroup, family and series.
Advances in both the biological sciences and computer technology gave birth to one of today’s hottest areas, known as bioinformatics. Bioinformatics analysis has gradually become an essential component of the continued improvement and development of any kind of biological research. It has various applications in medicine, biotechnology, agriculture, etc. Some of the applications of bioinformatics related to the analysis of biological information are as follows [1]:
· Information related to bio-molecules can be mapped; for example, sequences can be parsed to find sites where so-called “restriction enzymes” will cut them.
· Sequences can be compared, usually by aligning corresponding segments and looking for matching and mismatching letters in their sequences. Genes or proteins that are sufficiently similar are likely to be related and are therefore said to be “homologous” to each other.
· If a homologue exists, a newly discovered protein may be modeled; that is, the 3D structure of the gene product can be predicted without doing laboratory experiments.
· Bioinformatics is used to attempt to predict the function of actual gene products.
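The first application listed above, parsing a sequence for restriction sites, can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the chapter: it uses the well-known EcoRI recognition sequence GAATTC (cut after the first G), and the function names and the example sequence are hypothetical.

```python
import re

# EcoRI recognition sequence; the enzyme cuts between G and AATTC.
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # the cut falls after the first base of the site


def find_cut_positions(dna, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Return 0-based positions in `dna` at which the top strand is cut."""
    dna = dna.upper()
    # A lookahead lets overlapping occurrences of a site be found as well.
    pattern = re.compile("(?=" + re.escape(site) + ")")
    return [m.start() + cut_offset for m in pattern.finditer(dna)]


def digest(dna, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Split `dna` into the fragments produced by cutting at every site."""
    dna = dna.upper()
    fragments, start = [], 0
    for cut in find_cut_positions(dna, site, cut_offset):
        fragments.append(dna[start:cut])
        start = cut
    fragments.append(dna[start:])
    return fragments


example = "ttgaattcaaccggaattcgg"  # hypothetical sequence with two EcoRI sites
print(find_cut_positions(example))  # [3, 14]
print(digest(example))              # ['TTG', 'AATTCAACCGG', 'AATTCGG']
```

Other enzymes can be tried by passing a different `site` and `cut_offset`; sites containing ambiguous bases would need a pattern rather than a literal string.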
If the 1990s were the decade of genomics, the first ten years of the new century are set to become the decade of proteomics. For the first time, the technologies of proteomics make it possible to generate quantitative protein expression data on a scale and with a sensitivity comparable to that achieved at the genetic level. This advance has major implications for our understanding of cellular organisation in health and disease and for pharmaceutical and agricultural biotechnology. Indeed, proteomics is already yielding important findings across a wide range of applications. Most proteomics research, however, is directed towards the more proximal goal of investigating protein expression and function under specified physiological conditions. The rapid evolution of proteomics has continued in recent years with a series of innovations in the core technologies of two-dimensional electrophoresis and mass spectrometry, and a diversity of productive research programmes. Well-annotated proteomics databases are now emerging in a number of fields to provide a platform for systematic research, with particularly promising progress in clinical applications such as cardiology and oncology. Large-scale quantitative research, comparable in power and sensitivity to that achieved for gene expression, is thus becoming a reality at the protein level.
Gene silencing is the suppression of gene expression through nucleotide sequence-specific interactions that are mediated by RNA. Many manifestations of RNA silencing, such as RNA interference (RNAi) in animals, quelling in fungi and post-transcriptional gene silencing (PTGS) in plants, appear to be related. RNA silencing is experimentally activated by double-stranded RNA (dsRNA) and is used as a powerful technique for the specific inhibition of gene expression in a variety of organisms. In a natural context, dsRNA may be produced from rearranged loci by transcription from converging promoters or by host- or viral-encoded RNA-dependent RNA polymerases (RdRP). A key component of RNA silencing is a 21-23 nt RNA known as short interfering RNA (siRNA). In Drosophila, the siRNA is derived from dsRNA by the action of an RNase III-like enzyme named Dicer. The siRNA guides a multi-subunit endonuclease, referred to as the RNA-induced silencing complex (RISC), which specifically degrades RNA that shares sequence similarity with the dsRNA.
Biotechnology provides significant opportunities in horticulture, where even minor changes in quality, colour, aroma, flavour, structure, disease and pest resistance or postharvest behaviour can bring tremendous commercial gains. Tissue culture, genetic engineering, molecular breeding, biofertilizers, biopesticides and postharvest biotechnology are important aspects of biotechnology in horticultural crops such as fruit, vegetable, ornamental, spice, medicinal, aromatic and plantation crops.
Tissue Culture
Tissue culture is the practical reality of plant biotechnology. Annual demand for tissue-cultured products constitutes nearly 10 per cent of total world biotech business, amounting to 15 billion US dollars. The global output of the tissue culture industry is expanding at 15% per annum (Chadha, 2000). In India, more than 130 tissue culture units are presently producing more than 300 million plantlets per annum (Choudhury and Tejaswini, 2000).
The success of any crop improvement programme springs mainly from sound genetic investigations carried out in the field. However, the moment biological populations are subjected to field evaluation, external and non-heritable factors come into play. These factors mask the real differences among the tested populations, which stem purely from heritable or genetic factors, and their influence becomes confounded with the latter. Consequently, authentic data on genetic differences are not available because of controllable and non-controllable errors. Further, the physical data assembled in an experiment are often so voluminous that it is perhaps not humanly possible to arrive at meaningful conclusions unaided. Hence, an adequate knowledge of statistical principles and their appropriate use is needed so that all possible sources of error can be properly accounted for.
The importance of plant pathology has been realized over the years owing to a number of crop failures in different parts of the world. Plant disease control is indispensable for quality production and for competing in global agricultural trade. There is a great need to orient our research in ways that will guide generations to come in managing plant diseases, not only to enhance crop production but also to facilitate trade while conserving biodiversity and the environment.
Research Trend in Plant Pathology
As in other sciences, the trend of research in plant pathology has changed with the passage of time. Up to the late 1960s, most of the interest in plant pathology was concentrated on crop diseases, mostly those caused by fungi. The physiology of microorganisms and the physiology of parasitism became increasingly popular fields of interest in the 1960s.
The focus of the world’s trade and economic activities is increasingly shifting from resource-based economies to knowledge-based economies. This paradigm shift has been triggered by the implementation of the Convention on Biological Diversity (CBD) and other related international treaties and policy frameworks. The CBD states for the first time that bio-resources are the sovereign property of the countries in which they are found. It further stipulates that outside parties can access bio-resources only with the prior approved consent of the concerned stakeholders and on mutually agreed terms and conditions for equitable sharing of the benefits accrued from the commercialization of biodiversity. To realize the rights proclaimed in the CBD and to fulfil the obligations of various other relevant treaties, particularly in the context of WTO regulations, it is imperative that bio-resource-rich nations like India have a comprehensive knowledge base of their biodiversity and associated information systems in an accessible form.
Bioinformatics is a discipline that deals with the computation of biological information. It is a newly emerging interdisciplinary research area spanning a range of specialties that include biology, biophysics, computer science, mathematics and statistics. It makes use of scientific and technological advances in computer science, information technology and communication technology to solve complex problems in the life sciences, particularly problems in biotechnology. Biological data are growing explosively: large sequencing projects are producing nucleotide sequences at an ever faster rate, and the content of the nucleotide database is doubling every 14 months. The latest release of GenBank (release 102) exceeded one billion base pairs. To cope with this huge volume of data, a new scientific discipline has emerged comprising bioinformatics, biocomputing and computational biology.
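The arithmetic behind a fixed doubling time can be made concrete with a short Python sketch. It simply assumes that the 14-month doubling quoted above continues unchanged, which is an extrapolation rather than a reported result; the function names are illustrative.

```python
import math

DOUBLING_MONTHS = 14  # doubling time quoted in the text


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Factor by which the database grows after `months` of steady doubling."""
    return 2 ** (months / doubling_months)


def months_to_grow(factor, doubling_months=DOUBLING_MONTHS):
    """Months needed for the database to grow by the given factor."""
    return doubling_months * math.log2(factor)


# Starting from the one billion base pairs mentioned above (assumed baseline).
start_bp = 1_000_000_000
for years in (1, 5, 10):
    projected = start_bp * growth_factor(12 * years)
    print(f"after {years:2d} years: about {projected:.2e} base pairs")
```

Under this assumption a tenfold increase takes `months_to_grow(10)`, a little under four years.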
Advances in informatics, biology and allied fields of technology have led to the emergence of an interdisciplinary field known as ‘Bioinformatics’. This expanding and challenging field provides tools for the collection, storage, organization, integration, analysis and simulation of biological data for use in biotechnology.
Collection of Data for Bioinformatics
The recent epoch-making revolution in genomics, which has deciphered plant, animal and microbial genome sequences, together with proteomics, statistics, instrumental techniques and structural and computational biology, has yielded fragmented bits and pieces of information on biological materials. Each datum is a discrete, important but fragmented piece of information about a molecule (protein, RNA, DNA, sugar, lipid, etc.), a reaction or at best a pathway, supposed to occur independently in a living system at a given time and place. However, since in a living system each molecule or process exists in unison with other entities of its kind, these fragmented data have to be stored, organized, integrated and analysed so that they form part of a whole chain of events, now called a ‘SYSTEM’.
Storage and organisation of data
The explosion of knowledge in biology and the generation of new information (data) call for good storage, organization and faster distribution among scientific workers. The advent of supercomputers, high-speed computing and algorithms for data formatting allows data to be stored at the site of origin and distributed through hyperlinks on the World Wide Web (WWW).
Nanotechnology is a field of applied science focused on the design, synthesis, characterization and application of materials and devices on the nanoscale. It is a sub-classification of technology in colloidal science, biology, physics, chemistry and other scientific fields, and involves the study of phenomena and the manipulation of material at the nanoscale. It involves the creation of useful materials, devices and systems by manipulating matter at an incredibly small scale, between 1 and 100 nanometers. With nanotechnology, a large set of materials with distinct properties (optical, electrical or magnetic) can be fabricated. Nanotechnologically improved products rely on the change in physical properties that occurs when feature sizes are shrunk.
Advances in science and technology are deeply intertwined. Researchers can use the semiconductor manufacturing techniques that underlie miniaturization to build radios and exceptionally small mechanical structures that sense fields and forces in the physical world. These inexpensive, low-power communication devices can be deployed throughout a physical space, providing dense sensing close to physical phenomena, processing and communicating this information, and coordinating actions with other nodes. These multifunction devices are battery operated, are connected wirelessly, have sensing, processing and communicating capabilities, and are known as wireless sensors. Applying these sensors to biological problems or biotechnology leads to the new concept of biosensors. If these concepts are implemented in real life, we could guard against many conditions, such as heart attacks, abnormal blood sugar levels, gene reactions and other changes in the body, by detecting them long before they become outwardly apparent.
Introduction
Since the advent of the post-genomic era, biological data have been accumulating at an electrifying pace every day. The storage and analysis of these data serve as a major resource for scientific research into the origin, evolution and existence of various forms of life. Managing these data requires different statistical techniques for the retrieval, storage and analysis of small as well as large biological datasets and for the extraction of precise information from them. The application of statistical tools in bioinformatics, especially to fisheries data, has been reported (Roy and Martha, 2008).
Biological sequences such as nucleotide and amino acid sequences can be analyzed by means of sequence alignment. The objective is to learn how similar the query sequence is to the target sequence, in order to decipher the structure, function and homology of the query sequence.
Sequence Alignment
Sequence alignment is the procedure of comparing two (pair-wise alignment) or more (multiple sequence alignment) sequences by searching for a series of individual characters or character patterns that are in the same order in the sequences.
Significance: Discovering functional, structural and evolutionary information in biological sequences.
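Pair-wise alignment as defined above can be illustrated with a minimal sketch of the Needleman-Wunsch global alignment algorithm, a standard dynamic-programming method that the text itself does not name. The scoring values (match +1, mismatch -1, gap -1) and the function name are illustrative assumptions; practical tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of substitution, deletion, insertion.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


total, row_a, row_b = needleman_wunsch("ACGT", "AGT")
print(total)  # 2
print(row_a)  # ACGT
print(row_b)  # A-GT
```

When several alignments share the best score, this traceback returns only one of them, preferring a substitution over a gap.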
Motif
1. A nucleotide or amino acid sequence pattern that has biological significance.
2. Occurs repeatedly, either in the same molecule or in many molecules. Found in upstream intergenic regions.
3. Functionally important regions of a genome (gene) can be recognized by searching for such patterns.
4. Used for locating binding sites and regulatory signals that control gene expression.
5. Used for identifying potential drug target sites.
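Searching a sequence for a motif, as described in the points above, can be sketched with a regular expression. The pattern used here, TATA[AT]A[AT], is a commonly cited form of the TATA-box consensus and is chosen purely for illustration; the upstream sequence and function name are hypothetical.

```python
import re

TATA_BOX = "TATA[AT]A[AT]"  # illustrative consensus pattern, not from the text


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every occurrence, including overlaps."""
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    regex = re.compile("(?=(" + pattern + "))", re.IGNORECASE)
    return [(m.start(), m.group(1).upper()) for m in regex.finditer(sequence)]


upstream = "GGCTATAAAAGGCCTATATAAGC"  # hypothetical upstream region
for start, text in find_motif(upstream, TATA_BOX):
    print(start, text)
# 3 TATAAAA
# 14 TATATAA
```

Real motif discovery usually relies on position weight matrices rather than exact patterns, since binding sites tolerate variation that a single regular expression cannot score.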
Objectives
· Creation and use of large databases of biological information to augment traditional laboratory-based biology.
· Analysis and interpretation of genomic and structural data using computational techniques from mathematics, statistics and computer science.
· Explanation of the process and mechanism of evolution by comparing the genomes of different species.
· Development of new methods and technologies for the understanding and diagnosis of genetic diseases and the design of drugs for their treatment.
An artificial neural network (ANN) is a mathematical or computational model based on biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. Neural networks are non-linear statistical data modeling tools that can be used to model complex relationships between inputs and outputs or to find patterns in data.
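A very small network shows what "interconnected artificial neurons" and "non-linear" mean in practice. The sketch below is an illustration only: the weights are hand-picked rather than trained, and the network (two inputs, two hidden neurons, one output, sigmoid activations) computes the XOR function, a relationship that no single linear unit can represent.

```python
import math


def sigmoid(x):
    """Smooth non-linear activation squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, then the activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Hand-picked (not trained) weights, assumed for illustration:
# hidden neuron 1 behaves like OR, hidden neuron 2 like NAND,
# and the output neuron like AND of the two, which together gives XOR.
HIDDEN = [([20.0, 20.0], -10.0), ([-20.0, -20.0], 30.0)]
OUTPUT = ([20.0, 20.0], -30.0)


def predict(x1, x2):
    """Forward pass through the 2-2-1 network."""
    hidden = [neuron([x1, x2], weights, bias) for weights, bias in HIDDEN]
    return neuron(hidden, *OUTPUT)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(predict(x1, x2)))
```

In real applications the weights are learned from data, typically by gradient-based training, rather than set by hand.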
Insights into the three-dimensional (3D) structure of a protein are of great assistance when we plan experiments to understand protein function, and they also assist the drug design process.
Homology modeling
· Homology modeling consists of extrapolating the structure of a new (target) sequence from the known 3D structures (X-ray crystallographic or NMR) of related family members (templates).
· Homology modeling is for targets that have homologous proteins of known structure (>30% identity).
· Homology modeling is for easy targets.
· Homology modeling treats the template in an alignment as a sequence, and only sequence homology is used for prediction.
· When the sequence identity between the target and template sequences is low (<25%), homology modeling may not produce a significant result.
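The identity thresholds in the list above can be applied to an existing alignment with a short Python sketch. Two things here are assumptions rather than statements from the text: percent identity is computed over aligned positions without gaps (other denominators are also in use), and the range between 25% and 30%, which the text does not classify, is labeled "borderline". The function names and sequences are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent of gap-free aligned positions where the residues are identical."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_target.upper(), aligned_template.upper()):
        if x == "-" or y == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def template_suitability(identity):
    """Apply the >30% and <25% thresholds quoted in the text."""
    if identity > 30.0:
        return "suitable"
    if identity < 25.0:
        return "unreliable"
    return "borderline"  # range not covered by the text; label is an assumption


target = "MKT-AYIAKQR"    # hypothetical aligned target
template = "MKTLAYLAKNR"  # hypothetical aligned template
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity: {template_suitability(identity)}")
```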
The genetic code uses 64 codons to represent the 20 standard amino acids and the translation termination signal. Each codon is recognised by a subset of a cell’s transfer ribonucleic acid (tRNA) molecules and, with the exception of a few codons that have been reassigned in some lineages (Osawa and Jukes 1989; Osawa et al., 1990), the genetic code is remarkably conserved, although it is still in a state of evolution (Osawa et al., 1992).
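The count of 64 codons follows from three positions with four possible bases each (4 × 4 × 4). The sketch below builds the standard genetic code as a lookup table, using the conventional compact amino acid string ordered by the bases T, C, A, G, and then translates a short, hypothetical coding sequence. The reassigned codons mentioned above are not modeled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a termination codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(sequence):
    """Translate a DNA or RNA coding sequence up to the first stop codon."""
    sequence = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE[sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE))                      # 64
print(stop_codons)                           # ['TAA', 'TAG', 'TGA']
print(len(set(CODON_TABLE.values())) - 1)    # 20 amino acids
print(translate("ATGGCCTAA"))                # MA
```

Because 61 sense codons encode only 20 amino acids, most amino acids are specified by more than one codon.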
In recent years, we have witnessed an explosion of biological information and bioinformatics data arising from rapid research and unprecedented progress in molecular biology. Novel genomic technologies have revolutionized drug design, discovery and development in government as well as private pharmaceutical industries. Various databases are doubling in size every year, and we now have the complete genome sequences of more than 100 organisms. The pharmaceutical industry has embraced genomics as a source of drug targets. It also recognizes that bioinformatics is crucial for validating these potential drug targets and for determining which are most suitable for entering the drug development pipeline. Our increased understanding of molecular biology has changed the way medicines are developed. In the past, new synthetic organic molecules were tested in animals or in whole-organ preparations. This approach has now been replaced by a molecular target approach, in which compounds are screened in vitro at high throughput against purified recombinant proteins or genetically modified cell lines, as a consequence of better and ever-improving knowledge of the molecular basis of disease.
Drug design is the approach of finding drugs by designing them on the basis of their biological targets. A drug is a substance used for medical purposes, either alone or in a mixture, that changes the state or function of cells, organs or the organism as a whole. A drug target is typically a key molecule involved in a particular metabolic or signaling pathway that is specific to a disease condition or pathology, or to the infectivity or survival of a microbial pathogen. Drugs may be designed to bind to the active region of the key molecule that causes disease and inhibit it. However, such drugs must also be designed so that they do not affect other important molecules that are similar to the key molecule. Drugs can also be used to enhance a normal metabolic pathway by promoting specific molecules that may have been suppressed in the disease state.
Influenza A Pandemics
This statement signifies that casualties are caused not by the earthquake itself but by the damage that earthquake forces do to buildings. In developed countries, casualties and property losses are minimal compared with the death toll and losses in India for earthquakes of the same magnitude. Past experience shows that for earthquakes of magnitude 6 or more on the Richter scale, the death toll in India runs into thousands, whereas in Western countries it is in multiples of ten. The lower casualties result from less damage to buildings and other infrastructure, which in turn results from designing and constructing buildings and other structures according to earthquake-resistant principles. This paper discusses issues related to earthquake hazards in India and the concept of earthquake-resistant structures.
Colour Plates Chapter 15: Applications of Statistics in Bioinformatics
