Computational biology in the context of Scientific computing

⭐ Core Definition: Computational biology

Computational biology refers to the use of techniques in computer science, data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and data science, the field also has foundations in applied mathematics, molecular biology, cell biology, chemistry, and genetics.

In this Dossier

Computational biology in the context of Bioinformatics

Bioinformatics (/ˌbaɪ.oʊˌɪnfərˈmætɪks/) is an interdisciplinary field of science that develops computational methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, data science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. This process is sometimes referred to as computational biology, although the distinction between the two terms is often disputed; to some, the term computational biology refers to building and using models of biological systems.

Computational, statistical, and computer programming techniques have been used for computer simulation analyses of biological queries. They include reusable analysis "pipelines", particularly in the field of genomics, such as the identification of genes and single nucleotide polymorphisms (SNPs). These pipelines are used to better understand the genetic basis of disease, unique adaptations, desirable properties (especially in agricultural species), or differences between populations. Bioinformatics also includes proteomics, which aims to understand the organizational principles within nucleic acid and protein sequences.
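
As a toy illustration of the kind of step such a pipeline might perform, the sketch below compares two aligned DNA sequences and reports the positions where their bases differ, a highly simplified stand-in for SNP identification; the sequences and the function name are invented for illustration.

    # Toy illustration: report positions where two aligned DNA sequences differ.
    # Real SNP-calling pipelines work on sequencing reads, quality scores and
    # population-level data; this is only a minimal sketch of the underlying idea.
    def naive_snp_positions(reference, sample):
        if len(reference) != len(sample):
            raise ValueError("sequences must be aligned to the same length")
        return [(i, ref_base, alt_base)
                for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
                if ref_base != alt_base]

    reference = "ATGCGTACGTTAGC"
    sample    = "ATGCGTACATTAGC"
    print(naive_snp_positions(reference, sample))  # [(8, 'G', 'A')]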

View the full Wikipedia page for Bioinformatics

Computational biology in the context of Epistasis

Epistasis is a phenomenon in genetics in which the effect of a gene mutation depends on the presence or absence of mutations in one or more other genes, termed modifier genes. In other words, the effect of the mutation depends on the genetic background in which it appears. Epistatic mutations therefore have different effects on their own than when they occur together. Originally, the term epistasis specifically meant that the effect of a gene variant is masked by that of a different gene.

The concept of epistasis originated in genetics in 1907 but is now used in biochemistry, computational biology and evolutionary biology. The phenomenon arises due to interactions, either between genes (such as mutations also being needed in regulators of gene expression) or within them (multiple mutations being needed before the gene loses function), leading to non-linear effects. Epistasis has a great influence on the shape of evolutionary landscapes, which leads to profound consequences for evolution and for the evolvability of phenotypic traits.
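
As a numeric sketch of what "non-linear effects" means here, the snippet below compares the fitness of a double mutant with the value expected if the two mutations acted additively; all fitness values are invented purely for illustration.

    # Invented fitness values (wild type = 1.0), used only to illustrate that
    # epistasis means the combined effect of two mutations differs from the
    # sum of their individual effects.
    fitness = {
        "wild_type": 1.00,
        "mutation_A": 0.90,  # 10% fitness cost on its own
        "mutation_B": 0.95,  # 5% fitness cost on its own
        "A_and_B": 0.70,     # much larger cost together -> negative epistasis
    }

    effect_A = fitness["mutation_A"] - fitness["wild_type"]          # -0.10
    effect_B = fitness["mutation_B"] - fitness["wild_type"]          # -0.05
    expected_additive = fitness["wild_type"] + effect_A + effect_B   # 0.85
    epistasis = fitness["A_and_B"] - expected_additive               # -0.15
    print(f"expected (additive): {expected_additive:.2f}, "
          f"observed: {fitness['A_and_B']:.2f}, epistasis: {epistasis:.2f}")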

View the full Wikipedia page for Epistasis

Computational biology in the context of Data preprocessing

Data preprocessing can refer to manipulation, filtration or augmentation of data before it is analyzed, and is often an important step in the data mining process. Data collection methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, and missing values, amongst other issues. Preprocessing is the process by which raw, unstructured data is transformed into intelligible representations suitable for machine-learning models; this phase deals with the noise and missing values in the original data set in order to produce better results.

The preprocessing pipeline used can often have large effects on the conclusions drawn from the downstream analysis. Thus, ensuring adequate representation and quality of the data is necessary before running any analysis. Often, data preprocessing is the most important phase of a machine learning project, especially in computational biology. If there is a high proportion of irrelevant and redundant information, or of noisy and unreliable data, then knowledge discovery during the training phase becomes more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Examples of methods used in data preprocessing include cleaning, instance selection, normalization, one-hot encoding, data transformation, feature extraction and feature selection.
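
As a minimal sketch of two of the steps listed above, the snippet below applies min-max normalization to a numeric feature and one-hot encoding to a categorical feature of a tiny invented data set, written in plain Python for clarity.

    # Toy data set (invented values): gene-expression level plus a tissue label.
    samples = [
        {"expression": 2.0, "tissue": "liver"},
        {"expression": 8.0, "tissue": "brain"},
        {"expression": 5.0, "tissue": "liver"},
    ]

    # Min-max normalization: rescale the expression values to the range [0, 1].
    values = [s["expression"] for s in samples]
    lo, hi = min(values), max(values)
    for s in samples:
        s["expression_norm"] = (s["expression"] - lo) / (hi - lo)

    # One-hot encoding: turn the tissue label into 0/1 indicator columns.
    categories = sorted({s["tissue"] for s in samples})
    for s in samples:
        for c in categories:
            s[f"tissue_{c}"] = 1 if s["tissue"] == c else 0

    print(samples[0])
    # {'expression': 2.0, 'tissue': 'liver', 'expression_norm': 0.0,
    #  'tissue_brain': 0, 'tissue_liver': 1}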

View the full Wikipedia page for Data preprocessing

Computational biology in the context of Statistical genetics

Statistical genetics is a scientific field concerned with the development and application of statistical methods for drawing inferences from genetic data. The term is most commonly used in the context of human genetics. Research in statistical genetics generally involves developing theory or methodology to support research in one of three related areas: population genetics, genetic epidemiology, and quantitative genetics.

Statistical geneticists tend to collaborate closely with geneticists, molecular biologists, clinicians and bioinformaticians. Statistical genetics is a type of computational biology.
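
As a hedged sketch of the kind of method this involves, the snippet below computes a basic Pearson chi-square statistic for an allelic case-control association test on invented allele counts; real analyses require far more careful modelling (population structure, covariates, multiple testing and so on).

    # Toy case-control allelic association test on invented allele counts.
    # Rows: cases and controls; columns: counts of allele A and allele a.
    table = [[240, 160],   # cases
             [200, 200]]   # controls

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected.
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    print(f"chi-square = {chi_square:.2f} on 1 degree of freedom")  # 8.08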

View the full Wikipedia page for Statistical genetics

Computational biology in the context of Computational science

Computational science, also known as scientific computing, technical computing or scientific computation (SC), is a division of science, and more specifically of computer science, that uses advanced computing capabilities to understand and solve complex physical problems. While this typically extends into computational specializations, the field spans the algorithms, mathematical models, and computer simulations developed to solve scientific problems, along with the computer hardware and computing infrastructure needed to run them.

In practical use, it is typically the application of computer simulation and other forms of computation from numerical analysis and theoretical computer science to solve problems in various scientific disciplines. The field is different from theory and laboratory experiments, which are the traditional forms of science and engineering. The scientific computing approach is to gain understanding through the analysis of mathematical models implemented on computers. Scientists and engineers develop computer programs and application software that model systems being studied and run these programs with various sets of input parameters. The essence of computational science is the application of numerical algorithms and computational mathematics. In some cases, these models require massive amounts of calculations (usually floating-point) and are often executed on supercomputers or distributed computing platforms.
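
A minimal sketch of that workflow, using a biological example: the snippet below implements the logistic population-growth model dN/dt = rN(1 - N/K) and integrates it with a simple Euler step; the parameter values are arbitrary and chosen only for illustration.

    # Numerically integrate the logistic growth model dN/dt = r * N * (1 - N / K)
    # with Euler's method. Parameter values are arbitrary, for illustration only.
    r, K = 0.5, 1000.0   # growth rate and carrying capacity
    N = 10.0             # initial population size
    dt = 0.1             # time step
    steps = 200          # simulate 20 time units

    for _ in range(steps):
        N += dt * r * N * (1.0 - N / K)

    print(f"population after {steps * dt:.0f} time units: {N:.1f}")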

View the full Wikipedia page for Computational science

Computational biology in the context of David Baker (biochemist)

David Baker (born October 6, 1962) is an American biochemist and computational biologist who has pioneered methods to design proteins and predict their three-dimensional structures. He is the Henrietta and Aubrey Davis Endowed Professor in Biochemistry, an investigator with the Howard Hughes Medical Institute, and an adjunct professor of genome sciences, bioengineering, chemical engineering, computer science, and physics at the University of Washington. He was awarded a share of the 2024 Nobel Prize in Chemistry for his work on computational protein design.

Baker is a member of the United States National Academy of Sciences and the director of the University of Washington's Institute for Protein Design. He has co-founded more than a dozen biotechnology companies and was included in Time magazine's inaugural list of the 100 Most Influential People in health in 2024.

View the full Wikipedia page for David Baker (biochemist)

Computational biology in the context of Molecular modelling

Molecular modelling encompasses all methods, theoretical and computational, used to model or mimic the behaviour of molecules. The methods are used in the fields of computational chemistry, drug design, computational biology and materials science to study molecular systems ranging from small chemical systems to large biological molecules and material assemblies. The simplest calculations can be performed by hand, but inevitably computers are required to perform molecular modelling of any reasonably sized system. The common feature of molecular modelling methods is the atomistic level description of the molecular systems. This may include treating atoms as the smallest individual unit (a molecular mechanics approach), or explicitly modelling protons and neutrons with their quarks, anti-quarks and gluons, and electrons with their photons (a quantum chemistry approach).
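
As a sketch of the molecular mechanics view of atoms as the smallest individual units, the snippet below evaluates a Lennard-Jones pair potential, a standard description of the van der Waals interaction between two non-bonded atoms; the epsilon and sigma values are illustrative rather than taken from any particular force field.

    # Lennard-Jones pair potential U(r) = 4*epsilon*((sigma/r)^12 - (sigma/r)^6).
    # epsilon: well depth (kJ/mol); sigma: separation at which U = 0 (nm).
    # The values below are illustrative, not from a specific force field.
    def lennard_jones(r, epsilon=0.65, sigma=0.32):
        sr6 = (sigma / r) ** 6
        return 4.0 * epsilon * (sr6 ** 2 - sr6)

    for r in (0.30, 0.36, 0.50, 0.80):  # separations in nm
        print(f"r = {r:.2f} nm  ->  U = {lennard_jones(r):+.3f} kJ/mol")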

View the full Wikipedia page for Molecular modelling

Computational biology in the context of Unigram

An n-gram is a sequence of n adjacent symbols in a particular order. The symbols may be n adjacent letters (including punctuation marks and blanks), syllables, or rarely whole words found in a language dataset; or adjacent phonemes extracted from a speech-recording dataset, or adjacent base pairs extracted from a genome. They are collected from a text corpus or speech corpus.

If Latin numerical prefixes are used, then an n-gram of size 1 is called a "unigram", size 2 a "bigram" (or, less commonly, a "digram"), and so on. If English cardinal numbers are used instead, they are called a "four-gram", "five-gram", etc. Similarly, Greek numerical prefixes such as "monomer", "dimer", "trimer", "tetramer", "pentamer", etc., or English cardinal numbers, "one-mer", "two-mer", "three-mer", etc., are used in computational biology for polymers or oligomers of a known size, called k-mers. When the items are words, n-grams may also be called shingles.
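
To make the k-mer idea concrete, here is a small sketch that counts all overlapping k-mers of a chosen size in a DNA string; the sequence is invented for illustration.

    from collections import Counter

    # Count all overlapping k-mers (substrings of length k) in a DNA sequence.
    def count_kmers(sequence, k):
        return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

    dna = "ATGCGATGCA"  # invented example sequence
    print(count_kmers(dna, 3))
    # Counter({'ATG': 2, 'TGC': 2, 'GCG': 1, 'CGA': 1, 'GAT': 1, 'GCA': 1})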

View the full Wikipedia page for Unigram

Computational biology in the context of Computer experiment

A computer experiment or simulation experiment is an experiment used to study a computer simulation, also referred to as an in silico system. This area includes computational physics, computational chemistry, computational biology and other similar disciplines.

View the full Wikipedia page for Computer experiment

Computational biology in the context of Biological computation

The concept of biological computation proposes that living organisms perform computations, and that, as such, abstract ideas of information and computation may be key to understanding biology. As a field, biological computation can include the study of the computations performed by biota (systems biology), the design of algorithms inspired by the computational methods of biology, the design and engineering of manufactured computational devices using synthetic biology, and computer methods for the analysis of biological data (computational biology). This extends to DNA computation, evolutionary computation, autonomic computation, morphological computation, morphogenetic computation, amorphous computation, and hyperdimensional computation.

According to Dominique Chu, Mikhail Prokopenko, and J. Christian J. Ray, "the most important class of natural computers can be found in biological systems that perform computation on multiple levels. From molecular and cellular information processing networks to ecologies, economies and brains, life computes. Despite ubiquitous agreement on this fact going back as far as von Neumann automata and McCulloch–Pitts neural nets, we so far lack principles to understand rigorously how computation is done in living, or active, matter".

View the full Wikipedia page for Biological computation

Computational biology in the context of E-science

E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing. The term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. It was coined by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has since been interpreted more broadly as "the application of computer technology to the undertaking of modern scientific investigation", including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications.

In 2014, the IEEE eScience Conference Series condensed the definition, in one of the working definitions used by the organizers, to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle". E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science... that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.

Turing Award winner Jim Gray imagined "data-intensive science" or "e-science" as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.

View the full Wikipedia page for E-science

Computational biology in the context of Gene prediction

In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.

In its earliest days, "gene finding" was based on painstaking experimentation on living cells and organisms. Statistical analysis of the rates of homologous recombination of several different genes could determine their order on a certain chromosome, and information from many such experiments could be combined to create a genetic map specifying the rough location of known genes relative to each other. Today, with comprehensive genome sequence and powerful computational resources at the disposal of the research community, gene finding has been redefined as a largely computational problem.

View the full Wikipedia page for Gene prediction

Computational biology in the context of Carnegie Mellon School of Computer Science

The School of Computer Science (SCS) at Carnegie Mellon University in Pittsburgh, Pennsylvania is a degree-granting school for computer science established in 1988, making it one of the first of its kind in the world. It has been consistently ranked among the best computer science programs in the world. As of 2024, U.S. News & World Report ranks the graduate program as tied for No. 1 with Massachusetts Institute of Technology, Stanford University and University of California, Berkeley.

Researchers from Carnegie Mellon School of Computer Science have made fundamental contributions to the fields of algorithms, artificial intelligence, computer networks, distributed systems, parallel processing, programming languages, computational biology, robotics, language technologies, human–computer interaction and software engineering.

View the full Wikipedia page for Carnegie Mellon School of Computer Science

Computational biology in the context of Sandia National Laboratories

Sandia National Laboratories (SNL), also known as Sandia, is one of three research and development laboratories of the United States Department of Energy's National Nuclear Security Administration (NNSA). Headquartered at Kirtland Air Force Base in Albuquerque, New Mexico, it has a second principal facility next to Lawrence Livermore National Laboratory in Livermore, California, and a test facility in Waimea, Kauaʻi, Hawaii. Sandia is owned by the U.S. federal government but privately managed and operated by National Technology and Engineering Solutions of Sandia, a wholly owned subsidiary of Honeywell International.

Established in 1949, SNL is a "multimission laboratory" with the primary goal of advancing U.S. national security by developing various science-based technologies. Its work spans roughly 70 areas of activity, including nuclear deterrence, arms control, nonproliferation, hazardous waste disposal, and climate change. Sandia hosts a wide variety of research initiatives, including computational biology, physics, materials science, alternative energy, psychology, MEMS, and cognitive science. Most notably, it hosted some of the world's earliest and fastest supercomputers, ASCI Red and ASCI Red Storm, and is currently home to the Z Machine, the largest X-ray generator in the world, which is designed to test materials in conditions of extreme temperature and pressure.

View the full Wikipedia page for Sandia National Laboratories