In this lesson, we cover key biology and computing terms that are essential for a strong understanding of computational biology. The most important terms and their definitions are listed below.

Algorithm: A step-by-step procedure or formula for solving a problem, often used in computational tasks to analyze biological data.
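
As a small illustration, the function below sketches one very simple algorithm: computing the GC content of a DNA sequence (the fraction of its bases that are G or C). The function name `gc_content` is hypothetical, chosen only for this example.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    # Step 1: normalize case so 'g' and 'G' are counted the same way
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Step 2: count the G and C bases
    gc = seq.count("G") + seq.count("C")
    # Step 3: divide by the total number of bases
    return gc / len(seq)


print(gc_content("ATGCGC"))  # 4 of the 6 bases are G or C
```

Each numbered step is unambiguous and always finishes, which is what makes this a (tiny) algorithm.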

Bioinformatics: The application of computational tools and techniques to analyze and interpret biological data, particularly large datasets generated by high-throughput techniques.

Biological Database: A collection of data that is organized for quick search and retrieval, often containing genetic, genomic, proteomic, or other biological information.

Command line interface: A means of interacting with a computer whereby the user issues commands in the form of successive lines of text. The term 'shell', or 'UNIX shell', refers to a command line interpreter for the UNIX/Linux operating system. Microsoft provides a command line interface to Windows, but this is not commonly used in bioinformatics.

Computational Genomics: The use of computational tools to study the genomes of organisms: mapping them and understanding their structure, function, and evolution.

DEG (Differentially expressed genes): Genes whose expression levels differ significantly, in the statistical sense, between two or more groups, such as a disease group and a healthy control group.
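
A minimal sketch of the fold-change part of a DEG analysis, assuming we already have a mean expression value for each gene in each group. Real analyses also apply a statistical test with multiple-testing correction (tools such as DESeq2 or edgeR handle this); the gene names, values, pseudocount of 1, and threshold of 1 (a two-fold change) are illustrative assumptions.

```python
import math


def log2_fold_change(disease_mean: float, control_mean: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of disease to control expression; the pseudocount avoids log(0)."""
    return math.log2((disease_mean + pseudocount) / (control_mean + pseudocount))


def candidate_degs(means: dict[str, tuple[float, float]], threshold: float = 1.0) -> list[str]:
    """Return genes whose |log2 fold change| meets the threshold (fold change only, no p-values)."""
    return [
        gene
        for gene, (disease, control) in means.items()
        if abs(log2_fold_change(disease, control)) >= threshold
    ]


# Hypothetical mean expression values: (disease group, healthy control group)
example = {
    "GENE_A": (199.0, 49.0),  # 4x higher in disease after the pseudocount
    "GENE_B": (50.0, 52.0),   # essentially unchanged
    "GENE_C": (9.0, 39.0),    # 4x lower in disease after the pseudocount
}
print(candidate_degs(example))  # ['GENE_A', 'GENE_C']
```

Using log2 makes up- and down-regulation symmetric: a four-fold increase gives +2 and a four-fold decrease gives -2.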

DNA: Stands for deoxyribonucleic acid, the molecule that holds the genetic information in your cells and in almost all other living organisms.
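
DNA is double-stranded, and its bases pair A with T and C with G. Because of this, a very common first operation in sequence analysis is the reverse complement: the sequence of the opposite strand, read in its own 5' to 3' direction. A minimal sketch (the function name is hypothetical):

```python
# Watson-Crick base pairing: A<->T, C<->G (upper- and lowercase)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (opposite strand, read 5'->3')."""
    # Complement every base, then reverse the order
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCCA"))  # TGGCAT
```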

DNA Sequencing: The process of determining the exact order of nucleotides within a DNA molecule.

Gene Expression Profiling: Measuring the activity (the expression) of thousands of genes at once to create a global picture of cellular function.
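
Profiling data often arrive as a table of read counts per gene per sample, and samples sequenced to different depths must be normalized before they can be compared. The sketch below uses counts per million (CPM), one simple normalization among several; the gene names and counts are made up for illustration.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale one sample's raw read counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical raw counts for a single sample
sample = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 0}
print(counts_per_million(sample))
```

After scaling, a gene's value no longer depends on how many reads the sample happened to receive in total, so the same gene can be compared across samples.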

Genome: The entire set of DNA instructions found in a cell of an organism. The human genome consists of 23 pairs of chromosomes located in the cell's nucleus, plus a small chromosome in the cell's mitochondria. It contains all the information needed for an individual to develop and function.

Genomics: The study of genomes. It aims to map, sequence, and analyze the structure and function of genomes, using techniques such as genome assembly, annotation, and comparison across species.

Genome Assembly: The process of taking fragments of DNA sequences and reassembling them into their original, continuous genomic sequence.
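
A toy sketch of the idea behind assembly, using a greedy strategy: repeatedly merge the two fragments with the longest suffix-prefix overlap. Real assemblers use more robust methods (for example, overlap graphs or de Bruijn graphs) and must cope with sequencing errors and repeats. This version assumes error-free reads, no read contained inside another, and a hypothetical minimum overlap of 3 bases.

```python
def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_length)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Merge reads greedily by largest overlap; returns one or more contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        # Find the ordered pair (i, j) with the largest overlap of reads[i] onto reads[j]
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_length)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        # Remove the two merged reads (higher index first so indices stay valid)
        for index in sorted((best_i, best_j), reverse=True):
            reads.pop(index)
        reads.append(merged)
    return reads


fragments = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(fragments))  # ['ATTAGACCTGCCGGAATAC']
```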

Genome Annotation: The process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do.
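
One early step in annotation is predicting where genes might be. A very simple starting point is scanning for open reading frames (ORFs): stretches that begin with a start codon (ATG) and continue, three bases at a time, to a stop codon (TAA, TAG, or TGA). The sketch below scans only the three forward reading frames and ignores introns and the reverse strand, so it is a teaching aid rather than a gene predictor; the minimum length is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs; end is exclusive and includes the stop codon.

    min_codons counts codons including the start and stop codons.
    ORFs without a stop codon before the end of the sequence are not reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # the first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next start codon
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```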

Homology Modeling: A method in computational biology to predict the three-dimensional structure of a given protein sequence based on known structures of homologous proteins.

Ligand: A molecule that binds to another molecule, called a receptor, to send signals within or between cells. In pharmacology, any drug that binds to a biological target such as a protein is a ligand.

Machine Learning: A subset of artificial intelligence involving algorithms and statistical models that computer systems use to perform tasks without using explicit instructions, relying on patterns and inference instead.

Metagenomics: The study of genetic material recovered directly from environmental samples, allowing the study of communities of microorganisms without the need for culturing them.

Molecular Dynamics Simulation: A computer simulation method for studying the physical movements of atoms and molecules, used to understand the structure, dynamics, and thermodynamics of biological molecules.

Next-Generation Sequencing (NGS): High-throughput sequencing technologies that have revolutionized genomic research by allowing entire genomes to be sequenced far more quickly and cheaply than with earlier methods such as Sanger sequencing.
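
NGS instruments report a quality score for every base they call. In the widely used FASTQ format, each score is stored as a single ASCII character. The sketch below assumes the common Phred+33 encoding and converts those characters back into error probabilities with P = 10^(-Q/10); the function name is hypothetical.

```python
def phred33_to_error_probs(quality: str) -> list[float]:
    """Decode a Phred+33 quality string into per-base error probabilities."""
    probs = []
    for char in quality:
        q = ord(char) - 33             # Phred quality score Q
        probs.append(10 ** (-q / 10))  # P(error) = 10^(-Q/10)
    return probs


# 'I' encodes Q40 (about 1 error in 10,000); '+' encodes Q10 (1 error in 10)
print(phred33_to_error_probs("I+"))
```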

Nucleotide: The building block of RNA and DNA. It consists of three parts: a sugar (ribose in RNA, deoxyribose in DNA), a phosphate group, and a nitrogenous base (A, C, G, or T in DNA; A, C, G, or U in RNA).

Pathogenesis: Describes the mechanisms by which a disease develops, progresses, and either persists or is resolved.

Pharmacogenomics: Study of how genes affect a person's response to drugs.

Phylogenetics: The study of evolutionary relationships among biological entities – often species, individuals, or genes – based on genetic data.
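
Many phylogenetic methods, such as distance-based tree building, start from a matrix of pairwise differences between aligned sequences. A minimal sketch using the simple p-distance (the proportion of sites that differ), assuming the sequences are already aligned to the same length; the taxon names and sequences are made up. The p-distance does not correct for multiple substitutions at the same site, which evolutionary models such as Jukes-Cantor do.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every pair of named sequences."""
    return {(n1, n2): p_distance(seqs[n1], seqs[n2]) for n1, n2 in combinations(seqs, 2)}


aligned = {
    "taxon_1": "ACGTACGTAC",
    "taxon_2": "ACGTACGTTC",
    "taxon_3": "TCGAACGTTC",
}
for pair, d in distance_matrix(aligned).items():
    print(pair, d)
```

The two most similar sequences (taxon_1 and taxon_2, differing at one site) would be grouped together first by a distance-based tree method.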

Pipeline: In computer jargon, a series of steps, or software tools, run in a specified order, where the input to one tool may be the output of a previous tool. Pipelines can also include automated logical decisions.
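
A toy sketch of the idea: each step is a function whose output becomes the next step's input, and one step makes an automated decision (dropping reads that are too short). The step names and the 5-base cutoff are hypothetical. In practice, bioinformatics pipelines usually chain command-line tools and are often managed with workflow systems such as Snakemake or Nextflow.

```python
from typing import Callable, Iterable

Step = Callable[[list[str]], list[str]]


def uppercase(reads: list[str]) -> list[str]:
    """Step 1: normalize sequence case."""
    return [r.upper() for r in reads]


def drop_short(reads: list[str], min_length: int = 5) -> list[str]:
    """Step 2: an automated decision, keeping only reads of at least min_length bases."""
    return [r for r in reads if len(r) >= min_length]


def drop_ambiguous(reads: list[str]) -> list[str]:
    """Step 3: discard reads containing ambiguous 'N' bases."""
    return [r for r in reads if "N" not in r]


def run_pipeline(data: list[str], steps: Iterable[Step]) -> list[str]:
    """Run each step in order, feeding one step's output into the next."""
    for step in steps:
        data = step(data)
    return data


reads = ["acgtacgt", "ACG", "ACGTNACGT", "ttggccaa"]
print(run_pipeline(reads, [uppercase, drop_short, drop_ambiguous]))
```

The order matters: `uppercase` runs first so that a lowercase `n` is still caught by `drop_ambiguous`.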

Proteomics: The large-scale study of proteins, particularly their structures and functions.

RNA: Stands for ribonucleic acid and is present in all living cells. Its best-known role is as a messenger: under the central dogma (DNA → RNA → protein), RNA carries the instructions encoded in DNA to the ribosomes, where proteins are made. In some viruses, such as retroviruses and other RNA viruses, RNA itself serves as the genome that stores the genetic information.

rRNA: Stands for ribosomal RNA and is present in ribosomes. It is a molecule essential for protein synthesis.

mRNA: Stands for messenger RNA. It carries the genetic instructions copied (transcribed) from DNA to the ribosomes, which translate them into protein.
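
To make the role of mRNA concrete, here is a simplified sketch of the two steps of the central dogma: transcription (DNA to mRNA) and translation (mRNA to protein). Transcription is modeled by copying the coding strand with U in place of T, and translation reads codons (groups of three bases) using the standard genetic code, from the first AUG until a stop codon. It ignores splicing and other RNA processing, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons ordered by bases U, C, A, G
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Simplified transcription: the mRNA matches the coding strand, with U instead of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first stop codon ('*'), one codon at a time."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # a stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("GGATGTTTGGCTAAGC")
print(mrna, translate(mrna))  # GGAUGUUUGGCUAAGC MFG
```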

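miRNA: Stands for microRNA: small non-coding RNA molecules (about 22 nucleotides long) that regulate gene expression, typically by binding to target mRNAs and reducing their translation or stability. They influence many cell processes, from apoptosis to cell proliferation.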

Sequence Alignment: The arrangement of two or more sequences of DNA, RNA, or protein to identify regions of similarity that may indicate functional, structural, or evolutionary relationships between the sequences.
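
Pairwise alignment is commonly computed with dynamic programming. Below is a minimal sketch of the Needleman-Wunsch global alignment algorithm with a linear gap penalty. The scoring scheme (match +1, mismatch -1, gap -2) is an arbitrary assumption; real tools typically use substitution matrices and affine gap penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align two bases
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


best, top, bottom = global_align("GATTACA", "GATACA")
print(best)
print(top)
print(bottom)
```

Dashes in the output mark gaps; several alignments can share the same optimal score, and the traceback returns just one of them.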

Single Nucleotide Polymorphism (SNP): A variation in a single nucleotide at a specific position in the genome. SNPs are often used as markers in genetics research.
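
A minimal sketch of spotting candidate SNPs by comparing a sample sequence with a reference sequence, position by position. Real variant calling works from many aligned reads and weighs base quality and coverage; here we assume two already-aligned, gap-free sequences of equal length, and the reported positions are 0-based. The function name is hypothetical.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each single-base difference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()))
        if ref != alt
    ]


print(find_snps("ACGTTGCA", "ACGTCGCA"))  # [(4, 'T', 'C')]
```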

Structural biology: The study of the three-dimensional structures of biological molecules and macromolecules, such as proteins and carbohydrates, to gain insight into how these molecules function and why mutations can cause disease.

Systems Biology: A holistic approach to understanding living organisms. Instead of studying each system in isolation, it seeks to understand the complex interactions within and between biological systems. Computational models and simulations are an integral part of this approach.

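Transcription Factor: A protein that controls transcription (the copying of DNA into RNA), typically by binding to specific DNA sequences near a gene and increasing or decreasing that gene's expression.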

Transcriptomics: The study of the complete set of RNA transcripts produced by the genome at any one time, often used to understand gene expression patterns.