Discuss how DNA sequence data can show evolutionary relationships between species.
Evolution: DNA Sequence Data and Species Relationships 🧬
1. Why DNA? 🌱
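DNA is the universal “instruction manual” that tells every cell how to build and run the organism. Species inherit their DNA from their ancestors, and mutations accumulate over generations, so species that share a recent common ancestor have more similar sequences than species that diverged long ago. Comparing sequences is like comparing copies of a family recipe: the fewer the changes between two copies, the more recently they were copied from the same original.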
2. Comparing DNA Sequences 🔍
Scientists align DNA sequences from two or more species so that corresponding positions sit in the same column. The resulting pattern of matches and mismatches shows where the sequences have changed since the species shared a common ancestor.
- Alignment tools (e.g., ClustalW, MUSCLE) automatically arrange sequences.
- Gaps (‑) represent insertions or deletions (indels) that have occurred over time.
- High similarity → close relationship; low similarity → distant relationship.
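To make the idea of alignment concrete, here is a minimal sketch of a global pairwise alignment in the style of the Needleman–Wunsch algorithm. It is not how ClustalW or MUSCLE work internally (those are multiple-sequence aligners with more elaborate scoring). The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the two short sequences are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return the two gapped strings."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for placing a[i-1] and b[j-1] in the same column.
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (made up): the second one is missing a single base.
aligned_a, aligned_b = needleman_wunsch("ATGACCTGAA", "ATGACTGAA")
print(aligned_a)
print(aligned_b)
```

The second printed line contains one `-`, marking the position where a base is missing relative to the first sequence (an indel).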
3. Measuring Genetic Distance 📏
Once sequences are aligned, we count differences. The simplest measure is the proportion of differing sites:
$D = \frac{\text{Number of differing bases}}{\text{Total aligned bases}}$
For example, a human and a chimpanzee sequence that are 98.8 % identical over the aligned region differ at 1.2 % of sites, giving a genetic distance of $D = 1 - 0.988 = 0.012$.
| Species compared with human | DNA region | Identity to human (%) |
|---|---|---|
| Chimpanzee | mtDNA COI | 98.8 |
| House mouse | mtDNA COI | 87.5 |
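The distance formula above can be sketched in a few lines of Python. The function name `p_distance` and the toy sequences are assumptions for illustration. Skipping columns that contain a gap is one common convention, not the only one.

```python
def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue  # assumption: ignore columns that contain a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


# Toy aligned sequences (made up): one mismatch in ten sites.
d = p_distance("ATGACCTGAA", "ATGACTTGAA")
print(d)  # 0.1
```

The percentage identity in the table corresponds to `100 * (1 - d)`.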
4. Building Phylogenetic Trees 🌳
Phylogenetic trees are “family trees” that show how species diverged from common ancestors. Several algorithms help build these trees from genetic distance data.
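- UPGMA (Unweighted Pair Group Method with Arithmetic Mean) – repeatedly merges the two closest clusters and assumes a constant rate of evolution (molecular clock). Good for quick, approximate trees.
- Neighbor‑Joining – also builds the tree from the distance matrix by joining pairs step by step, but does not assume a constant rate, so lineages can evolve at different speeds.
- Maximum Likelihood – works directly from the aligned sequences rather than a distance matrix. It evaluates many possible trees under a model of sequence evolution and picks the one that makes the observed data most probable, but is computationally intensive.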
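As a rough illustration of the simplest of these methods, the sketch below implements the clustering step of UPGMA on a small distance matrix. The taxon labels `A` to `D` and all distances are made-up values, and the function returns only the branching order as nested tuples (branch lengths are left out to keep it short).

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance for every pair.
    """
    # Each current cluster maps to the number of leaves it contains.
    sizes = {name: 1 for name in names}
    d = dict(dist)
    while len(sizes) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the distances from its two parts.
        for c in sizes:
            if c == a or c == b:
                continue
            d[frozenset((merged, c))] = (
                sizes[a] * d[frozenset((a, c))]
                + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes))


# Made-up pairwise distances for four hypothetical taxa.
pairs = {
    ("A", "B"): 0.02,
    ("A", "C"): 0.10,
    ("B", "C"): 0.12,
    ("A", "D"): 0.30,
    ("B", "D"): 0.32,
    ("C", "D"): 0.28,
}
dist = {frozenset(k): v for k, v in pairs.items()}
tree = upgma(["A", "B", "C", "D"], dist)
print(tree)
```

In the result, `A` and `B` are grouped first because they have the smallest distance. `C` joins them next, and `D` branches off earliest.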
5. The Molecular Clock ⏱️
If mutations accumulate at a roughly constant rate, we can estimate how long ago two species shared a common ancestor. The basic equation is:
$t = \frac{D}{2r}$
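Where $t$ is the time since divergence (in million years), $D$ is the genetic distance, and $r$ is the substitution rate per site per million years along a single lineage. The factor of 2 appears because both lineages have been accumulating changes since they split. For example, if humans and chimpanzees have $D = 0.012$ and we assume a rate of $r = 0.01$ per million years, then:
$t = \frac{0.012}{2 \times 0.01} = 0.6$ million years ago (≈ 600 kyr). Fossil‑calibrated estimates are around 6–7 million years, roughly ten times older. The mismatch shows that the assumed rate is far too high for these data and that the clock must be calibrated against fossil evidence: a rate of $r = 0.001$ per million years would give $t = 6$ million years.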
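The same arithmetic can be run in both directions: from a rate to a divergence time, or from a dated split back to a rate (calibration). The sketch below uses the figures from the worked example. The function names are hypothetical, and the 6 million year calibration point is simply the lower end of the 6–7 million year range quoted above.

```python
def divergence_time(distance, rate):
    """Time since divergence (Myr) from t = D / (2r).

    rate is the substitution rate per site per million years on one lineage.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate)


def calibrated_rate(distance, known_time):
    """Rate implied by a divergence time dated independently (e.g. by fossils)."""
    if known_time <= 0:
        raise ValueError("known_time must be positive")
    return distance / (2 * known_time)


t_naive = divergence_time(0.012, 0.01)   # assumed rate from the example
r_fossil = calibrated_rate(0.012, 6.0)   # rate implied by a 6 Myr split
print(round(t_naive, 3), "million years")
print(round(r_fossil, 4), "per site per million years")
```

Calibrating on a dated split and then applying the resulting rate to other species pairs is the usual way the clock is used. It is only as reliable as the assumption that the rate stays roughly constant across those lineages.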
6. Real‑World Example: Bird Species 🐦
DNA barcoding uses a short region of the mitochondrial COI gene to identify species. Researchers sequenced COI from 50 bird species; three of them are listed below:
| Species | COI Sequence (first 10 bases) | Scientific Name |
|---|---|---|
| Northern Cardinal | ATGACCTGAA | Cardinalis cardinalis |
| Blue Jay | ATGACCTGAA | Cyanocitta cristata |
| American Robin | ATGACCTGAA | Turdus migratorius |
The first 10 bases shown here are identical in all three species, so they cannot tell the birds apart on their own. The differences that separate species lie further along the barcode region, which is roughly 650 bases long. Read in full, the COI “barcode” works like a fingerprint that lets scientists quickly identify which species a specimen belongs to, even if it looks similar to another species.
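Identification by barcode amounts to finding the reference sequence closest to an unknown sample. The sketch below shows that lookup under strong simplifying assumptions: the labels `species_A` to `species_C` and all sequences are made-up 16-base strings, not real COI data, and every sequence is assumed to be already aligned to the same length.

```python
def identify(query, references):
    """Return (label, distance) of the reference barcode closest to query."""

    def p_dist(s, t):
        # Proportion of differing sites; sequences must be the same length.
        if len(s) != len(t):
            raise ValueError("sequences must be aligned to the same length")
        return sum(x != y for x, y in zip(s, t)) / len(s)

    best = min(references, key=lambda label: p_dist(query, references[label]))
    return best, p_dist(query, references[best])


# Hypothetical reference library (toy sequences, not real barcodes).
references = {
    "species_A": "ATGACCTGAACGTTAG",
    "species_B": "ATGACCTGAATGCTAG",
    "species_C": "ATGACCTGAACGATCG",
}
result = identify("ATGACCTGAACGTTAA", references)
print(result)  # ('species_A', 0.0625)
```

In practice a match is only accepted if the distance falls below some threshold. That threshold is a judgment call that depends on how much variation exists within each species.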
7. Summary & Key Takeaways ✨
- DNA sequences are the raw data that reveal evolutionary relationships.
- Alignment + genetic distance → a quantitative measure of relatedness.
- Phylogenetic trees translate distances into visual family trees.
- The molecular clock lets us estimate divergence times, but requires careful calibration.
- DNA barcoding is a practical tool for species identification in the field.