
The fish with the genome 30 times larger than ours gets sequenced


The African lungfish, showing its thin, wispy fins.

When it was first discovered, the coelacanth caused a lot of excitement. It was a living example of a group of fish that was thought to exist only as fossils. And not just any group of fish. With their long, stalk-like fins, coelacanths and their kin are thought to include the ancestors of all vertebrates that aren't fish: the tetrapods, or vertebrates with four limbs. Meaning, among a lot of other things, us.

Since then, however, evidence has piled up that we're more closely related to lungfish, which live in freshwater and are found in Africa, Australia, and South America. But lungfish are a bit weird. The African and South American species have seen the limb-like fins of their ancestors reduced to thin, floppy strands. And getting some perspective on their evolutionary history has proven difficult because they have the largest genomes known in animals, with the South American lungfish genome containing over 90 billion base pairs. That's 30 times the amount of DNA we have.

But new sequencing technology has made that sort of challenge manageable, and an international collaboration has now completed the largest animal genome sequenced to date, one in which every chromosome but one carries more DNA than the entire human genome. The work points to a history in which the South American lungfish genome has been adding 3 billion extra bases of DNA every 10 million years for the last 200 million years, all without gaining a significant number of new genes. Instead, the lungfish seems to have lost the ability to keep junk DNA in check.
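As a rough check on these figures, the short calculation below works through the arithmetic using only the rounded numbers quoted in this article: about 3 billion bases for the human genome, over 90 billion for the South American lungfish, and 3 billion bases gained every 10 million years over 200 million years. It is back-of-the-envelope arithmetic under those assumptions, not a model taken from the study.

```python
# Back-of-the-envelope arithmetic using the article's rounded figures.
# These are approximations quoted in the text, not values from the study's data.
HUMAN_GENOME_BP = 3e9        # roughly 3 billion base pairs
LUNGFISH_GENOME_BP = 90e9    # "over 90 billion base pairs"
BASES_ADDED = 3e9            # bases added per interval...
INTERVAL_YEARS = 10e6        # ...every 10 million years
SPAN_YEARS = 200e6           # ...for the last 200 million years

fold = LUNGFISH_GENOME_BP / HUMAN_GENOME_BP     # size relative to ours
rate_per_year = BASES_ADDED / INTERVAL_YEARS    # average bases gained per year
total_added = rate_per_year * SPAN_YEARS        # bases gained over the whole span

print(f"Lungfish genome is about {fold:.0f}x the human genome")
print(f"Average growth: about {rate_per_year:.0f} bases per year")
print(f"Total added over {SPAN_YEARS / 1e6:.0f} million years: "
      f"about {total_added / 1e9:.0f} billion bases")
```

On these rounded figures, the genome would have gained about 300 bases per year on average, or roughly 60 billion bases over the whole span.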

Going long

The work was enabled by a technology generically termed "long-read sequencing." Most previously completed genomes were sequenced using short reads, typically 100–200 base pairs long. The secret was to do enough sequencing that, on average, every base in the genome would be sequenced multiple times. Given that, a cleverly designed computer program could figure out where two bits of sequence overlapped and register them as a single, longer piece of sequence, repeating the process until it spat out long strings of contiguous bases.
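The overlap-and-merge idea can be sketched in a few lines of Python. This is a toy greedy assembler that assumes error-free reads and uses a made-up 19-base sequence; the helper names (`overlap`, `greedy_assemble`, `coverage`) are illustrative, and production assemblers rely on more elaborate graph-based methods that tolerate sequencing errors. The sketch repeatedly merges the pair of reads with the longest suffix-to-prefix overlap, and `coverage` computes how many times, on average, each base was sequenced.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads by largest overlap until no pair overlaps."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        # A read lying wholly inside a longer contig adds nothing; drop it.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # leftovers are separate contigs, i.e. gaps remain
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def coverage(reads: list[str], genome_len: int) -> float:
    """Average number of times each base is sequenced."""
    return sum(len(r) for r in reads) / genome_len


GENOME = "ATTAGACCTGCCGGAATAC"  # hypothetical toy sequence
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(f"coverage: {coverage(reads, len(GENOME)):.1f}x")
print(greedy_assemble(reads))
```

With these four overlapping reads, each base is covered about 2.1 times on average, and the merges rebuild the original 19-base sequence as a single contig.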

The problem is that most non-microbial species have stretches of repeated sequence longer than a few hundred bases (think hundreds of copies of the bases G and A in a row), as well as nearly identical sequences that show up in multiple locations in the genome. Short reads from these regions are impossible to match to a unique location, so the output of the genome assembly software ends up with lots of gaps of unknown length and sequence.
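To see why repeats defeat short reads, here is a minimal Python sketch using a hypothetical 31-base sequence in which an 8-base element (`ACGTTGCA`) appears twice. The `placements` helper, an illustrative name rather than any real tool's API, lists every position where a read matches exactly, assuming error-free reads. A read that falls entirely inside the repeat matches two locations, so an assembler cannot tell where it belongs; a read that also includes unique flanking bases matches only one.

```python
def placements(read: str, genome: str) -> list[int]:
    """Every start position where `read` matches `genome` exactly (overlaps allowed)."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


REPEAT = "ACGTTGCA"  # hypothetical repeated element
GENOME = "CATGG" + REPEAT + "TTAAC" + REPEAT + "GGTCC"

short_read = "CGTTGC"     # lies wholly inside the repeat
long_read = GENOME[3:16]  # spans the first repeat plus unique flanks

print(placements(short_read, GENOME))  # two candidate positions: ambiguous
print(placements(long_read, GENOME))   # a single position: unambiguous
```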

This creates extreme difficulty for genomes like that of the lungfish, which is filled with non-functional "junk" DNA, most of which is repetitive. Assembly software tends to produce a genome that's more gap than sequence.

Long-read technology gets around that by doing exactly what its name implies. Rather than sequencing fragments of 200 bases or so, it can generate sequences that are thousands of base pairs long, easily covering an entire repeat that would otherwise have created a gap. One early version of long-read technology involved stuffing long DNA molecules through pores and watching for changes in the electrical current across the pore as different bases passed through it. Another had a DNA-copying enzyme duplicate a long strand while the instrument watched for fluorescence changes as different bases were added. These early versions tended to be a bit error-prone but have since been improved, and several newer competing technologies are now on the market.
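The same logic suggests a rule of thumb: if reads must match exactly, a read has to be longer than the longest exact repeat before every window of the sequence can be placed uniquely. The brute-force Python sketch below, assuming error-free reads and using a hypothetical helper name, finds the smallest read length at which every substring of that length occurs only once, applied to runs of GA copies like the ones described earlier.

```python
def min_unique_read_length(genome: str) -> int:
    """Smallest k for which every k-base substring of `genome` occurs only once.

    Brute force, fine for toy inputs; real tools use suffix arrays or similar.
    """
    for k in range(1, len(genome) + 1):
        kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
        if len(kmers) == len(set(kmers)):
            return k
    return 0  # only reached for an empty sequence


# Hypothetical sequences: unique flanks around a tandem run of "GA" copies.
short_run = "TT" + "GA" * 5 + "CC"    # 10-base repeat run
long_run = "TT" + "GA" * 100 + "CC"   # 200-base repeat run

print(min_unique_read_length(short_run))
print(min_unique_read_length(long_run))
```

For the 200-base run, the answer is 199 bases, already at the upper end of typical short reads, whereas reads thousands of bases long clear it easily. Real repeats are often much longer and only nearly identical, so this is an illustration rather than a design rule.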

Back in 2021, researchers used this technology to complete the genome of the Australian lungfish, the one that retains the limb-like fins of the ancestors that gave rise to tetrapods. Now they're back with the genomes of the African and South American species. These species seem to have gone their separate ways during the breakup of the supercontinent Gondwana, a process that started nearly 200 million years ago. Having the genomes of all three should give us some perspective on which features are common to all lungfish species, and thus more likely to have been shared with the distant ancestors that gave rise to tetrapods.

Source: arstechnica.com
