Gone fishin’

We have been gibbering for many, many pages, at least nominally on the topic ‘Common Design vs Common Descent’. I’d like here to discuss an interesting fact I discovered during the course of this. I mentioned it twice, but readers seemed underwhelmed by what, to me, looks like a genuine scientific discovery (I haven’t searched exhaustively for priority). More importantly for present purposes, it provides a useful test-bed for several concepts that regularly do the rounds.

Sal Cordova had latterly been attempting to demonstrate, using freely available phylogenetic tools, that tetrapods are not descended from aquatic, gilled vertebrates – what we conventionally call ‘fish’. In the process, Sal chose to use gene sequences of what turned out to be cytochrome oxidase (COX), a component of the electron transport chain in mitochondria and in prokaryote plasma membranes. This molecule is ultimately what breathing is all about. The molecular oxygen we inhale comes as diatomic molecules, held together by electrons shared between the two atoms. COX donates electrons, passed along from cytochrome c, which enable the oxygen atoms to be separated; these then react with protons (a different reaction for each of the two atoms) to form water. In the process, additional protons are moved across the membrane, generating an energetic imbalance that can be tapped, much like a pumped-storage hydro-electric scheme, to make ATP.
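For reference, textbook accounts write the overall reaction catalysed by COX, per O₂, as below. Four electrons come from reduced cytochrome c, four ‘substrate’ protons end up in the two water molecules, and four more protons are pumped across the membrane. Here ‘in’ is the matrix side in mitochondria (the cytoplasm in prokaryotes) and ‘out’ is the other side of the membrane. This is the standard stoichiometry, included only as a compact summary of the paragraph above.

```latex
\[
4\,\text{cyt}\,c^{2+} + \mathrm{O_2} + 8\,\mathrm{H^{+}_{in}}
\;\longrightarrow\;
4\,\text{cyt}\,c^{3+} + 2\,\mathrm{H_2O} + 4\,\mathrm{H^{+}_{out}}
\]
```

Charge balances at +16 on each side. The electrons account for splitting the O=O bond, and the four pumped protons make up the ‘pumped storage’.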

Sal thought he was looking at cytochrome c. That molecule is actually one step ‘upstream’ from COX, supplying it with the electrons that the latter passes on to oxygen. In eukaryotes, it so happens that the cytochrome c gene is in nuclear DNA, while the core catalytic subunits of COX are encoded in mitochondrial DNA. This creates a significantly different evolutionary environment for the two genes: the mitochondrial genes sit close to the reactive oxygen species generated there, while the nuclear gene is subject to the different dynamics of recombination. It would be unwise to draw too many evolutionary conclusions without an awareness of this distinction.

During the course of this discussion, Rumraket took some sequences of the actual cytochrome c, the molecule Sal thought he was looking at, and used UniProt to generate a phylogeny for them. In an idle moment, I thought it might be interesting to explore the taxonomic neighbourhood a bit. It seems to me fundamentally inconsistent to dismiss phylogenetic inference on the grand scale while accepting it more locally. Of course, it’s not clear whether any Creationists accept it locally either! I’ve tried repeatedly to get someone to say at what taxonomic level they think Common Descent stops being a valid inference; no luck so far.

So here’s what I did. Rather than pursue the grand-scheme analysis favoured by the ‘I-ain’t-no-fish’ brigade, I wanted to see what the cytochrome c of the salmon’s close relatives looked like. I took Rumraket’s sequence for the Atlantic Salmon, Salmo salar, and ran a BLAST search. It came up with several paralogs (gene duplicates within the salmon) and orthologs (the corresponding gene in other species) with >90% sequence identity. I say this a lot, but the only process I am aware of that produces sequence identities anywhere near that ballpark is nucleic acid polymerisation. This mostly happens during DNA replication (DNA-DNA) and gene transcription (DNA-RNA), and somewhat less often by reverse transcription from RNA to DNA. So, absent another process that generates such extensive identity, common descent (at the sequence level) has to be the front runner to explain any such highly alignable sequences.
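To put a number on ‘that ballpark’, here is a deliberately crude calculation. How likely is it that two unrelated protein sequences would match at 97 or more of 104 aligned positions (roughly the sea louse figure below) purely by chance? The model is an assumption chosen for simplicity, not realism: every residue is one of 20 equally likely amino acids, and positions are independent. Real amino-acid frequencies are uneven, and functional constraint makes positions non-independent. So this only rules out blind chance. It says nothing by itself about other explanations.

```python
import math

def log10_chance_identity(n: int, k: int, p: float) -> float:
    """log10 of P(X >= k) for X ~ Binomial(n, p): the chance that at least
    k of n aligned positions match if each matches independently with prob p."""
    tail = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return math.log10(tail)

# Assumed toy model (not biology): 104 aligned residues, each of 20 amino
# acids equally likely, positions independent of one another.
N_SITES = 104
P_MATCH = 1 / 20

for k in (97, 100, 103):  # roughly 93%, 96% and 99% identity
    pct = 100 * k / N_SITES
    print(f"{k}/{N_SITES} ({pct:.1f}%): log10 P = {log10_chance_identity(N_SITES, k, P_MATCH):.1f}")
```

Under this model the chance of even the lowest of these identities comes out well below 10^-100. That is the quantitative intuition behind not treating such matches as coincidence.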

So, here are the top few hits for Salmon cytochrome c sequence B5XFR7 (link here but be quick; I don’t think these are archived for long):

Rainbow Trout (Oncorhynchus mykiss) C1BFD3 99%
Salmon paralog B5DFW1 99%
Salmon paralog B9EMZ7 98.1%
Rainbow Trout paralog C1BGL1 98.1%
Rainbow Trout paralog C1BG90 95.2%
Salmon paralog B9EMJ0 94.2%
Rainbow Trout paralog C1BFB8 94.2%
Large Yellow Croaker (Larimichthys crocea) 94.2%
Sea louse (Caligus rogercresseyi) C1BPA2 93.3%
Rainbow smelt (Osmerus mordax) C1BKE6 93.3%

… ?

Whoa, back up a bit … the sea louse? That’s a crustacean. So I looked up this organism and, it turns out, it is a parasite of both Rainbow Trout and Atlantic Salmon, particularly in Chile (yes, they farm Atlantic salmon in the Pacific). So, pending more work, a viable explanation for the high sequence identity with Salmon and Rainbow Trout is horizontal gene transfer (HGT) from one or the other (hard to tell which). If not HGT, it is a remarkable coincidence that the words ‘salmon’ and ‘trout’ should leap off the page when I looked up the organism.

A naive user of free phylogenetic software might try and argue that this anomaly completely destroys the use of such software. After all, a phylogeny based only on cytochrome c might perch this crustacean slap dab in plaice in the middle of the Salmonids (it could also be used to show that salmon are more closely related to trout than they are to their own genes, in clumsy hands …). This is obviously a bit fishy, because we know (ask yourself: how do we know?) that it doesn’t belong there.

So, here are a few questions for Creationists to consider.

1) Do you accept that high sequence identity is reasonably inferred to indicate an origin in a single ancestral DNA sequence?

2) If so, do you agree that the paralogs – multiple copies in the same species – likely share common ancestry with an original single sequence?

3) How about within the family Salmonidae, to which both Atlantic Salmon and Rainbow trout belong (the trout is actually in the same genus as the various Pacific salmon)? Do you think sequence commonality highly likely indicates a genetic relationship here too?

4) Do you agree that the likeliest cause of the high sequence identity of the sea louse version is gene transfer from salmon or trout, given its intimate association with these species?

5) What makes us think that the sequence in sea lice is anomalous, rather than indicating an actual whole-genome pattern, without even looking at the genome?

6) For broader genomic comparisons, collecting together multiple gene trees, where in the standard taxonomy would you place the division between the region that is reasonably inferred to result from common descent, and that which is not?

Another issue relates to function, and the need or otherwise for functional variation among species or among paralogs. As discussed, the task of cytochrome c and COX is chemically quite low-level: to pass electrons on down a chain. On a design paradigm, the fact that the sequences performing this function vary at all is rather curious, unless we allow that variation can occur through largely neutral mutation. Even stranger is the fact that variation within a single species among its various paralogs is of the order of that between some orthologs in separate genera of the Salmonidae. If one answered ‘yes’ to 2 but ‘no’ to 3, one is saying that the same amount of sequence identity is diagnostic of sequence relationship plus mutation in one case, but not in the other. One might appeal to the evolutionary unattainability of a transition, but the gap between salmon and trout doesn’t seem to me unbridgeable, starting from a common ancestor. After all, as noted, Pacific ‘salmon’ are more closely related to trout than they are to Atlantic Salmon (I’m curious that no other Oncorhynchus species came up, but this may be due to lack of sequences). So, it seems plausible to me that all these gene variants are commonly descended, through fully connected paths of nucleic acid polymerisation.
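A rough way to see the paralog/ortholog comparison is to turn the BLAST percentages from the hit list back into counts of differing residues. The sketch below assumes every hit was aligned over the full 104 residues of the salmon protein (the length of the sequence quoted later in the thread). BLAST reports identity over the aligned region only, so that is an assumption. Even so, every listed percentage does come out as a whole number of differences on that basis.

```python
# Percent identities to salmon B5XFR7, copied from the hit list above.
# "within" = a salmon paralog; "between" = a gene from another species.
hits = [
    ("Rainbow Trout C1BFD3", "between", 99.0),
    ("Salmon paralog B5DFW1", "within", 99.0),
    ("Salmon paralog B9EMZ7", "within", 98.1),
    ("Rainbow Trout paralog C1BGL1", "between", 98.1),
    ("Rainbow Trout paralog C1BG90", "between", 95.2),
    ("Salmon paralog B9EMJ0", "within", 94.2),
    ("Rainbow Trout paralog C1BFB8", "between", 94.2),
    ("Large Yellow Croaker", "between", 94.2),
    ("Sea louse C1BPA2", "between", 93.3),
    ("Rainbow smelt C1BKE6", "between", 93.3),
]

ALIGNED_LENGTH = 104  # assumption: full-length, gap-free alignment

def implied_differences(percent: float, length: int = ALIGNED_LENGTH) -> int:
    """Nearest whole number of mismatched residues implied by a percent identity."""
    return round(length * (100 - percent) / 100)

def consistent(percent: float, length: int = ALIGNED_LENGTH) -> bool:
    """True if that whole number of differences rounds back to the reported figure."""
    d = implied_differences(percent, length)
    return round(100 * (length - d) / length, 1) == percent

for name, group, pct in hits:
    print(f"{name:30s} {group:8s} {pct:5.1f}%  ~{implied_differences(pct)} differences")

within = sorted({implied_differences(p) for _, g, p in hits if g == "within"})
print("salmon paralogs differ from the query at", within, "positions")
```

On that reading the salmon’s own paralogs differ from the query at 1, 2 and 6 positions, while the trout genes differ at 1, 2, 5 and 6. That is the same range, which is the point made above.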

This pattern of otherwise inexplicable variation in non-morphological genes is repeated throughout the genome and throughout nature. ‘Common-Designists’ would have it that there’s a Psilocybe way of dehydrogenating lactate, and a yeast way, and a bottlenose dolphin way. And there’s a dandelion way of passing electrons on, and a squirrel way, and an E. coli way, and a sea louse way … but hang on; the sea louse way is pretty close to the salmon way, in this gene copy at least. If this is an HGT event, it indicates that there is much less species-specificity in function than some would argue: it is a natural experiment, placing a variant in a distant relative and seeing how it fares. Assuming it’s not a pseudogene, the sea louse is apparently able to use the cytochrome c of a very distant relative in addition to its existing repertoire, with no obvious adverse consequences. More work would establish which of its isoforms tend to be preferred, but nonetheless the bare fact rather goes against the expectation that these proteins are highly clade-specific for design reasons. And indeed it seems much more likely that genes for low-level functions can be passed around without excessive penalty than that more taxon-specific genes can: the variants are mutational fluff, not critically functional. This is one reason why aminoacyl-tRNA synthetase phylogenies are actually quite poor. Extensive HGT is possible, because really, how many different ways do you need to stick an amino acid on the CCA end of a tRNA?

141 thoughts on “Gone fishin’”

  1. colewd,

    This is the process that takes you from an egg to a fish.

    And therefore involves everything that keeps you alive. Very helpful, not.

    Allan: That is largely down to molecules that affect gene expression, not proteins like Cyt c.

    colewd: This is a statement you continue to make despite the evidence. You are confirming that evolutionary indoctrination is a destructive filter in understanding advances in biology.

    This isn’t evolution, this is basic genetics. No ‘advance in biology’ related to the specificity of the cytochrome c isoform for the final form of the organism exists outside your own head.

    The only conclusion I can make here is that all you have is an assertion based on the evolutionary paradigm.

    You can conclude what you like, I have only so much time to waste on the terminally deaf.

    colewd: Your conclusion from this is?

    That the cytochrome c isoform is not involved in determining the final form of the organism. You have organisms as varied as humans and gibbons with the same isoform.

    Your argument consists of this: organisms vary, cytochrome c varies, therefore cytochrome c is involved in organismal variation. It’s pure garbage.

  2. Mung,

    Where would you suggest he look?

    Don’t really care; the thread’s not about that. Cytochrome c is just one of Bill’s trigger words.

  3. 1) Do you accept that high sequence identity is reasonably inferred to indicate an origin in a single ancestral DNA sequence?

    No.

    Even Rumraket knows mere similarity is not enough.

  4. Mung,

    No.

    Even Rumraket knows mere similarity is not enough.

    We are talking of high sequence identity rather than ‘mere similarity’, of course – highly aligned sequence.

    But … Rumraket knows it’s not enough to reasonably infer common origin? Let’s ask, shall we?

  5. Allan Miller,

    That the cytochrome c isoform is not involved in determining the final form of the organism. You have organisms as varied as humans and gibbons with the same isoform.

    Your argument consists of this: organisms vary, cytochrome c varies, therefore cytochrome c is involved in organismal variation. It’s pure garbage.

    How did you come up with this as my claim? Why are you invoking logical fallacies to make your point? If this is an error on my part or yours please point it out.

  6. colewd:
    Rumraket,

    Why would the reuse of parts not show a pattern like this? In the case of cars, new designs do not abandon older parts where those parts still do the job.

    I made a post just for you in the appropriate thread Bill. You can read it here.

    I submit that after you read it and think about it, and if you still don’t get why this “common design” or “re-using parts” or “design plan” nonsense doesn’t explain the nested hierarchy of life, then you are incapable of getting it.

    OK, I’ve been dangling my lure in the databases; results inconclusive. My ‘louse sequence’ has no other 100% hits on protein databases, via either the UniProt or NIH portals. It gets a 99% hit on the Salmo salar paralog noted. The difference is a T/C mutation at position 2 of the codon for amino acid 93, leading to a valine/alanine switch (weakly conserved in terms of amino acid properties), assuming it’s a true mutation and not a read error.
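The valine/alanine reading follows directly from the genetic code. Every valine codon is GTN and every alanine codon is GCN, so a T/C change at the second codon position swaps one for the other whatever the third base is. The sketch builds the standard codon table (NCBI translation table 1) from its usual compact string. The actual codon in the louse EST isn’t given here, so all four valine codons are tried.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the first base varying slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def mutate(codon: str, position: int, base: str) -> str:
    """Return the codon with the base at 1-based `position` replaced."""
    return codon[:position - 1] + base + codon[position:]

for valine in ("GTT", "GTC", "GTA", "GTG"):
    alanine = mutate(valine, 2, "C")  # T -> C at the second codon position
    print(valine, CODON_TABLE[valine], "->", alanine, CODON_TABLE[alanine])
```

Every line prints a V to A switch, so the interpretation doesn’t depend on which valine codon is actually used.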

    Since the louse sequence was generated from mRNA (synthesise DNA against the mRNA; sequence that DNA), you get a bit more than just the ORF (the part between START and STOP that becomes a protein), so you do get a little of what one might call ‘flanking sequence’. So I did a tBLASTn search, which compares a protein query against nucleotide sequences translated in all six reading frames, recovering more of the mRNA either side of the ORF. Again I hit the salmon mRNA sequence, but got an extra mismatch, again a T/C switch, just beyond the STOP. So there are 2 differences in the mRNA vs all known sequences. But the rest of the mRNA outside the ORF was still fishily salmon-y, so if it is HGT, it involves quite a long sequence and is fairly recent.
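What tBLASTn searches against is a six-frame translation of each nucleotide sequence: three reading frames on the given strand and three on its reverse complement. The sketch below shows only that translation step, using the standard genetic code. It is not BLAST, which adds seeding, scoring and significance statistics on top. The input is a hypothetical fragment written to encode MGDIAK (the first six residues of the salmon protein); it is not the real salmon or louse nucleotide sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order, first base slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna: str) -> str:
    """Translate whole codons; codons containing non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frame_translation(dna: str) -> dict[str, str]:
    """Frames +1..+3 read the given strand, -1..-3 its reverse complement."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames

# Hypothetical fragment encoding MGDIAK.
for frame, protein in six_frame_translation("ATGGGTGACATTGCTAAA").items():
    print(frame, protein)
```

Only frame +1 gives the intended peptide; the other five frames are the ‘noise’ a protein query would also be compared against.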

    So … could well be salmon contamination, but inexplicably differs from salmon at 2 positions, taking the wider mRNA sequence. It could be salmon contamination but capturing a polymorphism compared to the ‘consensus’ salmon sequence. The salmon has a few paralogs, so might be able to withstand mutations more robustly.

    Could be contamination plus sequence error. A bit puzzled that the creators of the copepod ESTs may have just minced the whole organism without removing the gut.

    Could be contamination from a non-annotated species (less likely, since the Chilean company that sourced the materials farms Atlantic Salmon, Rainbow Trout and Coho Salmon, all in the database. But it could have got it from the wild shortly before capture, I guess, if they serially infest).

    Or … could be HGT.

  8. colewd,

    How did you come up with this as my claim? Why are you invoking logical fallacies to make your point? If this is an error on my part or yours please point it out.

    This is your claim as I understand it. Evidently, I have it wrong, but in my defence you are not a model of clarity.

    I argue that the variation in cytochrome c sequence among taxa is not related to the final form of the organism. You say ‘but what about apoptosis, and this poor mutated mouse?’. I can only interpret that to mean that you think the variation in cytochrome c and the variation in organismal form are causally coupled.

  9. Rumraket,

    First, thank you for the very thoughtful argument. I am going to spend some time with it but I have one question.

    So the designer copies the gene from the first organism, then slightly alters it in a particular location in order for it to function in a particular way. Then she does the same thing again for the third species. But does she take the first gene again and copy it, and then slightly alter it? Or does she take the second gene, the one that was already altered a bit? Either way, we end up with a third copy that is not identical to the 1st or 2nd copy. And so on and so forth, until we have ten copies of the gene, but all of them slightly different from each other.

    You are assuming here the design process to set up your argument. What if the process is different than you imagine? In the computer world, with perfect simulation as an assumption, we would create the widget in software based on design rules. The sequences would be interdependent, and a copy-an-individual-sequence-and-modify process is not what we would use.

  10. colewd:
    Rumraket,

    First, thank you for the very thoughtful argument. I am going to spend some time with it but I have one question.

    You are assuming here the design process to set up your argument. What if the process is different than you imagine? In the computer world, with perfect simulation as an assumption, we would create the widget in software based on design rules. The sequences would be interdependent, and a copy-an-individual-sequence-and-modify process is not what we would use.

    I will answer over in the other thread, let’s take it there as I don’t want to derail Allan’s thread more than it already is.

  11. Allan Miller,

    I can only interpret that to mean that you think the variation in cytochrome c and the variation in organismal form are causally coupled.

    Yes, and please stick with this. The experiment does demonstrate this. The mouse head becomes deformed when you vary the sequence. By this you mean variation in organismal form is the same as no forehead or a deformity. If you mean the mouse turns into a rat, that is a very different claim.

  12. …you may have been inclined to think that it is obvious that similarity is evidence for common ancestry.

    …there is no intrinsic reason why similarity must count as evidence for common ancestry. Whether this is so depends on nontrivial empirical matters of fact.

    – Elliott Sober

  13. Allan Miller: We are talking of high sequence identity rather than ‘mere similarity’, of course – highly aligned sequence.

    A highly aligned sequence is when we take two sequences that are not highly aligned and snip out the unwanted bits until we have a highly aligned sequence?

  14. Mung,

    A highly aligned sequence is when we take two sequences that are not highly aligned and snip out the unwanted bits until we have a highly aligned sequence?

    No. A highly aligned sequence is one in which there is a high percentage of match. If 97% aligns, it is easy enough to work out that 3% does not. That information is not ‘thrown away’, it’s right there in the quoted figure.

  15. colewd,

    Yes, and please stick with this.

    I’d rather not, this is going nowhere.

    The experiment does demonstrate this. The mouse head becomes deformed when you vary the sequence. By this you mean variation in organismal form is the same as no forehead or a deformity. If you mean the mouse turns into a rat, that is a very different claim.

    If you starve a foetus of oxygen it produces developmental abnormality. Cytochrome c is involved in oxygen metabolism. You haven’t demonstrated, to any degree of rigour, that the apoptosis function of cytochrome c is causally linked to the deformity, and that deformity tells us nothing about cytochrome c isoforms, either in the mouse or in other animals, beyond that one isoform in that one organism. You are making a ridiculous extension out of fresh air.

    It doesn’t matter anyway, I only respond because I hate to see bullshit go unaddressed – the SIWOTI effect. In what way is this relevant to the thread, beyond the trigger word ‘cytochrome’?

  16. Mung,

    Are you sure Sober is talking about similarity at the molecular level – which is to say, identity interspersed with difference? A bit-jockey like yourself would recognise a significant distinction between digital and analogue renditions in this regard.

    Two pictures of the same mountain are ‘similar’, but long stretches of the same bitmap in two pictures would indicate something else entirely.

  17. Allan Miller:
    Mung,

    No. A highly aligned sequence is one in which there is a high percentage of match. If 97% aligns, it is easy enough to work out that 3% does not. That information is not ‘thrown away’, it’s right there in the quoted figure.

    Sometimes even the misalignments are phylogenetically informative. If a large portion of non-identical amino acids from an alignment between two sequences are only a single potential nucleotide substitution away in the genetic code, then they still indicate a degree of genealogical relationship.
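The point can be made concrete with the standard genetic code. For a pair of amino acids, find the smallest number of nucleotide substitutions separating any codon of one from any codon of the other. A minimal sketch follows. It gives only a lower bound, since it ignores which codons the two genes actually use.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order, first base slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codons_for(aa: str) -> list[str]:
    """All codons that translate to the one-letter amino acid `aa`."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]

def min_nucleotide_changes(aa1: str, aa2: str) -> int:
    """Fewest point substitutions turning some codon for aa1 into some codon for aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )

for a, b in [("V", "A"), ("K", "R"), ("D", "E"), ("W", "K")]:
    print(f"{a}->{b}: {min_nucleotide_changes(a, b)}")
```

A mismatch such as V/A (one change) is a much smaller step than W/K (at least two), which is why such columns still carry genealogical signal.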

  18. Rumraket,

    Sometimes even the misalignments are phylogenetically informative. If a large portion of non-identical amino acids from an alignment between two sequences are only a single potential nucleotide substitution away in the genetic code, then they still indicate a degree of genealogical relationship.

    Indeed. A substitution also counts for more when it leads to an amino acid with significantly different chemical properties than when it leads to one less remote. CLUSTAL grades each column of an alignment in four bands: ‘*’ for identical residues, ‘:’ and ‘.’ for strongly and weakly similar ones, and a blank for the widest leap. And then there’s structural conservation …
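The four bands can be sketched as follows. This is not Clustal’s code. It reproduces only the consensus-line rule, on the assumption that a column is marked ‘*’ if its residues are identical, ‘:’ if they all fall within one of Clustal’s strong groups, ‘.’ if they fall within a weak group, and blank otherwise (including columns with gaps). The group lists are the ones given in the Clustal documentation to the best of my knowledge; check them against the docs before relying on them. The example is the stretch around position 93 of the salmon and ‘louse’ sequences quoted later in the thread.

```python
# Amino-acid groups behind Clustal's ':' (strong) and '.' (weak) marks
# (as recalled from the Clustal documentation; verify before use).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def column_mark(column: str) -> str:
    """Clustal-style mark for one alignment column; '-' is a gap."""
    residues = set(column)
    if "-" in residues:
        return " "
    if len(residues) == 1:
        return "*"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "

def conservation_line(*aligned: str) -> str:
    """One mark per column for equal-length aligned sequences."""
    return "".join(column_mark("".join(col)) for col in zip(*aligned))

print("ERADLIAY")
print("ERVDLIAY")
print(conservation_line("ERADLIAY", "ERVDLIAY"))
```

The A/V column gets a ‘.’, in line with the ‘weakly conserved’ description of that switch earlier in the thread.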

  19. Still wondering about mechanisms. Assuming HGT happened in this case, how did it “manage to find” the gamete that bred to produce offspring carrying the sequence? HGT between multicellular organisms reproducing sexually seems to be a higher barrier than prokaryotes reproducing by fission.

  20. Alan Fox: Still wondering about mechanisms. Assuming HGT happened in this case, how did it “manage to find” the gamete that bred to produce offspring carrying the sequence? HGT between multicellular organisms reproducing sexually seems to be a higher barrier than prokaryotes reproducing by fission.

    Excellent question. I suppose it would need some sort of vector, like a virus (just indulging in some unashamed speculation here). Did you follow TomMueller’s link to Larry Moran’s Sandwalk?

  21. Alan Fox,

    Still wondering about mechanisms. Assuming HGT happened in this case, how did it “manage to find” the gamete that bred to produce offspring carrying the sequence? HGT between multicellular organisms reproducing sexually seems to be a higher barrier than prokaryotes reproducing by fission.

    Sorry, I did see this question earlier, forgot to say ‘don’t know’ 😉

    The barrier is more to do with the organism surrounding the gamete than the mode of transmission per se. But there is a time when there isn’t much organism surrounding the gamete – when it’s a fresh zygote or egg. Then, it is at least potentially open to invasion by nucleic acid – not just DNA, but RNA too. As with mitochondrial transfer, the nucleus is not all that hygienic about hoovering up fragments.

    I did notice that the rat contains a couple of Cyt c pseudogenes that appear to be reverse-transcribed mRNA – they have fragments of the polyA tail, and no intron, unlike the functional gene. That’s obviously internal to a cell lineage, but it is not too much of a stretch to suppose that ‘foreign’ RNAs may occasionally get in and complete the integration by the same path.

    I did think to look for such evidence in my louse sequence, but unfortunately I’ve only got processed RNA to look at – this isn’t a genome sequence. I’m just having a squint at the salmon gene to see if it has an intron. But I’d really need a louse genome to compare.

  22. Looks to my amateur eye like the salmon gene has 2 introns, both outside the ORF. Searching genome databases splits the hits into 3 segments with gaps of 97 and 1677 bases respectively. The salmon mRNA is assembled 3-2-1 relative to the original positions on the chromosome of 1-2-3. There is no intron interrupt of the ORF, which is wholly contained within segment 2.
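The 3-2-1 ordering has a standard reading worth spelling out. When a transcript’s segments map to the genome in descending coordinate order, the usual explanation is that the gene lies on the reverse strand of the assembly, so the mRNA runs right to left along the chromosome, rather than that the exons have been shuffled. The sketch infers strand and intron lengths from segment coordinates. The coordinates are hypothetical, made up only so that the gaps come out at the 97 and 1677 bases reported; they are not the real salmon hit coordinates.

```python
def strand_and_introns(hits):
    """hits: (transcript_start, transcript_end, genome_start, genome_end) per
    aligned segment, 1-based inclusive, with genome_start < genome_end.
    Returns the inferred strand and the gap (intron) lengths in genome order."""
    in_transcript_order = sorted(hits)
    starts = [h[2] for h in in_transcript_order]
    if starts == sorted(starts):
        strand = "+"
    elif starts == sorted(starts, reverse=True):
        strand = "-"
    else:
        strand = "?"  # segments out of order: not a simple collinear gene model
    in_genome_order = sorted(hits, key=lambda h: h[2])
    introns = [b[2] - a[3] - 1 for a, b in zip(in_genome_order, in_genome_order[1:])]
    return strand, introns

# Hypothetical coordinates: genome segments 1, 2, 3 appear in the mRNA as 3, 2, 1.
hits = [
    (528, 727, 1000, 1199),   # genome segment 1 -> end of the mRNA
    (124, 527, 1297, 1700),   # genome segment 2 (would hold the ORF)
    (1, 123, 3378, 3500),     # genome segment 3 -> start of the mRNA
]
print(strand_and_introns(hits))
```

With these made-up numbers the function reports the reverse strand and introns of 97 and 1677 bases.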

    There’s a bit of the ‘louse mRNA’ upstream of the ORF (the first 142 bases of segment 3 in the actual gene, segment 1 in processed mRNA) that is missing from the salmon RNA sequence I’d originally hit. That may not be significant in itself. There is also 1 additional difference between ‘louse mRNA’ and the salmon DNA gene in this sector: a single-base deletion in the louse, just before the matched part of the mRNAs starts. However, this too might be artefactual, due to accidental near-alignment of trailing sequence. I don’t know why the salmon EST in the database is missing almost all of this sector, apart from a dozen or so bases.

    So, there are 2, maybe 3 differences in total between the louse mRNA and the aligned parts of the salmon gene, plus a question mark on the difference in the transcript lengths.

  23. Alan Fox,

    Yes, interesting viewpoint. I agree that these things need to be evaluated critically, though there is nonetheless a perfectly sensible mechanistic pathway into organisms – even more so in the unicellular world, which is actually the bulk of Eukarya.

    HGT in multicellular eukaryotes may be over-egged, but it would be even more surprising if it were absent.

    Hum. If you only have ESTs from the critter, it’s hard to rule out contamination from the hosts (salmon / trout). If you have the sequences, maybe an all-vs-all comparison would tell you how likely it is that the critter’s alleged sequence is the very same as that of one of the in-fish paralogs (rather than the identity against just one of the cyt-c proteins – or did you do this at the nucleotide level?).
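The all-vs-all idea can be sketched like this. Compute percent identity for every pair of sequences, then ask which sequence the critter’s is closest to, and how close. A near-perfect match to one particular host paralog, clearly closer than the paralogs are to one another, would point to a copy of that specific paralog (whether by contamination or by recent transfer). The sequences below are made-up, pre-aligned fragments for illustration only, not real data.

```python
from itertools import combinations

def percent_identity(a: str, b: str) -> float:
    """Identity over two pre-aligned, equal-length sequences; gaps ('-') never count as matches."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100 * matches / len(a)

def all_vs_all(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of named sequences."""
    return {(p, q): percent_identity(seqs[p], seqs[q]) for p, q in combinations(sorted(seqs), 2)}

def closest(name: str, table: dict[tuple[str, str], float]) -> tuple[str, float]:
    """Best-scoring partner of `name` in an all-vs-all table."""
    partners = [(q if p == name else p, s) for (p, q), s in table.items() if name in (p, q)]
    return max(partners, key=lambda t: t[1])

# Hypothetical aligned fragments, for illustration only.
seqs = {
    "host_paralog_1": "KKGERADLIAYLK",
    "host_paralog_2": "KKGDRENLIAYLK",
    "critter_est":    "KKGERVDLIAYLK",
}
table = all_vs_all(seqs)
print(closest("critter_est", table))
```

Here the toy ‘critter’ fragment is one residue away from paralog 1, while the two paralogs differ from each other at three positions.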

  25. Alan Fox:
    Allan Miller,
    Here is a PDF of Bill Martin’s paper which is a bit skeptical of HGT in Eukaryotes. Might take me a while to get through it!

    Andrew J. Roger took exception to Bill Martin’s paper and accuses Martin of subtly shifting the goalposts.

    Andrew J. Roger addresses two questions:

    “Could it be that eukaryote LGT does not really exist to any significant extent in nature, but is an artefact produced by genome analysis pipelines?”

    and

    “…there’s no reasonable mechanism for [eukaryotic] LGT as there is in bacteria”?

    Andrew J. Roger also addresses the “Lateral Late” suggestion.

    Good stuff – all found here:

    http://sandwalk.blogspot.ca/2017/11/lateral-gene-transfer-in-eukaryotes.html

  26. gmh.entropy,

    Yes, hard to rule out contamination, but there is a single-base difference in the ORF and another just after the STOP when I did a wider search for the whole EST sequence and compared it to both salmon ESTs and whole-genome salmon data. Possibly another mRNA difference exists just before the ORF, although this might be an artefact, as it’s in a region of overlap on the 3-range salmon genomic sequence. The segment in question is not in the salmon ESTs but is in the genome.

    There could of course be polymorphism or read error to contend with.

    Here’s the salmon paralog closest to the ‘sea louse’:

    MGDIAKGKKAFVQKCAQCHTVENGGKHKVGPNLWGLFGRKTGQAEGYSYTDANKSKGIIWETDTLMTYLENPKKYIPGTKMIFAGIKKKGERADLIAYLKSATS

    And the ‘sea louse’ EST version:

    MGDIAKGKKAFVQKCAQCHTVENGGKHKVGPNLWGLFGRKTGQAEGYSYTDANKSKGIIWETDTLMTYLENPKKYIPGTKMIFAGIKKKGERVDLIAYLKSATS

    The difference is 12 from the end.
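The comparison is easy to check mechanically. The sketch takes the two sequences exactly as quoted above and assumes they are aligned without gaps (they are the same length). It lists every differing position, counted from both ends, plus the percent identity.

```python
salmon = "MGDIAKGKKAFVQKCAQCHTVENGGKHKVGPNLWGLFGRKTGQAEGYSYTDANKSKGIIWETDTLMTYLENPKKYIPGTKMIFAGIKKKGERADLIAYLKSATS"
louse = "MGDIAKGKKAFVQKCAQCHTVENGGKHKVGPNLWGLFGRKTGQAEGYSYTDANKSKGIIWETDTLMTYLENPKKYIPGTKMIFAGIKKKGERVDLIAYLKSATS"

def differences(a: str, b: str) -> list[tuple[int, int, str, str]]:
    """List (position from start, position from end, residue in a, residue in b),
    1-based, for equal-length, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("expects equal-length, ungapped sequences")
    n = len(a)
    return [(i + 1, n - i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

diffs = differences(salmon, louse)
identity = 100 * (len(salmon) - len(diffs)) / len(salmon)
print(len(salmon), diffs, f"{identity:.1f}%")
```

It reports a single A/V difference at position 93 of 104, 12 from the end, which squares with the ‘amino acid 93’ and the 99% hit mentioned earlier in the thread.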

  27. Allan Miller,

    I blasted the “louse” sequence you gave me against NCBI’s nr, and the “crustacean” sequence came first, then lots of salmon and the like. The “lice” entry has an “Unpublished” reference. When checking for the authors’ articles, I found one article. A quick look with some “find” attempts, and I found no mention of the cyt-c sequence or any astounding similarity between lice proteins and salmon proteins.

    In an older article, they mention that they isolated the lice from Salmon, which means that contamination cannot be ruled out. No mention of highly similar sequences between Salmon and lice either.

    I think they filtered out the sequence before analyses as contamination, which makes a lot of sense.

  28. gmh.entropy,

    Sure, but then they published the ESTs with it in. It may be that the differences from salmon caused this one to slip through the net.

    As I’ve mentioned before, I am perturbed that they just minced whole bodies. I don’t know if that’s standard practice, but I’d be for dissecting out the gut first.

  29. Allan Miller,

    I dunno. In the Drosophila business it was long the habit to squish up several whole flies to get a large enough extract. Of course that was back during the days of PCR when you could count on your primers not to amplify much alien DNA. These days, with direct sequencing of teensy samples, they probably just extract bits of particular tissues.

  30. Allan Miller: Sure, but then they published the ESTs with it in. It may be that the differences from salmon caused this one to slip through the net.

    I think they deposited the sequences before all the filtering was done (very common). A single difference would not escape filtering. Sequences come with errors, and thus, filtering out contamination is strict enough to account for such errors and, by extension, for natural variations.

  31. Allan Miller: I am perturbed that they just minced whole bodies. I don’t know if that’s standard practice, but I’d be for dissecting out the gut first.

    Me, too! 🙂
    A gut full of salmon (which is presumably what a sea louse attached to a salmon would have) should be dissected out before preparation for sequence analysis. How big are sea louse gonads?

  32. Alan Fox,

    How big are sea louse gonads?

    Well, I wouldn’t just use the gonads … but muscle tissue should be OK and dissectable. They did indicate that a number of individuals of different ages were used. They don’t start feeding until they attach, but presumably fish farm employees don’t have access to the junior stages, pre-infestation, so ‘different ages’ may not be comprehensive.
