Creating CSI with NS

Imagine a coin-tossing game.  On each turn, players toss a fair coin 500 times.  As they do so, they record all runs of heads, so that if they toss H T T H H H T H T T H H H H T T T, they will record: 1, 3, 1, 4, representing the number of heads in each run.

At the end of each round, each player computes the product of their runs-of-heads.  The person with the highest product wins.
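The scoring rule can be sketched in a few lines of Python. This is an illustration only, not the MatLab script mentioned in the post; the function names `head_runs` and `run_product` are made up here, and a series with no heads is assumed to score 1 (the empty product).

```python
from itertools import groupby
from math import prod


def head_runs(tosses: str) -> list[int]:
    """Return the length of each maximal run of 'H' in a string of 'H'/'T'."""
    return [len(list(group)) for face, group in groupby(tosses) if face == "H"]


def run_product(tosses: str) -> int:
    """Multiply the run lengths together (assumption: no heads scores 1)."""
    return prod(head_runs(tosses))


example = "HTTHHHTHTTHHHHTTT"   # the series used in the post
print(head_runs(example))       # [1, 3, 1, 4]
print(run_product(example))     # 12
```

Python integers have arbitrary precision, so products near 10^60 do not overflow.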

In addition, there is a House jackpot.  Any person whose product exceeds 10^60 wins the House jackpot.

There are 2^500 possible series of 500 coin-tosses.  However, I’m not sure exactly how many of that vast number of possible series would give a product exceeding 10^60. If some bright mathematician can work it out for me, we can work out whether a series whose product exceeds 10^60 has CSI.  My ballpark estimate says it has.

That means, clearly, that if we randomly generate many series of 500 coin-tosses, it is exceedingly unlikely, in the history of the universe, that we will get a product that exceeds 10^60.

However, starting with a randomly generated population of, say 100 series, I propose to subject them to random point mutations and natural selection, whereby I will cull the 50 series with the lowest products, and produce “offspring”, with random point mutations from each of the survivors, and repeat this over many generations.
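The procedure just described can be sketched as follows. This is a minimal Python illustration, not the author's MatLab script. It assumes that each survivor is kept unchanged and contributes exactly one offspring with a single flipped toss, and it ranks series by the log10 of the product to keep the numbers small. The names `log_fitness`, `mutate` and `evolve` are hypothetical.

```python
import random
from itertools import groupby
from math import log10

N_TOSSES = 500    # tosses per series
POP_SIZE = 100    # series in the population
SURVIVORS = 50    # series kept after each cull


def log_fitness(genome):
    """log10 of the product of head-run lengths (1 = heads, 0 = tails)."""
    return sum(log10(len(list(g))) for bit, g in groupby(genome) if bit == 1)


def mutate(genome, rng):
    """Copy the parent and flip one randomly chosen toss (assumed mutation model)."""
    child = list(genome)
    i = rng.randrange(len(child))
    child[i] ^= 1
    return child


def evolve(generations, seed=0):
    """Return the best log10 product found after the given number of generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_TOSSES)]
                  for _ in range(POP_SIZE)]
    for _ in range(generations):
        # Cull the 50 lowest-scoring series, then refill from the survivors.
        population.sort(key=log_fitness, reverse=True)
        survivors = population[:SURVIVORS]
        population = survivors + [mutate(parent, rng) for parent in survivors]
    return max(log_fitness(g) for g in population)


print(evolve(200))   # best log10(product) after 200 generations
```

Because the survivors are carried over unchanged in this sketch, the best score can never fall from one generation to the next; a version that replaces parents with their offspring would behave differently.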

I’ve already reliably got to products exceeding 10^58, but it’s possible that I may have got stuck in a local maximum.

However, before I go further: would an ID proponent like to tell me whether, if I succeed in hitting the jackpot, I have satisfactorily refuted Dembski’s case? And would a mathematician like to check the jackpot?
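One way to check the jackpot threshold is to compute the largest product that any series of 500 tosses can reach. The sketch below does this by dynamic programming, under the scoring rule as stated in the post. It is an illustration, not part of the author's script, and the name `max_run_product` is made up here.

```python
def max_run_product(n_tosses: int) -> int:
    """Largest product of head-run lengths achievable with n_tosses tosses."""
    # best[n] is the best product using exactly n tosses; no heads scores 1.
    best = [1] * (n_tosses + 1)
    for n in range(1, n_tosses + 1):
        candidates = [best[n - 1]]        # case: the last toss is a tail
        for k in range(1, n + 1):         # case: the series ends in a run of k heads
            rest = n - k - 1              # tosses left before the separating tail
            candidates.append(k * (best[rest] if rest >= 0 else 1))
        best[n] = max(candidates)
    return best[n_tosses]


ceiling = max_run_product(500)
print(ceiling > 10**60)                   # True
print(ceiling == 81 * 4**97)              # True
```

Under these assumptions the ceiling is 81 × 4^97, roughly 2 × 10^60, reached by 97 runs of four heads and 4 runs of three heads, each separated by a single tail. The jackpot therefore sits within a small factor of the best possible score.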

I’ve done it in MatLab, and will post the script below.  Sorry I don’t speak anything more geek-friendly than MatLab (well, a little Java, but MatLab is way easier for this).

529 thoughts on “Creating CSI with NS”

  1. Joe G: LoL! Seeing that you are ignoring those two books, you don’t understand the argument.

    I’m not “ignoring them”, Joe. In fact NFL just plopped on to my doormat. But if Dembski can’t make himself clear in a peer-reviewed paper, then he’s not competent to publish in peer-reviewed journals. Academics shouldn’t need people to have read their books in order to understand their published papers. And to be fair, I don’t think Dembski does require this. You seem to be taking a kind of “scriptural” approach to the ID oeuvre. That’s not how science works – there isn’t a sacred text against which all claims have to be evaluated.

    However, I shall endeavour to read Dembski’s non-sacred text now I have a copy. I hope it’s better than Meyer’s prolix tome.

  2. Joe G:
    Liz- you are conflating specificity with CSI- you have only managed to create one part of CSI- the “S”

    Oh, for goodness’ sake, Joe. Read Dembski’s paper. Yes, I’ve created the “S”, thanks. To calculate the CSI aka the chi, you need both the S (or the set of S) and the C. Both are fairly easy to calculate for a 500 length string of coin-tosses. The C is dead easy – it’s 2^500. If Olegt’s calculation of the set of S in my example is correct, chi of the winning string exceeds 1.

  3. Elizabeth: I’m not “ignoring them”, Joe. In fact NFL just plopped on to my doormat. But if Dembski can’t make himself clear in a peer-reviewed paper, then he’s not competent to publish in peer-reviewed journals. Academics shouldn’t need people to have read their books in order to understand their published papers. And to be fair, I don’t think Dembski does require this. You seem to be taking a kind of “scriptural” approach to the ID oeuvre. That’s not how science works – there isn’t a sacred text against which all claims have to be evaluated.

    However, I shall endeavour to read Dembski’s non-sacred text now I have a copy. I hope it’s better than Meyer’s prolix tome.

    That was a peer-reviewed paper- the “specification” paper? Really??

    But anyway what Dembski and Meyer have to say is much better than anything any materialist has to say.

  4. Elizabeth: Oh, for goodness’ sake, Joe. Read Dembski’s paper. Yes, I’ve created the “S”, thanks. To calculate the CSI aka the chi, you need both the S (or the set of S) and the C. Both are fairly easy to calculate for a 500 length string of coin-tosses. The C is dead easy – it’s 2^500. If Olegt’s calculation of the set of S in my example is correct, chi of the winning string exceeds 1.

    What is the information? What is the meaning/ function?

  5. What is the specified sequence required in order to start replication?

    Yes, read dembski’s paper and reference that part where he has the results of a coin toss replicating with variation.

  6. I see Dr. Dembski has put in an appearance at Uncommon Descent.

    Strange to say, he doesn’t appear to be aware of this thread here.

    Perhaps JoeG could give him a heads-up! Mind you, he might not be any too grateful.

    (I’d do it myself, but I’m a thrice-banned persona non grata there. Never let it be said I can’t take a hint!)

  7. Joe G: That was a peer-reviewed paper- the “specification” paper? Really??

    No, not that one. The NFL ones he published with Marks.

    You think I shouldn’t take the Specification paper seriously because it wasn’t peer-reviewed?

  8. Joe G: What is the information? What is the meaning/ function?

    The function is: high product of runs-of-heads.

  9. junkdnaforlife:
    Since Dr. Liddle has yet to post her string, and appears to be accepting Patrick’s
    string as a substitute, then we will continue with Patrick’s string for another look at the results:
    ——————————————————-
    1) Regulates thin filament length in mice

    KADLQWLKGI GWIPSGSLED EKNKRATQIL SDHVYRQHPN KFKFSSLMDS MPMVLAKNNA ITMNHHLYTE AWNKDKTTVH IMPDTPEVIL AKQNKVNYSE KLYKLGLEEA KRKGYDMRLD AIPIKTAKAS RDIASDFKYK EGYRKQLGHH IGARGIHDDP KMMWSMHVAK IQSDREYKKD FEKWKTKYSSPVDMLGVVLA KKCQTLVSDA DYRNYLHQWT CLPDQNDVIH ARQAYDLQSD NMYKSDLQWM RGIGWVPIGS LDVEKCKRAT EILSDKIYRQ PPDKFKFTSV TDSLEQVLAK NNAITMNKRL YTEAWDKDKT QIHIMPDTPE IMLARMNKVN YSESLYKLAN EEAKKKGYDL RSDAIPIVAA KASRDIASDY KYKDGYRKQL GHHIGARNIK DDPKMMWSMH VAKIQSDREY KKDFEKWKTK YSSPVDMLGV VLAKKCQILV SDIDYKHPLH EWTCLPDQND VIHARQAYDL
    QSDNVYKSDL QWMRGIGWVP IGSLDVVKCK RAAEILSDNI YRQPPNKFKF TSVTDSLEQV LAKSNALNMN KRLYTEAWDK DKIQIHVMPD TPEIMLARQN RINYSESLYK LANEEAKKKG YDLRVDAIPI VAAKASRDIA SDYKYKVGYR KQLGHHIGAR NIEDDPKMMW SMHVAKIQSD REYKKDFEKW KTKYSSPVDM LGVVLAKKCQ TLVSDVDYRN YLHQWTCLPD QNDVIHARQA YDLQSDNVYK SDLQWMRGIG WVPIGSLDVV KCKRAAEILS DNIYRQPPNK FKFTSVTDSL EQVLAKSNAL NMNKRLYTEA WDKDKTQIHI MPDTPEIM

    ————————————————-
    2) 3 1′s then 3 0′s repeat to 960

    111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000 111000

    ——————————————————-

    3) 4 1′s then 0, 3 1′s then 0 twice, 4 1′s then 0 twice, five 1′s, 0 then 4 1′s three times, 0 then 5 1′s, 0 then 3 1′s three times, 0 then four 1′s twice, 0 then five 1′s, 0 then 4 1′s five times, 0 then three 1′s, 0 then 4 1′s three times, 0 then 3 1′s, 0 then four 1′s three times, 0 then three 1′s, 0 then 4 1′s three times, 0 then 3 1′s three times, 0 then 4 1′s, 0 then 5 1′s, 0 then 4 1′s twice,
    0 then 3 1′s, 0 then 4 1′s twice, 0 then 3 1′s, 0 then 4 1′s twice, 0 then 3 1′s, 0 then 4 1′s, 0 then 3 1′s 3x, 0 then 4 1′s, 0 then 3 1′s, 0 then 4 1′s, 0 then 3 1′s 2x, 0 then 4 1′s 9x, 0 then 3 1′s, 0 then 4 1′s 4x, 0 then 3 1′s, 0 then 4 1′s 3x, 0 then three 1′s, 0 then 4 1′s, 0 then 5 1′s, 0 then 4 1′s, 0 then 3 1′s, 0 then 4 1′s 3x, 0 then 3 1′s, 0 then 4 1′s twice, 0 then 3 1′s 3x, 0 then 4 1′s 3x, 0 then 5 1′s, 0 then 4 1′s, 0 then 3 1′s, 0 then 4 1′s 3x, 0 then 3 1′s, 0 then 4 1′s, 0 then 3 1′s, 0 then 4 1′s, 0 then 3 1′s

    1111011101110111101111011111011110111101111011111 011101110111 0111101111 011111 01111 01111 01111 01111 01111 0111 011110111101111 0111 011110111101111 0111 011110111101111 011101110111 01111 011111 0111101111 0111 0111101111 0111 01111 011101110111 01111 0111 01111
    01110111 011110111101111011110111101111011110111101111 0111 01111011110111101111 0111 011110111101111 011101111011111 01111 0111 011110111101111 0111 0111101111 011101110111 0111 011101110111 011110111101111 011111 011110111101111 0111 011110111101111 0111 01111 01110111 011110111

    —-

    Here we can examine, with striking clarity, the differences in specificity. Despite the illusion of a pattern, we can clearly identify the differing degrees of
    specification when examined closely.

    Again I will leave it to the fair observer as to whether or not the criterion of specificity has been established in string 3.

    Why do you keep ignoring my responses to your posts?

    I kept pointing out, and will point out again, that what you propose as a *description* , in meaning equivalent to the descriptions you use in #2 and #3, of your first string is no such thing.

    A description of string #1, equivalent to the description preceding the string in #2 and #3, would be: one K, then one A, then one D, ……., which appreciably ends up being longer than the string itself.

    By contrast, your expression preceding the string in #1 is a property of the string. Equivalently, a property of string #3 would be *product of H-runs exceeds 10^60*.

  10. Joe G: Please provide the citation that states the intelligent designer had or needed an origin.

    I’ll wait.

    I’m not claiming that there is or was “the designer”. You are, and since you’re also claiming that ID/CSI is all about origins it’s your burden to explain and produce testable evidence and a testable hypothesis (you know, those pesky things that you demand from science) of/for the ultimate origin of everything. You’re always saying that nature and evolution (and the scientific explanation of them) depends on origins, including the origin of the universe, but when it comes to ID/CSI, and “the designer”, and the origin of “the designer”, you want to be able to stop at a convenient point and no questions should be asked. Go figure.

  11. madbat, If we were handed a specific AA sequence, or the function
    associated with that AA sequence, we should be able to recover the specific sequence from the function, and the function from the specific sequence, without any knowledge of the other. As madbat had previously demonstrated, for without any knowledge of the protein associated with the function I am using as an example, he/she was able to recover the exact Nebulin protein, by either searching for the function or the AA sequence, in a post whereas he/she was protesting the very degree of specificity of that he/she had just demonstrated! (Just as we can recover the specific coin toss sequence from the description, and the description from the specific coin toss sequence.)

    For example, if we hand the phrase, high product of runs-of-heads to a scientist, and tell them to recover the specific 500 bit sequence associated with it, we should not be alarmed when they fail to recover the specific sequence. However if we hand the scientist, regulates thin filament length in mice, and tell them to recover the specific 1358 character long sequence associated with it, we should not be alarmed when they hand us back the specific Nebulin protein Amino acid sequence (as madbat had demonstrated). This is an example of the criterion of specificity.

    So with Dr.Liddle’s submitted results, (which are included in this list along with Patrick’s), we can examine, once again, degrees of specificity.

    1)

    Regulates thin filament length in mice

    KADLQWLKGI GWIPSGSLED EKNKRATQIL SDHVYRQHPN
    KFKFSSLMDS MPMVLAKNNA ITMNHHLYTE AWNKDKTTVH
    IMPDTPEVIL AKQNKVNYSE KLYKLGLEEA KRKGYDMRLD
    AIPIKTAKAS RDIASDFKYK EGYRKQLGHH IGARGIHDDP
    KMMWSMHVAK IQSDREYKKD FEKWKTKYSS PVDMLGVVLA KKCQTLVSDA DYRNYLHQWT CLPDQNDVIH ARQAYDLQSD NMYKSDLQWM RGIGWVPIGS LDVEKCKRAT EILSDKIYRQ
    PPDKFKFTSV TDSLEQVLAK NNAITMNKRL YTEAWDKDKT
    QIHIMPDTPE IMLARMNKVN YSESLYKLAN EEAKKKGYDL
    RSDAIPIVAA KASRDIASDY KYKDGYRKQL GHHIGARNIK
    DDPKMMWSMH VAKIQSDREY KKDFEKWKTK YSSPVDMLGV VLAKKCQILV SDIDYKHPLH EWTCLPDQND VIHARQAYDL
    QSDNVYKSDL QWMRGIGWVP IGSLDVVKCK RAAEILSDNI
    YRQPPNKFKF TSVTDSLEQV LAKSNALNMN KRLYTEAWDK
    DKIQIHVMPD TPEIMLARQN RINYSESLYK LANEEAKKKG YDLRVDAIPI VAAKASRDIA SDYKYKVGYR KQLGHHIGAR NIEDDPKMMW
    SMHVAKIQSD REYKKDFEKW KTKYSSPVDM LGVVLAKKCQ TLVSDVDYRN YLHQWTCLPD QNDVIHARQA YDLQSDNVYK SDLQWMRGIG WVPIGSLDVV KCKRAAEILS DNIYRQPPNK
    FKFTSVTDSL EQVLAKSNAL NMNKRLYTEA WDKDKTQIHI MPDTPEIM

    ———————————————-
    2)
    3 1’s then 3 0’s repeat to 960

    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    111000 111000 111000 111000 111000 111000 111000 111000
    ——————————————————–
    3)
    4 1’s then 0, 3 1’s then 0 twice, 4 1’s then 0 twice, five 1’s,
    0 then 4 1’s three times, 0 then 5 1’s, 0 then 3 1’s three times,
    0 then four 1’s twice, 0 then five 1’s, 0 then 4 1’s five times,
    0 then three 1’s, 0 then 4 1’s three times, 0 then 3 1’s, 0 then
    four 1’s three times, 0 then three 1’s, 0 then 4 1’s three times,
    0 then 3 1’s three times, 0 then 4 1’s, 0 then 5 1’s, 0 then 4 1’s twice,
    0 then 3 1’s, 0 then 4 1’s twice, 0 then 3 1’s, 0 then 4 1’s twice,
    0 then 3 1’s, 0 then 4 1’s, 0 then 3 1’s 3x, 0 then 4 1’s, 0 then 3 1’s,
    0 then 4 1’s, 0 then 3 1’s 2x, 0 then 4 1’s 9x, 0 then 3 1’s,
    0 then 4 1’s 4x, 0 then 3 1’s, 0 then 4 1’s 3x, 0 then three 1’s,
    0 then 4 1’s, 0 then 5 1’s, 0 then 4 1’s, 0 then 3 1’s, 0 then 4 1’s 3x,
    0 then 3 1’s, 0 then 4 1’s twice, 0 then 3 1’s 3x, 0 then 4 1’s 3x,
    0 then 5 1’s, 0 then 4 1’s, 0 then 3 1’s, 0 then 4 1’s 3x, 0 then 3 1’s,
    0 then 4 1’s, 0 then 3 1’s, 0 then 4 1’s, 0 then 3 1’s

    11110111011101111011110111110111101111011110111110
    111 01110111 0111101111 011111 01111 01111 01111 01111
    01111 0111 011110111101111 0111 011110111101111 0111
    011110111101111 011101110111 01111 011111 0111101111
    0111 0111101111 0111 01111 011101110111 01111 0111 01111
    01110111 011110111101111011110111101111011110111101111
    0111 01111011110111101111 0111 011110111101111 011101111011111
    01111 0111 011110111101111 0111 0111101111 011101110111 01111
    011101110111 011110111101111 011111 011110111101111 0111
    011110111101111 0111 01111 01110111 011110111
    —————————————————-
    4) 4 1’s, 0 then 4 1’s eight times, 0 then 3 1’s twice, 0 then 4 1’s five times, 0 then 4 1’s, 0 then 3 1’s, 0 then 4 1’s, 0 then 3 1’s, 0 then 4 1’s five times,
    0 then 3 1’s three times, 0 then 4 1’s, 0 then 3 1’s twice, 0 then 4 1’s, 0 then
    three 1’s three times, 0 then 4 1’s twice, 0 then 3 1’s, 0 then 4 1’s five times,
    0 then 3 1’s, 0 then 4 1’s, 0 then 3 1’s, 0 then 4 1’s eleven times, 0 then 3 1’s,
    0 then 4 1’s twice, 0 then 3 1’s, 0 then 4 1’s three times, 0 then 3 1’s, 0 then 4 1’s three times, 0 then 3 1’s, 0 then 4 1’s five times, 0 then 3 1’s, 0 then 4 1’s six times, 0 then 3 1’s three times, 0 then 4 1’s twelve times, 0 then 3 1’s, 0 then 4 1’s five times, 0 then 3 1’s, 0 then 4 1’s five times, 0 then 3 1’s, 0

    1111 0111101111011110111101111011110111101111 01110111 0111101111011110111101111 01111 0111 01111 0111 01111 01111011110111101111 0111011101110111 01111 01110111
    01111 0111 01111 0111 0111101111 0111 011110111101111
    01111 0111 01111 0111 011110111101111011110111101111
    0111101111011110111101111 0111 0111101111 0111 0111101111
    01111 0111 0111101111011110111101111 0111 0111101111
    01111011110111101111 011101110111 01111011110111101111
    0111101111011110111101111011110111101111 0111 01111
    01111011110111101111 0111 01111011110111101111011110111 0

    // First two examples are character lengths of 1358 and 960 respectively, and
    the last two are 500. Again I will leave it to the fair observer as to whether or not the criterion of specificity has been met.

  12. junkdnaforlife:

    Could you answer the question I asked earlier: What are your criteria for specificity in the aa example and in my list of runs-of-heads?

    My current best genome at present has this list of runs-of-heads:

    4 4 4 4 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 4 4 4 4 4 4 4 4 3 4 4 4 4 4 3 4 4 4 4 3 3 4 4 4 4 4 4 4 3 4 4 4 4 4 4 3 4 4 4 4 4 4 4 4 4 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 4 4

    Under the null we have something like this:

    3 1 2 1 1 3 2 1 1 4 1 6 1 4 1 5 2 1 2 1 2 3 2 1 1 2 4 1 3 2 3 2 2 2 3 1 2 5 2 2 1 3 2 1 2 3 3 3 2 2 1 2 1 1 3 2 4 4 3 1 1 2 1 1 2 3 3 2 2 2 1 3 3 1 1 2 4 2 4 2 2 2 3 1 1 1 1 1 2 1 2 3 2 1 2 5 1 2 1 1 1 10 2 6 1 2 2 1 2 1 1 1 1 1 2 5 3 1 2 5 2 1 2 1 2

    The second distribution is what you’d expect from a Poisson process. The first clearly is not.
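The contrast with the null can be made concrete with a short simulation. For a fair coin, the length of a run of heads follows a geometric distribution, P(length = k) = (1/2)^k, with a mean of 2. The Python sketch below is an illustration, not the author's code, and the names `simulate_head_runs` and `geometric_pmf` are hypothetical. It tabulates simulated run lengths next to that expectation.

```python
import random
from collections import Counter
from itertools import groupby


def simulate_head_runs(n_tosses, seed=0):
    """Toss a fair coin n_tosses times and return the lengths of the head runs."""
    rng = random.Random(seed)
    tosses = [rng.random() < 0.5 for _ in range(n_tosses)]
    return [len(list(g)) for is_head, g in groupby(tosses) if is_head]


def geometric_pmf(k):
    """Probability that a run of heads from a fair coin has length exactly k."""
    return 0.5 ** k


runs = simulate_head_runs(500)
counts = Counter(runs)
for k in sorted(counts):
    # columns: run length, observed fraction, expected fraction
    print(k, round(counts[k] / len(runs), 3), geometric_pmf(k))
```

A list made almost entirely of 4s, with no 1s or 2s at all, is far outside what this distribution produces.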

    Applying a functionality criterion to both mouse protein and to my output, we have chi>1 for mine (the output is one of a tiny proportion of possible outputs that reach, or exceed, that degree of functionality), but you have not provided a chi value for your mouse protein. It could well also exceed chi>1, but I am not seeing the calculation; in any case it does not affect my argument, which is, simply, that natural selection can raise chi from that expected from a chance process to a value >1.

    Applying a compressibility criterion to both sequences, I suggest that mine appears more compressible than your mouse aa sequence. How would you compute the compressibility of your mouse aa sequence?
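One crude way to put a number on compressibility is to run each sequence through a general-purpose compressor and compare the output size with the input size. The sketch below uses Python's `zlib` as that stand-in; this is an assumption made for illustration, not a method proposed in the thread, and a compressed length is only an upper bound on how short a description could be. The function name `compressed_ratio` is made up here.

```python
import random
import zlib


def compressed_ratio(symbols: str) -> float:
    """Compressed size divided by raw size; lower means more compressible."""
    raw = symbols.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# A strictly periodic string, like example 2 in the thread (960 characters).
periodic = "111000" * 160

# A string of 960 random bits for comparison.
rng = random.Random(0)
random_bits = "".join(rng.choice("01") for _ in range(960))

print(compressed_ratio(periodic))
print(compressed_ratio(random_bits))
```

Comparing a 20-letter amino-acid string with a two-letter coin string this way would also require allowing for the different alphabet sizes, since each amino-acid character carries more bits than a coin toss.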

    I will attempt to compute the compressibility of my sequence and the number of sequence as, or more, compressible, but I think I might let it run on a bit further first, and see if the number of 3s goes down a bit more 🙂

  13. Equating amino acid strings to ‘functionality’ by compressibility criteria is an odd strategy. The fact that someone can Google or Blast the symbolic representation of a mouse filament protein says nothing about underlying functionality, nor about the chemistry represented by those symbols. One could say “mouse filament protein”, or “MFP”, or anything else – it would always be a symbolic representation of something whose ‘function’ does not reside in its description – even if that linear sequence of real amino acids has to be ‘right’. Linearity is important because proteins are linearly constructed, from a linear ‘specification’. But it is but a procedural matter, just as much as the regulatory sequences and their molecular triggers that determine expression. There is a real globular protein, folded by physics which, if expressed in the right cells at the right time, has a ‘useful’ effect in mice. You could disrupt the folding of the chain and remove its function. If you do it gently enough, physics alone will cause it to ‘sproing’ back into the right configuration. You could take the same specification and get E Coli to make it, and it would then not have the same function. You could inject their version into mice, and it still would not have the function, because it is a cellular matter, not an organismal one.

    Myriad proteins, with exactly the same amount of any metric you could apply by evaluating their string-representations (even taking account of protein folding), will have vastly different effects on the organism, all judged entirely in the light of context – local (cellular), organismal and environmental.

    One could make 64-bit ‘string-representations’ of chess positions, square to neighbouring square. If one’s rule of functionality was “position reachable by legal moves from the standard starting position wR-wKt-wB-etc…”, you couldn’t evaluate ‘CSI strings’ – strings that one would use to determine the likelihood that Intelligent Players had been involved – just by evaluating the linear sequence. But you could get to supposedly CSI strings by ‘blind and undirected processes’ – stepwise evolution from the starting position – if selection prevented ‘illegal’ moves, restricting the search space.

    Linear representation is a red herring.

  14. I’d like to see someone reconstruct a coding sequence from function or vice versa without having a dictionary, such as BLAST.

    Perhaps Mr junk can tell us how to do this with a novel sequence or function. Something a Designer would have to do.

  15. junkdnaforlife:
    madbat, If we were handed a specific AA sequence, or the function
    associated with that AA sequence, we should be able to recover the specific sequence from the function, and the function from the specific sequence, without any knowledge of the other. As madbat had previously demonstrated, for without any knowledge of the protein associated with the function I am using as an example, he/she was able to recover the exact Nebulin protein, by either searching for the function or the AA sequence, in a post whereas he/she was protesting the very degree of specificity of that he/she had just demonstrated! (Just as we can recover the specific coin toss sequence from the description, and the description from the specific coin toss sequence.)

    For example, if we hand the phrase, high product of runs-of-heads to a scientist, and tell them to recover the specific 500 bit sequence associated with it, we should not be alarmed when they fail to recover the specific sequence. However if we hand the scientist, regulates thin filament length in mice, and tell them to recover the specific 1358 character long sequence associated with it, we should not be alarmed when they hand us back the specific Nebulin protein Amino acid sequence (as madbat had demonstrated). This is an example of the criterion of specificity.

    Facepalm. How do you think I *recovered* the information that one of the proteins (the one that you showed the amino acid sequence of) that regulates thin filament length in mice is called Nebulin? I googled it!

    Let me guess. That’s the exact same way you came up with the function for a protein and its amino acid sequence – you googled it.

    Let me further guess. The reason why you all of a sudden moved your goal posts from a DNA sequence to an amino acid sequence was either: you were UNABLE to retrieve a DNA sequence, or: upon googling, you found out WHY you were unable to retrieve a DNA sequence, and that there are far more than ONE possible DNA sequences that could code for a known amino acid sequence (especially in eukaryotes) – oops!

    Let me further guess. The reason why you all of a sudden swapped from the function *interacts with the glucocorticoid receptor* to *regulates thin filament length in mice* is that upon googling it, you found out that at least 26 different proteins to date are known to interact with the glucocorticoid receptor – oops!

    Let me further guess. You didn’t look into any actual literature about regulation of thin filament length in mice. If you had, you would have found out that there are a lot more proteins than just Nebulin involved in regulation of thin filament length. So if you ask a researcher who has either worked on this topic or has access to protein databases (and for some protein databases, all one needs to have access to is google, as you and I have demonstrated), they may hand you back an entire bushel of proteins and, IF they are known already, their amino acid sequences – oops!

    So, your strange argument about specificity just came apart at the seams here.

    The only reason that anyone is able to retrieve an amino acid sequence for a protein with a specific function is that there are databases listing these. These databases are built through intense research of identifying proteins present in specific processes, isolating them, sequencing, performing knock-out experiments, studying their biochemical properties and interactions with other molecules in the structures and processes they are found, etc. Proteins and their sequences are NOT predictable from postulating a function. Neither is the function (more likely: functions in the plural) of a protein predictable from looking at its sequence.

    Moreover, if you hand the phrase *high product of runs-of-heads* to anyone with internet access, you should not be alarmed if they recover the specific 500 bit sequence generated by Patrick that you keep reproducing here. You know why? Because there is an online database now that they can use for recovery: this discussion thread! And Google WILL find it! Just like it found the protein database page listing Nebulin when I googled the amino acid sequence you gave: because someone put an online database up!

  16. Yes. Exactly. I am running out of ways trying to explain this to junkdnaforlife.

  17. The only reason that anyone is able to retrieve an amino acid sequence for a protein with a specific function is that there are databases listing these.

    Pretty much consistent with my main objection to ID.

    If coding sequences are truly isolated like cipher keys or lock combinations, how did the designer acquire them, and if their specifications exceed the computing resources of the universe, where and how does the designer store them?

    The answer is that intelligence=magic. ID is an appeal to magic.

    The alternative is that coding sequences are not isolated. Something confirmed by all available research, including Axe’s.

  18. junkdnaforlife:
    madbat, If we were handed a specific AA sequence, or the function
    associated with that AA sequence, we should be able to recover the specific sequence from the function, and the function from the specific sequence, without any knowledge of the other. As madbat had previously demonstrated, for without any knowledge of the protein associated with the function I am using as an example, he/she was able to recover the exact Nebulin protein, by either searching for the function or the AA sequence,

    And just in case this is still not abundantly clear from everything else I and others here have said: I was ONLY able to identify that the protein you were using as an example was Nebulin because you gave me BOTH (1) a sequence that may be an amino acid sequence, AND (2) one of the functions we currently know to be associated with a protein coded by an amino acid sequence that common protein databases symbolize by that sequence.

    Contrary to your assertion: (a) the sequence of letters by itself is completely meaningless; (b) with the added knowledge that the sequence symbolizes an amino acid sequence we can synthesize the protein, but we know absolutely nothing about what its function may be in any given biological context, or even if it HAS any function in any biological context; (c) with the added knowledge that one of the functions in the context of thin filament production in mice it has is contributing to the length of those filaments, we still know absolutely nothing about what other functions it may have in other biological contexts, or even what additional functions it may have in the SAME context.

    Contrary to your assertion: (a) *regulates thin filament length in mice* tells me absolutely nothing what category of phenomenon you are even talking about (chemical gradients in a muscle cell, nutritional state of the mouse, behavior/hormone interactions, enzymatic activity?); (b) with the added knowledge that the phenomenon in question is a regulatory protein, we still have absolutely no idea what a protein that could regulate thin filament length in mice may look like, let alone what its amino acid sequence would be; (c) with the added knowledge that there is one specific protein with an identified sequence that contributes to regulation of thin filament length in mice, we still have absolutely no idea how many other proteins with different sequences may either replace said protein by serving the exact same function, or contribute to regulation of thin filament length in mice; (d) with the theoretical knowledge of all proteins that alternatively and / or in concert contribute to regulation of thin filament length in mice, there is obviously way more than one specific amino acid sequence we would now recover.

  19. What I find sort of interesting, is that none of the objections to my falsifications are those raised by Dembski, who seems happy to concede that exercises such as mine result in patterns that exhibit CSI with chi>1. But he would, I guess, lump it in with “Weasel” as being the result of an algorithm in which the pattern is embodied somehow in the algorithm itself, in the form of the fitness function.

    So he would (correctly, as it happens, in this case) conclude “ID” from my pattern, on the grounds that it must have been the result of an intelligently selected fitness function.

    I think there is a gaping hole in this argument, but I’m sort of surprised no-one has made it here (or, if they have, not as clearly as Dembski does).

    junkdnaforlife hints at it, but has not yet come up with specification criteria for the aa output and the output from my exercise that are comparable. I hope s/he will do so.

  20. Elizabeth: What I find sort of interesting, is that none of the objections to my falsifications are those raised by Dembski, who seems happy to concede that exercises such as mine result in patterns that exhibit CSI with chi>1. But he would, I guess, lump it in with “Weasel” as being the result of an algorithm in which the pattern is embodied somehow in the algorithm itself, in the form of the fitness function.

    Exactly. I find it astonishing that the ID fans, who swear by Dembski’s books, have a pretty foggy idea about their actual contents. No Free Lunch, page 193, guys.

  21. Unless I’m reading this wrong, this is where Dembski makes his infamous misinterpretation of Weasel. The so-called latching claim.

  22. Hasn’t anyone in the ID camp noticed that the laws governing chemistry are rather thoroughly specified, and when they passively become the oracle, they specify the fitness function?

  23. petrushka:
    Unless I’m reading this wrong, this is where Dembski makes his infamous misinterpretation of Weasel. The so-called latching claim.

    Yes, he does. But that’s a minor gaffe, relatively speaking.

  24. olegt: Yes, he does. But that’s a minor gaffe, relatively speaking.

    But still annoying in that IDists continue to make this claim, even after it has been pointed out repeatedly that the weasel algorithm does not employ latching.

  25. Norm Olsen: olegt: Yes, he does. But that’s a minor gaffe, relatively speaking.
    But still annoying in that IDists continue to make this claim, even after it has been pointed out repeatedly that the weasel algorithm does not employ latching.

    Aside from that “latching issue” when misrepresenting Dawkins’ Weasel program, Dembski really reveals his profound ignorance of chemistry and physics in just those few short paragraphs. He really doesn’t get it, and as a result, his worshiping followers don’t get it either. When “The Dr.Dr. Dembski” speaks, all else must be false.

  26. The “latching” issue is a minor matter. Dembski and others inflated it into a Big Deal by raising the issue, implying (but not saying explicitly) that it was an important property that was critical to explaining why the Weasel algorithm reached its goal so much faster than blind search.

    In fact, as others here have noted: (1) the Weasel algorithm didn’t “latch”, and (2) even if it had done so, that would have made little difference to the effectiveness of the search. So the whole issue was bogus.

  27. Yes, he does. But that’s a minor gaffe, relatively speaking.

    Considering he not only mischaracterizes the algorithm, but completely mischaracterizes the relative effectiveness of the two searches, I’d say it illustrates a high degree of mathematical incompetence. He obviously didn’t test the two versions.

    The mischaracterization of chemistry is, in my opinion, something you can get away with before the research comes in. It’s another gaps argument. Isolated islands and all that.

  28. petrushka: Considering he not only mischaracterizes the algorithm, but completely mischaracterizes the relative effectiveness of the two searches

    Lighten up, guys. It’s hilarious to read how Dembski comes up with this shiny new algorithm E and says, with the raised pinky, that “unlike Dawkins’s original algorithm, E is doing everything on the up and up.” Not realizing that E is an exact copy of Dawkins’s Weasel. This episode should be viewed as comic relief, not a reason to complain.

  29. I assume that’s why they fought so long at UD over latching. Dembski said that a non latching Weasel would be an illegitimate model of evolution. One of the UD denizens actually wrote and published a non-latching Weasel. I haven’t seen him in a long time.

  30. olegt: Lighten up, guys. It’s hilarious to read how Dembski comes up with this shiny new algorithm E and says, with the raised pinky, that “unlike Dawkins’s original algorithm, E is doing everything on the up and up.” Not realizing that E is an exact copy of Dawkins’s Weasel. This episode should be viewed as comic relief, not a reason to complain.

    Indeed; but it is funnier still when he tries to explain it all away, especially on page 195.

    I have written just such a program in which the target changes in the middle of an evolutionary run. The population just tags along. My my; who would have thought? 🙂

  31. There appear to be some who have associated the Google search engine with the degree of specificity that exists between an amino acid sequence, dna sequence, protein and function. Madbat, you say: “Proteins and their sequences are NOT predictable from postulating a function.” And here we actually agree on something. When dealing with a currently unknown protein, rather than starting with a function and determining the sequence, the approach is to start with the sequence and determine the function. From Wiley’s site: “We developed and evaluated a new predictor of protein function, functional annotator (FANN), from amino acid sequence.” (*) onlinelibrary.wiley.com/doi/10.1002/prot.23029/abstract
    But your objections as to whether a function has 26, 3, or k possible sequences seem to be statistically insignificant (as long as k is not a large value) when we are dealing with a much larger value of n (all possible sequences).
    But this post is now circling back where it started.

    There is a very large number of functions that will add weight to an element in a binary population and output a set. There are far fewer amino acid sequences that will regulate thin filament length in mice. These are differences in degrees of specificity. If k is the amount of fitness functions that will shift frequencies in a binary population, and n is all possible fitness function sequences, and if we allow c to represent the amount of amino acid sequences that will regulate thin filament length in mice, and call m all possible amino acid sequences, then k represents a far greater value than c, and it simply follows that k is far less specific. The aa sequence has such a high degree of specificity that the 1358 character sequence can be simply described as, regulates thin filament length in mice. “Maximizes the product of head runs,” on the other hand, refers to an extraordinarily large amount of possible fitness function sequences which depending on the angle you approach it can vary by several orders of magnitude. [Liz, I do not understand how you are calling your string a function. Your function is the string that sets the weight to H (+1) along with the string that is adding the results. To test this, erase 200 bits from your string and see if the program halts. Then go into your Matlab and clip out a block of code and see if the program function halts. If you knock out some coding sequence in your Matlab the program will halt, in the same way knocking out amino acid sequences will halt the protein activity. The amino acid sequence/function relationship is analogous to your Matlab coding sequence/function relationship.]

    This is why compressibility is introduced as a measure to map binary populations whereas we may connect, in some sense, with dna/amino acid sequences. Such that if we have a simple description, “every third coin is T to 1000,” we can take k as the amount of 22 character descriptions that simply describe a 1000 bit string, and n as all possible 1000 bit strings, (and find a relationship with), c as the amount of amino acid sequences that can be described as “regulates thin filament length in mice”, and m representing all possible amino acid sequences. The value for k in many cases will be greater than the value of c, indicating an alarming amount of specificity found in some of the larger more conserved type of proteins.

  32. We know that functional variations exist among DNA coding sequences. Can the same be said about randomly mutated computer programs? If not, then there are differences between machine code and chemistry that make the analogy invalid.

    I do not know the details of AVIDA, but I assume that mutations don’t crash the interpreter.

  33. It seems junkdnaforlife is rather missing the point.
    Of COURSE the protein we find in a modern organism is “highly specified”. It evolved along with the rest of the organism.
    However, we have NO WAY of telling how many different aminoacid sequences will perform that particular function.
    And the whole point is that this particular sequence is the result of a long process of replication-with-variation, and natural selection.
    And Elizabeth has shown that such a process can add large amounts of specification. When mutation mechanisms other than, and additional to, point mutation are allowed, the process is even quicker.
    jdfl seems again to be assuming that modern proteins are randomly assembled from aminoacid soups (or the corresponding DNA sequences from a nucleotide soup). No biologist thinks that this is what happened during evolution. jdfl totally ignores the effect of the environmental feedback on replication and natural selection.

  34. damitall,
    DM, from my end, what this has demonstrated is (even after 100’s, 1000’s and possibly millions of generations) that a weighted binary population (such as weighted coin) that favors H in this case, has extreme difficulty hitting one of these targets considered “simply describable,” that I argue is more analogous to functional protein sequences. And this is a population carrying (at most) 1 bit per base, whereas amino acid bases can carry at least twice that or more. From there, the argument becomes, as you state, “we have NO WAY of telling how many different aminoacid sequences will perform that particular function,” therefore if we have no way of actually knowing this, it could just as well be 3 or 10^10 possible sequences. This now becomes a different argument, one that is probably the most important.

  35. petrushka,

    Yes, this has been your argument since our days at UD before the fall sir. As I recall this was heavily debated by you and GP and others. GP has more specific details than I in this case, maybe he’ll show up here at some point.

  36. Dembski does say, in No Free Lunch in the section linked by Olegt, having “improved” Dawkins’ algorithm by making it non-latching (not that there is any evidence that it ever latched, and ignoring the fact that making a non-latching algorithm is actually easier) that:

    Even so, the problem of generating specified complexity has not been solved. Indeed, it is utterly misleading to say that E has generated specified complexity. What the algorithm E has done is take advantage of the specified complexity inherent in the fitness function and utilized it in searching for, and then locating, the target sequence.

    Well, only if the fitness function consists of the target, which is true of Weasel. It’s certainly not true of my exercise. The fitness function is very simple, and the “target” is simply the sequence that produces the highest value on the fitness function. It happens that there is a single highest-value sequence, but there are many sequences that are nearly as high, and form a very low proportion of the total number of possible sequences, and the fitness-function specifies what those sequences are. In other words, mine is exactly analogous to nature: the environment favours some phenotypic output, and as a result, sequences that optimise that output are selected. The fitness function is not the output. Unlike Weasel, the genotype is not the phenotype, and the phenotype is not the fitness function. The fitness function merely ranks the output from the current population of phenotypes.

    And if all Dembski is saying is that the environment has specified complexity, then why bother with biology at all?
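To make concrete how little the fitness function in this exercise contains, here is a minimal Python sketch. It is not the MatLab script mentioned in the post; the names `head_runs` and `fitness` are illustrative, and scoring a series with no heads as 1 (the empty product) is an assumption. The function only ranks a series by the product of its runs of heads and holds no target sequence.

```python
from itertools import groupby
from math import prod


def head_runs(tosses: str) -> list[int]:
    """Return the length of each run of consecutive heads ('H')."""
    return [len(list(group)) for face, group in groupby(tosses) if face == "H"]


def fitness(tosses: str) -> int:
    """Product of the head-run lengths (assumed to be 1 if there are no heads)."""
    return prod(head_runs(tosses))


# The example series from the original post: H T T H H H T H T T H H H H T T T
example = "HTTHHHTHTTHHHHTTT"
print(head_runs(example))  # [1, 3, 1, 4]
print(fitness(example))    # 12
```

Python integers are arbitrary precision, so products near 10^60 are computed exactly.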

  37. junkdnaforlife:

    There is a very large number of functions that will add weight to an element in a binary population and output a set. There are far fewer amino acid sequences that will regulate thin filament length in mice. These are differences in degrees of specificity. If k is the amount of fitness functions that will shift frequencies in a binary population, and n is all possible fitness function sequences, and if we allow c to represent the amount of amino acid sequences that will regulate thin filament length in mice, and call m all possible amino acid sequences, then k represents a far greater value than c, and it simply follows that k is far less specific.

    Yes.

    The aa sequence has such a high degree of specificity that the 1358 character sequence can be simply described as, regulates thin filament length in mice.

    No. If you are really going for “ease of description” in a colloquial sense, then, mine is equally simply describable. But that, IMO, is making the same class of error that says that “God did it” is more parsimonious than something like the vast current theory of evolution.

    If you want to go for compressibility as your specification, then you need some kind of rigorous definition of algorithmic compressibility, and I suggest that my sequence is more compressible, algorithmically, than your mouse sequence. But I could be wrong. But it doesn’t matter, because my sequence is one of a tiny subset of sequences in my search space that is as or more compressible than itself (at least I’d suggest that it is), and that tiny proportion exceeds Dembski’s threshold for non-design rejection.

    More to the point, functional complexity seems like a more rigorous and more useful way of specifying (as defined by Hazen et al, and, generally, applauded by ID proponents). I can readily compute the functional complexity of my sequence (i.e. the set of sequences that perform as well, or better, than the candidate sequence), and, again, it exceeds Dembski’s threshold. The mouse sequence may equal or exceed mine, but that is not the point. Mine exceeds Dembski’s threshold, and that tiny subset is found, reliably, by my NS search.

    “Maximizes the product of head runs,” on the other hand, refers to an extraordinarily large amount of possible fitness function sequences which depending on the angle you approach it can vary by several orders of magnitude.

    No. It refers to a single fitness function. Depending on the threshold I set (and I set it, approximately, at Dembski’s) the subset of sequences that exceed that threshold is very small. That’s the point.

    Unless you are making an Irreducible Complexity argument. I agree that, until you reach a product of 10^59 or so, the fitness space is very well connected, in my scenario, and we do not know how well connected it is in the mouse gene scenario. But Dembski is not arguing from IC here (He considers IC a special case of SC, as I understand it). He is arguing that Darwinian mechanisms are inadequate to get from a chance sequence to a highly specified sequence. My exercise demonstrates that this is incorrect.

    [Liz, I do not understand how you are calling your string a function. Your function is the string that sets the weight to H (+1) along with the string that is adding the results.

    I am using “function” in the biological sense. As in: this mouse mRNA sequence outputs an aa sequence that performs the function of regulating filament length in mice, that helps the mouse survive and breed.

    My equivalent is: this sequence of coin-tosses in a virtual critter outputs a sequence of totals-of-runs-of-heads that performs the function of multiplying together to make a large number, that helps the critter survive and breed.

    To test this, erase 200 bits from your string and see if the program halts. Then go into your Matlab and clip out a block of code and see if the program function halts. If you knock out some coding sequence in your Matlab the program will halt, in the same way knocking out amino acid sequences will halt the protein activity. The amino acid sequence/function relationship is analogous to your Matlab coding sequence/function relationship.]

    No, that is not the relevant analogy. The Matlab code is simply the analog of a world in which there are self-replicating critters with binary genomes in which genomes whose runs-of-heads make a large product are more likely to survive and breed, and who do not breed exactly true.

    This is why compressibility is introduced as a measure to map binary populations whereas we may connect, in some sense, with dna/amino acid sequences. Such that if we have a simple description, “every third coin is T to 1000,” we can take k as the amount of 22 character descriptions that simply describe a 1000 bit string, and n as all possible 1000 bit strings, (and find a relationship with), c as the amount of amino acid sequences that can be described as “regulates thin filament length in mice”, and m representing all possible amino acid sequences. The value for k in many cases will be greater than the value of c, indicating an alarming amount of specificity found in some of the larger more conserved type of proteins.

    Nobody is denying that proteins show lots of specificity. What we are saying is that the argument that Darwinian mechanisms are inadequate to account for such specificity fails.

    In my little exercise, we have a purely Darwinian mechanism that reliably takes a sequence from one of a set of 2^500 to one of a tiny, specified, subset of this set. Moreover, it does so, not by specifying a set of target sequences, as in Weasel and friends, but by specifying a function that the members of the specified set have to perform. Those sequences in the tiny subset of very large products are not coded into the fitness function – indeed I did not even know what they were, when I started, and I’d still have to figure them out if I wanted to count them (fortunately Olegt has already done that).

    So my exercise actually gives me information that I did not previously have. Trivial information that I could find more easily by some other means, true, but when it comes to practical evolutionary algorithms, the very reason we use them is that we do not know the possible solutions to our problem in advance. Ergo, they are not “smuggled in” to the fitness function.

    Dembski seems to have been misled by Weasel, not because he thought Weasel “latched” (which was silly, but irrelevant), but because he got stuck on fitness functions that actually had their solution coded within them.

    Useful ones don’t, and mine doesn’t either. What is coded is the problem. Darwinian search finds the solution.

    Unless Dembski is simply saying that all problems contain their own solution. Which is a nice koan, but not an anti-Darwinian argument.
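The cull-and-breed procedure described in the post (100 series, cull the 50 lowest products, breed mutated offspring from the survivors) can be sketched roughly as follows. This is an illustrative Python sketch, not the MatLab script from the post. It assumes that survivors are kept alongside their offspring, that each survivor produces exactly one child, and that each child differs from its parent by a single flipped toss; the function names, the seed, and the generation count are all hypothetical choices.

```python
import random
from itertools import groupby
from math import prod

GENOME_LENGTH = 500   # tosses per series, as in the post
POPULATION = 100      # series per generation, as in the post


def fitness(tosses):
    """Product of the lengths of all runs of heads (True = heads)."""
    return prod(len(list(g)) for heads, g in groupby(tosses) if heads)


def mutate(tosses, rng):
    """Copy a series and flip one randomly chosen toss (assumed mutation rate)."""
    child = list(tosses)
    site = rng.randrange(len(child))
    child[site] = not child[site]
    return child


def evolve(generations, seed=0):
    """Run cull-and-breed selection; return (initial best, final best) fitness."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(GENOME_LENGTH)]
                  for _ in range(POPULATION)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:POPULATION // 2]          # cull the lowest 50
        population = survivors + [mutate(s, rng) for s in survivors]
    return initial_best, max(map(fitness, population))


if __name__ == "__main__":
    start, end = evolve(generations=300)
    print(f"best product at start: {start:.3e}")
    print(f"best product at end:   {end:.3e}")
```

Note that nothing in `fitness` or `evolve` names a target series; the only thing supplied is the scoring rule. Because survivors are retained in this sketch, the best product can never decrease from one generation to the next.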

  38. junkdnaforlife:
    damitall,
    DM, from my end, what this has demonstrated is (even after 100’s, 1000’s and possibly millions of generations) that a weighted binary population (such as weighted coin) that favors H in this case, has extreme difficulty hitting one of these targets considered “simply describable,” that I argue is more analogous to functional protein sequences.

    What do you mean by “weighted coin”? It seems that you might have a misunderstanding of Elizabeth’s problem.

    Also, could you please answer her questions about specification?

  39. Thanks – actually I didn’t understand that part either (about the weighted coin) but I forgot to mention it when I replied.

  40. Incidentally, one point that my exercise demonstrates rather prettily, is that once the population is near-optimal, the vast proportion of mutations are deleterious. Near the beginning of the run, a much larger proportion of mutations are beneficial.

    This is one thing that Sanford forgets, for instance, when discussing “deleterious” mutations, but it is also important to remember that the widespread view that most mutations are deleterious, and therefore a poor candidate for beneficial innovations is based on a highly biased sampling of mutations i.e. from populations that are currently at a near-optimum.
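The shift in the proportion of beneficial mutations can be checked directly by scoring every possible single-toss flip of a series. The Python sketch below is illustrative only: the names are hypothetical, the seed is arbitrary, and the "near-optimal" series is hand-built from runs of four heads rather than taken from an evolved population.

```python
import random
from itertools import groupby
from math import prod


def fitness(tosses: str) -> int:
    """Product of the lengths of all runs of heads."""
    return prod(len(list(g)) for face, g in groupby(tosses) if face == "H")


def beneficial_fraction(tosses: str) -> float:
    """Fraction of the single-toss flips that strictly increase the product."""
    base = fitness(tosses)
    better = 0
    for i, face in enumerate(tosses):
        flipped = "T" if face == "H" else "H"
        if fitness(tosses[:i] + flipped + tosses[i + 1:]) > base:
            better += 1
    return better / len(tosses)


rng = random.Random(42)  # arbitrary seed, for repeatability only
random_series = "".join(rng.choice("HT") for _ in range(500))
near_optimal = "HHHHT" * 100  # runs of four heads; product 4**100, about 1.6e60

print(beneficial_fraction(random_series))
print(beneficial_fraction(near_optimal))  # 0.002: only the final T -> H helps
```

In the hand-built series, breaking a run of four always lowers the product, and merging two runs of four into a run of nine replaces a factor of 16 with 9, so 499 of the 500 possible point mutations are deleterious.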

  41. Elizabeth:
    … it is also important to remember that the widespread view that most mutations are deleterious, and therefore a poor candidate for beneficial innovations is based on a highly biased sampling of mutations i.e. from populations that are currently at a near-optimum.

    Let me add to that. People who make the point that mutations are much more likely to be deleterious than advantageous often think that this means that evolution must therefore lose ground. But if they actually calculated fixation probabilities they might come to a different conclusion.

    For example, with a population size of 100,000 individuals, and new mutations that reduce fitness by 0.0001 when heterozygous (and twice as much when homozygous) the probability of fixation is only 8.498 x 10^(-22), so basically they can never get fixed. A favorable mutation that has an advantage of 0.00001, only one-tenth as strong, also mostly gets lost. But it gets fixed a fraction 0.00002037 of the time, which is vastly more often than those deleterious mutations.

    Evolution would then not lose ground even if deleterious mutations were a million times more common than advantageous ones. (I have been using Motoo Kimura’s famous 1962 fixation probability formula).
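For anyone who wants to check these figures, here is a short Python sketch of Kimura's diffusion approximation for a single new mutant. It assumes a diploid population whose effective size equals the census size, additive selection (advantage s in heterozygotes, 2s in homozygotes), and an initial frequency of 1/(2N); under those assumptions it reproduces the two probabilities quoted above. The function name is illustrative.

```python
from math import expm1


def fixation_probability(pop_size: int, s: float) -> float:
    """Kimura (1962) fixation probability of a single new mutant copy.

    Assumes a diploid population whose effective size equals pop_size,
    additive selection (heterozygote advantage s, homozygote 2s), and an
    initial frequency of 1 / (2 * pop_size).
    """
    if s == 0:
        return 1 / (2 * pop_size)  # neutral limit
    initial_freq = 1 / (2 * pop_size)
    # u = (1 - exp(-4 N s p)) / (1 - exp(-4 N s)), written with expm1 for accuracy
    return expm1(-4 * pop_size * s * initial_freq) / expm1(-4 * pop_size * s)


N = 100_000
deleterious = fixation_probability(N, -0.0001)
favorable = fixation_probability(N, 0.00001)
print(f"{deleterious:.3e}")  # 8.498e-22
print(f"{favorable:.3e}")    # 2.037e-05
print(f"{favorable / deleterious:.1e}")
```

The last line prints the ratio of the two fixation probabilities, which is many orders of magnitude larger than a million.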

  42. most mutations are deleterious, and therefore a poor candidate for beneficial innovations is based on a highly biased sampling of mutations i.e. from populations that are currently at a near-optimum.

    This is something that Ernst Mayr emphasizes in his writings, but which I didn’t quite understand until now.

  43. junkdnaforlife,

    junkdnaforlife: “Then go into your Matlab and clip out a block of code and see if the program function halts.”

    An analogy of what you are saying here, would be burning down a lab to see if a product you are testing meets some fire standard.

    Labs are NOT part of a test, they are simply a means of hosting the test.

    This is a mistake that Gil Dodgen makes often and somehow has been accepted by some IDists as being valid.

  44. Labs are NOT part of a test, they are simply a means of hosting the test.

    This is a mistake that Gil Dodgen makes often and somehow has been accepted by some IDists as being valid.

    Playing devil’s advocate here, there is a point to be made here, and that is that some sequences must be conserved lest the lab self-destruct.

    So we see conserved sequences. Not exactly reinforcing the ID agenda.

    More interesting to me is the fact that most “hard” sequences evolved in microbes before the Cambrian. Most protein domains, for example. Once you have multi-celled organisms and body development, you have the opportunity for evolution to focus on size and shape rather than on new proteins.

    I still think that the “information” analogy is inherently fallacious. Chemistry differs in fundamental ways from computer programs. To me the most important of several analogy failures is that you cannot predict the effect of mutations. Until someone demonstrates a way to simulate this level of chemistry I see evolution as not just an adequate way to design, but the only way.

  45. junkdnaforlife:

    There appear to be some who have associated the Google search engine with the degree of specificity that exists between an amino acid sequence, dna sequence, protein and function.

    No, we are saying that we are able to google the protein because there exists a linear symbolic representation that we can use as a lookup – it is you who seems to think the linear representation is a key to ‘CSI’. But we are not googling the protein itself, and certainly not its function. Even that short-hand amino-acid string hides a deeper representation, that we could expand out in terms of CH3CH(NH2)COOH (alanine) etc – but we’d need a way of representing stereochemistry and it would get pretty messy. And we wouldn’t find anything in google or BLAST, though it would be closer to representing the ‘real’ protein. Closer yet would be a ball-and-stick model, and we certainly couldn’t google that.

    I took an arbitrary fragment of your protein and did a simple BLAST search on it.
    KKCQTLVSDADYRNYLHQWTCLPDQNDVIHARQAYDLQSDNMYKSDLQW
    MRGIGWVPIGSLDVEKCKRATEILSDKIYRQPPDKFKFTSVTDSLEQVLAKN
    NAITMNKRLYTEAWDKDKTQIHIMPDTPEIMLARMNKVNYSESLYKLANEEAK
    KKGYDLRSDAIPIVAAKASRDIASDYKYKDGYRKQLGHHIGARNIKDDPKMM
    WSMHVAKIQSDREYKKDFEKWKTKYSSPVDMLGVVLAKKCQILVSDIDYKH
    PLHEWTCLPDQNDVIHARQAYDLQSDNVYKSDLQWMRGIGWVPIGSLDVV
    KCKRAAEILSDNIYRQPPNKFKFTSVTDSLEQVLAKSNALNMNKRLYTEAWD
    KDKIQIHVMPDTPEIMLARQNRINYSESLYKLANEEAKK

    All manner of interesting results come back. This fragment has relatives, and fragments of the fragment have relatives. Those relatives are not all specific to “regulate thin filament length in mice”. The fragment has several internal repeats, and conserved domains that map onto other 3D structures in other proteins (structurally conserved, as opposed to being sequence-conserved). The gene shows substantial evidence of lineage-specific extension by repetition of smaller modules, one of which (given your handle) you may be interested to know is associated with an Alu transposon, a known source of misalignment duplicates in recombination.

    In short, it’s a bodge and, as far as I can tell, a classic example of evolutionary tinkering. It also contains the words PIGS and RAT, and a sequence that looks a bit like MONKEYS (MNKVNYS) – and similar proteins are found in all three! Coincidence? I think not! OK, I’m joking, but the repetitious nature of motifs in this protein is striking. Look at the bit beginning MRGIGWVPIGSLD, and see where else it appears.

    Or look at the alignment of these two ‘similar’ subfragments (it won’t quite line up because of fonts).

    KKCQTLVSDADYRNYLHQWTCLPDQNDVIHARQAYDLQSDNMYKSDLQWMRGIG
    KKCQILVSDIDYKHPLHEWTCLPDQNDVIHARQAYDLQSDNVYKSDLQWMRGIG

    Where they differ, do you think they had to differ, else the protein would not work? Of course, it may not be possible to mutate them back non-fatally, because other proteins have evolved in an environment containing the current sequences.

    So, the sequence is compressible. You would kill a mouse by compressing it (the sequence!), but in this instance, the very modularity of the protein is part of its function. Just looking at the ‘raw’ sequence of this giant does not give much insight into its specificity, nor its ‘evolvability’. But it has ‘evolution’ written all over it.
