Working Definitions for the Design Detection Game/Tool

I want to thank OMagain in advance for doing the heavy lifting required to make my little tool/game shareable. His efforts will not only speed the process up immeasurably, they will also lend some much-needed bipartisanship to this endeavor as we move forward. When he is done, I believe we can begin to attempt to use the game/tool to do some real testable science in the area of ID. I’m sure all will agree this will be quite an accomplishment.
Moving forward, I would ask that in these discussions we take things slowly, doing our best to leave out the usual culture-war template and to focus on what is actually being said rather than on the motives and implications we think we see behind the words.

 

I believe now would be a good time for us to do some preliminary definitional housework. That way, when OMagain finishes his work on the gizmo, I can lay out some proposed hypotheses and the real fun can hopefully start immediately.

 

It is always desirable to begin with good operational definitions that are agreeable to everyone and as precise as possible. With that in mind I would like to suggest the following short operational definitions for some terms that will invariably come up in the discussions that follow.

 

1. Random – exhibiting no discernible pattern; alternatively, a numeric string corresponding to the decimal expansion of an irrational number that is unknown to the observer evaluating it (see the sketch below)

2. Computable function – a function for which there is a finite procedure (an algorithm) that tells how to compute it

3. Artifact – a nonrandom object, described by a representative string, that cannot be explained by a computable function that does not reference the representative string

4. Explanation – a model, produced by an alternative method, that an observer cannot distinguish from the string being evaluated

5. Designer – a being capable of producing artifacts

6. Observer – a being that, with feedback, can generally and reliably distinguish between artifacts and the models that approximate them
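
To make definition 1 concrete, here is a minimal Python sketch (an illustration of mine, not part of the game): to an observer who does not know the source, the digits below should exhibit no discernible pattern, while to an observer who does know it, they are anything but random.

```python
from decimal import Decimal, getcontext

# Digits of an irrational number (here sqrt(2)) look patternless to an
# observer who does not know their source.
getcontext().prec = 60
digits = str(Decimal(2).sqrt()).replace(".", "")[:50]
print(digits)  # 14142135623730950488016887242096980785696718753769
```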

Please take some time to review and let me know if these working definitions are acceptable and clear enough for you all. These are works in progress and I fully expect them to change as you give feedback.

Any suggestions for improvement will be welcome, and, as always, please forgive the spelling and grammar mistakes.

peace

541 thoughts on “Working Definitions for the Design Detection Game/Tool”

  1. fifthmonarchyman: So now you agree with Patrick that computers choose?

    I’m not sure where you got that idea. Computers don’t do pragmatics. They only do logic.

    Yes, people sometimes describe them as doing pragmatics. But what happens is that they have created a logical model of pragmatics and the computer just does the logic. The pragmatics was done by whoever created that logical model.

  2. fifthmonarchyman: Wait a minute there, honcho. Look at the following “conversion”:

    Jenny’s number —— original object
    8675309 ———- numeric representation

    XXXXXXXX —–bar graph
    XXXXXX
    XXXXXXX
    XXXXX
    XXX

    XXXXXXXXX

    100001000101111111101101——– decimal to binary
    845FED——— binary to hexadecimal
    �_————- hexadecimal to text

    I think some of the pattern might have got lost in the conversion process. What do you think?

    This is so confused.

    Only Jenny and her friends would recognise 8675309 as her number as displayed in Base10 numerals. Some might get it when displayed as a bar chart. Few would get it displayed as a binary string. Probably nobody would recognise HFGEC_I as Jenny’s number unless they realise it is a code.

    But this is not because of any particular property of any of these representations. It is solely because in our society we have decided and agreed to operate telephones with numbers in Base10 and we all know this convention and use it all the time. If instead we used telephones that had lettered keys instead of numbered ones, all of Jenny’s friends would recognise HFGEC_I as her number and few would recognise 8675309.

    On the other hand, if the same group of observers were shown 8675308, nobody would recognise it as an artifact, simply because they don’t know anyone with that telephone number, even though it is someone’s number just as much as Jenny’s is.

    If the telephone number is a valid example of your game, all it boils down to is showing someone an object they have seen or known about before, and then claim ‘design detection’ when they recognise it! Why then the need to go to such elaborate lengths with comparisons with fake strings? All you need to show the observers is the actual telephone number. Jenny’s friends will say ‘artifact’ and you will claim that as a valid positive result, whereas everybody else will say ‘not an artifact’ and you will dismiss those as unavoidable but unimportant false negative outcomes! Way to foolproof your experiment!

    To be charitable, I don’t think your game is meant to do that, but in that case using a telephone number to illustrate it is really way off the mark. Telephone numbers simply do not contain patterns that indicate they are designed. You need to come up with a better example that doesn’t rely on the observers having a priori knowledge about what it is they are looking at. A pattern that causes the reaction ‘Ha, I know this, I saw it only last week!’ is not a valid pattern to test your game.

  3. faded_Glory: If the telephone number is a valid example of your game, all it boils down to is showing someone an object they have seen or known about before, and then claim ‘design detection’ when they recognise it!

    Exactly! Pattern matching only works if you have an existing pattern to compare with your sample. Life experience, learning, grows by storing patterns with which to interpret new information. There’s nothing I can think of that sentient beings do to explore their environment that is not some form of pattern matching.

  4. Alan Fox: Exactly! Pattern matching only works if you have an existing pattern to compare with your sample. Life experience, learning, grows by storing patterns with which to interpret new information. There’s nothing I can think of that sentient beings do to explore their environment that is not some form of pattern matching.

    Yeah. One of the first examples proposed by FMM was 159265358 or something like that. He said it appeared random, but if you know what to look for you can notice it’s a portion of pi’s digits. Then again, just about any other 9-digit sequence is also somewhere in pi’s digits:

    http://www.subidiom.com/pi/pi.asp

    Whatever string FMM produces, there is always a potential Enigma key table that could decode it into a string like “ID is pure nonsense”

  5. petrushka,

    I think it’s safe to say that at some time within the next 50 years, humans will lose out, except in the realm of insider trading.

    The only way humans will compete is by knowing something the programs don’t know. About upcoming court rulings or regulations or something.

    There are at least two companies that are building systems to handle that kind of information as well. The one I’m most familiar with has been taking in raw news feeds from multiple sources for almost 10 years now to train their models. Right now I think they mostly predict market sentiment with respect to particular companies and verticals, but I’m sure it’s going to get better.

  6. “I want to thank OMagain in advance for doing the heavy lifting required to make my little tool/game shareable. His efforts will not only speed the process up immeasurably, they will also lend some much-needed bipartisanship to this endeavor as we move forward. When he is done, I believe we can begin to attempt to use the game/tool to do some real testable science in the area of ID. I’m sure all will agree this will be quite an accomplishment.”

    Doesn’t it occur to you to ask why this little accomplishment was not accomplished earlier, preferably by the relevant parties, such as Dembski or Behe? Isn’t this little accomplishment a necessary prerequisite for ID to be called a scientific theory or hypothesis in the first place?

    And why are your definitions incoherent? For example,
    1. Random – exhibiting no discernible pattern; alternatively, a numeric string corresponding to the decimal expansion of an irrational number that is unknown to the observer evaluating it

    So, absolutely anything can be called “random” by putting a monkey into the role of the observer. Since all the other definitions depend on the distinction between random and nonrandom, this little oversight thoroughly undermines your entire theory.

  7. “The Observer” is a thoroughly ridiculous concept. It could only be made non-ridiculous by operationalizing it. Something FMM has not done.

  8. Erik: So, absolutely anything can be called “random” by putting a monkey into the role of the observer.

    You are correct. Random is in the eye of the beholder; it’s not in the thing itself.
    I believe “true randomness” does not exist; there is only apparent randomness. This is a philosophical position.

    There is no way as far as I can tell to scientifically prove that true randomness exists.

    What we need is a definition that works regardless of whether there is true randomness, sort of like Darwin’s “random with respect to fitness”. I think that tying the randomness to the observer’s knowledge does this. Do you have a better idea?

    peace

  9. fifthmonarchyman: I think that tying the randomness to the observer’s knowledge does this. Do you have a better idea?

    This is why you need an operational definition.

    The issue of detecting randomness is one of the deepest problems in applied mathematics. It has implications for cryptography, which is a multi-billion dollar industry. It has other implications in science, but money spurs research and attracts bright people.

    From your discussion of number bases it is apparent you have no procedure for distinguishing an original string from a modified string, unless you have some knowledge of what the string represents, and what the expected patterns are.

    Distinguishing one snippet of a random string from another snippet, or distinguishing a modified random string from the original is conceptually a lost cause.

    There is no snippet that cannot be a snippet of a random string.

    When you deal with financial graphs, you are dealing with data which involves large sums of money, and which provides motivation for fraud. Election results are also likely targets for fraud. Bill Dembski did an analysis of some election results and argued that they showed a systematic bias.

    We can suspect deliberate bias in such data, not because the bias is objectively necessary, but because we combine the knowledge of an unexpected pattern with the knowledge of human motivation. There is nothing in the data by itself that says design.

  10. faded_Glory: Only Jenny and her friends would recognise 8675309 as her number as displayed in Base10 numerals.

    Right, to the rest of the world the string would probably appear to be random.
    This is the Y axis: the more context the observer shares with the designer, the higher up the axis you are.

    The lowest point on the axis is probably just seven characters with no repeats.

    An American with a landline telephone might recognize the string as a possible phone number.

    Someone with the 867 area code might take the string to be a local number.

    etc., etc.

    The more context you share, the higher up the axis you are.

    peace

    faded_Glory: Jenny’s friends will say ‘artifact’ and you will claim that as a valid positive result, whereas everybody else will say ‘not an artifact’ and you will dismiss those as unavoidable but unimportant false negative outcomes! Way to foolproof your experiment!

    In a sense this is correct. Design detection is a subjective thing. There is no way to eliminate false negatives.

    However, the point of the game is to minimize this by removing the string from its context and putting all observers on the same footing.

    peace

  11. faded_Glory: To be charitable, I don’t think your game is meant to do that, but in that case using a telephone number to illustrate it is really way off the mark. Telephone numbers simply do not contain patterns that indicate they are designed.

    Actually, I sort of agree.

    Jenny’s number will not pass the test of the game. It will fail when it comes to the “complexity” string. Recall that with the complexity string we snip the original string at a random spot and reassemble it so that it begins in a different place, then see if we can distinguish the new string from the original. If we can, then the original string is not complex enough to infer design.

    It goes like this:

    8675309 becomes
    5309867.

    Can you distinguish between the two strings? Of course you can; anybody can. So even though Jenny’s number is designed, you can’t infer design based on the game.

    The same goes for the digits of pi: anyone can distinguish

    314159 and 415931

    However, we might have a hard time distinguishing

    3141592653589793238462643383279502884197169399375105820974944592307816406286

    and

    6433832795028841971693993751058209749445923078164062863141592653589793238462

    when the strings are rolling past in moving line graphs and your time is limited

    Therefore the first 100 digits of pi make it past the complexity threshold.
    Do you see how we try to minimize subjectivity with the game?
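
    (In code, the cut-and-reassemble step is just a rotation at a random index; a minimal sketch of my own, not the game itself:)

    ```python
    import random

    def complexity_string(s: str) -> str:
        """Cut s at a random spot and reassemble so it begins in a different place."""
        cut = random.randrange(1, len(s))  # cut = 0 would return s unchanged
        return s[cut:] + s[:cut]

    print(complexity_string("8675309"))  # e.g. '5309867'
    ```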

    faded_Glory: You need to come up with a better example that doesn’t rely on the observers having a priori knowledge about what it is they are looking at

    I predict that once you play the game a few times you will have a very good idea about the sorts of strings I’m talking about.

    peace

  12. petrushka: it is apparent you have no procedure for distinguishing an original string from a modified string, unless you have some knowledge of what the string represents, and what the expected patterns are.

    If you aren’t afraid to actually play the game, you will quickly find out that with many strings you will be able to tell the original from a fake.

    There is no “procedure” for this, as it is a noncomputable function, but you will be able to distinguish nonetheless.

    peace

  13. petrushka: distinguishing a modified random string from the original is conceptually a lost cause.

    Yet I can do it with many strings. Imagine that.

    peace

  14. petrushka: If the paper includes a procedure for generating strings that qualify, it will be no problem for you to work one out.

    I guess that means you did not read it.

    That is a shame.

    So far you have told us that you are unwilling to help with this endeavor and that you are unwilling to even look at the papers that provide the inspiration and rationale for the enterprise.

    Remind me again, what good are you? 😉

    peace

  15. fifthmonarchyman,

    However, we might have a hard time distinguishing

    3141592653589793238462643383279502884197169399375105820974944592307816406286

    and

    6433832795028841971693993751058209749445923078164062863141592653589793238462

    when the strings are rolling past in moving line graphs and your time is limited

    I can distinguish them by the first digit.

    Therefore the first 100 digits of pi make it past the complexity threshold.
    Do you see how we try to minimize subjectivity with the game?

    I don’t understand what you mean by complexity threshold or how you are minimizing subjectivity, no.

    faded_Glory: You need to come up with a better example that doesn’t rely on the observers having a priori knowledge about what it is they are looking at

    I predict that once you play the game a few times you will have a very good idea about the sorts of strings I’m talking about.

    Rather than waiting for others to demonstrate your claims, why don’t you just show us a few examples from your own experience and explain how they support whatever it is you think they support? No need to wait for OMagain and it would eliminate a lot of unanswered questions.

  16. fifthmonarchyman:

    This is the Y axis: the more context the observer shares with the designer, the higher up the axis you are.

    Y axis of what? What are you talking about? And what is on the X axis?

    In a sense this is correct. Design detection is a subjective thing. There is no way to eliminate false negatives.

    The way you illustrate it with the telephone number, the game is a fraud. Occasional correct guesses are hailed as confirmation of your hypothesis and regular incorrect guesses are handwaved away as unimportant false negatives. The game is rigged.

    However, the point of the game is to minimize this by removing the string from its context and putting all observers on the same footing.

    Well then, show us some strings that you are actually using, and tell us where they come from and what they represent, so we can at least get to the point instead of being continuously sidetracked into irrelevancies like ‘Jenny’s number’.

    I presume you have some such strings?

    fG

  17. faded_Glory: I presume you have some such strings?

    Here is one

    http://theskepticalzone.com/wp/fmm-design-tool-post-1/#comment-97847

    I did not run this string through the “complexity” test because that test did not exist when I posted the string. I’m confident that it will pass but I haven’t run the actual experiment.

    I’ve run all the other tests, and this comes out as designed.

    1) I can distinguish the real string from a randomized copy
    2) I can distinguish the real string from a model that is close but not identical
    3) I can’t distinguish the real string from a manual copy that has had post processing tweaks

    faded_Glory: Occasional correct guesses are hailed as confirmation of your hypothesis and regular incorrect guesses are handwaved away as unimportant false negatives. The game is rigged.

    FG, you need to understand that as a Calvinist I believe that everything is designed.

    There are no true negatives; there is only “confirmed design” and “undetermined”.

    Peace

    PS

    Thank you again for the interaction.
    A big part of what I’m trying to accomplish is to see if folks with mutually exclusive worldviews can communicate.

    It’s good to see you making the effort.

  18. petrushka: By distinguish, I presume you mean you know which is the real and which is random.

    Apparently you did not read the “Is It Real, or Is It Randomized?” paper either.

    Why not spend 5 minutes and do a little research, and then let me know if you still have questions?

    peace

  19. Patrick: I can distinguish them by the first digit.

    When you played the Financial Market Turing Test game, did you know what the first digit was?

    Of course not.

    When it comes to moving line graphs, all you see is the overall pattern; you have no idea where a string starts or ends.

    Patrick: I may have missed your response while catching up on two days of comments, but have you confirmed that the machine learning process I described earlier meets the criteria to disprove your claims with respect to the Finance Game paper?

    Here is the deal, Patrick.

    We agreed as to what was required months ago, when you said you could easily cook something up to distinguish between real and random strings.

    Now apparently you want to revisit all that for some reason.

    If I did not know better, I would say you are looking to wear me down with qualifications so that I will agree to something less than software that can distinguish between real and random data.

    That is what I’m looking for: software that will do as well as I do when it comes to distinguishing between strings. Simple, really.

    But my recent experience with you tells me that sometimes simple communication eludes us.

    Now we can, if you like, once again go over all the details and make sure, again, that everyone is clear as to what you are trying to do, and that all the i’s are dotted and all the t’s crossed, just like we did months ago.

    But I have some other stuff to attend to right now so you will have to wait a bit.

    peace

  20. petrushka:

    By distinguish, I presume you mean you know which is the real and which is random.

    fifth:

    Apparently you did not read the “Is It Real, or Is It Randomized?” paper either.

    Why not spend 5 minutes and do a little research, and then let me know if you still have questions?

    Instead of being an ass, why not spend one minute and answer his question, fifth?

    Your statement is ambiguous:

    I can distinguish the real string from a randomized copy

    That could mean a) that you can see a difference between the two, or b) that you can tell which one is “real” and which is randomized.

    Petrushka’s question is a good one, and he is not obligated to assume that your “tool” is identical to the Financial Turing Test in that regard.

  21. fifthmonarchyman: Apparently you did not read the “Is It Real, or Is It Randomized?” paper either.
    Why not spend 5 minutes and do a little research, and then let me know if you still have questions?

    I asked the question because it makes no sense to say you can tell an original string from a randomized version.

    I am investing my time trying to understand what you are trying to do. If you will not publish the rules and your expectations, I see no reason why anyone should cooperate.

    I am not asking for anything more than anyone would get with a new game. Rules of how to play. Expectations.

    I am not interested at this time in your philosophical musings. Time for that later. Let us know how to play the game.

  22. fifthmonarchyman,

    I may have missed your response while catching up on two days of comments, but have you confirmed that the machine learning process I described earlier meets the criteria to disprove your claims with respect to the Finance Game paper?

    Here is the deal, Patrick.

    We agreed as to what was required months ago, when you said you could easily cook something up to distinguish between real and random strings.

    Now apparently you want to revisit all that for some reason.

    The reason is that I’ve had more time to observe your behavior here. You have demonstrated an inability to provide operational definitions for your terms, a refusal to provide detailed examples of you playing your game, and irrational clinging to the conclusions of a paper after one of the supporting claims has been conclusively demonstrated to be false. I’m not going to go to the effort of disproving another of your claims without ensuring that you can’t move the goalposts after the work has been done.

    Please confirm that the machine learning process I described meets the criteria to disprove your claims with respect to the Finance Game paper.

  23. fifth:

    petrushka: distinguishing a modified random string from the original is conceptually a lost cause.

    Yet I can do it with many strings, Imagine that

    Nice quotemine, plus I see that petrushka’s point went right over your head. Here’s the full quote:

    Distinguishing one snippet of a random string from another snippet, or distinguishing a modified random string from the original is conceptually a lost cause.

    There is no snippet that cannot be a snippet of a random string.

    Ponder the implications, fifth.

  24. I think, like the Dembski analysis of election results, if you know something about the source of the data and the possible motives for altering or biasing it, you can use a Bayesian analysis to detect tampering.

    I have read that some of Newton’s data is too clean to be actual observational data. Perhaps he removed some bad observations to make it better fit his formula. It is not unheard of to remove outliers from datasets. Of course, one is expected to mention this when publishing.

  25. fifth,

    You still haven’t found a solution for the resolution and representation issues. Not only that, your procedure as presented is hopelessly ambiguous, imprecise, and unreliable.

    You write:

    In order to actually infer design the observer needs to be unable to distinguish the real string from the “manual” and “complexity” strings but able to distinguish when it comes to the “model” and “random” strings.

    You describe the “complexity” string thus:

    2) a “complexity” string that is created by choosing a random spot to cut the original sequence and reassembling it, so that you have a new string that is identical to the original except that it begins in a different spot

    Yet it’s easy to think of strings for which choosing one random spot to cut and reassemble would create an obvious discontinuity, while choosing a different random spot would not. In your methodology, that can make the difference between “detecting” design or not.

    Whether you “detect” design is supposed to depend on the pattern(s) you perceive in the string itself, not on the random number you happen to pick when generating your “complexity” string.

    Of the “manual” string, you say:

    As a bonus we might include one or more “manual” copies, in which we take a “model” string and do some post-processing smoothing

    How many copies? As many as the experimenter feels like? How much “smoothing” is allowed? About yay much? And why do the smoothing manually, instead of doing something more objective?

    And what is the smoothing supposed to accomplish? I can smooth any string into a flat line, in which case I’ll be able to distinguish the “manual” string from the “real” one (unless the latter is also a flat line).

    That means any “real” string can be rejected as undesigned unless it is a flat line. But a flat line will also be rejected, since flat lines aren’t distinguishable from randomly reordered versions of themselves. If you allow unlimited smoothing, then your “tool” can get swamped with false negatives.

    But if you choose to limit the amount of smoothing, then you need to find an objective way to determine the right amount and an objective way to enforce the limit.
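
    (To see why unlimited smoothing is fatal, a minimal sketch; the three-point moving average is an arbitrary choice of mine, and repeating it flattens any series toward its mean:)

    ```python
    def smooth(xs, passes=1):
        """Repeatedly replace each value with the mean of its 3-point neighborhood."""
        for _ in range(passes):
            xs = [sum(xs[max(0, i - 1):i + 2]) / len(xs[max(0, i - 1):i + 2])
                  for i in range(len(xs))]
        return xs  # as passes grows, the output approaches a flat line
    ```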

    Of the “model” string, you write:

    4) a “model” string that is created by any algorithmic process we choose (I usually use an EA). What is important is that it be close but not identical to the original, and that the algorithm not target the specific digits in the original string

    How close is “close but not identical”? How do you theoretically determine the right amount of closeness?

    How do you decide whether an algorithm “targets” the specific digits in the original string? I don’t think it’s as obvious as you’re assuming.

    Then we will load the real string and a fake into the game and start the fun

    It isn’t going to be very much fun for you if you don’t address these many issues. They’re hanging over your head like the sword of Damocles.

  26. Patrick: In the referenced paper, the authors had six sets of data, each based on different financial instruments and time periods.

    No, the authors had eight sets of data:

    the NASDAQ Composite Index, the Russell 2000 Index, the US Dollar Index, Gold (spot price), the Dow Jones Corporate Bond Price Index, the Dow Jones Industrial Average, the Canada/US Dollar Foreign Exchange Rate, and the S&P GSCI Corn Index (spot price).

    Patrick: From each time series they created a new time series by rearranging the deltas between each point.

    Can you give me the exact quote from the paper on this one?

    Patrick: The results reported were that participants were able to identify the real time series approximately 73% of the time.

    It’s true that there was something like 73% overall accuracy, but that figure includes the initial runs before the subjects learn the pattern.

    The point of the game was that with feedback, accuracy increases to the point that after a few repeats it approaches 100%.

    In order for you to demonstrate parity with humans your software needs to show a similar improvement curve.

    Patrick: So, if I create similar data sets for a few financial instruments and train a model to have 73% or greater accuracy on test data that is not part of the training data (but that is, of course, for the same instrument), that would meet your challenge?

    No, you need to demonstrate the same improvement curve as humans generally, not just over a few data sets.

    This is because you could, if you wanted, cherry-pick sets that are similar, or you could simply brute-force your way with a limited number of data sets. What we are looking for is software that can distinguish between real and random strings as well as humans do.

    Again, we covered all this months ago, and you said such a thing would be no problem to cook up in a couple of weeks.

    What has changed?

    peace

  27. Patrick: The reason is that I’ve had more time to observe your behavior here.

    As have I yours

    You seem to be unable or unwilling to understand pretty simple concepts in papers if they don’t agree with your expectations, as witnessed by your misunderstanding of the concept of lossless information integration.

    You also seem to be pretty quick to try to change the rules of the game when things don’t go as you wanted them to, to the point of encouraging others not to participate in the process of improving the tool.

    I’m just glad that not everyone on your side is so inclined.

    peace

  28. keiths: Yet it’s easy to think of strings for which choosing one random spot to cut and reassemble would create an obvious discontinuity, while choosing a different random spot would not. In your methodology, that can make the difference between “detecting” design or not.

    Examples please?

    Once again I don’t think you understand what happens in the game. It’s possible I might be mistaken, but I doubt it.

    The strings loop, so the individual numbers stay in the same order; the only thing that changes when you cut one string and reassemble it is that you can’t compare the strings data point to data point but must instead look at the pattern.

    peace

  29. keiths: You still haven’t found a solution for the resolution and representation issues.

    These are not problems at all, so I’m not looking for solutions.

    You seem to be under the impression that my tool claims to be able to completely rule out design in certain cases. That is not its purpose.

    A negative throat culture does not mean that your throat is bacteria free. It only means that the type and quantity of bacteria did not meet the threshold that would trigger a positive result in that particular test.

    You can’t prove a negative.

    peace

  30. petrushka: I asked the question because it makes no sense to say you can tell an original string from a randomized version.

    If you read the paper or played the game, it would make sense.

    petrushka: If you will not publish the rules and your expectations, I see no reason why anyone should cooperate.

    The shareable game does not exist yet. Once it does, we will put together rules for the game. In the meantime I’d like to work on definitions.

    A few folks have offered helpful feedback in this regard, but apparently that is not something you want to do.

    I’m cool with that; just don’t expect me to spend a lot of time helping you understand if you are unwilling to reciprocate or even read the papers in question.

    Peace

  31. fifthmonarchyman: Here is one

    http://theskepticalzone.com/wp/fmm-design-tool-post-1/#comment-97847

    I did not run this string through the “complexity” test because that test did not exist when I posted the string. I’m confident that it will pass, but I haven’t run the actual experiment.

    I’ve run all the other tests, and this comes out as designed.

    1) I can distinguish the real string from a randomized copy
    2) I can distinguish the real string from a model that is close but not identical
    3) I can’t distinguish the real string from a manual copy that has had post processing tweaks

    I ran your string through a simple time series analysis program (caveat: I am no professional statistician) and this is what I found:

    The string has 213 data values. Apart from the initial ramp-up there is no clear trend. The data is fairly smooth and contains plateaus that are of longer duration than the intermittent troughs.

    When I construct a correlogram with a lag of 53 (which is just the default in the program I use), the autocorrelation at lag 1 is a high 0.91. It then smoothly tapers off, reaching 0.5 at a lag of 6 and 0 at a lag of 20.

    This means that the predictability at short distances is very good (0.91). In other words, the data is smooth at short distances and more and more ‘random’ at larger ones, up to 80.

    Your data is also auto-regressive in that from lags 20 to 50 there is a modest ‘wave’ of negative autocorrelation peaking at -0.25 at a lag of approx 30.

    A large proportion of the autocorrelations fall outside the upper and lower bounds.

    I’d say that your data is well correlated and it should have a high predictability with an appropriate model.

    Your randomised string, unsurprisingly, has a very low autocorrelation at all lags and will be extremely easy to tell apart from the original.

    Your model gets back to the original to some degree, and the correlogram has an overall shape that is not too far removed from it. In other words, your EA does a reasonable job in recreating the original. However, the autocorrelation at lag 1 is only 0.74 rather than 0.91 as in the original. It stays consistently below the original autocorrelation until lag 20 where it too reaches a value of 0. More of the autocorrelations fall outside the upper and lower bounds.

    This is all a complicated way of saying that your model contains more short-distance random noise, visible as ‘spikiness’. As you already said, with some post-processing you will be able to get much closer to the original, to the point where you may struggle to tell them apart.

    What you basically are doing is screwing up a dataset by introducing a lot of random noise that you then kind of filter out a bit. The end product will be quite easily distinguishable from a fairly smooth original (such as this one), but if you have a spiky original to begin with, the extra noise you introduce isn’t going to make much of a difference.

    Natural data comes in all sorts, smooth, noisy, whatever, depending on what it is, how it originates and how you measure it. I see no reason to label smooth datasets as ‘designed’ and more noisy ones as ‘undetermined’ unless you throw in an awful lot of other assumptions.
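
    (For anyone who wants to reproduce this kind of correlogram, a minimal numpy sketch; the lag window and the white-noise bounds are conventional choices of mine, not part of fifth’s game:)

    ```python
    import numpy as np

    def correlogram(x, max_lag=53):
        """Sample autocorrelation r_k of a 1-D series for lags 1..max_lag."""
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        var = np.dot(x, x)
        return np.array([np.dot(x[:-k], x[k:]) / var for k in range(1, max_lag + 1)])

    # Approximate 95% significance bounds for white noise: +/- 1.96 / sqrt(n)
    # bounds = 1.96 / np.sqrt(len(series))
    ```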

    FG, you need to understand that as a Calvinist I believe that everything is designed.

    There are no true negatives; there is only “confirmed design” and “undetermined”.

    Well, as I said before, this is why your game is rigged. Something is designed except when we can’t tell, and even then it is designed. It is just a fancy way to confirm your biases.

    fG

  32. fifthmonarchyman,

    In the referenced paper, the authors had six sets of data, each based on different financial instruments and time periods.

    No, the authors had eight sets of data

    Sorry, my typo. Eight sets.

    From each time series they created a new time series by rearranging the deltas between each point.

    Can you give me the exact quote from the paper on this one?

    Haven’t you read the paper? 😉

    See section 2, Experiment Design, on page 4:

    “To test the null hypothesis H that human subjects cannot distinguish between actual and randomly generated price series, we begin with a time series of actual historical prices {p0, p1, p2, …, pT} and compute the returns or price differences {rt},

    rt ≡ pt − pt−1 (1)

    from which we construct a randomly generated price series {p∗0, p∗1, …, p∗T} by cumulating randomly permuted returns:

    p∗t ≡ p∗t−1 + rπ(t) , p∗0 ≡ p0 , π : {1, …, T} → {1, …, T} (2)

    where π is a uniform random permutation of the set of time indexes {1, …, T}. A random permutation of the actual returns does not alter the marginal distribution of the returns, but it does destroy the time-series structure of the original series, including any temporal patterns contained in the data. Therefore, the randomly permuted returns will have the same mean, standard deviation, and moments of higher order as the actual return series, but will not contain any time-series patterns that can be used for prediction.”
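
    (In code, that construction amounts to shuffling the price differences and re-cumulating them; a minimal numpy sketch, not the authors’ implementation:)

    ```python
    import numpy as np

    def permuted_series(prices, rng=None):
        """Build p* by cumulating uniformly permuted returns, per eqs. (1)-(2)."""
        rng = rng or np.random.default_rng()
        prices = np.asarray(prices, dtype=float)
        returns = np.diff(prices)            # r_t = p_t - p_{t-1}
        shuffled = rng.permutation(returns)  # uniform permutation of the returns
        return np.concatenate([[prices[0]], prices[0] + np.cumsum(shuffled)])
    ```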

    The results reported were that participants were able to identify the real time series approximately 73% of the time.

    It’s true that there was something like 73% overall accuracy, but that figure includes the initial runs before the subjects learn the pattern.

    The point of the game was that with feedback, accuracy increases to the point that after a few repeats it approaches 100%.

    This is not supported by the paper as far as I can see. Where do you get this number?

    In order for you to demonstrate parity with humans your software needs to show a similar improvement curve.

    This is why I want to be clear on your criteria. Nothing you’ve said before discussed improvement curves, only the ability of a software system to perform as well at the task as humans.

    As it happens, most machine learning systems will show an improvement curve because they start out untrained and therefore have poor performance. That performance improves with training. Does that meet your new criteria? If not, why not?

    So, if I create similar data sets for a few financial instruments and train a model to have 73% or greater accuracy on test data that is not part of the training data (but that is, of course, for the same instrument), that would meet your challenge?

    No, you need to demonstrate the same improvement curve as humans generally, not just over a few data sets.

    Again, this was not part of your original criteria.

    In the experiment described in the paper, subjects were presented with multiple time series from a single data set and learned to distinguish between real and permuted series. Multiple data sets were used, but the training and testing was always restricted to one.

    I can create multiple data sets, including most of the financial instruments referenced in the paper (they don’t make their actual data available, unfortunately). Here’s the process I would follow in the case of the DJIA closing prices (obviously the same can be done for any other instrument):

    1) Download all historical closing prices for the DJIA.

    2) Randomly, with a uniform distribution, select a few thousand start dates.

    3) From each start date load the next 250 closing prices and save these time series.

    4) For each time series generate a permuted version following the algorithm described in the paper.

    5) Divide the set of time series pairs into training, cross validation, and test sets.

    6) Train a machine learning model using the training and cross validation sets by presenting pairs of real and permuted time series.

    7) Test the trained model on the test set.

    If this resulted in 73% or greater accuracy on the test set, would that disprove your claim? If not, why not?
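
    (A skeletal version of steps 2 through 7, assuming the step-1 closes are already in a local one-column file, here called djia_closes.csv; the features and the logistic-regression classifier are stand-ins of mine, and classifying single windows rather than presenting pairs is a simplification of the paper’s protocol:)

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    closes = np.loadtxt("djia_closes.csv")  # step 1: assumed already downloaded

    def permute(series):  # step 4: permuted version, per the paper's eq. (2)
        r = np.diff(series)
        return np.concatenate([[series[0]], series[0] + np.cumsum(rng.permutation(r))])

    def features(series):  # stand-in summary features a model can learn from
        r = np.diff(series) / series[:-1]
        lag1 = np.corrcoef(r[:-1], r[1:])[0, 1]  # lag-1 autocorrelation of returns
        return [lag1, np.std(r), np.abs(np.diff(r)).mean()]

    starts = rng.integers(0, len(closes) - 250, size=2000)  # step 2
    windows = [closes[s:s + 250] for s in starts]           # step 3

    X = np.array([features(w) for w in windows] +
                 [features(permute(w)) for w in windows])
    y = np.array([1] * len(windows) + [0] * len(windows))   # 1 = real, 0 = permuted

    X_train, X_test, y_train, y_test = train_test_split(    # step 5
        X, y, test_size=0.25, random_state=0)
    clf = LogisticRegression().fit(X_train, y_train)        # step 6
    print("test accuracy:", clf.score(X_test, y_test))      # step 7
    ```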

  33. fifthmonarchyman,

    The reason is that I’ve had more time to observe your behavior here.

    As have I yours

    You seem to be unable or unwilling to understand pretty simple concepts in papers if they don’t agree with your expectations, as witnessed by your misunderstanding of the concept of lossless information integration.

    The misunderstanding is not mine. I have supported everything I’ve said about both papers with specific quotes from the papers themselves. You have never quoted the papers because nothing in them supports your claims.

    Speaking of integrated information, you still haven’t explained why a trained neural network doesn’t have it. I have shown that one does, with reference to the actual paper you’re touting.

    You also seem to be pretty quick to try to change the rules of the game when things don’t go as you wanted them to, to the point of encouraging others not to participate in the process of improving the tool.

    It’s your game and your vague and ambiguous rules. I’m just trying to understand what you are really claiming. You’re not making it easy with your refusal to provide operational definitions and examples.

  34. fifthmonarchyman,

    petrushka: If you will not publish the rules and your expectations, I see no reason why anyone should cooperate.

    The shareable game does not exist yet. Once it does, we will put together rules for the game.

    That makes no sense. You claim to have built and run this game in Excel. The rules are yours. Just present them so everyone else can understand what you’re claiming.

    In the meantime I’d like to work on definitions.

    Those should be the first thing you presented, but as has been pointed out repeatedly, your definitions are not operational. In fact, they’re so vague as to be practically useless.

    If you want to be taken seriously, provide some examples of how you have played the game and a detailed description of what you think the results mean. You have played it, right?

    By the way, when I construct a correlogram of your original data with 106 lags, there appears to be a periodicity at lags of 30 and multiples of that. Is there a seasonal fluctuation, or something like it, in your data?

    fG

  36. I find it an interesting world where requests for game rules and examples are considered hostile and unproductive.

    I am confused by fifth’s use of the word “string”. He seems to mean an integer dataset in delimited format. The example I’ve seen is line-feed delimited.

    Another confusing aspect of the game is the unanswered question of whether the elements of the dataset are supposed to have some underlying pattern or correlation.

  37. fifth,

    These are not problems at all so I’m not looking for solutions.

    Not problems? You talked and talked about sonnets. They were your canonical example of design. Yet your “tool” utterly failed to recognize an actual sonnet as designed.

    Why? Because of the representation issue.

    Not a problem? You’re in denial.

  38. petrushka,

    I find it an interesting world where requests for game rules and examples are considered hostile and unproductive.

    You’re asking fifth to do science. He gets very grumpy when people ask him to do science.

  39. fifth,

    You’re probably feeling overwhelmed by the number of flaws that commenters have identified in your approach, so perhaps it will help to focus on one at a time.

    Let’s look at the smoothing issue. Earlier, I wrote:

    Of the “manual” string, you say:

    As a bonus we might include one or more “manual” copies, in which we take a “model” string and do some post-processing smoothing

    How many copies? As many as the experimenter feels like? How much “smoothing” is allowed? About yay much? And why do the smoothing manually, instead of doing something more objective?

    And what is the smoothing supposed to accomplish? I can smooth any string into a flat line, in which case I’ll be able to distinguish the “manual” string from the “real” one (unless the latter is also a flat line).

    That means any “real” string can be rejected as undesigned unless it is a flat line. But a flat line will also be rejected, since flat lines aren’t distinguishable from randomly reordered versions of themselves. If you allow unlimited smoothing, then your “tool” can get swamped with false negatives.

    But if you choose to limit the amount of smoothing, then you need to find an objective way to determine the right amount and an objective way to enforce the limit.

    How would you answer those questions?

  40. petrushka,

    I find it an interesting world where requests for game rules and examples are considered hostile and unproductive.

    Indeed. I observed similar behavior at UD when I used to visit there; asking for clarification and details was seen as an attack.

    This will sound snarky, but I seriously suspect the root of this issue is the authoritarian nature of the fundamentalist and evangelical churches. Questioning is dangerous and discouraged. That training affects believers’ behavior in secular environments.

  41. Patrick:
    This will sound snarky, but I seriously suspect the root of this issue is the authoritarian nature of the fundamentalist and evangelical churches. Questioning is dangerous and discouraged.

    I suggest another component is the intuitive realization that digging down into the details is guaranteed to undermine the claims. Beyond some minimal amount, clarification is always the enemy of creationist doctrine. There is a good reason why “pathetic levels of detail” (such as operational definitions susceptible to test) is regarded with hostility.

  42. Flint: I suggest another component is the intuitive realization that digging down into the details is guaranteed to undermine the claims.

    I don’t know how far this game will go. I’m kind of hoping for an ID equivalent of WEASEL. Something that will be iconic and be fiddled with for a long time.

    If such a thing happens, I make a bold prediction.

    In the event that FMM has developed an interesting pattern recognizer (let’s say, arguendo, that he has), I predict it will be possible to write a WEASEL program using FMM’s game as the “target”, and that it will be relatively simple to evolve datasets that fool the game. Whatever that may mean.

    In simpler terms, whatever pattern FMM is detecting will be evolvable, and by very simple means.

    I say this with some confidence, because Wagner and his team have already done something like this using logic circuits.

  43. I’m going to be more specific.

    If I can assume that FMM’s game produces a score of some kind — a scale of probability or whatever — it will be trivially easy to evolve meaningless datasets that score well, just by using the score as the selector.

    If there is a problem with this it will be the performance of the game. How quickly will the game run?
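
    (To make that concrete: a toy hill-climber, assuming a hypothetical numeric score() exposed by the game; the dataset shape and mutation scheme are arbitrary choices:)

    ```python
    import random

    def evolve(score, length=100, generations=10_000):
        """Evolve an integer dataset that climbs an arbitrary score() function."""
        best = [random.randint(0, 9) for _ in range(length)]
        for _ in range(generations):
            child = list(best)
            child[random.randrange(length)] = random.randint(0, 9)  # point mutation
            if score(child) >= score(best):  # the score itself is the selector
                best = child
        return best
    ```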

  44. petrushka,

    If I can assume that FMM’s game produces a score of some kind — a scale of probability or whatever — it will be trivially easy to evolve meaningless datasets that score well, just by using the score as the selector.

    As he’s described it, there is no score. It’s either “design inferred” or “design not inferred”:

    In order to actually infer design the observer needs to be unable to distinguish the real string from the “manual” and “complexity” strings but able to distinguish when it comes to the “model” and “random” strings.

    If that’s an accurate description, then there are no climbable slopes in the fitness landscape.

    petrushka:

    If there is a problem with this it will be the performance of the game. How quickly will the game run?

    The human “observer” will be the bottleneck.

  45. keiths: As he’s described it, there is no score. It’s either “design inferred” or “design not inferred”:

    That would be magic. There has to be a metric somewhere in the running of the code.

    Presumably we will be able to see the javascript and perhaps convert it to something that runs fast.

  46. I’m still trying to get a thumbs up or down on how the “strings” are to be formatted.

    To make it easier to import something like a sonnet, I propose the following convention.

    For each unique word in the sonnet, assign a number.

    For example:

    Shall = 1
    I = 2
    compare = 3
    thee = 4
    to = 5
    a = 6
    summer’s = 7
    day = 8
    ? = 9

    so the first line of the sonnet, converted to an integer dataset, would be

    1,2,3,4,5,6,7,8,9

    Anyway, this convention could be used for any kind of object that has a reasonably small list of discrete items or values. Images, for example, or genomes.

    We could start the project without bickering about base conversion or offsets or whatever.
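
    (A minimal sketch of the convention; the whitespace tokenization is a naive choice of mine:)

    ```python
    def encode(tokens):
        """Assign each unique token the next integer, in order of first appearance."""
        ids = {}
        return [ids.setdefault(tok, len(ids) + 1) for tok in tokens]

    line = "Shall I compare thee to a summer's day ?".split()
    print(encode(line))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
    ```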

  47. Patrick: This will sound snarky, but I seriously suspect the root of this issue is the authoritarian nature of the fundamentalist and evangelical churches. Questioning is dangerous and discouraged. That training affects believers’ behavior in secular environments

    I agree, it sounds snarky.

  48. newton: I agree, it sounds snarky.

    I would say that people in general are reluctant to give opponents the high ground. I attribute no unusually pernicious motives. Just human nature.

    But if you announce you have a world changing idea and you refuse to let anyone see it, it looks a bit like a perpetual motion machine operating behind a curtain.
