Random observations of a very experienced software artist.

    Instr sensitivity training

    John McCann  November 30 2013 07:32:07 AM
    I have been parsing a rather large CSV file using LotusScript. The Instr function was used extensively. For example, to count the number of quotes in a string, the following code is used (k holds the compMethod value):

            lngPos = InStr(1, strData, strQuote,k)
            lngQuotes = 0
            Do While (lngPos > 0)
                    lngQuotes = lngQuotes + 1
                    lngPos = InStr(lngPos + 1, strData, strQuote,k)
            Loop



    The 4th parameter of the Instr function is compMethod, a number designating the comparison method.   I wondered what effect the compMethod would have.   For your reference, the options are:

    0        case-sensitive, pitch-sensitive
    1        case-insensitive, pitch-sensitive
    4        case-sensitive, pitch-insensitive
    5        case-insensitive, pitch-insensitive

    If you omit compMethod, the default comparison mode is the mode set by the Option Compare statement for the module. If there is none, the default is 0 - case-sensitive and pitch-sensitive.

    I took a line of data that I was handling. It is 1321 characters long. It contains 658 quotes (") to delimit the text strings for the 329 fields the record contains. Most of the fields are zero-length strings. The record contains only 355 characters of 'real' data. I ran the above code segment 1000 times on my Core i7 system using Domino 9.0.1. I received these results, in seconds:

    compMethod    Time (s)
    0             3.89
    1             29.25
    4             5.41
    5             30.17
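
    A minimal sketch of a timing harness for this kind of measurement, not necessarily the one used for the numbers above. It assumes the built-in Timer function (seconds since midnight) is precise enough for runs of a few seconds. The name TimeQuoteCount and its parameters are made up for illustration.

            Sub TimeQuoteCount(strData As String, strQuote As String, k As Integer, lngRuns As Long)
                    Dim sngStart As Single
                    Dim lngRun As Long
                    Dim lngPos As Long
                    Dim lngQuotes As Long

                    ' Timer counts seconds since midnight, so a run that crosses midnight gives a bogus result
                    sngStart = Timer
                    For lngRun = 1 To lngRuns
                            ' Same counting loop as above, repeated lngRuns times
                            lngQuotes = 0
                            lngPos = InStr(1, strData, strQuote, k)
                            Do While (lngPos > 0)
                                    lngQuotes = lngQuotes + 1
                                    lngPos = InStr(lngPos + 1, strData, strQuote, k)
                            Loop
                    Next lngRun
                    Print "compMethod " & k & ": " & Format$(Timer - sngStart, "0.00") & " s, " & lngQuotes & " quotes"
            End Sub

    Calling it once for each of 0, 1, 4 and 5 with the same strData and run count prints one line per compMethod. Printing the quote count as well is a cheap check that every mode found the same matches.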


    I was only searching for pairs of quotes and delimiters. If you are searching for lots of text strings and are trying to ignore case, it may be more efficient to lowercase your strings before using Instr, especially if you have repeated searches for the same pattern or against the same string. I suspect it will be highly data dependent.
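
    A rough sketch of the lowercase-first idea, which I have not benchmarked. It folds both strings once with LCase$ and then searches with compMethod 0. The function name CountNoCase is invented for this example, and LCase$ may not agree with Instr's own case-insensitive comparison for every character, so test it against your data.

            Function CountNoCase(strData As String, strPattern As String) As Long
                    Dim strLowData As String
                    Dim strLowPattern As String
                    Dim lngPos As Long

                    ' Nothing to search for; the function returns its default of 0
                    If Len(strPattern) = 0 Then Exit Function

                    ' Pay the case-folding cost once instead of on every Instr call
                    strLowData = LCase$(strData)
                    strLowPattern = LCase$(strPattern)

                    ' Cheap case-sensitive, pitch-sensitive search on the folded copies
                    lngPos = InStr(1, strLowData, strLowPattern, 0)
                    Do While (lngPos > 0)
                            CountNoCase = CountNoCase + 1
                            lngPos = InStr(lngPos + 1, strLowData, strLowPattern, 0)
                    Loop
            End Function

    If the same strData is searched for many different patterns, keep the lowercased copy around rather than folding it again on each call.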


    What if I just counted the number of quotes, one character at a time?

            For i = 1 To Len(strData)
                    If Mid$(strData,i,1) = strQuote Then
                            lngQuotes = lngQuotes + 1
                    End If
            Next i



    It turns out this was even faster. For my sample data, it took only 2.76 seconds to iterate 1000 times over this code segment. But would that be true for all data?

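    I constructed another test case. It was the same length, 1321 characters, but it contained only 3 fields, so 6 quotes. I had to increase the iterations 10-fold to get usable results. 10,000 iterations took 27.18 seconds for the raw count, which is very inefficient: at about 2.7 ms per pass it costs the same as before, because the loop always walks the whole string no matter how few quotes it holds. I got the following for Instr: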

    compMethod    Time (s)
    0             0.36
    1             3.18
    4             0.49
    5             3.26


    Without a doubt, this confirms my wife's argument for me to be more sensitive.