;+
; NAME:
;   TRANSREAD
;
; AUTHOR:
;   Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
;   craigm@lheamail.gsfc.nasa.gov
;
; PURPOSE:
;   Parse a tabular ASCII data file or string array.
;
; CALLING SEQUENCE:
;   TRANSREAD, UNIT, VARi [, FORMAT=FORMAT]  (first usage)
;      or
;   TRANSREAD, UNIT, VARi [, FORMAT=FORMAT], FILENAME=FILENAME  (second usage)
;      or
;   TRANSREAD, STRINGARRAY, VARi [, FORMAT=FORMAT]  (second usage)
;
; DESCRIPTION:
;   TRANSREAD parses an ASCII table into IDL variables, one variable
;   for each column in the table.  The tabular data is not limited to
;   numerical values, and can be processed with an IDL FORMAT
;   expression or with a delimeter character.
;
;   TRANSREAD behaves similarly to READF/READS in that it transfers
;   ASCII input data into IDL variables.  The difference is that
;   TRANSREAD reads more than one row in one pass, and returns data by
;   column.  In a sense, it forms the *transpose* of the typical
;   output from READF or READS (which returns data by row), hence the
;   name TRANSREAD.  [ TRANSREAD can parse up to 20 columns in its
;   current implementation, but that number can be easily increased. ]
;
;   TRANSREAD can optionally be provided with a FORMAT expression to
;   control the transfer of data.  The usage is the same as for
;   READ/READF/READS.  However, you may find that you need to slightly
;   modify your format statements to read properly.  In this
;   implementation, variables are intermediately parsed with READS,
;   which appears from my experimentation to require at least a
;   default length for transfers.
;
;   Hence, you should use:   ..., FORMAT='(D0.0,D0.0,I0)'       ; GOOD
;   instead of:              ..., FORMAT='(D,D,I)'          ; BAD
;
;   As with the standard IDL READ-style commands, you need to supply
;   initial values to your variables before calling TRANSREAD, which
;   are used to determine the type.  Then dimensions of the variable
;   are not important; TRANSREAD will grow the arrays to an
;   appropriate size to accomodate the input.  Lines from the input
;   which do not contain the correct number of columns or do not obey
;   the format statement are ignored.
;
;   TRANSREAD will also flexibly manage typical data files, which may
;   contain blank lines, lines with comments (see COMMENT keyword), or
;   incomplete lines.  These lines are ignored.  It can be programmed
;   to wait for a user-specified "trigger" phrase in the input before
;   beginning or ending processing, which can be useful if for example
;   the input table contains some header lines (see STARTCUE and
;   STOPCUE keywords).  [ The user can also pre-read these lines
;   before calling TRANSREAD. ]  Finally, the total number of lines
;   read can be controlled (see MAXLINES keyword).  TRANSREAD parses
;   until (a) the file ends, (b) the STOPCUE condition is met or (c)
;   the number of lines read reaches MAXLINES.
;
;   TRANSREAD has three possible usages.  In the first, the file must
;   already be open, and TRANSREAD begins reading at the current file
;   position.  In the second usage, a filename is given.  TRANSREAD
;   automatically opens the file, and reads tabular data from the
;   beginning of the file.  Normally the file is then closed, but this
;   can be prevented by using the NOCLOSE keyword.
;
;   In the third usage, a string array is passed instead of a file
;   unit.  Elements from the array are used one-by-one as if they were
;   read from the file.
;
;   Since TRANSREAD is not vectorized, and does a significant amount
;   of processing on a per-line basis, it is probably not optimal to
;   use on very large data files.
;
; INPUTS:
;
;   UNIT - in the first usage, UNIT is an open file unit which
;          contains ASCII tabular data to read.  UNIT must not be a
;          variable which could be mistaken for a string array.
;
;          In the second usage, when FILENAME is specified, then upon
;          return UNIT contains the file unit that TRANSREAD used for
;          reading.  Normally, the UNIT is closed before return, but
;          it can be kept open using the NOCLOSE keyword.  In that
;          case the unit should be closed with FREE_LUN.
;
;   STRINGARRAY - this is the third usage of TRANSREAD.  When a string
;                 array is passed, elements from the array are used as
;                 if they were lines from an input file.  The array
;                 must not be of a numeric type, so it cannot be
;                 mistaken for a file unit.  [ Of course, the string
;                 itself can contain ASCII numeric data. ]
;
; OUTPUTS:
;   VARi - List of named variables to receive columns from the table,
;          one variable for each column.  Upon output each variable
;          will be an array containing the same number of elements,
;          one for each row in the table.  If no rows were
;          successfully parsed, then the variable values are not
;          changed.  Use the COUNT output keyword to determine whether
;          any rows were parsed.
;
;          NOTE: Up to twenty columns may be parsed.  If more columns
;          are desired, then a simple modification must be made to the
;          IDL source code.  To do so, find the beginning of the
;          procdure definition, identified by the words, "pro
;          transread, ..."  and follow the instructions there.
;
; INPUT KEYWORD PARAMETERS:
;   FORMAT - an IDL format expression to be used to transfer *each*
;            row in the table.  If no format as given then the default
;            IDL transfer format is used, based on the types of the
;            input variables.  As mentioned in the description above,
;            a length should be assigned to each format code; a length
;            of zero can be used for numeric types.  Lines from the
;            input which do not contain the correct number of columns
;            or do not obey the format statement are ignored.
;
;   DELIM - A ASCII character string which separates (delimits) each
;           field in each row. This is commonly a comma or space. When
;           the DELIM keyword is used, the FORMAT string does not
;           require lengths for each variable. This allows data
;           entries in the text file to vary from line to line. For
;           example:  
;              TRANSREAD, UNIT, A,B,C, DELIM=',', FORMAT='(A,I,F)', FILENAME='file.csv' 
;           Notice that the format expression does not specify the
;           length of variables A, B, and C. They are separated by ','
;           on each line.  
;
;   COMMENT - A one-character string which designates a "comment" in
;             the input.  Input lines beginning with this character
;             (preceded by optional spaces) are ignored.  FAILCOUNT
;             does not increase.
;             DEFAULT: no comments are recognized.
;
;             NOTE: lines which do not match the format statement are
;             ignored.  Comments are likely to be ignored based on
;             this behavior, even without specifying the COMMENT
;             keyword; however the FAILCOUNT will increase.
;
;   MAXLINES - the maximum number of lines to be read from input.  The
;              count begins *after* any STARTCUE is satisfied (if any)
;              DEFAULT: no maximum is imposed.
;
;   SKIPLINES - the number of lines of input to skip before beginning
;               to parse the table.
;               DEFAULT: no lines are skipped.
;               NOTE: if STARTCUE is also given, then the line count
;               does not start until after the STARTCUE phrase has
;               been encountered.
;
;   STARTCUE - a unique string phrase that triggers the start of
;              parsing.  Lines up to and including the line containing
;              the cue are ignored.  Because each line is checked for
;              this starting cue, it should be unambiguous.
;              DEFAULT: parsing begins immediately.
;
;   STOPCUE - a unique string phrase that triggers the finishing of
;             parsing.  The line including the cue is ignored, and no
;             more reads occur afterward.
;             DEFAULT: no STOPCUE is imposed.
;
;   FILENAME - the presence of this keyword signals the second usage,
;              where TRANSREAD explicitly opens the input file named
;              by the string FILENAME.  Reading begins at the start of
;              the file.
;
;              Normally TRANSREAD will close the input file when it
;              finishes.  This can be prevented by setting the NOCLOSE
;              keyword.
;
;              DEFAULT: input is either an already-opened file passed
;              via the UNIT keyword, or a string array.
;
;   NOCLOSE - if set and if FILENAME is given, then the file is not
;             closed upon return.  The file unit is returned in UNIT,
;             and must be closed by the user via FREE_LUN, UNIT.
;             DEFAULT: any files that TRANSREAD opens are closed.
;
;   DEBUG - set this keyword to enable debugging messages.  Detailed
;           error messages will be printed for each failed line.
;
; OUTPUT KEYWORDS:
;   LINES - the number of lines read, including comments and failed
;           parses.
;
;   COUNT - the number of rows successfully parsed.  Can be zero if
;           accessing the input utterly fails, or if no rows are
;           present.
;
;   FAILCOUNT - the number of rows that could not be parsed
;               successfully.  Comments and blank lines are not
;               included.
;
; EXAMPLES:
;   OPENR, UNIT, 'widgets.dat', /GET_LUN
;   A = '' & B = 0L & C = 0D
;   TRANSREAD, UNIT, A, B, C, COUNT=COUNT, FORMAT='(A10,I0,D0.0)'
;   FREE_LUN, UNIT
;
;   (First usage) Opens widgets.dat and reads three columns.  The
;   first column is a ten-character string, the second an integer, and
;   the third a double precision value.
;
;   A = '' & B = 0L & C = 0D
;   TRANSREAD, UNIT, A, B, C, COUNT=COUNT, FORMAT='(A10,I0,D0.0)', $
;      FILENAME='widgets.dat'
;
;   (Second usage) Achieves the same effect as the first example, but
;   TRANSREAD opens and closes the file automatically.
;
;   SPAWN, 'cat widgets.dat', BUF
;   A = '' & B = 0L & C = 0D
;   TRANSREAD, BUF, A, B, C, COUNT=COUNT, FORMAT='(A10,I0,D0.0)'
;
;   (Third usage) Achieves the same effect as the first two examples,
;   but input is read from the string variable BUF.
;
;   A = '' & B = 0L & C = 0D
;   TRANSREAD, UNIT, A, B, C, DELIM=',', COUNT=COUNT, FORMAT='(A,I,D)', $
;      FILENAME='widgets.dat' 
; 
;   (Fourth usage) Example with DELIM keyword.  Here the delimeter is
;       a comma (DELIM=',').
;
; MODIFICATION HISTORY:
;   Feb 1999, Written, CM
;   Mar 1999, Added SKIPLINES and moved on_ioerror out of loop, CM
;   Jun 2000, Added NOCATCH and DEBUG keyword options, CM
;   Jul 2009, Added DELIM keyword, thanks to Chris Holmes
;
;-
; Copyright (C) 1997-2000, Craig Markwardt
; This software is provided as is without any warranty whatsoever.
; Permission to use, copy and distribute unmodified copies for
; non-commercial purposes, and to modify and use for personal or
; internal use, is granted.  All other rights are reserved.
;-

pro transread, unit, l1, l2, l3, l4, l5, l6, l7, l8, l9, l10, $
               l11, l12, l13, l14, l15, l16, l17, l18, l19, l20, $
               l21, l22, l23, l24, l25, l26, l27, l28, l29, l30, $
; NOTE: ADD COLUMNS HERE, as l21, l22, etc.  Remember to end lines
; with a dollar-sign, as "l20" is above.
               skiplines=skiplines, maxlines=maxlines, $
               format=format, comment=comment, nocatch=nocatch, debug=debug, $
               startcue=startcue, stopcue=stopcue, filename=filename, $
               lines=lines, count=count, noclose=noclose, failcount=failcount, $
               delim=delim


  count = 0L               
  if n_params() LE 1 then begin
      message, 'USAGES: TRANSREAD, UNIT, VAR1, VAR2, ...', /info
      message, '        TRANSREAD, UNIT, VAR1, VAR2, ..., FILENAME=FILENAME',$
        /info
      message, '        TRANSREAD, STRINGARRAY, VAR1, VAR2, ...', /info
      return
  endif

  ;; Default parameters
  if n_elements(maxlines) EQ 0 then maxlines = ishft(1L, 31) - 1
  if n_elements(skiplines) EQ 0 then skiplines = 0L
  s = strtrim(lindgen(n_params()-1)+1, 2)
  
  ;; Values are intermediately parsed into a structure.  The structure
  ;; needs to be created once, here, with the correct data types for
  ;; each column.  A special statement is composed explicitly and then
  ;; executed.  The data type of only the *first* element of the input
  ;; array is used.

  structexpr = 'st0 = create_struct('
  for i = 0L, n_params()-2 do begin
      structexpr = structexpr + '"d'+s(i)+'", l'+s(i)+'(0)'
      if i LT n_params()-2 then structexpr = structexpr + ','
  end
  st0 = 0L
  structexpr = structexpr + ')'
  dummy = execute(structexpr)
  st = st0

  ;; Initialize the statistics
  lines = 0L
  count = 0L
  failcount = 0L
  startwaiting = n_elements(startcue) GT 0  ;; If we wait for a STARTCUE
  stopwaiting  = n_elements(stopcue)  GT 0  ;; If we wait for a STOPCUE
  ccheck = n_elements(comment) GT 0
  done = 0

  ;; It saves a *lot* of execution time to avoid the x = [x, newx]
  ;; construction.  I allocate new memory for the "result" array in
  ;; chunks, which saves much time.
  outbuffersize = 0L

  ;; Check for a file unit, not a string array.
  sz = size(unit)
  if n_elements(filename) GT 0 AND sz(sz(0)+1) NE 7 then begin
      on_ioerror, OPEN_ERROR
      openr, unit, filename, /get_lun
      on_ioerror, NULL
      if 0 then begin
          OPEN_ERROR:
          message, 'ERROR: could not open '+filename
          return
      endif
  endif

  ;; If reading from a string buffer
  strread = 0
  if sz(sz(0)+1) EQ 7 then begin
      strread = 1
      xeof = 0
      nstrings = n_elements(unit)
      j = 0L   ;; j is the index into the string buffer
      goto, START_LOOP
  endif

  ;; Check for a valid file unit and that it is readable.  The catch
  ;; expression here is used to trap invalid file handles.
  catch, catcherror
  if catcherror NE 0 then begin
      catch, /cancel
      message, 'ERROR: file unit '+strtrim(unit)+' must be open and readable.'
      return
  end
  xeof = eof(unit)
  if xeof then return
  catch, /cancel

  START_LOOP:
  ;; Set up a catch handler which deals with a conversion error
  catcherror = 0
  if NOT keyword_set(nocatch) then catch, catcherror
  if catcherror NE 0 then begin

      ;; Some errors are worse than others.  If something goes wrong
      ;; during a parse, we can still go on to read more.
      if parsing then begin
          parsing = 0
          watchdog = 0
          failcount = failcount + 1  ;; but we increase the "fail" count

          DEBUG_CHECK:
          if keyword_set(debug) then begin
              print, '**DEBUGGING MESSAGE:  could not parse the following line'
              print, '**   <'+strbuffer(0)+'>'
              print, '**The error message was:'
              print, '**   '+!err_string
              print, '**The parsed variables were as follows:'
              help, /struct, st
              print, '**END OF DEBUGGING MESSAGE'
          endif
      endif
          
      goto, NEXT_LINE
  endif
  on_ioerror, DEBUG_CHECK

  ;; We keep reading until one of the three conditions are satisfied:
  ;; (a) the end of file (or end of string array) is reached; or
  ;; (b) the maximum number of lines is read; or
  ;; (c) the "stop" cue is encountered; or
  ;; (d) an "utter" failure occurs, prevent us from reading more data.

  while NOT xeof AND lines LT maxlines AND NOT done do begin

      ;; The watchdog is here to prevent infinite loops.  Since the
      ;; CATCH handler above causes the loop to restart, we could be
      ;; in trouble.  If at least the read fails, then there is no
      ;; sense in continuing the loop.  See the end of the loop where
      ;; the value of the watchdog is checked. 
      watchdog = 1
      strbuffer = ''

      ;; Either read from the file, or copy from the string array
      if strread then strbuffer = unit(j) else readf, unit, strbuffer
      
      ;; Successful read indicates that the loop can repeat.
      watchdog = 0

      ;; Check for the STARTCUE if needed
      if startwaiting then begin
          if strpos(strbuffer, startcue(0)) GE 0 then startwaiting = 0
          goto, NEXT_LINE
      endif

      ;; line count increases only once the STARTCUE is satisfied.
      lines = lines + 1

      ;; We may need to skip some lines, according to SKIPLINES
      if lines LE skiplines then goto, NEXT_LINE

      ;; Strip out surrounding white space.  Yes, white space should
      ;; not make a difference.
      trimbuffer = strtrim(strbuffer, 2)
      if trimbuffer EQ '' then goto, NEXT_LINE

      ;; Check for the STOPCUE if needed
      if stopwaiting then begin
          if strpos(strbuffer, stopcue(0)) GE 0 then begin
              done = 1
              goto, NEXT_LOOP
          endif
      endif

      ;; Check for a comment character if requested
      if ccheck then if strmid(strbuffer, 0, 1) EQ comment then $
        goto, NEXT_LINE

      ;; Parse data from the input string buffer.  Data is parsed into
      ;; the structure ST for convenience.  The PARSING variable
      ;; indicates to the CATCH handler that an error occurred here.
      st = st0
      parsing = 1
      if n_elements(delim) GT 0 then begin
          tmp = strsplit( strbuffer, delim, /extract, /preserve_null )
          for i = 0L, n_params()-2 do begin
              st.(i)=tmp[i]
          endfor
      endif else begin
          reads, strbuffer, st, format=format
      endelse
      parsing = 0

      ;; Increase the size of the result buffer as needed.  Minimum
      ;; size is 128 elements.  Growth rate doubles until the
      ;; increment exceeds 4096.
      while count GE outbuffersize do begin
          if outbuffersize EQ 0 then outbuffersize = 64L
          outbuffersize = outbuffersize + (outbuffersize < 4096L)
          newresult = make_array(outbuffersize, value=st)
          if n_elements(result) GT 0 then newresult(0) = result
          result = temporary(newresult)
      endwhile
      result(count) = st

      ;; Upon a successful parse, then increase the count.
      count = count + 1

      ;; Update status variables for either the input file or the
      ;; string array.
      NEXT_LINE:
      if strread then begin
          j = j + 1
          xeof = j GE nstrings
      endif else begin
          xeof = eof(unit)
      endelse
      
      NEXT_LOOP:
      ;; Watchdog is checked here to prevent infinite loops, as noted above.
      if watchdog then done = 1
  end

  FINISH:
  on_ioerror, NULL
  catch, /cancel
  ;; Close the file if needed
  if n_elements(filename) GT 0 AND NOT keyword_set(noclose) then begin
      free_lun, unit
  endif

  ;; Finally, extract the elements from the result structure
  if count GT 0 then begin
      result = result(0:count-1)
      for i = 0L, n_params()-2 do begin
          copyexpr = 'l'+s(i)+' = result.('+strtrim(i,2)+')'
          dummy = execute(copyexpr)
      endfor
  end

  return
end