Class RecordParser


  • public class RecordParser
    extends java.lang.Object
    Parses a record containing one or more fields. Fields are separated by a FIELD_DELIMITER character, e.g., a comma or a ^A character. Records are terminated by a RECORD_DELIMITER character, e.g., a newline.

    Fields may be enclosed, either optionally or mandatorily, by a quoting character, e.g., '"'.

    Fields may contain escaped characters. The escape character may be, e.g., the backslash ('\'). Any character following an escape character is treated literally; e.g., the two-character sequence \n is recorded as an 'n' character, not a newline.

    Unexpected results may occur if the enclosing character escapes itself. For example, this class cannot parse SQL SELECT statements in which the single character ['] is escaped as [''].

    This class is not synchronized. Multiple threads must use separate instances of RecordParser.

    The fields parsed by RecordParser are backed by an internal buffer, which is cleared when the next call to parseRecord() is made. If the contents must be preserved beyond that call, you must copy them yourself.
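
The delimiter, enclosure, and escape rules described above can be illustrated with a small standalone sketch. This is not RecordParser's implementation: the class name DelimitedLineSketch and its parse method are hypothetical, and the sketch ignores required enclosure and error reporting. It only shows how the four characters interact.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for illustration only. It is not
 * RecordParser and does not handle required enclosure or parse errors.
 */
class DelimitedLineSketch {
    static List<String> parse(CharSequence input, char fieldDelim,
                              char recordDelim, char encloser, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean enclosed = false;  // currently inside an enclosed field
        boolean escaped = false;   // previous character was the escape char

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (escaped) {
                current.append(c);     // taken literally, so \n yields 'n'
                escaped = false;
            } else if (c == escape) {
                escaped = true;        // the escape char itself is dropped
            } else if (c == encloser) {
                enclosed = !enclosed;  // the enclosing char itself is dropped
            } else if (!enclosed && c == fieldDelim) {
                fields.add(current.toString());
                current.setLength(0);
            } else if (!enclosed && c == recordDelim) {
                break;                 // record terminated
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Characters in the input: 1,"a,b",x\ny followed by a newline.
        System.out.println(parse("1,\"a,b\",x\\ny\n", ',', '\n', '"', '\\'));
        // prints [1, a,b, xny]
    }
}
```

In this sketch a doubled enclosing character simply toggles the enclosed state twice, so 'it''s' (with ' as the enclosing character) comes out as its. That is one way a self-escaping enclosing character can go wrong; the real class may behave differently, for example by rejecting the input.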
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  RecordParser.ParseError
      An error thrown when parsing fails.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static org.apache.commons.logging.Log LOG  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int hashCode()  
      boolean isEnclosingRequired()  
      java.util.List<java.lang.String> parseRecord​(byte[] input)
      Return a list of strings representing the fields of the input line.
      java.util.List<java.lang.String> parseRecord​(char[] input)
      Return a list of strings representing the fields of the input line.
      java.util.List<java.lang.String> parseRecord​(java.lang.CharSequence input)
      Return a list of strings representing the fields of the input line.
      java.util.List<java.lang.String> parseRecord​(java.nio.ByteBuffer input)  
      java.util.List<java.lang.String> parseRecord​(java.nio.CharBuffer input)
      Return a list of strings representing the fields of the input line.
      java.util.List<java.lang.String> parseRecord​(org.apache.hadoop.io.Text input)
      Return a list of strings representing the fields of the input line.
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, notify, notifyAll, wait, wait, wait
    • Field Detail

      • LOG

        public static final org.apache.commons.logging.Log LOG
    • Constructor Detail

      • RecordParser

        public RecordParser​(DelimiterSet delimitersIn)
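
Because the class is not synchronized, each thread needs its own instance. One possible way to arrange that is a ThreadLocal that constructs a parser per thread. The sketch below uses a hypothetical StandInParser in place of RecordParser so that it runs without Sqoop on the classpath; in real code the supplier would call this constructor with the appropriate DelimiterSet.

```java
import java.util.concurrent.atomic.AtomicReference;

class PerThreadParserSketch {
    /** Hypothetical stand-in for a parser that is not thread-safe. */
    static class StandInParser { }

    // Each thread lazily creates, and then keeps, its own instance.
    private static final ThreadLocal<StandInParser> PARSER =
        ThreadLocal.withInitial(StandInParser::new);

    static StandInParser current() {
        return PARSER.get();
    }

    /** Returns the instance that a freshly started thread observes. */
    static StandInParser fromOtherThread() {
        AtomicReference<StandInParser> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> seen.set(current()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(current() == current());          // true
        System.out.println(current() == fromOtherThread());  // false
    }
}
```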
    • Method Detail

      • parseRecord

        public java.util.List<java.lang.String> parseRecord​(java.lang.CharSequence input)
                                                     throws RecordParser.ParseError
        Return a list of strings representing the fields of the input line. This list is backed by an internal buffer which is cleared by the next call to parseRecord().
        Throws:
        RecordParser.ParseError
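
Since the returned list is backed by a buffer that the next parseRecord() call clears, callers that need to keep a record should copy it first. The sketch below demonstrates the pitfall with a hypothetical ReusingParser that mimics the documented buffer reuse; it is a stand-in for illustration, not RecordParser itself, and its comma splitting is only a placeholder for real parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReusedBufferSketch {
    /** Hypothetical stand-in that reuses one list across calls. */
    static class ReusingParser {
        private final List<String> buffer = new ArrayList<>();

        List<String> parseRecord(CharSequence input) {
            buffer.clear();  // the previous result is wiped here
            buffer.addAll(Arrays.asList(input.toString().split(",", -1)));
            return buffer;
        }
    }

    public static void main(String[] args) {
        ReusingParser parser = new ReusingParser();

        List<String> live = parser.parseRecord("a,b");
        List<String> copy = new ArrayList<>(live);  // defensive copy

        parser.parseRecord("c,d,e");  // overwrites what 'live' refers to

        System.out.println(live);  // prints [c, d, e]
        System.out.println(copy);  // prints [a, b]
    }
}
```

Copying into a new ArrayList is enough here because the elements are immutable String objects; only the list itself is reused.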
      • parseRecord

        public java.util.List<java.lang.String> parseRecord​(org.apache.hadoop.io.Text input)
                                                     throws RecordParser.ParseError
        Return a list of strings representing the fields of the input line. This list is backed by an internal buffer which is cleared by the next call to parseRecord().
        Throws:
        RecordParser.ParseError
      • parseRecord

        public java.util.List<java.lang.String> parseRecord​(byte[] input)
                                                     throws RecordParser.ParseError
        Return a list of strings representing the fields of the input line. This list is backed by an internal buffer which is cleared by the next call to parseRecord().
        Throws:
        RecordParser.ParseError
      • parseRecord

        public java.util.List<java.lang.String> parseRecord​(char[] input)
                                                     throws RecordParser.ParseError
        Return a list of strings representing the fields of the input line. This list is backed by an internal buffer which is cleared by the next call to parseRecord().
        Throws:
        RecordParser.ParseError
      • parseRecord

        public java.util.List<java.lang.String> parseRecord​(java.nio.CharBuffer input)
                                                     throws RecordParser.ParseError
        Return a list of strings representing the fields of the input line. This list is backed by an internal buffer which is cleared by the next call to parseRecord().
        Throws:
        RecordParser.ParseError
      • isEnclosingRequired

        public boolean isEnclosingRequired()
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object