Object

io.radanalytics.silex.text

LogTokenizer

Related Doc: package text

object LogTokenizer extends LogTokenizing

Linear Supertypes
LogTokenizing, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  12. def leadingPunctuation: Regex

    A regular expression describing punctuation to strip from the beginning of tokens; each match is replaced with its first match group, so the whitespace before the punctuation is kept, while punctuation at the very start of the message is removed outright. Override this definition to customize tokenizer behavior. Defaults to

    "(\\s)[^\\sA-Za-z0-9-_/]+|()^[^\\sA-Za-z0-9-_/]+"

    if not overridden.

    Definition Classes
    LogTokenizing
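
    A minimal sketch of the default pattern in isolation, assuming it is applied with scala.util.matching.Regex.replaceAllIn and the replacement "$1", as the description above suggests. LeadingPunctuationDemo and stripLeading are hypothetical names, not part of silex.

    ```scala
    import scala.util.matching.Regex

    // Hypothetical demo object; only the pattern string is taken from the docs above.
    object LeadingPunctuationDemo {
      val leadingPunctuation: Regex =
        "(\\s)[^\\sA-Za-z0-9-_/]+|()^[^\\sA-Za-z0-9-_/]+".r

      // Replace each match with group 1: the whitespace before the punctuation,
      // or nothing (Java appends no text for a group that did not participate)
      // when the punctuation starts the string.
      def stripLeading(msg: String): String =
        leadingPunctuation.replaceAllIn(msg, "$1")

      def main(args: Array[String]): Unit =
        println(stripLeading("[main] ERROR (disk) full")) // main] ERROR disk) full
    }
    ```

    Closing brackets such as "]" and ")" survive this step; removing them is the job of trailingPunctuation. Hyphens, underscores and slashes are outside the character class, so an input like "--verbose" is left unchanged.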
  13. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. final def notify(): Unit

    Definition Classes
    AnyRef
  15. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  16. def rejectedIntratokenPunctuation: Regex

    A regular expression describing punctuation to strip from within tokens; matches will be stripped by replacing them with the empty string. Override this definition to customize tokenizer behavior. Defaults to

    "[^A-Za-z0-9-_./:@]"
    
    if not overridden.

    Definition Classes
    LogTokenizing
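
    Whitespace is not among the permitted characters of the default class, so running it over a whole message would also delete the spaces between tokens. The hypothetical IntratokenDemo below therefore applies it to one token at a time; treat that per-token ordering as an assumption of this sketch rather than a description of silex internals.

    ```scala
    import scala.util.matching.Regex

    // Hypothetical demo object; only the pattern string comes from the docs above.
    object IntratokenDemo {
      val rejectedIntratokenPunctuation: Regex = "[^A-Za-z0-9-_./:@]".r

      // Delete every character outside the permitted set
      // (ASCII letters, digits, and - _ . / : @).
      def stripIntratoken(token: String): String =
        rejectedIntratokenPunctuation.replaceAllIn(token, "")

      def main(args: Array[String]): Unit = {
        val samples = Seq("user's", "key=value", "10.0.0.1:8080", "ops@example.com")
        println(samples.map(stripIntratoken)) // List(users, keyvalue, 10.0.0.1:8080, ops@example.com)
      }
    }
    ```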
  17. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  18. def toString(): String

    Definition Classes
    AnyRef → Any
  19. def tokens(msg: String, post: (String) ⇒ String = identity[String], pred: (String) ⇒ Boolean = str => true): Seq[String]

    Splits a log message into a sequence of tokens, by

    • collapsing runs of whitespace into single spaces,
    • stripping leading and trailing punctuation (see leadingPunctuation and trailingPunctuation),
    • stripping rejected intratoken punctuation,
    • splitting on whitespace,
    • rejecting candidate tokens not containing at least one letter, and
    • applying the optional user-supplied transformation post (identity by default) and filter pred (which accepts every token by default).
    returns

    a sequence of tokens

    Definition Classes
    LogTokenizing
    See also

    Using word2vec on log messages
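
    The bullet list above reads as a small pipeline. What follows is a minimal sketch under stated assumptions, not the silex source: SketchTokenizing, SketchTokenizer and KeepEqualsTokenizer are hypothetical names that mirror the documented defaults. It assumes leading punctuation is stripped before trailing punctuation, and it applies the intratoken pattern per token after splitting, because the default pattern also matches spaces. The last object shows the kind of override the member descriptions invite.

    ```scala
    import scala.util.matching.Regex

    // Hypothetical stand-in for LogTokenizing that mirrors the documented defaults.
    trait SketchTokenizing {
      def leadingPunctuation: Regex =
        "(\\s)[^\\sA-Za-z0-9-_/]+|()^[^\\sA-Za-z0-9-_/]+".r
      def trailingPunctuation: Regex =
        "[^\\sA-Za-z0-9-_/]+(\\s)|()[^\\sA-Za-z0-9-_/]+$".r
      def rejectedIntratokenPunctuation: Regex = "[^A-Za-z0-9-_./:@]".r

      def tokens(msg: String,
                 post: String => String = identity[String],
                 pred: String => Boolean = _ => true): Seq[String] = {
        // Collapse runs of whitespace into single spaces.
        val collapsed = msg.replaceAll("\\s+", " ")
        // Strip leading, then trailing, punctuation, keeping the captured space.
        val trimmed = trailingPunctuation.replaceAllIn(
          leadingPunctuation.replaceAllIn(collapsed, "$1"), "$1")
        // Split on spaces and strip intratoken punctuation from each piece.
        val candidates = trimmed.split(" ").toList
          .map(t => rejectedIntratokenPunctuation.replaceAllIn(t, ""))
        // Keep candidates with at least one letter, then apply post and pred.
        candidates.filter(_.exists(_.isLetter)).map(post).filter(pred)
      }
    }

    object SketchTokenizer extends SketchTokenizing

    // Hypothetical customization: also permit '=' inside tokens.
    object KeepEqualsTokenizer extends SketchTokenizing {
      override def rejectedIntratokenPunctuation: Regex = "[^A-Za-z0-9-_./:@=]".r
    }

    object SketchDemo {
      def main(args: Array[String]): Unit = {
        val line = "2016-01-01 12:00:00 ERROR [main] Connection to db-1.example.com:5432 refused (timeout=30s)!"
        // List(ERROR, main, Connection, to, db-1.example.com:5432, refused, timeout30s)
        println(SketchTokenizer.tokens(line))
        // List(error, main, connection, db-1.example.com:5432, refused, timeout30s)
        println(SketchTokenizer.tokens(line, _.toLowerCase, _.length > 2))
        // List(ERROR, main, Connection, to, db-1.example.com:5432, refused, timeout=30s)
        println(KeepEqualsTokenizer.tokens(line))
      }
    }
    ```

    The timestamp fields are dropped by the letter check, while the host:port token survives intact because ".", "-" and ":" are all permitted inside tokens.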

  20. def trailingPunctuation: Regex

    A regular expression describing punctuation to strip from the end of tokens; each match is replaced with its first match group, so the whitespace after the punctuation is kept, while punctuation at the very end of the message is removed outright. Override this definition to customize tokenizer behavior. Defaults to

    "[^\\sA-Za-z0-9-_/]+(\\s)|()[^\\sA-Za-z0-9-_/]+$"
    
    if not overridden.

    Definition Classes
    LogTokenizing
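
    The same kind of sketch for the trailing pattern, again assuming replaceAllIn with the replacement "$1"; TrailingPunctuationDemo and stripTrailing are hypothetical names, not part of silex.

    ```scala
    import scala.util.matching.Regex

    // Hypothetical demo object; only the pattern string is taken from the docs above.
    object TrailingPunctuationDemo {
      val trailingPunctuation: Regex =
        "[^\\sA-Za-z0-9-_/]+(\\s)|()[^\\sA-Za-z0-9-_/]+$".r

      // Replace each match with group 1: the whitespace after the punctuation,
      // or nothing when the punctuation ends the string.
      def stripTrailing(msg: String): String =
        trailingPunctuation.replaceAllIn(msg, "$1")

      def main(args: Array[String]): Unit =
        println(stripTrailing("main] ERROR disk) full.")) // main ERROR disk full
    }
    ```

    Because "/" is outside the stripped set, a path such as /var/log/ keeps its final slash.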
  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
