Public sealed class LetterTokeniser

Namespace
Rowles.LeanCorpus.Analysis.Tokenisers
Assembly
Rowles.LeanCorpus.dll

Splits input text into letter-only tokens, discarding digits and punctuation.

public sealed class LetterTokeniser : ITokeniser
Inheritance
object
LetterTokeniser
Implements
ITokeniser

Methods

Public method Tokenise(ReadOnlySpan<char>)

Splits the input text into a list of tokens at word boundaries.

Public method TokeniseOffsets(ReadOnlySpan<char>, List<Token>)

Emits letter-only tokens into the supplied list.

Internal method TokeniseOffsets(ReadOnlySpan<char>, List<(int Start, int End)>)

Emits letter-only token offsets into the supplied list without materialising token text.
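
The offset-based approach described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `LetterRunSketch` and `CollectLetterRuns` are hypothetical names, and the sketch assumes that a "letter" is whatever `char.IsLetter` accepts and that `End` is exclusive, neither of which this page confirms. It shows why collecting offsets avoids allocating a string per token: text is only materialised when the caller slices the input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; not part of Rowles.LeanCorpus.
static class LetterRunSketch
{
    // Appends one (Start, End) pair per maximal run of letters.
    // Assumptions: char.IsLetter defines a letter; End is exclusive.
    public static void CollectLetterRuns(
        ReadOnlySpan<char> text, List<(int Start, int End)> offsets)
    {
        int start = -1; // -1 means "not currently inside a letter run"
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsLetter(text[i]))
            {
                if (start < 0) start = i;   // a new run begins here
            }
            else if (start >= 0)
            {
                offsets.Add((start, i));    // digit or punctuation ends the run
                start = -1;
            }
        }
        if (start >= 0) offsets.Add((start, text.Length)); // run reaching end of input
    }
}

static class Program
{
    static void Main()
    {
        const string input = "Hello, world 42 times!";
        var offsets = new List<(int Start, int End)>();
        LetterRunSketch.CollectLetterRuns(input, offsets);

        // Token text is only created here, on demand, by slicing the input.
        foreach (var (start, end) in offsets)
        {
            Console.WriteLine($"{start}..{end}: {input[start..end]}");
        }
        // Prints:
        // 0..5: Hello
        // 7..12: world
        // 16..21: times
    }
}
```

Passing the list in, as the documented signatures do, lets a caller reuse one list across many inputs by clearing it between calls rather than allocating a new one each time.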