
WhitespaceTokeniser
- Namespace: Rowles.LeanCorpus.Analysis.Tokenisers
- Assembly: Rowles.LeanCorpus.dll
Splits input text into tokens separated only by whitespace.
public sealed class WhitespaceTokeniser : ITokeniser
- Implements: ITokeniser
Methods
Tokenise(ReadOnlySpan<char>)
Splits the input text into a list of whitespace-delimited tokens.
TokeniseOffsets(ReadOnlySpan<char>, List<Token>)
Emits whitespace-delimited tokens into the supplied list.
TokeniseOffsets(ReadOnlySpan<char>, List<(int Start, int End)>)
Emits whitespace-delimited token offsets into the supplied list without materialising token text.
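
The offsets overload can be pictured with a small standalone sketch. This is not the Rowles.LeanCorpus implementation: `WhitespaceOffsetSketch` is a hypothetical stand-in, and it assumes that `End` is exclusive and that "whitespace" means whatever `char.IsWhiteSpace` reports. It only illustrates how whitespace-delimited spans could be recorded without allocating a string per token.

```csharp
using System;
using System.Collections.Generic;

// Usage: collect offsets first, then slice the original text only where needed.
string text = "  the quick\tbrown\nfox ";
var offsets = new List<(int Start, int End)>();
WhitespaceOffsetSketch.TokeniseOffsets(text.AsSpan(), offsets);

foreach (var (start, end) in offsets)
{
    // Substring is called here for display only; the scan itself allocates no token text.
    Console.WriteLine($"[{start}, {end}) {text.Substring(start, end - start)}");
}

// Hypothetical stand-in for illustration; not the library's WhitespaceTokeniser.
static class WhitespaceOffsetSketch
{
    // Appends one (Start, End) pair per maximal run of non-whitespace characters.
    // Assumptions: End is exclusive; whitespace is defined by char.IsWhiteSpace.
    public static void TokeniseOffsets(ReadOnlySpan<char> text, List<(int Start, int End)> output)
    {
        int i = 0;
        while (i < text.Length)
        {
            // Skip the whitespace run before the next token.
            while (i < text.Length && char.IsWhiteSpace(text[i])) i++;
            if (i == text.Length) break;

            // Consume the token and record its bounds.
            int start = i;
            while (i < text.Length && !char.IsWhiteSpace(text[i])) i++;
            output.Add((start, i));
        }
    }
}
```

Writing into a caller-supplied list, as the documented signatures do, lets the caller reuse one list across many inputs by clearing it between calls.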