How do I use Nette\Utils\Tokenizer?

petr.pavel

Hey guys, I have a long text with a variable number of tokens. A token is a sequence of characters with optional parameters.

Simplified example:

some text
second line
[reference:ABC,DEF,GHI]
some other text
[different-token]
again some text

I'm wondering if Tokenizer could be used to convert the text into something like:

array(
  array('value' => "some text\nsecond line", 'type' => "text"),
  array('value' => array(
    array('value' => 'reference', 'type' => 'component'),
    array('value' => 'ABC,DEF,GHI', 'type' => 'parameters'),
  ), 'type' => 'token'),
  array('value' => "some other text", 'type' => "text"),
  array('value' => array(
    array('value' => 'different-token', 'type' => 'component'),
  ), 'type' => 'token'),
  array('value' => "again some text", 'type' => "text"),
)

The format of the output array doesn't really matter. I just need the pieces of text and the components with their parameters.

I know how to do it the hard way with preg_split() in two iterations, but I'm hoping for a cleverer way.
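
For reference, here is a minimal sketch of the plain preg_split() route, using one regex pass with PREG_SPLIT_DELIM_CAPTURE followed by explode() for the parameters. It assumes tokens are never nested and never span lines. The function name splitTextAndTokens() and the output keys are made up for illustration and are not part of Nette.

```php
<?php
// Hypothetical helper (not part of Nette): splits the input into plain-text
// pieces and [component:param,param] tokens.
function splitTextAndTokens(string $input): array
{
    // The single capture group holds the token body without the brackets.
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // text, token body, text, token body, ..., text.
    $parts = preg_split('~\[([^\[\]\r\n]+)\]~', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        throw new RuntimeException('preg_split() failed');
    }

    $result = [];
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even index: plain text between tokens; skip whitespace-only pieces.
            $text = trim($part);
            if ($text !== '') {
                $result[] = ['type' => 'text', 'value' => $text];
            }
            continue;
        }

        // Odd index: token body, "component" or "component:params".
        [$name, $params] = array_pad(explode(':', $part, 2), 2, null);
        $result[] = [
            'type' => 'token',
            'component' => $name,
            'parameters' => $params === null ? [] : explode(',', $params),
        ];
    }

    return $result;
}
```

Called on the simplified example above, this should give five entries: three text pieces and two tokens, with the first token carrying the parameters ABC, DEF and GHI as a list.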

HosipLan

Tokenizer is “only” a base for a complex lexer/parser. You can have a look at real usage here or here.

petr.pavel

Thank you for the links. Coincidentally, your PropertiesLexer.php was one of the sources I read when trying to understand the Tokenizer. I also studied the Nette unit tests.

I know close to nothing about lexical analysis and I was wondering if the Tokenizer could be misused for my purpose. I guess not. Or perhaps yes, but with no additional benefit over a simple preg_split().

Thanks, Filip.