public function stemToken($token)
Perform normalization and stemming on the input token.
Parameters
| Type | Name | Description |
| --- | --- | --- |
| string | $token | Input token |
Return
| Type | Description |
| --- | --- |
| string | The stemmed token, or the original input unchanged if the token is shorter than 3 characters or contains certain punctuation characters |
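The documentation does not say how `stemToken()` combines the private helpers listed below. A minimal sketch, assuming it simply computes `applyStemmer(normalizeToken($token))`, with both steps injected as callables so the order is explicit. `stem_token_sketch` and its parameters are hypothetical names, and the stemmer in the usage line is a crude stand-in, not Porter.

```php
<?php
// Sketch only: assumes stemToken() == applyStemmer(normalizeToken($token)).
// $normalize and $stem are hypothetical stand-ins for the two private helpers.
function stem_token_sketch(string $token, callable $normalize, callable $stem): string
{
    $normalized = $normalize($token); // 1. lower-case first ...
    return $stem($normalized);        // 2. ... then stem, or pass through unchanged
}

// Usage with a toy "strip one trailing s" stemmer (NOT the Porter algorithm):
echo stem_token_sketch('Cats', 'strtolower', fn(string $t): string =>
    substr($t, -1) === 's' ? substr($t, 0, -1) : $t), PHP_EOL; // cat
```

If this assumption holds, a token that is too short to stem would come back lower-cased rather than byte-for-byte as the caller passed it.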
public function stemCorpus($corpus)
Perform normalization and stemming on the input corpus.
Parameters
| Type | Name | Description |
| --- | --- | --- |
| string | $corpus | Input corpus |
Return
| Type | Description |
| --- | --- |
| string | Stemmed corpus |
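How the corpus is split into tokens is not documented. A minimal sketch, assuming tokens are separated by runs of whitespace and that the whitespace is copied through unchanged; `stem_corpus_sketch` and the `$stemToken` callable are hypothetical stand-ins for the class's own methods.

```php
<?php
// Sketch only: assumes whitespace tokenization, which the docs do not specify.
// $stemToken is a hypothetical per-token stemmer (e.g. something like stemToken()).
function stem_corpus_sketch(string $corpus, callable $stemToken): string
{
    // The capture group makes preg_split() keep the whitespace runs as elements.
    $parts = preg_split('/(\s+)/u', $corpus, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return $corpus; // e.g. invalid UTF-8: return the input as-is
    }
    $out = '';
    foreach ($parts as $part) {
        if ($part === '' || preg_match('/^\s+$/u', $part) === 1) {
            $out .= $part;             // whitespace or empty edge piece: keep
        } else {
            $out .= $stemToken($part); // word: hand to the per-token stemmer
        }
    }
    return $out;
}
```

Keeping the separators means the output has the same layout as the input; a real implementation might instead collapse whitespace or drop punctuation.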
private function normalizeToken($token)
Internally convert the token to lower case in a UTF-8-aware way.
Parameters
| Type | Name | Description |
| --- | --- | --- |
| string | $token | Input token |
Return
| Type | Description |
| --- | --- |
| string | Input token, converted to lower case on a best-effort basis |
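One way such a helper could be written, assuming UTF-8 input. The fallback branch shows one plausible reason the result may be only "some semblance" of lower case: without the mbstring extension, non-ASCII letters are left untouched. `normalize_token_sketch` is a hypothetical name.

```php
<?php
// Sketch only: UTF-8-aware lower-casing with a fallback.
// With mbstring loaded, mb_strtolower() also lower-cases non-ASCII letters.
// Otherwise strtolower() is used, which (in the default C locale, and always
// from PHP 8.2 on) lower-cases only ASCII A-Z.
function normalize_token_sketch(string $token): string
{
    if (function_exists('mb_strtolower')) {
        return mb_strtolower($token, 'UTF-8');
    }
    return strtolower($token);
}
```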
private function normalizeCorpus($corpus)
Internally convert the corpus to lower case in a UTF-8-aware way.
Parameters
| Type | Name | Description |
| --- | --- | --- |
| string | $corpus | Input corpus |
Return
| Type | Description |
| --- | --- |
| string | Input corpus, converted to lower case on a best-effort basis |
private function applyStemmer($normalized_token)
Internally pass the normalized token to the Porter stemmer, unless the token is too short or contains certain punctuation, in which case it is returned unchanged.
Parameters
| Type | Name | Description |
| --- | --- | --- |
| string | $normalized_token | Lower-cased token |
Return
| Type | Description |
| --- | --- |
| string | The stemmed token, or the original input unchanged if the token is shorter than 3 characters or contains certain punctuation characters |
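A minimal sketch of the two documented guards followed by a stemming step, assuming PHP 8 (for `str_contains()` and `str_ends_with()`). Two loud assumptions: the punctuation set is hypothetical, since the documentation only says "certain punctuation elements", and only Step 1a of the Porter algorithm (plural suffixes) stands in for the full stemmer the class delegates to. `apply_stemmer_sketch` is a hypothetical name.

```php
<?php
// Sketch only. SKETCH_SKIP_CHARS is a HYPOTHETICAL punctuation set.
const SKETCH_SKIP_CHARS = ['.', '@', '/', ':', '_'];

function apply_stemmer_sketch(string $normalized_token): string
{
    // Guard 1: tokens shorter than 3 characters are returned unchanged.
    $length = function_exists('mb_strlen')
        ? mb_strlen($normalized_token, 'UTF-8')
        : strlen($normalized_token);
    if ($length < 3) {
        return $normalized_token;
    }
    // Guard 2: tokens containing any skip character are returned unchanged.
    foreach (SKETCH_SKIP_CHARS as $char) {
        if (str_contains($normalized_token, $char)) {
            return $normalized_token;
        }
    }
    // Stand-in for the full Porter stemmer: Step 1a only.
    // SSES -> SS, IES -> I, SS -> SS, S -> (removed)
    if (str_ends_with($normalized_token, 'sses') || str_ends_with($normalized_token, 'ies')) {
        return substr($normalized_token, 0, -2);
    }
    if (str_ends_with($normalized_token, 'ss')) {
        return $normalized_token;
    }
    if (str_ends_with($normalized_token, 's')) {
        return substr($normalized_token, 0, -1);
    }
    return $normalized_token;
}
```

Under these assumptions `caresses` becomes `caress`, `ponies` becomes `poni`, and `cats` becomes `cat`, while `is` (too short) and `e.g.s` (contains `.`) pass through untouched.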