This article shows an implementation in Clojure of stopword removal, which filters out words that contribute little to the analysis in natural language processing.
When building word vectors, a word dictionary such as the following is created:
( "5"
"①"
"④"
"30"
"②"
"③"
"2"
"⑤"
"⑥"
...)
Stopwords are removed from this word list with the remove function, as follows.
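;; data/stopword.txt must hold one Clojure collection literal of strings,
;; e.g. ["a" "the"], because read-string reads a single form.
(def ex-stopword
  (->> raw-word-list
       (remove (set (read-string (slurp "data/stopword.txt"))))))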
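Alternatively, the stopwords can be read from a file with one word per line and stored in a set, and the set itself can then be used as the predicate for remove.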
(require '[clojure.java.io :as io])

;; Read the file line by line and return the lines as a set.
(defn load-stopwords [filename]
  (with-open [r (io/reader filename)]
    (set (doall (line-seq r)))))
(def is-stopword (load-stopwords "stopwords/english"))
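;; tokenize, normalize, and get-sentences are helper functions that are
;; not shown here and must be defined beforehand.
(def tokens
  (map #(remove is-stopword (normalize (tokenize %)))
       (get-sentences
        "I never saw a Purple Cow.
I never hope to see one.
But I can tell you, anyhow.
I'd rather see than be one.")))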
The above code is adapted from Clojure Data Analysis Cookbook, Second Edition.
The Python implementation is shown below for reference.
words = ['this', 'is', 'a', 'pen']
stop_words = ['is', 'a']
changed_words = [word for word in words if word not in stop_words]
print(changed_words)
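The list comprehension above tests membership against a list, which is a linear scan for every word. As a rough sketch of how the second Clojure approach (stopwords held in a set, text split into sentences, then tokenized and lowercased before filtering) might look in Python, consider the following. The helper names load_stopwords, get_sentences, tokenize, normalize, and remove_stopwords, the regular expressions, and the small inline stop_words set are all assumptions made for illustration; they are not taken from the cookbook, and the naive regex splitting is only a stand-in for a real sentence detector and tokenizer.

```python
import re


def load_stopwords(path):
    """Read one stopword per line into a set, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def get_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Treat runs of letters, digits, and apostrophes as tokens."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


def normalize(tokens):
    """Lowercase every token so it matches the lowercase stopword set."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens, stopwords):
    """Keep only tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]


# Hypothetical inline stopword set for illustration. If a file with one
# word per line exists, load_stopwords(path) could be used instead.
stop_words = {"i", "a", "to", "but", "can", "you", "than", "be"}

text = """I never saw a Purple Cow.
I never hope to see one.
But I can tell you, anyhow.
I'd rather see than be one."""

tokens = [
    remove_stopwords(normalize(tokenize(sentence)), stop_words)
    for sentence in get_sentences(text)
]
print(tokens)
```

With this inline set, the script prints [['never', 'saw', 'purple', 'cow'], ['never', 'hope', 'see', 'one'], ['tell', 'anyhow'], ["i'd", 'rather', 'see', 'one']]. Using a set keeps each membership test at roughly constant cost, which matters once the stopword list grows to hundreds of entries.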