This post shows a Clojure implementation of stopword removal, which strips out words that carry little meaning before further natural language processing.
When building word vectors, the resulting word dictionary can contain entries such as the following, which are numbers and circled digits rather than meaningful words:
("5" "①" "④" "30" "②" "③" "2" "⑤" "⑥" ...)
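A word dictionary of this kind is essentially a mapping from each distinct token to an integer index. The minimal Python sketch below assumes already-tokenized input, and its sample documents are hypothetical. It shows how numbers and circled digits become dictionary entries alongside real words when nothing filters them out:

```python
def build_dictionary(tokenized_docs):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for doc in tokenized_docs:
        for token in doc:
            if token not in vocab:
                vocab[token] = len(vocab)  # next free index
    return vocab

# Hypothetical tokenized documents containing list markers and numbers
docs = [["①", "word", "vector", "5"], ["②", "word", "30"]]
vocab = build_dictionary(docs)
print(vocab)  # {'①': 0, 'word': 1, 'vector': 2, '5': 3, '②': 4, '30': 5}
```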
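Unwanted words can be filtered out with remove, passing it a set of stopwords as the predicate:
(def ex-stopword
  ;; data/stopword.txt is assumed to hold a single Clojure collection literal,
  ;; e.g. ["5" "①" "30"]: read-string parses it and set turns it into a predicate
  (->> raw-word-list
       (remove (set (read-string (slurp "data/stopword.txt"))))))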
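The following minimal Python sketch shows what the remove call does. The stopwords are turned into a set, every word found in it is dropped, and the remaining words keep their order. The stopword list here is hypothetical, since the contents of data/stopword.txt are not shown. Using a set rather than a list gives average constant-time membership checks, which matters once the stopword list grows.

```python
def remove_stopwords(words, stopwords):
    """Return words with every stopword removed, preserving order."""
    stop_set = set(stopwords)  # set membership is O(1) on average
    return [w for w in words if w not in stop_set]

# Hypothetical word list mixing dictionary noise with real words
raw_word_list = ["5", "①", "④", "30", "②", "word", "vector"]
# Hypothetical stopword list; the real data/stopword.txt is not shown
stopwords = ["5", "①", "④", "30", "②", "③", "2", "⑤", "⑥"]
print(remove_stopwords(raw_word_list, stopwords))  # ['word', 'vector']
```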
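Another approach loads the stopwords from a file containing one word per line:
(require '[clojure.java.io :as io])

;; Read one stopword per line into a set; the set doubles as a predicate
(defn load-stopwords [filename]
  (with-open [r (io/reader filename)]
    (set (doall (line-seq r)))))

(def is-stopword (load-stopwords "stopwords/english"))

;; tokenize, normalize and get-sentences are helper functions defined
;; elsewhere in the original code (not shown in this post)
(def tokens
  (map #(remove is-stopword (normalize (tokenize %)))
       (get-sentences
        "I never saw a Purple Cow.
I never hope to see one.
But I can tell you, anyhow.
I'd rather see than be one.")))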
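The following is a self-contained Python sketch of the same pipeline, built on several assumptions. Here, tokenize is a hypothetical regex tokenizer that also lowercases, standing in for the original tokenize and normalize. Sentences are split on newlines instead of with get-sentences. A small hypothetical stopword file is written to a temporary directory in place of stopwords/english.

```python
import re
import tempfile
from pathlib import Path

def load_stopwords(filename):
    """Read one stopword per line into a set (mirrors load-stopwords)."""
    with open(filename, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def tokenize(sentence):
    """Hypothetical tokenizer: lowercase, then keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())

poem = """I never saw a Purple Cow.
I never hope to see one.
But I can tell you, anyhow.
I'd rather see than be one."""

# Hypothetical stopword file standing in for stopwords/english
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "english"
    path.write_text("i\na\nto\nbut\ncan\nyou\nthan\nbe\n", encoding="utf-8")
    is_stopword = load_stopwords(path)

# One list of surviving tokens per sentence (here: per line of the poem)
tokens = [[t for t in tokenize(s) if t not in is_stopword]
          for s in poem.splitlines()]
print(tokens)
```

Note that "i'd" survives. This hypothetical tokenizer keeps contractions as single tokens, and the sample stopword list does not include them.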
The code above is adapted from the Clojure Data Analysis Cookbook, 2nd edition.
For reference, here is the equivalent filtering in Python:
words = ['this', 'is', 'a', 'pen']
stop_words = ['is', 'a']
# Keep only the words that are not in the stopword list
changed_words = [word for word in words if word not in stop_words]
print(changed_words)  # ['this', 'pen']

Specialized in AI system design and decision-making architecture.
Focused on externalizing decision logic using Ontology, DSL, and Behavior Trees, and building multi-agent systems.
