In technical terms, the technique you're looking for is Bayesian spam filtering, and, more advanced still, Markovian discrimination (where entire chains of words, rather than single words, are analyzed).
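To make that distinction concrete, here's a rough sketch (the whitespace tokenizer and the chain length are just illustrative assumptions): a plain Bayesian filter scores individual words, while Markovian discrimination scores overlapping word chains.

```python
def single_word_tokens(text):
    """Unigram features, as used by a plain Bayesian filter."""
    return text.lower().split()

def chain_tokens(text, n=2):
    """Overlapping word chains (n-grams), as used by Markovian discrimination."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(single_word_tokens("cheap replica watches here"))
# ['cheap', 'replica', 'watches', 'here']
print(chain_tokens("cheap replica watches here"))
# ['cheap replica', 'replica watches', 'watches here']
```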
Of course you could make it a basic content-control filter, with user prompts to mitigate false positives. But a true Bayesian system needs to "learn" (machine learning), and most contemporary spam filters are based on Bayesian logic: nothing is classified as objectively spam or not-spam, but the probability of a message being one or the other is updated as new data arrives.
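Here's a minimal sketch of that learning/updating idea: a naive Bayes classifier with Laplace smoothing whose word counts grow as you feed it labeled messages. The class and method names are my own, purely for illustration, not any particular library's API.

```python
import math
from collections import Counter

class NaiveBayesSpamFilter:
    """Toy Bayesian filter that 'learns' from labeled messages."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_total = 0
        self.ham_total = 0

    def train(self, text, is_spam):
        # Update word counts; this is the "learning" step
        tokens = text.lower().split()
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_total += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_total += 1

    def spam_probability(self, text):
        # Priors estimated from how many spam/ham messages we've seen
        total = self.spam_total + self.ham_total
        log_spam = math.log((self.spam_total + 1) / (total + 2))
        log_ham = math.log((self.ham_total + 1) / (total + 2))
        spam_words = sum(self.spam_counts.values())
        ham_words = sum(self.ham_counts.values())
        vocab = len(set(self.spam_counts) | set(self.ham_counts)) or 1
        for token in text.lower().split():
            # Laplace smoothing so unseen words don't zero out the probability
            log_spam += math.log((self.spam_counts[token] + 1) / (spam_words + vocab))
            log_ham += math.log((self.ham_counts[token] + 1) / (ham_words + vocab))
        # Convert the two log scores back into P(spam | text)
        return 1 / (1 + math.exp(log_ham - log_spam))

f = NaiveBayesSpamFilter()
f.train("cheap replica watches buy now", is_spam=True)
f.train("meeting notes attached see you tomorrow", is_spam=False)
print(f.spam_probability("buy cheap watches now"))   # high
print(f.spam_probability("see you at the meeting"))  # low
```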
proxx's suggestion is to set up a spamtrap. While spamtraps are indeed often used to study spam, dedicated spammers usually catch on quickly and may start deliberately targeting the trap to skew your research. Additionally, if someone forwards or replies to a spammer's message that has your spamtrap address in To: or CC:, they can end up blacklisted themselves. It's a pretty dodgy thing, overall.
That said, if you're going to build a Bayesian filter, be aware that evading it can be as simple as carefully inserting innocent words ("ham") among the spam text, a technique known as Bayesian poisoning. This confuses most naive Bayesian filters, but with proper heuristics and continued training you can combat it.
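One common heuristic against poisoning is to score a message only on its most "interesting" tokens, i.e. those whose spam probability deviates most from 0.5, so padding with bland ham words barely moves the result. A minimal sketch, assuming you already have per-token spam probabilities from a trained filter like the one above:

```python
def score_with_top_tokens(tokens, token_spam_prob, n=15):
    # Unknown tokens get a neutral 0.5
    probs = [token_spam_prob.get(t, 0.5) for t in tokens]
    # Keep only the n tokens that are most strongly spammy or hammy
    extreme = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    # Combine with the usual naive-Bayes product rule
    spam = ham = 1.0
    for p in extreme:
        spam *= p
        ham *= (1.0 - p)
    return spam / (spam + ham) if (spam + ham) else 0.5

token_spam_prob = {"viagra": 0.99, "replica": 0.95, "meeting": 0.05, "the": 0.5}
tokens = ["viagra", "replica"] + ["the"] * 200  # 200 neutral padding words
print(score_with_top_tokens(tokens, token_spam_prob))  # still close to 1.0
```

Because the neutral padding words never make the top-n cut, the poisoning attempt contributes almost nothing to the final score.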
Lastly, spam filters are a dime a dozen, and most of them work much the same way. You're probably just reinventing the wheel, unless you want the learning experience or have something unorthodox planned.
Personally, I recommend fingerprinting and analyzing the actual requests. This is highly effective, because the automated software spammers use usually isn't standards-compliant. Build a filter that checks for deviations from the SMTP RFCs (RFC 5321), or one that analyzes email headers: shady user agents are always a giveaway.
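As a deliberately simplified starting point for the header-analysis idea, here's a sketch using Python's standard email module. The specific checks and the SUSPICIOUS_MAILERS list are assumptions of mine for illustration, not an actual RFC compliance test.

```python
import email
from email import policy

SUSPICIOUS_MAILERS = {"bulk mailer", "mass sender"}  # illustrative examples only

def header_red_flags(raw_message: bytes):
    """Return a list of header oddities that hint at non-compliant sending software."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    flags = []
    if msg["Message-ID"] is None:
        flags.append("missing Message-ID header")
    if msg["Date"] is None:
        flags.append("missing Date header")
    mailer = (msg["X-Mailer"] or msg["User-Agent"] or "").lower()
    if any(s in mailer for s in SUSPICIOUS_MAILERS):
        flags.append(f"suspicious mail client: {mailer!r}")
    # Legitimate mail passes through at least one MTA that adds Received headers
    if not (msg.get_all("Received") or []):
        flags.append("no Received headers at all")
    return flags

raw = b"From: someone@example.com\r\nSubject: hi\r\n\r\nhello"
print(header_red_flags(raw))
# ['missing Message-ID header', 'missing Date header', 'no Received headers at all']
```

You could feed these flags into the Bayesian score as extra features, or use them as a cheap pre-filter before running the heavier content analysis.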
Best of luck to you.