BanThis is a PHP package for profanity filtering. It uses regular expressions to detect banned words even when they are disguised with "leetspeak"-style numeric or symbol replacements.
This package is an evolution of snipe/banbuilder, adapted and refactored for modern PHP versions.
To install BanThis, add it to the require section of your project's composer.json:
"diego-ninja/banthis": "^1",
There are no additional dependencies required for this package to work.
use Ninja\BanThis\Censor;
use Ninja\BanThis\Dictionary;
$dictionary = Dictionary::withLanguage('en-us');
$censor = new Censor($dictionary);
$string = $censor->clean('A very offensive string with the bad word dick on it');
print_r($string);
Array
(
[orig] => A very offensive string with the bad word dick on it
[clean] => A very offensive string with the bad word **** on it
[matched] => Array
(
[0] => dick
)
)
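As the output above shows, the result is a plain array with orig, clean and matched keys. The following is a minimal sketch of how such a result could be consumed. The helper name summarizeCensorResult and its message format are hypothetical, and the sample array is hard-coded so that the snippet runs without the package installed.

```php
<?php
declare(strict_types=1);

/**
 * Summarise a censor result array shaped like the print_r output above.
 * Hypothetical helper: the name and message format are not part of BanThis.
 */
function summarizeCensorResult(array $result): string
{
    $matched = $result['matched'] ?? [];
    if ($matched === []) {
        return 'No banned words found.';
    }

    return sprintf(
        '%d banned word(s) found (%s). Clean text: %s',
        count($matched),
        implode(', ', $matched),
        $result['clean']
    );
}

// Hard-coded sample mirroring the output above (not produced by the package here).
$result = [
    'orig'    => 'A very offensive string with the bad word dick on it',
    'clean'   => 'A very offensive string with the bad word **** on it',
    'matched' => ['dick'],
];

echo summarizeCensorResult($result), PHP_EOL;
```

In a real project the array would come from $censor->clean() instead of being written by hand.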
You can set or add dictionaries to the Censor instance.
// Set a new dictionary
$censor->setDictionary($dictionary);
// Add words from another dictionary
$additionalDictionary = Dictionary::withLanguage('fr');
$censor->addDictionary($additionalDictionary);
You can add words directly from an array.
$words = ['badword1', 'badword2'];
$censor->addWords($words);
You can add words to the whitelist to exclude them from being censored.
$whitelist = ['goodword1', 'goodword2'];
$censor->whitelist($whitelist);
You can set the character or string that will replace the censored words.
$censor->setReplaceChar('*');
In a nutshell, this code takes an array of bad words and compares it to an array of common filter-evasion tactics. It then does a string replacement to insert regex parameters into your bad words array, and finally evaluates your input string against that expanded banned word list.
So in your bad words array, you might have:
[0] => 'ass'
The preg_replace functions replace all of the possible shenanigan letters with regex patterns (in lieu of adding the variants onto the end of the array), so the 'ass' in your array gets turned into this, right before the preg_replace checks for matches:
[0] => /(a|a\.|a\-|4|@|Á|á|À|Â|à|Â|â|Ä|ä|Ã|ã|Å|å|α)(s|s\.|s\-|5|\$|§)(s|s\.|s\-|5|\$|§)/i
This means that a word can contain no, one, or several leet replacements and it will still trip the trigger. Part of the leet filter handles letters followed by a dot or a dash, which is why the pattern includes alternatives such as a\. and a\-.
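The expansion described above can be sketched as follows. This is an illustration under stated assumptions, not the package's actual implementation: buildLeetPattern is a hypothetical function, and $leetMap is a deliberately tiny substitution table rather than the full list BanThis ships with.

```php
<?php
declare(strict_types=1);

/**
 * Expand a plain ASCII word into a leet-tolerant regex.
 * Hypothetical sketch: $leetMap is a small sample table, not the package's real one.
 */
function buildLeetPattern(string $word, array $leetMap): string
{
    $groups = '';
    foreach (str_split(strtolower($word)) as $char) {
        // Each letter matches itself, itself plus a dot, or itself plus a dash.
        $alternatives = [$char, $char . '.', $char . '-'];
        foreach ($leetMap[$char] ?? [] as $substitute) {
            $alternatives[] = $substitute;
        }
        // preg_quote escapes regex metacharacters such as "$", "." and "-".
        $quoted = array_map(
            static fn (string $alt): string => preg_quote($alt, '/'),
            $alternatives
        );
        $groups .= '(' . implode('|', $quoted) . ')';
    }

    // "i" makes the match case-insensitive, "u" treats pattern and input as UTF-8.
    return '/' . $groups . '/iu';
}

// Assumed sample substitutions for illustration only.
$leetMap = [
    'a' => ['4', '@'],
    's' => ['5', '$'],
];

$pattern = buildLeetPattern('ass', $leetMap);
echo $pattern, PHP_EOL;

foreach (['ass', 'A55', '@$$', 'a.s.s.', 'a-s-s'] as $candidate) {
    echo $candidate, ' => ', preg_match($pattern, $candidate) === 1 ? 'match' : 'no match', PHP_EOL;
}
```

Building one group per letter keeps the banned word list short, because the variants live in the pattern instead of being appended to the array.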
For example, all of the following variants are detected as the word "bitch":
- B1tch
- bi7tch
- b.i.t.c.h.
- b-i-t-c-h
- b.1.t.c.h.
- ßitch
- and so on.
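To illustrate how variants like these could be masked once a pattern matches, here is a small sketch. It is not how BanThis is implemented internally: maskMatches is a hypothetical helper that only imitates the orig/clean/matched result shape, the pattern is hand-expanded from an assumed substitution set, and masking one replacement character per matched character is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Mask every match of $pattern in $text and report what was matched.
 * Hypothetical helper that imitates the orig/clean/matched result shape.
 */
function maskMatches(string $text, string $pattern, string $replaceChar = '*'): array
{
    $matched = [];
    $clean = preg_replace_callback(
        $pattern,
        static function (array $m) use (&$matched, $replaceChar): string {
            $matched[] = $m[0];
            // Count UTF-8 characters (not bytes) so "ß" is masked as one character.
            $length = (int) preg_match_all('/./us', $m[0]);
            return str_repeat($replaceChar, $length);
        },
        $text
    );

    return ['orig' => $text, 'clean' => $clean, 'matched' => $matched];
}

// Hand-expanded pattern for "bitch" using a small, assumed substitution set.
$pattern = '/(b|b\.|b\-|ß)(i|i\.|i\-|1)(t|t\.|t\-|7)(c|c\.|c\-)(h|h\.|h\-)/iu';

foreach (['B1tch', 'b.i.t.c.h.', 'b-i-t-c-h', 'ßitch'] as $variant) {
    $result = maskMatches("you $variant", $pattern);
    echo $result['clean'], PHP_EOL;
}
```

In this sketch the final dot of "b.i.t.c.h." is left in place, because the last group is already satisfied by a plain "h" before the dotted alternative is tried.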
To run the unit tests on this package, run vendor/bin/phpunit from the package directory.
This project is developed and maintained by 🥷 Diego Rin in his free time.
Special thanks to:
- snipe for developing the initial code that serves as the starting point for BanThis.
- All the contributors and testers who have helped to improve this project through their contributions.
If you find this project useful, please consider giving it a ⭐ on GitHub!