Lots of malware that afflicts WordPress, Joomla and other PHP-based web sites is written in PHP. PHP is an interpreted language, so attackers distribute malware as source code. Much of the PHP malware is obfuscated. This (PHP) program does de-obfuscation to aid human understanding of the malware.
If you come across or possess PHP malware that this program doesn't de-obfuscate, email me at [email protected] and I will look into improving this code to handle your malware.
My guess is that attackers obfuscate their PHP code for three reasons:
- To evade simple signature or checksum based malware detection.
- To attempt to keep website owners from understanding what the malware does.
- To keep other malware writers from "stealing" their code, or even understanding it.
I base guess no. 1 on the fact that obfuscation methods change rapidly, sometimes only getting used for a single installation of malware.
I make guess no. 2 because the obfuscation is often just a "visual
confusion" thing, rather than any kind of encryption. Having assert()
evaluate a single, very long line of PHP isn't going to fool any
algorithm, but the human eye might glide right past it.
I base guess no. 3 on the fact that most PHP malware is either embarrassingly simple, or evolves by wholesale feature addition, even if that feature is a hidden back door or phone-home emails. Keeping other inept programmers from understanding the code might give an individual a temporary advantage.
- It replaces strings obfuscated by Base64 (encoding and decoding), Rot13, URL encoding, reversal and some forms of compression.
- It can de-obfuscate strings that are created by composing encoding, decoding, compression and other manipulations.
- It can replace function names that are obfuscated by indirection (i.e. $function($arg1, $arg2...);), or by tricky use of $GLOBALS.
- It can replace variable names that are obfuscated by the same indirections.
- It replaces arguments of functions of special interest (eval(), fopen(), preg_replace(), etc.) with de-obfuscated, or otherwise statically determined, values.
- It aggregates concatenated strings, or concatenated mixes of strings and obscuring function calls.
- It evaluates Array() creations to allow de-obfuscating strings made by concatenating array elements.
- It pretty-prints function body arguments of create_function() invocations, composing names for the anonymous functions created that way, and uses those names to de-obfuscate.
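As an illustration of what undoing a composition of encodings involves, here is a minimal sketch. It is not revphp's code: the helper name unwind_layers() and its whitelist are assumptions made for this example, and the sample string is built on the spot rather than taken from real malware.

```php
<?php
// Hypothetical helper, not part of revphp: undo a known chain of encodings.
// $steps lists decoding functions in the order they must be applied.
function unwind_layers(string $blob, array $steps): string
{
    // Only pure string decoders are allowed, so nothing from a sample runs.
    // gzinflate() and gzuncompress() need PHP's zlib extension.
    $allowed = ['base64_decode', 'str_rot13', 'strrev', 'urldecode',
                'gzinflate', 'gzuncompress'];
    foreach ($steps as $step) {
        if (!in_array($step, $allowed, true)) {
            throw new InvalidArgumentException("refusing to call $step");
        }
        $blob = $step($blob);
        if (!is_string($blob)) {
            throw new RuntimeException("$step could not decode its input");
        }
    }
    return $blob;
}

// Build a sample the way an obfuscator might: rot13, then Base64, then reverse.
$plain = 'echo "Hello, world";';
$obfuscated = strrev(base64_encode(str_rot13($plain)));

// Undo the layers in the opposite order.
$decoded = unwind_layers($obfuscated, ['strrev', 'base64_decode', 'str_rot13']);
var_dump($decoded === $plain); // bool(true)
```

The whitelist is the important design choice in this sketch: a decoder that calls whatever function name it finds in a sample would execute the malware instead of reading it.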
Evaluating Array() calls when creating arrays means that revphp changes its own code on-the-fly. Hopefully this doesn't lead to code injection from malware into revphp, but the possibility is there. For better or for worse, malware uses arrays of strings quite often, so some feature like this is necessary.
Use composer to retrieve the latest PHP-Parser code:
composer install
After that, everything should be in place.
Basic usage involves a file full of obfuscated PHP, and stdout:
/wherever/reverse-php-malware/revphp obfuscated.php > pretty.php
or
/wherever/reverse-php-malware/revphp -R obfuscated.php > pretty.php
The -R flag causes revphp to examine all variable names and replace those names that are indirected by various techniques.
Command line flag -C causes it to leave comments in the output code. Ordinarily it deletes comments, because who can believe comments in malware?
Should you find a function in some malware that deserves to have its arguments decoded, you can add that via a -f flag. For instance, fwrite() calls don't have their arguments de-obfuscated by default. To get revphp to do that:
/wherever/reverse-php-malware/revphp -f fwrite obfuscated.php > cleanedup.php
Occasionally you will want to rename a function (and its calls) in the de-obfuscated code. You can use the -F original=new flag:
/wherever/reverse-php-malware/revphp -F OO_000O__O=htaccess_creator obfuscated.php > cleanedup.php
In the file cleanedup.php, all calls to OO_000O__O() will appear as htaccess_creator(), and the function definition will also appear as function htaccess_creator().
Very rarely, a malware author will put in a unique obfuscating function that's not merely a composition of base64_encode(), gzinflate() and str_rot13(). In that case, you can extract the obfuscating function into its own file. revphp can read, evaluate, and use that special function during de-obfuscation:
/wherever/reverse-php-malware/revphp -D decoding_function.php obfuscated.php > cleanedup.php
The testing script runtests includes a test of a unique obfuscating function. runtests invokes this:
./revphp -r zork -D tests/zork.php tests/t1_1.php
The PHP functions in file tests/zork.php get read in and evaluated using the -D flag. The -r zork flag causes revphp to examine and replace any obfuscated arguments to invocations of function zork() in the subject PHP, tests/t1_1.php. This is a somewhat confusing example, because tests/zork.php contains the definition of function zork(), and so does tests/t1_1.php. One function zork(), the one in tests/t1_1.php, just gets de-obfuscated. The other definition of function zork(), in tests/zork.php, gets read in and evaluated by the -D flag. The -r zork flag causes revphp to invoke the read-in-and-evaluated function zork() while revphp is traversing the parse tree of tests/t1_1.php.
This closely mimics a realistic situation, where you might run revphp on some malware PHP. You find that revphp can't decode some key obfuscated strings because the malware PHP has a custom decoding function. You can extract a copy of the custom decoding function into a file, and re-invoke revphp with appropriate -D and -r flags to cause the important strings to be de-obfuscated by the custom decoding function.
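For a sense of what an extracted decoding function might look like, here is an invented example. It is not the contents of tests/zork.php, and the name my_decoder(), the hex-plus-XOR scheme and the key value are all assumptions chosen for illustration.

```php
<?php
// my_decoder.php: a hypothetical custom decoder of the kind a malware
// author might write. The input is hex text whose bytes were XORed with
// a small fixed key before being hex-encoded.
function my_decoder(string $hex, int $key = 7): string
{
    // Check the input first so hex2bin() never sees malformed hex.
    if (strlen($hex) % 2 !== 0 || !ctype_xdigit($hex)) {
        return '';
    }
    $raw = hex2bin($hex);
    if ($raw === false) {
        return '';
    }
    $out = '';
    for ($i = 0, $n = strlen($raw); $i < $n; $i++) {
        // XOR is its own inverse, so the same key undoes the encoding.
        $out .= chr(ord($raw[$i]) ^ $key);
    }
    return $out;
}
```

Following the flags described above, a file like this would be named with -D and its function with -r; both names here are placeholders. The file holds only the function definition, with no top-level statements, so reading it in defines the decoder without running anything else from the sample.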
revphp is written in PHP, and de-obfuscates PHP, in a kind of philosophical short-circuit.
revphp uses PHP-Parser to create a parse tree from a source file, then traverses the parse tree. It keeps a global symbol table, and local symbol tables, which are created and destroyed on parse-tree-function entrance and exit.
During the traversal of the parse tree, it keeps track of assignments to variables. Any value it can de-obfuscate by base64_decode(), urldecode(), strrev(), gzinflate() and gzuncompress(), it will associate with the variable's name in the symbol table. It substitutes de-obfuscated values for obfuscated ones in the parse tree. Most of the work revphp does is evaluating (if it can) the right-hand side of assignment statements. PHP malware tends to use a lot of superfluous variables, and a lot of assignments to and from those superfluous variables. Tracking variable contents allows de-obfuscation later.
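The idea of tracking assignments can be sketched without PHP-Parser. The example below is an assumption-laden toy, not revphp's implementation: it invents a tiny array-based expression format and a function evaluate_expr() in place of real parse tree nodes, and it uses a single flat symbol table.

```php
<?php
// Hypothetical mini-AST. An expression is one of:
//   ['str', value], ['var', name], ['concat', expr, expr], ['call', fn, expr]
// evaluate_expr() returns the static value, or null when it cannot know it.
function evaluate_expr(array $expr, array $symbols): ?string
{
    switch ($expr[0]) {
        case 'str':
            return $expr[1];
        case 'var':
            return $symbols[$expr[1]] ?? null;
        case 'concat':
            $left = evaluate_expr($expr[1], $symbols);
            $right = evaluate_expr($expr[2], $symbols);
            return ($left === null || $right === null) ? null : $left . $right;
        case 'call':
            // Only known, side-effect-free decoders are ever invoked.
            $decoders = ['base64_decode', 'urldecode', 'strrev', 'str_rot13'];
            $arg = evaluate_expr($expr[2], $symbols);
            if ($arg === null || !in_array($expr[1], $decoders, true)) {
                return null;
            }
            $out = $expr[1]($arg);
            return is_string($out) ? $out : null;
    }
    return null;
}

// Stand-in for: $a = 'c3lz'; $b = 'dGVt'; $c = $a . $b; $d = base64_decode($c);
$assignments = [
    ['a', ['str', 'c3lz']],
    ['b', ['str', 'dGVt']],
    ['c', ['concat', ['var', 'a'], ['var', 'b']]],
    ['d', ['call', 'base64_decode', ['var', 'c']]],
];

$symbols = [];
foreach ($assignments as [$name, $expr]) {
    $value = evaluate_expr($expr, $symbols);
    if ($value !== null) {
        $symbols[$name] = $value;   // remember the statically known value
    } else {
        unset($symbols[$name]);     // value unknown: forget any stale entry
    }
}
var_dump($symbols['d']); // string(6) "system"
```

Forgetting a variable when its new value cannot be determined matters as much as remembering known values, since a stale entry would produce a wrong substitution later.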
PHP malware tries to obfuscate function calls, both names of functions, and arguments to functions. When revphp reaches a function call in the parse tree, it tries to de-obfuscate any indirect function names (like $fn()), substituting de-obfuscated for obfuscated function name in the parse tree. If revphp happens upon an instance of a select list of functions (some built-in, or set by -f flag on command line), it examines any arguments and tries to substitute de-obfuscated arguments for obfuscated arguments in the parse tree.
Keeping a global and local symbol table allows revphp to de-obfuscate constructions like this:
<?php
$glorf = 'c3lzdGVt';
$frolg = 'ZWNobyAiSGVsbG8sIHdvcmxkIg==';
// ... lots of code ...
function doBadStuff() {
    $fn = base64_decode($GLOBALS['glorf']);
    $fn(base64_decode($GLOBALS['frolg']));
}
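Worked through by hand, the construction above resolves as follows. This is a sketch of the lookups involved, with the two plain arrays $globals and $locals standing in (as an assumption) for the global and local symbol tables; it is not revphp's actual output, and it never calls the decoded function.

```php
<?php
// Assumed stand-ins for the symbol tables after the top-level assignments.
$globals = [
    'glorf' => 'c3lzdGVt',
    'frolg' => 'ZWNobyAiSGVsbG8sIHdvcmxkIg==',
];
$locals = [];   // created on entry to doBadStuff(), discarded on exit

// $fn = base64_decode($GLOBALS['glorf']);
$locals['fn'] = base64_decode($globals['glorf']);

// $fn(base64_decode($GLOBALS['frolg']));
$arg = base64_decode($globals['frolg']);

// Print the call as text only; nothing from the sample is executed.
printf("%s(%s)\n", $locals['fn'], var_export($arg, true));
// prints: system('echo "Hello, world"')
```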
After it has completely traversed the parse tree, revphp uses a PHP-Parser built-in pretty-printer to present the user with a (possibly de-obfuscated) source translation. Any anonymous functions created with create_function() calls have their bodies pretty-printed at this time.
Pretty-printing malware is about half the way towards understanding it.
The design of PHP-Parser caused me to create class RevPHPNodeVisitor extends PhpParser\NodeVisitorAbstract, which contains almost all of the above functionality.
Directories zoo/ and tests/ contain pieces of PHP that illustrate obfuscations found in PHP malware. Many of the test cases are simplified extracts from malware that earlier versions of revphp had problems de-obfuscating. Invoking runtests will execute revphp against all PHP fragments in zoo/, and check the output against desired/correct outputs in desired/.
runtests also executes more complex scenarios, with code residing in tests/.